reference extraction with HTML #2

fchrubasik · 2019-12-06T18:13:52Z

The extractor now also accepts HTML input. This should fix issue #1, as discussed in that issue.
The default is currently plain (non-HTML) text. To enable HTML mode, pass True as the is_html argument
when calling extractor.extract, or change the default value of is_html in line 47 of
extractors.py to True.
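For illustration, here is a minimal Python sketch of the calling convention described above. It assumes a hypothetical `extract_markers` function standing in for the real code in extractors.py, whose actual signature may differ. Giving `is_html` a default of `False` keeps existing plain-text callers working, while passing `True` opts into HTML mode. Without a default, any call that omits the flag fails with a `TypeError`.

```python
# Hypothetical sketch: `extract_markers` and `extract_markers_strict` are NOT
# the real refex API; they only illustrate how an `is_html` flag with a
# default value keeps old call sites working.


def extract_markers_strict(content, is_html):
    """No default: every caller must pass is_html explicitly."""
    return ("html" if is_html else "text", content)


def extract_markers(content, is_html=False):
    """Default False: plain-text mode unless the caller opts into HTML."""
    return ("html" if is_html else "text", content)


# Existing call sites that pass only the content keep working (plain text).
print(extract_markers("§ 1 BGB"))  # ('text', '§ 1 BGB')

# Opting into HTML mode by passing True positionally, as described above.
print(extract_markers("<p>§ 1 BGB</p>", True))  # ('html', '<p>§ 1 BGB</p>')

# The strict signature breaks every caller that omits the flag.
try:
    extract_markers_strict("§ 1 BGB")
except TypeError as err:
    print(err)
```

Keeping the default at `False` means the change is opt-in, so existing tests and callers that only pass the content do not need to be touched.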

malteos · 2019-12-09T09:17:00Z

Hi @fchrubasik

thank you very much for your contribution!

I would be happy to merge this, but before I do, we definitely need a unit test that verifies this feature works correctly. Also, please check the existing unit tests, since they are currently failing:

https://travis-ci.org/openlegaldata/legal-reference-extraction/jobs/621724640

TypeError: extract_law_ref_markers() missing 1 required positional argument: 'is_html'

Let me know if you need any assistance, I'm here to help!

Best,
Malte

fchrubasik · 2019-12-15T18:30:18Z

Hi @malteos,

the existing unit tests should now pass. I also added new unit tests based on the existing ones.
Let me know if these new tests are sufficient or if I missed something.

Best,
Fabian

malteos

Looks good! Thank you so much for this Christmas gift. I'll try to deploy this to our production system asap.

malteos · 2020-05-04T11:51:42Z

Finally deployed to production! Sorry for the delay.

fchrubasik added 7 commits December 2, 2019 22:06

add html support · d6e66ab

add html support · 756f78f

added backward compatibility · 5069136

added backward compatibility and minor fix · 026887c

extended list for html · 9da4827

minor fix · d7d2ae3

extended word_delimiter · 089ab9e

fchrubasik added 4 commits December 15, 2019 19:19

fixed existing unit tests · dc7a47b

new unit tests · 53f6378

new unit tests · 2b2e8a4

new unit tests · 4ae21bf

malteos approved these changes Dec 19, 2019


malteos merged commit a4cba7a into openlegaldata:master Dec 19, 2019

malteos mentioned this pull request in Invalid positions #1 (closed) Dec 19, 2019
