add custom ali copyright language #2050

teovin · 2024-05-16T15:09:18Z

Add a new ali_licensed field to the ContentCode model to indicate whether the section/resource has a special ALI copyright text
Update the copyright template to check and display the new language
Retroactively update previous entries via a task to populate the new column
licensed_materials list will be updated when we receive it from ALI, right now I only used it as a placeholder. Thus this PR won't be merged until we finalize it.

web/main/models.py

web/tasks.py

jcushman

Thanks for making a start on this! As Ben noted, the current algorithm of looking for any word in "match_words" won't work, because nodes can contain words like "contracts" without being from the Restatement of the Law of Contracts. What we want is something that will detect each section in this PDF: https://www.ali.org/media/filer_public/f8/e9/f8e919d5-790e-408c-af0a-65db1687dbc5/2023-checklist.pdf , which you can also see listed on this page: https://www.ali.org/publications/

So for example:

If the title contains "restatement" AND "contracts", it should have title="Restatement of the Law, Contracts", "years": "1971-2023"
If the title contains "restatement" AND "agency", it should have title="Restatement of the Law, Agency", "years": "2026-2023"
if it just says "Contracts: An introduction", it should have ali_licensed=false.

A good way to do this would be to add a test function to the test suite for your new get_ali_license_text function that gives it a bunch of challenging examples, run some google searches like "site:opencasebook.org restatement", "site:opencasebook.org contracts", "site:opencasebook.org agency", "site:opencasebook.org torts", and make sure the test suite includes expected results for a range of examples that look challenging to you from those search results.

teovin · 2024-05-17T14:56:00Z

Thanks for the detailed explanation, I will work on it!

teovin · 2024-05-21T20:27:54Z

@jcushman Some of the titles in the pdf contain 2nd, 3rd, and some additional wording, for example Conflict of Laws 2d (vols. 1-2) & Appendix (vols. 3-8). In my list of materials to check, I didn't include the additional wording. I am also not including them in the text displayed. Some materials have several different title variations due to this, such as Property. Due to this, for all the different variations of Property-related titles, for example, I will be displaying the title as below:

Another note is that since there are several publications under each title, and I will be grouping similar titles under one through above, I made the copyright years a range of the earliest and latest dates. For example: 2006-2023 for Agency.

Let me know if you are ok with this approach, or would prefer it to be more granular

jcushman · 2024-05-21T20:30:20Z

Let me know if you are ok with this approach, or would prefer it to be more granular

This is also what I had in mind -- sounds great!

teovin · 2024-05-21T20:32:43Z

One more thing to note is that since we didn't have the Model Penal Code publications in the pdf, I grabbed the two related materials' information from the ALI website (Sentencing, Sexual Assault and Related Offenses). Let me know if anything needs to be corrected for those.

jcushman

This looks promising! The data structure looks right and the logic looks reasonable.

In addition to the notes I left for efficiency, can you add tests to make sure this performs as you intend? You can see in models.py we have various doctests like

    r"""
    Return all text, including spaces, from the html, using the LXML library.
    >>> html = ' \n <p> \r\n <em> ...'
    >>> assert ContentAnnotation.text_from_html(html) == ' \n  \r\n  foo  ...'
    """

This code will actually be run from pytest, and is able to import fixtures defined in web/conftest.py. So you would do something in the docstring like:

>>> node = getfixture('contentnode')
>>> node.title = 'Restatement of Contracts 2d, blah blah blah'
>>> assert node.get_ali_license_text() == '...'

And include a few different examples to cover different scenarios. Alternatively you can add a test function to a reasonable place in main/test/ if the docstring approach starts to feel clunky.

web/main/views.py

web/main/models.py

jcushman

Just a typo in the "Principles" titles, then this is ready to go!

web/static/data/ali_materials.json

teovin added 4 commits May 16, 2024 11:04

add custom ali copyright language

e15d90d

linting

eda8624

typing update

cc4f69a

run makemigrations per convo in lil-dev

bfa5f3f

teovin marked this pull request as ready for review May 16, 2024 18:13

teovin requested a review from a team as a code owner May 16, 2024 18:13

teovin requested review from bensteinberg and removed request for a team May 16, 2024 18:13

bensteinberg reviewed May 16, 2024

View reviewed changes

web/main/models.py Outdated Show resolved Hide resolved

web/main/models.py Outdated Show resolved Hide resolved

web/tasks.py Show resolved Hide resolved

remove split

18c2dd3

jcushman requested changes May 17, 2024

View reviewed changes

add ali materials to check against

2797a90

jcushman requested changes May 21, 2024

View reviewed changes

web/main/views.py Outdated Show resolved Hide resolved

web/main/models.py Outdated Show resolved Hide resolved

web/main/models.py Show resolved Hide resolved

teovin added 2 commits May 22, 2024 11:04

add doctests, move json load to settings

19904af

remove fixture

9261950

jcushman requested changes May 22, 2024

View reviewed changes

web/static/data/ali_materials.json Outdated Show resolved Hide resolved

fix typo in principles

9ff0c5c

jcushman approved these changes May 22, 2024

View reviewed changes

teovin merged commit 9749150 into harvard-lil:develop May 22, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add custom ali copyright language #2050

add custom ali copyright language #2050

teovin commented May 16, 2024 •

edited

Loading

jcushman left a comment

teovin commented May 17, 2024

teovin commented May 21, 2024

jcushman commented May 21, 2024

teovin commented May 21, 2024

jcushman left a comment

jcushman left a comment

add custom ali copyright language #2050

add custom ali copyright language #2050

Conversation

teovin commented May 16, 2024 • edited Loading

jcushman left a comment

Choose a reason for hiding this comment

teovin commented May 17, 2024

teovin commented May 21, 2024

jcushman commented May 21, 2024

teovin commented May 21, 2024

jcushman left a comment

Choose a reason for hiding this comment

jcushman left a comment

Choose a reason for hiding this comment

teovin commented May 16, 2024 •

edited

Loading