Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add custom ali copyright language #2050

Merged
merged 9 commits into from
May 22, 2024

Conversation

teovin
Copy link
Contributor

@teovin teovin commented May 16, 2024

  • Add a new ali_licensed field to the ContentCode model to indicate whether the section/resource has a special ALI copyright text
  • Update the copyright template to check and display the new language
  • Retroactively update previous entries via a task to populate the new column
  • licensed_materials list will be updated when we receive it from ALI, right now I only used it as a placeholder. Thus this PR won't be merged until we finalize it.

@teovin teovin marked this pull request as ready for review May 16, 2024 18:13
@teovin teovin requested a review from a team as a code owner May 16, 2024 18:13
@teovin teovin requested review from bensteinberg and removed request for a team May 16, 2024 18:13
web/main/models.py Outdated Show resolved Hide resolved
web/main/models.py Outdated Show resolved Hide resolved
web/tasks.py Show resolved Hide resolved
Copy link
Contributor

@jcushman jcushman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making a start on this! As Ben noted, the current algorithm of looking for any word in "match_words" won't work, because nodes can contain words like "contracts" without being from the Restatement of the Law of Contracts. What we want is something that will detect each section in this PDF: https://www.ali.org/media/filer_public/f8/e9/f8e919d5-790e-408c-af0a-65db1687dbc5/2023-checklist.pdf , which you can also see listed on this page: https://www.ali.org/publications/

So for example:

  • If the title contains "restatement" AND "contracts", it should have title="Restatement of the Law, Contracts", "years": "1971-2023"
  • If the title contains "restatement" AND "agency", it should have title="Restatement of the Law, Agency", "years": "2026-2023"
  • if it just says "Contracts: An introduction", it should have ali_licensed=false.

A good way to do this would be to add a test function to the test suite for your new get_ali_license_text function that gives it a bunch of challenging examples, run some google searches like "site:opencasebook.org restatement", "site:opencasebook.org contracts", "site:opencasebook.org agency", "site:opencasebook.org torts", and make sure the test suite includes expected results for a range of examples that look challenging to you from those search results.

@teovin
Copy link
Contributor Author

teovin commented May 17, 2024

Thanks for the detailed explanation, I will work on it!

@teovin
Copy link
Contributor Author

teovin commented May 21, 2024

@jcushman Some of the titles in the pdf contain 2nd, 3rd, and some additional wording, for example Conflict of Laws 2d (vols. 1-2) & Appendix (vols. 3-8). In my list of materials to check, I didn't include the additional wording. I am also not including them in the text displayed. Some materials have several different title variations due to this, such as Property. Due to this, for all the different variations of Property-related titles, for example, I will be displaying the title as below:

image

Another note is that since there are several publications under each title, and I will be grouping similar titles under one through above, I made the copyright years a range of the earliest and latest dates. For example: 2006-2023 for Agency.

Let me know if you are ok with this approach, or would prefer it to be more granular

@jcushman
Copy link
Contributor

Let me know if you are ok with this approach, or would prefer it to be more granular

This is also what I had in mind -- sounds great!

@teovin
Copy link
Contributor Author

teovin commented May 21, 2024

One more thing to note is that since we didn't have the Model Penal Code publications in the pdf, I grabbed the two related materials' information from the ALI website (Sentencing, Sexual Assault and Related Offenses). Let me know if anything needs to be corrected for those.

Copy link
Contributor

@jcushman jcushman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising! The data structure looks right and the logic looks reasonable.

In addition to the notes I left for efficiency, can you add tests to make sure this performs as you intend? You can see in models.py we have various doctests like

    r"""
    Return all text, including spaces, from the html, using the LXML library.
    >>> html = ' \n <p> \r\n <em> ...'
    >>> assert ContentAnnotation.text_from_html(html) == ' \n  \r\n  foo  ...'
    """

This code will actually be run from pytest, and is able to import fixtures defined in web/conftest.py. So you would do something in the docstring like:

>>> node = getfixture('contentnode')
>>> node.title = 'Restatement of Contracts 2d, blah blah blah'
>>> assert node.get_ali_license_text() == '...'

And include a few different examples to cover different scenarios. Alternatively you can add a test function to a reasonable place in main/test/ if the docstring approach starts to feel clunky.

web/main/views.py Outdated Show resolved Hide resolved
web/main/models.py Outdated Show resolved Hide resolved
web/main/models.py Show resolved Hide resolved
Copy link
Contributor

@jcushman jcushman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a typo in the "Principles" titles, then this is ready to go!

web/static/data/ali_materials.json Outdated Show resolved Hide resolved
@teovin teovin merged commit 9749150 into harvard-lil:develop May 22, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants