enslaved.org integration #239

Open
jduss4 opened this issue Jan 14, 2021 · 2 comments
@jduss4
Contributor

jduss4 commented Jan 14, 2021

We have been asked to contribute data to enslaved.org in the form of a CSV.

Without further information, from their search page I am assuming they are interested in:

  • people
  • events
  • places
  • sources

We could likely line up the OSCYS data as:

  • people
  • events (cases)
  • places (jurisdictions / birth places)
  • sources (documents)

I do not believe all of the information we would want to send them is in Solr, unfortunately, so we would likely need to write a script that got some results from Solr and combined them with personography files and the TTL file.

Documents should be good to go either from Solr or from the TEI files, as they list person ids and case ids.
Cases are something we can get entirely from Solr, as they are aggregated from documents.
People are tricky because there is likely more information in the personography than Solr (need to confirm) and there is also a lot of relationship information by way of our TTL file.
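The combining step described above could be sketched roughly as follows. This is only an illustration: the function name `merge_people`, the field names, and the shape of the inputs (Solr documents as dicts, personography entries keyed by person id, relationships already extracted from the TTL file as tuples) are all assumptions, not the project's actual data structures.

```python
def merge_people(solr_docs, personography, relationships=None):
    """Merge per-person records from several sources into one dict per id.

    solr_docs:      list of dicts, each with an "id" key (assumed shape)
    personography:  dict of person id -> dict of fields (assumed shape)
    relationships:  optional dict of person id -> list of
                    (predicate, other_person_id) tuples (assumed shape)
    Non-empty personography values override Solr values for the same field.
    """
    merged = {}
    for doc in solr_docs:
        merged[doc["id"]] = dict(doc)

    for person_id, fields in personography.items():
        record = merged.setdefault(person_id, {"id": person_id})
        for key, value in fields.items():
            # Skip empty values so they do not erase data found in Solr.
            if value not in (None, "", []):
                record[key] = value

    for person_id, rels in (relationships or {}).items():
        record = merged.setdefault(person_id, {"id": person_id})
        record["relationships"] = list(rels)

    return merged
```

Letting the personography win on conflicts is a guess based on the expectation that it holds more information than Solr; that still needs to be confirmed.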

What we need to find out:

  • Are they interested in information about people who are not enslaved? (enslavers, judges, attorneys, etc)
  • Do they want cases and documents?
  • How much relationship information do they want about people?
  • Will this be something that we can update or is this a one-time thing? (this will determine what kind of script I write)
  • When is the deadline?
@karindalziel
Member

I have more details about this. I think this is the way forward to start with:

  • build two datasets in CSV format, one for "people" and one for "events"
    • people: this will be built from the personography, reformatting its fields as CSV fields and separating multiple values with ";". Look to the PDF linked from this page for help with field definitions: https://docs.enslaved.org/metadata/personMetadata/
      • for ID, I think we should have two IDs: "local", which is the last part of the URL, and "namespaced", which is the whole URL
    • events: there will be one event per case file, which are the TEI documents that start with "oscys.caseid". Each case file should be a row in the CSV and follow the metadata defined at https://docs.enslaved.org/metadata/eventMetadata/
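A minimal sketch of the "people" export, assuming the personography is TEI with `<person xml:id="...">` entries containing `<persName>` and `<occupation>` children. The column names, the choice of child elements, and the placeholder `BASE_URL` are illustrative assumptions; the real columns should follow the enslaved.org person metadata definitions.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE_URL = "https://example.org/people/"  # placeholder, not the real site URL
FIELDS = ["local_id", "namespaced_id", "names", "occupations"]  # assumed columns


def _text(element):
    """Return the element's text with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def person_rows(tei_xml, base_url=BASE_URL):
    """Build one dict per <person>, joining repeated values with ';'."""
    root = ET.fromstring(tei_xml)
    rows = []
    for person in root.iterfind(".//tei:person", TEI_NS):
        local_id = person.get(XML_ID)
        if not local_id:
            continue  # cannot build either ID without xml:id
        names = [_text(n) for n in person.findall("tei:persName", TEI_NS)]
        jobs = [_text(o) for o in person.findall("tei:occupation", TEI_NS)]
        rows.append({
            "local_id": local_id,                  # last part of the URL
            "namespaced_id": base_url + local_id,  # whole URL
            "names": ";".join(names),
            "occupations": ";".join(jobs),
        })
    return rows


def to_csv(rows, fieldnames=FIELDS):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Values that themselves contain ";" would need escaping or a different separator; that case is not handled here.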

Write scripts in any form that's useful (Ruby, XSLT, Python) and save them in the scripts folder (https://github.com/CDRH/data_oscys/tree/main/scripts) in an "enslaved.org_scripts" (or something) folder. Save the output files in https://github.com/CDRH/data_oscys/tree/main/output/data_export (we will need to update the README for that folder, and some of the filenames).
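For the "events" file, a script along these lines could produce one row per case file. This is a sketch under assumptions: only the `oscys.caseid` filename prefix comes from the description above, while the directory arguments, the two columns, and reading the title from `titleStmt/title` are illustrative guesses to be replaced by the fields in the event metadata definitions.

```python
import csv
from pathlib import Path
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
CASE_PREFIX = "oscys.caseid"  # case files start with this prefix


def case_files(tei_dir):
    """Return the case-file TEI documents in tei_dir, sorted by name."""
    return sorted(
        p for p in Path(tei_dir).glob("*.xml") if p.name.startswith(CASE_PREFIX)
    )


def event_row(path):
    """Build one CSV row for a case file (columns are assumptions)."""
    root = ET.parse(path).getroot()
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title_text = "" if title is None else " ".join("".join(title.itertext()).split())
    return {"event_id": path.stem, "title": title_text}


def write_events_csv(tei_dir, out_path):
    """Write one row per case file and return the number of rows written."""
    rows = [event_row(p) for p in case_files(tei_dir)]
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["event_id", "title"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `write_events_csv` with the TEI source directory and a path inside the data_export folder would regenerate the file on each run, which suits the case where the export needs to be updated rather than sent once.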

@nichgray

Based on the request for source information, which is only included in document files, we are likely to need an additional document dataset.

@nichgray nichgray self-assigned this Feb 25, 2022