move TEI idno identifiers under <analytics> #1193

Draft: lfoppiano wants to merge 1 commit into master

lfoppiano
Collaborator

@lfoppiano lfoppiano commented Oct 28, 2024

See #1192. The same treatment is applied to any identifier: PMCID, PMID, halID, etc.

See example:

                                </address>
                            </affiliation>
                        </author>
                        <title level="a" type="main">Transgressive phenotypes from outbreeding between the Trichoderma reesei hyper producer RutC30 and a natural isolate</title>
                        <idno type="DOI">10.1128/spectrum.00441-24</idno>
                    </analytic>
                    <monogr>
                        <imprint>
                            <date type="published" when="2024-08-20">20 August 2024</date>
                        </imprint>
                    </monogr>
                    <idno type="MD5">9E9A05DAEBD10C49EB098AF73FA55CD1</idno>
                    <idno type="DOI" status="deprecatedLocation">10.1128/spectrum.00441-24</idno>
                    <note type="submission">Received 22 February 2024 Accepted 3 July 2024</note>
                </biblStruct>

@coveralls

Coverage Status

coverage: 40.766% (+0.01%) from 40.755%
when pulling 60d7a19 on bugfix/move-idno-under-analytics
into be44579 on master.

@kermitt2
Owner

Hi Luca!

Thinking about it a second time, it might be more complicated than that, and I think I understand the motivation for leaving the identifiers under <biblStruct> in the case of Grobid.

If we extract a raw DOI from a PDF, the position of the DOI in normal TEI depends on the type of document we process: either under <analytic> if this is a part of a monograph or journal, or under <monogr> if we have a report, standalone article, etc. As we can't know the document type for sure, it makes sense to put it under <biblStruct> in Grobid, to avoid errors, as a relaxed rule.

Second, we have the HAL ID, which corresponds to a standalone document; same for the arXiv ID. We cannot put them under <analytic>, but it does not make sense to put them under <monogr> either. <biblStruct> is a valid default choice that explicitly captures the fact that we can't know the level of the document.

The idea behind this is: when we don't know the level associated with an extracted ID, we explicitly put it under <biblStruct>.
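
To make the rule concrete, here is a minimal Python sketch of it. This is not Grobid code: the function name `place_idno`, the `level` parameter, and the placeholder HAL value are assumptions made only for illustration, and TEI namespaces are ignored. An identifier goes under <analytic> or <monogr> only when its level is known, and otherwise stays directly under <biblStruct>.

```python
import xml.etree.ElementTree as ET


def place_idno(bibl_struct, id_type, value, level=None):
    """Attach an <idno> to a <biblStruct> element (hypothetical helper).

    level is "analytic", "monogr", or None when the level is unknown.
    """
    parent = None
    if level in ("analytic", "monogr"):
        # Only use the level-specific element if it already exists.
        parent = bibl_struct.find(level)
    if parent is None:
        # Relaxed rule: unknown level, so keep the identifier under <biblStruct>.
        parent = bibl_struct
    idno = ET.SubElement(parent, "idno", {"type": id_type})
    idno.text = value
    return idno


bibl = ET.fromstring("<biblStruct><analytic/><monogr/></biblStruct>")
# DOI whose level is known to be the article (part) level.
place_idno(bibl, "DOI", "10.1128/spectrum.00441-24", level="analytic")
# Identifier whose level is unknown; the value is a made-up placeholder.
place_idno(bibl, "halId", "hal-XXXXXXXX")
print(ET.tostring(bibl, encoding="unicode"))
```

With this sketch, the DOI ends up inside <analytic>, while the identifier of unknown level remains a direct child of <biblStruct>.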

@laurentromary for comments :)

@kermitt2
Owner

Note to myself: check how the DOI/IDs are positioned in the case of a consolidated header. We can have two different DOIs, one for the part and one for the "hosting" document.

@lfoppiano lfoppiano marked this pull request as draft November 20, 2024 16:26
@lfoppiano lfoppiano added this to the 0.9.0 milestone Nov 25, 2024