add some training data with examples with docs containing highlights #978

Open
Aazhar wants to merge 1 commit into master from added_training_data_for_examples_highlights

Collaborator

@Aazhar Aazhar commented Dec 6, 2022

Some documents contain highlights on the first page, and this section gets confused with the abstract. I'm submitting some training data for this case.

@lfoppiano lfoppiano self-assigned this Dec 26, 2024
@lfoppiano lfoppiano force-pushed the added_training_data_for_examples_highlights branch from e9a1b20 to bf10740 Compare December 26, 2024 11:49
@coveralls

Coverage Status

coverage: 40.868% (remained the same) when pulling bf10740 on added_training_data_for_examples_highlights into 4c85ab0 on master.

@lfoppiano
Collaborator

Hi @achrafazharccsd, apologies for checking this two years late.

There is a problem with these articles: neither of them is CC-BY. The Cell one is CC-BY-NC-ND (ND = NoDerivatives), which does not allow derivative works. The Pacman one is not CC-licensed at all. So neither of them can be added to the grobid public data.

Also, the annotations require some corrections (I checked them before realising the licence issue). If you have some equivalent articles that are CC-BY, I'm happy to help you with the training data and to provide an updated model for it.

@lfoppiano lfoppiano added the licence:needs_CC-BY The articles are not CC-BY label Dec 26, 2024
Labels
licence:needs_CC-BY The articles are not CC-BY
4 participants