Peraire Ground Truth

License

This dataset and model are published under the CC-BY 4.0 License.

To cite this dataset:

Chagué, A., & Pérez, G. (2023). Peraire Ground Truth (Version 2.0.0) [Data set]. https://doi.org/10.5281/zenodo.7185907

Description

This dataset was created in order to produce an HTR model for the Digital Peraire project. The documents are handwritten, dating from the second half of the 20th century, written in French with a blue ink pen or, more frequently, with a blue pencil. Occasional marginal notes appear in red.

Transcription guidelines

The transcription reproduces what is written on the document, including punctuation and spelling errors.

The case is respected: capital letters are transcribed with capital letters.

Crossed-out words are signaled by #, which is not used to transcribe anything else.

When a "v"-like sign is used to signal an insertion, it is transcribed with the character ⋎.
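As a rough illustration of how these two markers could be handled downstream, here is a minimal Python sketch. The function names `marker_counts` and `strip_markers` are hypothetical and not part of this repository, and the sketch only assumes that # and ⋎ appear as plain characters in the transcribed line. It makes no assumption about how many # are used per crossed-out word.

```python
# Hypothetical helpers, not part of the dataset tooling.
STRIKE = "#"      # signals crossed-out words, per the guidelines above
INSERTION = "⋎"   # transcribes the "v"-like insertion sign


def marker_counts(line: str) -> dict[str, int]:
    """Count how often each guideline marker occurs in a transcribed line."""
    return {
        "crossed_out": line.count(STRIKE),
        "insertion": line.count(INSERTION),
    }


def strip_markers(line: str) -> str:
    """Remove both markers and collapse the whitespace left behind."""
    cleaned = line.replace(STRIKE, "").replace(INSERTION, "")
    return " ".join(cleaned.split())
```

Counting the markers can help estimate how often corrections occur in the ground truth, while stripping them may be useful when comparing transcriptions against text that does not follow these conventions.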

Segmentation guidelines

The SegmOnto ontology was used for the segmentation of this dataset.

For regions, MainZone and MarginTextZone were used. For lines, DefaultLine and InterlinearLine were used.

Warning: The main goal of this dataset was to produce ground truth for the transcription phase, and the text is badly faded on some pages. It is therefore not recommended to use the following images to train a segmentation model:

B.1.intro-eurasie_0005.jpg
B.1b.europe-centrale_0005.jpg
B.2.europe-orientale_0007.jpg
B.26.malais_0048.jpg
B.28.java2_0017.jpg
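One way to apply this warning when assembling a segmentation training set is to filter the listed files out by name. The sketch below is only an illustration: the function name `segmentation_candidates` and the example paths are hypothetical, and the actual location of the images inside the repository is not assumed.

```python
from pathlib import Path

# File names taken from the warning above.
EXCLUDED_FROM_SEGMENTATION = {
    "B.1.intro-eurasie_0005.jpg",
    "B.1b.europe-centrale_0005.jpg",
    "B.2.europe-orientale_0007.jpg",
    "B.26.malais_0048.jpg",
    "B.28.java2_0017.jpg",
}


def segmentation_candidates(image_paths):
    """Keep only the images that are not flagged in the warning.

    Matching is done on the file name alone, so the directory layout
    of the caller's copy of the dataset does not matter.
    """
    return [
        path
        for path in image_paths
        if Path(path).name not in EXCLUDED_FROM_SEGMENTATION
    ]
```

For example, given a hypothetical list such as `["imgs/B.28.java2_0017.jpg", "imgs/B.28.java2_0018.jpg"]`, only the second path would be kept.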

Sources

The original documents are held at the Bibliothèque Sébert, Espéranto-France, Paris. This source should be mentioned every time the images are used.

Model

See the README in the models folder for more information about how the model was trained.
