New repository

This repository is outdated. A more comprehensive and more precisely aligned dataset is available here: https://github.com/sebastian-nehrdich/sanstib

sanskrit-tibetan-etext

This is a collection of sentence-level aligned Sanskrit-Tibetan etexts. The Tibetan etexts were taken from ACIP (https://asianclassics.org/), the Sanskrit etexts from GRETIL (http://gretil.sub.uni-goettingen.de). The HTML versions of the aligned files are in the html folder. The org folder contains .org files (which are just plain text files). The matrices folder contains pictures of the alignment matrices of the individual files. These are useful for judging to what extent the alignment has been successful.

The alignment quality is good: the overall average is somewhere around 97%, provided that the etexts are not noisy and no larger chunks are missing in either language. However, be prepared to find occasional mistakes. Some compression occurs because the aligner produces one-to-many and many-to-one alignments, so several sentences of one language can end up in a single aligned unit.
Feel free to open an issue or send me an email if you have etexts of your own that you would like to have aligned, or if you have any further questions or suggestions!

The aligned etexts were created with a combination of a classifier based on a convolutional neural network and the YASA sentence aligner (http://rali.iro.umontreal.ca/rali/?q=en/yasa).

Table of aligned files

Author / Name	Quality	% of sentences	Date added	HTML	TXT	Alignment matrix	Remarks
Abhidharmakośabhāṣyam	>97%	75%	10.6.18	HTML	TXT	PNG	Very high alignment quality
Abhidharmakośavyākhyā	90%	50%	10.6.18	HTML	TXT		High alignment quality; occasional disagreement between the Sanskrit etext and the Tibetan translation accounts for a certain number of errors. Also note that both etexts contain a fair amount of noise.
Abhidharmasamuccaya	>97%	50%	10.6.18	HTML	TXT	PNG	Very high alignment quality
Abhidharmasamuccayabhāṣya	>97%	50%	10.6.18	HTML	TXT	PNG	Very high alignment quality
Madhyāntavibhāgabhāṣyam	>97%	50%	10.6.18	HTML	TXT	PNG	Very high alignment quality
Prasannapadā	90%	50%	10.6.18	HTML	TXT	PNG	High alignment quality; occasional disagreement between the Sanskrit etext and the Tibetan translation accounts for a certain number of errors. Here both the Sanskrit etext and the Tibetan translation are rather noisy.
Triṃśikavijñaptibhāṣyam	>97%	50%	10.6.18	HTML	TXT	PNG	Very high alignment quality

About the alignment matrices

Each point on the y-axis represents a Sanskrit sentence, and each point on the x-axis represents a Tibetan sentence. The images are useful for getting an impression of the quality of the translations and of whether longer sections have been lost.
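
To make the layout of such a matrix concrete, here is a minimal sketch in Python. It is not the code that produced the PNG files in this repository. It assumes that an alignment is available as a list of (Sanskrit sentence index, Tibetan sentence index) pairs; that input format and the helper names `build_alignment_matrix` and `render_ascii` are hypothetical and chosen only for illustration.

```python
def build_alignment_matrix(pairs, n_sanskrit, n_tibetan):
    """Return a 0/1 grid: rows are Sanskrit sentences, columns are Tibetan sentences."""
    matrix = [[0] * n_tibetan for _ in range(n_sanskrit)]
    for skt_idx, tib_idx in pairs:
        matrix[skt_idx][tib_idx] = 1  # mark one aligned sentence pair
    return matrix


def render_ascii(matrix):
    """Draw the grid as text, '#' for an aligned pair and '.' for no link."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


# Made-up toy alignment (assumption, not taken from the dataset):
# Sanskrit sentence 1 maps to two Tibetan sentences (one-to-many),
# Sanskrit sentences 2 and 3 map to the same Tibetan sentence (many-to-one).
example_pairs = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 3)]
example_matrix = build_alignment_matrix(example_pairs, n_sanskrit=4, n_tibetan=4)
print(render_ascii(example_matrix))
```

In this text rendering, row 0 is printed at the top. A clean alignment shows up as a roughly diagonal band, while a long horizontal or vertical run, or a break in the band, hints at a passage that is missing in one of the two languages.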

Remarks about the alignment quality

The aligner is not able to cut sentences into smaller units, but it can do one-to-many and many-to-one alignments.
Errors are most likely to occur at the end of sentences, when smaller units get aligned to the wrong corresponding sentence; this is because the algorithm is somewhat weak at reliably detecting units that are shorter than 3 tokens.
If a longer passage is missing in either of the two languages, the algorithm might lose track and therefore produce a few misalignments; such instances have to be located and fixed manually.
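
As a possible aid for locating such spots, the following sketch flags places where consecutive alignment links skip many sentences in one language. This is only a heuristic written for illustration, not part of the aligner. The input format (a list of (Sanskrit index, Tibetan index) pairs), the function name `find_suspicious_jumps`, and the threshold `max_jump` are all assumptions.

```python
def find_suspicious_jumps(pairs, max_jump=3):
    """Return consecutive link pairs whose index gap exceeds max_jump.

    A large gap in either language suggests that a longer passage has no
    counterpart in the other language, so the links around it are worth
    checking by hand.
    """
    ordered = sorted(pairs)  # sort by Sanskrit index, then Tibetan index
    jumps = []
    for (prev_s, prev_t), (cur_s, cur_t) in zip(ordered, ordered[1:]):
        if cur_s - prev_s > max_jump or abs(cur_t - prev_t) > max_jump:
            jumps.append(((prev_s, prev_t), (cur_s, cur_t)))
    return jumps


# Made-up toy data (assumption): Tibetan sentences 3 to 9 have no Sanskrit
# counterpart, so the links jump from Tibetan index 2 to Tibetan index 10.
example_pairs = [(0, 0), (1, 1), (2, 2), (3, 10), (4, 11)]
example_jumps = find_suspicious_jumps(example_pairs)
print(example_jumps)
```

The threshold is arbitrary. Because one-to-many and many-to-one alignments naturally produce small gaps, a value that is too low would flag many correct links.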
