Weaving word alignments #11

Open
amandakann opened this issue Nov 1, 2024 · 0 comments

in my research, i work a lot with word alignments – links between the index of a word in one sentence and the index of the equivalent word in a translation of the same sentence.

勇気 は どこ に ? きみ の 胸 に !
2 1 0 -1 3 5 5 6 4 7
where is courage ? in your heart !

these alignments are meant to link two translations of the same sentence together, but could just as well be pointed to a different sentence entirely.
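
a minimal sketch of how the example above can be read, assuming each sentence is whitespace-tokenised, the middle row holds one target index per source token, and -1 marks a word with no counterpart. `aligned_pairs` is a hypothetical helper name, not part of any existing tool.

```python
def aligned_pairs(source, alignment, target):
    """Pair each source token with the target token its alignment points to."""
    pairs = []
    for word, j in zip(source, alignment):
        # assumption: -1 means the source word is unaligned
        pairs.append((word, target[j] if j >= 0 else None))
    return pairs


source = "勇気 は どこ に ? きみ の 胸 に !".split()
alignment = [int(i) for i in "2 1 0 -1 3 5 5 6 4 7".split()]
target = "where is courage ? in your heart !".split()

for word, match in aligned_pairs(source, alignment, target):
    print(word, "->", match)
```

pointing the same indices at a different sentence only means passing another token list as `target`; indices past the end of a shorter sentence would need handling (wrapping, clipping, or skipping), which this sketch leaves out.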

for nanogenmo 2024, i want to write a little program that traverses a corpus of aligned sentences, starting at the first word of the first sentence and following its alignment to the next sentence in the corpus, weaving fragments of each sentence into a connected string of words.
since all aligned corpora are (at least) bilingual, two weaves can be made at the same time – woven in different languages, but formed by fragments from the same sentences in the same order – starting from the same point, diverging quicker the more different the two languages are, sometimes briefly woven back together by chance.
the resulting weaves will end up being mostly ungrammatical and incomprehensible on their own, but maybe more meaningful together.
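
one possible reading of the weave, sketched under several assumptions that the description above leaves open: a fragment is a fixed number of tokens starting at the current index, the alignment of the fragment's first word gives the index to start from in the next sentence, an index that overshoots a shorter sentence wraps around, and an unaligned word (-1) keeps the current index. the function name `weave`, the `length` parameter, and the toy data are all made up for illustration.

```python
def weave(sentences, alignments, length=2):
    """Collect one fragment per sentence, hopping along alignment indices.

    sentences:  list of token lists, all in one language
    alignments: alignments[i][p] is the index that token p of sentence i
                points to, or -1 if the token is unaligned
    """
    woven = []
    pos = 0  # start at the first word of the first sentence
    for tokens, links in zip(sentences, alignments):
        if not tokens:
            continue  # nothing to take from an empty sentence
        pos %= len(tokens)  # assumption: wrap an index that overshoots
        woven.extend(tokens[pos:pos + length])
        if links[pos] >= 0:
            pos = links[pos]  # follow the alignment into the next sentence
        # assumption: an unaligned word (-1) keeps the current index
    return woven


# toy data, not taken from a real corpus
sentences = [["a", "b", "c"], ["d", "e", "f", "g"], ["h", "i"]]
alignments = [[2, 0, 1], [3, -1, 0, 1], [0, 1]]
print(" ".join(weave(sentences, alignments)))  # prints "a b f g h i"
```

the second weave would come from calling the same function on the other language's token lists with alignments running in the opposite direction, so both weaves visit the same sentences in the same order but can land on different indices.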
