Meeting, 28 November
Currently training on 20,614,679 (20M) sentences and 333,397,520 (330M) words.
We're building a set of word-analogy questions (semantic and syntactic relationships) in the manner of Mikolov et al.
Not specific to German. We want to obtain many of the following relationships from DBpedia (to be confirmed):
- Capitals of German federal states
- Country - Capital
- Country - Currency
- Country - Language
- Country - head of state
- Country - Demonym
- Company - Product
- Company - CEO
- Famous person - job
- Man-Woman
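Once the pairs for a relationship are collected, they can be expanded into analogy questions. The following is a minimal sketch. It assumes the layout of Mikolov et al.'s `questions-words.txt`: a `: section` header line, then lines of four words built from every ordered combination of two distinct pairs. The function name `analogy_section`, the section name, and the three federal-state/capital pairs are illustrative choices. They are not data pulled from DBpedia.

```python
from itertools import permutations


def analogy_section(name, pairs):
    """Return lines in the questions-words.txt layout: a ': name' header,
    then 'a b c d' for every ordered combination of two distinct pairs."""
    lines = [f": {name}"]
    for (a, b), (c, d) in permutations(pairs, 2):
        lines.append(f"{a} {b} {c} {d}")
    return lines


# Hypothetical, hand-picked pairs; the real ones would come from DBpedia.
capitals = [("Bayern", "München"), ("Sachsen", "Dresden"), ("Hessen", "Wiesbaden")]

for line in analogy_section("capital-german-states", capitals):
    print(line)
```

With n pairs this yields n·(n-1) questions, so three pairs give six. A file in this layout should be readable by gensim's `KeyedVectors.evaluate_word_analogies`.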
Hard for German:
- Opposites (groß ➝ klein)
- Positive ➝ Comparative (groß ➝ größer)
- Positive ➝ Superlative (groß ➝ größten)
German has four cases. We think only the relationship between nominative and genitive can be learned; the other case forms too often coincide with either the singular or the plural form.
- Singular ➝ Plural (Buch ➝ Bücher)
- Declension
  - [ ] nominative ➝ genitive (Buch ➝ Buches)
  - Examples (nominative, genitive, dative, accusative singular):
    - Buch, Buches, Buch, Buch
    - Mensch, Menschen, Menschen, Menschen
    - Gedanke, Gedankens, Gedanken, Gedanken
    - Junge, Jungen, Jungen, Jungen
    - Katze, Katze, Katze, Katze
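The claim that only the genitive is usable can be checked mechanically on the declension examples. This sketch counts a case form as "distinct" only if it differs from both the nominative singular and the nominative plural. The plural forms (Bücher, Menschen, Gedanken, Katzen) were added for this illustration. The names `PARADIGMS` and `distinct_case_forms` are hypothetical.

```python
# Singular forms in the order nominative, genitive, dative, accusative,
# followed by the nominative plural (plurals added for this illustration).
PARADIGMS = {
    "Buch": (("Buch", "Buches", "Buch", "Buch"), "Bücher"),
    "Mensch": (("Mensch", "Menschen", "Menschen", "Menschen"), "Menschen"),
    "Gedanke": (("Gedanke", "Gedankens", "Gedanken", "Gedanken"), "Gedanken"),
    "Katze": (("Katze", "Katze", "Katze", "Katze"), "Katzen"),
}
CASES = ("genitive", "dative", "accusative")


def distinct_case_forms(paradigms):
    """For each non-nominative case, list the nouns whose form differs from
    both the nominative singular and the nominative plural."""
    result = {case: [] for case in CASES}
    for noun, (forms, plural) in paradigms.items():
        nominative = forms[0]
        for case, form in zip(CASES, forms[1:]):
            if form not in (nominative, plural):
                result[case].append(noun)
    return result


for case, nouns in distinct_case_forms(PARADIGMS).items():
    print(f"{case:<11} {len(nouns)}/{len(PARADIGMS)} {nouns}")
```

On these four nouns, only the genitive ever produces a form of its own (Buches, Gedankens). The dative and accusative forms always repeat the nominative singular or the plural.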
Verbs have many more forms. There are:
- 2 participles (denkend, gedacht)
- 6 persons (3 singular, 3 plural)
- 3 tenses
- 2 subjunctives (denke, dächte)
In addition, many forms are built with auxiliary verbs such as haben, werden, or würde, e.g.:
- habe gedacht (present perfect / Perfekt)
- hatte gedacht (past perfect / Plusquamperfekt)
- werde denken (future / Futur I)
- werde gedacht haben (future perfect / Futur II)
For now:
- infinitive ➝ past participle (denken ➝ gedacht)
- infinitive ➝ present
  - [ ] 1st person singular (denke)
  - [ ] 2nd person singular (denkst)
  - [ ] 3rd person singular (denkt)
- infinitive ➝ past
  - [ ] 1st person singular (dachte)
  - [ ] 2nd person singular (dachtest)
  - [ ] 3rd person singular (dachte)
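Each of these relations would be tested the way Mikolov et al. evaluate analogies. To answer "a is to b as c is to ?", compute b - a + c and return the nearest word by cosine similarity, excluding the three question words. Below is a minimal pure-Python sketch of that vector-offset lookup. The 3-dimensional vectors are made up for illustration and are not output of our model. The helper names `cosine` and `solve_analogy` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method:
    return the word closest (by cosine) to b - a + c, excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Made-up toy vectors, chosen so that the "participle" offset is shared.
toy = {
    "denken": [1.0, 0.0, 0.2],
    "gedacht": [1.0, 1.0, 0.2],
    "machen": [0.0, 0.0, 1.0],
    "gemacht": [0.0, 1.0, 1.0],
    "denkt": [1.0, 0.3, 0.0],
}
print(solve_analogy(toy, "denken", "gedacht", "machen"))  # prints gemacht
```

With a trained gensim model, the equivalent query is `model.wv.most_similar(positive=[b, c], negative=[a])`.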