
Meeting 28. November

Nico Ring edited this page Dec 7, 2016 · 2 revisions

Current Models

Currently training on 20,614,679 (~20.6M) sentences and 333,397,520 (~333M) words.

Evaluation

Next steps

We're building a training set in the manner of Mikolov et al.
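A minimal sketch of how such a set could be laid out, assuming we follow the plain-text layout of the Mikolov et al. analogy file (a `: category` header line, then one `a b c d` question per line). The function name `analogy_questions` and the example pairs are ours, for illustration only.

```python
from itertools import permutations


def analogy_questions(category, pairs):
    """Return a ': category' header followed by one 'a b c d' line
    for every ordered combination of two distinct pairs."""
    lines = [f": {category}"]
    for (a, b), (c, d) in permutations(pairs, 2):
        lines.append(f"{a} {b} {c} {d}")
    return lines


# Illustrative pairs only; the real pairs would come from our own lists.
pairs = [("Deutschland", "Berlin"), ("Frankreich", "Paris"), ("Italien", "Rom")]
questions = analogy_questions("country-capital", pairs)
print("\n".join(questions))
```

With n pairs in a category this yields n·(n-1) questions, so even short pair lists grow into a sizeable test set.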

Semantic

These relationships are not specific to German. We want to use DBpedia to obtain many of the following relationships (?).

  • Capitals of German federal states
  • Country - Capital
  • Country - Currency
  • Country - Language
  • Country - head of state
  • Country - Demonym
  • Company - Product
  • Company - CEO
  • Famous person - job
  • Man-Woman
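One way the DBpedia lookup might be approached is a SPARQL query per relationship. The sketch below only builds the query string and sends nothing over the network. It assumes the DBpedia ontology class `dbo:Country` and property `dbo:capital` fit the Country - Capital relationship; the helper name `pair_query` is hypothetical, and the properties for the other relationships would need to be checked against the ontology.

```python
def pair_query(subject_class, prop, lang="de", limit=100):
    """Build a SPARQL query string returning (subject label, object label) pairs.

    subject_class and prop are names in the DBpedia ontology namespace
    (assumed here, not verified for every relationship in the list).
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?subjectLabel ?objectLabel WHERE {{
  ?subject a dbo:{subject_class} ;
           dbo:{prop} ?object ;
           rdfs:label ?subjectLabel .
  ?object rdfs:label ?objectLabel .
  FILTER (lang(?subjectLabel) = "{lang}" && lang(?objectLabel) = "{lang}")
}}
LIMIT {limit}
""".strip()


# Country - Capital with German labels
query = pair_query("Country", "capital")
print(query)
```

Filtering on German labels keeps the resulting word pairs in the language of the training corpus. Multi-word labels would still need to be dropped or joined before they can be used as single tokens.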

Syntactic

This is hard for German:

Adjectives

  • Opposites (groß ➝ klein)
  • Positive ➝ Comparative (groß ➝ größer)
  • Positive ➝ Superlative (groß ➝ größten)

Nouns

German has four cases. We think it's only possible to learn the relationship between nominative and genitive. The dative and accusative forms are too often identical, or nearly identical, to either the singular or the plural.

  • Singular ➝ Plural (Buch ➝ Bücher)

  • Declension
      • [ ] nominative ➝ genitive (Buch ➝ Buches)

  • Buch, Buches, Buch, Buch

  • Mensch, Menschen, Menschen, Menschen

  • Gedanke, Gedankens, Gedanken, Gedanken

  • Junge, Jungen, Jungen, Jungen

  • Katze, Katze, Katze, Katze
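A small sketch of how nominative ➝ genitive pairs could be selected from declension rows like the ones above, assuming each row is ordered nominative, genitive, dative, accusative. Nouns whose genitive equals the nominative (such as Katze) carry no signal for this relationship and are skipped. The function name is ours.

```python
# (nominative, genitive, dative, accusative) singular forms,
# taken from a few of the rows listed above
DECLENSIONS = [
    ("Buch", "Buches", "Buch", "Buch"),
    ("Mensch", "Menschen", "Menschen", "Menschen"),
    ("Gedanke", "Gedankens", "Gedanken", "Gedanken"),
    ("Katze", "Katze", "Katze", "Katze"),
]


def nominative_genitive_pairs(declensions):
    """Keep only nouns whose genitive form differs from the nominative."""
    pairs = []
    for nominative, genitive, _dative, _accusative in declensions:
        if genitive != nominative:
            pairs.append((nominative, genitive))
    return pairs


pairs = nominative_genitive_pairs(DECLENSIONS)
print(pairs)
```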

Verbs

Verbs have many forms. We have:

  • 2 participles
  • 6 persons (3 singular, 3 plural)
  • 3 tenses
  • 2 subjunctives (denke, dächte)

Here, too, the syntax requires auxiliary words like haben and würde, e.g.:

  • habe gedacht (perfect/Perfekt)
  • hatte gedacht (past perfect/Plusquamperfekt)
  • werde denken (future I/Futur I)
  • werde gedacht haben (future II/Futur II)

For now:

  • infinitive ➝ passive participle (denken ➝ gedacht)
  • infinitive ➝ present
      • [ ] 1st person (denke)
      • [ ] 2nd person (denkst)
      • [ ] 3rd person (denkt)
  • infinitive ➝ past
      • [ ] 1st person (dachte)
      • [ ] 2nd person (dachtest)
      • [ ] 3rd person (dachte)
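The verb relationships listed for now could be organised as one category per target form. This is only a sketch: the table layout and the category names are our assumptions, and gehen is added next to denken purely as a second illustration.

```python
# Singular forms only: (1st, 2nd, 3rd person) per tense
VERBS = {
    "denken": {
        "participle": "gedacht",
        "present": ("denke", "denkst", "denkt"),
        "past": ("dachte", "dachtest", "dachte"),
    },
    "gehen": {
        "participle": "gegangen",
        "present": ("gehe", "gehst", "geht"),
        "past": ("ging", "gingst", "ging"),
    },
}


def verb_categories(verbs):
    """Map each relationship name to a list of (infinitive, form) pairs."""
    categories = {"infinitive-participle": []}
    for infinitive, forms in verbs.items():
        categories["infinitive-participle"].append((infinitive, forms["participle"]))
        for tense in ("present", "past"):
            for person, form in enumerate(forms[tense], start=1):
                name = f"infinitive-{tense}-{person}"
                categories.setdefault(name, []).append((infinitive, form))
    return categories


categories = verb_categories(VERBS)
for name, pairs in categories.items():
    print(name, pairs)
```

Keeping the persons in separate categories avoids mixing, for example, denke and denkst as targets of the same analogy. The 1st and 3rd person past forms coincide (dachte), so those two categories may turn out to be redundant.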