Deep learning for Texts and Sequences course at Uni, repository for assignments
Language: Python
Framework: PyTorch
Window based tagging
Solve 2 common problems in NLP: POS (part of speech) tagging & NER (named entity recognition).
Do this using several approaches. The model is always a single-layer MLP (multilayer perceptron) with an input of constant size: a window of words. The approaches differ in how each word in the window is represented:
- window of words, every word is an id.
- window of words, every word vector is the word embedding (pretrained)
- window of words, every word vector is the sum of its: (a) word embedding, (b) suffix vector, (c) prefix vector.
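The third variant can be sketched roughly as below. This is a minimal illustration, not the code from the assignments: the names `windows` and `WindowTagger`, the `<pad>` token, the tanh activation, and all sizes are assumptions made for the example.

```python
import torch
import torch.nn as nn

PAD = "<pad>"  # hypothetical padding token for sentence edges


def windows(sentence, k=2):
    """Return one window of 2*k + 1 words centered on each word."""
    padded = [PAD] * k + list(sentence) + [PAD] * k
    return [padded[i:i + 2 * k + 1] for i in range(len(sentence))]


class WindowTagger(nn.Module):
    """MLP over a fixed-size window; each word vector is the sum of its
    word, prefix, and suffix embeddings (sizes here are illustrative)."""

    def __init__(self, n_words, n_prefixes, n_suffixes, n_tags,
                 emb_dim=50, hidden_dim=64, window=5):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, emb_dim)
        self.prefix_emb = nn.Embedding(n_prefixes, emb_dim)
        self.suffix_emb = nn.Embedding(n_suffixes, emb_dim)
        self.hidden = nn.Linear(window * emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_tags)

    def forward(self, word_ids, prefix_ids, suffix_ids):
        # Each input has shape (batch, window).
        vecs = (self.word_emb(word_ids)
                + self.prefix_emb(prefix_ids)
                + self.suffix_emb(suffix_ids))
        flat = vecs.flatten(start_dim=1)  # (batch, window * emb_dim)
        return self.out(torch.tanh(self.hidden(flat)))  # tag scores


example_windows = windows(["the", "cat", "sat"], k=1)
model = WindowTagger(n_words=100, n_prefixes=20, n_suffixes=20, n_tags=5)
ids = torch.randint(0, 20, (4, 5))  # 4 random windows of 5 ids each
tag_scores = model(ids, ids, ids)   # shape (4, 5): one score per tag
```

Summing the three embeddings keeps the input size the same as in the embedding-only variant, so the same MLP shape works for all three approaches.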
RNNs and BiLSTM
Solve POS and NER using a BiLSTM (bidirectional LSTM). Also, explore RNN limitations: what kinds of sequences are easy for an LSTM to learn, and what kinds are difficult?
Challenges included batching sequences of non-uniform length.
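One common way to batch sequences of different lengths in PyTorch is to pad them to a shared length and pack them before the LSTM. The sketch below shows that approach; it is not necessarily what the assignment code does, and the class name `BiLSTMTagger`, the use of id 0 for padding, and the sizes are assumptions for the example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import (pad_sequence, pack_padded_sequence,
                                pad_packed_sequence)


class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, n_tags, emb_dim=32, hidden_dim=48):
        super().__init__()
        # Assumption: word id 0 is reserved for padding.
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, n_tags)

    def forward(self, padded_ids, lengths):
        # Packing lets the LSTM skip the padded positions.
        packed = pack_padded_sequence(self.emb(padded_ids), lengths,
                                      batch_first=True, enforce_sorted=False)
        packed_out, _ = self.lstm(packed)
        out, _ = pad_packed_sequence(packed_out, batch_first=True)
        return self.out(out)  # (batch, max_len, n_tags)


# Three sentences of different lengths, already mapped to word ids.
sentences = [torch.tensor([4, 7, 2]), torch.tensor([5, 1]),
             torch.tensor([3, 9, 8, 6])]
lengths = torch.tensor([len(s) for s in sentences])
padded = pad_sequence(sentences, batch_first=True, padding_value=0)
scores = BiLSTMTagger(vocab_size=10, n_tags=5)(padded, lengths)
```

The scores at padded positions are meaningless and should be masked out of the loss, for example by giving padded positions a tag id that the loss function ignores.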
Implement an SNLI paper - An exercise in Attention
NLI (Natural Language Inference) is a problem in NLP in which, given two sentences, the computer needs to decide whether they: (a) contradict each other, (b) are neutral, or (c) agree.
SNLI is an NLI challenge managed by Stanford University; as of 2018, recent papers reached 89% accuracy.
The paper I chose (https://arxiv.org/abs/1606.01933) solved this problem using a relatively intuitive approach:
High-level description:
- First, each sentence is converted to its word embeddings (pretrained GloVe).
- Every pair of sentences is softly aligned, meaning every word from sentence 1 is softly aligned to all the words in sentence 2. Then each word from sentence 1 is concatenated with its aligned subphrase from sentence 2.
- The concatenation of word and subphrase is passed to an MLP, which determines whether they "agree", "contradict", or are "neutral" with one another.
- The results are summed, so you "count" how many parts of the sentences "agree", "contradict", or are "neutral" with one another. The sum is then passed through a softmax to get the discrete class.
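The steps above can be sketched in PyTorch roughly as follows. This is a simplified illustration rather than the repository's implementation or an exact reproduction of the paper: the class name `DecomposableAttention`, the single-layer ReLU MLPs, and the sizes are assumptions, random tensors stand in for GloVe embeddings, and masking of padded words is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(in_dim, out_dim):
    # Hypothetical helper: one linear layer followed by ReLU.
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())


class DecomposableAttention(nn.Module):
    def __init__(self, emb_dim=50, hidden_dim=64, n_classes=3):
        super().__init__()
        self.attend = mlp(emb_dim, hidden_dim)
        self.compare = mlp(2 * emb_dim, hidden_dim)
        self.classify = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, a, b):
        # a: (batch, len_a, emb_dim), b: (batch, len_b, emb_dim)
        # Soft alignment: score every word of a against every word of b.
        scores = torch.bmm(self.attend(a), self.attend(b).transpose(1, 2))
        # beta[i]: subphrase of b softly aligned to word i of a.
        beta = torch.bmm(F.softmax(scores, dim=2), b)
        # alpha[j]: subphrase of a softly aligned to word j of b.
        alpha = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2), a)
        # Compare each word with its aligned subphrase, then sum over words.
        v_a = self.compare(torch.cat([a, beta], dim=2)).sum(dim=1)
        v_b = self.compare(torch.cat([b, alpha], dim=2)).sum(dim=1)
        return self.classify(torch.cat([v_a, v_b], dim=1))  # class logits


model = DecomposableAttention()
a = torch.randn(2, 7, 50)  # stand-in embeddings: 2 sentences of 7 words
b = torch.randn(2, 5, 50)  # stand-in embeddings: 2 sentences of 5 words
probs = F.softmax(model(a, b), dim=1)  # (2, 3): one probability per class
```

Every operation here is a matrix product or a per-word MLP, so all words are processed in parallel, with no recurrent step over the sequence.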
The Pros of this approach:
- Intuitive - much simpler and cleaner than other approaches.
- Quick to run - this is attention without an LSTM. LSTMs are known for long run times: they offer limited parallelization and need a lot of data.
- High accuracy - 86% (paper), 81% (my own implementation).