maor121/NLP-89680

NLP-89680

A repository for a university NLP course.

Language: Python

Assignment 1

Assignment 2

PCFG (Probabilistic Context Free Grammar)
Writing a grammar that generates grammatical English sentences, exploring some (but not all) aspects of the English language and how they can be implemented in a grammar.
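
As a rough illustration of how a PCFG generates sentences, here is a minimal sketch of a weighted sampler. The toy grammar, its weights, and the `generate` helper are hypothetical and are not the grammar written for the assignment.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to (weight, right-hand side) pairs.
GRAMMAR = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["Det", "N"]), (0.3, ["Det", "Adj", "N"])],
    "VP": [(0.6, ["Vt", "NP"]), (0.4, ["Vi"])],
    "Det": [(1.0, ["the"])],
    "Adj": [(1.0, ["small"])],
    "N": [(0.5, ["dog"]), (0.5, ["cat"])],
    "Vt": [(0.5, ["sees"]), (0.5, ["chases"])],
    "Vi": [(1.0, ["sleeps"])],
}


def generate(symbol, rng):
    """Expand `symbol` recursively, choosing rules in proportion to their weights."""
    if symbol not in GRAMMAR:  # anything without a rule is a terminal word
        return [symbol]
    weights, expansions = zip(*GRAMMAR[symbol])
    rhs = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(generate(child, rng))
    return words


sentence = generate("S", random.Random(0))
print(" ".join(sentence))
```

Splitting verbs into transitive (`Vt`) and intransitive (`Vi`) categories is one small example of encoding an aspect of English in the grammar itself.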

Assignment 3

Distributional Semantics
Finding words with similar meanings using combinations of the algorithms below, with a detailed report comparing them.
Data: Wikipedia
Algorithms:

  • word contexts: (a) sentence (b) window (c) dependency tree (parent/child, direction of arc, skipping over prepositions)
  • similarity: (a) cosine distance (b) PMI
  • order: (a) 1st order similarity (b) 2nd order similarity
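
The options above can be combined in several ways. The sketch below shows one common combination, assumed here for illustration and not taken from the assignment code: window contexts, positive PMI as the association weight, and cosine between the resulting sparse vectors. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def window_cooccurrence(sentences, window=2):
    """Count (word, context) pairs where the context lies within `window` tokens."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def ppmi_vectors(counts):
    """Reweight raw counts with positive PMI: max(0, log(p(w, c) / (p(w) * p(c))))."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    ctx_totals = Counter()
    for ctx in counts.values():
        ctx_totals.update(ctx)
    vectors = {}
    for w, ctx in counts.items():
        vectors[w] = {}
        for c, n in ctx.items():
            pmi = math.log(n * total / (word_totals[w] * ctx_totals[c]))
            if pmi > 0:  # keep only positive associations
                vectors[w][c] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Tiny made-up corpus, only to show the call sequence.
corpus = [["the", "cat", "drinks", "milk"], ["the", "dog", "drinks", "water"]]
vectors = ppmi_vectors(window_cooccurrence(corpus, window=2))
print(cosine(vectors["cat"], vectors["dog"]))
```

Under a common reading of the last bullet, a word's PMI score with another word is a first-order measure (the two words occur together), while the cosine between two words' context vectors is a second-order measure (the two words occur in similar contexts).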

Assignment 4

Relation Extraction
Given a small amount of data (news articles), extract the Named Entities in each sentence and the relation between them,
e.g. Yosi (work for) CBS

Algorithm:

  1. For each sentence, extract Named Entities, the dependency tree and POS tags using the spaCy library.
  2. Generate a sequence from the path between the two entities on the dependency tree.
  3. Run an LSTM on the path and concatenate its output with other feature vectors, such as the Named Entity type and the Named Entity POS tag.
  4. Pass through MLP with softmax activation.
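
Step 2 can be sketched without any NLP library, assuming the parse is already available as a list of head indices. The example parse is written by hand, and `path_to_root` and `dependency_path` are hypothetical helper names, not functions from this repository. The root is represented as its own head, which matches how spaCy exposes `token.head` for the root token.

```python
def path_to_root(heads, i):
    """Return token indices from i up to the root (the root is its own head)."""
    path = [i]
    while heads[path[-1]] != path[-1]:
        path.append(heads[path[-1]])
    return path


def dependency_path(heads, a, b):
    """Indices on the tree path from token a to token b via their lowest common ancestor."""
    up = path_to_root(heads, a)
    down = path_to_root(heads, b)
    ancestors = set(up)
    # The first ancestor of b that is also an ancestor of a is the lowest common one.
    lca = next(i for i in down if i in ancestors)
    left = up[: up.index(lca) + 1]   # a ... lca
    right = down[: down.index(lca)]  # b ... child of lca
    return left + right[::-1]


tokens = ["Yosi", "works", "for", "CBS"]
# Hand-written parse (an assumption): "works" is the root, "CBS" attaches to "for".
heads = [1, 1, 1, 2]
path = dependency_path(heads, 0, 3)
print([tokens[i] for i in path])
```

The resulting token sequence (possibly enriched with arc labels or directions) is what would then be fed to the LSTM in step 3.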

Challenges:
  • small dataset
  • missing labels (entities/relations that should have been included in the gold file)
  • mismatches between the Named Entities in the gold file and those in the spaCy output

Architecture choice:
A pure ML approach rather than a hybrid of ML and rules. A hybrid could be built after error analysis; for example, the model sometimes confuses the relation (work for) with (kill), because both relations involve PERSON entities.
