
Language Detection Using N-Grams

https://github.com/melanie-t/twitter-language-detection

Project

This project applies Naive Bayes classification to a natural language processing task: identifying the language of a tweet, from a pre-specified list, using variations of N-gram models. The supported languages are:

  • Basque (eu)
  • Catalan (ca)
  • Galician (gl)
  • Spanish (es)
  • English (en)
  • Portuguese (pt)
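The README does not show the classifier itself. As a minimal sketch, assuming character-level n-grams, add-delta smoothing, and priors estimated from training-tweet counts (none of which the README specifies), a Naive Bayes detector scores each candidate language ℓ as log P(ℓ) plus the sum of log P(g | ℓ) over the tweet's n-grams g, and picks the highest score. The names `char_ngrams` and `NaiveBayesLanguageDetector` are hypothetical and not taken from the project's source.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the overlapping character n-grams of `text`, lower-cased."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesLanguageDetector:
    """Hypothetical character n-gram Naive Bayes classifier with add-delta smoothing."""

    def __init__(self, n=2, delta=0.5):
        self.n = n                          # n-gram size (1 = unigram, 2 = bigram, ...)
        self.delta = delta                  # additive smoothing constant (assumed > 0)
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.doc_counts = Counter()         # language -> number of training tweets
        self.vocab = set()                  # every n-gram seen in training

    def train(self, examples):
        """examples: iterable of (language_code, tweet_text) pairs."""
        for lang, text in examples:
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.doc_counts[lang] += 1
            self.vocab.update(grams)

    def score(self, lang, text):
        """Log prior plus the smoothed log likelihood of each n-gram in `text`."""
        total_docs = sum(self.doc_counts.values())
        log_prob = math.log(self.doc_counts[lang] / total_docs)
        total = sum(self.counts[lang].values())
        denom = total + self.delta * len(self.vocab)
        for gram in char_ngrams(text, self.n):
            # Unseen n-grams get count 0, so smoothing keeps the log finite
            log_prob += math.log((self.counts[lang][gram] + self.delta) / denom)
        return log_prob

    def predict(self, text):
        """Return the trained language with the highest log-posterior score."""
        return max(self.doc_counts, key=lambda lang: self.score(lang, text))


if __name__ == "__main__":
    detector = NaiveBayesLanguageDetector(n=2)
    detector.train([
        ("en", "the weather is nice today"),
        ("es", "el tiempo es muy bueno hoy"),
    ])
    print(detector.predict("the weather"))
```

Varying `n` (unigram, bigram, trigram) and `delta` is one plausible way to obtain "variations of N-Grams models"; summing logarithms instead of multiplying probabilities avoids floating-point underflow on longer tweets.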

Requirements

  • Python Version 3.7+
  • Required Python packages
    • numpy

Setting Up Project

  1. Download the project by cloning the Git repository or by downloading the ZIP file and extracting the folder
  2. Open the twitter-language-detection folder as a Python project in the IDE of your choice
    • Ensure that the project's Python interpreter is set to Python 3.7 or later
    • Set the working directory to twitter-language-detection/src
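Before running the project, a small helper can confirm the interpreter version and package requirements listed above. `check_environment` is a hypothetical sketch, not part of the repository; its defaults (Python 3.7, `numpy`) are taken from the Requirements section.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 7), packages=("numpy",)):
    """Return a list of problems; an empty list means the setup looks usable."""
    problems = []
    # sys.version_info compares element-wise against a plain tuple
    if sys.version_info < min_version:
        problems.append(
            "Python {}+ required, found {}".format(
                ".".join(map(str, min_version)),
                ".".join(map(str, sys.version_info[:3])),
            )
        )
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: {}".format(name))
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```

Run it with the same interpreter your IDE is configured to use; it prints each problem it finds, or a single confirmation line.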

Running the Project

  1. Run Main.py
  2. Enter the absolute path to the test file
  3. The trace and evaluation files are saved in twitter-language-detection/src/output