Language Detection Using N-Grams

https://github.com/melanie-t/twitter-language-detection

Project

This project applies Naive Bayes classification to a natural language processing task: detecting the language of a tweet, chosen from a predefined list, using variations of n-gram models. The supported languages are:

- Basque (eu)
- Catalan (ca)
- Galician (gl)
- Spanish (es)
- English (en)
- Portuguese (pt)
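The following is a minimal sketch of how an n-gram Naive Bayes language classifier can work. It assumes character-level n-grams and additive (add-delta) smoothing. The names `char_ngrams`, `NGramNaiveBayes`, `n` and `delta` are hypothetical and do not come from the repository, whose actual vocabulary handling and smoothing may differ. Each language's score is its log prior plus the sum of the log probabilities of the tweet's n-grams, and the highest-scoring language wins.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the overlapping character n-grams of `text` (hypothetical helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical character n-gram Naive Bayes classifier with add-delta smoothing."""

    def __init__(self, n=3, delta=0.5):
        self.n = n                          # n-gram size (assumption: characters, not words)
        self.delta = delta                  # additive smoothing constant (must be > 0)
        self.counts = defaultdict(Counter)  # language -> n-gram frequency counts
        self.doc_counts = Counter()         # language -> number of training tweets
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, tweets):
        """Train on an iterable of (language, text) pairs."""
        for lang, text in tweets:
            grams = char_ngrams(text.lower(), self.n)
            self.counts[lang].update(grams)
            self.doc_counts[lang] += 1
            self.vocab.update(grams)

    def score(self, text, lang):
        """Log prior plus smoothed log likelihood of the text's n-grams under `lang`."""
        total_docs = sum(self.doc_counts.values())
        log_p = math.log(self.doc_counts[lang] / total_docs)
        lang_counts = self.counts[lang]
        denom = sum(lang_counts.values()) + self.delta * len(self.vocab)
        for gram in char_ngrams(text.lower(), self.n):
            log_p += math.log((lang_counts[gram] + self.delta) / denom)
        return log_p

    def predict(self, text):
        """Return the language with the highest score."""
        return max(self.doc_counts, key=lambda lang: self.score(text, lang))


# Toy usage with two made-up training tweets.
model = NGramNaiveBayes(n=3, delta=0.5)
model.fit([("en", "the weather is nice today"), ("es", "el tiempo es bueno hoy")])
print(model.predict("nice weather"))  # prints "en"
```

Working in log space avoids numerical underflow when many small n-gram probabilities are multiplied, and a nonzero `delta` keeps unseen n-grams from driving a language's probability to zero.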

Requirements

Python version 3.7+
Required Python packages:
- numpy

Setting Up Project

Clone the Git repository, or download it as a ZIP file and extract the folder.
Open the folder (twitter-language-detection) as a Python project in your IDE of choice.
- Ensure that your Python interpreter is set to Python 3.7 or later
- Set the working directory to twitter-language-detection/src

Running the Project

Run Main.py.
When prompted, enter the absolute path to the test file.
The trace and evaluation files are saved in src/output.
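The README does not describe what the evaluation file contains. As an illustration only, the sketch below assumes the evaluation reports overall accuracy plus per-language precision, recall and F1, computed from (true language, predicted language) pairs. The function name `evaluate` and its input format are hypothetical, not the project's actual output code.

```python
from collections import Counter


def evaluate(pairs):
    """Hypothetical evaluation over (true_lang, predicted_lang) pairs.

    Returns overall accuracy and a dict mapping each language to
    (precision, recall, f1).
    """
    correct = sum(1 for true, pred in pairs if true == pred)
    accuracy = correct / len(pairs) if pairs else 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in pairs:
        if true == pred:
            tp[true] += 1  # correct prediction for this language
        else:
            fp[pred] += 1  # predicted language wrongly claimed this tweet
            fn[true] += 1  # true language missed this tweet

    per_lang = {}
    for lang in sorted(set(tp) | set(fp) | set(fn)):
        precision = tp[lang] / (tp[lang] + fp[lang]) if tp[lang] + fp[lang] else 0.0
        recall = tp[lang] / (tp[lang] + fn[lang]) if tp[lang] + fn[lang] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_lang[lang] = (precision, recall, f1)
    return accuracy, per_lang


# Toy example: one Spanish tweet misclassified as Portuguese.
pairs = [("en", "en"), ("es", "es"), ("es", "pt"), ("pt", "pt")]
accuracy, per_lang = evaluate(pairs)
print(accuracy)  # prints 0.75
```

Per-language metrics matter here because closely related languages (for example Spanish, Galician and Portuguese) tend to be confused with one another, which a single accuracy figure hides.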
