LING-L715

Hate Speech on Levantine Tweets

Datasets:

1. DART

    a.) /cf-data
            Contains .txt files, each holding the Tweets of one dialect
            (e.g. EGY.txt contains Tweets in the Egyptian dialect)
            Format: 3 columns: (1) score, (2) tweet_ID, (3) tweet_text

    b.) get_DART_transcripts.py
            Script that extracts the Tweets from the .txt files in /cf-data
            Creates 2 new files: (1) clean_LEV.tsv, (2) clean_NONLEV.tsv
            The --other_paths argument takes one or more paths, separated by spaces
            From terminal: python get_DART_transcripts.py --lev_path [PATH TO '/DART/cf-data/LEV.txt'] --other_paths [PATHS TO THE OTHER '/DART/cf-data/*.txt' FILES]

    c.) clean_LEV.tsv
            One of the output .tsv files after running get_DART_transcripts.py
            Contains Levantine-only Tweets, each labeled with "LEV"

    d.) clean_NONLEV.tsv
            One of the output .tsv files after running get_DART_transcripts.py
            Contains non-Levantine Tweets (from the files given to --other_paths), each labeled with "NONLEV"

    e.) classify_LEV.py
            Trains and evaluates a binary "LEV" vs "NONLEV" classifier on clean_LEV.tsv and clean_NONLEV.tsv
            From terminal: python classify_LEV.py --clean_lev_path [PATH TO 'clean_LEV.tsv'] --clean_nonlev_path [PATH TO 'clean_NONLEV.tsv']

    f.) classify_LEV_metrics.txt
            Results of the linear SVM classifier, with features selected by SelectKBest(chi2, k=500)
            Binary classification ("LEV" vs "NONLEV")

2. LHSAB
3. OSACT
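
The extraction step in 1. DART, items a.) to d.), can be sketched as follows. This is a minimal illustration, not the actual get_DART_transcripts.py. It assumes the /cf-data files are tab-separated with no header row, and that each output row is tweet_text followed by the label; the helper names parse_args, read_dart_file, write_labeled_tsv and main are hypothetical.

```python
import argparse
import csv
from pathlib import Path


def parse_args(argv):
    """Parse CLI arguments; --other_paths accepts one or more paths (nargs="+")."""
    parser = argparse.ArgumentParser(description="Split DART Tweets into LEV / NONLEV TSV files.")
    parser.add_argument("--lev_path", required=True,
                        help="path to DART/cf-data/LEV.txt")
    parser.add_argument("--other_paths", required=True, nargs="+",
                        help="paths to the non-Levantine DART .txt files")
    return parser.parse_args(argv)


def read_dart_file(path):
    """Yield tweet_text from a DART file with columns: score, tweet_ID, tweet_text.

    ASSUMPTION: columns are tab-separated and there is no header row.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            # maxsplit=2 keeps any tabs inside the Tweet text intact
            parts = line.rstrip("\n").split("\t", 2)
            if len(parts) == 3 and parts[2].strip():
                yield parts[2].strip()


def write_labeled_tsv(texts, label, out_path):
    """Write one row per Tweet: tweet_text <TAB> label (hypothetical layout)."""
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for text in texts:
            writer.writerow([text, label])


def main(argv, out_dir="."):
    args = parse_args(argv)
    out_dir = Path(out_dir)
    write_labeled_tsv(read_dart_file(args.lev_path), "LEV", out_dir / "clean_LEV.tsv")
    # Pool the Tweets from every non-Levantine file into a single output file
    nonlev = (text for path in args.other_paths for text in read_dart_file(path))
    write_labeled_tsv(nonlev, "NONLEV", out_dir / "clean_NONLEV.tsv")
```

To run it from the terminal with the same flags as above, add `import sys` and `if __name__ == "__main__": main(sys.argv[1:])` at the bottom. Using csv.writer rather than plain string formatting means a Tweet that itself contains a tab is quoted instead of silently adding a column.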

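The classification setup in 1. DART, items e.) and f.) (a linear SVM on features chosen by SelectKBest(chi2, k=500), "LEV" vs "NONLEV") might look roughly like the sketch below. It is not the actual classify_LEV.py: the character n-gram TF-IDF features, the 80/20 stratified split, and the two-column (tweet_text, label) TSV layout are all assumptions, and load_tsv, build_pipeline and run are hypothetical names. It requires scikit-learn.

```python
import csv

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def load_tsv(path):
    """Read (tweet_text, label) rows. ASSUMPTION: 2-column TSV, no header."""
    with open(path, encoding="utf-8", newline="") as f:
        rows = [row for row in csv.reader(f, delimiter="\t") if len(row) == 2]
    return [r[0] for r in rows], [r[1] for r in rows]


def build_pipeline(k=500):
    """TF-IDF features -> keep the k features with the highest chi2 score -> linear SVM."""
    return Pipeline([
        # ASSUMPTION: character 2-4 grams within word boundaries
        ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
        ("select", SelectKBest(chi2, k=k)),
        ("svm", LinearSVC()),
    ])


def run(lev_path, nonlev_path, k=500, seed=0):
    """Train on a stratified 80% split and return a text report for the held-out 20%."""
    texts_lev, labels_lev = load_tsv(lev_path)
    texts_non, labels_non = load_tsv(nonlev_path)
    X = texts_lev + texts_non
    y = labels_lev + labels_non
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    model = build_pipeline(k).fit(X_train, y_train)
    return classification_report(y_test, model.predict(X_test))
```

chi2 requires non-negative feature values, which TF-IDF weights satisfy; k is a parameter because SelectKBest with k=500 needs at least 500 extracted features, which a very small corpus may not produce.
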
Repository files (top level):

    Archived
    DART
    LHSAB
    OSACT
    .DS_Store
    7.5K tweets.xlsx
    Classification Results.txt
    HuggingFace LLMS Metrics.txt
    MoreHate.csv
    PreProcessing.py
    README.md
    acl.sty
    combined_LHSAB_allOSACT_metrics.txt
    combined_LHSAB_levOSACT.tsv
    combined_LHSAB_levOSACT_metrics.txt
    combined_train_LHSAB_allOSACT.tsv
    combined_train_LHSAB_levOSACT.tsv
    get_combined_tsv.py
    svm.py
    test.py

GitHub repository: lilykaw/LevArab_HateSpeech