The pipeline currently uses a financial news dataset, but it can also be repurposed for other text classification datasets such as IMDB.
```
├── datasets/
│   ├── financial_news.csv
│   ├── financial_news_train.csv
│   └── financial_news_test.csv
├── base_model_testing.py
├── calsa.py
├── model_training.py
├── random_sampling.py
├── text_augmentation.py
└── financial_news_preprep.py
```
- Python 3.8+
- PyTorch
- Transformers
- pandas
- numpy
- scikit-learn
- nlpaug
- datasets
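If the repository does not ship a requirements file, the dependencies can be installed directly with pip:
pip install torch transformers pandas numpy scikit-learn nlpaug datasets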
- Place your financial news dataset at datasets/financial_news.csv
- Run the preprocessing script:
python financial_news_preprep.py
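As a rough sketch, the preprocessing step presumably produces the train/test CSVs listed in the project structure. The column names (`text`, `label`) and the 80/20 split below are assumptions, not necessarily what the script does:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed schema: a text column and a label column; adjust to the real CSV.
df = pd.read_csv("datasets/financial_news.csv")

# An 80/20 split is assumed; stratify on the label to preserve class balance.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)

train_df.to_csv("datasets/financial_news_train.csv", index=False)
test_df.to_csv("datasets/financial_news_test.csv", index=False)
```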
Test the performance of the pre-trained DistilBERT model:
python base_model_testing.py
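A minimal sketch of such a test, assuming the test CSV has `text` and `label` columns and that the labels use the same POSITIVE/NEGATIVE strings as the SST-2 model (map them otherwise):

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from transformers import pipeline

clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

test_df = pd.read_csv("datasets/financial_news_test.csv")

# truncation=True guards against inputs longer than the 512-token limit.
preds = [p["label"] for p in clf(test_df["text"].tolist(), truncation=True)]
print("Accuracy:", accuracy_score(test_df["label"], preds))
```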
Run experiments with different sample sizes (100, 300, 500):
python random_sampling.py
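The selection step itself reduces to drawing a fixed-size random subset from the training split; a sketch with an assumed seed and hypothetical output paths:

```python
import pandas as pd

train_df = pd.read_csv("datasets/financial_news_train.csv")

# Draw the three subset sizes used in the experiments.
# The output filenames here are placeholders, not the script's actual paths.
for n in (100, 300, 500):
    subset = train_df.sample(n=n, random_state=42)
    subset.to_csv(f"datasets/random_sample_{n}.csv", index=False)
```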
Run the CALSA pipeline with text augmentation:
python calsa.py
Train models using selected samples:
python model_training.py
- Base Model: DistilBERT (distilbert-base-uncased-finetuned-sst-2-english)
- Batch Size: 8
- Number of Epochs: 3
- Learning Rate: Hugging Face Trainer default (5e-5)
- Max Sequence Length: 512
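A sketch of a Hugging Face Trainer setup matching the configuration above; the CSV schema, the two-class label assumption, and the `tokenize` helper are illustrative assumptions, not the repository's actual code:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Assumes two classes; for a different label set, pass num_labels
# and ignore_mismatched_sizes=True.
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Assumed: CSVs with "text" and integer "label" columns.
data = load_dataset(
    "csv",
    data_files={
        "train": "datasets/financial_news_train.csv",
        "test": "datasets/financial_news_test.csv",
    },
)

def tokenize(batch):
    # Max sequence length of 512, as in the configuration above.
    return tokenizer(batch["text"], truncation=True, max_length=512)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="fine_tuned_models_random_sampling_financial_news",
    per_device_train_batch_size=8,  # batch size 8
    num_train_epochs=3,             # 3 epochs
    # learning_rate is left at the Trainer default (5e-5)
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```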
The text augmentation pipeline includes:
- Synonym replacement (WordNet)
- Back-translation (French, German, Spanish)
- Random word insertion/deletion
- Sentence shuffling
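A hedged sketch of how the first augmenters could be built with nlpaug; the back-translation model pair is an assumption (only one of the three language routes is shown), and recent nlpaug versions return a list from augment():

```python
import nlpaug.augmenter.word as naw

text = "The company reported stronger than expected quarterly earnings."

# Synonym replacement via WordNet (requires the NLTK wordnet corpus).
syn_aug = naw.SynonymAug(aug_src="wordnet")
print(syn_aug.augment(text))

# Random word deletion; insertion is typically done with a contextual
# augmenter such as naw.ContextualWordEmbsAug(action="insert").
del_aug = naw.RandomWordAug(action="delete")
print(del_aug.augment(text))

# Back-translation (English -> German -> English); the WMT19 model pair
# is an assumption, and this downloads large translation models.
bt_aug = naw.BackTranslationAug(
    from_model_name="facebook/wmt19-en-de",
    to_model_name="facebook/wmt19-de-en",
)
print(bt_aug.augment(text))
```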
Results are saved in:
- fine_tuned_models_random_sampling_financial_news/: Random sampling results
- results_calsa/: CALSA results
- base_model_test_outputs/: Base model performance
Each experiment generates:
- Trained model checkpoints
- Confusion matrices
- Classification reports
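The confusion matrices and classification reports correspond to scikit-learn's metrics; a small sketch with placeholder predictions (y_true and y_pred would come from evaluating a trained checkpoint on the test split):

```python
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder arrays; replace with real labels and model predictions.
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))
```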