- Install Python 3 on your machine (Anaconda distribution works well)
- In your project root directory, clone the repo
git clone https://github.com/ErikKBethke/social-media-usda
- (Optional) If not already installed, install pip3
sudo apt-get install python3-pip
- Install the Natural Language Toolkit (nltk) and download its supporting data
pip3 install nltk==3.2.4
sudo python3 -m nltk.downloader all
- The training data text files [neg_tweets.txt, pos_tweets.txt] must be in the root folder
- The Python script Twitter_Sentiment_ETL.py must be in the root folder
- The USDA Twitter data feed must follow the formatting established by PJ and must have "Twitter_Full" in its file name. This file must also be in the root folder
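The "Twitter_Full" file-name convention above can be checked with a small helper. This is an illustrative sketch, not code from Twitter_Sentiment_ETL.py; the function name is hypothetical:

```python
import glob
import os

def find_feed_file(root="."):
    """Return the path of the USDA Twitter feed file in `root`.

    Per the conventions above, the feed file is identified by having
    "Twitter_Full" somewhere in its file name.
    """
    matches = glob.glob(os.path.join(root, "*Twitter_Full*"))
    if not matches:
        raise FileNotFoundError("no file matching *Twitter_Full* in " + root)
    # If several files match, the first match is returned arbitrarily
    return matches[0]
```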
- Positive and negative tweet training data is fed into a Naive Bayes classifier
- USDA social media data is pulled into a pandas data frame
- The Naive Bayes classifier runs sentiment analysis on each Tweet, and the resulting sentiment is appended to the data frame
- Each Tweet is then parsed word by word, creating a second data frame with one row per word of every Tweet, carrying the associated data (sentiment, date, etc.)
- Three files are output:
- Twitter_PythonSentiment_DATE.csv contains rows for each sentence
- Twitter_PythonSentiment_Word_DATE.csv contains rows for each word
- Twitter_Master, containing data for all dates
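The classifier-training step above can be sketched with nltk's NaiveBayesClassifier and a simple bag-of-words feature extractor. The three-tweet lists here are hypothetical stand-ins for the contents of neg_tweets.txt and pos_tweets.txt, not the actual training data:

```python
import nltk

def word_feats(text):
    # bag-of-words features: each lowercased word maps to True
    return {w.lower(): True for w in text.split()}

# hypothetical miniature training sets standing in for the text files
neg = ["I hate this", "this is terrible", "awful service"]
pos = ["I love this", "this is great", "wonderful service"]

train = ([(word_feats(t), "neg") for t in neg] +
         [(word_feats(t), "pos") for t in pos])

classifier = nltk.NaiveBayesClassifier.train(train)

# classify an unseen tweet the same way the ETL script classifies each row
label = classifier.classify(word_feats("what a wonderful day"))
```

In the real pipeline the two training files would be read line by line, each line converted to a featureset with the same extractor, before training.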
- Improve the training data to be better tailored to the language of USDA tweets
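The word-level expansion described in the processing steps above could be sketched with pandas as follows. The miniature data frame and its column names are illustrative assumptions, not the script's actual schema:

```python
import pandas as pd

# hypothetical stand-in for the classified, tweet-level data frame
df = pd.DataFrame({
    "tweet": ["Farmers market opens today", "Crop report delayed"],
    "sentiment": ["pos", "neg"],
    "date": ["2018-06-01", "2018-06-02"],
})

# split each tweet into words, then emit one row per word while carrying
# the tweet-level columns (sentiment, date) along to every word row
words = (df.assign(word=df["tweet"].str.split())
           .explode("word")
           .reset_index(drop=True))
```

A data frame shaped like `words` is what would be written to the Twitter_PythonSentiment_Word_DATE.csv output.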