This repository is a compilation of my KaggleX project, Political Banter.
-
The goal of the project is to develop a tool that can generate political messaging that captures the tone and voice needed for producing quality and realistic content.
- Objective 1: Analyze text data distributed by politicians
- Objective 2: Generate fine-tuned messaging based on content produced by politicians
- Objective 3: Guide messaging with few shot prompting
- Objective 4: Build algorithm into app
-
The system will generate text for the following use cases:
- Headlines
- Press Releases
- Tweets
- Facebook posts
- Instagram Posts
- The .streamlit folder contains a config file used for designing the app
- The Logo Folder possesses logos generated by Adobe's logo generator
- The Baseline file contains basic prompt templates that inform the Default OpenAI Model. The model that is used is gpt-3.5-turbo-instruct-0914.
- The Finetuned file contains few shot prompts and enhanced prompting templates. The Fined-Tuned OpenAI Model used is fine-tuned based on a database of tweets by politicians across the United States.
- The ingest file is a tool used to input and format text-based data in preparation for fine-tuning. The output of the file results in a new fine-tuned model.
- The Kaggle_Data notebook provides a walkthrough of the exploratory data analysis that was done on a Kaggle database of tweets posted by politicians across the United States. NLP was used to clean and analyze the data and identify top words and overarching topic themes across the tweets.
- The political_social_media file is the dataset of political social media posts retrieved from Kaggle that was used to fine-tune the model: Political Social Media Posts.
- The prompt_app file creates an app called Political Banter, which generates catchy political content in the form of a headline, press release, tweet, facebook post, and instagram post.
- The requirements file contains installation requirements used to run the app.
- Used Spacy for lemmatization
- Used scikit-learn CountVectorizer for tokenization and Latent Dirichlet Allocation (LDA) for topic modeling