- Member: Efe Celik
- Member: Ulas Ucrak
- Member: Eray Acikgoz
In this study, we aimed to analyze the sentiment and topic distribution of Bitcoin-related tweets using both synthetic and real datasets. Our goals were to:
- Predict tweet sentiment (bearish, bullish, neutral); see the Hugging Face App.
- Assess the relationship between tweet sentiment and Bitcoin price movements.
- Evaluate the influence of users based on follower count and verification status.
- Compare topic similarity between synthetic and real datasets.
- Determine if influential users make more accurate predictions.
- Synthetic Tweet Dataset: Hugging Face Dataset
- Real Tweet Dataset: Kaggle Dataset
- Bitcoin Price Dataset: Kaggle Dataset
- Synthetic Dataset: Text cleaning, tokenization, stop-word removal, and lemmatization.
- Real Dataset: Applied similar preprocessing steps for consistency.
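
The preprocessing steps above can be sketched in Python as follows. This is a minimal, dependency-free illustration, not the project's code: the regular expressions and the tiny `STOP_WORDS` set are assumptions, and lemmatization is left out (with NLTK installed, each token could be passed through `WordNetLemmatizer().lemmatize`).

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would likely use a full list such as NLTK's English stop words.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "it", "this"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Lowercase, then replace URLs, @mentions, '#' and non-letters with spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return NON_ALPHA_RE.sub(" ", text)


def preprocess(text: str) -> list[str]:
    """Clean, tokenize on whitespace, and drop stop words (no lemmatization here)."""
    tokens = clean_text(text).split()
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

For example, `preprocess("Bitcoin is going to the moon! 🚀 #BTC https://t.co/xyz @whale")` keeps only `bitcoin`, `going`, `moon`, and `btc`.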
- Word2Vec was used for text embeddings.
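
The README does not say how per-word Word2Vec vectors were turned into one vector per tweet. A common choice, assumed here, is to average the vectors of the tweet's in-vocabulary tokens. With gensim 4 the vectors would come from something like `Word2Vec(sentences, vector_size=100, window=5, min_count=1).wv`; the sketch below accepts any dict-like lookup so it runs without gensim, and `tweet_vector` is a hypothetical helper name.

```python
from typing import Mapping, Sequence


def tweet_vector(tokens: Sequence[str],
                 word_vectors: Mapping[str, Sequence[float]],
                 dim: int) -> list[float]:
    """Mean-pool the vectors of known tokens; return a zero vector if none are known."""
    known = [word_vectors[tok] for tok in tokens if tok in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```

Out-of-vocabulary tokens are simply skipped, so a tweet made only of unseen words maps to the zero vector rather than raising an error.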
- Trained Models:
  - Logistic Regression
  - Neural Network
  - Long Short-Term Memory (LSTM)
- Model Outputs: The trained models were saved for further analysis and prediction tasks.
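
For illustration, a three-class (bearish, bullish, neutral) logistic regression can be written as softmax regression trained by batch gradient descent on cross-entropy. This is a from-scratch stand-in assumed for clarity; the README does not say which library the project used (scikit-learn's `LogisticRegression` would be a typical choice). `pickle` stands in for saving the trained model.

```python
import math
import pickle

LABELS = ["bearish", "bullish", "neutral"]  # class index -> label


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxRegression:
    """Multinomial logistic regression trained with batch gradient descent."""

    def __init__(self, n_features, n_classes=3, lr=0.5, epochs=500):
        self.W = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, x):
        scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
                  for w, b in zip(self.W, self.b)]
        return softmax(scores)

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.epochs):
            grad_W = [[0.0] * len(self.W[0]) for _ in self.W]
            grad_b = [0.0] * len(self.b)
            for x, label in zip(X, y):
                for k, p in enumerate(self.predict_proba(x)):
                    err = p - (1.0 if k == label else 0.0)  # d(cross-entropy)/d(score_k)
                    grad_b[k] += err
                    for j, x_j in enumerate(x):
                        grad_W[k][j] += err * x_j
            for k in range(len(self.W)):
                self.b[k] -= self.lr * grad_b[k] / n
                for j in range(len(self.W[k])):
                    self.W[k][j] -= self.lr * grad_W[k][j] / n
        return self

    def predict(self, x):
        probs = self.predict_proba(x)
        return LABELS[probs.index(max(probs))]


# Toy usage: one feature vector per class (label indices follow LABELS).
X_toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y_toy = [0, 1, 2]
model = SoftmaxRegression(n_features=3).fit(X_toy, y_toy)
restored = pickle.loads(pickle.dumps(model))  # stand-in for saving/loading to disk
```

In practice the feature vectors would be the tweet embeddings, and `pickle.dump(model, f)` on an open binary file would persist the model for the later prediction steps.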
- Merged Data: Real tweets and Bitcoin price data were aligned by timestamp.
- Prediction Models: Logistic Regression and Neural Network were used to label real tweets.
- Sentiment Labels: Bearish, Bullish, Neutral.
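
Aligning tweets with prices and deciding whether a sentiment label was "correct" could look roughly like the sketch below. The daily granularity, the next-day price direction, and the matching rule (bullish with a rise, bearish with a fall, neutral with no change) are all assumptions; the README only states that the data were aligned by timestamp.

```python
from datetime import date, datetime, timedelta


def align_tweets_with_prices(tweets: list[dict], daily_close: dict[date, float]) -> list[dict]:
    """Attach the same-day close and the next-day price direction to each tweet.

    tweets: dicts with a 'timestamp' (datetime) and a 'sentiment' label.
    daily_close: maps a calendar date to that day's closing price.
    Tweets whose day or following day has no price are skipped.
    """
    merged = []
    for tweet in tweets:
        day = tweet["timestamp"].date()  # floor the timestamp to its calendar day
        next_day = day + timedelta(days=1)
        if day not in daily_close or next_day not in daily_close:
            continue
        change = daily_close[next_day] - daily_close[day]
        direction = "up" if change > 0 else "down" if change < 0 else "flat"
        merged.append({**tweet, "close": daily_close[day], "next_day_direction": direction})
    return merged


# Assumed matching rule between a sentiment label and the realised price move.
EXPECTED_DIRECTION = {"bullish": "up", "bearish": "down", "neutral": "flat"}


def is_correct(sentiment: str, direction: str) -> bool:
    return EXPECTED_DIRECTION[sentiment] == direction


# Toy usage
prices = {date(2021, 2, 1): 100.0, date(2021, 2, 2): 110.0}
tweets = [
    {"timestamp": datetime(2021, 2, 1, 15, 30), "sentiment": "bullish"},
    {"timestamp": datetime(2021, 2, 2, 9, 0), "sentiment": "bearish"},  # no Feb 3 price: skipped
]
merged = align_tweets_with_prices(tweets, prices)
```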
- Metrics: Follower count, retweet count, and verification status.
- Accuracy Analysis: Compared average influence scores for correct and incorrect predictions.
- Impact Study: Analyzed whether influential users made more accurate predictions.
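
The README does not define the influence score. The hypothetical `influence_score` below combines log-scaled follower and retweet counts with a fixed bonus for verified accounts; the log scale keeps a few very large accounts from dominating. The second function performs the accuracy analysis described above by averaging influence separately over correct and incorrect predictions.

```python
import math
from statistics import mean


def influence_score(followers: int, retweets: int, verified: bool) -> float:
    """Hypothetical score: log10-scaled reach plus a +1 bonus for verified users."""
    score = math.log10(1 + followers) + math.log10(1 + retweets)
    return score + (1.0 if verified else 0.0)


def average_influence_by_correctness(records: list[tuple[float, bool]]):
    """records: (influence, was_correct) pairs.

    Returns (mean influence of correct predictions, mean influence of incorrect ones);
    a group with no records yields None.
    """
    correct = [s for s, ok in records if ok]
    incorrect = [s for s, ok in records if not ok]
    return (mean(correct) if correct else None,
            mean(incorrect) if incorrect else None)
```

A higher first value than second would suggest that influential users predicted more accurately; the comparison says nothing about statistical significance on its own.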
- Analyzed how sentiment trends related to Bitcoin price changes.
- Compared Logistic Regression and Neural Network results.
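
One way to quantify how sentiment trends relate to price changes is to map labels to numbers (bearish = -1, neutral = 0, bullish = 1, an assumed encoding), average them per day, and compute the Pearson correlation with a daily price return (same-day or next-day, depending on the hypothesis being tested). Running it once on Logistic Regression labels and once on Neural Network labels gives a simple side-by-side comparison. The function names are illustrative.

```python
import math
from collections import defaultdict

SENTIMENT_VALUE = {"bearish": -1, "neutral": 0, "bullish": 1}  # assumed encoding


def daily_net_sentiment(labelled_tweets):
    """Average sentiment value per day; labelled_tweets is a list of (day, label)."""
    buckets = defaultdict(list)
    for day, label in labelled_tweets:
        buckets[day].append(SENTIMENT_VALUE[label])
    return {day: sum(vals) / len(vals) for day, vals in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def sentiment_price_correlation(labelled_tweets, daily_return):
    """Correlate daily net sentiment with returns over the days present in both."""
    sentiment = daily_net_sentiment(labelled_tweets)
    days = sorted(d for d in sentiment if d in daily_return)
    return pearson([sentiment[d] for d in days], [daily_return[d] for d in days])
```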
- Users were labeled as 'Trust' or 'Don't Trust' based on prediction accuracy and influence scores.
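
The trust rule is not specified beyond "prediction accuracy and influence scores". A minimal sketch, assuming a user is trusted only when both values clear fixed thresholds; the `0.6` and `3.0` defaults are placeholders, not values from the project.

```python
from collections import defaultdict


def user_accuracy(predictions):
    """predictions: list of (user_id, was_correct). Returns user_id -> accuracy."""
    totals = defaultdict(lambda: [0, 0])  # user_id -> [correct count, total count]
    for user, ok in predictions:
        totals[user][0] += int(ok)
        totals[user][1] += 1
    return {user: correct / total for user, (correct, total) in totals.items()}


def trust_label(accuracy: float, influence: float,
                min_accuracy: float = 0.6, min_influence: float = 3.0) -> str:
    """Label a user 'Trust' only if both accuracy and influence meet the thresholds."""
    if accuracy >= min_accuracy and influence >= min_influence:
        return "Trust"
    return "Don't Trust"
```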
- Built a DTM (Document-Term Matrix) and applied LDA (Latent Dirichlet Allocation) to it.
- Compared topic distributions between synthetic and real datasets.
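
The topic-modeling step can be sketched in two parts: building the document-term matrix that LDA consumes, and comparing corpus-level topic distributions. The comparison below uses Jensen-Shannon divergence, an assumed metric, and presumes both datasets' documents were scored by the same LDA model so that topic indices line up. The LDA fit itself is omitted; scikit-learn's `CountVectorizer` and `LatentDirichletAllocation` (or gensim's `LdaModel`) are common ways to produce the per-document topic proportions passed to `mean_distribution`.

```python
import math
from collections import Counter


def document_term_matrix(docs):
    """docs: list of token lists. Returns (sorted vocabulary, list of count rows)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        rows.append(row)
    return vocab, rows


def mean_distribution(doc_topic):
    """Average per-document topic proportions into one corpus-level distribution."""
    n_topics = len(doc_topic[0])
    return [sum(row[k] for row in doc_topic) / len(doc_topic) for k in range(n_topics)]


def _kl(p, q):
    # Terms with p_i = 0 contribute nothing; q_i > 0 whenever p_i > 0 below.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
```

A value near 0 between the synthetic and real corpus-level distributions would indicate similar topic mixes; unlike KL divergence, this measure is symmetric and always finite.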
- Developed an interactive sentiment prediction application using the Logistic Regression model.
- Input: Text (e.g., tweets or comments).
- Output: Sentiment prediction (bearish, bullish, neutral).
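
The app's input/output contract can be sketched as a composition of the preprocessing, embedding, and classification steps. The sketch shows only the wiring; `make_predictor` is a hypothetical name and the three callables are supplied by the caller. The README does not name the UI framework; if the Space uses Gradio, a function like this could be passed to `gr.Interface(fn=predict, inputs="text", outputs="text")`.

```python
from typing import Callable, Sequence

LABELS = ("bearish", "bullish", "neutral")  # class index -> label


def make_predictor(preprocess: Callable[[str], list[str]],
                   embed: Callable[[list[str]], Sequence[float]],
                   predict_proba: Callable[[Sequence[float]], Sequence[float]]
                   ) -> Callable[[str], str]:
    """Compose preprocessing, embedding and a classifier into a text -> label function."""
    def predict(text: str) -> str:
        probs = predict_proba(embed(preprocess(text)))
        best = max(range(len(probs)), key=lambda k: probs[k])  # argmax over classes
        return LABELS[best]
    return predict
```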
This study demonstrated the effectiveness of using synthetic datasets for training sentiment analysis models and validated their performance on real-world data.
- Explored the connection between sentiment trends, price changes, and user influence.
- Despite limitations in measuring market impact, the developed application serves as a practical tool for real-time sentiment analysis.
🚀 Future Work: Further refine influence measurement techniques and enhance market impact analysis.
🔗 Explore the Project
Stay tuned for updates and feel free to contribute! 🚀