Data Science Internship Project [Team M]
This project involves scraping customer reviews from the Skytrax website, cleaning and analyzing the data to gain insights, and presenting the findings to influence key business decisions at British Airways (BA). The project focuses on understanding customer feedback and sentiments to improve service quality, reduce costs, and increase revenue.
- Objective: Collect customer review data specifically related to British Airways from the Skytrax website.
- Method: Utilized web scraping techniques in Python to extract review data, focusing on comments related to the airline’s services.
- Output: Gathered a comprehensive dataset containing customer reviews, including comments, ratings, and review dates.
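A minimal sketch of the parsing step. It assumes that each review on a Skytrax page sits in its own `<article class="review">` element, which holds the comment text, a numeric rating and a `<time datetime=...>` date. These selectors and the `parse_reviews`/`fetch_page` helpers are hypothetical illustrations, not the project's notebook code, so check the live page markup before relying on them.

```python
import urllib.request

from bs4 import BeautifulSoup

# Hypothetical CSS selectors: the real Skytrax markup may differ and should be
# confirmed with the browser's developer tools.
REVIEW_SELECTOR = "article.review"
BODY_SELECTOR = "div.text_content"
RATING_SELECTOR = "span.rating"
DATE_SELECTOR = "time"


def fetch_page(url: str) -> str:
    """Download one page of HTML using only the standard library."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_reviews(html: str) -> list[dict]:
    """Extract comment, rating and date from every review block on a page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.select(REVIEW_SELECTOR):
        body = article.select_one(BODY_SELECTOR)
        rating = article.select_one(RATING_SELECTOR)
        date = article.select_one(DATE_SELECTOR)
        rating_text = rating.get_text(strip=True) if rating else ""
        rows.append({
            "comment": body.get_text(" ", strip=True) if body else None,
            # Non-numeric or missing ratings become None rather than raising.
            "rating": int(rating_text) if rating_text.isdigit() else None,
            "date": date.get("datetime") if date else None,
        })
    return rows
```

Each page downloaded with `fetch_page` can be passed to `parse_reviews`, and the resulting rows can be collected into a pandas DataFrame. The Skytrax URL and any pagination scheme are left to the notebook. A short pause between requests (e.g. `time.sleep`) keeps the load on the site low.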
- Objective: Prepare the raw data for analysis by cleaning and organizing it.
- Process:
  - Removed duplicates and irrelevant entries.
  - Standardized text formatting (e.g., lowercasing, removing special characters).
  - Handled missing values and corrected any data inconsistencies.
- Output: A clean dataset ready for analysis, with relevant and structured information.
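The cleaning steps above can be sketched with pandas as follows. The sketch assumes the scraped data is a DataFrame with `comment`, `rating` and `date` columns; these column names and the helper functions are hypothetical. Reviews with the same normalized text and date are treated as duplicates. The project's own rules for "irrelevant entries" are not specified, so they are left out.

```python
import re

import pandas as pd


def clean_text(text: str) -> str:
    """Lowercase, replace special characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_reviews(raw: pd.DataFrame) -> pd.DataFrame:
    # Reviews without any comment text cannot be analyzed, so drop them.
    df = raw.dropna(subset=["comment"])
    df = df.assign(comment=df["comment"].map(clean_text))
    # Comments that were only whitespace or symbols are now empty strings.
    df = df[df["comment"] != ""]
    # Same normalized text on the same date is treated as a duplicate.
    df = df.drop_duplicates(subset=["comment", "date"])
    # Unparseable dates or ratings become NaT/NaN instead of raising errors.
    df = df.assign(
        date=pd.to_datetime(df["date"], errors="coerce"),
        rating=pd.to_numeric(df["rating"], errors="coerce"),
    )
    return df.reset_index(drop=True)
```

Coercing bad values to missing, instead of dropping the whole row, keeps reviews whose text is still usable for sentiment and topic analysis even when the rating is absent.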
- Objective: Analyze the cleaned data to uncover insights into customer sentiments and common topics.
- Approaches:
  - Sentiment Analysis: Assessed the overall sentiment (positive, neutral, negative) of customer reviews using natural language processing (NLP) techniques.
  - Topic Modelling: Identified common themes and topics discussed in the reviews using methods like Latent Dirichlet Allocation (LDA).
  - Word Clouds: Created visual representations of the most frequently mentioned words and phrases.
- Output: Generated insights into customer satisfaction, common pain points, and areas for improvement.
- Objective: Present the findings in a clear, concise, and visually appealing format.
- Process:
  - Created visualizations such as sentiment distribution graphs, topic frequency charts, and word clouds.
  - Summarized key insights and metrics into a single PowerPoint slide.
- Output: A presentation-ready slide with visualizations and explanatory notes for board-level discussions.
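As an example of the visualizations listed above, a sentiment distribution chart can be drawn with Matplotlib and saved as an image for the slide. This sketch assumes the sentiment labels are the strings `positive`, `neutral` and `negative`. The function name, colors and file path are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # Off-screen rendering for exporting images; not needed inside Jupyter.
import matplotlib.pyplot as plt
import pandas as pd

ORDER = ["positive", "neutral", "negative"]


def plot_sentiment_distribution(labels, path=None):
    """Bar chart of review counts per sentiment class, optionally saved to `path`."""
    # Keep a fixed class order and show zero bars for classes that never occur.
    counts = pd.Series(labels).value_counts().reindex(ORDER, fill_value=0)
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(counts.index, counts.values,
                  color=["tab:green", "tab:gray", "tab:red"])
    ax.bar_label(bars)  # Print the count above each bar.
    ax.set_title("Sentiment distribution of BA reviews")
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of reviews")
    fig.tight_layout()
    if path:
        fig.savefig(path, dpi=200)
    return fig, counts
```

Passing `path="sentiment.png"` writes a 200-dpi PNG that can be inserted into the PowerPoint slide. Word clouds can be produced in a similar way with the `wordcloud` package, for example `WordCloud().generate(" ".join(texts))` displayed with `plt.imshow`.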
- Programming Language: Python
- Libraries: BeautifulSoup, pandas, NLTK, scikit-learn, Matplotlib, WordCloud, Seaborn
- IDE: Jupyter Notebook
- Presentation Tool: PowerPoint
- Clone this repository.
- Install the necessary Python libraries from `requirements.txt`.
- Run the Jupyter Notebook to scrape and analyze the data.
- Generate visualizations and compile the final PowerPoint slide.