This project performs analytics on Uber trip data using modern data engineering practices on Google Cloud Platform (GCP). It uses Mage.ai to build the ETL pipeline, BigQuery as the data warehouse, Looker Studio for data visualization, and Cloud Storage to hold the data throughout the process.
- Programming Language - Python
- Query Language - SQL
- Google Cloud Platform
  - BigQuery
  - Cloud Storage
  - Looker Studio
  - Compute Engine instance
- Mage.ai (modern data pipeline tool): https://www.mage.ai/
  - Contribute to Mage here: https://github.com/mage-ai/mage-ai
TLC Trip Record Data: yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
- Dataset used in the video - https://github.com/darshilparmar/uber-data-engineering-mage-project/blob/main/data/uber_data.csv
- Original Data Source - https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- Data Dictionary - https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
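
To make the fields described above concrete, here is a minimal sketch of reading trip records with Python's standard library and deriving a trip duration. It is not the project's pipeline code. The column names (`tpep_pickup_datetime`, `tpep_dropoff_datetime`, `passenger_count`, `trip_distance`, `payment_type`, `fare_amount`) follow the yellow taxi data dictionary linked above. The timestamp format, the two sample rows, and the helper names `parse_trip` and `load_trips` are assumptions made for illustration, so check them against the actual CSV before relying on them.

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row sample; the values are made up for illustration only.
SAMPLE_CSV = """VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,payment_type,fare_amount
1,2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.50,1,9.0
2,2016-03-01 00:10:00,2016-03-01 00:40:00,2,10.00,2,30.0
"""

# Assumed timestamp layout; adjust if the real file differs.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_trip(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIMESTAMP_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIMESTAMP_FORMAT)
    return {
        "pickup": pickup,
        "dropoff": dropoff,
        "passenger_count": int(row["passenger_count"]),
        "trip_distance": float(row["trip_distance"]),
        "payment_type": int(row["payment_type"]),
        "fare_amount": float(row["fare_amount"]),
        # Derived field: trip length in minutes.
        "duration_minutes": (dropoff - pickup).total_seconds() / 60,
    }


def load_trips(file_obj):
    """Read every row from an open CSV file object into a list of trips."""
    return [parse_trip(row) for row in csv.DictReader(file_obj)]


trips = load_trips(io.StringIO(SAMPLE_CSV))
for trip in trips:
    print(trip["pickup"], trip["trip_distance"], trip["duration_minutes"])
```

To run this against the real dataset, replace `io.StringIO(SAMPLE_CSV)` with an open file handle, for example `open("uber_data.csv", newline="")`, assuming the file has been downloaded locally under that name.
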
Video Link - https://www.youtube.com/watch?v=WpQECq5Hx9g