Data-Modeling-with-Postgres

This project applies data modeling with Postgres and builds an ETL pipeline using Python. It defines fact and dimension tables for a star schema with a particular analytic focus, and implements an ETL pipeline that transfers data from files in two local directories into these tables in Postgres using Python and SQL.

Files:

test.ipynb displays the first few rows of each table to let you check your database.
create_tables.py drops and creates your tables. Run this file to reset your tables each time before you run your ETL scripts.
etl.ipynb reads and processes a single file from each of song_data and log_data and loads the data into your tables. This notebook contains detailed instructions on the ETL process for each of the tables.
etl.py reads and processes files from song_data and log_data and loads them into your tables. You can fill this out based on your work in the ETL notebook.
sql_queries.py contains all your SQL queries, and is imported into the last three files above.
helpers.py contains Python helper functions used by the other files.
db_config.yml contains the database configuration fields.
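As a rough illustration of how etl.py might collect the files it processes from song_data and log_data, the sketch below walks a directory tree and gathers file paths. The function name get_json_files and the assumption that the data files end in .json are hypothetical and are not taken from this repository.

```python
import os


def get_json_files(root_dir):
    """Return a sorted list of absolute paths to every .json file under root_dir.

    Hypothetical helper: the .json extension is an assumption about the data files.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith(".json"):
                matches.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(matches)
```

Sorting the paths makes repeated runs process files in the same order, which can make it easier to compare results between runs.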

Project Steps

Create Tables

Write CREATE statements in sql_queries.py to create each table.
Write DROP statements in sql_queries.py to drop each table if it exists.
Run create_tables.py to create the database and tables.
Build the ETL pipeline in etl.ipynb and etl.py.
Run test.ipynb after each step to confirm the results with sanity checks.
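The DROP and CREATE steps above could be organized roughly as follows. This is a minimal sketch, not the contents of sql_queries.py or create_tables.py: the variable names, the reset_tables function, and all column types are assumptions, and only one CREATE statement is shown.

```python
# Table names come from the schema described in this README.
TABLES = ["songplays", "users", "songs", "artists", "time"]

# One DROP statement per table; IF EXISTS keeps the reset safe to repeat.
drop_table_queries = [f"DROP TABLE IF EXISTS {table};" for table in TABLES]

# Only the users table is shown; the column types here are assumptions.
user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR,
    gender     VARCHAR,
    level      VARCHAR
);
"""
create_table_queries = [user_table_create]


def reset_tables(cur):
    """Run every DROP, then every CREATE, on a DB-API cursor (hypothetical helper)."""
    for query in drop_table_queries + create_table_queries:
        cur.execute(query)
```

In practice cur would be a cursor obtained from a Postgres connection, and the caller would commit the transaction afterwards.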

Schema:

Star schema

Fact Table

songplays - records in log data associated with song plays, i.e. records with page NextSong
- songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent
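The NextSong filter described for songplays can be sketched as below. It assumes each log file holds one JSON object per line with a page field; that file layout and the function name filter_song_plays are assumptions for illustration.

```python
import json


def filter_song_plays(lines):
    """Parse JSON lines and keep only the records whose page is NextSong.

    Hypothetical helper: assumes one JSON object per non-empty line.
    """
    records = (json.loads(line) for line in lines if line.strip())
    return [record for record in records if record.get("page") == "NextSong"]
```

Using record.get means that records without a page field are skipped rather than raising a KeyError.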

Dimension Tables

users - users in the app
- user_id, first_name, last_name, gender, level
songs - songs in music database
- song_id, title, artist_id, year, duration
artists - artists in music database
- artist_id, name, location, latitude, longitude
time - timestamps of records in songplays broken down into specific units
- start_time, hour, day, week, month, year, weekday
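To illustrate how a timestamp could be broken down into the columns of the time table, here is a minimal sketch. It assumes the log timestamps are Unix epoch values in milliseconds and interprets them as UTC; both are assumptions, as is the function name time_row.

```python
from datetime import datetime, timezone


def time_row(ts_ms):
    """Return (start_time, hour, day, week, month, year, weekday) for an epoch-ms value.

    Hypothetical helper: week is the ISO week number, weekday is 0 for Monday.
    """
    t = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return (t, t.hour, t.day, t.isocalendar()[1], t.month, t.year, t.weekday())
```

The tuple order matches the column order listed for the time table, so a row could be passed directly as the parameters of an INSERT statement.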
