ETL using Python and PostgreSQL

Introduction

This project demonstrates an ETL process that uses Python to read song attribute data and log data and write them into a normalized database built in PostgreSQL. The song data describes the songs available in a streaming service called Sparkify, and the log data records who listened to which songs, along with other related attributes.
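To illustrate what "normalized" means here, the sketch below splits one song record into a song row and an artist row, so that artist details can be stored once instead of being repeated with every song. It assumes each song file holds a single JSON object with fields such as `song_id`, `title`, `artist_id`, `year`, `duration`, `artist_name`, `artist_location`, `artist_latitude` and `artist_longitude`; these field names and the function `split_song_record` are hypothetical and may differ from the actual files and code in the repository.

```python
import json

# Hypothetical column layouts; the real tables in sql_queries.py may differ.
SONG_COLUMNS = ("song_id", "title", "artist_id", "year", "duration")
ARTIST_COLUMNS = ("artist_id", "artist_name", "artist_location",
                  "artist_latitude", "artist_longitude")


def split_song_record(raw_json):
    """Split one denormalized song record into (song_row, artist_row) tuples.

    Missing fields become None, which a database driver writes as NULL.
    """
    record = json.loads(raw_json)
    song_row = tuple(record.get(col) for col in SONG_COLUMNS)
    artist_row = tuple(record.get(col) for col in ARTIST_COLUMNS)
    return song_row, artist_row


# Illustrative input only, not taken from the project's data folder.
sample = ('{"song_id": "S1", "title": "Example", "artist_id": "A1", '
          '"year": 2000, "duration": 210.5, "artist_name": "Example Artist", '
          '"artist_location": "", "artist_latitude": null, "artist_longitude": null}')
song_row, artist_row = split_song_record(sample)
print(song_row)    # ('S1', 'Example', 'A1', 2000, 210.5)
print(artist_row)  # ('A1', 'Example Artist', '', None, None)
```

The `artist_id` appears in both rows, so it is the key that links a song back to its single artist row.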

This project was developed as part of the Udacity Data Engineering Nanodegree program.

How to install

There is no installation package. The folder structure and all of the files can be downloaded from the repository and saved directly on a local computer. The code expects the files to be saved in the /home/workspace folder of the local machine.
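Because the scripts assume the files live under /home/workspace, a quick pre-flight check can catch a misplaced download before the ETL fails partway through. The helper below is a hypothetical sketch, not part of the repository: it reports which of the expected entries (taken from the repository's file list) are missing from a given folder.

```python
import os

# Entries the scripts are assumed to need; adjust if the layout differs.
EXPECTED_ENTRIES = ("data", "create_tables.py", "sql_queries.py", "etl.py")


def missing_entries(root="/home/workspace", expected=EXPECTED_ENTRIES):
    """Return the expected files or folders that are not present under root."""
    return [name for name in expected
            if not os.path.exists(os.path.join(root, name))]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing from /home/workspace:", ", ".join(missing))
    else:
        print("All expected files found.")
```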

How to use

The database is (re)created and refreshed by executing the create_tables.py script from the command line.
The logic of the DDL and DML statements in the sql_queries.py script can be tested by running the etl.ipynb and test.ipynb Jupyter notebooks.
The etl.py script is the main script; it reads and processes all of the files in the data folder.
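create_tables.py rebuilds the schema from query strings kept in sql_queries.py. The sketch below shows one common way to structure that, as an assumption rather than the repository's actual code: DDL strings are collected in lists and executed in order, drops first and then creates, through any DB-API 2.0 cursor. The table, column and list names are hypothetical. The usage part runs against the standard-library sqlite3 module only so it can be tried without a PostgreSQL server; with psycopg2, `reset_tables` would receive a psycopg2 cursor and connection instead.

```python
import sqlite3

# Hypothetical DDL in the style of sql_queries.py; the real schema may differ.
artist_table_drop = "DROP TABLE IF EXISTS artists"
song_table_drop = "DROP TABLE IF EXISTS songs"

artist_table_create = """
CREATE TABLE IF NOT EXISTS artists (
    artist_id TEXT PRIMARY KEY,
    name      TEXT,
    location  TEXT,
    latitude  REAL,
    longitude REAL
)"""

song_table_create = """
CREATE TABLE IF NOT EXISTS songs (
    song_id   TEXT PRIMARY KEY,
    title     TEXT,
    artist_id TEXT,
    year      INTEGER,
    duration  REAL
)"""

drop_table_queries = [song_table_drop, artist_table_drop]
create_table_queries = [artist_table_create, song_table_create]


def reset_tables(cur, conn):
    """Drop every table, then recreate it, committing after each statement."""
    for query in drop_table_queries + create_table_queries:
        cur.execute(query)
        conn.commit()


if __name__ == "__main__":
    # sqlite3 stands in for PostgreSQL here only so the sketch runs locally.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    reset_tables(cur, conn)
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    print([row[0] for row in cur.fetchall()])  # ['artists', 'songs']
    conn.close()
```

Running the drops before the creates is what makes the script safe to rerun: each execution starts from empty tables.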
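etl.py reads and processes every file in the data folder. A minimal sketch of that loop, assuming the data is stored as .json files in nested subfolders, is to collect the paths recursively and then hand each one to a per-file function. `get_files` and `process_data` are hypothetical names; in the project, the per-file function would parse the file and run the inserts from sql_queries.py.

```python
import glob
import os


def get_files(root):
    """Return the absolute paths of all .json files under root, sorted."""
    # "**" with recursive=True matches root itself and every subfolder.
    pattern = os.path.join(root, "**", "*.json")
    return sorted(os.path.abspath(p) for p in glob.glob(pattern, recursive=True))


def process_data(root, process_file):
    """Apply process_file to every JSON file under root and return the count."""
    files = get_files(root)
    for i, path in enumerate(files, start=1):
        process_file(path)
        print("{}/{} files processed.".format(i, len(files)))
    return len(files)
```

For example, `process_data("/home/workspace/data", print)` would simply list every JSON file it finds, which is a cheap way to confirm the folder layout before loading anything.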

Technologies used

Package (version)
Python (3.6.3)
psycopg2 (2.7.4)
pandas (0.23.3)
numpy (1.12.1)

Repository contents

data
.gitignore
README.md
create_tables.py
etl.ipynb
etl.py
sql_queries.py
test.ipynb

Repository: ldself/dataeng01_datamodelingwithpostgres

Folders and files

Latest commit

History

Repository files navigation

ETL using Python and PostgreSQL

Introduction

How to install

How to use

Technologies used

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages