Showing 29 changed files with 784 additions and 8 deletions.
@@ -0,0 +1,33 @@
+++
title = "Checkpoint 4: Tableau Story"
date = 2023-05-25T12:55:09-05:00
draft = false
weight = 4
+++

## Before You Start

First, check whether you have received any feedback on Checkpoints 2 and 3. This
feedback could influence the direction of your work on Checkpoint 4. If you want to change anything
you did in earlier checkpoints, you do not have to resubmit those checkpoints unless your IA or
instructor asks you to. You can simply add any updated work and notes to
the current checkpoint.

## Getting Started

For this checkpoint, you will need to manipulate your data and produce a Tableau story that shows off skills from class, such as filtering and table calculations. You may want to use pandas and Jupyter notebooks for data manipulation. If you do, add code comments explaining your thought process and push your work to GitHub.
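If you prepare your data in a Jupyter notebook, a commented cleaning step might look like the minimal sketch below. The data, the column names (`region`, `sales`), and the cleaning choices are hypothetical placeholders; substitute your own dataset and reasoning. Tableau can connect to CSV text files, so exporting the cleaned DataFrame to CSV is one straightforward way to hand it off.

```python
import pandas as pd

# Hypothetical raw data; in your notebook this would usually come from pd.read_csv(...)
raw = pd.DataFrame({
    "region": ["North", "south", "North", None],
    "sales": [120.0, 95.5, None, 40.0],
})

# Standardize capitalization so "south" and "South" are treated as the same region
clean = raw.assign(region=raw["region"].str.title())

# Drop rows missing either field (one possible choice; you might fill them instead)
clean = clean.dropna(subset=["region", "sales"])

# Export without the pandas index so Tableau does not see an extra unnamed column.
# Passing a file name (e.g. "clean_sales.csv") writes a file; with no path, to_csv returns a string.
csv_text = clean.to_csv(index=False)
print(csv_text)
```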

No matter what visualizations you add to your Tableau story, every caption should explain your thought process for that visualization. The first caption should include a link to your dataset, and the final story point should include links to any supporting materials, such as your GitHub repository if you used a Jupyter notebook for this checkpoint.

## Examples

{{% notice blue Note "rocket" %}}
Checkpoint 4 examples can be found here: [Checkpoint 4 Examples](https://github.com/LaunchCodeEducation/finalProjectDAExamples/tree/main/Checkpoint%204).
{{% /notice %}}

## Submitting Your Work

When you are finished, paste the link to your Tableau story into the submission box in Canvas for Graded
Assignment #4: Checkpoint 4 and click *Submit*.

[Back to Final Project Overview]({{% relref "./../" %}})
14 changes: 14 additions & 0 deletions
content/how-programs-work/reading/data-analysis-projects/index.md
@@ -0,0 +1,14 @@
+++
title = "GitHub Repository: Data Analysis Projects"
date = 2024-01-08T10:24:35-06:00
draft = false
weight = 7
+++

Throughout this course, you will use a GitHub repository that holds starter code for exercises and studios, as well as example code from the readings in some lessons.

Fork the following repository and clone it to your machine: [LaunchCode Education: Data Analysis Projects](https://github.com/launchcodeeducation/data-analysis-projects)

{{% notice blue Note "rocket" %}}
You will start using the `data-analysis-projects` repository in the next chapter. Do not move on to the next page without it!
{{% /notice %}}
@@ -0,0 +1,26 @@
+++
pre = "<b>23. </b>"
chapter = true
title = "Databases with Python and pandas"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 23
+++

## Learning Objectives
Upon completing all the content in this chapter, you should be able to do the following:
1. Establish a connection to a SQLite database using Python's `sqlite3` module.
1. Create a cursor object to interact with the database.
1. Create a pandas DataFrame using data from a SQLite database.
1. Add data from a pandas DataFrame to a SQLite database.

## Key Terminology

### Databases with Python
1. sqlite3
1. Cursor object
1. Parameterized queries

## Content Links

{{% children %}}
@@ -0,0 +1,16 @@
+++
title = "Exercises: Working with Databases in Python"
date = 2021-10-01T09:28:27-05:00
draft = false
weight = 2
+++

## Getting Started

Open the `data-analysis-projects/databases-python-pandas/studio/databases-and-py.ipynb` file in Jupyter Notebook and begin working through the exercises!

## Submitting Your Work

When you are finished, push your changes to GitHub.

Copy the link to your GitHub repository and paste it into the submission box in Canvas for **Exercises: Working with Databases in Python** and click *Submit*.
@@ -0,0 +1,12 @@
+++
title = "Next Steps"
date = 2021-10-01T09:28:27-05:00
draft = false
weight = 4
+++

You are now ready to dive into [Tableau](https://www.tableau.com/), the first major visualization tool we will use. If you would like to further explore interacting with databases using Python and pandas, you can find some of our favorite resources below:

1. [GeeksforGeeks: Working with Databases using pandas](https://www.geeksforgeeks.org/working-with-database-using-pandas/)
1. [Tutorialspoint: Python - Databases and SQL](https://www.tutorialspoint.com/python_network_programming/python_databases_and_sql.htm)
1. [DigitalOcean: How To Use the sqlite3 Module in Python 3](https://www.digitalocean.com/community/tutorials/how-to-use-the-sqlite3-module-in-python-3)
@@ -0,0 +1,10 @@
+++
title = "Reading"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 1
+++

## Reading Content

{{% children %}}
76 changes: 76 additions & 0 deletions
content/python-pandas-databases/reading/pandas-databases/_index.md
@@ -0,0 +1,76 @@
+++
title = "Databases with pandas"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 2
+++

In addition to everything else pandas can do, the library can also load data stored elsewhere, such as in a database, into a pandas DataFrame or Series. This lesson walks through the process of creating a pandas DataFrame from an existing table in a SQLite database.

This lesson uses SQLite, through Python's `sqlite3` module, to demonstrate how to work with a database from a separate library (pandas). Since previous lessons already covered how to manipulate data with pandas, we will instead focus on the following:
1. Reading data from the database
1. Storing the data in a pandas DataFrame
1. Creating a new table in the database
1. Adding the DataFrame data to the new table

{{% notice blue Note "rocket" %}}
The following examples can be found in the `data-analysis-projects/databases-python-pandas/pandas-db-walkthrough.ipynb` file.
{{% /notice %}}

## Create a DataFrame

{{% notice blue Example "rocket" %}}
```python
import sqlite3
import pandas as pd

# Create a SQLite connection to the Movies.db file
movies_db = sqlite3.connect('Movies.db')

# Use the pandas read_sql_query function to run a query and return the result as a DataFrame
df = pd.read_sql_query('SELECT * FROM movies;', movies_db)

# Use .head() to return the first five rows (the table currently holds only five rows)
df.head()
```
{{% /notice %}}

{{% notice blue Note "rocket" %}}
The pandas `read_sql_query` function in the example above reads the results of a SQL query into a DataFrame. You can find its documentation here: [pandas.read_sql_query API reference](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html)
{{% /notice %}}
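Because it cannot assume `Movies.db` is available, the minimal sketch below builds a small in-memory SQLite database so that it runs on its own. The table layout mirrors the `movies` columns used in this lesson, and the rows are made-up placeholder values. It also shows the `params` argument of `read_sql_query`, which fills `?` placeholders with values instead of formatting them into the SQL string.

```python
import sqlite3
import pandas as pd

# Throwaway in-memory database so the example does not depend on Movies.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, genre TEXT, release INTEGER, rt_score INTEGER)")
# Placeholder rows for illustration only
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        ("Movie A", "Drama", 1997, 97),
        ("Movie B", "Science Fiction", 2010, 87),
        ("Movie C", "Science Fiction", 2014, 73),
    ],
)
conn.commit()

# The ? placeholder is filled from params, so the genre is never pasted into the SQL text
sci_fi = pd.read_sql_query(
    "SELECT title, release FROM movies WHERE genre = ? ORDER BY release;",
    conn,
    params=["Science Fiction"],
)
print(sci_fi)
conn.close()
```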

## Create New Table from DataFrame

After exploring, cleaning, or manipulating data with pandas, you can add that data back into your database. In the scenario below, we will add a new movie to an existing DataFrame and then store the DataFrame in a new table in the SQLite database.

{{% notice blue Example "rocket" %}}
We will start by adding a row to our existing DataFrame:

```python
new_movie = pd.DataFrame([{'title':'Dune', 'genre':'Science Fiction', 'release':2021, 'rt_score': 83}])
df = pd.concat([df, new_movie], ignore_index=True)
```

Adding a row is not required before creating a new table in the database, but the extra row makes it easy to confirm that the data was copied into the new table correctly.

```python {linenos=table}
# Write the DataFrame to a table named 'df', replacing the table if it already exists.
# index=False keeps the DataFrame index from being stored as an extra column.
df.to_sql('df', movies_db, if_exists="replace", index=False)
# Drop new_movie_table if a previous run created it, so this cell can be re-run
movies_db.execute("DROP TABLE IF EXISTS new_movie_table")
# Create new_movie_table from the rows of the 'df' table (new_movie is a Python
# variable, not a database table, so SQL cannot select from it directly)
movies_db.execute(
    """
    CREATE TABLE new_movie_table AS
    SELECT * FROM df
    """
)
```

The documentation for the pandas `DataFrame.to_sql` method used in the code block above can be found here: [pandas.DataFrame.to_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html)

```python
# Read data from the newly created table, passing in the existing movies_db connection
new_movies_df = pd.read_sql_query('SELECT * FROM new_movie_table;', movies_db)
# Display the first six rows
new_movies_df.head(6)
```
{{% /notice %}}
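The sketch below repeats the round trip against a throwaway in-memory database so you can run it without `Movies.db`. As a shortcut compared with the two-step approach above, it writes the combined DataFrame straight to the new table with `to_sql`, passing `index=False` so the DataFrame index is not stored as an extra column. The starting rows are placeholders.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

# Placeholder starting data standing in for the rows read from the movies table
df = pd.DataFrame([
    {"title": "Movie A", "genre": "Drama", "release": 1997, "rt_score": 97},
    {"title": "Movie B", "genre": "Science Fiction", "release": 2010, "rt_score": 87},
])

# Append one row; ignore_index=True renumbers the index 0..n-1
new_movie = pd.DataFrame([{"title": "Dune", "genre": "Science Fiction", "release": 2021, "rt_score": 83}])
df = pd.concat([df, new_movie], ignore_index=True)

# Write the DataFrame to a table, replacing it if it already exists, without the index column
df.to_sql("new_movie_table", conn, if_exists="replace", index=False)

# Read it back to confirm the new row made it into the table
check = pd.read_sql_query("SELECT * FROM new_movie_table;", conn)
print(check)
conn.close()
```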
136 changes: 136 additions & 0 deletions
content/python-pandas-databases/reading/python-databases/_index.md
@@ -0,0 +1,136 @@
+++
title = "Databases with Python"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 1
+++

Working in a database command-line interface can be cumbersome, especially when you need to run many commands. For this reason, analysts often prefer to use a graphical user interface (GUI) or a programming language such as Python, along with its supporting libraries, to interact with databases. We will use Python and the **sqlite3** module to do the following:
1. Create a new SQLite database
1. Add a table
1. Perform CRUD (create, read, update, and delete) operations on the table

You can do more than the above with Python and pandas, such as performing joins, but that is not always the best practice. Database engines are built and optimized to perform joins very efficiently, so it often makes sense to let the database handle them. Always consider what you will be doing with your data before you act on it.
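To make the point about joins concrete, here is a minimal sketch that asks SQLite to perform the join and hands only the combined rows back to Python. The `movies` and `directors` tables and their contents are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two small hypothetical tables linked by director_id
cur.execute("CREATE TABLE directors (director_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE movies (title TEXT, director_id INTEGER)")
cur.executemany("INSERT INTO directors VALUES (?, ?)", [(1, "Director X"), (2, "Director Y")])
cur.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Movie A", 1), ("Movie B", 2), ("Movie C", 1)],
)

# The database engine performs the join; Python only receives the result rows
rows = cur.execute(
    """
    SELECT movies.title, directors.name
    FROM movies
    JOIN directors ON movies.director_id = directors.director_id
    ORDER BY movies.title
    """
).fetchall()
print(rows)
conn.close()
```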

{{% notice blue Note "rocket" %}}
The examples below are also available in the `data-analysis-projects/databases-python-pandas/python-db-walkthrough.ipynb` file.
{{% /notice %}}

## sqlite3 with Python

The [sqlite3](https://docs.python.org/3/library/sqlite3.html) module lets Python connect to a SQLite database stored in a file on your machine. You can then use the connection variable to execute SQL commands.

The basic syntax is as follows:

```python
import sqlite3

# If the Movies.db database does not already exist, sqlite3 will create it!
movies_db = sqlite3.connect('Movies.db') # connect to the database
```

{{% notice blue Note "rocket" %}}
If we were to print `movies_db`, we would see output like the following:

```python
<sqlite3.Connection object at 0x7334db1d3940> # the 0x7334db1d3940 portion will vary
```

This shows that a `sqlite3.Connection` object was created and can now be referenced using the `movies_db` variable.
{{% /notice %}}
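While you are experimenting, you may not want a file created on disk. As a small sketch, passing the special name `":memory:"` to `sqlite3.connect` opens a temporary database that lives only in memory and is discarded when the connection is closed. The variable name `scratch_db` is just an example.

```python
import sqlite3

# ":memory:" opens a temporary database instead of a file on disk
scratch_db = sqlite3.connect(":memory:")
print(type(scratch_db))  # <class 'sqlite3.Connection'>

# ... run your experiments here ...

# Close the connection when you are finished; an in-memory database is discarded at this point
scratch_db.close()
```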

## Cursor Objects

Now that we have established a connection to the database, we need a way to execute commands. A **cursor object** lets us do this: we use it to execute SQL statements and fetch their results.

We can create a new cursor object by calling the connection's `cursor()` method and storing the result in a variable:

```python
# Create a cursor from the connection and store it in a variable named "cur"
cur = movies_db.cursor()
```

The basic syntax for executing a command with the cursor object is as follows:

```python
cur.execute("SQL statement")
```
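Filling in the template above with a real statement, the minimal sketch below runs a query that needs no tables, `SELECT sqlite_version()`, and reads the single result row with `fetchone()`. It uses an in-memory database so it runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# execute() runs the statement; fetchone() returns the next result row as a tuple (or None)
cur.execute("SELECT sqlite_version()")
row = cur.fetchone()
print(row)  # a one-element tuple holding the SQLite version string
conn.close()
```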

### Creating a Table

```python
cur.execute("CREATE TABLE table_name (column_one DATA_TYPE, column_two DATA_TYPE, ...)")
```

{{% notice blue Note "rocket" %}}
You can find a list of SQLite data types here: [Data Types in SQLite](https://sqlite.org/datatype3.html).
{{% /notice %}}
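As a concrete sketch, the statement below creates a `movies` table whose columns match those used in the pandas lesson (`title`, `genre`, `release`, `rt_score`); the column types chosen here are an assumption. `IF NOT EXISTS` keeps the statement from raising an error if you run the cell twice, and `PRAGMA table_info` is one way to check the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # swap in "Movies.db" to work with a file instead
cur = conn.cursor()

# Assumed column types: TEXT for strings, INTEGER for whole numbers
cur.execute(
    """
    CREATE TABLE IF NOT EXISTS movies (
        title TEXT,
        genre TEXT,
        release INTEGER,
        rt_score INTEGER
    )
    """
)

# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
columns = [(col[1], col[2]) for col in cur.execute("PRAGMA table_info(movies)")]
print(columns)
conn.close()
```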
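
### Insert Table Values

```python
cur.execute("INSERT INTO table_name VALUES ('value-one', 'value-two', ...)")
```

Changes made with `INSERT`, `UPDATE`, or `DELETE` are not saved to the database until you call `commit()` on the connection, for example `movies_db.commit()`.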

### Reading Data

There are a couple of strategies you can use to read data from your database. Because the cursor object is itself an iterator, you can loop over it to fetch rows.

{{% notice blue Example "rocket" %}}

```python
# Loop over the cursor returned by execute() to print each row
for row in cur.execute("SELECT column FROM table_name"):
    print(row)
```

The loop above prints every row returned by the `SELECT` statement, each as a tuple containing the value of the selected column. You could also use the `*` wildcard in place of the column name to return every column of every row in the table.
{{% /notice %}}

You can also use the `fetchall()` method, which returns all of the result rows as a list of tuples:

```python
cur.execute("SELECT * FROM table_name").fetchall()
```
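Both reading styles return rows as tuples. The self-contained sketch below, with placeholder data, shows that `fetchall()` returns a list you can reuse, while `fetchone()` returns one row per call and `None` once no rows remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE movies (title TEXT, release INTEGER)")
cur.executemany("INSERT INTO movies VALUES (?, ?)", [("Movie A", 1997), ("Movie B", 2010)])

# fetchall(): every result row in a list of tuples
all_rows = cur.execute("SELECT title, release FROM movies ORDER BY release").fetchall()
print(all_rows)  # [('Movie A', 1997), ('Movie B', 2010)]

# fetchone(): one row per call, then None when the result set is exhausted
cur.execute("SELECT title FROM movies ORDER BY release")
first = cur.fetchone()
second = cur.fetchone()
third = cur.fetchone()
print(first, second, third)  # ('Movie A',) ('Movie B',) None
conn.close()
```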

### Updating Data

Running dynamic queries, where part of the SQL statement comes from a variable, carries some risks you should be aware of, most notably SQL injection (SQLi) attacks. There are several strategies for avoiding SQLi attacks; the one we will focus on in this class is using **parameterized queries**.

Parameterized queries let you put a placeholder (`?`) in your SQL statement and pass the desired value separately as a parameter. sqlite3 then binds the value to the placeholder, so the value is treated as data rather than as part of the SQL command.

{{% notice blue Example "rocket" %}}
```python
# Values to use in the update
update_release_year = 1997 # New release year
movie_to_update = 'Good Will Hunting'
# Execute an UPDATE statement using ? placeholders, passing in the values as a list literal
cur.execute("UPDATE movies SET release = ? WHERE title = ?", [update_release_year, movie_to_update])
```
{{% /notice %}}
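To see why placeholders matter, the sketch below also tries a hypothetical hostile input. Because the value is bound to `?`, SQLite compares it as plain text, so it matches no title; pasting the same string into the SQL with an f-string would instead turn the `WHERE` clause into one that matches every row. `cur.rowcount` reports how many rows an `UPDATE` changed. The table contents are placeholders, and the starting 1996 release year is deliberately wrong so the update has work to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE movies (title TEXT, release INTEGER)")
# Placeholder rows; 1996 is deliberately wrong so the update below changes something
cur.executemany("INSERT INTO movies VALUES (?, ?)", [("Good Will Hunting", 1996), ("Movie A", 2010)])

# Safe update: the values are bound to the ? placeholders, never spliced into the SQL text
cur.execute("UPDATE movies SET release = ? WHERE title = ?", (1997, "Good Will Hunting"))
print(cur.rowcount)  # 1 row changed

# Hypothetical hostile input: bound as a parameter, it is just an odd title that matches nothing
hostile_title = "x' OR '1'='1"
cur.execute("UPDATE movies SET release = ? WHERE title = ?", (0, hostile_title))
print(cur.rowcount)  # 0 rows changed

conn.commit()  # save the changes
rows = cur.execute("SELECT * FROM movies ORDER BY title").fetchall()
print(rows)
conn.close()
```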

### Deleting Data

As with updating data, use parameterized queries when deleting data; it is the best and safest practice!

{{% notice blue Example "rocket" %}}
```python
movie_to_delete = 'Inception' # Too many sci-fi movies!
# Execute a DELETE statement using the ? placeholder, passing in the variable as a list literal
cur.execute("DELETE FROM movies WHERE title = ?", [movie_to_delete])
```
{{% /notice %}}
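Even a single value must be passed inside a sequence: a one-item list such as `[movie_to_delete]` works, and so does the one-element tuple `(movie_to_delete,)`, but passing the bare string does not work as intended, because sqlite3 treats a string as a sequence of individual characters. The sketch below, with placeholder data, deletes one row, checks `rowcount`, and commits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE movies (title TEXT, genre TEXT)")
cur.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Inception", "Science Fiction"), ("Dune", "Science Fiction"), ("Movie A", "Drama")],
)

movie_to_delete = "Inception"
# Note the trailing comma: (movie_to_delete,) is a one-element tuple
cur.execute("DELETE FROM movies WHERE title = ?", (movie_to_delete,))
print(cur.rowcount)  # 1

conn.commit()  # make the deletion permanent
remaining = [row[0] for row in cur.execute("SELECT title FROM movies ORDER BY title")]
print(remaining)  # ['Dune', 'Movie A']
conn.close()
```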

## Check Your Understanding

{{% notice green Question "rocket" %}}
What type of database is SQLite?

<!-- Solution: disk-based database, does not require its own server. Stored inside of a file on your machine -->
{{% /notice %}}

{{% notice green Question "rocket" %}}
What is the primary reason for creating a cursor object?

<!-- Solution: Executing commands inside of the datastore -->
{{% /notice %}}