Merge branch 'main' into a3-9-to-5
gildedgardenia committed May 29, 2024
2 parents dcb4ac1 + 20ddebf commit c93374d
Showing 29 changed files with 784 additions and 8 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/hugo.yml
Original file line number Diff line number Diff line change
@@ -39,12 +39,12 @@ jobs:
- name: Install Dart Sass Embedded
run: sudo snap install dart-sass-embedded
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
submodules: recursive
- name: Setup Pages
id: pages
uses: actions/configure-pages@v3
uses: actions/configure-pages@v4
- name: Install Node.js dependencies
run: "[[ -f package-lock.json || -f npm-shrinkwrap.json ]] && npm ci || true"
- name: Build with Hugo
@@ -57,7 +57,7 @@ jobs:
--minify \
--baseURL "https://education.launchcode.org/data-analysis-curriculum"
- name: Upload artifact
uses: actions/upload-pages-artifact@v2
uses: actions/upload-pages-artifact@v3
with:
path: ./public

@@ -71,4 +71,4 @@ jobs:
steps:
- name: Deploy to GitHub Pages
id: deployment
uses: actions/deploy-pages@v2
uses: actions/deploy-pages@v4
33 changes: 33 additions & 0 deletions content/assignments/assignment4/checkpoint-4/_index.md
@@ -0,0 +1,33 @@
+++
title = "Checkpoint 4: Tableau Story"
date = 2023-05-25T12:55:09-05:00
draft = false
weight = 4
+++

## Before You Start

Before you begin, check whether you have received any feedback on Checkpoints 2 and 3. This
feedback could influence the direction of your work on Checkpoint 4. If you want to change anything
you did in earlier checkpoints, you do not have to re-submit them
unless your IA or instructor asks you to. Simply add any updated work and notes to
the current checkpoint.

## Getting Started

For this checkpoint, you will need to manipulate your data and produce a Tableau story that shows off skills from class, such as filtering and table calculations. You may find yourself wanting to use pandas and Jupyter notebooks for data manipulation. If you do, make sure to add code comments explaining your thought process and push your work up to GitHub.

No matter what visualizations you add to your Tableau story, every caption should explain your thought process for that visualization. The first caption should include a link to your dataset, and the final story point should include links to any supporting materials, such as the GitHub repository if you used a Jupyter notebook for this checkpoint.

## Examples

{{% notice blue Note "rocket" %}}
Checkpoint 4 examples can be found here: [Checkpoint 4 Examples](https://github.com/LaunchCodeEducation/finalProjectDAExamples/tree/main/Checkpoint%204).
{{% /notice %}}

## Submitting Your Work

When finished, paste the link to your Tableau story into the submission box in Canvas for Graded
Assignment #4: Checkpoint 4 and click *Submit*.

[Back to Final Project Overview]({{% relref "./../" %}})
@@ -44,7 +44,7 @@ import pandas as pd
# Create a pandas DataFrame by providing a list of lists
movie_list_of_lists = pd.DataFrame([["Interstellar", "Pride and Prejudice", "Inception", "Barbie"],["Marley & Me", "Two Weeks Notice", "The Guardian", "Bridesmaids"]])

# Create a pandas Series from a pre-existing list of lists
# Create a pandas DataFrame from a pre-existing list of lists
movies_dataframe_data = [["Interstellar", "Pride and Prejudice", "Inception", "Barbie"],["Marley & Me", "The Proposal", "The Guardian", "Bridesmaids"]]

dataframe_from_existing_list = pd.DataFrame(movies_dataframe_data)
@@ -22,6 +22,14 @@ The pandas library is incredibly powerful and was built specifically for data an
We will use pandas to create, manipulate, and view data structures based on certain conditions. We will also cover some of the most common functions used when exploring data with pandas that we can use to our advantage during the exploration process.

{{% notice blue Note "rocket" %}}
To install pandas, you will need to run the following command:

```bash
pip install pandas
```

If the above command does not work, you may need to specify `pip3` in the command.

Once pandas is installed, it can be imported into your workspace in the following way:

```python
14 changes: 14 additions & 0 deletions content/how-programs-work/reading/data-analysis-projects/index.md
@@ -0,0 +1,14 @@
+++
title = "GitHub Repository: Data Analysis Projects"
date = 2024-01-08T10:24:35-06:00
draft = false
weight = 7
+++

Throughout this course, you will be utilizing a GitHub repository that holds starter code for exercises, studios, and in some lessons, example code from readings.

Fork the following repository and clone it to your machine: [LaunchCode Education: Data Analysis Projects](https://github.com/launchcodeeducation/data-analysis-projects)

{{% notice blue Note "rocket" %}}
You will start using the `data-analysis-projects` repository in the next chapter. Do not move on to the next page without it!
{{% /notice %}}
26 changes: 26 additions & 0 deletions content/python-pandas-databases/_index.md
@@ -0,0 +1,26 @@
+++
pre = "<b>23. </b>"
chapter = true
title = "Databases with Python and pandas"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 23
+++

## Learning Objectives
Upon completing all the content in this chapter, you should be able to do the following:
1. Establish a connection to a sqlite3 database using Python.
1. Create a cursor object to interact with the database.
1. Create a pandas DataFrame using data from a sqlite3 database.
1. Add data from a pandas DataFrame into a sqlite3 database.

## Key Terminology

### Databases with Python
1. sqlite3
1. Cursor object
1. parameterized queries

## Content Links

{{% children %}}
16 changes: 16 additions & 0 deletions content/python-pandas-databases/exercises/_index.md
@@ -0,0 +1,16 @@
+++
title = "Exercises: Working with Databases in Python"
date = 2021-10-01T09:28:27-05:00
draft = false
weight = 2
+++

## Getting Started

Open the `data-analysis-projects/databases-python-pandas/studio/databases-and-py.ipynb` file in Jupyter Notebook and begin working through the exercises!

## Submitting Your Work

When finished, make sure to push your changes up to GitHub.

Copy the link to your GitHub repository and paste it into the submission box in Canvas for **Exercises: Working with Databases in Python** and click *Submit*.
12 changes: 12 additions & 0 deletions content/python-pandas-databases/next-steps.md
@@ -0,0 +1,12 @@
+++
title = "Next Steps"
date = 2021-10-01T09:28:27-05:00
draft = false
weight = 4
+++

You are now ready to dive into [Tableau](https://www.tableau.com/), the first major visualization tool that we will use. If you would like to further explore content related to interacting with databases using Python and pandas, you can find some of our favorite resources below:

1. [GeeksforGeeks: Working with Databases using pandas](https://www.geeksforgeeks.org/working-with-database-using-pandas/)
1. [Tutorialspoint: Python - Databases and SQL](https://www.tutorialspoint.com/python_network_programming/python_databases_and_sql.htm)
1. [DigitalOcean: How To Use the sqlite3 Module in Python3](https://www.digitalocean.com/community/tutorials/how-to-use-the-sqlite3-module-in-python-3)
10 changes: 10 additions & 0 deletions content/python-pandas-databases/reading/_index.md
@@ -0,0 +1,10 @@
+++
title = "Reading"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 1
+++

## Reading Content

{{% children %}}
76 changes: 76 additions & 0 deletions content/python-pandas-databases/reading/pandas-databases/_index.md
@@ -0,0 +1,76 @@
+++
title = "Databases with pandas"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 2
+++

In addition to all the great things pandas is capable of, the library also makes it possible to inject data stored elsewhere into a pandas DataFrame or Series. This lesson will walk through the process of creating a pandas DataFrame from an existing table within a SQLite datastore.

This lesson uses a `sqlite3` database to demonstrate how to interact with a database from a separate library (pandas). Since we have already covered how to manipulate data with pandas in previous lessons, we will instead focus on the following:
1. Reading data from the database
1. Storing the data inside of a pandas DataFrame
1. Creating a new table inside of the database
   - Adding the DataFrame data into the new table

{{% notice blue Note "rocket" %}}
The following examples can be found within the `data-analysis-projects/databases-python-pandas/pandas-db-walkthrough.ipynb` file.
{{% /notice %}}

## Create a DataFrame

{{% notice blue Example "rocket" %}}
```python
import sqlite3
import pandas as pd

# Create SQLite connection to Movies.db file
movies_db = sqlite3.connect('Movies.db')

# Use the pandas read_sql_query function to return a pandas DataFrame
df = pd.read_sql_query('Select * from movies;', movies_db)

# Use .head() function to return first five rows (there are only 5 rows currently)
df.head()
```
{{% /notice %}}

{{% notice blue Note "rocket" %}}
The pandas `read_sql_query` function in the above example runs a SQL query and returns the result as a DataFrame. You can find its documentation here: [pandas.read_sql_query API reference](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html)
{{% /notice %}}

## Create New Table from DataFrame

After exploring, cleaning, or manipulating data with pandas, you can add that data back into your database. In the scenario below we will add a new movie to an existing DataFrame and then store the DataFrame inside of a new table within the SQLite database.

{{% notice blue Example "rocket" %}}
We will first start by adding a row to our existing DataFrame:

```python
new_movie = pd.DataFrame([{'title':'Dune', 'genre':'Science Fiction', 'release':2021, 'rt_score': 83}])
df = pd.concat([df, new_movie], ignore_index=True)
```

It was not necessary to update our DataFrame to add a new table to the database, but it will help visually when reading data to show that it was populated into a new table correctly.

```python {linenos=table}
# Write the DataFrame to the database as a table named df; if the table already exists, replace it
df.to_sql('df', movies_db, if_exists="replace")
# Create a new table called new_movie_table from the df table that to_sql just wrote
movies_db.execute(
    """
    create table new_movie_table as
    select * from df
    """
)
```

The pandas `DataFrame.to_sql` function documentation in the above code block can be found here: [pandas.DataFrame.to_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html)

```python
# Read data from newly created table, passing in existing movies_db connection as parameter
new_movies_df = pd.read_sql_query('Select * from new_movie_table;', movies_db)
# Read first 6 rows
new_movies_df.head(6)
```
{{% /notice %}}
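
The round trip described in this lesson can also be sketched in one self-contained block. This is a minimal illustration, not the walkthrough notebook's code: it assumes an in-memory database (`:memory:`) in place of `Movies.db`, and the names `scratch_db`, `movies_copy`, and `round_trip_df` are hypothetical. It also passes `index=False` to `to_sql` so that the DataFrame's row index is not written as an extra column.

```python
import sqlite3
import pandas as pd

# In-memory database used only for this sketch (stand-in for Movies.db)
scratch_db = sqlite3.connect(":memory:")

# A one-row DataFrame using the same columns as the Dune example above
movies_df = pd.DataFrame(
    [{"title": "Dune", "genre": "Science Fiction", "release": 2021, "rt_score": 83}]
)

# Write the DataFrame to a table; index=False leaves the row index out of the table
movies_df.to_sql("movies_copy", scratch_db, if_exists="replace", index=False)

# Read the table back into a new DataFrame
round_trip_df = pd.read_sql_query("SELECT * FROM movies_copy;", scratch_db)
print(round_trip_df)

scratch_db.close()
```

Without `index=False`, the table gains an additional `index` column, which is why it can be worth deciding up front whether the index carries meaning you want to keep.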
136 changes: 136 additions & 0 deletions content/python-pandas-databases/reading/python-databases/_index.md
@@ -0,0 +1,136 @@
+++
title = "Databases with Python"
date = 2024-04-17T10:00:24-05:00
draft = false
weight = 1
+++

Working within a database command line interface can be cumbersome, especially when you need to execute multiple commands. Because of this, analysts often prefer to use a graphical user interface (GUI) or to leverage programming languages like Python and supporting libraries to interact with databases. We will use Python and the **sqlite3** library to complete the following:
1. Create a new SQLite database
1. Add a table
1. Perform CRUD operations on the table

While you can accomplish more than the above using Python and pandas, such as performing joins, doing so is not always best practice. Database engines are built and optimized to perform joins extremely well. It is always important to know what you will be doing with your data before acting.

{{% notice blue Note "rocket" %}}
The examples below are also available in the `data-analysis-projects/databases-python-pandas/python-db-walkthrough.ipynb` file.
{{% /notice %}}

## sqlite3 with Python

[sqlite3](https://docs.python.org/3/library/sqlite3.html) works with Python by allowing you to establish a connection to a database file located on your machine. You can then reference the connection variable to begin executing SQL commands.

The basic syntax is as follows:

```python
import sqlite3

# If the 'Movies.db' database does not already exist, sqlite3 will create one!
movies_db = sqlite3.connect('Movies.db') # connect to database
```

{{% notice blue Note "rocket" %}}
If we were to print the `movies_db` variable, we would see the following output:

```python
<sqlite3.Connection object at 0x7334db1d3940> # the 0x7334db1d3940 portion will vary
```

This shows that a `sqlite3.Connection` object was created and can now be referenced using the `movies_db` variable.
{{% /notice %}}
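
As a minimal sketch of the connection step, the snippet below connects, inspects the connection object, and closes it again. It assumes the special `:memory:` name in place of `Movies.db` so that nothing is written to disk; the variable names `scratch_db` and `connection_type` are hypothetical and do not come from the walkthrough notebook.

```python
import sqlite3

# ":memory:" asks sqlite3 for a temporary database that lives only in RAM
# (an illustrative stand-in for 'Movies.db')
scratch_db = sqlite3.connect(":memory:")

# The connection is an ordinary Python object that we can inspect
connection_type = type(scratch_db).__name__
print(connection_type)

# Close the connection once you are finished with it
scratch_db.close()
```

An in-memory database disappears when the connection closes, which makes it handy for experimenting without touching a real file.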

## Cursor Objects

Now that we have established a connection to the database we need a way to execute commands. The **cursor object** is a database cursor which allows us to do so.

We can create a new cursor object by calling the connection's `cursor()` method and storing the result in a variable:

```python
# variable named "cur" that holds a cursor created from the connection object:
cur = movies_db.cursor()
```

The basic syntax for executing a command with the cursor object is as follows:

```python
cur.execute("SQL statement")
```

## Creating a table

```python
cur.execute("CREATE TABLE table_name (column DATA TYPE, column DATA TYPE, etc..)")
```

{{% notice blue Note "rocket" %}}
You can find a list of SQLite data types here: [Data Types in SQLite](https://sqlite.org/datatype3.html).
{{% /notice %}}
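
To make the template concrete, here is a minimal sketch that creates a `movies` table and then confirms it exists. The column names mirror the ones used elsewhere in this chapter (`title`, `genre`, `release`, `rt_score`), but the column types and the in-memory database are assumptions made for illustration.

```python
import sqlite3

# In-memory database used only for this sketch
movies_db = sqlite3.connect(":memory:")
cur = movies_db.cursor()

# Create a table with four columns (the types here are assumed)
cur.execute(
    "CREATE TABLE movies (title TEXT, genre TEXT, release INTEGER, rt_score INTEGER)"
)

# sqlite_master is SQLite's built-in catalog of tables and other schema objects
table_names = [
    row[0]
    for row in cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(table_names)

movies_db.close()
```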

### Insert Table Values

```python
cur.execute("INSERT INTO table_name VALUES ('value-one', 'value-two', etc..)")
```
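
A short sketch of the insert step follows. It assumes an in-memory database and the same assumed `movies` column types as before; the "Placeholder Movie" rows are made-up values for illustration only. It also calls `commit()` on the connection, which is what saves inserted rows when you are working with a database file.

```python
import sqlite3

# In-memory database used only for this sketch
movies_db = sqlite3.connect(":memory:")
cur = movies_db.cursor()
cur.execute(
    "CREATE TABLE movies (title TEXT, genre TEXT, release INTEGER, rt_score INTEGER)"
)

# Insert a single row by listing one value per column
cur.execute("INSERT INTO movies VALUES ('Dune', 'Science Fiction', 2021, 83)")

# Insert several rows at once; each ? is filled in from the matching tuple item
# (these titles and numbers are made-up placeholders)
placeholder_rows = [
    ("Placeholder Movie A", "Comedy", 2001, 50),
    ("Placeholder Movie B", "Drama", 2002, 60),
]
cur.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", placeholder_rows)

# Save the inserted rows
movies_db.commit()

row_count = cur.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
print(row_count)

movies_db.close()
```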

### Reading Data

There are a couple of strategies that you can use to read data from your database. Since the cursor object is itself an iterator, you can iterate over it to fetch data.

{{% notice blue Example "rocket" %}}

```python
# For loop to iterate over cursor object
for row in cur.execute("SELECT column FROM table_name"):
    print(row)
```

The above for loop prints every row of the column named in the `SELECT` statement. You could also use the `*` wildcard to return all columns for every row in the table.
{{% /notice %}}

You can also use the `fetchall()` function to read data from the database like so:

```python
cur.execute("SELECT * FROM table_name").fetchall()
```
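
The reading strategies above can be compared side by side. This is a minimal sketch under stated assumptions: an in-memory database, a two-column `movies` table, and made-up placeholder rows. It also uses `fetchone()`, the cursor method that returns only the next row.

```python
import sqlite3

# In-memory database with two made-up rows, used only for this sketch
movies_db = sqlite3.connect(":memory:")
cur = movies_db.cursor()
cur.execute("CREATE TABLE movies (title TEXT, release INTEGER)")
cur.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Placeholder Movie A", 2001), ("Placeholder Movie B", 2002)],
)

# fetchall() returns every remaining row as a list of tuples
all_rows = cur.execute("SELECT * FROM movies ORDER BY release").fetchall()

# fetchone() returns only the next row, as a single tuple
first_row = cur.execute("SELECT title FROM movies ORDER BY release").fetchone()

print(all_rows)
print(first_row)

movies_db.close()
```

Each row comes back as a tuple, even when only one column is selected, so a single value is read with an index such as `first_row[0]`.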

### Updating Data

When running dynamic queries against a database, there are some risks to be aware of, specifically SQL injection (SQLi) attacks. While there are multiple strategies for avoiding SQLi attacks, the one we will focus on in this class is using **parameterized queries**.

Parameterized queries allow you to put a placeholder (`?`) in your SQL statement and pass in the desired value separately as a parameter, so the value is always treated as data rather than as part of the SQL command.

{{% notice blue Example "rocket" %}}
```python
# Desired value
update_release_year = 1997 # Value that needs to be updated
movie_to_update = 'Good Will Hunting'
# Execute an UPDATE statement using the ? placeholder, passing in the update variables as a list literal
cur.execute("UPDATE movies SET release = ? WHERE title = ?", [update_release_year, movie_to_update])
```
{{% /notice %}}
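
To see why the placeholder matters, the sketch below runs the same `UPDATE` twice against a hostile input: once built with string formatting and once with a `?` placeholder. It assumes an in-memory database with made-up rows, and the `user_input` value is a contrived example of an injection attempt rather than anything from the course data.

```python
import sqlite3

# In-memory database with two made-up rows, used only for this sketch
movies_db = sqlite3.connect(":memory:")
cur = movies_db.cursor()
cur.execute("CREATE TABLE movies (title TEXT, release INTEGER)")
cur.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Placeholder Movie A", 2001), ("Placeholder Movie B", 2002)],
)
movies_db.commit()

# A contrived hostile "title" that tries to change the meaning of the query
user_input = "nothing' OR '1'='1"

# Unsafe: the input is pasted into the SQL text, so the OR clause matches every row
unsafe_sql = f"UPDATE movies SET release = 1900 WHERE title = '{user_input}'"
unsafe_count = cur.execute(unsafe_sql).rowcount
movies_db.rollback()  # undo the damage before trying the safe version

# Safe: the input is passed as a parameter and compared as a plain string
safe_count = cur.execute(
    "UPDATE movies SET release = 1900 WHERE title = ?", [user_input]
).rowcount

print(unsafe_count, safe_count)

movies_db.close()
```

The formatted query modifies every row in the table, while the parameterized query modifies none because no movie has that literal title.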

### Deleting Data

As with updating data, we will want to use parameterized queries as a safe best practice!

{{% notice blue Example "rocket" %}}
```python
movie_to_delete = 'Inception' # Too many sci fi movies!
# Execute a DELETE statement using the ? placeholder, passing in the variable as a list literal
cur.execute("DELETE FROM movies WHERE title = ?", [movie_to_delete])
```
{{% /notice %}}

## Check Your Understanding

{{% notice green Question "rocket" %}}
What type of database is SQLite?

<!-- Solution: disk-based database, does not require its own server. Stored inside of a file on your machine -->
{{% /notice %}}

{{% notice green Question "rocket" %}}
What is the primary reason for creating a cursor object?

<!-- Solution: Executing commands inside of the datastore -->
{{% /notice %}}