Commit
s2t2 committed Oct 2, 2024
1 parent ddc92ce commit 3ae648c
Showing 5 changed files with 76 additions and 27 deletions.
20 changes: 14 additions & 6 deletions docs/notes/fetching-data/csv.qmd
@@ -11,7 +11,7 @@ execute:
# Fetching CSV Data


If the data we want to fetch is in CSV format, we can use the `pandas` package to fetch and process it.

Let's consider this example "students.csv" file we have hosted on the Internet:

@@ -30,7 +30,7 @@ student_id,final_grade
10,92.5
```

First we note the URL of where the data resides. Then we pass that as a parameter to the [`read_csv` function](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html){target=blank} from the `pandas` package, to issue an HTTP GET request:

```{python}
from pandas import read_csv
@@ -43,13 +43,21 @@ print(type(df))
df
```

The resulting data is a spreadsheet-like object, with rows and columns, called the [`pandas.DataFrame` datatype](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html){target=blank}.

To work with the column of grades, we can access them by specifying the name of the column, which in this case is `"final_grade"`:

```{python}
grades_column = df["final_grade"]
print(type(grades_column))
grades_column
```

The resulting column of grades is a list-like object called the [`pandas.Series` datatype](https://pandas.pydata.org/docs/reference/api/pandas.Series.html){target=blank}.
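To illustrate what "list-like" means in practice, here is a minimal sketch using a small hypothetical `Series` (constructed directly, separate from the gradebook data):

```python
from pandas import Series

# a small hypothetical Series, for illustration only:
example_series = Series([92.5, 87.0, 75.5])

# like a list, a Series supports length, indexing, and conversion:
print(len(example_series))      # number of items
print(example_series[0])        # first item
print(example_series.tolist())  # back to a plain Python list
```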

Calculating the average grade (using series aggregation methods):

```{python}
print(grades_column.mean())
print(grades_column.median())
```
28 changes: 19 additions & 9 deletions docs/notes/fetching-data/html-web-scraping.qmd
@@ -12,9 +12,13 @@ execute:

# Fetching HTML Data (i.e. "Web Scraping")

If the data we want to fetch is in HTML format, like most web pages are, we can use the `requests` package to fetch it, and the [`beautifulsoup4` package](https://www.crummy.com/software/BeautifulSoup/bs4/doc/){target=blank} to process it.

Before moving on to process HTML formatted data, it will be important to first review HTML format, using these resources from W3 Schools:

+ [Basic HTML](https://www.w3schools.com/html/html_basic.asp){target=blank}
+ [HTML Lists](https://www.w3schools.com/html/html_lists.asp){target=blank}
+ [HTML Tables](https://www.w3schools.com/html/html_tables.asp){target=blank}


## HTML Lists
@@ -63,7 +67,7 @@ response = requests.get(request_url)
print(type(response))
```

Then we pass the response text (an HTML formatted string) to the [`BeautifulSoup` class](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#bs4.BeautifulSoup){target=blank} constructor.

```{python}
from bs4 import BeautifulSoup
@@ -72,29 +76,31 @@ soup = BeautifulSoup(response.text)
type(soup)
```

## Finding Elements

The resulting soup object is able to intelligently process the data. We can use the soup's finder methods to search for specific data elements, called "tags", based on their names or other attributes. If we want to return the first matching element, we use the `find` method, whereas if we want to get all matching elements, we use the `find_all` method.
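As a quick sketch of this distinction, consider a minimal hypothetical HTML string (not the example page used below):

```python
from bs4 import BeautifulSoup

# a minimal hypothetical HTML string, for illustration only:
html = "<ul><li>Vanilla</li><li>Chocolate</li><li>Strawberry</li></ul>"
soup = BeautifulSoup(html, "html.parser")

first_li = soup.find("li")    # the first matching element only
all_li = soup.find_all("li")  # every matching element

print(first_li.text)  # text of the first list item
print(len(all_li))    # number of matching list items
```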

### Finding Elements by Identifier

Since the example HTML contains an ordered list (`ol` element) with a unique identifier of "my-fav-flavors", we can use the following code to access it:


```{python}
# get first <ol> element that has a given identifier of "my-fav-flavors":
ul = soup.find("ol", id="my-fav-flavors")
print(type(ul))
ul
```

Getting all child `<li>` elements from that list:

```{python}
flavors = ul.find_all("li")
print(type(flavors))
print(len(flavors))
flavors
```


Looping through the items:

```{python}
for li in flavors:
@@ -105,6 +111,8 @@

### Finding Elements by Class

In that first example, we accessed an item based on its unique identifier, but in this example we will access a number of items by their class. In HTML, only one element can have a given `id`, but many elements can be members of the same `class`.
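This id-versus-class distinction can be sketched with a minimal hypothetical HTML string (separate from the example page used in this section):

```python
from bs4 import BeautifulSoup

# a minimal hypothetical HTML string, for illustration only:
html = """
<p id="intro">About Me</p>
<ul>
    <li class="skill">Python</li>
    <li class="skill">SQL</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# an id matches at most one element:
intro = soup.find("p", id="intro")
print(intro.text)

# a class can match many elements (note the trailing underscore in "class_",
# which avoids clashing with Python's reserved "class" keyword):
skills = soup.find_all("li", class_="skill")
print(len(skills))
```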

Since the example HTML contains an unordered list (`ul` element) of skills, where each list item shares the same class of "skill", we can use the following code to access the list items directly:

```{python}
@@ -115,6 +123,8 @@ print(len(skills))
skills
```

Looping through the results:

```{python}
for li in skills:
print("-----------")
@@ -187,14 +197,14 @@ Since the example HTML contains a `table` element with a unique identifier of "products", we can use the following code to access it:


```{python}
# get first <table> element that has a given identifier of "products":
table = soup.find("table", id="products")
print(type(table))
table
```

Getting all child rows (`tr` elements) from that table:

```{python}
rows = table.find_all("tr")
print(type(rows))
print(len(rows))
20 changes: 16 additions & 4 deletions docs/notes/fetching-data/json.qmd
@@ -10,7 +10,7 @@ execute:

# Fetching JSON Data

If the data we want to fetch is in JSON format, we can use the [`requests` package](https://requests.readthedocs.io/en/latest/){target=blank} to fetch and process it.

Let's consider this example "students.json" file we have hosted on the Internet:

@@ -61,7 +61,7 @@ print(type(response.text))
response.text
```

The final step is to convert this JSON-formatted string to actual Python data. We can do this using the `json` method of the response object, or by leveraging the `loads` function from the [`json` module](https://docs.python.org/3/library/json.html){target=blank}:


```{python}
@@ -78,17 +78,29 @@ print(type(data))
data
```

This data happens to be dictionary-like at the top level. However, be aware that when we fetch JSON, it can be either list-like or dictionary-like at the top level, so you must observe the structure of your particular data before processing it further.
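As a quick illustration of this distinction, here is a minimal sketch using two small hypothetical JSON strings (parsed with the `loads` function mentioned above):

```python
import json

# two minimal hypothetical JSON strings, for illustration only:
dict_like = '{"students": [1, 2, 3]}'
list_like = '[1, 2, 3]'

# the datatype of the parsed result depends on the top-level structure:
print(type(json.loads(dict_like)))  # a dictionary
print(type(json.loads(list_like)))  # a list
```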

```{python}
students = data["students"]
print(type(students))
len(students)
```

Looping through the items:

```{python}
for student in students:
print(student["studentId"], student["finalGrade"])
```


Calculating the average grade:

```{python}
from statistics import mean, median
grades = [student["finalGrade"] for student in students]
print(mean(grades))
print(median(grades))
```
29 changes: 22 additions & 7 deletions docs/notes/fetching-data/xml.qmd
@@ -10,7 +10,7 @@ execute:

# Fetching XML Data

If the data we want to fetch is in XML format, including in an RSS feed, we can use the `requests` package to fetch it, and the [`beautifulsoup4` package](https://www.crummy.com/software/BeautifulSoup/bs4/doc/){target=blank} to process it.

Let's consider this example "students.xml" file we have hosted on the Internet:

@@ -69,13 +69,13 @@ First we note the URL of where the data resides. Then we pass that as a parameter
import requests
# the URL of some XML data we stored online:
request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-software-dev-python-book/main/docs/data/gradebook.xml"
response = requests.get(request_url)
print(type(response))
```

Then we pass the response text (an HTML or XML formatted string) to the [`BeautifulSoup` class](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#bs4.BeautifulSoup){target=blank} constructor.

```{python}
from bs4 import BeautifulSoup
@@ -84,30 +84,45 @@ soup = BeautifulSoup(response.text)
type(soup)
```

## Finding Elements

The resulting soup object is able to intelligently process the data. We can use the soup's finder methods to search for specific data elements, called "tags", based on their names or other attributes. If we want to return the first matching element, we use the `find` method, whereas if we want to get all matching elements, we use the `find_all` method.
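These finder methods work on XML tags the same way they work on HTML elements. Here is a minimal sketch using a small hypothetical XML string with a structure similar to the gradebook data:

```python
from bs4 import BeautifulSoup

# a minimal hypothetical XML string, for illustration only:
xml = """
<gradebook>
    <student><studentId>1</studentId><finalGrade>76.7</finalGrade></student>
    <student><studentId>2</studentId><finalGrade>85.1</finalGrade></student>
</gradebook>
"""
soup = BeautifulSoup(xml, "html.parser")  # note: this parser lowercases tag names

first = soup.find("student")         # the first matching tag only
students = soup.find_all("student")  # every matching tag

print(first.finalgrade.text)  # grade of the first student (lowercased tag name)
print(len(students))          # number of student tags
```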

For example, finding all the student tags in this structure:

```{python}
students = soup.find_all("student")
print(type(students))
print(len(students))
```

Examining the first item for reference:

```{python}
print(type(students[0]))
students[0]
```

Looping through all the items:

```{python}
for student in students:
print("-----------")
print(type(student))
student_id = student.studentid.text
final_grade = student.finalgrade.text
print(student_id, final_grade)
```

Calculating the average grade:

```{python}
from statistics import mean, median
grades = [float(student.finalgrade.text) for student in students]
print(mean(grades))
print(median(grades))
```
6 changes: 5 additions & 1 deletion docs/notes/python-lang/control-flow/unit-testing.qmd
@@ -52,10 +52,14 @@ assert enlarge(9.0) == 900

The way the `assert` keyword works is that if expectations are met, it won't yell or raise an error. So if your test code doesn't complain, it is considered to be "passing".

```{python}
assert 2 + 2 == 4 # PASSING
```

But if the expectation is not met, it will raise an `AssertionError`.

```{python}
#assert 2 + 2 == 5 # FAILING
#> AssertionError
```
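When a file contains several related assertions, one common pattern is to group them into a test function. Here is a hypothetical sketch, re-defining an `enlarge` function like the one used earlier in this chapter:

```python
# a hypothetical function like the `enlarge` function from earlier:
def enlarge(n):
    return n * 100

# grouping related assertions into a single test function:
def test_enlarge():
    assert enlarge(3) == 300
    assert enlarge(9.0) == 900

test_enlarge()  # raises an AssertionError only if an expectation is not met
print("All tests passed!")
```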

Expand Down
