Commit
s2t2 committed Oct 2, 2024
1 parent ddc92ce commit 3ae648c
Showing 5 changed files with 76 additions and 27 deletions.
20 changes: 14 additions & 6 deletions docs/notes/fetching-data/csv.qmd
@@ -11,7 +11,7 @@ execute:
# Fetching CSV Data


If the data we want to fetch is in CSV format, we can use the `pandas` package to fetch and process it.

Let's consider this example "students.csv" file we have hosted on the Internet:

@@ -30,7 +30,7 @@ student_id,final_grade
10,92.5
```

First we note the URL of where the data resides. Then we pass that as a parameter to the [`read_csv` function](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html){target=blank} from the `pandas` package, to issue an HTTP GET request:

```{python}
from pandas import read_csv
@@ -43,13 +43,21 @@ print(type(df))
df
```

The resulting data is a spreadsheet-like object, with rows and columns, called the [`pandas.DataFrame` datatype](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html){target=blank}.

To work with the column of grades, we can access them by specifying the name of the column, which in this case is `"final_grade"`:

```{python}
grades_column = df["final_grade"]
print(type(grades_column))
grades_column
```

The resulting column of grades is a list-like object called the [`pandas.Series` datatype](https://pandas.pydata.org/docs/reference/api/pandas.Series.html){target=blank}.
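To illustrate what "list-like" means in practice, here is a minimal sketch using a small hypothetical `Series` (constructed directly, separate from the gradebook data):

```python
from pandas import Series

# a small hypothetical Series, for illustration only:
example_series = Series([92.5, 87.0, 75.5])

# like a list, a Series supports length, indexing, and conversion:
print(len(example_series))      # number of items
print(example_series[0])        # first item
print(example_series.tolist())  # back to a plain Python list
```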

Calculating the average grade (using series aggregation methods):

```{python}
print(grades_column.mean())
print(grades_column.median())
```
28 changes: 19 additions & 9 deletions docs/notes/fetching-data/html-web-scraping.qmd
@@ -12,9 +12,13 @@ execute:

# Fetching HTML Data (i.e. "Web Scraping")

If the data we want to fetch is in HTML format, like most web pages are, we can use the `requests` package to fetch it, and the [`beautifulsoup4` package](https://www.crummy.com/software/BeautifulSoup/bs4/doc/){target=blank} to process it.

Before moving on to process HTML formatted data, it will be important to first review HTML format, using these resources from W3 Schools:

+ [Basic HTML](https://www.w3schools.com/html/html_basic.asp){target=blank}
+ [HTML Lists](https://www.w3schools.com/html/html_lists.asp){target=blank}
+ [HTML Tables](https://www.w3schools.com/html/html_tables.asp){target=blank}


## HTML Lists
@@ -63,7 +67,7 @@ response = requests.get(request_url)
print(type(response))
```

Then we pass the response text (an HTML formatted string) to the [`BeautifulSoup` class](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#bs4.BeautifulSoup){target=blank} constructor.

```{python}
from bs4 import BeautifulSoup
@@ -72,29 +76,31 @@ soup = BeautifulSoup(response.text)
type(soup)
```

## Finding Elements

The resulting soup object is able to intelligently process the data. We can use the soup's finder methods to search for specific data elements, called "tags", based on their names or other attributes. If we want to return the first matching element, we use the `find` method, whereas if we want to get all matching elements, we use the `find_all` method.
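As a quick sketch of this distinction, consider a minimal hypothetical HTML string (not the example page used below):

```python
from bs4 import BeautifulSoup

# a minimal hypothetical HTML string, for illustration only:
html = "<ul><li>Vanilla</li><li>Chocolate</li><li>Strawberry</li></ul>"
soup = BeautifulSoup(html, "html.parser")

first_li = soup.find("li")    # the first matching element only
all_li = soup.find_all("li")  # every matching element

print(first_li.text)  # text of the first list item
print(len(all_li))    # number of matching list items
```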

### Finding Elements by Identifier

Since the example HTML contains an ordered list (`ol` element) with a unique identifier of "my-fav-flavors", we can use the following code to access it:


```{python}
# get first <ol> element that has a given identifier of "my-fav-flavors":
ul = soup.find("ol", id="my-fav-flavors")
print(type(ul))
ul
```

Getting all child `<li>` elements from that list:

```{python}
flavors = ul.find_all("li")
print(type(flavors))
print(len(flavors))
flavors
```


Looping through the items:

```{python}
for li in flavors:
@@ -105,6 +111,8 @@

### Finding Elements by Class

In that first example, we accessed an item based on its unique identifier, but in this example we will access a number of items by their class. In HTML, only one element can have a given `id`, but many elements can be members of the same `class`.
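This id-versus-class distinction can be sketched with a minimal hypothetical HTML string (separate from the example page used in this section):

```python
from bs4 import BeautifulSoup

# a minimal hypothetical HTML string, for illustration only:
html = """
<p id="intro">About Me</p>
<ul>
    <li class="skill">Python</li>
    <li class="skill">SQL</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# an id matches at most one element:
intro = soup.find("p", id="intro")
print(intro.text)

# a class can match many elements (note the trailing underscore in "class_",
# which avoids clashing with Python's reserved "class" keyword):
skills = soup.find_all("li", class_="skill")
print(len(skills))
```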

Since the example HTML contains an unordered list (`ul` element) of skills, where each list item shares the same class of "skill", we can use the following code to access the list items directly:

```{python}
@@ -115,6 +123,8 @@ print(len(skills))
skills
```

Looping through the results:

```{python}
for li in skills:
print("-----------")
@@ -187,14 +197,14 @@ Since the example HTML contains a `table` element with a unique identifier of "products", we can use the following code to access it:


```{python}
# get first <table> element that has a given identifier of "products":
table = soup.find("table", id="products")
print(type(table))
table
```

Getting all child rows (`tr` elements) from that table:

```{python}
rows = table.find_all("tr")
print(type(rows))
print(len(rows))
20 changes: 16 additions & 4 deletions docs/notes/fetching-data/json.qmd
@@ -10,7 +10,7 @@ execute:

# Fetching JSON Data

If the data we want to fetch is in JSON format, we can use the [`requests` package](https://requests.readthedocs.io/en/latest/){target=blank} to fetch and process it.

Let's consider this example "students.json" file we have hosted on the Internet:

@@ -61,7 +61,7 @@ print(type(response.text))
response.text
```

The final step is to convert this JSON-formatted string to actual Python data. We can do this using the `json` method of the response object, or by leveraging the `loads` function from the [`json` module](https://docs.python.org/3/library/json.html){target=blank}:


```{python}
@@ -78,17 +78,29 @@ print(type(data))
data
```

This data happens to be dictionary-like at the top level. However, be aware that when we fetch JSON, it can be either list-like or dictionary-like at the top level, so you must observe the structure of your particular data before processing it further.
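As a quick illustration of this distinction, here is a minimal sketch using two small hypothetical JSON strings (parsed with the `loads` function mentioned above):

```python
import json

# two minimal hypothetical JSON strings, for illustration only:
dict_like = '{"students": [1, 2, 3]}'
list_like = '[1, 2, 3]'

# the datatype of the parsed result depends on the top-level structure:
print(type(json.loads(dict_like)))  # a dictionary
print(type(json.loads(list_like)))  # a list
```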

```{python}
students = data["students"]
print(type(students))
len(students)
```

Looping through the items:

```{python}
for student in students:
print(student["studentId"], student["finalGrade"])
```


Calculating the average grade:

```{python}
from statistics import mean, median
grades = [student["finalGrade"] for student in students]
print(mean(grades))
print(median(grades))
```
29 changes: 22 additions & 7 deletions docs/notes/fetching-data/xml.qmd
@@ -10,7 +10,7 @@ execute:

# Fetching XML Data

If the data we want to fetch is in XML format, including in an RSS feed, we can use the `requests` package to fetch it, and the [`beautifulsoup4` package](https://www.crummy.com/software/BeautifulSoup/bs4/doc/){target=blank} to process it.

Let's consider this example "students.xml" file we have hosted on the Internet:

@@ -69,13 +69,13 @@ First we note the URL of where the data resides. Then we pass that as a parameter
import requests
# the URL of some XML data we stored online:
request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-software-dev-python-book/main/docs/data/gradebook.xml"
response = requests.get(request_url)
print(type(response))
```

Then we pass the response text (an HTML or XML formatted string) to the [`BeautifulSoup` class](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#bs4.BeautifulSoup){target=blank} constructor.

```{python}
from bs4 import BeautifulSoup
@@ -84,30 +84,45 @@ soup = BeautifulSoup(response.text)
type(soup)
```

## Finding Elements

The resulting soup object is able to intelligently process the data. We can use the soup's finder methods to search for specific data elements, called "tags", based on their names or other attributes. If we want to return the first matching element, we use the `find` method, whereas if we want to get all matching elements, we use the `find_all` method.
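These finder methods work on XML tags the same way they work on HTML elements. Here is a minimal sketch using a small hypothetical XML string with a structure similar to the gradebook data:

```python
from bs4 import BeautifulSoup

# a minimal hypothetical XML string, for illustration only:
xml = """
<gradebook>
    <student><studentId>1</studentId><finalGrade>76.7</finalGrade></student>
    <student><studentId>2</studentId><finalGrade>85.1</finalGrade></student>
</gradebook>
"""
soup = BeautifulSoup(xml, "html.parser")  # note: this parser lowercases tag names

first = soup.find("student")         # the first matching tag only
students = soup.find_all("student")  # every matching tag

print(first.finalgrade.text)  # grade of the first student (lowercased tag name)
print(len(students))          # number of student tags
```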

For example, finding all the student tags in this structure:

```{python}
students = soup.find_all("student")
print(type(students))
print(len(students))
```

Examining the first item for reference:

```{python}
print(type(students[0]))
students[0]
```

Looping through all the items:

```{python}
for student in students:
print("-----------")
print(type(student))
student_id = student.studentid.text
final_grade = student.finalgrade.text
print(student_id, final_grade)
```

Calculating the average grade:

```{python}
from statistics import mean, median
grades = [float(student.finalgrade.text) for student in students]
print(mean(grades))
print(median(grades))
```
6 changes: 5 additions & 1 deletion docs/notes/python-lang/control-flow/unit-testing.qmd
@@ -52,10 +52,14 @@ assert enlarge(9.0) == 900

The way the `assert` keyword works is that if expectations are met, it won't yell or raise an error. So if your test code doesn't complain, it is considered to be "passing".

```{python}
assert 2 + 2 == 4 # PASSING
```

But if the expectation is not met, it will raise an `AssertionError`.

```{python}
#assert 2 + 2 == 5 # FAILING
#> AssertionError
```
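When a file contains several related assertions, one common pattern is to group them into a test function. Here is a hypothetical sketch, re-defining an `enlarge` function like the one used earlier in this chapter:

```python
# a hypothetical function like the `enlarge` function from earlier:
def enlarge(n):
    return n * 100

# grouping related assertions into a single test function:
def test_enlarge():
    assert enlarge(3) == 300
    assert enlarge(9.0) == 900

test_enlarge()  # raises an AssertionError only if an expectation is not met
print("All tests passed!")
```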

Expand Down
