Financial dataviz

prof-rossetti · Dec 27, 2024 · a346881 · a346881
1 parent b99e31b
commit a346881

Showing 5 changed files with 161 additions and 11 deletions.
diff --git a/docs/_quarto.yml b/docs/_quarto.yml
@@ -141,12 +141,12 @@ book:
       chapters:
         - href: notes/dataviz/overview.qmd
           text: "Data Visualization Overview"
-        #- href: notes/dataviz/trendlines.qmd
-        #  text: "Charts with Trendlines"
+        - href: notes/dataviz/trendlines.qmd
+          text: "Charts with Trendlines"
         #- href: notes/dataviz/multiple-objects.qmd
         #  text: "Charts with Multiple Objects"
-        #- href: notes/dataviz/candlestick-charts.qmd
-        #  text: "Candlestick Charts"
+        - href: notes/dataviz/candlesticks.qmd
+          text: "Candlestick Charts"
 
     - part: "Fetching Data from the Internet"
       chapters:

diff --git a/docs/notes/dataviz/candlesticks.qmd b/docs/notes/dataviz/candlesticks.qmd
@@ -0,0 +1,58 @@
+---
+format:
+  html:
+    code-fold: false
+jupyter: python3
+execute:
+  cache: true # re-render only when source changes
+---
+
+# Candlestick Charts with `plotly`
+
+In financial applications, we often have access to OHLC data (containing the open, high, low, and close price on each day). A candlestick chart can help us see the movement of the price within each day.
+
+
+To implement a [candlestick chart](https://plotly.com/python/candlestick-charts/), we can use the [`Candlestick` class](https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Candlestick.html) from plotly's Graph Objects sub-library.
+
+We start with some OHLC data:
+
+```{python}
+ohlc_data = [
+    {"date": "2030-03-16", "open": 236.2800, "high": 240.0550, "low": 235.9400, "close": 237.7100, "volume": 28092196},
+    {"date": "2030-03-15", "open": 234.9600, "high": 235.1850, "low": 231.8100, "close": 234.8100, "volume": 26042669},
+    {"date": "2030-03-12", "open": 234.0100, "high": 235.8200, "low": 233.2300, "close": 235.7500, "volume": 22653662},
+    {"date": "2030-03-11", "open": 234.9600, "high": 239.1700, "low": 234.3100, "close": 237.1300, "volume": 29907586},
+    {"date": "2030-03-10", "open": 237.0000, "high": 237.0000, "low": 232.0400, "close": 232.4200, "volume": 29746812}
+]
+```
+
+Mapping the data into the format the chart expects (a separate list for each attribute):
+
+```{python}
+dates = []
+opens = []
+highs = []
+lows = []
+closes = []
+
+for item in ohlc_data:
+    dates.append(item["date"])
+    opens.append(item["open"])
+    highs.append(item["high"])
+    lows.append(item["low"])
+    closes.append(item["close"])
+
+```
+
+Finally, creating the chart:
+
+
+```{python}
+from plotly.graph_objects import Figure, Candlestick
+
+stick = Candlestick(x=dates, open=opens, high=highs, low=lows, close=closes)
+
+fig = Figure(data=[stick])
+fig.update_layout(title="Example Candlestick Chart")
+fig.show()
+```
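
Note on `candlesticks.qmd`: the chapter says a candlestick chart shows the movement of the price within each day. As a rough sketch of what each candle encodes, the snippet below derives the direction (close above or below open), the body size, and the high-low range from the same records. The `summarize_candle` helper is hypothetical and not part of the commit, and the `volume` field is omitted for brevity.

```python
# Same OHLC records as in candlesticks.qmd (volume omitted for brevity).
ohlc_data = [
    {"date": "2030-03-16", "open": 236.28, "high": 240.055, "low": 235.94, "close": 237.71},
    {"date": "2030-03-15", "open": 234.96, "high": 235.185, "low": 231.81, "close": 234.81},
    {"date": "2030-03-12", "open": 234.01, "high": 235.82, "low": 233.23, "close": 235.75},
    {"date": "2030-03-11", "open": 234.96, "high": 239.17, "low": 234.31, "close": 237.13},
    {"date": "2030-03-10", "open": 237.00, "high": 237.00, "low": 232.04, "close": 232.42},
]


def summarize_candle(item):
    """Hypothetical helper: describe one candle's direction, body, and range."""
    change = item["close"] - item["open"]
    if change > 0:
        direction = "up"
    elif change < 0:
        direction = "down"
    else:
        direction = "flat"
    return {
        "date": item["date"],
        "direction": direction,
        "body": round(abs(change), 4),  # distance between open and close
        "range": round(item["high"] - item["low"], 4),  # full high-to-low span
    }


# The records are listed newest-first; ISO date strings sort chronologically.
candles = [summarize_candle(item) for item in sorted(ohlc_data, key=lambda item: item["date"])]

for candle in candles:
    print(candle)
```

The "up" and "down" days correspond to the increasing and decreasing candles that the chart draws in different colors.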
diff --git a/docs/notes/dataviz/overview.qmd b/docs/notes/dataviz/overview.qmd
@@ -149,13 +149,13 @@ Starting with some example data:
 ```{python}
 scatter_data = [
     {"income": 30_000, "life_expectancy": 65.5},
-    {"income": 30_000, "life_expectancy": 62.1},
+    {"income": 35_000, "life_expectancy": 62.1},
     {"income": 50_000, "life_expectancy": 66.7},
-    {"income": 50_000, "life_expectancy": 71.0},
+    {"income": 55_000, "life_expectancy": 71.0},
     {"income": 70_000, "life_expectancy": 72.5},
-    {"income": 70_000, "life_expectancy": 77.3},
+    {"income": 75_000, "life_expectancy": 77.3},
     {"income": 90_000, "life_expectancy": 82.9},
-    {"income": 90_000, "life_expectancy": 80.0},
+    {"income": 95_000, "life_expectancy": 80.0},
 ]
 ```
 

diff --git a/docs/notes/dataviz/trendlines.qmd b/docs/notes/dataviz/trendlines.qmd
@@ -0,0 +1,92 @@
+---
+format:
+  html:
+    code-fold: false
+jupyter: python3
+execute:
+  cache: true # re-render only when source changes
+---
+
+# Scatter Plot Trendlines w/ `plotly`
+
+In many cases, it may be helpful to add a trendline to a chart, to help examine relationships between variables.
+
+In `plotly`'s Express interface, the [`scatter` function](https://plotly.com/python-api-reference/generated/plotly.express.scatter) is the only chart-making function that supports trendlines.
+
+To illustrate how to add trendlines, let's revisit the previous scatter plot example:
+
+
+```{python}
+#| code-fold: true
+
+scatter_data = [
+    {"income": 30_000, "life_expectancy": 65.5},
+    {"income": 35_000, "life_expectancy": 62.1},
+    {"income": 50_000, "life_expectancy": 66.7},
+    {"income": 55_000, "life_expectancy": 71.0},
+    {"income": 70_000, "life_expectancy": 72.5},
+    {"income": 75_000, "life_expectancy": 77.3},
+    {"income": 90_000, "life_expectancy": 82.9},
+    {"income": 95_000, "life_expectancy": 80.0},
+]
+
+incomes = []
+expectancies = []
+for item in scatter_data:
+    incomes.append(item["income"])
+    expectancies.append(item["life_expectancy"])
+
+```
+
+```{python}
+from plotly.express import scatter
+
+fig = scatter(x=incomes, y=expectancies, height=350,
+                title="Life Expectancy by Income",
+                labels={"x": "Income", "y": "Life Expectancy (years)"},
+)
+fig.show()
+```
+
+Upon viewing the chart, it looks like there may be evidence of a trend.
+
+## Linear Trends
+
+The [`scatter` function](https://plotly.com/python-api-reference/generated/plotly.express.scatter) has some trendline-related parameters, including `trendline` and `trendline_color_override`:
+
+```{python}
+from plotly.express import scatter
+
+fig = scatter(x=incomes, y=expectancies, height=350,
+                title="Life Expectancy by Income",
+                labels={"x": "Income", "y": "Life Expectancy (years)"},
+                trendline="ols", trendline_color_override="red"
+)
+fig.show()
+```
+
+:::{.callout-note title="FYI"}
+Under the hood, `plotly` uses the `statsmodels` package to calculate the trend, so you may have to install that package as well.
+:::
+
+A linear trend assumes that there is a straight-line relationship between the independent and dependent variables. In the context of our example, a linear trend suggests that life expectancy changes at a constant rate as income increases. When applying linear regression, the goal is to find the best-fit line that minimizes the squared residuals (differences between the predicted and actual values) under the assumption that the underlying relationship is linear.
+
+Linear regression is simple and interpretable but can be overly restrictive when the real-world data follows a more complex, non-linear pattern.
+
+
+## Non-linear Trends
+
+In addition to the "ols" (Ordinary Least Squares) linear trend, we can use a "lowess" (Locally Weighted Scatterplot Smoothing) trend, which may be a better fit for non-linear relationships.
+
+```{python}
+from plotly.express import scatter
+
+fig = scatter(x=incomes, y=expectancies, height=350,
+                title="Life Expectancy by Income",
+                labels={"x": "Income", "y": "Life Expectancy (years)"},
+                trendline="lowess", trendline_color_override="red"
+)
+fig.show()
+```
+
+[LOWESS](https://en.wikipedia.org/wiki/Local_regression) is a [non-parametric method](https://www.investopedia.com/terms/n/nonparametric-statistics.asp) that fits multiple local regressions to different segments of the data. Instead of assuming a global linear relationship, it captures local patterns by fitting simple models in small neighborhoods around each point. These local models are then combined to create a smooth curve that adjusts to non-linearities in the data. A LOWESS trend can adapt to sudden changes, curves, and other complex behaviors in the data, making it well suited to datasets where the relationship between the variables is not the same across the whole range of values.
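
Note on `trendlines.qmd`: the chapter describes the linear trend as the best-fit line that minimizes the residuals. As a minimal sketch of that calculation, the snippet below fits the line by hand on the same example data using the textbook least-squares formulas. The `fit_line` helper is hypothetical and not part of the commit, and it is not a claim about how `plotly` or `statsmodels` compute the trend internally.

```python
# Same example data as in trendlines.qmd, already split into separate lists.
incomes = [30_000, 35_000, 50_000, 55_000, 70_000, 75_000, 90_000, 95_000]
expectancies = [65.5, 62.1, 66.7, 71.0, 72.5, 77.3, 82.9, 80.0]


def fit_line(xs, ys):
    """Hypothetical helper: return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(incomes, expectancies)

# Residuals: actual value minus the value predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(incomes, expectancies)]

print(f"slope per $10,000 of income: {slope * 10_000:.2f} years")
print(f"intercept: {intercept:.2f} years")
print(f"sum of squared residuals: {sum(r ** 2 for r in residuals):.2f}")
```

On this data the fitted slope works out to roughly 2.9 additional years of life expectancy per $10,000 of income, which is the constant rate of change that a straight trendline implies.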
diff --git a/docs/why-python.qmd b/docs/why-python.qmd
@@ -3,7 +3,7 @@
 
 Many students and professionals already possess a deep knowledge of spreadsheet software. So why is it helpful to learn [Python](https://www.python.org/) as well?
 
-![Python logo](/images/python-logo.png){.img-fluid style="max-height:150;"}
+![Python logo.](/images/python-logo.png){height=120}
 
 Like any computer programming language, Python offers benefits of automation. And among programming languages, Python is a popular, easy to use, powerful, and versatile choice.
 
@@ -24,7 +24,7 @@ Python is one of the most popular programming languages. According to recent rep
 
 Python is one of the most popular and fastest-growing programming languages. According to recent reports and developer surveys, it has overtaken JavaScript to become the most popular language.
 
-![Top programming languages. Source: [GitHub Octoverse 2024](https://github.blog/news-insights/octoverse/octoverse-2024)](/images/github-octoverse-top-langs-2024-cropped.png){height=350}
+![Top programming languages. Source: [GitHub Octoverse 2024](https://github.blog/news-insights/octoverse/octoverse-2024).](/images/github-octoverse-top-langs-2024-cropped.png){height=320}
 
 
 This popularity translates into a strong and vibrant community that contributes to the language's development and support.
@@ -33,7 +33,7 @@ This means a wealth of resources, tutorials, and community support is readily av
 
 Additionally, the high demand for Python skills in the job market creates numerous career opportunities for analysts and programmers proficient in Python, especially in the world of data science and software development.
 
-![Top jobs related to Python programming. Source: [US News and World Report](https://money.usnews.com/careers/best-jobs/rankings/the-100-best-jobs)](/images/python-top-jobs.png)
+![Top jobs related to Python programming. Source: [US News and World Report](https://money.usnews.com/careers/best-jobs/rankings/the-100-best-jobs).](/images/python-top-jobs.png)
 
 
 ## Ease of Use