update docs/img
m-clark committed Sep 2, 2024
1 parent ab9db73 commit 78adc40
Showing 40 changed files with 8,831 additions and 3,855 deletions.
17 changes: 6 additions & 11 deletions docs/causal.html
@@ -411,7 +411,6 @@ <h1 class="title d-none d-lg-block"><span id="sec-causal" class="quarto-section-
</dd>
</dl>
</blockquote>
-<p>TODO: Reviewer - provide a code demo of confounding - related: let’s move explanation vs.&nbsp;prediction to this chapter.</p>
<p>Causal inference is a very important topic in machine learning and statistics, and it is also a very difficult one to understand well, or consistently, because <em>not everyone agrees on how to define a cause in the first place</em>. Our focus here is merely practical: we just want to show some of the common model approaches used when attempting to answer causal questions. Causal modeling in general is such a rabbit hole that we won’t be able to go into much detail, but we will try to give you a sense of the landscape and some of the key ideas.</p>
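<p>As a quick sketch of confounding, consider the following simulation (a minimal illustration with made-up data, where <code>z</code> is a common cause of both <code>x</code> and <code>y</code>, and <code>x</code> has no direct effect on <code>y</code> at all):</p>
<pre><code class="language-python">import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 1000

# z is a common cause (confounder) of x and y; x has no direct effect on y
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)

# Omitting the confounder makes x look like it affects y (coefficient ~0.4)
sm.OLS(y, sm.add_constant(x)).fit().params

# Adjusting for z recovers the true (null) effect of x (coefficient ~0)
sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit().params</code></pre>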
<section id="sec-causal-key-ideas" class="level2" data-number="12.1">
<h2 data-number="12.1" class="anchored" data-anchor-id="sec-causal-key-ideas"><span class="header-section-number">12.1</span> Key Ideas</h2>
@@ -431,20 +430,16 @@ <h3 data-number="12.1.1" class="anchored" data-anchor-id="sec-causal-why"><span
<section id="sec-causal-good-to-know" class="level3" data-number="12.1.2">
<h3 data-number="12.1.2" class="anchored" data-anchor-id="sec-causal-good-to-know"><span class="header-section-number">12.1.2</span> Helpful context</h3>
<p>This section is pretty high level, and we are not going to go into much detail, so even just some understanding of correlation and modeling would likely be enough.</p>
<div class="cell">
<div class="cell-output-display">
<div id="fig-causal-dag" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-causal-dag-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="causal_files/figure-html/fig-causal-dag-1.png" class="img-fluid figure-img" width="672">
<img src="img/causal-dag.svg" class="img-fluid figure-img" style="width:75.0%">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-causal-dag-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;12.1: A Causal DAG
</figcaption>
</figure>
</div>
-</div>
-</div>
</section>
</section>
<section id="sec-causal-classic" class="level2" data-number="12.2">
@@ -690,7 +685,7 @@ <h3 data-number="12.5.5" class="anchored" data-anchor-id="sec-causal-meta"><span
<p><a href="https://arxiv.org/pdf/1706.03461.pdf">Meta-learners</a> are used in machine learning contexts to assess potentially causal relationships between some treatment and outcome. The core model can actually be any kind you might want to use, but in which extra steps are taken to assess the causal relationship. The most common types of meta-learners are:</p>
<ul>
<li><strong>S-learner</strong> - <strong>s</strong>ingle model for both groups; predict the (counterfactual) difference between when all observations are treated and when all are not, similar to our previous code demo.</li>
-<li><strong>T-learner</strong> - <strong>t</strong>wo models, one for each of the control and treatment groups; predict the values as if all observations are treated vs when all are control using both models, and take the difference.</li>
+<li><strong>T-learner</strong> - <strong>t</strong>wo models, one for each of the control and treatment groups; predict the values as if all observations are treated versus when all are control using both models, and take the difference.</li>
<li><strong>X-learner</strong> - a more complicated modification to the T-learner also using a multi-step approach.</li>
</ul>
<p>Some additional variants of these models exist, and they can be used in a variety of settings, not just uplift modeling. The key idea is to use the model to predict the potential outcomes of the treatment, and then to take the difference between the two predictions as the causal effect.</p>
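<p>As a rough sketch of the S- and T-learner logic, here is a minimal simulation (the data, the true effect of 0.5, and the choice of boosting as the base model are all arbitrary assumptions for illustration):</p>
<pre><code class="language-python">import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
treat = rng.binomial(1, 0.5, size=n)
y = X[:, 0] + 0.5 * treat + rng.normal(size=n)  # true treatment effect = 0.5

# S-learner: a single model with treatment as just another feature
s_model = GradientBoostingRegressor().fit(np.column_stack([X, treat]), y)
tau_s = (
    s_model.predict(np.column_stack([X, np.ones(n)]))     # all treated
    - s_model.predict(np.column_stack([X, np.zeros(n)]))  # all control
)

# T-learner: separate models for the treated and control groups
m1 = GradientBoostingRegressor().fit(X[treat == 1], y[treat == 1])
m0 = GradientBoostingRegressor().fit(X[treat == 0], y[treat == 0])
tau_t = m1.predict(X) - m0.predict(X)

tau_s.mean(), tau_t.mean()  # both estimates should land near 0.5</code></pre>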
Expand Down Expand Up @@ -743,7 +738,7 @@ <h2 data-number="12.6" class="anchored" data-anchor-id="sec-causal-prediction-ex
</figure>
</div>
<p>But if we are interested in predictive performance, we would be disappointed with this model. It predicts the target at about the same rate as guessing, even on the data it’s fit on, and does even worse with new data. Even the effect as shown is quite small by typical standards, as it would take a standard deviation change in the feature to get a ~1% change in the probability of the target (x is standardized).</p>
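<p>To make this concrete, here is a small simulation in the spirit of this example (the sample size and coefficient are our own illustrative assumptions, not the exact settings used above): the effect is statistically significant, yet the model predicts no better than guessing.</p>
<pre><code class="language-python">import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(123)
n = 25_000
x = rng.normal(size=n)  # standardized feature
beta = 0.04             # ~1% change in probability of the target per SD of x
y = rng.binomial(1, 1 / (1 + np.exp(-beta * x)))

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
fit.pvalues[1]  # typically 'significant' at this sample size

# Accuracy on the very data the model was fit to: ~.5, i.e., guessing
((fit.predict(sm.add_constant(x)) > 0.5) == y).mean()</code></pre>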
-<p>If we are concerned solely with explanation, we now would want to ask ourselves first if we can trust our result based on the data, model, and various issues that went into producing it. If so, we can then if the effect is large enough to be of interest, and if the result is useful in making decisions<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>. It may very well be, maybe the target concerns the rate of survival, where any increase is worthwhile. Or perhaps the data circumstances demand such interpretation, because it is extremely costly to obtain more. For more exploratory efforts however, this sort of result would likely not be enough to come to any strong conclusion even if explanation is the only goal.</p>
+<p>If we are concerned solely with explanation, we would first want to ask ourselves whether we can trust our result given the data, model, and various issues that went into producing it. If so, we can then see if the effect is large enough to be of interest, and if the result is useful in making decisions<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>. It may very well be: maybe the target concerns the rate of survival, where any increase is worthwhile. Or perhaps the data circumstances demand such interpretation, because it is extremely costly to obtain more. For more exploratory efforts however, this sort of result would likely not be enough to come to any strong conclusion, even if explanation is the only goal.</p>
<p>As another example, consider the world happiness data we’ve used in previous demonstrations. We want to explain the association of country-level characteristics with the population’s happiness. We likely aren’t going to be as interested in predicting next year’s happiness score, but rather in what attributes are correlated with a happy populace in general. In this election year (2024) in the U.S., we’d be interested in specific factors related to presidential elections, for which there are relatively few data points. In these cases, explanation is the focus, and we may not even need a model at all to come to our conclusions.</p>
<p>So we can understand that in some settings we may be more interested in understanding the underlying mechanisms of the data, as with these examples, and in others we may be more interested in predictive performance, as in our demonstrations of machine learning. However, the distinction between prediction and explanation is in the end a bit problematic, not least because we often want to do both.</p>
<p>Although it’s often implied as such, <em>prediction is not just what we do with new data</em>. It is the very means by which we get any explanation of effects via coefficients, marginal effects, visualizations, and other model results. Additionally, where the focus is on predictive performance, if we can’t explain the results we get, we will typically feel dissatisfied, and may still question how well the model is actually doing.</p>
@@ -754,7 +749,7 @@ <h2 data-number="12.6" class="anchored" data-anchor-id="sec-causal-prediction-ex
<li><strong>Causal Modeling</strong>: Using models to understand causal effects. We focus on explanation, and prediction on the current data. We may very well be interested in predictive performance also, and often are in industry.</li>
<li><strong>Generalization</strong>: When our goal is generalizing to unseen data, the focus is always on predictive performance. This does not mean we can’t use the model to understand the data though, and explanation could possibly be as important.</li>
</ul>
-<p>Depending on the context, we may be more interested explanation or predictive performance, but in practice we often, and usually, want both. It is crucial to remind yourself why you are interested in the problem, what a model is capable telling you about it, and to be clear about what you want to get out of the result.</p>
+<p>Depending on the context, we may be more interested in explanation or predictive performance, but in practice we usually want both. It is crucial to remind yourself why you are interested in the problem, what a model is capable of telling you about it, and to be clear about what you want to get out of the result.</p>
</section>
<section id="causal-wrap" class="level2" data-number="12.7">
<h2 data-number="12.7" class="anchored" data-anchor-id="causal-wrap"><span class="header-section-number">12.7</span> Wrapping Up</h2>
@@ -767,7 +762,7 @@ <h3 data-number="12.7.1" class="anchored" data-anchor-id="causal-common"><span c
</section>
<section id="causal-adventure" class="level3" data-number="12.7.2">
<h3 data-number="12.7.2" class="anchored" data-anchor-id="causal-adventure"><span class="header-section-number">12.7.2</span> Choose your own adventure</h3>
-<p>From here you might revisit some of the previous models and think about how you might use them to answer a causal question. You might also look into some of the other models we’ve mentioned here, and see how they are used in practice via the additional resources below.</p>
+<p>From here you might revisit some of the previous models and think about how you might use them to answer a causal question. You might also look into some of the other models we’ve mentioned here, and see how they are used in practice via the additional resources.</p>
</section>
<section id="causal-resources" class="level3" data-number="12.7.3">
<h3 data-number="12.7.3" class="anchored" data-anchor-id="causal-resources"><span class="header-section-number">12.7.3</span> Additional resources</h3>
@@ -814,7 +809,7 @@ <h2 data-number="12.8" class="anchored" data-anchor-id="causal-exercise"><span c
<li id="fn5"><p>Your authors have to admit some bias here. We’ve spent a lot of our past dealing with SEMs, and almost every application we saw had too little data and too little generalization, and were grossly overfit. Many SEM programs even added multiple ways to overfit the data even further, and it is difficult to trust the results reported in many papers that used them. But that’s not the fault of SEM in general- it can be a useful tool when used correctly, and it can help answer causal questions, but it is not a magic bullet, and it doesn’t make anyone look fancier by using it.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>This is basically the <strong>S-Learner</strong> approach to meta-learning, which we’ll discuss in a bit. It is generally too weak<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>The G-computation approach and S-learners are essentially the same approach, but came about from different domain contexts.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>This is a contrived example, but it is definitely something what you might see in the wild. The relationship is weak, and though statistically significant, the model can’t predict the target well at all. The <strong>statistical power</strong> is actually decent in this case, roughly 70%, but this is mainly because the sample size is so large and it is a very simple model setting. <br> This is a common issue in many academic fields, and it’s why we always need to be careful about how we interpret our models. In practice, we would generally need to consider other factors, such as the cost of a false positive or false negative, or the cost of the data and running the model itself, to determine if the model is worth using.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>This is a contrived example, but it is definitely something that you might see in the wild. The relationship is weak, and though statistically significant, the model can’t predict the target well at all. The <strong>statistical power</strong> is actually decent in this case, roughly 70%, but this is mainly because the sample size is so large and it is a very simple model setting. <br> This is a common issue in many academic fields, and it’s why we always need to be careful about how we interpret our models. In practice, we would generally need to consider other factors, such as the cost of a false positive or false negative, or the cost of the data and running the model itself, to determine if the model is worth using.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>Gentle reminder that making an assumption does not mean the assumption is correct, or even provable.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
Binary file modified docs/causal_files/figure-html/fig-causal-dag-1.png
2 changes: 1 addition & 1 deletion docs/danger_zone.html
@@ -730,7 +730,7 @@ <h3 data-number="14.4.3" class="anchored" data-anchor-id="sec-danger-outliers"><
</section>
<section id="sec-danger-bigdata" class="level3" data-number="14.4.4">
<h3 data-number="14.4.4" class="anchored" data-anchor-id="sec-danger-bigdata"><span class="header-section-number">14.4.4</span> Big data isn’t always as big as you think</h3>
-<p>Consider a model setting with 100,000 samples. Is this large? Let’s say you have a rare outcome that occurs 1% of the time. This means you have 1000 samples where outcome label you’re interested in occurs. Now consider a categorical feature (A) that has four categories, and one of those categories is relatively small, say 5% of the data, or 5000 cases, and you want to interact it with another categorical feature (B), one whose categories are all equally distributed. Assuming no particular correlation between the two, you’d be down to ~1% of the data for the least category of A across the levels of B. Now if there is an actual interaction, some of those interaction cells may have only a dozen or so positive target values. Odds are pretty good that you don’t have enough data to make a reliable estimate of the interaction effect.</p>
+<p>Consider a model setting with 100,000 samples. Is this large? Let’s say you have a rare outcome that occurs 1% of the time. This means you have 1000 samples where the outcome label you’re interested in is present. Now consider a categorical feature (A) that has four categories, and one of those categories is relatively small, say 5% of the data, or 5000 cases, and you want to interact it with another categorical feature (B), one whose categories are all equally distributed. Assuming no particular correlation between the two, you’d be down to ~1% of the data for the least category of A across the levels of B. Now if there is an actual interaction effect on the target, some of those interaction cells may have only a dozen or so positive target values. Odds are pretty good that you don’t have enough data to make a reliable estimate of the interaction effect.</p>
<p>Oh wait, did you want to use cross-validation also? A simple random sample approach might result in some validation sets with no positive values at all! Don’t forget that you may have already split your 100,000 samples into training and test sets, so you have even less data to start with! The following table shows the final cell count for a dataset with these properties.</p>
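<p>The arithmetic is easy to check for yourself. A back-of-the-envelope version (assuming, say, four equally distributed categories for B and an 80/20 train/test split):</p>
<pre><code class="language-python">n = 100_000
p_outcome = 0.01   # rare outcome
p_a_small = 0.05   # smallest category of feature A
p_b_level = 0.25   # assume B has four equally distributed categories

cell = n * p_a_small * p_b_level  # ~1250 cases in the smallest A x B cell
cell * p_outcome                  # ~12 positive target values in that cell

cell * 0.8 * p_outcome            # ~10 positives left after an 80/20 split</code></pre>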
<p>The point is that it’s easy to forget that large data can get small very quickly due to class imbalance, interactions, etc. There is not much you can do about this, but you should not be surprised when these situations are not very revealing in terms of your model results.</p>
</section>
