Sources of Reading Material for PQHS/CRSP 432 with Dr. Love
This is a repository of things I may use or refer to during the semester. If you're looking for additional things to read, this is the right place.
Some items are password-protected. Clicking one of these will only let you download it from GitHub; you can then open it with the password on your own machine. The password is revealed in the first class session.
All of these items will also be linked through the Class READMEs as they become useful to us.
- The 431 notes are here and will remain there until 2025-06-01.
- The 432 notes are here and will remain there until 2025-06-01.
- (431) David Spiegelhalter The Art of Statistics available at Amazon and other retailers.
- Here's the book's website which contains R code, corrections and other materials.
- (432) Jeffrey Leek How to be a Modern Scientist available via Leanpub.
- Frank Harrell's Biostatistics for Biomedical Research (BBR) Course includes a series of lectures on many of the topics we'll be discussing in 432, in addition to several late-breaking items. Details on the course are available here and the notes are linked in the Statistics and Modeling books below.
- Richard McElreath Statistical Rethinking (Winter 2023) lecture series is the best introduction to Bayesian ideas and methods available.
- Read more about the course at https://github.com/rmcelreath/stat_rethinking_2023.
- If you'd rather spend just two hours than dive into the full thing, Richard's lecture Bayesian Inference is Just Counting is great, too.
- Richard's book Statistical Rethinking (2nd edition) is available to you through the CWRU library system as an ebook. See also the freely available book by Kurz, below.
- Frank E. Harrell and Chris Slaughter Biostatistics for Biomedical Research Notes (pdf) - also see the YouTube course above.
- Frank E. Harrell Regression Modeling Strategies, 2nd Edition, 2015.
- Max Kuhn and Julia Silge Tidy Modeling with R
- Paul Roback and Julie Legler Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R
- A Solomon Kurz Statistical rethinking with brms, ggplot2, and the tidyverse
- Julian J. Faraway Practical Regression and Anova using R, 2002.
- David G. Kleinbaum and Mitchel Klein Logistic Regression: A Self-Learning Text, 3rd Edition, 2010.
- Simon J. Sheather A Modern Approach to Regression with R, 2009.
- Eric Vittinghoff, David V. Glidden, Stephen C. Shiboski and Charles E. McCulloch Regression Methods in Biostatistics, 2nd Edition, 2012.
- Rob J Hyndman and George Athanasopoulos Forecasting: Principles and Practice
- Winston Chang R Graphics Cookbook, 2nd edition
- Kieran Healy Data Visualization: A practical introduction
- Claus O. Wilke Fundamentals of Data Visualization
- Rob Kabacoff Data Visualization with R
- Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund R for Data Science, 2nd edition
- Max Kuhn and Julia Silge Tidy Modeling with R (Yes, I'm listing it twice, because it's useful in both contexts.)
- Carrie Wright, Shannon Ellis, Stephanie Hicks, and Roger D. Peng Tidyverse Skills for Data Science in R.
- Chester Ismay and Albert Y. Kim Statistical Inference via Data Science: A Modern Dive into R and the Tidyverse
- Yihui Xie, Christophe Dervieux, Emily Riederer R Markdown Cookbook
- Yihui Xie, Amber Thomas, Alison Presmanes Hill blogdown: Creating Websites with R Markdown
- Yihui Xie, J. J. Allaire, Garrett Grolemund R Markdown: The Definitive Guide
- Hadley Wickham and Jenny Bryan R Packages
- Peter D. R. Higgins Reproducible Medical Research with R
- Statistical Inference in the 21st Century: A World Beyond p < 0.05 from 2019 in The American Statistician
- The American Statistical Association's 2016 Statement on p-Values: Context, Process and Purpose.
- Frank Harrell and colleagues' Glossary of Statistical Terms (pdf)
- Project-oriented workflow at tidyverse.org from Jenny Bryan.
- From the Ten Simple Rules series at PLOS Computational Biology:
- Ten Simple Rules for Effective Statistical Practice by Kass RE et al. 2016
- Ten Simple Rules for Graduate Students by Gu J, Bourne PE 2007
- Ten Simple Rules for Better Figures by Rougier NP, Droettboom M, Bourne PE 2014
- Ten Simple Rules for Creating a Good Data Management Plan by Michener WK 2015
- Karl W. Broman & Kara H. Woo (2018) Data Organization in Spreadsheets, The American Statistician, 72:1, 2-10, DOI: 10.1080/00031305.2017.1375989
- Min Q. Wang, Alice F. Yan and Ralph V. Katz (2018) Researcher Requests for Inappropriate Analysis and Reporting: A U.S. Survey of Consulting Biostatisticians Annals of Internal Medicine https://doi.org/10.7326/M18-1230.
- Accompanying editorial by A. Russell Localio, Catherine B. Stack, Anne R. Meibohm et al. (2018) Inappropriate Statistical Analysis and Reporting in Medical Research: Perverse Incentives and Institutional Solutions Annals of Internal Medicine https://doi.org/10.7326/M18-2516.
- Peter C. Austin and Ewout W. Steyerberg (2015) The number of subjects per variable required in linear regression analyses J Clinical Epidemiology 68: 627-636.
- Richard D Riley, Joie Ensor, Kym I E Snell et al. Calculating the sample size required for developing a clinical prediction model (pdf) BMJ 2020; 368:m441. Link at BMJ.
- Andrew Gelman and John Carlin Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors
- Andrew Gelman (and others) blog: Statistical Modeling, Causal Inference, and Social Science
- datamethods "is a place where statisticians, epidemiologists, informaticists, machine learning practitioners, and other research methodologists communicate with themselves and with clinical, translational, and health services researchers to discuss issues related to data: research methods, quantitative methods, study design, measurement, statistical analysis, interpretation of data and statistical results, clinical trials, journal articles, statistical graphics, causal inference, medical decision making, and more." This new (2018-) resource's rationale is here.
- RStudio Community "is a community for all things R and RStudio."
- RStudio Cheat Sheets which have expanded enormously in recent years.
- Dr. Love's favorite list of Colors in R.
- Virtually any code you have written in R Markdown can be run using Quarto instead, by simply switching the file extension from .Rmd to .qmd.
- It's still worth it to learn about how Quarto works, and why it differs from R Markdown when it does.
Here are some suggestions:
- https://quarto.org/ is the main website for all things Quarto, will help you get started making the transition, and has a detailed set of guides and references.
- Here's the start of the Tutorial, including Hello, Quarto that shows you how to use Quarto with RStudio, which is what you'll be doing in this class.
- This FAQ for R Markdown users might be a good starting point.
- Alison Hill wrote a great blog post We don't talk about Quarto which got me started last April.
- R for Data Science (2nd edition) has three chapters on Communication which include sections dedicated to Quarto, Quarto formats and a Quarto workflow.
- YouTube videos discussing Quarto that may be of interest to you include (in no special order):
- From Tom Mock at Posit (new name of RStudio the company)
- Quarto for the Curious (runs 21 minutes)
- Welcome to Quarto workshop (2 hours 23 minutes)
- Beautiful Reports and Presentations with Quarto (1 hour 45 minutes)
- Create & Publish a Quarto Blog on Quarto Pub in 100 Seconds (2 minutes).
- Reproducible Medical Research with Quarto (3 hours 15 minutes)
- Isabella Velásquez: Building a Blog with Quarto (1 hour 13 minutes)
- Devin Pastoor: Websites & Books & Blogs, oh my! Creating Rich Content with Quarto (21 minutes)
- Frank Harrell: R Workflow for Reproducible Biomedical Research using Quarto (1 hour, 2 minutes)
- Lyndon Walker: Create beautiful documents with Quarto and R (29 minutes)
- Mine Çetinkaya-Rundel: Hello, Quarto!
- Ted Laderas on Quarto/R Markdown: What's Different? (28 minutes)
- The ASA Section on Teaching of Statistics in the Health Sciences has some interesting material.
- Ronald L. Wasserstein, Allen L. Schirm & Nicole A. Lazar (2019) Moving to a World Beyond "p < 0.05", The American Statistician, 73: sup1, 1-19, DOI: 10.1080/00031305.2019.1583913. PDF available here, too.
- Ronald L. Wasserstein & Nicole A. Lazar (2016) The ASA's Statement on p-Values: Context, Process, and Purpose, The American Statistician, 70:2, 129-133, DOI: 10.1080/00031305.2016.1154108. PDF available here, too.
- Jeff Leek and Roger Peng's article in Nature (2015) P values are just the tip of the iceberg
- I've given these posts: Why I've lost faith in p values, part 1 and Why I've lost faith in p values, part 2 to a few people. Maybe they'll help you.
- "Abandoning statistical significance is both sensible and practical" by Amrhein, Gelman, Greenland and McShane at PeerJ Preprints.
- Frank Harrell's post about "Language for communicating frequentist results about treatment effects"
- "Calculating Observed Power Is Just Transforming Noise" at LessLikely
- "A discussion of events that transpired in the past year, where a group of surgical researchers decided to ignore much of the statistical literature and promote a highly misleading practice of calculating post-hoc power using the observed effect size."
- Some related pieces at LessLikely include "Misplaced Confidence in Observed Power" and "P-Values Are Tough and S-Values Can Help".
- Andrew Gelman: Statistical-significance thinking is not just a bad way to publish, it’s also a bad way to think - the money quote: "it’s ultimately not about what it takes, or should take, to get a result published, but rather how we as researchers can navigate through uncertainty and not get faked out by noise in our own data."
I've separated this into its own page now.