diff --git a/.quarto/idx/index.qmd.json b/.quarto/idx/index.qmd.json index d1f07d1..a58c907 100644 --- a/.quarto/idx/index.qmd.json +++ b/.quarto/idx/index.qmd.json @@ -1 +1 @@ -{"title":"Advanced R for Data Analysis","markdown":{"yaml":{"title":"Advanced R for Data Analysis","subtitle":"University of Utah"},"headingText":"General Information","containsRefs":false,"markdown":"\n\n[![](DELPHI-long.png)](https://uofuhealth.utah.edu/delphi-data-science-initiative)\n\n\n::: {.callout-note appearance=\"simple\" icon=false}\n**Date**: October 29 2024\n\n**Time**: 9:00 am -- 4:00 pm MDT\n\n**Location**: HELIX Rm - GS150 - Chokecherry\n\n**Instructors**: Rebecca Barter\n:::\n\n**Registration**: Use the [following link](????) to sign up for this workshop.\n\n[**Sign up for the DELPHI mailing list**](https://www.lists.utah.edu/wws/subscribe/delphi?previous_action=info) to stay in the loop about future workshops and funding opportunities.\n\n\n**What**: This workshop will build upon the foundations covered in the Introduction to R workshop, and will introduce more advanced tools, including creating your own custom functions, iterating with purrr map functions, advanced data manipulations with dplyr, and reshaping and joining datasets in R.\n\n**Who**: The course is aimed at graduate students, postdocs, staff, faculty, and other researchers across campus who are interested in taking their R skills to the next level to learn how to conduct more sophisticated data manipulations and analysis. 
\n\nParticipants in this workshop should already be comfortable working in Quarto (or R Markdown) documents in RStudio, and have basic knowledge of the the R programming language and the tidyverse, including the dplyr select, filter, mutate, and summarize functions, as well as creating basic visualizations using ggplot2.\n\n**Requirements**: Participants must bring a laptop onto which they can [download R and Rstudio](https://posit.co/download/rstudio-desktop/) (and you should do so before the workshop).\n\n**Contact**: Please email andrew.george@hsc.utah.edu or rebecca.barter@hsc.utah.edu for more information.\n\n## Schedule\n\nNote that the schedule below serves as a guideline. The start, end, and break times are fixed, but timing for each topics covered may vary as we may go faster or slower through the content.\n\n\n\n| Time | Topic | \n| ----:|:----|\n| 9:00 | Introduction and Setup | \n| 9:30 | [Writing Functions](content/complete/01_custom_functions_complete.html) | [Writing Functions] | \n| 10:30 | [Break] | \n| 10:45 | [Iteration with Purrr](content/complete/02_iteration_purrr_complete.html) | \n| 12:00 | [Lunch] | \n| 1:00 | [Across()](content/complete/03_across_complete.html) |\n| 1:45 | [Recoding variables](content/complete/04_recoding_variables_complete.html)| \n| 2:30 | [Break] |\n| 2:45 | [Reshaping Data](content/complete/05_pivoting_complete.html) | \n| 3:15 | [Joining Data](content/complete/06_joining_complete.html) | \n| 4:00 | [End] | \n\n","srcMarkdownNoYaml":"\n\n[![](DELPHI-long.png)](https://uofuhealth.utah.edu/delphi-data-science-initiative)\n\n\n::: {.callout-note appearance=\"simple\" icon=false}\n**Date**: October 29 2024\n\n**Time**: 9:00 am -- 4:00 pm MDT\n\n**Location**: HELIX Rm - GS150 - Chokecherry\n\n**Instructors**: Rebecca Barter\n:::\n\n**Registration**: Use the [following link](????) 
to sign up for this workshop.\n\n[**Sign up for the DELPHI mailing list**](https://www.lists.utah.edu/wws/subscribe/delphi?previous_action=info) to stay in the loop about future workshops and funding opportunities.\n\n## General Information\n\n**What**: This workshop will build upon the foundations covered in the Introduction to R workshop, and will introduce more advanced tools, including creating your own custom functions, iterating with purrr map functions, advanced data manipulations with dplyr, and reshaping and joining datasets in R.\n\n**Who**: The course is aimed at graduate students, postdocs, staff, faculty, and other researchers across campus who are interested in taking their R skills to the next level to learn how to conduct more sophisticated data manipulations and analysis. \n\nParticipants in this workshop should already be comfortable working in Quarto (or R Markdown) documents in RStudio, and have basic knowledge of the the R programming language and the tidyverse, including the dplyr select, filter, mutate, and summarize functions, as well as creating basic visualizations using ggplot2.\n\n**Requirements**: Participants must bring a laptop onto which they can [download R and Rstudio](https://posit.co/download/rstudio-desktop/) (and you should do so before the workshop).\n\n**Contact**: Please email andrew.george@hsc.utah.edu or rebecca.barter@hsc.utah.edu for more information.\n\n## Schedule\n\nNote that the schedule below serves as a guideline. 
The start, end, and break times are fixed, but timing for each topics covered may vary as we may go faster or slower through the content.\n\n\n\n| Time | Topic | \n| ----:|:----|\n| 9:00 | Introduction and Setup | \n| 9:30 | [Writing Functions](content/complete/01_custom_functions_complete.html) | [Writing Functions] | \n| 10:30 | [Break] | \n| 10:45 | [Iteration with Purrr](content/complete/02_iteration_purrr_complete.html) | \n| 12:00 | [Lunch] | \n| 1:00 | [Across()](content/complete/03_across_complete.html) |\n| 1:45 | [Recoding variables](content/complete/04_recoding_variables_complete.html)| \n| 2:30 | [Break] |\n| 2:45 | [Reshaping Data](content/complete/05_pivoting_complete.html) | \n| 3:15 | [Joining Data](content/complete/06_joining_complete.html) | \n| 4:00 | [End] | \n\n"},"formats":{"html":{"identifier":{"display-name":"HTML","target-format":"html","base-format":"html"},"execute":{"fig-width":7,"fig-height":5,"fig-format":"retina","fig-dpi":96,"df-print":"default","error":false,"eval":true,"cache":null,"freeze":false,"echo":true,"output":true,"warning":true,"include":true,"keep-md":false,"keep-ipynb":false,"ipynb":null,"enabled":null,"daemon":null,"daemon-restart":false,"debug":false,"ipynb-filters":[],"ipynb-shell-interactivity":null,"plotly-connected":true,"engine":"markdown"},"render":{"keep-tex":false,"keep-typ":false,"keep-source":false,"keep-hidden":false,"prefer-html":false,"output-divs":true,"output-ext":"html","fig-align":"default","fig-pos":null,"fig-env":null,"code-fold":"none","code-overflow":"scroll","code-link":false,"code-line-numbers":false,"code-tools":false,"tbl-colwidths":"auto","merge-includes":true,"inline-includes":false,"preserve-yaml":false,"latex-auto-mk":true,"latex-auto-install":true,"latex-clean":true,"latex-min-runs":1,"latex-max-runs":10,"latex-makeindex":"makeindex","latex-makeindex-opts":[],"latex-tlmgr-opts":[],"latex-input-paths":[],"latex-output-dir":null,"link-external-icon":false,"link-external-newwindow":false,"self
-contained-math":false,"format-resources":[],"notebook-links":true},"pandoc":{"standalone":true,"wrap":"none","default-image-extension":"png","to":"html","css":["styles.css"],"toc":true,"output-file":"index.html"},"language":{"toc-title-document":"Table of contents","toc-title-website":"On this page","related-formats-title":"Other Formats","related-notebooks-title":"Notebooks","source-notebooks-prefix":"Source","other-links-title":"Other Links","code-links-title":"Code Links","launch-dev-container-title":"Launch Dev Container","launch-binder-title":"Launch Binder","article-notebook-label":"Article Notebook","notebook-preview-download":"Download Notebook","notebook-preview-download-src":"Download Source","notebook-preview-back":"Back to Article","manuscript-meca-bundle":"MECA Bundle","section-title-abstract":"Abstract","section-title-appendices":"Appendices","section-title-footnotes":"Footnotes","section-title-references":"References","section-title-reuse":"Reuse","section-title-copyright":"Copyright","section-title-citation":"Citation","appendix-attribution-cite-as":"For attribution, please cite this work as:","appendix-attribution-bibtex":"BibTeX citation:","appendix-view-license":"View License","title-block-author-single":"Author","title-block-author-plural":"Authors","title-block-affiliation-single":"Affiliation","title-block-affiliation-plural":"Affiliations","title-block-published":"Published","title-block-modified":"Modified","title-block-keywords":"Keywords","callout-tip-title":"Tip","callout-note-title":"Note","callout-warning-title":"Warning","callout-important-title":"Important","callout-caution-title":"Caution","code-summary":"Code","code-tools-menu-caption":"Code","code-tools-show-all-code":"Show All Code","code-tools-hide-all-code":"Hide All Code","code-tools-view-source":"View Source","code-tools-source-code":"Source Code","tools-share":"Share","tools-download":"Download","code-line":"Line","code-lines":"Lines","copy-button-tooltip":"Copy to 
Clipboard","copy-button-tooltip-success":"Copied!","repo-action-links-edit":"Edit this page","repo-action-links-source":"View source","repo-action-links-issue":"Report an issue","back-to-top":"Back to top","search-no-results-text":"No results","search-matching-documents-text":"matching documents","search-copy-link-title":"Copy link to search","search-hide-matches-text":"Hide additional matches","search-more-match-text":"more match in this document","search-more-matches-text":"more matches in this document","search-clear-button-title":"Clear","search-text-placeholder":"","search-detached-cancel-button-title":"Cancel","search-submit-button-title":"Submit","search-label":"Search","toggle-section":"Toggle section","toggle-sidebar":"Toggle sidebar navigation","toggle-dark-mode":"Toggle dark mode","toggle-reader-mode":"Toggle reader mode","toggle-navigation":"Toggle navigation","crossref-fig-title":"Figure","crossref-tbl-title":"Table","crossref-lst-title":"Listing","crossref-thm-title":"Theorem","crossref-lem-title":"Lemma","crossref-cor-title":"Corollary","crossref-prp-title":"Proposition","crossref-cnj-title":"Conjecture","crossref-def-title":"Definition","crossref-exm-title":"Example","crossref-exr-title":"Exercise","crossref-ch-prefix":"Chapter","crossref-apx-prefix":"Appendix","crossref-sec-prefix":"Section","crossref-eq-prefix":"Equation","crossref-lof-title":"List of Figures","crossref-lot-title":"List of Tables","crossref-lol-title":"List of Listings","environment-proof-title":"Proof","environment-remark-title":"Remark","environment-solution-title":"Solution","listing-page-order-by":"Order By","listing-page-order-by-default":"Default","listing-page-order-by-date-asc":"Oldest","listing-page-order-by-date-desc":"Newest","listing-page-order-by-number-desc":"High to Low","listing-page-order-by-number-asc":"Low to 
High","listing-page-field-date":"Date","listing-page-field-title":"Title","listing-page-field-description":"Description","listing-page-field-author":"Author","listing-page-field-filename":"File Name","listing-page-field-filemodified":"Modified","listing-page-field-subtitle":"Subtitle","listing-page-field-readingtime":"Reading Time","listing-page-field-wordcount":"Word Count","listing-page-field-categories":"Categories","listing-page-minutes-compact":"{0} min","listing-page-category-all":"All","listing-page-no-matches":"No matching items","listing-page-words":"{0} words","listing-page-filter":"Filter","draft":"Draft"},"metadata":{"lang":"en","fig-responsive":true,"quarto-version":"1.5.56","theme":"sandstone","title":"Advanced R for Data Analysis","subtitle":"University of Utah"},"extensions":{"book":{"multiFile":true}}}},"projectFormats":["html"]} \ No newline at end of file +{"title":"Advanced R for Data Analysis","markdown":{"yaml":{"title":"Advanced R for Data Analysis","subtitle":"University of Utah"},"headingText":"General Information","containsRefs":false,"markdown":"\n\n[![](DELPHI-long.png)](https://uofuhealth.utah.edu/delphi-data-science-initiative)\n\n\n::: {.callout-note appearance=\"simple\" icon=false}\n**Date**: October 29 2024\n\n**Time**: 9:00 am -- 4:00 pm MDT\n\n**Location**: HELIX Rm - GS150 - Chokecherry\n\n**Instructors**: Rebecca Barter\n:::\n\n**Registration**: Use the [following link](????) 
to sign up for this workshop.\n\n[**Sign up for the DELPHI mailing list**](https://www.lists.utah.edu/wws/subscribe/delphi?previous_action=info) to stay in the loop about future workshops and funding opportunities.\n\n\n**What**: This workshop will build upon the foundations covered in the Introduction to R workshop, and will introduce more advanced tools, including creating your own custom functions, iterating with purrr map functions, advanced data manipulations with dplyr, and reshaping and joining datasets in R.\n\n**Who**: The course is aimed at graduate students, postdocs, staff, faculty, and other researchers across campus who are interested in taking their R skills to the next level to learn how to conduct more sophisticated data manipulations and analysis. \n\nParticipants in this workshop should already be comfortable working in Quarto (or R Markdown) documents in RStudio, and have basic knowledge of the the R programming language and the tidyverse, including the dplyr select, filter, mutate, and summarize functions, as well as creating basic visualizations using ggplot2.\n\n**Requirements**: Participants must bring a laptop onto which they can [download R and Rstudio](https://posit.co/download/rstudio-desktop/) (and you should do so before the workshop).\n\n**Contact**: Please email andrew.george@hsc.utah.edu or rebecca.barter@hsc.utah.edu for more information.\n\n\n### Posit Cloud\n\n[Click here to join the Posit Cloud workspace](https://posit.cloud/spaces/569320/join?access_code=wEBdUgVZCTVK1UwPUOvHG9EnMXi2NznUTO1uHruB)\n\n### Download files and data\n\nIf you are working in RStudio locally (rather than using posit cloud, above) [click here](content.zip \"download\") to download all of the complete and incomplete .qmd files and data files we will be using throughout the workshop.\n\n\n\n\n## Schedule\n\nNote that the schedule below serves as a guideline. 
The start, end, and break times are fixed, but timing for each topics covered may vary as we may go faster or slower through the content.\n\n\n\n| Time | Topic | \n| ----:|:----|\n| 9:00 | Introduction and Setup | \n| 9:30 | [Writing Functions](content/complete/01_custom_functions_complete.html) | [Writing Functions] | \n| 10:30 | [Break] | \n| 10:45 | [Iteration with Purrr](content/complete/02_iteration_purrr_complete.html) | \n| 12:00 | [Lunch] | \n| 1:00 | [Across()](content/complete/03_across_complete.html) |\n| 1:45 | [Recoding variables](content/complete/04_recoding_variables_complete.html)| \n| 2:30 | [Break] |\n| 2:45 | [Reshaping Data](content/complete/05_pivoting_complete.html) | \n| 3:15 | [Joining Data](content/complete/06_joining_complete.html) | \n| 4:00 | [End] | \n\n","srcMarkdownNoYaml":"\n\n[![](DELPHI-long.png)](https://uofuhealth.utah.edu/delphi-data-science-initiative)\n\n\n::: {.callout-note appearance=\"simple\" icon=false}\n**Date**: October 29 2024\n\n**Time**: 9:00 am -- 4:00 pm MDT\n\n**Location**: HELIX Rm - GS150 - Chokecherry\n\n**Instructors**: Rebecca Barter\n:::\n\n**Registration**: Use the [following link](????) to sign up for this workshop.\n\n[**Sign up for the DELPHI mailing list**](https://www.lists.utah.edu/wws/subscribe/delphi?previous_action=info) to stay in the loop about future workshops and funding opportunities.\n\n## General Information\n\n**What**: This workshop will build upon the foundations covered in the Introduction to R workshop, and will introduce more advanced tools, including creating your own custom functions, iterating with purrr map functions, advanced data manipulations with dplyr, and reshaping and joining datasets in R.\n\n**Who**: The course is aimed at graduate students, postdocs, staff, faculty, and other researchers across campus who are interested in taking their R skills to the next level to learn how to conduct more sophisticated data manipulations and analysis. 
\n\nParticipants in this workshop should already be comfortable working in Quarto (or R Markdown) documents in RStudio, and have basic knowledge of the the R programming language and the tidyverse, including the dplyr select, filter, mutate, and summarize functions, as well as creating basic visualizations using ggplot2.\n\n**Requirements**: Participants must bring a laptop onto which they can [download R and Rstudio](https://posit.co/download/rstudio-desktop/) (and you should do so before the workshop).\n\n**Contact**: Please email andrew.george@hsc.utah.edu or rebecca.barter@hsc.utah.edu for more information.\n\n\n### Posit Cloud\n\n[Click here to join the Posit Cloud workspace](https://posit.cloud/spaces/569320/join?access_code=wEBdUgVZCTVK1UwPUOvHG9EnMXi2NznUTO1uHruB)\n\n### Download files and data\n\nIf you are working in RStudio locally (rather than using posit cloud, above) [click here](content.zip \"download\") to download all of the complete and incomplete .qmd files and data files we will be using throughout the workshop.\n\n\n\n\n## Schedule\n\nNote that the schedule below serves as a guideline. 
The start, end, and break times are fixed, but timing for each topics covered may vary as we may go faster or slower through the content.\n\n\n\n| Time | Topic | \n| ----:|:----|\n| 9:00 | Introduction and Setup | \n| 9:30 | [Writing Functions](content/complete/01_custom_functions_complete.html) | [Writing Functions] | \n| 10:30 | [Break] | \n| 10:45 | [Iteration with Purrr](content/complete/02_iteration_purrr_complete.html) | \n| 12:00 | [Lunch] | \n| 1:00 | [Across()](content/complete/03_across_complete.html) |\n| 1:45 | [Recoding variables](content/complete/04_recoding_variables_complete.html)| \n| 2:30 | [Break] |\n| 2:45 | [Reshaping Data](content/complete/05_pivoting_complete.html) | \n| 3:15 | [Joining Data](content/complete/06_joining_complete.html) | \n| 4:00 | [End] | \n\n"},"formats":{"html":{"identifier":{"display-name":"HTML","target-format":"html","base-format":"html"},"execute":{"fig-width":7,"fig-height":5,"fig-format":"retina","fig-dpi":96,"df-print":"default","error":false,"eval":true,"cache":null,"freeze":false,"echo":true,"output":true,"warning":true,"include":true,"keep-md":false,"keep-ipynb":false,"ipynb":null,"enabled":null,"daemon":null,"daemon-restart":false,"debug":false,"ipynb-filters":[],"ipynb-shell-interactivity":null,"plotly-connected":true,"engine":"markdown"},"render":{"keep-tex":false,"keep-typ":false,"keep-source":false,"keep-hidden":false,"prefer-html":false,"output-divs":true,"output-ext":"html","fig-align":"default","fig-pos":null,"fig-env":null,"code-fold":"none","code-overflow":"scroll","code-link":false,"code-line-numbers":false,"code-tools":false,"tbl-colwidths":"auto","merge-includes":true,"inline-includes":false,"preserve-yaml":false,"latex-auto-mk":true,"latex-auto-install":true,"latex-clean":true,"latex-min-runs":1,"latex-max-runs":10,"latex-makeindex":"makeindex","latex-makeindex-opts":[],"latex-tlmgr-opts":[],"latex-input-paths":[],"latex-output-dir":null,"link-external-icon":false,"link-external-newwindow":false,"self
-contained-math":false,"format-resources":[],"notebook-links":true},"pandoc":{"standalone":true,"wrap":"none","default-image-extension":"png","to":"html","css":["styles.css"],"toc":true,"output-file":"index.html"},"language":{"toc-title-document":"Table of contents","toc-title-website":"On this page","related-formats-title":"Other Formats","related-notebooks-title":"Notebooks","source-notebooks-prefix":"Source","other-links-title":"Other Links","code-links-title":"Code Links","launch-dev-container-title":"Launch Dev Container","launch-binder-title":"Launch Binder","article-notebook-label":"Article Notebook","notebook-preview-download":"Download Notebook","notebook-preview-download-src":"Download Source","notebook-preview-back":"Back to Article","manuscript-meca-bundle":"MECA Bundle","section-title-abstract":"Abstract","section-title-appendices":"Appendices","section-title-footnotes":"Footnotes","section-title-references":"References","section-title-reuse":"Reuse","section-title-copyright":"Copyright","section-title-citation":"Citation","appendix-attribution-cite-as":"For attribution, please cite this work as:","appendix-attribution-bibtex":"BibTeX citation:","appendix-view-license":"View License","title-block-author-single":"Author","title-block-author-plural":"Authors","title-block-affiliation-single":"Affiliation","title-block-affiliation-plural":"Affiliations","title-block-published":"Published","title-block-modified":"Modified","title-block-keywords":"Keywords","callout-tip-title":"Tip","callout-note-title":"Note","callout-warning-title":"Warning","callout-important-title":"Important","callout-caution-title":"Caution","code-summary":"Code","code-tools-menu-caption":"Code","code-tools-show-all-code":"Show All Code","code-tools-hide-all-code":"Hide All Code","code-tools-view-source":"View Source","code-tools-source-code":"Source Code","tools-share":"Share","tools-download":"Download","code-line":"Line","code-lines":"Lines","copy-button-tooltip":"Copy to 
Clipboard","copy-button-tooltip-success":"Copied!","repo-action-links-edit":"Edit this page","repo-action-links-source":"View source","repo-action-links-issue":"Report an issue","back-to-top":"Back to top","search-no-results-text":"No results","search-matching-documents-text":"matching documents","search-copy-link-title":"Copy link to search","search-hide-matches-text":"Hide additional matches","search-more-match-text":"more match in this document","search-more-matches-text":"more matches in this document","search-clear-button-title":"Clear","search-text-placeholder":"","search-detached-cancel-button-title":"Cancel","search-submit-button-title":"Submit","search-label":"Search","toggle-section":"Toggle section","toggle-sidebar":"Toggle sidebar navigation","toggle-dark-mode":"Toggle dark mode","toggle-reader-mode":"Toggle reader mode","toggle-navigation":"Toggle navigation","crossref-fig-title":"Figure","crossref-tbl-title":"Table","crossref-lst-title":"Listing","crossref-thm-title":"Theorem","crossref-lem-title":"Lemma","crossref-cor-title":"Corollary","crossref-prp-title":"Proposition","crossref-cnj-title":"Conjecture","crossref-def-title":"Definition","crossref-exm-title":"Example","crossref-exr-title":"Exercise","crossref-ch-prefix":"Chapter","crossref-apx-prefix":"Appendix","crossref-sec-prefix":"Section","crossref-eq-prefix":"Equation","crossref-lof-title":"List of Figures","crossref-lot-title":"List of Tables","crossref-lol-title":"List of Listings","environment-proof-title":"Proof","environment-remark-title":"Remark","environment-solution-title":"Solution","listing-page-order-by":"Order By","listing-page-order-by-default":"Default","listing-page-order-by-date-asc":"Oldest","listing-page-order-by-date-desc":"Newest","listing-page-order-by-number-desc":"High to Low","listing-page-order-by-number-asc":"Low to 
High","listing-page-field-date":"Date","listing-page-field-title":"Title","listing-page-field-description":"Description","listing-page-field-author":"Author","listing-page-field-filename":"File Name","listing-page-field-filemodified":"Modified","listing-page-field-subtitle":"Subtitle","listing-page-field-readingtime":"Reading Time","listing-page-field-wordcount":"Word Count","listing-page-field-categories":"Categories","listing-page-minutes-compact":"{0} min","listing-page-category-all":"All","listing-page-no-matches":"No matching items","listing-page-words":"{0} words","listing-page-filter":"Filter","draft":"Draft"},"metadata":{"lang":"en","fig-responsive":true,"quarto-version":"1.5.56","theme":"sandstone","title":"Advanced R for Data Analysis","subtitle":"University of Utah"},"extensions":{"book":{"multiFile":true}}}},"projectFormats":["html"]} \ No newline at end of file diff --git a/.quarto/xref/ed6935ed b/.quarto/xref/ed6935ed index 6004c8f..d91ba97 100644 --- a/.quarto/xref/ed6935ed +++ b/.quarto/xref/ed6935ed @@ -1 +1 @@ -{"headings":["general-information","schedule"],"entries":[]} \ No newline at end of file +{"entries":[],"headings":["general-information","posit-cloud","download-files-and-data","schedule"]} \ No newline at end of file diff --git a/content.zip b/content.zip new file mode 100644 index 0000000..e21f868 Binary files /dev/null and b/content.zip differ diff --git a/content/complete/.Rhistory b/content/complete/.Rhistory index d0a4c8f..b08f20e 100644 --- a/content/complete/.Rhistory +++ b/content/complete/.Rhistory @@ -1,281 +1,396 @@ -# Chunk 4 -cor(cost_report$cost_charge_ratio, cost_report$patient_revenue_per_bed) -# Chunk 5 -cost_charge_ratio_model1 <- lm(cost_charge_ratio ~ patient_revenue_per_bed, -cost_report) -summary(cost_charge_ratio_model1) -# Chunk 6 -tibble(pred = cost_charge_ratio_model1$fitted.values, -obs = cost_report$cost_charge_ratio) |> -ggplot(aes(x = pred, y = obs)) + -geom_point() -# Chunk 7 -tibble(pred = 
cost_charge_ratio_model1$fitted.values, -obs = cost_report$cost_charge_ratio) |> -cor() -# Chunk 8 -tibble(residual = cost_charge_ratio_model1$residuals, -patient_revenue_per_bed = cost_report$patient_revenue_per_bed) |> -ggplot(aes(x = patient_revenue_per_bed, y = residual)) + -geom_point() -# Chunk 9 -cost_charge_ratio_model2 <- -lm(cost_charge_ratio ~ patient_revenue_per_bed + discharges_per_bed + -employees_per_bed + salary_per_bed + -patient_income_per_bed + patient_charges_per_bed + -hospital_state, -cost_report) -summary(cost_charge_ratio_model2) -# Chunk 10 -tibble(pred = cost_charge_ratio_model2$fitted.values, -obs = cost_report$cost_charge_ratio) |> -ggplot(aes(x = pred, y = obs)) + -geom_point() -# Chunk 11 -tibble(pred = cost_charge_ratio_model2$fitted.values, -obs = cost_report$cost_charge_ratio) |> -cor() -# Chunk 12 -tibble(residual = cost_charge_ratio_model2$residuals, -patient_revenue_per_bed = cost_report$patient_revenue_per_bed) |> -ggplot(aes(x = patient_revenue_per_bed, y = residual)) + -geom_point() -summary(cost_charge_ratio_model2) -round(coef(cost_charge_ratio_model2)["patient_revenue_per_bed"], 3) -cost_charge_ratio_model2$coefficients["patient_revenue_per_bed"] -cost_charge_ratio_model2 <- -lm(cost_charge_ratio ~ patient_revenue_per_bed + discharges_per_bed + -employees_per_bed + salary_per_bed + -patient_income_per_bed + patient_charges_per_bed, -cost_report) -summary(cost_charge_ratio_model2) -cost_report -cost_report |> count(facility_type) +cost_report |> +filter(cost_charge_ratio == max(cost_charge_ratio)) |> +select(hospital_name, hospital_city, hospital_state, cost_charge_ratio) +# Chunk 1 +#| message: false library(tidyverse) -hospital_data <- read_csv("hospital_data/CostReport_2021_Final.csv") -setwd("~/Library/CloudStorage/Box-Box/teaching/headlamp/statistics/assessments/project_prep") +cost_report <- read_csv("hospital_cost_report.csv") 
+setwd("~/Library/CloudStorage/Box-Box/teaching/headlamp/statistics/assessments/1_inference_fundamentals/inference_fundamentals_project_solutions") +# Chunk 1 +#| message: false library(tidyverse) -hospital_data <- read_csv("hospital_data/CostReport_2021_Final.csv") -colnames(hospital_data) -hospital_data_clean <- hospital_data |> -select(hospital_name = `Hospital Name`, -hospital_address = `Street Address`, -hospital_city = `City`, -hospital_state = `State Code`, -hospital_zip = `Zip Code`, -hospital_county = `County`, -rural_urban = `Rural Versus Urban`, -facility_type = `CCN Facility Type`, -full_time_employees = `FTE - Employees on Payroll`, -total_salary = `Total Salaries From Worksheet A`, -beds = `Number of Beds`, -discharges = `Total Discharges (V + XVIII + XIX + Unknown)`, -patient_charges = `Combined Outpatient + Inpatient Total Charges`, -patient_revenue = `Total Patient Revenue`, -income_patient_service = `Net Income from Service to Patients`, -cost_charge_ratio = `Cost To Charge Ratio`) |> -filter(!(hospital_state %in% c("AS", "PR", "GU", "VI", "DC")), -cost_charge_ratio < 10, -facility_type == "STH") |> -select(-facility_type) -set.seed(8328) -hospital_data_sample <- hospital_data_clean |> -drop_na(discharges, full_time_employees, beds, total_salary, patient_revenue, patient_charges) |> -sample_n(500) -write_csv(hospital_data_sample, "hospital_cost_report.csv") -summary(lm(cost_charge_ratio ~ rural_urban + -discharges + -full_time_employees + discharges, -hospital_data_sample)) -urban <- hospital_data_sample |> +cost_report <- read_csv("hospital_cost_report.csv") +mean(cost_report$cost_charge_ratio) +sd(cost_report$cost_charge_ratio) +cost_report |> +ggplot() + +geom_histogram(aes(x = cost_charge_ratio), +col = "white") + +geom_vline(xintercept = 1) +cost_report |> +filter(cost_charge_ratio == max(cost_charge_ratio)) |> +select(hospital_name, hospital_city, hospital_state, cost_charge_ratio) +cost_report |> +filter(cost_charge_ratio == 
min(cost_charge_ratio)) |> +select(hospital_name, hospital_city, hospital_state, cost_charge_ratio) +cost_report |> +mutate(discharges_per_bed = discharges / beds) |> +ggplot(aes(x = rural_urban, y = discharges_per_bed)) + +geom_boxplot() +rural_discharges_per_bed <- cost_report |> filter(rural_urban == "R") |> mutate(discharges_per_bed = discharges / beds) |> pull(discharges_per_bed) -rural <- hospital_data_sample |> +urban_discharges_per_bed <- cost_report |> filter(rural_urban == "U") |> mutate(discharges_per_bed = discharges / beds) |> pull(discharges_per_bed) -mean(urban) -mean(rural) -t.test(urban, rural) -urban <- hospital_data_sample |> +t.test(rural_discharges_per_bed, urban_discharges_per_bed) +cost_report |> +mutate(employees_per_bed = full_time_employees / beds) |> +ggplot(aes(x = rural_urban, y = employees_per_bed)) + +geom_boxplot() +t.test(rural_employees_per_bed, urban_employees_per_bed, alternative = "greater") +rural_employees_per_bed <- cost_report |> filter(rural_urban == "R") |> mutate(discharges_per_bed = full_time_employees / beds) |> pull(discharges_per_bed) -rural <- hospital_data_sample |> +urban_employees_per_bed <- cost_report |> filter(rural_urban == "U") |> mutate(discharges_per_bed = full_time_employees / beds) |> pull(discharges_per_bed) -mean(urban) -mean(rural, na.rm = T) -t.test(urban, rural) -urban <- hospital_data_sample |> +t.test(rural_employees_per_bed, urban_employees_per_bed, alternative = "greater") +cost_report |> +mutate(patient_charges_revenue_ratio = patient_charges / patient_revenue) |> +ggplot(aes(x = rural_urban, y = patient_charges_revenue_ratio)) + +geom_boxplot() +t.test(rural_patient_charges_revenue_ratio, urban_patient_charges_revenue_ratio) +rural_patient_charges_revenue_ratio <- cost_report |> filter(rural_urban == "R") |> -mutate(discharges_per_bed = patient_charges / patient_revenue) |> -pull(discharges_per_bed) -rural <- hospital_data_sample |> +mutate(patient_charges_revenue_ratio = patient_charges / 
patient_revenue) |>
+pull(patient_charges_revenue_ratio)
+urban_patient_charges_revenue_ratio <- cost_report |>
 filter(rural_urban == "U") |>
-mutate(discharges_per_bed = patient_charges / patient_revenue) |>
-pull(discharges_per_bed)
-mean(urban, na.rm = T)
-mean(rural, na.rm = T)
-t.test(urban, rural)
-cost_report <- read_csv("hospital_cost_report.csv")
-mean(cost_report$cost_charge_ratio)
-cost_report |>
-ggplot() +
-geom_histogram(aes(x = cost_charge_ratio))
-cost_report |>
-ggplot() +
-geom_histogram(aes(x = cost_charge_ratio),
-col = "white") +
-geom_vline(xintercept = 1)
-cost_report |>
-filter(cost_charge_ratio < 0.1)
-cost_report |>
-filter(cost_charge_ratio < 0.01)
-cost_report |>
-filter(cost_charge_ratio < 0.05)
-cost_report |>
-filter(cost_charge_ratio < 0.05) |>
-select(hospital_name, hospital_city, hospital_state)
+mutate(patient_charges_revenue_ratio = patient_charges / patient_revenue) |>
+pull(patient_charges_revenue_ratio)
+rural_patient_charges_revenue_ratio <- cost_report |>
+filter(rural_urban == "R") |>
+mutate(patient_charges_revenue_ratio = patient_charges / patient_revenue) |>
+pull(patient_charges_revenue_ratio)
+urban_patient_charges_revenue_ratio <- cost_report |>
+filter(rural_urban == "U") |>
+mutate(patient_charges_revenue_ratio = patient_charges / patient_revenue) |>
+pull(patient_charges_revenue_ratio)
+t.test(rural_patient_charges_revenue_ratio, urban_patient_charges_revenue_ratio)
+# recreate the plot without the outlier
 cost_report |>
-filter(cost_charge_ratio == max(cost_charge_ratio)) |>
-select(hospital_name, hospital_city, hospital_state)
+mutate(patient_charges_revenue_ratio = patient_charges / patient_revenue) |>
+filter(patient_charges_revenue_ratio < 1.5) |>
+ggplot(aes(x = rural_urban, y = patient_charges_revenue_ratio)) +
+geom_boxplot()
+# reconduct the test without the outlier
+rural_patient_charges_revenue_ratio <- cost_report |>
+filter(rural_urban == "R") |>
+mutate(patient_charges_revenue_ratio = patient_charges / patient_revenue) |>
+filter(patient_charges_revenue_ratio < 1.5) |>
+pull(patient_charges_revenue_ratio)
+urban_patient_charges_revenue_ratio <- cost_report |>
+filter(rural_urban == "U") |>
+mutate(patient_charges_revenue_ratio = patient_charges / patient_revenue) |>
+filter(patient_charges_revenue_ratio < 1.5) |>
+pull(patient_charges_revenue_ratio)
+t.test(rural_patient_charges_revenue_ratio, urban_patient_charges_revenue_ratio)
 cost_report |>
-filter(cost_charge_ratio == min(cost_charge_ratio)) |>
-select(hospital_name, hospital_city, hospital_state)
-hospital_Data
-hospital_data
-sd(hospital_data$`Cost To Charge Ratio`)
-sd(hospital_data$`Cost To Charge Ratio`, na.rm = T)
-mean(hospital_data$`Cost To Charge Ratio`, na.rm = T)
-median(hospital_data$`Cost To Charge Ratio`, na.rm = T)
-mean(cost_report$cost_charge_ratio)
-sd(hospital_data$`Cost To Charge Ratio`, na.rm = T)
-hospital_data_clean <- hospital_data |>
-select(hospital_name = `Hospital Name`,
-hospital_address = `Street Address`,
-hospital_city = `City`,
-hospital_state = `State Code`,
-hospital_zip = `Zip Code`,
-hospital_county = `County`,
-rural_urban = `Rural Versus Urban`,
-facility_type = `CCN Facility Type`,
-full_time_employees = `FTE - Employees on Payroll`,
-total_salary = `Total Salaries From Worksheet A`,
-beds = `Number of Beds`,
-discharges = `Total Discharges (V + XVIII + XIX + Unknown)`,
-patient_charges = `Combined Outpatient + Inpatient Total Charges`,
-patient_revenue = `Total Patient Revenue`,
-income_patient_service = `Net Income from Service to Patients`,
-cost_charge_ratio = `Cost To Charge Ratio`) |>
-filter(!(hospital_state %in% c("AS", "PR", "GU", "VI", "DC")),
-cost_charge_ratio < 10,
-facility_type == "STH") |>
-select(-facility_type)
-mean(hospital_data_clean$cost_charge_ratio)
-sd(hospital_data_clean$cost_charge_ratio)
-mean(cost_report$cost_charge_ratio)
-sd(cost_report$cost_charge_ratio)
-simulated_means <- rnorm(1000, 0.26, 0.15 / sqrt(500))
-n <- nrow(cost_report)
-pop_mean <- 0.26
-pop_sd <- 0.15
-n <- nrow(cost_report)
-simulated_means <- rnorm(1000, pop_mean, pop_sd / sqrt(n))
-ggplot() +
-geom_histogram(aes(x = simulated_means))
-simulated_means <- rnorm(1000, mean = pop_mean, sd = pop_sd / sqrt(n))
-ggplot() +
-geom_histogram(aes(x = simulated_means))
+mutate(patient_charges_revenue_ratio = patient_charges / patient_revenue) |>
+filter(patient_charges_revenue_ratio > 1.5) |>
+select(hospital_name, hospital_state, beds, patient_charges, patient_revenue, patient_charges_revenue_ratio)
+cost_report |> sample_n(10) |> pull(patient_revenue)
+# pull up the help page for n_distinct
+?n_distinct
+n_distinct(demographics$gender)
 # Chunk 1
-# define a function called `add_one()` that adds 1 to its argument, x
-add_one <- function(x) {
-x + 1
-}
+# load the tidyverse and demographics dataset
+library(tidyverse)
+demographics <- read_csv("data/demographics.csv")
+# apply n_distinct to the "gender" column of demographics
+n_distinct(demographics$gender)
+setwd("~/Library/CloudStorage/Box-Box/teaching/live_workshops_utah/2024-10-29-adv_r/website/content/complete")
+# Chunk 1
+# load the tidyverse and demographics dataset
+library(tidyverse)
+demographics <- read_csv("data/demographics.csv")
 # Chunk 2
-# apply function add_one() to 5
-add_one(5)
+# pull up the help page for n_distinct
+?n_distinct
+# apply n_distinct to the "gender" column of demographics
+n_distinct(demographics$gender)
+# apply n_distinct to the "gender" column of demographics
+n_distinct(demographics$gender, na.rm = FALSE)
+# apply n_distinct to the "gender" column of demographics
+n_distinct(demographics$gender, na.rm = TRUE)
+# extract the third entry from my_list
+my_list[3]
+# Chunk 1
+# load the tidyverse and demographics dataset
+library(tidyverse)
+demographics <- read_csv("data/demographics.csv")
+# Chunk 2
+# pull up the help page for n_distinct
+?n_distinct
 # Chunk 3
-# apply function add_one() to 5 using a *named argument*
-add_one(x = 5)
+# apply n_distinct to the "gender" column of demographics
+n_distinct(demographics$gender)
 # Chunk 4
-# re-define add_one() with two lines of code: x - 1 and x + 1
-add_one <- function(x) {
-x - 1
-x + 1
-}
+# use map() to apply n_distinct to every column of demographics
+map(demographics, n_distinct)
 # Chunk 5
-# apply add_one to 5 again
-add_one(5)
+# use map() to apply exp() to the vector c(4, 5, 6)
+map(c(4, 5, 6), exp)
 # Chunk 6
-# redefine add_one() but apply the return statement to x - 1 only
-add_one <- function(x) {
-return(x - 1)
-x + 1
-}
-# Chunk 7
-# apply add_one to 5 again
-add_one(5)
-# Chunk 9
-# define function cube()
-cube <- function(value) {
-cubed_value = value^3
-return(cubed_value)
-}
-# apply cube() to 3
-cube(value = 3)
-add_xy()
-add_xy
-add_x2y
+# define a list called my_list with three elements: 1, 4, and 7
+my_list <- list(1, 4, 7)
+type(my_list[3])
+class(my_list[3])
+# ask the class of the object above
+class(my_list[3])
+my_list[[3]]
+# extract the third entry from my_list using []
+my_list[3]
+my_list[[3]]
+# try to add 1 to my_list
+my_list + 1
+# create a list containing (1) the head of demographics, (2) the value 2, and (3) a vector containing "a" and "b"
+my_complex_list <- list(head(demographics), 2, c("a", "b"))
+my_complex_list
+# create a named version of my_complex_list
+my_complex_list <- list(data = head(demographics),
+value = 2,
+vector = c("a", "b"))
+my_complex_list
+# extract one of the elements from my_complex_list using $
+my_complex_list$data
+my_complex_list[["data"]]
+demographics_class <- map(demographics, class)
+demographics_class
+demographics_class$household_income
+demographics_class[ncol(demographics)]
+demographics_class["household_income"]
+# compute the number of missing values in the "pregnant" column of demographics
+sum(is.na(demographics$pregnant))
+# use a map function to compute the number of missing values in every column
+map(demographics, ~{sum(is.na(.))})
+map_df(demographics, class)
+# apply map_df to demographics to determine the class of each column, outputting a "wide" data frame
+map_df(demographics, class)
+tibble(col_class = class(demographics$gender))
+tibble(col_class = class(demographics$pregnant))
+# use tibble() to create a single-column tibble containing the class of demographics$pregnant
+tibble(col_class = class(demographics$pregnant))
+# modify this code to be the function call in map_df to create a
+# long-form data frame
+map_df(demographics, ~tibble(col_class = class(.)))
+# provide an .id argument to include the original column names as a variable
+map_df(demographics, ~tibble(col_class = class(.)), .id = "variable_name")
+map_df(demographics,
+~tibble(n_missing = sum(is.na(.))),
+.id = "variable_name")
+# create a bar chart of the number of missing values in each column
+map_df(demographics,
+~tibble(n_missing = sum(is.na(.))),
+.id = "variable_name") |>
+arrange(n_missing) |>
+mutate(variable_name = fct_inorder(variable_name)) |>
+ggplot() +
+geom_col(aes(x = variable_name, y = n_missing)) +
+scale_y_continuous(expand = c(0, 0)) +
+theme_classic() +
+theme(axis.text.x = element_text(angle = 90,
+hjust = 1,
+vjust = 0.5))
+# create a bar chart of the number of missing values in each column
+# use factors to order the columns by the number of missing values
+map_df(demographics,
+~tibble(n_missing = sum(is.na(.))),
+.id = "variable_name") |>
+arrange(n_missing) |>
+mutate(variable_name = fct_inorder(variable_name)) |>
+ggplot() +
+geom_col(aes(x = variable_name, y = n_missing)) +
+scale_y_continuous(expand = c(0, 0)) +
+theme_classic() +
+theme(axis.text.x = element_text(angle = 90,
+hjust = 1,
+vjust = 0.5))
+map_dbl(demographics,
+~sum(is.na(.)))
+map_dbl(demographics,
+~sum(is.na(.))) |>
+enframe()
+map_dbl(demographics, ~sum(is.na(.))) |>
+enframe()
+# apply toupper to "abc"
+toupper("abc")
+# one way to convert all character columns to uppercase
+# create a logical vector for identifying character columns using map_chr()
+character_cols <- map_chr(demographics, class) == "character"
+character_cols
+# look at demographics
+demographics
+# create a logical vector for identifying character columns using map_chr()
+character_cols <- map_chr(demographics, class) == "character"
+# re-assign the character columns to an upper-case version of themselves
+# using map_df() applied to demographics[character_cols] with toupper()
+demographics[character_cols] <- map_df(demographics[character_cols], toupper)
+# look at demographics
+demographics
+# apply toupper to interview_examination with mutate
+demographics |> mutate(interview_examination = toupper(interview_examination))
 # Chunk 1
-# define a function called `add_one()` that adds 1 to its argument, x
-add_one <- function(x) {
-x + 1
-}
+# load the tidyverse and demographics dataset
+library(tidyverse)
+demographics <- read_csv("data/demographics.csv")
+head(demographics)
 # Chunk 2
-# apply function add_one() to 5
-add_one(5)
+# apply toupper() to the string value "abc"
+toupper("abc")
 # Chunk 3
-# apply function add_one() to 5 using a *named argument*
-add_one(x = 5)
-# Chunk 4
-# re-define add_one() with two lines of code: x - 1 and x + 1
-add_one <- function(x) {
-x - 1
-x + 1
-}
-# Chunk 5
-# apply add_one to 5 again
-add_one(5)
-# Chunk 6
-# redefine add_one() but apply the return statement to x - 1 only
-add_one <- function(x) {
-return(x - 1)
-x + 1
-}
-# Chunk 7
-# apply add_one to 5 again
-add_one(5)
-# Chunk 9
-# define function cube()
-cube <- function(value) {
-cubed_value = value^3
-return(cubed_value)
-}
-# Chunk 10
-# apply cube() to 3
-cube(value = 3)
-# Chunk 11
-# define a function called add_x2y that computes x + 2y
-add_x2y <- function(x, y) {
-x + 2*y
-}
-# Chunk 12
-# apply add_x2y() to x = 2 and y = 5
-add_xy(2, 5)
-add_x2y
+# create a copy of demographics
+demographics_tmp <- demographics
+# create a logical vector for identifying character columns using map_chr()
+character_cols <- map_chr(demographics_tmp, class) == "character"
+# re-assign the character columns to an upper-case version of themselves
+# using map_df() applied to demographics_tmp[character_cols] with toupper()
+demographics_tmp[character_cols] <- map_df(demographics_tmp[character_cols], toupper)
+# look at demographics_tmp
+demographics_tmp
+# look at demographics (it should still have lowercase)
+demographics
+# create a new version of demographics
+demographics_tmp <- demographics
+# apply toupper to interview_examination with mutate
+demographics_tmp |> mutate(interview_examination = toupper(interview_examination))
+# use across inside mutate to apply toupper() to all columns where(is.character) is TRUE...
+demographics |> mutate(across(where(is.character), toupper))
+# use across inside mutate to apply as.numeric to all columns
+# that ends_with("_usa")...
+demographics |> mutate(across(ends_with("_usa"), as.numeric)) |>
+print(width = Inf)
+# select all columns that contain "age"
+demographics |> select(contains("age"))
+# select all columns that are logical
+demographics |> select(where(is.logical))
+demographics |>
+mutate(completed_high_school = if_else(
+education >= 3,
+true = "yes",
+false = "no")) |>
+select(education, completed_high_school) |>
+sample_n(10)
+# apply count() to the marital_status column
+demographics |> count(marital_status)
+# use mutate and case_when() to create marital_status_numeric
+# with the values above
+demographics |> mutate(marital_status_numeric = case_when(
+marital_status == "married" ~ 3,
+marital_status == "living_with_partner" ~ 2,
+marital_status %in% c("divorced", "widowed", "never_married", "separated") ~ 1)) |>
+select(marital_status, marital_status_numeric)
+# Chunk 1
+# load the tidyverse and the urine_albumin_creatinine NHANES data
+library(tidyverse)
+labs_data <- read.csv("data/urine_albumin_creatinine.csv")
+setwd("~/Library/CloudStorage/Box-Box/teaching/live_workshops_utah/2024-10-29-adv_r/website/content/complete")
+labs_data <- read.csv("data/urine_albumin_creatinine.csv")
+labs_data <- read.csv("data/urine_albumin_creatinine.csv")
+# take a look at the first 10 rows
+head(labs_data, 10)
+# use pivot_wider() to convert the labs data from a
+# long format to a wide format.
+# arguments are `names_from` and `values_from`
+labs_data_wide <- labs_data |>
+pivot_wider(names_from = "lab", values_from = "measurement")
+labs_data_wide
+# create boxplots using the long format for the measurements from each lab
+labs_data |> ggplot() +
+geom_boxplot(aes(x = lab, y = measurement))
+# create boxplots using the long format for the measurements from each lab
+labs_data |> ggplot() +
+geom_boxplot(aes(x = lab, y = measurement)) +
+scale_y_log10()
+demographics <- read_csv("data/demographics.csv")
+demographics
+demographics |> colnames()
+demographics |> pivot_longer(cols = starts_with("education"))
+demographics |>
+pivot_longer(cols = starts_with("education")) |>
+select(respondent_id, starts_with("education"))
+demographics |>
+pivot_longer(cols = starts_with("education")) |>
+select(respondent_id, starts_with("education"))
+demographics |>
+select(respondent_id, starts_with("education")) |>
+pivot_longer(cols = starts_with("education"))
+demographics |>
+select(respondent_id, gender, age_years, race) |>
+pivot_longer(cols = any_of(gender, age_years, race))
+demographics |>
+select(respondent_id, gender, age_years, race) |>
+pivot_longer(cols = any_of(c("gender", "age_years", "race")))
+demographics |>
+select(respondent_id, gender, age_years, race) |>
+pivot_longer(cols = all_of(c("gender", "age_years", "race")))
+demographics |>
+select(respondent_id, gender, age_years, race) |>
+pivot_longer(cols = all_of(c("gender", "marital_status", "race")))
+demographics |>
+select(respondent_id, gender, marital_status, race) |>
+pivot_longer(cols = all_of(c("gender", "marital_status", "race")))
+n_unique(labs$respondent_id)
+unique(labs$respondent_id)
+n_distinct(labs$respondent_id)
+length(unique((labs$respondent_id))
+)
+length(unique(labs$respondent_id))
+labs
+# Chunk 1
+# load the tidyverse, the demographics, and the urine_albumin_creatinine data
+library(tidyverse)
+demographics <- read_csv("data/demographics.csv")
+labs <- read_csv("data/urine_albumin_creatinine.csv")
+# Compute the proportion of people in demographics who are also in labs
+sum(demographics$respondent_id %in% labs$respondent_id) / length(unique(labs$respondent_id))
+n_distinct(labs$respondent_id)
+# Compute the proportion of people in demographics who are also in labs
+sum(demographics$respondent_id %in% labs$respondent_id) / n_distinct(labs$respondent_id)
+# Compute the proportion of people in demographics who are also in labs
+sum(demographics$respondent_id %in% labs$respondent_id) / n_distinct(demographics$respondent_id)
+# proportion of people in labs who are also in demographics
+sum(labs$respondent_id %in% demographics$respondent_id) / n_distinct(labs$respondent_id)
+# proportion of people in labs who are also in demographics
+sum(labs$respondent_id %in% demographics$respondent_id) / n_distinct(labs$respondent_id)
+labs$respondent_id %in% demographics$respondent_id
+n_distinct(labs$respondent_id)
+nrow(labs)
+sum(labs$respondent_id %in% demographics$respondent_id)
+# proportion of people in labs who are also in demographics
+sum(unique(labs$respondent_id) %in% unique(demographics$respondent_id)) / n_distinct(labs$respondent_id)
+# Compute the proportion of people in demographics who are also in labs
+sum(unique(demographics$respondent_id) %in% unique(labs$respondent_id)) / n_distinct(demographics$respondent_id)
+# are the respondent ID values in demographics unique?
+identical(unique(demographics$respondent_id), demographics$respondent_id)
+# check whether the respondent ID values in labs are unique?
+identical(unique(labs$respondent_id), labs$respondent_id)
+# Compute the proportion of people in demographics who are also in labs
+sum(unique(demographics$respondent_id) %in% unique(labs$respondent_id)) / n_distinct(demographics$respondent_id)
+print(left_join(demographics, labs, by = "respondent_id"), width = Inf)
+print(full_join(demographics, labs, by = "respondent_id"), width = Inf)
+# check the number of rows in the joined data
+nrow(left_join(demographics, labs, by = "respondent_id"))
+nrow(demographics)
+# join demographics and labs_wide
+print(left_join(demographics, labs_wide, by = "respondent_id"), width = Inf)
+# compute the wide format of the labs data
+labs_wide <- labs |>
+pivot_wider(names_from = "lab",
+values_from = "measurement")
+labs_wide
+# join demographics and labs_wide
+print(left_join(demographics, labs_wide, by = "respondent_id"), width = Inf)
+# apply add_x2y() to x = 2 and y = 3 and output_type = "logical"
+#| error: true
+add_x2y(2, 3, output_type = "logical")
+# apply add_x2y() to x = 2 and y = 3 with no output_type specified
+add_x2y(2, 3)
 # Chunk 1
 # define a function called `add_one()` that adds 1 to its argument, x
 add_one <- function(x) {
@@ -333,180 +448,3 @@ add_x2y(y = 5, x = 2)
 # Chunk 16
 #| error: true
 y
-add_x2y <- function(x = 1, y = 1, output_type = c("numeric", "character")) {
-# this line will set the default value of output_type to be "numeric"
-# and will only allow options provided in the default vector
-output_type <- match.arg(output_type)
-# stop condition if x or y are not numeric
-if (!is.numeric(x) | !is.numeric(y)) {
-stop("'x' and 'y' must be numeric")
-}
-# computing my result
-result <- x + 2 * y
-# returning result in the format specified by output_type
-if (output_type == "numeric") {
-return(result)
-} else if (output_type == "character") {
-return(as.character(result))
-}
-}
-# apply add_x2y() to x = 2 and y = 3 and output_type = "character"
-add_x2y(x = 2, y = 3, output_type = "character")
-# apply add_x2y() to x = 2 and y = 3 with no output_type specified
-add_x2y(2, 3)
-#| error: true
-add_x2y(2, 3, output_type = "charcter")
-#| error: true
-add_x2y(2, 3, output_type = "blah")
-# apply add_x2y() to x = 2 and y = 3 and output_type = "logical"
-#| error: true
-add_x2y(2, 3, output_type = "logical")
-?geom_point
-# load in the tidyverse and demographics NHANES data
-library(tidyverse)
-demographics <- read_csv("data/demographics.csv")
-setwd("~/Library/CloudStorage/Box-Box/teaching/live_workshops_utah/2024-10-29-adv_r/website/content/complete")
-# load in the tidyverse and demographics NHANES data
-library(tidyverse)
-demographics <- read_csv("data/demographics.csv")
-setwd("~/Library/CloudStorage/Box-Box/teaching/live_workshops_utah/2024-10-29-adv_r/website/content/complete")
-demographics <- read_csv("data/demographics.csv")
-# take a look at the demographics data
-head(demographics)
-# compute boxplots for the age_years distribution across different levels of the served_active_duty_us variable
-demographics |> ggplot() +
-geom_boxplot(aes(x = served_active_duty_us,
-y = age_years))
-# compute boxplots for the age_years distribution across different levels of the served_active_duty_us variable
-demographics |> ggplot() +
-geom_boxplot(aes(x = served_active_duty_us,
-y = age_years), na.rm = TRUE)
-# turn the above boxplot code into a function called createBoxplots()
-createBoxplots <- function(variable_name) {
-demographics |> ggplot() +
-geom_boxplot(aes(x = variable_name,
-y = age_years))
-}
-# try to apply createBoxplots to the language_english column of demographics
-createBoxplots(language_english)
-createBoxplots("language_english")
-# update our createBoxplots() function so that it uses tidy_evaluation
-createBoxplots <- function(variable_name) {
-demographics |> ggplot() +
-geom_boxplot(aes(x = {{ variable_name }},
-y = age_years))
-}
-createBoxplots(language_english)
-# apply createBoxplots() to the marital_status column
-createBoxplots(marital_status)
-View(demographics)
-createBoxplots(pregnant)
-createBoxplots(marital_status) +
-theme(axis.text.x = element_text(angle = 90,
-hjust = 1,
-vjust = 0.5))
-library(patchwork)
-createBoxplots(language_english) + createBoxplots(marital_status)
-(createBoxplots(language_english) + createBoxplots(marital_status)) / createBoxplots(pregnant)
-(createBoxplots(language_english) + createBoxplots(pregnant)) / createBoxplots(marital_status)
-createOrderedBars <- function(variable_name) {
-demographics |>
-# group by the column provided
-group_by({{ variable_name }}) |>
-# compute the mean age
-summarize(mean_age = mean(age_years)) |>
-# create the bar plot
-ggplot() +
-geom_col(aes(x = {{ variable_name }},
-y = mean_age))
-}
-createOrderedBars(marital_status)
-demographics |>
-# group by the column provided
-group_by(marital_status) |>
-# compute the mean age
-summarize(mean_age = mean(age_years))
-demographics |>
-group_by(marital_status) |>
-summarize(mean_age = mean(age_years)) |>
-ggplot() +
-geom_col()
-demographics |>
-group_by(marital_status) |>
-summarize(mean_age = mean(age_years)) |>
-ggplot() +
-geom_col(aes(x = marital_status,
-y = mean_age))
-createOrderedBars <- function(variable_name, ascending = TRUE) {
-mean_age <- demographics |>
-# group by the column provided
-group_by({{ variable_name }}) |>
-# compute the mean age
-summarize(mean_age = mean(age_years))
-if (ascending) {
-mean_age <- mean_age |>
-# arrange in increasing order of mean_age
-arrange(mean_age) |>
-# modify selected_variable so that it is a factor whose levels are in
-# increasing order of mean_age
-mutate(selected_variable = fct_inorder({{ variable_name }}))
-}
-# create the bar plot
-mean_age |>
-ggplot() +
-geom_col(aes(x = selected_variable,
-y = mean_age))
-}
-createOrderedBars(marital_status)
-createOrderedBars(marital_status, ascending = TRUE)
-createOrderedBars(marital_status, ascending = FALSE)
-createOrderedBars <- function(variable_name, ascending = TRUE) {
-mean_age <- demographics |>
-# group by the column provided
-group_by({{ variable_name }}) |>
-# compute the mean age
-summarize(mean_age = mean(age_years))
-if (ascending) {
-mean_age <- mean_age |>
-# arrange in increasing order of mean_age
-arrange(mean_age) |>
-# modify selected_variable so that it is a factor whose levels are in
-# increasing order of mean_age
-mutate(selected_variable = fct_inorder({{ variable_name }}))
-} else {
-mean_age <- mean_age |>
-mutate(selected_variable = variable_name)
-}
-# create the bar plot
-mean_age |>
-ggplot() +
-geom_col(aes(x = selected_variable,
-y = mean_age))
-}
-createOrderedBars(marital_status, ascending = FALSE)
-createOrderedBars <- function(variable_name, ascending = TRUE) {
-mean_age <- demographics |>
-# group by the column provided
-group_by({{ variable_name }}) |>
-# compute the mean age
-summarize(mean_age = mean(age_years))
-if (ascending) {
-mean_age <- mean_age |>
-# arrange in increasing order of mean_age
-arrange(mean_age) |>
-# modify selected_variable so that it is a factor whose levels are in
-# increasing order of mean_age
-mutate(selected_variable = fct_inorder({{ variable_name }}))
-} else {
-mean_age <- mean_age |>
-mutate(selected_variable = {{ variable_name }})
-}
-# create the bar plot
-mean_age |>
-ggplot() +
-geom_col(aes(x = selected_variable,
-y = mean_age))
-}
-createOrderedBars(marital_status, ascending = TRUE)
-createOrderedBars(marital_status, ascending = FALSE)
-createOrderedBars(marital_status)
diff --git a/docs/content.zip b/docs/content.zip
new file mode 100644
index 0000000..e21f868
Binary files /dev/null and b/docs/content.zip differ
diff --git a/docs/index.html b/docs/index.html
index 3d956a4..1bc615e 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -108,7 +108,11 @@
Requirements: Participants must bring a laptop onto which they can download R and RStudio (and you should do so before the workshop).
Contact: Please email andrew.george@hsc.utah.edu or rebecca.barter@hsc.utah.edu for more information.
+If you are working in RStudio locally (rather than using Posit Cloud, above), click here to download all of the complete and incomplete .qmd files and data files we will be using throughout the workshop.
+