pilot (A) for analytic workflow #5

Open
andkov opened this issue Mar 26, 2016 · 1 comment

andkov commented Mar 26, 2016

@smhofer proposed the following plan for the reproducible report(s):

  • Section 1: Read in each of five data sets
  • Section 2: Relabel and transform variables (organized by data set; as discussed yesterday)
  • Section 3: Combine into single data set (include study level dummy variables)
  • Section 4: Estimate models (ever smoked as primary outcome to start; logistic regression)
  • Section 5: Table results and compute odds ratio for covariates
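
For Section 5, the odds ratio for a covariate in a logistic regression is the exponentiated coefficient. A minimal sketch in Python (the project's scripts are in R; the function name and the numbers here are illustrative assumptions, not project results):

```python
import math

def odds_ratio(beta, se, z=1.96):
    """Odds ratio and approximate 95% Wald interval from a logit coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Illustrative numbers only, not results from this project.
or_, lo, hi = odds_ratio(beta=0.0, se=0.1)
print(or_, lo, hi)
```

A coefficient of zero corresponds to an odds ratio of 1, and the interval is computed on the log-odds scale before exponentiating.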
@andkov andkov changed the title Meta-data manholes in data grooming work flow pilot (A) for analytic workflow Mar 26, 2016
andkov added a commit that referenced this issue Mar 26, 2016
andkov added a commit that referenced this issue Mar 26, 2016
andkov commented Mar 26, 2016

@smhofer, here is my commentary on your five sections. I need to introduce a slight modification to account for the way the scripts actually deal with the data. Specifically, I suggest implementing the processes in Sections 2 and 3 separately for each set of harmonized variables. It's more practical to organize it this way, and it will not change the end result of Section 3: the creation of a combined data set.

The script ./manipulation/0-ellis-island.r produces a working report, ./manipulation/stitched-output/0-ellis-island.md. This report accomplishes Sections (1), (2b), and (3a). I've copiously annotated it, and it's meant to be part of the live documentation. This is where one will go to find out how specifically the processes in Sections (1), (2b), and (3a) have been implemented.

Note that Section (2a) is accomplished outside of R by editing the file ./data/shared/meta-data-map.csv. I don't think it's a good idea for projects like these to rename variables by hand inside the scripts. This is my biggest lesson learned from Portland, so I'd like to gently insist on this.
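
To illustrate the idea of metadata-driven renaming, here is a minimal sketch in Python (the project's scripts are in R). The column names `study_name`, `name`, and `name_new`, the study labels, and the variable names are hypothetical; the actual layout of meta-data-map.csv may differ.

```python
import csv
import io

# Hypothetical excerpt of a metadata map; the real columns in
# ./data/shared/meta-data-map.csv may differ.
META_CSV = """study_name,name,name_new,construct
study_a,SMOKER,smoke_now,smoking
study_a,SCHOOL,edu_level,education
study_b,SMK94,smoke_now,smoking
"""

def load_rename_map(text):
    """Return {(study_name, raw_name): new_name} from the metadata rows."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["study_name"], row["name"]): row["name_new"] for row in reader}

def rename_record(study, record, rename_map):
    """Rename the keys of one unit record; unmapped names are kept as-is."""
    return {rename_map.get((study, key), key): value
            for key, value in record.items()}

rename_map = load_rename_map(META_CSV)
renamed = rename_record("study_a", {"SMOKER": 1, "SCHOOL": 3, "id": 41}, rename_map)
print(renamed)
```

The point of this design is that changing a variable's name means editing one row of the CSV, with no change to the script.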

I'm moving on to developing the scripts to implement Section (3c) for smoking.

  • Section (1)
    • Read in each of five data sets, extract raw metadata
  • Section (2)
    • (2a): Edit and augment metadata to provide instructions on how to relabel, classify, and transform variables
    • (2b): Create a single data object dto containing unmerged, raw unit data from each study and a single metadata file containing metadata for variables from all studies (e.g. what type of variable it is, how the variable should be renamed, etc.).
  • Section (3)
    • (3a): Using unit and meta data from the main data object (dto), create datasets that aggregate variables with shared properties of the metadata (e.g. "all variables that have smoking for the value of the construct column in the metadata set").
    • (3b): For each unit of harmonization (e.g. smoking, education, etc.), transform the raw variables in the corresponding dataset to create harmonized variables. Evaluate each harmonized variable separately. (Managing a large, combined file during harmonization is inconvenient. In addition, there might be a need or interest to inspect individual files during the process, and this makes it easier to provision.)
    • (3c): Collect the transformed datasets containing harmonized variables and combine them into a single data file, with study_name as a factor.
  • Section (4)
    • Estimate models (this will need a bit more specifics, but I think they will emerge to us as we complete Section 3)
  • Section (5)
    • Organize model outputs to evaluate across studies, outcomes, and covariates (this is potentially a bottomless pit, so greater specifics will be crucial. It's hard to comment on this step without knowing what the model results will look like.)
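
Steps (2b), (3a), and (3c) can be sketched roughly as follows, in Python rather than the project's R. The keys `unit_data` and `meta_data`, the study labels, and the variable names are hypothetical stand-ins; the real dto may be organized differently. Step (3b) is skipped here, so the raw subsets are stacked directly just to show the mechanics.

```python
# Hypothetical miniature of the dto: raw unit data kept separately per
# study, plus one metadata table covering the variables of all studies.
dto = {
    "unit_data": {
        "study_a": [{"id": 1, "SMOKER": "yes", "AGE": 70}],
        "study_b": [{"id": 7, "SMK94": 0, "AGE94": 64}],
    },
    "meta_data": [
        {"study_name": "study_a", "name": "SMOKER", "construct": "smoking"},
        {"study_name": "study_a", "name": "AGE", "construct": "age"},
        {"study_name": "study_b", "name": "SMK94", "construct": "smoking"},
        {"study_name": "study_b", "name": "AGE94", "construct": "age"},
    ],
}

def select_by_construct(dto, construct):
    """(3a): per study, keep id plus the variables tagged with `construct`."""
    subsets = {}
    for study, rows in dto["unit_data"].items():
        keep = {m["name"] for m in dto["meta_data"]
                if m["study_name"] == study and m["construct"] == construct}
        subsets[study] = [
            {k: v for k, v in row.items() if k == "id" or k in keep}
            for row in rows
        ]
    return subsets

def stack_with_study_name(per_study):
    """(3c): combine per-study rows into one table, adding study_name."""
    return [dict(row, study_name=study)
            for study, rows in per_study.items() for row in rows]

smoking = select_by_construct(dto, "smoking")
combined = stack_with_study_name(smoking)
print(combined)
```

Keeping one small dataset per construct until the final stacking step is what lets each harmonized variable be inspected on its own.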

andkov added a commit that referenced this issue Mar 26, 2016
@wibeasley, could you offer some comment on the chunk
`generate-report`? I'm styling it after your report for early Portland
and missing something. This is no rush though.