pilot (A) for analytic workflow #5

andkov · 2016-03-26T15:00:45Z

@smhofer proposed the following plan for the reproducible report(s):

Section 1: Read in each of five data sets
Section 2: Relabel and transform variables (organized by data set; as discussed yesterday)
Section 3: Combine into single data set (include study level dummy variables)
Section 4: Estimate models (ever smoked as primary outcome to start; logistic regression)
Section 5: Table results and compute odds ratio for covariates
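
For orientation, Sections 4 and 5 could look roughly like the sketch below. It is only an illustration on made-up data: the data frame `ds` and the columns `age` and `smoked_ever` are hypothetical, and the real covariates are still to be decided. Only base R is used.

```r
# Hypothetical toy data: two studies, one covariate, a binary outcome.
ds <- data.frame(
  study_name  = factor(rep(c("study_a", "study_b"), each = 6)),
  age         = c(50, 55, 60, 65, 70, 75, 52, 57, 62, 67, 72, 77),
  smoked_ever = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1)
)

# Section 4: logistic regression. A factor on the right-hand side is
# expanded by glm() into study-level dummy variables.
model <- glm(
  smoked_ever ~ age + study_name,
  data   = ds,
  family = binomial(link = "logit")
)

# Section 5: exponentiated coefficients are odds ratios.
# confint.default() gives Wald intervals on the log-odds scale.
odds_ratios <- exp(cbind(OR = coef(model), confint.default(model)))
print(round(odds_ratios, 3))
```

Keeping `study_name` as a factor means the dummy coding is handled by the model formula rather than by hand-made indicator columns.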

andkov · 2016-03-26T21:58:53Z

@smhofer, here is my commentary on your five sections. I need to introduce a slight modification to account for the way the scripts actually handle the data. Specifically, I suggest implementing the processes in Sections 2 and 3 separately for each set of harmonized variables. This is more practical to organize and will not change the end result of Section 3: the creation of a combined data set.

The script ./manipulation/0-ellis-island.r produces a working report, ./manipulation/stitched-output/0-ellis-island.md. This report accomplishes Sections (1), (2b), and (3a). I've annotated it copiously, and it's meant to be part of the live documentation. This is where one will go to find out how specifically the processes in Sections (1), (2a), and (3a) have been implemented.

Note that Section (2a) is accomplished outside of R by editing the file ./data/shared/meta-data-map.csv. I don't think it's a good idea for projects like these to rename variables by hand inside a script. This is my biggest lesson learned from Portland, so I'd like to gently insist on this.
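
To illustrate the idea of map-driven renaming, here is a minimal sketch. It does not reproduce the actual contents of meta-data-map.csv: the column names `name` and `name_new`, the helper `rename_from_map()`, and the toy data are all assumptions made for the example.

```r
# Hypothetical excerpt of a metadata map; the real file's columns may differ.
meta_csv <- "study_name,name,name_new,construct
study_a,SMK1,smoke_now,smoking
study_a,EDU,edu_years,education
study_b,smoker,smoke_now,smoking"
meta <- read.csv(text = meta_csv, stringsAsFactors = FALSE)

# Hypothetical raw data for one study.
raw_a <- data.frame(SMK1 = c(1, 0, 1), EDU = c(12, 16, 10))

# Rename columns by looking each raw name up in the map for the given study.
# Columns that are absent from the map keep their original names.
rename_from_map <- function(d, meta, study) {
  map <- meta[meta$study_name == study, ]
  hit <- match(names(d), map$name)
  found <- !is.na(hit)
  names(d)[found] <- map$name_new[hit[found]]
  d
}

renamed_a <- rename_from_map(raw_a, meta, "study_a")
print(names(renamed_a))
```

With this arrangement, a naming decision is changed by editing one row of the CSV, and the script itself stays untouched.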

I'm moving on to developing the scripts to implement Section (3c) for smoking.

Section (1)
- Read in each of five data sets, extract raw metadata
Section (2)
- (2a): Edit and augment metadata to provide instructions on how to relabel, classify, and transform variables
- (2b): Create a single data object `dto` containing unmerged, raw unit data from each study and a single metadata file containing metadata for variables from all studies (e.g. what type of variable it is, how the variable should be renamed, etc.).
Section (3)
- (3a): Using unit and meta data from the main data object (`dto`), create datasets that aggregate variables with shared properties in the metadata (e.g. "all variables that have smoking for the value of the construct column in the metadata set").
- (3b): For each unit of harmonization (e.g. smoking, education, etc.), transform the raw variables in the corresponding dataset to create harmonized variables. Evaluate each harmonized variable separately. (Managing a large, combined file during harmonization is inconvenient. In addition, there might be a need or interest to inspect individual files during the process, and this makes them easier to provision.)
- (3c): Collect the transformed datasets containing harmonized variables and combine them into a single data file, with study_name as a factor.
Section (4)
- Estimate models (this will need more specifics, but I think they will emerge as we complete Section 3)
Section (5)
- Organize model outputs to evaluate across studies, outcomes, and covariates (this is potentially a bottomless pit, so greater specifics will be crucial. It's hard to comment on this step without knowing what the model results will look like.)
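
As a rough illustration of steps (3a) through (3c), here is a sketch on toy data. It is not the code in 0-ellis-island.r: the list elements `unitData` and `metaData`, the helper `subset_construct()`, the raw variable names, and the harmonization rules are all hypothetical.

```r
# Hypothetical dto: raw unit data per study plus one shared metadata table.
dto <- list(
  unitData = list(
    study_a = data.frame(id = 1:3, SMK1 = c(1, 0, 1), EDU = c(12, 16, 10)),
    study_b = data.frame(id = 1:2, smoker = c("yes", "no"), school = c(9, 14),
                         stringsAsFactors = FALSE)
  ),
  metaData = data.frame(
    study_name = c("study_a", "study_a", "study_b", "study_b"),
    name       = c("SMK1", "EDU", "smoker", "school"),
    construct  = c("smoking", "education", "smoking", "education"),
    stringsAsFactors = FALSE
  )
)

# (3a) For one construct, keep only the variables the metadata assigns to it.
subset_construct <- function(dto, construct) {
  md <- dto$metaData
  sapply(names(dto$unitData), function(study) {
    vars <- md$name[md$study_name == study & md$construct == construct]
    dto$unitData[[study]][, c("id", vars), drop = FALSE]
  }, simplify = FALSE, USE.NAMES = TRUE)
}
smoking <- subset_construct(dto, "smoking")

# (3b) Harmonize study by study; each rule is specific to the raw coding.
harmonized <- list(
  study_a = data.frame(id = smoking$study_a$id,
                       smoke_now = smoking$study_a$SMK1 == 1),
  study_b = data.frame(id = smoking$study_b$id,
                       smoke_now = smoking$study_b$smoker == "yes")
)

# (3c) Stack the harmonized pieces and mark the source study as a factor.
combined <- do.call(rbind, lapply(names(harmonized), function(study) {
  data.frame(study_name = study, harmonized[[study]], stringsAsFactors = FALSE)
}))
combined$study_name <- factor(combined$study_name)
print(combined)
```

Because each construct is subset and harmonized on its own, the intermediate `smoking` list can be inspected or saved per study before anything is stacked.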

@wibeasley

@wibeasley, could you offer some comments on the chunk `generate-report`? I'm styling it after your report for early Portland and am missing something. There is no rush, though.

andkov changed the title ~~Meta-data manholes in data grooming work flow~~ pilot (A) for analytic workflow Mar 26, 2016

andkov added a commit that referenced this issue Mar 26, 2016

#5 stable ellis ready

c541981

andkov added a commit that referenced this issue Mar 26, 2016

#5 update

c9b8b2e

andkov added a commit that referenced this issue Mar 26, 2016

#5

63037ae

andkov mentioned this issue Apr 4, 2016

2016-04-04 with A.Piccinin #8

Open
