harmonize: SMOKING #9
Comments
back at @ampiccinin
@andkov – as long as you document it, it would be OK to drop small groups like this that don’t readily fit with the focus of the question and can’t be harmonized with the other datasets.
@ampiccinin - ok. Let's put these documentation notes into issues like these, dealing with harmonization rules.
back @ampiccinin
The following responses were judged to be inconsistent and were removed from the computation of the harmonized variable
back at @ampiccinin
I can only speculate that the order is as given in the documentation
Unfortunately, no, @andkov. Those are alphabetic, as far as I know. I could try to track down the original data entry sheet on NACDA.
@andkov - Argh! Sorry – I put them in as comments, rather than issues, didn’t I? Since at this point it will take you just as long to read them one way as the other, I will not re-type what I wrote, but will respond to the issues you started this morning so that the next comments are sorted by issue (assuming I understand correctly). Will you be starting an alcohol issue soon? As I see new issues appear in emailed notifications, I will use this as a prompt that the descriptives are available. In the meantime I will move on to other topics since I don’t know what else I could do for this at the moment.
back @ampiccinin
The counts for BH002 in TILDA:

    > dto[["unitData"]][["tilda"]] %>%
    +   dplyr::group_by_("BH002") %>%
    +   dplyr::summarise(count = n())
    Source: local data frame [3 x 2]

                   BH002 count
                  (fctr) (int)
    1  UNDOCUMENTED CODE  3727
    2                Yes  1564
    3 No, I have stopped  3213

Either the Maelstrom documentation about this item is incorrect, or the wrong data have been passed down to the participants.
Looks like more info than we want right now. Stick with the simpler dichotomy: current smoker/not
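A minimal sketch of the suggested dichotomy, assuming only "Yes" marks a current smoker. The data frame `tilda`, the helper `recode_current_smoker`, and the column `smoke_now` are hypothetical names, and treating "UNDOCUMENTED CODE" as missing is an assumption rather than a decision recorded in this thread.

```r
# Toy data standing in for dto[["unitData"]][["tilda"]] (values invented for illustration)
tilda <- data.frame(
  BH002 = c("Yes", "No, I have stopped", "UNDOCUMENTED CODE", "Yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map BH002 responses to a current smoker / not dichotomy.
# Any response without an explicit rule (here "UNDOCUMENTED CODE") stays NA.
recode_current_smoker <- function(x) {
  out <- rep(NA, length(x))
  out[x %in% "Yes"] <- TRUE
  out[x %in% "No, I have stopped"] <- FALSE
  out
}

tilda$smoke_now <- recode_current_smoker(tilda$BH002)

# Cross-check the recode against the original responses
table(tilda$BH002, tilda$smoke_now, useNA = "ifany")
```

Leaving unmapped responses as NA keeps the undocumented group visible in the cross-tabulation instead of silently folding it into "not a current smoker".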
Categorization

Before we can encode unique combinations of responses to categorical variables, we need to have those categorical variables. There are two continuous variables related to smoking:

    > dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="BR0030") %>% dplyr::select(name,label)
        name                 label
    1 BR0030 how many years smoked
    > dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="BH003") %>% dplyr::select(name,label)
       name                                            label
    1 BH003 bh003 How old were you when you stopped smoking?

Similar to harmonization rules, we can encode the decisions about how a continuous variable should be split up in an additional column of a table (.csv file).

Procedure

edit the files in

The categorical variable created with this procedure will then be passed down to the data schema definition to create response profiles so that harmonization rules can be declared.
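A categorization rule of this kind could be sketched as follows. This is only an illustration under stated assumptions: the table `c_rule`, its column names (`lower`, `upper`, `category`), and the cut points are hypothetical and are not taken from the project's rule files.

```r
# Hypothetical c-rule table; in practice it would be read from a .csv file with read.csv().
# Column names and cut points are assumptions made for illustration only.
c_rule <- data.frame(
  lower    = c(0, 10, 30),
  upper    = c(10, 30, Inf),
  category = c("under 10 years", "10 to under 30 years", "30 or more years"),
  stringsAsFactors = FALSE
)

# Toy values standing in for SHARE item BR0030 (how many years smoked)
years_smoked <- c(2, 10, 29.5, 45, NA)

# Build break points from the rule table; intervals are closed on the left: [lower, upper)
breaks <- c(c_rule$lower, c_rule$upper[nrow(c_rule)])
years_smoked_cat <- cut(
  years_smoked,
  breaks = breaks,
  labels = c_rule$category,
  right  = FALSE
)

# Inspect the resulting categorical variable, including missing values
table(years_smoked_cat, useNA = "ifany")
```

Keeping the cut points in a table rather than in the script means a correction or an alternative rule only requires editing or adding a column, which matches the workflow described above.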
@ampiccinin, to clarify the workflow: the h-rules for smoking for SHARE and TILDA will have to be revised after the categorization rule has been established. Unfortunately, I don't see a workaround or a way to automate this process.
We don’t need that variable – we can just use GEVERSMK: YES

From: Andriy V. Koval [mailto:[email protected]]

@ampiccinin (https://github.com/ampiccinin)

Categorization

Before we can encode unique combinations of responses to categorical variables, we need to have those categorical variables. There are two continuous variables related to smoking:

        name                 label
    1 BR0030 how many years smoked

       name                                            label
    1 BH003 bh003 How old were you when you stopped smoking?

Similar to harmonization rules, we can encode the decisions about how a continuous variable should be split up in an additional column of a table (.csv file).

Procedure

Edit the files in ./data/meta/c-rules/ (https://github.com/IALSA/ialsa-2016-groningen/tree/master/data/meta/c-rules) to provide a correction to the categorization rule (edit the new column) or provide an alternative categorization rule (create a new column with a distinct, descriptive name).

The categorical variable created with this procedure will then be passed down to the data schema definition to create response profiles so that harmonization rules can be declared.
@andkov - Don’t need to bother with time since quit. Just ignore.
back @ampiccinin
The variable

    > dto[["metaData"]] %>%
    +   dplyr::filter(study_name=="tilda", name=="BH003") %>%
    +   dplyr::select(name,label)
       name                                            label
    1 BH003 bh003 How old were you when you stopped smoking?

has been excluded from the data schema variables for harmonized variables operationalizing the construct
Back @andkov – I just meant we can rely on the categorical variables only in each dataset (BR0020) (BEHSMOKER, BH002)
@ampiccinin, allow me to clarify things for myself. I interpret your comment as the following decision:

The continuous variables

    > dto[["metaData"]] %>% dplyr::filter(study_name=="share", name=="BR0030") %>% dplyr::select(name,label)
        name                 label
    1 BR0030 how many years smoked
    > dto[["metaData"]] %>% dplyr::filter(study_name=="tilda", name=="BH003") %>% dplyr::select(name,label)
       name                                            label
    1 BH003 bh003 How old were you when you stopped smoking?

are excluded from the data schema variables for the harmonized operationalization of the construct

The instructions for the exercise ask us to provide the reasons for excluding the proposed variables from use in the computation of the harmonized variable. Please document the reason for each (please edit this comment):
@andkov – yes, excluded. Instructions for what exercise? Edits:
I went back to re-read the instructions that were passed down to all teams and see that this is actually a false memory. The instructions ask to document

They do not say explicitly to document which variables from the source data sets should be included in the Data Schema variables and which should not, but it seems to be implied. In my opinion, we should nevertheless provide such argumentation so that we don't come back to making the same decisions again.

Let's type up these decisions in the issues and I'll transfer them to the reports as text when I update them.
Before: recoding of continuous variables is accomplished programmatically in the script.
Now: the recode is done on the empirical profile of responses using the external .csv
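The "now" approach could be sketched roughly as below. This is an illustration, not the project's script: the objects `unit`, `profile`, and `h_rule` and the column `smoke_now` are hypothetical, and the rule values are invented.

```r
# Toy unit-level data: one row per respondent (item name from the thread, values invented)
unit <- data.frame(
  id    = 1:5,
  BH002 = c("Yes", "No, I have stopped", "Yes", "UNDOCUMENTED CODE", "No, I have stopped"),
  stringsAsFactors = FALSE
)

# Empirical response profile: every distinct observed response with its frequency
profile <- as.data.frame(table(BH002 = unit$BH002), stringsAsFactors = FALSE)

# Hypothetical h-rule table; in practice it would come from the external .csv via read.csv()
h_rule <- data.frame(
  BH002     = c("Yes", "No, I have stopped", "UNDOCUMENTED CODE"),
  smoke_now = c(TRUE, FALSE, NA),
  stringsAsFactors = FALSE
)

# Guard: every observed profile must have a declared rule before recoding
stopifnot(all(profile$BH002 %in% h_rule$BH002))

# Attach the harmonized value to each respondent by matching on the response
harmonized <- merge(unit, h_rule, by = "BH002", all.x = TRUE)
harmonized <- harmonized[order(harmonized$id), ]
```

Because the rules live in a table keyed by observed responses, a change to a harmonization decision is an edit to the .csv rather than to the recoding code.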
After implementing the suggested corrections to the harmonization rules for

In a comment below, please put "viewed and agreed". Hearing from both @ampiccinin and @smhofer will indicate to me that we are ready to close this issue and accept the current state of harmonization of this variable as stable.
h-rules