NA values in results of assocTestSingle #116
Comments
I also tried to calculate the p-value of GxE.Stat (using the 3rd row of my result above), which is
It looks like there are two terms named "PC10" in your data. Did you accidentally include the same variable twice in your phenotype file? The collinearity from including the same term twice might be causing the NA issue. Since there are actually 11 interaction terms in your model (because PC10 is in there twice), the p-value calculation uses 11 d.f., which returns the same value as the function output. Lastly, while people often use 10 PCs in their analyses by default, I would suggest looking at your PCs to determine how many reflect population structure in your data, and using that many PCs in your analysis.
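To make the degrees-of-freedom point concrete, here is a minimal sketch in R. It assumes, as the reply implies, that the joint GxE statistic is compared against a chi-square distribution whose d.f. equal the number of interaction terms. The statistic value below is hypothetical and is not taken from this thread.

```r
# Hypothetical joint GxE test statistic (illustration only, not from the thread)
gxe_stat <- 20

# Number of interaction terms the user intended vs. the number actually in the model
df_intended <- 10  # SNP x PC1 ... SNP x PC10
df_actual   <- 11  # PC10 entered the model twice

# Upper-tail chi-square p-values under each d.f.
p_intended <- pchisq(gxe_stat, df = df_intended, lower.tail = FALSE)
p_actual   <- pchisq(gxe_stat, df = df_actual,   lower.tail = FALSE)

print(c(intended = p_intended, actual = p_actual))
```

For a fixed statistic, the extra degree of freedom gives a larger p-value, so a hand calculation only agrees with the function output when it uses the same d.f. as the fitted model.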
Thanks. I checked my phenotype file as follows, and PC10 does not appear twice, so I am a little confused about why PC10 shows up two times. If the collinearity from including the same term twice causes the NA issue, it is odd that some SNPs still get results. `> pc.df <- as.data.frame(mypcair$vectors)`
I also checked the results when I include only two PCs. Strangely, the result still includes the interaction between the SNP and PC10. I wonder whether GENESIS wrongly picks up PC10 when it looks for PC1.
We've identified the problem. If your input phenotypes contain columns where the name of one column is the same as the name of another column plus additional characters, AND you supply the shorter column name as one of the variables to test for GxE, you will get this error. It comes from this function, which retrieves the GxE column names from the model matrix: https://github.com/UW-GAC/GENESIS/blob/devel/R/utils.R#L252 The reason we do this instead of matching the column names directly is that the
We find the columns corresponding to There isn't a quick fix in the code for this, but an easy workaround when you set up your analysis is to rename your columns so that "PC1" is no longer a prefix of another column name. If you change "PC10" to "pc10", for example, it should work.
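The behavior described above can be reproduced with a small R sketch. This is not the GENESIS code: `match_prefix` is a hypothetical helper that assumes each requested GxE variable is matched against the column names by prefix, which is enough to show why "PC1" also picks up "PC10" and why the rename avoids it.

```r
# Hypothetical helper (not the GENESIS implementation): return every column
# whose name starts with one of the requested variable names.
match_prefix <- function(vars, cols) {
  unlist(lapply(vars, function(v) grep(paste0("^", v), cols, value = TRUE)))
}

cols <- paste0("PC", 1:10)  # "PC1" ... "PC10"

# Requesting all 10 PCs yields 11 matches, because "PC1" also matches "PC10".
hits_all <- match_prefix(cols, cols)

# Requesting only PC1 and PC2 still drags in PC10.
hits_two <- match_prefix(c("PC1", "PC2"), cols)

# Workaround from the reply: rename "PC10" so "PC1" is no longer its prefix.
cols_fixed <- sub("^PC10$", "pc10", cols)
hits_fixed <- match_prefix(cols_fixed, cols_fixed)

print(hits_all)
print(hits_two)
print(hits_fixed)
```

In a real analysis the rename would be applied to the phenotype data before fitting the null model, for example with `names(pheno)[names(pheno) == "PC10"] <- "pc10"`, where `pheno` is a placeholder for your own data frame.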
Thanks, it works. I appreciate your help. One last question: if I have microarray data,
You can use either GWASTools or SeqArray/SeqVarTools and get the same results. SeqArray is newer and more flexible; it can import all variants represented in a VCF file. GWASTools was developed for microarray data, so it can only store biallelic variants.
Got it, I appreciate it so much.
Thanks, it helps.
After running code like this, the results for some SNPs are NA. I included 10 PCs, along with the interaction between each SNP and those 10 PCs. When I include the interaction of the SNP with only 2 PCs, there are no NAs in the results. Could you please tell me why there are NA values and how I can avoid this problem? It seems that people always include 10 PCs in their analyses. Thanks.