Support for non-binary traits #96
Comments
Dear @MrTomRod Yes, I've been wanting to add that feature for five years now. I'm not sure about the best way to implement it though, since much of the Scoary functionality is so intimately tied to having binary categories. Some of the approaches I've given some thought:
As you've noticed, Scoary is not in very active development at the moment due to other pressing obligations, but I maintain the ambition to continue developing it. I will happily accept PRs or spin-offs as long as you credit the original work. Thanks for offering! All the best,
I have been using GaussianMixture to split by a continuous trait. This is a histogram of an example trait:
My approach is simple and straightforward, but maybe not too powerful. I need something that works quickly on thousands of continuous traits. I'm not sure I fully understand your suggestions; I will have to think about them some more. Would you be willing to discuss this sometime, or perhaps even support me a little if I decided to do this? Btw, I've been using Boschloo's test instead of Fisher's, since it perfectly matches the problem and is more powerful. It is slower, though. Not sure if it's worth it.
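For readers who want to try the Fisher vs Boschloo comparison mentioned above, here is a minimal sketch using SciPy's `fisher_exact` and `boschloo_exact`. The 2x2 table is a made-up example, not data from this thread, and the row/column layout is an assumption chosen for illustration.

```python
from scipy.stats import boschloo_exact, fisher_exact

# Hypothetical 2x2 contingency table (illustrative only, not from the thread):
# rows    = gene present / gene absent
# columns = trait positive / trait negative
table = [[12, 3],
         [5, 14]]

# Fisher's exact test returns (odds ratio, p-value).
_, fisher_p = fisher_exact(table, alternative="two-sided")

# Boschloo's exact test returns a result object with .statistic and .pvalue.
boschloo_p = boschloo_exact(table, alternative="two-sided").pvalue

print(f"Fisher   p = {fisher_p:.4g}")
print(f"Boschloo p = {boschloo_p:.4g}")
```

Boschloo's test searches over a nuisance parameter, which is why it is noticeably slower than Fisher's when run across thousands of genes and traits.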
That looks promising! And thanks for teaching me about Boschloo's test. Absolutely willing to work with you on this in the time I can contribute. You can get in contact with me at any time through my e-mail: [email protected].
I wanted to compare Fisher's vs Boschloo's test. To do this, I simulated 10 pangenomes for each combination of sample size: Each dot represents the results from one simulated pangenome. The x-axis is the rank of the 'causal' gene in the final table computed using Fisher's test minus the rank computed using Boschloo's. In other words, if the resulting value is negative, Fisher's performed better, and if it is positive, Boschloo's performed better. I performed a Wilcoxon signed-rank test to see if Fisher's and Boschloo's perform differently: While Boschloo's test (imo justifiably) gives a lower p-value, Fisher's seems to perform better at ranking genes with the simulated data. I have no clue why that is, though.
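The rank comparison described above can be sketched as follows. The rank values are placeholders invented for illustration, not the simulation results from this thread; only the sign convention (Fisher's rank minus Boschloo's rank, lower rank is better) follows the comment.

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder ranks of the 'causal' gene, one entry per simulated pangenome.
# These numbers are made up; substitute the ranks from your own simulations.
fisher_rank = np.array([1, 3, 2, 7, 1, 4, 2, 9, 1, 5])
boschloo_rank = np.array([2, 3, 4, 9, 1, 6, 3, 12, 2, 5])

# Negative difference: Fisher's ranked the causal gene higher (better).
# Positive difference: Boschloo's ranked it higher.
diff = fisher_rank - boschloo_rank

# Paired, two-sided Wilcoxon signed-rank test; zero differences are
# discarded by SciPy's default zero_method.
result = wilcoxon(fisher_rank, boschloo_rank)

print("rank differences:", diff)
print(f"Wilcoxon statistic = {result.statistic}, p = {result.pvalue:.4g}")
```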
I performed the same analysis with my fast-fisher library. It is now incredibly fast. The causal gene always got the same rank as with scipy's implementation, except for two simulated datasets: in one, the rank was one higher, in the other, it was one lower.
Heya, Best wishes,
You want to run Scoary on continuous traits? I'm working on an update for Scoary, but it's not ready yet. Approximately another month until testing makes sense. I will use GaussianMixture for splitting. You could simply pre-process your continuous traits with GaussianMixture yourself and then feed them into Scoary.
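A minimal sketch of the pre-processing suggested above, assuming scikit-learn's `GaussianMixture` with two components. The helper name `binarize_trait` and the simulated trait values are hypothetical, and this is not necessarily how the planned Scoary update performs the split.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def binarize_trait(values, random_state=0):
    """Split one continuous trait into 0/1 with a two-component Gaussian mixture.

    Hypothetical helper for illustration; not part of Scoary.
    """
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=random_state).fit(x)
    labels = gmm.predict(x)
    # Component numbering is arbitrary, so map the higher-mean component to 1.
    high = int(np.argmax(gmm.means_.ravel()))
    return (labels == high).astype(int)


# Simulated bimodal trait (made-up data): 40 low isolates, 20 high isolates.
rng = np.random.default_rng(0)
trait = np.concatenate([rng.normal(0.0, 1.0, 40), rng.normal(8.0, 1.0, 20)])

binary = binarize_trait(trait)
print(binary)
```

The resulting 0/1 vector can then be written into a binary traits table, one column per trait. Traits that are not clearly bimodal may split poorly with this approach, so it is worth inspecting a histogram before trusting the labels.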
I think @MrTomRod made the necessary updates and released them as Scoary-2. Kindly check his GitHub repo.
Dear @AdmiralenOla
Under "Coming soon", you list "Support for non-binary traits". This could be of interest to me. What approach are you thinking of? (Simply binarizing continuous traits?) What use case would it answer?
Btw, since you have probably moved on to other things, I am considering adding this feature to your code myself. Would this be okay with you?
Best, MrTomRod