Statistics for measuring how we do with real data #41

Open
hyanwong opened this issue Jul 3, 2017 · 4 comments

Comments

@hyanwong
Contributor

hyanwong commented Jul 3, 2017

We want to run tsinfer on real data and see how we do, compared to what we might expect. This issue collects some ideas for how to do that.

@hyanwong
Contributor Author

hyanwong commented Jul 3, 2017

On average, individuals from the same demographic area should share the vast majority of their most recent coalescences with each other. Another way to measure this would be to use Schiffels & Durbin's 'cross-coalescence rate'.
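
A minimal sketch of the first measure, for a single tree only. It assumes the tree is available as a plain child-to-parent map and that each sample is a leaf with a known population label; the names `same_population_fraction`, `parent` and `population` are made up for illustration and are not part of tsinfer.

```python
def leaves_below(children, node):
    """Return the set of leaf nodes descending from `node`."""
    stack, leaves = [node], set()
    while stack:
        u = stack.pop()
        kids = children.get(u, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.add(u)
    return leaves


def same_population_fraction(parent, population):
    """Fraction of samples whose most recent coalescence includes
    at least one other sample from the same population.

    parent: dict mapping child -> parent for one tree (hypothetical input format).
    population: dict mapping sample (leaf) -> population label.
    """
    # Invert the child -> parent map so we can walk downwards.
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    hits = 0
    for sample, pop in population.items():
        # The sample's parent node is its most recent coalescence.
        others = leaves_below(children, parent[sample]) - {sample}
        if any(population[o] == pop for o in others):
            hits += 1
    return hits / len(population)
```

On real data this would presumably be averaged over the trees along the sequence (perhaps weighted by the genomic span of each tree) and compared against the value obtained with shuffled population labels.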

@hyanwong
Contributor Author

hyanwong commented Nov 27, 2017

Chatted to George Busby about 1000G data: his project with Ryan Christ involved chromosome painting with some of the 1000G data plus a focal population of Africans with an introgressed lactose tolerance haplotype, and looking for areas of shared ancestry within the focal African population. Ancestry was estimated by splitting the 1000G data into e.g. 6 populations and basing ancestry measures on haplotype prevalence within each population. We might be able to do this sort of thing with ancestors on trees instead.

Another suggestion would be to look across the Duffy locus, which we know to have been under selection. One issue is that this is on chromosome 1, which is the largest human chromosome.

@hyanwong
Contributor Author

Just been chatting to Wilder: I wonder if we can plot a "densitree" using a random subsample of the data (both haplotypes and genomic positions).
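
A rough sketch of the subsampling step only (not the plotting), assuming the data can be held as a list of haplotypes, each a list of alleles, with a parallel list of site positions. The function name `subsample` and its arguments are hypothetical.

```python
import random


def subsample(haplotypes, positions, n_haplotypes, n_sites, seed=None):
    """Pick a random subset of haplotypes (rows) and sites (columns).

    haplotypes: list of equal-length allele lists, one per haplotype.
    positions: genomic position of each site, same length as each haplotype.
    Returns (chosen haplotype indices, chosen positions, reduced haplotypes).
    """
    rng = random.Random(seed)  # seeded so that a plot can be reproduced
    # Sample without replacement and keep the original ordering.
    hap_idx = sorted(rng.sample(range(len(haplotypes)), n_haplotypes))
    site_idx = sorted(rng.sample(range(len(positions)), n_sites))
    sub_haps = [[haplotypes[h][s] for s in site_idx] for h in hap_idx]
    sub_pos = [positions[s] for s in site_idx]
    return hap_idx, sub_pos, sub_haps
```

The trees at the chosen positions, restricted to the chosen haplotypes, could then be overlaid in a densitree-style plot.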

@hyanwong
Contributor Author

Also, if there are any individuals in 1000G who have admixed parents (e.g. one maternal grandparent African, the other European), then we might be able to see large chunks where the genome shows a closer relationship with Africans, and other chunks where it is closer to Europeans.
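
One way to look for such chunks, sketched under the assumption that we already have, for one haplotype, a "closest population" label per genomic window (for example from the nearest neighbours in each tree). The function `ancestry_chunks` and its inputs are illustrative only.

```python
from itertools import groupby


def ancestry_chunks(window_starts, window_ends, labels):
    """Collapse per-window population labels into contiguous chunks.

    window_starts, window_ends: coordinates of consecutive windows.
    labels: closest-population label for each window (hypothetical input).
    Returns a list of (chunk_start, chunk_end, label) tuples.
    """
    chunks = []
    i = 0
    for label, run in groupby(labels):
        n = len(list(run))  # number of adjacent windows sharing this label
        chunks.append((window_starts[i], window_ends[i + n - 1], label))
        i += n
    return chunks
```

For a recently admixed individual we would expect a few long chunks per chromosome on the admixed haplotype, rather than many short ones.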
