should post-SLiM simplify have keep_input_roots=True? #1579
Comments
For context: @silastittes and I ran into this while trying to debug DFE inference results, which required the number of neutral and non-neutral substitutions |
This is for my own understanding, but does that make the root length = total sim time minus each tree's TMRCA? |
Yup, that should be the case. So you could retroactively calculate the expected number of neutral substitutions given the total simulation time + simplified trees. This would be something like:

    exp_neutral_subs = 0.0
    for t in ts.trees():
        if t.num_edges > 0:
            # t.root is a node ID, so look up its time before subtracting
            exp_neutral_subs += max(0, (slim_simulation_time - t.time(t.root)) * t.span * neutral_mutation_rate)

though this would need to be modified slightly if the neutral mutation rate varied across the sequence, I guess. Assuming that the mutation rates are in an |
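The calculation above can be written as a small standalone function. This is a minimal sketch, not code from stdpopsim: the function name `expected_neutral_subs` and its arguments are hypothetical, it assumes a single constant neutral mutation rate per base pair per generation, and it assumes each tree has exactly one root. Root times are passed in as plain numbers so the arithmetic can be checked without a tree sequence.

```python
def expected_neutral_subs(tree_intervals, total_time, mu):
    """Expected number of neutral substitutions on the stems above the roots.

    tree_intervals: iterable of (span, root_time) pairs, one per tree
    total_time:     total simulated time, in the same units as root_time
    mu:             neutral mutation rate per bp per time unit (assumed constant)
    """
    total = 0.0
    for span, root_time in tree_intervals:
        # Stem length is the gap between the start of the simulation and the
        # tree's root; clamp at zero in case the root is older than total_time.
        stem = max(0.0, total_time - root_time)
        total += stem * span * mu
    return total


# With tskit, the pairs could be collected roughly like this (assumes every
# non-empty tree has a single root):
#   intervals = [(t.span, t.time(t.root)) for t in ts.trees() if t.num_edges > 0]
intervals = [(600.0, 1500.0), (400.0, 2500.0)]  # made-up values for illustration
print(expected_neutral_subs(intervals, total_time=2000.0, mu=1e-8))
```

In the made-up example only the first tree contributes, since the second tree's root is older than the total simulated time and its stem is clamped to zero.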
That's my vote, too -- for one-off debugging it seems easier to modify a fork to get whatever's needed, rather than introduce another option + even more SLiM tests ... |
It's probably easiest for @silastittes and me to just calculate the expected value within the |
Now that I understand the application, I think it's a good idea to be able to get number of subs on branch to an arbitrarily distant outgroup somehow. But, I'm not sure I think that a

The ideal way would be to have the outgroup as a population in the model. But, that's not possible to enforce generally and seems really restrictive -- the desired divergence time to the outgroup might change depending on the application. An alternative would be to have an option to add an outgroup at an arbitrary time (e.g. a population with no migration to the other populations, a fixed size, and a divergence time that exceeds any of the demographic events in the model). So for example,

    model = HomSap.get_demographic_model("whatever")
    model.add_outgroup(name="some_hominid", time=long_long_ago, size=not_very_big)
    samples = {..., "some_hominid" : 1}
    engine.simulate(demographic_model=model, samples=samples, ...)

The downside is that this increases computational overhead? But that could be largely avoided by setting the outgroup population size to 1. |
This is probably the way to go, but it would need a pretty major refactor of the code. I think the thing to do is to shelve this for the coming paper/release and revisit at a later time. |
I think we'd just need to add a method to |
My naive (and almost certainly wrong) take on this is that the arbitrary decision of where to place the root time is all that matters. The length of that branch is what dictates the number of substitutions, right? Does there need to be any information about the outgroup samples at all for this purpose? I agree that being forced to use the SLiM burn-in time doesn't make sense, and is demonstrably too short in some cases. |
The proposal is that we just document this for the time being: #1624 |
After we run SLiM, we simplify down to the requested samples, here:
https://github.com/popsim-consortium/stdpopsim/blob/main/stdpopsim/slim_engine.py#L1764
We do not pass in the keep_input_roots argument. This means that:
Numbers of substitutions are important to know sometimes, but these would be substitutions relative to the time we started running SLiM from, which might not be the right reference time. So, it's not so clear to me.
Cons: this will make the resulting tree sequence bigger, and confuse people (since now the "roots" of the resulting trees won't actually be the roots-where-everything-coalesces).
Given the last point I'm inclined to say "no", or maybe make it optional.