This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

TODO.md

To get msprime integration working:

For when I'm back from Tahoe: there is an issue resulting in left == right. I think I know what is causing it: the RNG is generating exactly 0 for the first breakpoint. FIXED, I think.
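A minimal sketch of one way to guard against a breakpoint of exactly 0, assuming the standard library RNG (this note does not say which RNG the code actually uses). The helper name `draw_breakpoint` and the `genome_length` parameter are hypothetical. `std::uniform_real_distribution` samples from the half-open interval [a, b), so a draw of exactly `a` is possible and is simply redrawn here:

```cpp
#include <random>

// Hypothetical helper: draw a breakpoint strictly greater than 0.
// std::uniform_real_distribution samples from [0, genome_length), so a
// value of exactly 0 can occur; a breakpoint at 0 would give an empty
// first segment (left == right), so we redraw in that case.
inline double
draw_breakpoint(std::mt19937 &rng, double genome_length)
{
    std::uniform_real_distribution<double> unif(0.0, genome_length);
    double pos = unif(rng);
    while (pos == 0.0)
        {
            pos = unif(rng);
        }
    return pos;
}
```

Redrawing is only one option; skipping a zero breakpoint when building segments would have the same effect.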

The C++ side can/should do more to help

  • Get a list of sample indexes
  • Convert forward time to backward time
  • Sort nodes and edges
  • Prune edges where the child ID is never used later as a parent ID
  • Prune extinct nodes
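As an illustration of the time-conversion item above, here is a minimal sketch. It assumes each node stores the forward-time generation in which it was born, and that `final_generation` (a hypothetical parameter) is the generation at which the simulation stopped. The `node` struct is a stand-in, not the actual type used in the code:

```cpp
#include <cstdint>
#include <vector>

// Stand-in node record: an ID plus a birth time.
struct node
{
    std::int32_t id;
    double time;
};

// Convert forward time (generations since the start of the simulation)
// to backward time (generations before the end of the simulation).
// Nodes born in the final generation end up with time 0.
inline void
to_backward_time(std::vector<node> &nodes, double final_generation)
{
    for (auto &n : nodes)
        {
            n.time = final_generation - n.time;
        }
}
```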

During edge pruning, we can build a list of non-extinct nodes that we then use to prune the node list itself. We can use std::unordered_set as a lookup table with constant-time lookups on average.
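A rough sketch of that idea, under two assumptions that are mine rather than anything fixed by these notes: sample nodes are always kept, and the edges are ordered from the most recent parents back to the most ancient. The `edge` struct and `prune_edges` name are hypothetical:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Stand-in edge record.
struct edge
{
    double left, right;
    std::int32_t parent, child;
};

// Keep an edge only if its child is a sample or is the parent of an edge
// that was already kept. Assumes `edges` runs from the most recent parents
// to the most ancient, so each child's fate is known before its own
// parental edge is visited. Returns the set of non-extinct node IDs,
// which can then be used to prune the node list.
inline std::unordered_set<std::int32_t>
prune_edges(std::vector<edge> &edges,
            const std::vector<std::int32_t> &samples)
{
    std::unordered_set<std::int32_t> alive(samples.begin(), samples.end());
    std::vector<edge> kept;
    for (const auto &e : edges)
        {
            if (alive.count(e.child) != 0)
                {
                    kept.push_back(e);
                    alive.insert(e.parent);
                }
        }
    edges.swap(kept);
    return alive;
}
```

The returned set doubles as the lookup table for the node-pruning pass: any node whose ID is not in it is extinct.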

We will need to book-keep things like:

  • The last generation where GC happened
  • The last population size, as that relates to "len(samples)", and is the next ID value to use

We can probably get rid of:

  • Tracking vectors of parent/offspring IDs. As I've moved to a more "naive" method of node insertion, all I really need is the first ID of the parental generation and its population size, etc. Likewise for offspring.
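To make the "first ID plus population size" idea concrete, here is a small sketch. It assumes node IDs within a generation are contiguous and that there is one node per individual; both are simplifications of mine. The `id_range` type is hypothetical:

```cpp
#include <cstdint>

// A generation's node IDs, described by the first ID and the number of
// nodes, instead of an explicit vector of IDs.
struct id_range
{
    std::int32_t first_id;
    std::int32_t size;

    // Node ID of the individual at position `index` in this generation.
    std::int32_t
    id_of(std::int32_t index) const
    {
        return first_id + index;
    }

    // ID range of the following generation, which starts right after
    // this one ends.
    id_range
    next(std::int32_t next_size) const
    {
        return id_range{ first_id + size, next_size };
    }
};
```

With this layout, the parental and offspring generations each need only two integers of state, and the next ID to assign is always `first_id + size` of the newest range.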

We may want to consider:

  • Some rudimentary pruning each generation.

Safety stuff:

  • We are now using signed integers for some fields. For "big" sims (large N/rho), we can overflow things pretty quickly. I need to add a check that throws an exception when we overflow. I'll be strict about this and simply throw the exception. However, a safer/saner implementation for production code would be to check if the next generation will overflow and do a simplification if so. The interesting edge case is if the population size and/or 4Nr is so large that we would overflow mid-generation. That is a nasty thing to ponder...
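A minimal sketch of both variants of the check, assuming the fields in question are 32-bit signed integers (the notes do not say which width is used). The function names are hypothetical. The comparison is rearranged so that the check itself cannot overflow:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// True if assigning `n_new` more IDs starting at `next_id` would exceed
// the largest representable value. Assumes both arguments are >= 0.
inline bool
would_overflow(std::int32_t next_id, std::int32_t n_new)
{
    return next_id > std::numeric_limits<std::int32_t>::max() - n_new;
}

// Strict version: throw instead of overflowing.
inline std::int32_t
checked_next_id(std::int32_t next_id, std::int32_t n_new)
{
    if (next_id < 0 || n_new < 0)
        {
            throw std::invalid_argument("IDs and counts must be >= 0");
        }
    if (would_overflow(next_id, n_new))
        {
            throw std::overflow_error("node ID overflow");
        }
    return next_id + n_new;
}
```

The "safer" variant would call `would_overflow` with the next generation's size before starting it, and simplify when it returns true. This does not address the mid-generation case, where the number of new edges is not known in advance.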