Large scale inference docs #994

benjeffery · 2025-02-04T10:23:26Z

Fixes #840

benjeffery · 2025-02-04T10:23:42Z

@hyanwong Can I have a read-through here?

codecov · 2025-02-04T10:38:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.17%. Comparing base (e3b2155) to head (37645ef).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #994   +/-   ##
=======================================
  Coverage   93.17%   93.17%           
=======================================
  Files          18       18           
  Lines        6374     6374           
  Branches     1088     1088           
=======================================
  Hits         5939     5939           
  Misses        296      296           
  Partials      139      139

Flag	Coverage Δ
C	`93.17% <100.00%> (ø)`
python	`95.52% <100.00%> (ø)`

Flags with carried forward coverage won't be shown.


jeromekelleher

LGTM

jeromekelleher · 2025-02-04T19:38:37Z

tsinfer/inference.py

+    :param int min_work_per_job: The minimum amount of work (as a count of genotypes) to
+        allocate to a single parallel job. If the amount of work in a group of ancestors
+        exceeds this level it will be broken up into parallel partitions, subject to
+        the constriant of `max_num_partitions`.


typo, constriant
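The partitioning rule described in this docstring can be illustrated with a small Python sketch. This is not tsinfer's implementation: `partition_group` and `work_per_ancestor` are hypothetical names, and the greedy contiguous split is an assumption. It only shows how the two parameters interact. A group whose total work is at or below `min_work_per_job` stays in one job, and a larger group is split into at most `max_num_partitions` partitions of roughly equal work.

```python
from typing import List


def partition_group(work_per_ancestor: List[int], min_work_per_job: int,
                    max_num_partitions: int) -> List[List[int]]:
    """Split one ancestor group into partitions of ancestor indexes.

    Hypothetical illustration only; work is a count of genotypes per ancestor.
    Assumes min_work_per_job >= 1 and max_num_partitions >= 1.
    """
    total = sum(work_per_ancestor)
    if total <= min_work_per_job:
        # Small group: a single job handles every ancestor.
        return [list(range(len(work_per_ancestor)))]
    # As many partitions as min_work_per_job allows, capped at max_num_partitions.
    num_partitions = min(max_num_partitions, total // min_work_per_job)
    target = total / num_partitions
    partitions: List[List[int]] = [[]]
    current = 0
    for index, work in enumerate(work_per_ancestor):
        # Start a new partition once the current one has reached its share.
        if current >= target and len(partitions) < num_partitions:
            partitions.append([])
            current = 0
        partitions[-1].append(index)
        current += work
    return partitions
```

With ten ancestors of 10 genotypes each, `min_work_per_job=25` and `max_num_partitions=3`, the cap applies and the group splits into three partitions: indexes 0-3, 4-7 and 8-9. Because ancestors are indivisible in this sketch, the last partition can end up with slightly less work than `min_work_per_job`.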

hyanwong

Great work, thanks @benjeffery

It's all quite complicated, and I'm not sure I have a feel for how all the parts fit together, but the descriptions are detailed enough that I could follow them without problems.

I guess at some future point we might want a schematic, but that can wait for now. I reckon you can merge this and get someone (e.g. Duncan or Savita?) to try it out.

hyanwong · 2025-02-05T17:07:08Z

docs/large_scale.md

+entire genotype array for the contig being inferred needs to fit in RAM.
+This is the high-water mark for memory usage in tsinfer.
+Note the `genotype_encoding` argument, setting this to
+{class}`tsinfer.GenotypeEncoding.ONE_BIT` reduces the memory footprint of


Do we need to say that this can't be used if there is missing data?
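On the missing-data question, a rough NumPy illustration of both the memory saving from a one-bit encoding and why such an encoding cannot hold a missing value. The matrix shape, the 0/1 allele coding, and the use of -1 for missing data are assumptions for illustration; this uses `np.packbits` and does not reproduce tsinfer's internal layout for `GenotypeEncoding.ONE_BIT`.

```python
import numpy as np

# Hypothetical biallelic genotype matrix: 1,000 sites x 2,000 haploid samples,
# alleles coded 0/1 and stored one byte per genotype.
rng = np.random.default_rng(42)
genotypes = rng.integers(0, 2, size=(1_000, 2_000), dtype=np.int8)
one_byte_size = genotypes.nbytes  # 1,000 * 2,000 bytes

# One bit per genotype: pack eight samples into each byte along axis 1.
packed = np.packbits(genotypes.astype(bool), axis=1)
one_bit_size = packed.nbytes  # 1,000 * 2,000 / 8 bytes

# Unpacking recovers 0/1 data exactly.
restored = np.unpackbits(packed, axis=1, count=genotypes.shape[1]).astype(np.int8)
assert np.array_equal(restored, genotypes)

# A missing genotype, coded here as -1, has no slot in a single bit: it can
# only be stored as "zero" or "not zero", so it comes back as allele 1.
with_missing = genotypes.copy()
with_missing[0, 0] = -1
roundtrip = np.unpackbits(
    np.packbits(with_missing != 0, axis=1), axis=1, count=genotypes.shape[1]
).astype(np.int8)
```

A single bit has only two states, so any third value (missing data, or a third allele) is silently merged with one of the two alleles rather than preserved.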

hyanwong · 2025-02-05T17:07:28Z

docs/large_scale.md

+The plot below shows the number of ancestors matched in each group for a typical
+human data set:
+
+```{figure} _static/ancestor_grouping.png


May be worth indicating that the group number is ordered by time, so that group 0 represents the oldest ancestors?

hyanwong · 2025-02-05T17:07:33Z

docs/large_scale.md

+{meth}`match_ancestors_batch_group_finalise` will then insert the matches and
+output the tree sequence to `work_dir`.
+
+At anypoint the process can be resumed from the last successfully completed call to 


"anypoint" -> "any point"

hyanwong · 2025-02-05T17:07:55Z

docs/large_scale.md

+
+At anypoint the process can be resumed from the last successfully completed call to 
+{meth}`match_ancestors_batch_groups`. As the tree sequences in `work_dir` checkpoint the
+progress.


I'm not sure I understand / can parse this last sentence
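One possible reading of the checkpoint sentence, sketched generically in Python: each completed step leaves a file in `work_dir`, and a rerun skips any step whose file already exists, loading it as the starting state for the next step. The function names and the `group_{n}.trees` file pattern are hypothetical and are not tsinfer's; the write-then-rename via `os.replace` is a common pattern assumed here so that an interrupted write never looks like a finished checkpoint.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical signature: takes a group index and the previous checkpoint's
# contents (None for the first group) and returns the new state as bytes.
ProcessFn = Callable[[int, Optional[bytes]], bytes]


def run_with_checkpoints(work_dir: Path, num_groups: int,
                         process_group: ProcessFn) -> List[int]:
    """Process groups in order, resuming after the last completed checkpoint."""
    work_dir.mkdir(parents=True, exist_ok=True)
    previous: Optional[bytes] = None
    ran: List[int] = []
    for group in range(num_groups):
        checkpoint = work_dir / f"group_{group}.trees"  # hypothetical naming
        if checkpoint.exists():
            # Completed in an earlier run: reload its state and move on.
            previous = checkpoint.read_bytes()
            continue
        result = process_group(group, previous)
        # Write to a temporary file and rename it into place, so a crash
        # mid-write never leaves a partial file that looks complete.
        tmp_path = checkpoint.with_suffix(".tmp")
        tmp_path.write_bytes(result)
        os.replace(tmp_path, checkpoint)
        previous = result
        ran.append(group)
    return ran


def append_group(group: int, previous: Optional[bytes]) -> bytes:
    """Toy stand-in for real work: append the group index to the prior state."""
    return (previous or b"") + str(group).encode()


# Simulate a run that stops after two groups, then a resumed run over four.
with tempfile.TemporaryDirectory() as tmp_dir:
    work_dir = Path(tmp_dir)
    first_run = run_with_checkpoints(work_dir, 2, append_group)
    resumed_run = run_with_checkpoints(work_dir, 4, append_group)
    final_state = (work_dir / "group_3.trees").read_bytes()
```

The resumed run only processes groups 2 and 3, but the final state still reflects all four groups because groups 0 and 1 are reloaded from their checkpoint files.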

Large scale inference docs

37645ef

benjeffery force-pushed the large-scale-docs branch from 952dbb9 to 37645ef February 4, 2025 12:28

jeromekelleher approved these changes Feb 4, 2025


benjeffery mentioned this pull request Feb 5, 2025

Document resume option #834

Closed

hyanwong approved these changes Feb 5, 2025
