
Commit

documentation updates
mschwamb committed Jan 12, 2025
1 parent 50e87c2 commit 2162419
Showing 5 changed files with 75 additions and 41 deletions.
2 changes: 2 additions & 0 deletions docs/configfiles.rst
@@ -43,6 +43,8 @@ approximation of the Rubin detector.
   :language: text
   :linenos:

.. _known_config:

Rubin Known Object Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This configuration file is appropriate for running ``Sorcha`` using the full camera footprint but with randomization,
3 changes: 3 additions & 0 deletions docs/ephemerisgen.rst
@@ -122,6 +122,9 @@ If you want to use the same input orbits across multiple ``Sorcha`` runs, you ca
.. tip::
   Compared to the other outputs from ``Sorcha``, the ephemeris output files are typically very large. The output will be slow to read into ``Sorcha``, but for some use cases reading in the ephemeris as a file can be faster than ephemeris generation on the fly. We recommend only outputting the contents of the ephemeris stage if you need it to speed up future simulations. If possible, use the HDF5 file format to help with disk I/O speeds.

.. tip::
   If instead you want to know which objects from the input small body population land in the survey observations, with an estimate of their apparent magnitudes and without applying any other cuts or filters to the detections (not even the discovery efficiency and linking effects), you can use or adapt the :ref:`known_config` example :ref:`configuration file <configs>`.
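
If you do write the ephemeris to HDF5, here is a minimal sketch for inspecting it afterwards with pandas (assuming the file was written as a single pandas table; the file name is invented)::

    import pandas as pd

    # The key argument can be omitted when the store holds a single table.
    eph = pd.read_hdf("my_ephemeris.h5")
    print(eph.columns.tolist())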

Validation
--------------------------

10 changes: 6 additions & 4 deletions docs/example_files/multi_sorcha.py
@@ -4,9 +4,9 @@
import pandas as pd
import sqlite3

def run_sorcha(i, args, path_inputs, pointings, instance, stats, config):
    print(f"sorcha run -c {config} --pd {pointings} -o {args.path}{instance}/ -t {instance}_{i} --ob {args.path}{instance}/orbits_{i}.csv -p {args.path}{instance}/physical_{i}.csv --st {stats}_{i}", flush=True)
    os.system(f"sorcha run -c {config} --pd {pointings} -o {args.path}{instance}/ -t {instance}_{i} --ob {args.path}{instance}/orbits_{i}.csv -p {args.path}{instance}/physical_{i}.csv --st {stats}_{i}")

if __name__ == '__main__':
    import argparse
@@ -22,6 +22,7 @@ def run_sorcha(i, args, path_inputs, pointings, instance, config):
    parser.add_argument('--cleanup', action='store_true')
    parser.add_argument('--copy_inputs', action='store_true')
    parser.add_argument('--pointings', type=str)
    parser.add_argument('--stats', type=str)
    parser.add_argument('--config', type=str)
    args = parser.parse_args()
    chunk = args.chunksize
@@ -30,6 +31,7 @@ def run_sorcha(i, args, path_inputs, pointings, instance, config):
    pointings = args.pointings
    path = args.path
    config = args.config
    stats = args.stats

    orbits = tb.Table.read(args.input_orbits)
    orbits = orbits[instance*chunk:(instance+1)*chunk]
@@ -50,7 +52,7 @@ def run_sorcha(i, args, path_inputs, pointings, instance, config):
        sub_phys.write(f"{args.path}{instance}/physical_{i}.csv", overwrite=True)

    with Pool(processes=args.cores) as pool:
        pool.starmap(run_sorcha, [(i, args, path_inputs, pointings, instance, stats, config) for i in range(args.cores)])

    data = []
    for i in range(args.cores):
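
To make the chunking arithmetic in this script concrete, here is a minimal sketch (illustrative only: the ten-object table and chunk size are invented, and ObjID mirrors the identifier column used in ``Sorcha``'s input files) of how each --instance value selects its slice of the input orbits::

    import astropy.table as tb

    # Toy stand-in for the real --input_orbits file.
    orbits = tb.Table({"ObjID": [f"obj_{j}" for j in range(10)]})

    chunk = 4                    # --chunksize
    for instance in range(3):    # one value per SLURM array task, e.g. --array=0-2
        sub = orbits[instance * chunk:(instance + 1) * chunk]
        print(instance, list(sub["ObjID"]))
    # instance 0 -> obj_0..obj_3, instance 1 -> obj_4..obj_7, instance 2 -> obj_8, obj_9
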
45 changes: 26 additions & 19 deletions docs/hpc.rst
@@ -1,32 +1,34 @@
.. _hpc:

Parallelization
===============================================

Embarrassingly Parallel Problem
------------------------------------

``Sorcha``'s design lends itself perfectly to parallelization: when it simulates a large number of solar system objects, each one is considered in turn, independently of all the others. If you have access to a large number of computing cores, you can run ``Sorcha`` much more quickly by dividing up the labor: giving a small part of your model population to each core.

This involves two subtasks: breaking up your model population into an appropriate number of input files with unique names, and organizing a large number of cores to simultaneously run ``Sorcha`` on their own individually named input files. Both of these tasks are easy in theory, but tricky enough in practice that we provide some guidance below.
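
A minimal sketch of the first subtask, assuming the model population lives in a single CSV file with one row per object (the file names here are invented, and pandas/numpy are assumed to be available)::

    import numpy as np
    import pandas as pd

    # Split my_orbits.csv into uniquely named chunk files: orbits_000.csv, ...
    orbits = pd.read_csv("my_orbits.csv")
    n_chunks = 8
    for k, part in enumerate(np.array_split(orbits, n_chunks)):
        part.to_csv(f"orbits_{k:03d}.csv", index=False)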


SLURM
---------

Slurm Workload Manager is a resource management utility commonly used by computing clusters. We provide starter code for running large parallel ``Sorcha`` batches using SLURM, though the general guidance we provide is applicable to any system. Documentation for SLURM is available `here <https://slurm.schedmd.com/>`_. Please note that your HPC (High Performance Computing) facility's SLURM setup may differ from those on which ``Sorcha`` was tested, and it is always a good idea to read any facility-specific documentation or speak to the HPC maintainers before you begin to run jobs.

Quickstart
--------------

We provide as a starting point our example scripts for running ``Sorcha`` on HPC facilities using SLURM. Some modifications will be required to make them work for your facility.

Below is a very simple SLURM script example designed to run the demo files three times on three cores in parallel. Here, one core has been assigned to each ``Sorcha`` run, with each core assigned 1 GB of memory.

.. literalinclude:: ./example_files/multi_sorcha.sh
   :language: text

Please note that the time taken to run and the memory required will vary enormously based on the size of your input files, your input population, and the chunk size assigned in the ``Sorcha`` configuration file: we therefore recommend test runs before you commit to very large runs. The chunk size is an especially important parameter: too small and ``Sorcha`` will take a very long time to run; too large and the memory footprint may become prohibitive. We have found that chunk sizes of 1,000 to 10,000 work best.

Below is a more complex example of a SLURM script. Here, multi_sorcha.sh calls multi_sorcha.py, which splits up an input file into a number of 'chunks' and runs ``Sorcha`` in parallel on a user-specified number of cores.

multi_sorcha.sh:

@@ -38,49 +40,54 @@

multi_sorcha.py:
.. literalinclude:: ./example_files/multi_sorcha.py
   :language: python

.. note::
   We provide these here for you to copy, paste, and edit as needed. You may have to make some slight modifications to both the SLURM script and multi_sorcha.py, depending on whether you are running ``Sorcha`` without the stats file.

multi_sorcha.sh requests many parallel SLURM jobs of multi_sorcha.py, feeding each a different --instance parameter. After changing 'my_orbits.csv', 'my_colors.csv', and 'my_pointings.db' to match the above, it could be run as ``sbatch --array=0-9 multi_sorcha.sh 25 4`` to generate ten jobs, each with 4 cores running 25 orbits each.


You can run multi_sorcha.py on the command line as well::

    python multi_sorcha.py --config sorcha_config_demo.ini --input_orbits mba_sample_1000_orbit.csv --input_physical mba_sample_1000_physical.csv --pointings baseline_v2.0_1yr.db --path ./ --chunksize 1000 --norbits 250 --cores 4 --instance 0 --stats mbastats --cleanup --copy_inputs

This will generate a single output file. It should work fine on a laptop, and be somewhat (but not 4x) faster than the single-core equivalent, due to overheads (compare ``time sorcha run -c sorcha_config_demo.ini -pd baseline_v2.0_1yr.db -o ./ -t 0_0 --st mbastats_0 -ob mba_sample_1000_orbit.csv -p mba_sample_1000_physical.csv``).

.. note::
   This ratio improves as input file sizes grow. Make sure to experiment with different numbers of cores to find what's fastest given your setup and file sizes.


Sorcha’s Helpful Utilities
---------------------------------

``Sorcha`` comes with a tool designed to combine the results of multiple runs and the input files used to create them into tables on a SQL database. This can make exploring your results easier. To see the usage of this tool, on the command line, run::

    sorcha outputs create-sqlite --help

``Sorcha`` also has a tool designed to search for and check the logs of a large number of runs. This tool can make sure all of the runs completed successfully, and output to either the terminal or a .csv file the names of the runs which have not completed and the relevant error message, if applicable. To see the usage of this tool, on the command line run::

    sorcha outputs check-logs --help


Best Practices/Tips and Tricks
-------------------------------------

1. We strongly recommend that HPC users download the auxiliary files needed to run ASSIST+REBOUND into a known, named directory, and use the -ar command line flag in their **sorcha run** call to point ``Sorcha`` to those files. You can download the auxiliary files using::

    sorcha bootstrap --cache <directory>

And then run ``Sorcha`` via::

    sorcha run … -ar /path/to/folder/

This is because ``Sorcha`` will otherwise attempt to download the files into the local cache, which may be on the HPC nodes rather than in your user directory, potentially triggering multiple slow downloads.

2. We recommend that each ``Sorcha`` run be given its own individual output directory. If multiple parallel ``Sorcha`` runs are attempting to save to the same file in the same directory, this will cause confusing and unexpected results.

3. ``Sorcha`` output files can be **very large**, and user directories on HPC facilities are usually space-limited. Please ensure that your ``Sorcha`` runs are directing the output to be saved in a location with sufficient space, like your HPC cluster's scratch drive.

4. Think about having useful, helpful file names for your outputs. It is often tempting to call them something like “sorcha_output_<number>” or “sorcha_output_<taskid>”, but hard-won experience has led us to instead recommend more explanatory names for when you come back to your output later.




.. tip::
   You can use the **sorcha init** command to copy ``Sorcha``'s :ref:`example configuration files <example_configs>` into a directory of your choice.
56 changes: 38 additions & 18 deletions docs/outputs.rst
@@ -3,6 +3,17 @@
Outputs
==================

``Sorcha`` outputs:

* :ref:`Detections File <detections>` (a list of all the detections of the input population made by the simulated survey)
* (Optional) :ref:`Statistics (Tally) File <stats>`, which provides a summary overview of the objects from the input population that were "found" in the simulated survey
* (Optional) :ref:`Ephemeris Output <ephem_output>`, which provides the output from :ref:`Ephemeris Generation<ephemeris_gen>`

.. image:: images/survey_simulator_flow_chart.png
   :width: 800
   :alt: An overview of the inputs and outputs of the Sorcha code.
   :align: center


.. attention::
   Use the **-o** flag on the command line to specify where ``Sorcha`` should be saving any output and log files (the file path).

@@ -26,6 +37,8 @@ The :ref:`configuration file<configs>` keyword output_format in the OUTPUT section
.. attention::
   Use the **-t** flag on the command line to specify the filename stem for all the ``Sorcha`` output files and logs.

.. _detections:

Detections File
----------------------

Expand All @@ -35,7 +48,7 @@ with a row for each predicted detection and a column for each parameter calcula

Additionally, the output columns of the detections file can be set to either "basic" or "all" settings (described below) using the output_columns :ref:`configuration file<configs>` keyword.
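
Both of these keywords live in the OUTPUT section of the :ref:`configuration file<configs>`; as a hedged sketch (the values shown are just one valid combination)::

    [OUTPUT]
    output_format = csv
    output_columns = basic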

.. _basic:

Basic Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -114,7 +127,7 @@ Example Detections File in Basic Format
S1000000a,61789.27659,164.99043640246796,-19.09523631317997,164.29665099999988,-19.110176000000447,2.8895553381860802e-06,z,19.376978135088684,19.359651855968583,0.008079363622311368,0.00805998568672928,23.293210067462763,23.293123719813384
.. _full:

Full Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -254,23 +267,9 @@ Detections File: Full Output Column Names, Formats, and Descriptions

Optional Outputs
----------------------


.. _stats:

Statistics (Tally) File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``Sorcha`` can also output a statistics or "tally" file (if specified using the **--st** flag) which contains an overview of the ``Sorcha`` output for each object and filter. Minimally, this

.. note::
   Unless the user has specified **drop_unlinked = False** in the :ref:`configuration file<configs>`, the object_linked column will read TRUE for all objects. To see which objects were not linked by ``Sorcha``, this variable must be set to False.
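
As an illustrative sketch of working with the tally file (the file name is invented, and ObjID/optFilter are assumed column names based on the per-object, per-filter description above)::

    import pandas as pd

    # Written when e.g. --st mbastats is passed on the command line.
    stats = pd.read_csv("mbastats.csv")

    # Count how many distinct objects were "found" in each filter.
    print(stats.groupby("optFilter")["ObjID"].nunique())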

.. _ephem_output:

Ephemeris Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Optionally (with the **--ew** flag set at the command line), an ephemeris file of all detections near the field can be written to a separate file, which can then be provided back to ``Sorcha`` as an optional external ephemeris file with the **-er** flag.
More information can be found on this functionality, including the output columns, in the :ref:`Ephemeris Generation<ephemeris_gen>` section of the documentation.

The format of the output ephemeris file is controlled by the **eph_format** keyword in the INPUT section of the :ref:`configuration file<configs>`::

    [INPUT]
    ephemerides_type = external
    eph_format = csv

.. attention::
   Users should note that output produced by reading in a previously-generated ephemeris file will be in a different order than the output produced when running the ephemeris generator within ``Sorcha``. This is simply a side-effect of how ``Sorcha`` reads in ephemeris files and does not affect the actual content of the output.

.. tip::
   If instead you want to know which objects from the input small body population land in the survey observations, with an estimate of their apparent magnitudes and without applying any other cuts or filters to the detections (not even the discovery efficiency and linking effects), you can use or adapt the :ref:`known_config` example :ref:`configuration file <configs>`.
