Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated discussion of optimizations in cnmf demo #1300

Merged
merged 1 commit into from
Mar 19, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 13 additions & 14 deletions demos/notebooks/demo_pipeline.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -197,12 +197,11 @@
"source": [
"<div class=\"alert alert-info\">\n",
" <h2>Displaying large files</h2><br>\n",
" \n",
"Loading a movie with <em>cm.load()</em> pulls the data into memory, which is not always feasible. When working with your own data, you might need to adapt the above code when working with extremely large files. Caiman provides tools to handle this use case. One, you can just load some of your data into a movie object using the `subindices` argument to the `load()` function. For example, if you just want to load the first 500 frames of a movie, you can send it <em>subindices=np.arange(0,500)</em>.<br>\n",
" Loading a movie with <em>cm.load()</em> pulls the data into memory, which is not always feasible. When working with your own data, you might need to adapt the above code when working with extremely large files. Caiman provides tools to handle this use case. One, you can just load some of your data into a movie object using the `subindices` argument to the `load()` function. For example, if you just want to load the first 500 frames of a movie, you can send it <em>subindices=np.arange(0,500)</em>.<br>\n",
"\n",
"If you don't want to truncate your movie, there is a <em>play_movie()</em> function that behaves just like <em>movie.play()</em>, but it doesn't load the movie into memory. Rather, it takes the filename as an argument and loads frames from disk as they are shown. If you want to use it, just import using <em>from caiman.base.movies import play_movie</em> and read the documentation. We don't use it for this demo because the demo movie is small, and we do some calculations on the loaded movie array.<br>\n",
"<br>If you don't want to truncate your movie, there is a <em>play_movie()</em> function that behaves just like <em>movie.play()</em>, but it doesn't load the movie into memory. Rather, it takes the filename as an argument and loads frames from disk as they are shown. If you want to use it, just import using <em>from caiman.base.movies import play_movie</em> and read the documentation. We don't use it for this demo because the demo movie is small, and we do some calculations on the loaded movie array.<br>\n",
"\n",
"Another option for viewing very large movies is to use the <a href=https://github.com/fastplotlib/fastplotlib>fastplotlib library</a>, which leverages the GPU to provide interactive visualization within Jupyter notebooks (we discuss this more below).\n",
"<br>Another option for viewing very large movies is to use the <a href=https://github.com/fastplotlib/fastplotlib>fastplotlib library</a>, which leverages the GPU to provide interactive visualization within Jupyter notebooks (we discuss this more below).\n",
"</div>"
]
},
Expand Down Expand Up @@ -448,13 +447,14 @@
"metadata": {},
"source": [
"<div class=\"alert alert-info\" markdown=\"1\">\n",
" <h2 style=\"margin-top: 0;\">Optimizing performance</h2>\n",
"<br> \n",
"If you hit memory issues later, there are a few things you can do. First, you may want to lower the number of processors you are using. Each processor uses more RAM, and on a workstation with many processors, you can sometimes get better performance by reducing <em>num_processors_to_use</em>.The best way to determine the optimal number is by trial and error. When you set <em>num_processors_to_use</em> variable to <em>None</em>, it defaults to <i>one</i> less than the total number of CPU cores available (the reason we don't automatically set it to the total number of cores is because in practice this typically leads to worse performance).<br>\n",
" <h2 style=\"margin-top: 0;\">Optimizing performance</h2> \n",
" If you hit memory issues later, there are a few things you can do. First, you may want to lower the number of processors you are using. Each processor uses more RAM, and on a workstation with many processors, you can sometimes get better performance by reducing <em>num_processors_to_use</em>.The best way to determine the optimal number is by trial and error. When you set <em>num_processors_to_use</em> variable to <em>None</em>, it defaults to <i>one</i> less than the total number of CPU cores available (the reason we don't automatically set it to the total number of cores is because in practice this typically leads to worse performance).\n",
"\n",
"<br>Second, if your system has less than 32GB of RAM, and things are running slowly or you are running out of memory, then get more RAM. While you can sometimes get away with less, we recommend a *bare minimum* level of 16GB of RAM, but more is better. 32GB RAM is acceptable, 64GB or more is best. Obviously, this will depend on the size of your data sets.\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we could mention that OSX users have less ram than they think because of hybrid memory, but maybe that's too much detail to put in a notebook; good to see specific numbers here.

"\n",
"Second, if your system has less than 32GB of RAM, and things are running slowly or you are running out of memory, then get more RAM. While you can sometimes get away with less, we recommend a *bare minimum* level of 16GB of RAM, but more is better. 32GB RAM is acceptable, 64GB or more is best. Obviously, this will depend on the size of your data sets.<br>\n",
"<br>Third, try subsampling your data. You can *spatially* or *temporally* subsample. If you are already sampling at a low frame rate, you should probably just spatially subsample. You can do this outside of Caiman (many acquisition systems have this capability built in), and load the subsampled file into the notebook directly. This will make subsequent analysis less prone to subtle errors. You can also set Caiman's `ssub` and `tsub` parameters (spatial and temporal subsampling parameters). If you do set `ssub` to 2, then you should divide `gSig` by 2. Dealing with such parameter side-effects is why it is easier to subsample *before* importing data into Caiman. If the results of your analysis with subsampled data look reasonable, then you are good to go. What if you can't subsample your data? \n",
"\n",
"Third, if none of your memory optimizations work, you may just have too much data for offline CNMF. For this case, we also provide an online version of CNMF (OnACID), which uses a small number of frames to initialize the spatial and temporal components, and iteratively updates them with new data. This uses much less memory than the offline approach. The demo notebook for OnACID is found in <a href=\"./demo_OnACID_mesoscope.ipynb\">demo_OnACID_mesoscope.ipynb</a>. See the <a href=\"https://pubmed.ncbi.nlm.nih.gov/30652683/\">Caiman paper</a> for more discussion.\n",
"<br>If none of the above memory optimization procedures work, you may just have too much data for offline CNMF. For this case, we also provide an online version of CNMF (OnACID), which uses a small number of frames to initialize the spatial and temporal components, and iteratively updates the components as new data comes in. This uses much less memory than the offline approach. The demo notebook for OnACID is found in <a href=\"./demo_OnACID_mesoscope.ipynb\">demo_OnACID_mesoscope.ipynb</a>. See the <a href=\"https://pubmed.ncbi.nlm.nih.gov/30652683/\">Caiman paper</a> for more discussion.\n",
"</div>"
]
},
Expand Down Expand Up @@ -902,7 +902,7 @@
"| **YrA** | Residual for each trace | num_components x num_frames |\n",
"| **A** | Spatial components/footprints | num pixels x num components |\n",
"\n",
"To recover raw calcium traces, you can add together the denoised calcium traces and their residuals (`C + YrA`). Note that the `F_dff` calculation is not done automatically, so when you run `fit()` the `F_dff` field will initially be `None`. We will show how to do populate that field below."
"To recover raw calcium traces, you can add together the denoised calcium traces and their residuals (`C + YrA`). Note that the `F_dff` calculation is not done automatically, so when you run `fit()` the `F_dff` field will initially be `None`. Below, we will show how to populate that field."
]
},
{
Expand All @@ -921,10 +921,9 @@
"source": [
"<div class=\"alert alert-info\" markdown=\"1\">\n",
" <h2 style=\"margin-top: 0;\">More on Estimates</h2>\n",
" \n",
"The estimates object contains a great deal of information. The attributes are discussed in more detail <a href=\"https://caiman.readthedocs.io/en/latest/Getting_Started.html#result-interpretation\">in the documentation</a>, but you might also find exploring the <a href=\"https://github.com/flatironinstitute/CaImAn/blob/main/caiman/source_extraction/cnmf/estimates.py\">source code</a> helpful. For instance, while most users initially care about the extracted calcium signals <em>C</em> and spatial footprints <em>A</em>, the <b>background model</b> is also very important. The background model is included in the estimate in fields <em>b</em> and <em>f</em> (which correspond to the spatial and temporal components of the low-rank background model, respectively). We discuss the background model more below.<br>\n",
" The estimates object contains a great deal of information. The attributes are discussed in more detail <a href=\"https://caiman.readthedocs.io/en/latest/Getting_Started.html#result-interpretation\">in the documentation</a>, but you might also find exploring the <a href=\"https://github.com/flatironinstitute/CaImAn/blob/main/caiman/source_extraction/cnmf/estimates.py\">source code</a> helpful. For instance, while most users initially care about the extracted calcium signals <em>C</em> and spatial footprints <em>A</em>, the <b>background model</b> is also very important. The background model is included in the estimate in fields <em>b</em> and <em>f</em> (which correspond to the spatial and temporal components of the low-rank background model, respectively). We discuss the background model more below.\n",
"\n",
"<p>We realize that attribute names like <em>A</em> are not very informative or Pythonic. These names are rooted in mathematical conventions from the original papers in the literature.</p>\n",
"<br>We realize that attribute names like <em>A</em> are not very informative or Pythonic. These names are rooted in mathematical conventions from the original papers in the literature.\n",
"</div>"
]
},
Expand Down Expand Up @@ -1285,7 +1284,7 @@
"source": [
"<div class=\"alert alert-info\" markdown=\"1\">\n",
" <h2 style=\"margin-top: 0;\">More on $\\Delta F/F$</h2> \n",
"The above discussion of dff leaves out some details about it is actually calculated in Caiman. Behind the scenes, Caiman projects the background activity to the spatial footprint of the neuron and adds a moving percentile of this projected trace to the denominator as an additional normalizing term (see the discussion on the background model below). This acts as an empirical fudge factor that can keep $\\Delta F/F$ from getting too large for very small values of $F_0$ (especially when using denoised traces). Users may prefer to use the more traditional equation directly. Thanks to Peter Rupprecht for a helpful discussion of this topic. \n",
"The above discussion of dff leaves out some details about it is actually calculated in Caiman. In practice, Caiman locally projects the model of the background activity to the spatial footprint of the neuron, and adds a moving percentile of this projected trace to the denominator as an additional normalizing term (see the discussion on the background model below). This acts as an empirical fudge factor that can keep $\\Delta F/F$ from getting too large for very small values of $F_0$ (especially when using denoised traces). Users may prefer to use the more traditional equation directly. Thanks to Peter Rupprecht for a helpful discussion of this topic. \n",
"</div>"
]
},
Expand Down
Loading