deploy: 337eab6
ofisette committed Nov 3, 2024
1 parent 7fcb375 commit 67d153c
Showing 3 changed files with 113 additions and 1 deletion.
58 changes: 58 additions & 0 deletions 4-storage.html
@@ -340,6 +340,7 @@ <h2> Contents </h2>
</ul>
</li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#storage-management">Storage Management</a><ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#challenges-relating-to-active-data">Challenges relating to active data</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#life-cycle-of-active-data">Life Cycle of Active Data</a><ul class="nav section-nav flex-column">
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#example-of-the-life-cycle-of-active-data">Example of the Life Cycle of Active Data</a></li>
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#exercise-running-a-small-pipeline"><strong>Exercise</strong> - Running a Small Pipeline</a></li>
@@ -574,6 +575,62 @@ <h3><code class="docutils literal notranslate"><span class="pre">/nearline</span
</section>
<section id="storage-management">
<h2>Storage Management<a class="headerlink" href="#storage-management" title="Link to this heading">#</a></h2>
<section id="challenges-relating-to-active-data">
<h3>Challenges relating to active data<a class="headerlink" href="#challenges-relating-to-active-data" title="Link to this heading">#</a></h3>
<p>Research data management involves the following steps:</p>
<ol class="arabic simple">
<li><p><a class="reference external" href="https://dmp-pgd.ca/">Making a data management plan</a></p></li>
<li><p><strong>Managing active data</strong></p></li>
<li><p><a class="reference external" href="https://www.lunaris.ca/">Depositing data</a></p></li>
</ol>
<p>Common challenges and issues with active research data include:</p>
<ul class="simple">
<li><p><strong>Lack of an agreed-upon structure</strong> to be used
by all team members</p>
<ul>
<li><p>Reduced efficiency</p></li>
<li><p>Reduced ability to work as a team</p></li>
<li><p>It becomes more difficult to create metadata
once the research project is over</p></li>
</ul>
</li>
<li><p>Researchers <strong>quit the group without</strong></p>
<ul>
<li><p>Transferring intellectual property
to the principal investigator</p></li>
<li><p>Enabling data sharing</p></li>
</ul>
</li>
<li><p>Lack of a hierarchy for <strong>data priority</strong></p>
<ul>
<li><p>Temporary files (that could be deleted)</p></li>
<li><p>Immediate metadata creation</p></li>
<li><p>Imported data (already a copy) vs original data (to back up)</p></li>
<li><p>Raw vs cleaned-up data</p></li>
<li><p>Intermediate results vs final results for archival</p></li>
</ul>
</li>
<li><p>Lack of planning for <strong>future data use</strong>
when writing participant consent forms</p>
<ul>
<li><p>Data must be deleted at the end
of the initially planned period</p></li>
</ul>
</li>
</ul>
<p>Technical consequences:</p>
<ul class="simple">
<li><p>Large <strong>storage systems filled with</strong>:</p>
<ul>
<li><p>Temporary data that was not deleted</p></li>
<li><p>Multiple dataset copies
rather than a centralised repository</p></li>
</ul>
</li>
<li><p>Parallel file systems that perform poorly
with <strong>a large number of small files</strong></p></li>
</ul>
</section>
<section id="life-cycle-of-active-data">
<h3>Life Cycle of Active Data<a class="headerlink" href="#life-cycle-of-active-data" title="Link to this heading">#</a></h3>
<p>As time passes, the data tend to accumulate. It eventually becomes
@@ -780,6 +837,7 @@ <h2>Key Points<a class="headerlink" href="#key-points" title="Link to this headi
</ul>
</li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#storage-management">Storage Management</a><ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#challenges-relating-to-active-data">Challenges relating to active data</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#life-cycle-of-active-data">Life Cycle of Active Data</a><ul class="nav section-nav flex-column">
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#example-of-the-life-cycle-of-active-data">Example of the Life Cycle of Active Data</a></li>
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#exercise-running-a-small-pipeline"><strong>Exercise</strong> - Running a Small Pipeline</a></li>
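The "multiple dataset copies rather than a centralised repository" consequence listed above can be checked for before consolidating data. The following is a minimal sketch, assuming Python 3.9+ and assuming that only byte-identical files count as copies (re-encoded or partially modified copies are not detected). Files are first grouped by size, and only files sharing a size are hashed with SHA-256, so most files are never read in full. The names `sha256_of` and `find_duplicates` are hypothetical helpers, not part of the lesson.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[str]]:
    """Return groups of byte-identical files found under `root`."""
    # Pass 1: group regular files by size; files of different sizes cannot match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files whose size is shared with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[sha256_of(path)].append(path)

    # Keep only hashes seen more than once, i.e. actual duplicate copies.
    return sorted(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
```

For example, `find_duplicates("/path/to/project")` (a placeholder path) returns a list of groups, each a sorted list of paths with identical content; one copy per group could then be kept in a shared location. Because every candidate file is read, it may be preferable to run it on one subdirectory at a time on a busy parallel file system.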
54 changes: 54 additions & 0 deletions _sources/4-storage.ipynb
@@ -198,6 +198,60 @@
"metadata": {},
"source": [
"## Storage Management\n",
"### Challenges relating to active data\n",
"Research data management involves the following steps:\n",
"\n",
"1. [Making a data management plan](https://dmp-pgd.ca/)\n",
"1. **Managing active data**\n",
"1. [Depositing data](https://www.lunaris.ca/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Common challenges and issues with active research data include:\n",
"\n",
"* **Lack of an agreed-upon structure** to be used\n",
" by all team members\n",
" * Reduced efficiency\n",
" * Reduced ability to work as a team\n",
" * It becomes more difficult to create metadata\n",
" once the research project is over\n",
"* Researchers **quit the group without**\n",
"  * Transferring intellectual property\n",
" to the principal investigator\n",
" * Enabling data sharing\n",
"* Lack of a hierarchy for **data priority**\n",
" * Temporary files (that could be deleted)\n",
" * Immediate metadata creation\n",
"  * Imported data (already a copy) vs original data (to back up)\n",
" * Raw vs cleaned-up data\n",
" * Intermediate results vs final results for archival\n",
"* Lack of planning for **future data use**\n",
"  when writing participant consent forms\n",
" * Data must be deleted at the end\n",
" of the initially planned period"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Technical consequences:\n",
"\n",
"* Large **storage systems filled with**:\n",
" * Temporary data that was not deleted\n",
" * Multiple dataset copies\n",
" rather than a centralised repository\n",
"* Parallel file systems that perform poorly\n",
"  with **a large number of small files**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Life Cycle of Active Data\n",
"As time passes, the data tend to accumulate. It eventually becomes\n",
"necessary to monitor the used space, as well as the number of files.\n",
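The last lines above note that it eventually becomes necessary to monitor the used space as well as the number of files. The following is a minimal sketch, assuming Python 3.9+: it walks a directory tree and tallies the file count, the total bytes and the number of "small" files for each direct subdirectory, so that directories full of temporary or small files stand out. The 1 MiB small-file threshold and the names `Usage`, `summarize` and `summarize_subdirs` are illustrative assumptions, not a site-provided quota tool.

```python
import os
from dataclasses import dataclass


@dataclass
class Usage:
    """Tallies for one directory tree (hypothetical helper type)."""
    files: int = 0
    total_bytes: int = 0
    small_files: int = 0  # files smaller than the chosen threshold


def summarize(root: str, small_threshold: int = 1 << 20) -> Usage:
    """Walk `root` and count files, total bytes and small files.

    Symbolic links are skipped so linked data is not counted twice.
    The 1 MiB default threshold is an arbitrary assumption.
    """
    usage = Usage()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.stat(path).st_size
            usage.files += 1
            usage.total_bytes += size
            if size < small_threshold:
                usage.small_files += 1
    return usage


def summarize_subdirs(root: str, small_threshold: int = 1 << 20) -> dict[str, Usage]:
    """Return {subdirectory name: Usage} for each direct subdirectory of `root`.

    Files located directly in `root` are not included.
    """
    with os.scandir(root) as entries:
        return {
            entry.name: summarize(entry.path, small_threshold)
            for entry in entries
            if entry.is_dir(follow_symlinks=False)
        }
```

For instance, sorting `summarize_subdirs("/path/to/project").items()` (a placeholder path) by `files` in descending order lists the subdirectories that contribute most to a file-count quota; those are candidates for deleting temporary data or bundling small files into a single archive.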