Samples registration: datamover

Michael Huber edited this page Jul 12, 2017 · 7 revisions

Our model in openBIS defines three sample types, listed below (see the general overview of openBIS):

  1. MISEQ_RUN,
  2. MISEQ_SAMPLE: child of MISEQ_RUN,
  3. RESISTANCE_TEST: child of MISEQ_SAMPLE.

To register these samples in openBIS, the script SampleSheet2openBIS.sh goes through the sample sheets found in MiSeqOutput and creates the corresponding directories and files on datamover. We explain the process with an example.

Automatic registration of openBIS samples

Let's assume that a run named 170629_M02081_0219_000000000-B67GM has completed on MiSeq2. The script on MiSeq2 parses /cygdrive/d/Illumina/MiSeqOutput/170629_M02081_0219_000000000-B67GM/SampleSheet.csv and finds two samples, called pat1 and pat2, both belonging to project Metagenomics. The script therefore has to register three openBIS samples: one MISEQ_RUN and two MISEQ_SAMPLE. To do this, it creates one directory per openBIS sample in datamover:/data/outgoing, the first for the run and the other two for the samples:

  • 170629_M02081_0219_000000000-B67GM_METAGENOMICS,
  • 170629_M02081_0219_000000000-B67GM-1,
  • 170629_M02081_0219_000000000-B67GM-2.
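The naming pattern in this example can be sketched in a few lines of Python. This is only an illustration inferred from the three names above, not code taken from SampleSheet2openBIS.sh: the function name registration_dirs is hypothetical, and the rule "run name, underscore, upper-cased project" for the run directory is an assumption based on this single example.

```python
def registration_dirs(run_name, project, sample_ids):
    """Return the directory names for one run and its samples.

    Assumed pattern (inferred from the example on this page):
      run directory:    <run_name>_<PROJECT in upper case>
      sample directory: <run_name>-<sample ID>
    """
    run_dir = f"{run_name}_{project.upper()}"
    sample_dirs = [f"{run_name}-{sample_id}" for sample_id in sample_ids]
    return [run_dir] + sample_dirs


dirs = registration_dirs(
    "170629_M02081_0219_000000000-B67GM", "Metagenomics", [1, 2]
)
for name in dirs:
    print(name)
```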

In each directory it will create two files, named dataset.properties and sample.properties, described below for 170629_M02081_0219_000000000-B67GM-1.

dataset.properties

This file defines the space, project and experiment the sample should go into, together with its unique ID, sample type and dataset type. For example,

SPACE = IMV
PROJECT = Metagenomics
EXPERIMENT = MISEQ_SAMPLES
SAMPLE = 170629_M02081_0219_000000000-B67GM-1
SAMPLE_TYPE = MISEQ_SAMPLE
DATASET_TYPE = FASTQ

would be used to register sample 1.
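As a minimal sketch, the content of dataset.properties could be generated as follows. The helper dataset_properties is hypothetical; only the six keys and the "KEY = value" layout come from the example above.

```python
def dataset_properties(space, project, experiment, sample, sample_type, dataset_type):
    """Render the six fields in the 'KEY = value' layout shown above."""
    fields = [
        ("SPACE", space),
        ("PROJECT", project),
        ("EXPERIMENT", experiment),
        ("SAMPLE", sample),
        ("SAMPLE_TYPE", sample_type),
        ("DATASET_TYPE", dataset_type),
    ]
    return "".join(f"{key} = {value}\n" for key, value in fields)


text = dataset_properties(
    space="IMV",
    project="Metagenomics",
    experiment="MISEQ_SAMPLES",
    sample="170629_M02081_0219_000000000-B67GM-1",
    sample_type="MISEQ_SAMPLE",
    dataset_type="FASTQ",
)
print(text, end="")
```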

sample.properties

This file defines the properties of the sample, as defined by our model in openBIS. A MISEQ_SAMPLE can be annotated with a sample ID, sample name, sample well, indices, description and so on. For example,

SAMPLE_ID=1
SAMPLE_NAME=pat1
SAMPLE_PLATE=
SAMPLE_WELL=
I7_INDEX_ID=N702
INDEX_1=CGTACTAG
I5_INDEX_ID=S506
INDEX_2=ACTGCATA
DESCRIPTION=

would annotate sample 1.
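The sketch below shows one way these properties could be derived from a row of the [Data] section of SampleSheet.csv. It is an illustration, not the actual script: the column names are assumed to follow the usual Illumina sample sheet layout, the mapping to property keys is inferred from the example above, and the sheet embedded in the code is a trimmed, made-up stand-in for the real file.

```python
import csv
import io

# Assumed mapping from sample sheet columns to openBIS property keys.
COLUMN_TO_PROPERTY = {
    "Sample_ID": "SAMPLE_ID",
    "Sample_Name": "SAMPLE_NAME",
    "Sample_Plate": "SAMPLE_PLATE",
    "Sample_Well": "SAMPLE_WELL",
    "I7_Index_ID": "I7_INDEX_ID",
    "index": "INDEX_1",
    "I5_Index_ID": "I5_INDEX_ID",
    "index2": "INDEX_2",
    "Description": "DESCRIPTION",
}


def read_data_rows(sheet_text):
    """Return the rows below the [Data] line as dictionaries."""
    lines = sheet_text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("[Data]"))
    return list(csv.DictReader(io.StringIO("\n".join(lines[start + 1:]))))


def sample_properties(row):
    """Render one sample sheet row in the 'KEY=value' layout shown above."""
    return "".join(
        f"{prop}={row.get(column, '')}\n"
        for column, prop in COLUMN_TO_PROPERTY.items()
    )


# Trimmed, hypothetical sample sheet containing only sample 1.
SHEET = """[Header]
Experiment Name,example
[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
1,pat1,,,N702,CGTACTAG,S506,ACTGCATA,Metagenomics,
"""

rows = read_data_rows(SHEET)
props = sample_properties(rows[0])
print(props, end="")
```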

Optional: a dataset file

A MISEQ_SAMPLE can also carry a dataset: a fastq file copied into the directory will be available in openBIS. A sample of type MISEQ_RUN, on the other hand, has no datasets; all of its information is contained in the two properties files.

Touch the marker file

Finally, a file named .MARKER_is_finished_170629_M02081_0219_000000000-B67GM-1 must be created next to the sample directory in order to trigger the registration. At that point, ~/data/outgoing contains the following:

~/data/outgoing/170629_M02081_0219_000000000-B67GM-1/
~/data/outgoing/170629_M02081_0219_000000000-B67GM-1/sample.properties
~/data/outgoing/170629_M02081_0219_000000000-B67GM-1/dataset.properties
~/data/outgoing/170629_M02081_0219_000000000-B67GM-1/pat1_S1_L001_R1_001.fastq.gz
~/data/outgoing/.MARKER_is_finished_170629_M02081_0219_000000000-B67GM-1
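A minimal sketch of this staging step is shown below. The function stage_sample is hypothetical and the example writes to a temporary directory instead of the real outgoing directory. The point it illustrates is the ordering: the marker file is created last, beside the sample directory rather than inside it, so that the directory is complete by the time the marker appears.

```python
import tempfile
from pathlib import Path

MARKER_PREFIX = ".MARKER_is_finished_"


def stage_sample(outgoing, name, files):
    """Write all files of one sample, then create its marker file."""
    sample_dir = Path(outgoing) / name
    sample_dir.mkdir(parents=True, exist_ok=True)
    for filename, content in files.items():
        (sample_dir / filename).write_text(content)
    # Created last and next to the directory, not inside it.
    marker = Path(outgoing) / f"{MARKER_PREFIX}{name}"
    marker.touch()
    return marker


with tempfile.TemporaryDirectory() as outgoing:
    marker = stage_sample(
        outgoing,
        "170629_M02081_0219_000000000-B67GM-1",
        {
            "sample.properties": "SAMPLE_ID=1\n",
            "dataset.properties": "SPACE = IMV\n",
        },
    )
    marker_exists = marker.exists()
    staged_files = sorted(
        p.name for p in (Path(outgoing) / "170629_M02081_0219_000000000-B67GM-1").iterdir()
    )
    print(marker.name, staged_files)
```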

These files are short-lived: about every minute a worker scans data/outgoing and, if the marker file is present, moves the entire directory to the dropbox and, if there are no mistakes, on to openBIS.
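To make the role of the marker file concrete, the sketch below lists the directories that such a scan would consider ready. It is not the worker's actual implementation: ready_directories and the directory names run-1 and run-2 are hypothetical, and the only behaviour assumed is the one described above (a directory is picked up when its marker file exists).

```python
import tempfile
from pathlib import Path

MARKER_PREFIX = ".MARKER_is_finished_"


def ready_directories(outgoing):
    """Return the names of directories whose marker file is present."""
    outgoing = Path(outgoing)
    ready = []
    for entry in sorted(outgoing.iterdir()):
        if entry.is_file() and entry.name.startswith(MARKER_PREFIX):
            name = entry.name[len(MARKER_PREFIX):]
            if (outgoing / name).is_dir():
                ready.append(name)
    return ready


with tempfile.TemporaryDirectory() as outgoing:
    root = Path(outgoing)
    (root / "run-1").mkdir()
    (root / "run-2").mkdir()  # still being written: no marker yet
    (root / f"{MARKER_PREFIX}run-1").touch()
    ready = ready_directories(root)
    print(ready)
```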