Skip to content

Commit

Permalink
Merge pull request #11 from MetaCell/feature/CELE-57
Browse files Browse the repository at this point in the history
CELE-57 Add first format ingestion documentation
  • Loading branch information
ddelpiano authored Aug 12, 2024
2 parents 9f7d7c8 + 66128c5 commit 167e525
Showing 1 changed file with 121 additions and 0 deletions.
121 changes: 121 additions & 0 deletions format-ingestion.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
# Format of data ingested in the database

The management script is able to ingest data represented in a JSON format.
Different files are necessary:

* `neurons.json` that encodes the information about the neurons in general
* `datasets.json` that encodes the information about the different datasets
* `connections/xxx.json` that encodes the different connections for dedicated datasets
* `annotations/xxx.json` that encodes annotatinos for different zones of the anatomy

Those files are automatically exported from third-party tool and shouldn't be edited manually.

## Format of `neurons.json`

This file defines a list of JSON object as root structure:

```json
[
{
... // definition of neuron 1
},
...,
{
... // definition of neuron N
}
]
```

Each JSON object represents a neuron with this schema:

```json
{
"inhead": int, // int used as bool, is the neuron part of the head or not
"name": string, // name of the neuron, can be same as classes, or L or R of classes
"emb": int, // int used as bool
"nt": string, // neurotransmitter type
"intail": int, // int used as bool
"classes": string, // general name of the neuron
"typ": string // type of the neuron: "i" => , TODO fillme
}
```


## Format of `datasets.json`

This file defines a list of JSON object as root structure.

```json
[
{
... // definition of dataset 1
},
...,
{
... // definition of dataset N
}
]
```

Each JSON object represents a specific dataset with this schema:

```json
{
"id": string // unique ID for the dataset
"name": string // display name of the dataset
"type": string // type of dataset: "complete" or "head"
"time": int // time of the dataset
"visualTime": int // visualTime of the dataset
"description": string // description of the dataset
"axes": [ // OPTIONAL: different axes and their representation, not used but can appear in the file
...
]
}
```

## Format of `connections/xxx.json`

The `connections` directory encodes the information about the different connections by dataset.
Each file in this directory is named after the `id` of a dataset present in the `datasets.json` file, e.g.: a dataset defined using the `id` `white_1986_jsh` will defines each of the connections of the dataset in the file `connections/white_1986_jsh.json`.

Each of those files is a list of JSON object where each of the JSON objects encodes different connections between different neurons.
The schema is the following:

```json
{
"ids": [ ... ], // a list of int, where each int represents the ID of the neurons involved in this connection
"post": string, // the name of a neuron as defined in "neurons.json"
"post_tid": [ ... ], // a list of int where each int represents the ID of a post synapse for a dedicated post neuron
"pre": string, // the name of a neuron as defined in "neurons.json"
"pre_tid": [ ... ], // a list of int where each int represents the ID of a pre synapse for a dedicated pre neuron
"syn": [ ... ], // a list of int where each int represents the weight of a post or pre synapses (indice matches the neuron in pre/post_tid)
"typ": int // the type of connection ("electrical" or "chemical")
}
```

For each of those objects: `ids`, `post_tid`, `pre_tid` and `syn` need to have the same number of elements.

## Format of `annotations/xxx.json`

The `annotations` directory encodes annotations about the different part (`head` or `complete`) following the naming convention `part.annotations.json`, e.g.: the annotations for the `head` are located in `annotations/head.annotations.json`.

Each of those files is a JSON object that defines categories as keys and a list of neurons couples as values.
Here is the schema for the `head.annotations.json` file (the `complete.annotations.json` file, while existing, is an empty JSON object).

```json
{
"increase": [ // the type of annotation
[
string, // pre, the ID/name of a neuron from "neurons.json"
string // post, the ID/name of the other neuron from "neurons.json" that is part of the couple
]
]
}
```

The types of annotations can be `increase`, `variable`, `postembryonic`, `decrease` or `stable`

### Note:

The existing repository contains a `trajectories` folder with a set of JSON files.
Those files are not ingested anymore, they are part of a legacy system.

0 comments on commit 167e525

Please sign in to comment.