load-project

This tool takes an xlsx file and generates a project_0.json suitable for uploading to the DSS.

It can then upload the project_0.json and a (mostly) empty links.json, which populates a new project in the browser.

Create and activate a new virtual environment, then install the dependencies:

virtualenv -p python3.6 v3nv && . v3nv/bin/activate && pip install -r requirements.txt

Parse the xlsx:

#!/usr/bin/env bash

# E-GEOD-81547_curated_ontologies_07_2019.xlsx
# DSS prod uuid: cddab57b-6868-4be4-806f-395ed9dd635a
python xlsx_to_project_json.py --xlsx data/test_000.xlsx

# Gary_Bader_9_16.xlsx
# DSS prod uuid: 4d6f6c96-2a83-43d8-8fe1-0f53bffd4674
python xlsx_to_project_json.py --xlsx data/test_001.xlsx

# GEOD-93593_HCA_Ontologies_July_2.xlsx
# DSS prod uuid: 2043c65a-1cf8-4828-a656-9e247d4e64f1
python xlsx_to_project_json.py --xlsx data/test_002.xlsx

# hca-metadata-spreadsheet-GSE84133_pancreas.xlsx
# DSS prod uuid: f86f1ab4-1fbb-4510-ae35-3ffd752d4dfc
python xlsx_to_project_json.py --xlsx data/test_003.xlsx

# hca-metadata-spreadsheet-GSE95459-GSE114374-colon.xlsx
# DSS prod uuid: f8aa201c-4ff1-45a4-890e-840d63459ca2
python xlsx_to_project_json.py --xlsx data/test_004.xlsx

# mf-E-GEOD-106540_spreadsheet_v9.xlsx
# DSS prod uuid: 90bd6933-40c0-48d4-8d76-778c103bf545
python xlsx_to_project_json.py --xlsx data/test_005.xlsx
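The six invocations above could also be driven from a small Python script. The following is only a sketch: the `DATASETS` table restates the comments above, while `build_command` and `run_all` are hypothetical helpers that are not part of this repository.

```python
import subprocess
import sys

# Test file -> (original spreadsheet name, DSS prod UUID), copied from the comments above.
DATASETS = {
    "data/test_000.xlsx": ("E-GEOD-81547_curated_ontologies_07_2019.xlsx",
                           "cddab57b-6868-4be4-806f-395ed9dd635a"),
    "data/test_001.xlsx": ("Gary_Bader_9_16.xlsx",
                           "4d6f6c96-2a83-43d8-8fe1-0f53bffd4674"),
    "data/test_002.xlsx": ("GEOD-93593_HCA_Ontologies_July_2.xlsx",
                           "2043c65a-1cf8-4828-a656-9e247d4e64f1"),
    "data/test_003.xlsx": ("hca-metadata-spreadsheet-GSE84133_pancreas.xlsx",
                           "f86f1ab4-1fbb-4510-ae35-3ffd752d4dfc"),
    "data/test_004.xlsx": ("hca-metadata-spreadsheet-GSE95459-GSE114374-colon.xlsx",
                           "f8aa201c-4ff1-45a4-890e-840d63459ca2"),
    "data/test_005.xlsx": ("mf-E-GEOD-106540_spreadsheet_v9.xlsx",
                           "90bd6933-40c0-48d4-8d76-778c103bf545"),
}


def build_command(xlsx_path, upload=False):
    """Return the argv list for one xlsx_to_project_json.py run (hypothetical helper)."""
    cmd = [sys.executable, "xlsx_to_project_json.py", "--xlsx", xlsx_path]
    if upload:
        cmd += ["--upload", "true"]
    return cmd


def run_all(upload=False):
    """Parse every test spreadsheet in turn, stopping at the first failure."""
    for xlsx_path in DATASETS:
        subprocess.run(build_command(xlsx_path, upload), check=True)
```

Calling `run_all()` from the repository root, with the virtual environment active, would run the same six commands in order.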

Adding --upload true will upload the data to the DSS. Note that UUIDs are now always generated programmatically from GEO accessions and cannot be provided via the command line.
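For readers unfamiliar with deterministic UUIDs, one common approach is a name-based (version 5) UUID, which always maps the same input string to the same UUID. The sketch below illustrates that idea only. Whether this tool uses version 5 UUIDs is an assumption, and `EXAMPLE_NAMESPACE` and `uuid_from_accession` are hypothetical names, so the values it produces will not match the DSS prod UUIDs listed above.

```python
import uuid

# Hypothetical namespace for illustration only; the namespace (if any) used by
# the real tool is not documented here.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def uuid_from_accession(accession: str) -> str:
    """Derive a stable UUID string from a GEO accession such as 'GSE84133'."""
    # Normalise the accession so that ' gse84133 ' and 'GSE84133' agree.
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, accession.strip().upper()))
```

Because the result depends only on the accession, re-running the tool on the same dataset would yield the same project UUID, which is why there is no need to pass one in by hand.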

NOTE:

The following fields were renamed in "data/test_004.xlsx":

publications.publication_url -> publications.url
publications.publication_title -> publications.title
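If the same renames need to be applied to other spreadsheets, they can be expressed as a simple mapping over the column headers. This is a minimal sketch that works on a plain list of header strings; `RENAMES` and `rename_headers` are hypothetical names, and reading or writing the xlsx file itself is left out.

```python
# Old column name -> new column name, as listed above.
RENAMES = {
    "publications.publication_url": "publications.url",
    "publications.publication_title": "publications.title",
}


def rename_headers(headers):
    """Return a copy of `headers` with known old names replaced; others pass through."""
    return [RENAMES.get(header, header) for header in headers]
```

Headers that are not in the mapping are returned unchanged, so the function is safe to run on spreadsheets that already use the new names.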

NOTE:

6 ORIGINAL DATASETS (ALREADY IN THE DSS):

spreadsheets/existing/*.xlsx are the original Excel files that were provided. They currently exist in DSS prod, so we have finished examples to compare against.

71 RAW DATASETS (STATUS NOT PARSED):

The xlsx files in spreadsheets/new were downloaded from a spreadsheet of spreadsheets and are assumed to be (mostly) complete projects. These inputs were provided with the labels "finished" or "full". The differences between the two were inferred from skimming the files. I chose to use the inputs which end in ".0.xlsx" ("finished") rather than those with the normal ".xlsx" extension ("full").
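The choice of ".0.xlsx" inputs can be sketched as a filename filter. `select_finished` is a hypothetical helper, not a function from this repository; it only looks at file names and does not open the spreadsheets.

```python
from pathlib import Path


def select_finished(paths):
    """Keep only spreadsheets whose name ends in '.0.xlsx' (the "finished" label)."""
    return sorted(p for p in map(Path, paths) if p.name.endswith(".0.xlsx"))
```

For example, `select_finished(Path("spreadsheets/new").glob("*.xlsx"))` would return the "finished" files and skip the plain ".xlsx" ("full") ones.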

Unlike the 6 Excel files above, these are missing fields such as the "funders" section. Other differences have not been identified yet.
