Load all GEO GDS datasets #37

Open
JTFouquier opened this issue May 10, 2016 · 4 comments

Comments

@JTFouquier
Collaborator

We need to build a tool to load all GEO GDS datasets (GDS datasets contain curated metadata). It should support incremental updates.
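
One possible shape for the incremental part, as a minimal sketch only: keep a small state file of the dataset IDs already loaded and process just the ones that are missing from it. The names `incremental_load`, `load_one` and `loaded_gds.json` are hypothetical, and how the list of available IDs is obtained is left open here.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def incremental_load(available_ids: Iterable[str],
                     load_one: Callable[[str], None],
                     state_file: str = "loaded_gds.json") -> List[str]:
    """Load only datasets not yet recorded in state_file; return the new IDs."""
    path = Path(state_file)
    # IDs handled by earlier runs (empty on the first run).
    loaded = set(json.loads(path.read_text())) if path.exists() else set()
    new_ids = sorted(set(available_ids) - loaded)
    for dataset_id in new_ids:
        load_one(dataset_id)  # hypothetical loader supplied by the caller
        loaded.add(dataset_id)
        # Persist after each dataset so an interrupted run can resume.
        path.write_text(json.dumps(sorted(loaded)))
    return new_ids
```

This only picks up datasets that are new since the last run. Detecting datasets that changed upstream would need something extra, such as comparing an update date, which is not sketched here.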

Useful resources:


@JTFouquier
Collaborator Author

An alternative is to load datasets from ArrayExpress, which imports datasets from GEO (plus other resources) and curates them manually.

Related code examples:

http://pythonhosted.org/bioservices/references.html#bioservices.arrayexpress.ArrayExpress


Original comment by: Chunlei Wu

@JTFouquier
Collaborator Author

Ref: http://www.ebi.ac.uk/arrayexpress/help/programmatic_access.html

To get a list of Experiments:

The full list of array types can be obtained here:
ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/array/

For a given Experiment (using the id returned from above queries):


Original comment by: Chunlei Wu

@JTFouquier
Collaborator Author

An ArrayExpress experiment loaded from GEO has an ID that follows this pattern:

E-GEOD-32474 <--> GSE32474
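
The mapping above can be expressed as a pair of small helpers. This is a sketch that assumes the pattern holds exactly as in the example (a numeric suffix shared between `GSE...` and `E-GEOD-...`); the function names are made up for illustration.

```python
import re

_GSE_RE = re.compile(r"^GSE(\d+)$")
_AE_GEO_RE = re.compile(r"^E-GEOD-(\d+)$")


def gse_to_arrayexpress(gse_id: str) -> str:
    """Convert a GEO series accession (GSE32474) to E-GEOD-32474."""
    match = _GSE_RE.match(gse_id.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO series accession: {gse_id!r}")
    return f"E-GEOD-{match.group(1)}"


def arrayexpress_to_gse(ae_id: str) -> str:
    """Convert a GEO-derived ArrayExpress accession (E-GEOD-32474) to GSE32474."""
    match = _AE_GEO_RE.match(ae_id.strip().upper())
    if match is None:
        # Experiments that did not come from GEO have no GSE counterpart.
        raise ValueError(f"not a GEO-derived ArrayExpress accession: {ae_id!r}")
    return f"GSE{match.group(1)}"
```

Both helpers raise `ValueError` for accessions that do not fit the pattern, so ArrayExpress experiments from other sources are rejected rather than silently mis-mapped.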


Original comment by: Chunlei Wu

@JTFouquier
Collaborator Author

When making web-service calls, consider using a library like httplib2, which supports local caching, to avoid hitting the web services too often during development.
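
With httplib2 itself, caching is enabled by passing a cache directory to the constructor, e.g. `httplib2.Http(".cache")`. To illustrate the idea without that dependency, here is a rough standard-library sketch of a local file cache. It is not httplib2's implementation, and `cached_get`, `_download` and the `.cache` directory are hypothetical names.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def _download(url: str) -> bytes:
    """Default fetcher: read the response body over the network."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def cached_get(url: str, cache_dir: str = ".cache",
               fetch: Callable[[str], bytes] = _download) -> bytes:
    """Return the body for url, reusing a local copy when one exists."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name.
    entry = cache_path / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if entry.exists():
        return entry.read_bytes()
    body = fetch(url)
    entry.write_bytes(body)
    return body
```

Unlike httplib2, this sketch ignores HTTP cache headers and never expires entries, so it suits development runs only; delete the cache directory to force a refresh.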


Original comment by: Chunlei Wu
