Subset columns to MIxS terms (version 5) #19

Closed
wdduncan opened this issue Sep 25, 2020 · 4 comments

@wdduncan
Collaborator

Create a version of the biosample data whose columns are MIxS 5 terms. Not all the harmonized names (e.g., 'fire') are MIxS terms.

cc @cmungall @realmarcin

@cmungall
Collaborator

fire is in MIxS

@cmungall
Collaborator

I suggest that before starting this, we catalog the list of fields not in MIxS; we may need to make an SSSOM mapping.
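A minimal sketch of what such a catalog could look like, assuming the harmonized column names and the MIxS 5 term names are both available as plain lists of strings. The two lists and the helper `fields_not_in_mixs` are hypothetical and only illustrate the set difference; they are not taken from the actual data.

```python
# Illustrative inputs only: neither list is read from the real biosample data
# or from the MIxS specification.
harmonized_names = ["depth", "env_broad_scale", "fire", "my_custom_field"]
mixs_terms = {"depth", "env_broad_scale", "fire", "lat_lon"}


def fields_not_in_mixs(columns, terms):
    """Return, sorted, the columns with no exact match among the MIxS terms."""
    return sorted(set(columns) - set(terms))


unmapped = fields_not_in_mixs(harmonized_names, mixs_terms)
print(unmapped)
```

The resulting list would be the starting point for deciding which fields need a mapping. Exact string matching is the simplest choice here; near-matches would still need manual review.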

@wdduncan
Collaborator Author

wdduncan commented Sep 25, 2020

I am going to subset using the non-human environmental package terms:

  • air
  • soil
  • sediment
  • water

@wdduncan
Collaborator Author

See notebook build-non-human-samples.ipynb:
https://github.com/INCATools/biosample-analysis/blob/master/src/notebooks/build-non-human-samples.ipynb

This notebook subsets the data to env_packages containing the strings:

  • 'air'
  • 'soil'
  • 'sediment'
  • 'plant-associated'
  • 'water'
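For readers who do not open the notebook, the substring filter can be sketched roughly as follows. This is not the notebook's code: the column name `env_package` is inferred from the description above, and the helper `subset_by_package`, the sample rows, and the case-insensitive matching are all assumptions.

```python
import re

import pandas as pd

# The strings listed in this comment.
PACKAGE_STRINGS = ["air", "soil", "sediment", "plant-associated", "water"]


def subset_by_package(df, strings=PACKAGE_STRINGS, column="env_package"):
    """Keep rows whose package name contains any of the given strings."""
    # Escape each string so it is matched literally, then OR them together.
    pattern = "|".join(re.escape(s) for s in strings)
    # na=False drops rows with a missing package; case=False is an assumption.
    mask = df[column].str.contains(pattern, case=False, na=False)
    return df[mask]


# Hypothetical sample rows, for illustration only.
samples = pd.DataFrame(
    {
        "id": ["s1", "s2", "s3", "s4"],
        "env_package": ["MIGS/MIMS/MIMARKS.soil", "human-gut", None, "Water"],
    }
)
subset = subset_by_package(samples)
print(subset)
```

Note that substring matching keeps any package name that merely contains one of the strings, so the result is worth checking against the list of distinct package names.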

Output has been saved to target/non-human-samples.tsv.gz.

Potential enhancements to target/non-human-samples.tsv.gz:

  • include packages containing 'plant' instead of 'plant-associated'
  • normalize all package names, as described in "normalize package names" (#24)
  • replace values indicating missing data with NaNs
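The last enhancement could be sketched along these lines. The marker strings in `MISSING_MARKERS` are assumptions about how missing data might be recorded, not values confirmed from the file, and `replace_missing` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Assumed placeholder strings; the real data may use a different set.
MISSING_MARKERS = ["missing", "not collected", "not applicable", "NA", ""]


def replace_missing(df, markers=MISSING_MARKERS):
    """Replace cells that exactly equal one of the markers with NaN."""
    return df.replace(markers, np.nan)


# Hypothetical sample rows, for illustration only.
samples = pd.DataFrame(
    {
        "depth": ["0.1", "missing", "not collected"],
        "ph": ["7.2", "NA", "6.8"],
    }
)
cleaned = replace_missing(samples)
print(cleaned.isna())
```

Matching here is exact and case-sensitive, so variants such as 'Missing' would need to be added to the list or normalized first.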

cc @realmarcin
