The efetch command reported BioSamples’ Attributes completely different from those displayed at the NCBI website #45

Open
lauraht opened this issue Jun 26, 2019 · 2 comments


lauraht commented Jun 26, 2019

I was trying to use efetch to obtain the Attributes of a BioSample record, but for some records the Attributes reported in the XML are completely different from those displayed on the NCBI website. The BioSample Id reported in the XML also differs from the BioSample Id specified in the efetch command.
I use the following command to get the XML of a BioSample record:
efetch -db biosample -id SAMEA5244969 -format xml

Example 1: for BioSample SAMEA5244969, the NCBI website displays the Attributes as shown at https://www.ncbi.nlm.nih.gov/biosample/10858554
However, the efetch command reported the following xml:

<?xml version="1.0" ?>
<BioSampleSet>
   <BioSample access="public" publication_date="2016-06-04T00:00:00.000" last_update="2017-01-23T16:11:22.000" submission_date="2016-06-14T11:27:28.390" id="5244969" accession="SAMEA4457316">   
      <Ids>     
         <Id db="BioSample" is_primary="1">SAMEA4457316</Id>   
      </Ids>   
      <Description>     
         <Title>Sample from Homo sapiens</Title>     
         <Organism taxonomy_id="9606" taxonomy_name="Homo sapiens">       
            <OrganismName>Homo sapiens</OrganismName>     
         </Organism>   
      </Description>   
      <Owner>     
         <Name>EBI</Name>   
      </Owner>   
      <Models>     
         <Model>Generic</Model>   
      </Models>   
      <Package display_name="Generic">Generic.1.0</Package>   
      <Attributes>     
         <Attribute attribute_name="Sample Name" harmonized_name="sample_name" display_name="sample name">source 4</Attribute>     
         <Attribute attribute_name="Sex" harmonized_name="sex" display_name="sex">male</Attribute>     
         <Attribute attribute_name="disease state" harmonized_name="disease" display_name="disease">normal</Attribute>     
         <Attribute attribute_name="organism part" harmonized_name="tissue" display_name="tissue">colon</Attribute>     
         <Attribute attribute_name="specimen with known storage state">frozen specimen</Attribute>  
      </Attributes>   
      <Status status="live" when="2016-06-14T11:27:28.393"/> 
   </BioSample> 
</BioSampleSet>

The Attributes in this xml are completely different from those displayed at the NCBI website. And the reported BioSample Id (SAMEA4457316) in this xml is different from the BioSample Id (SAMEA5244969) specified in the efetch command.

Example 2: for BioSample SAMEA104565009, the NCBI website displays the Attributes as shown at https://www.ncbi.nlm.nih.gov/biosample/11349430
However, the efetch command reported the following xml:

<?xml version="1.0" ?>

This XML does not contain any elements, even though a list of Attributes is displayed on the NCBI website.

Example 3: for BioSample SAMEA5099860, the NCBI website displays the Attributes as shown at https://www.ncbi.nlm.nih.gov/biosample/10655621
However, the efetch command reported the following xml:

<?xml version="1.0" ?>
<BioSampleSet>
   <BioSample access="public" publication_date="2014-10-22T00:00:00.000" last_update="2016-10-25T08:32:28.000" submission_date="2016-05-19T19:48:00.303" id="5099860" accession="SAMEA3067264">   
      <Ids>     
         <Id db="BioSample" is_primary="1">SAMEA3067264</Id>   
      </Ids>   
      <Description>     
         <Title>Sample from Homo sapiens</Title>     
         <Organism taxonomy_id="9606" taxonomy_name="Homo sapiens">  
            <OrganismName>Homo sapiens</OrganismName>     
         </Organism>     
         <Comment>       
            <Paragraph>ExAC_v0.1_Sample_52281</Paragraph>     
         </Comment>   
      </Description>   
      <Owner>     
         <Name>EBI</Name>   
      </Owner>   
      <Models>     
         <Model>Generic</Model>   
      </Models>   
      <Package display_name="Generic">Generic.1.0</Package>   
      <Attributes>     
         <Attribute attribute_name="Sample Name" harmonized_name="sample_name" display_name="sample name">52281</Attribute>   
      </Attributes>   
      <Status status="live" when="2016-05-19T19:48:00.305"/> 
   </BioSample> 
</BioSampleSet>

The Attributes in this xml are completely different from those displayed at the NCBI website. And the reported BioSample Id (SAMEA3067264) in this xml is different from the BioSample Id (SAMEA5099860) specified in the efetch command.

I was wondering if you have some ideas about why the efetch command did not work correctly for the above BioSamples?

I’d greatly appreciate your help!

Thank you very much!

@lwagnerdc
Collaborator

Hmm, it looks like efetch has stripped the non-integer part of your id; it appears to understand only Entrez numeric IDs rather than BioSample or SRA accessions. The accessions are indexed, though, so an extra esearch step works:
esearch -db biosample -q ERS3052368 | efetch -format xml
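
The mismatches in the examples above are consistent with this explanation: dropping the letters from SAMEA5244969 leaves 5244969, which is exactly the id attribute of the record that came back (accession SAMEA4457316). The snippet below is only a sketch of that assumed behavior, not efetch's actual code; `digits_only` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of EDirect): keep only the
# digits of an -id value, mimicking the stripping described above.
digits_only() {
    printf '%s' "$1" | tr -cd '0-9'
}

# The accession from Example 1 reduces to the numeric id seen in the returned XML.
digits_only SAMEA5244969; echo   # prints 5244969
```

If that assumption holds, the number left over is treated as an Entrez UID, which belongs to an unrelated record or, as in Example 2, apparently to none at all.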

@lauraht
Author

lauraht commented Jun 28, 2019

Thank you so much for your help! I really appreciate it!
I used the BioSample accession with “-q” (instead of the SRA accession), as below:
esearch -db biosample -q SAMEA5244969 | efetch -format xml
and it works as expected.
Thanks again!
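
Because passing an accession to -id fails silently, it may be worth confirming that the accession in the returned XML matches the one requested. The following is a minimal sketch, assuming the XML layout shown in the examples above (an accession attribute on the BioSample element). `returned_accession` and `check_accession` are hypothetical helper names, and the parsing uses only standard sed and head rather than any EDirect tool.

```shell
#!/bin/sh
# Hypothetical helper: read BioSample XML on stdin and print the accession
# attribute of the first <BioSample ...> element (prints nothing if absent).
returned_accession() {
    sed -n 's/.*<BioSample [^>]*accession="\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if the XML on stdin describes the
# accession passed as $1; otherwise report the mismatch on stderr.
check_accession() {
    got=$(returned_accession)
    if [ "$got" = "$1" ]; then
        return 0
    fi
    echo "requested $1 but got '${got}'" >&2
    return 1
}
```

With the working pipeline from this thread, the check could be appended as a final stage: esearch -db biosample -q SAMEA5244969 | efetch -format xml | check_accession SAMEA5244969. An empty result such as the one in Example 2 also fails the check, since no accession is found.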
