Get coding sequences for a gene id #61

Open
snk5040 opened this issue Oct 20, 2022 · 2 comments

Comments

@snk5040

snk5040 commented Oct 20, 2022

Hi everyone,

I would like to use a list of gene IDs to get, in FASTA format, the proteins encoded by those genes and the mRNA sequences without introns.

So far with this command I can get the protein sequence:
os.system('esearch -db gene -query "'+ "102888688" + ' [ID]" | elink -target protein -name gene_protein_refseq -cmd neighbor | xtract -pattern LinkSet -block IdList -element Id -block LinkSetDb -element Id | efetch -db protein -format fasta')

With this command I can get the mRNA with introns, which I don't want:
os.system('elink -db gene -id ' + "102888688" + ' -target nuccore -name gene_nuccore_refseqrna | efetch -format fasta')

@vkkodali

You are better off doing this sort of thing using NCBI Datasets. That said, you can do this using Entrez Direct (EDirect) as follows:

$ elink -db gene -id 102888688 -target nuccore -name gene_nuccore_refseqrna | efetch -format fasta | head -n2 
>NM_001290175.1 Pteropus alecto interferon induced with helicase C domain 1 (IFIH1), mRNA
AGAGCTGCGTCGCGAGAGAGCAGAGGCGGCTCCCTAGTCCCGGCCCCCGCGAGCACCGTAGAGTCAGAGG
$ elink -db gene -id 102888688 -target protein -name gene_protein_refseq | efetch -format fasta | head -n2 
>NP_001277104.1 interferon-induced helicase C domain-containing protein 1 [Pteropus alecto]
MSNEYSADKRFRYLISCFRARVKMYIQVEPVLDYLTFLSADMKEQIQRTATTMGNINAAEQLLSTLEKGV

Your command to get mRNA is correct. What makes you say that the output sequence has introns?
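
Since the question wraps these commands in Python, here is a minimal sketch of how the two pipelines above might be run for a list of gene IDs. The helper names `build_pipeline` and `fetch_fasta` are hypothetical and not part of EDirect; the link names are the ones used in this thread. It assumes EDirect is installed and on `PATH`, and that network access is available.

```python
import shlex
import subprocess

# Target database and link name for each sequence type,
# copied from the elink commands shown in this thread.
LINKS = {
    "mrna": ("nuccore", "gene_nuccore_refseqrna"),
    "protein": ("protein", "gene_protein_refseq"),
}


def build_pipeline(gene_id, kind):
    """Return the 'elink | efetch' shell pipeline for one gene ID (hypothetical helper)."""
    target, link_name = LINKS[kind]
    elink = ["elink", "-db", "gene", "-id", str(gene_id),
             "-target", target, "-name", link_name]
    efetch = ["efetch", "-format", "fasta"]
    # shlex.join quotes each argument, so an unusual ID cannot break the command.
    return " | ".join(shlex.join(cmd) for cmd in (elink, efetch))


def fetch_fasta(gene_id, kind):
    """Run the pipeline and return its FASTA output as text.

    Requires EDirect on PATH and network access; raises CalledProcessError
    if the shell reports a non-zero exit status.
    """
    result = subprocess.run(build_pipeline(gene_id, kind), shell=True,
                            capture_output=True, text=True, check=True)
    return result.stdout


# Example usage (not executed here because it contacts NCBI):
# for gene_id in ["102888688"]:
#     print(fetch_fasta(gene_id, "mrna"), end="")
#     print(fetch_fasta(gene_id, "protein"), end="")
```

Capturing the output with `subprocess.run` instead of `os.system` returns the FASTA text to Python, so it can be written to a file per gene. The `NM_` record returned for the mRNA link is a RefSeq mRNA, which is a spliced transcript rather than a genomic sequence.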

@snk5040
Author

snk5040 commented Oct 26, 2022

Great, thanks
