change url to new website from legacy #29

Open
adamjohnwright opened this issue Nov 4, 2022 · 0 comments
This switch should only be made once the new UniProt site can handle the larger queries:

https://rest.uniprot.org/uniprotkb/stream?compressed=true&format=list&query=%28reviewed%3Afalse

The pipeline downloads the first file using the following command:
wget -N ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz

This file downloads just as before.

The second file could traditionally be downloaded with the following command:

wget -O uniprot-reviewed:no.list.gz "https://www.uniprot.org/uniprot/?query=reviewed:no&format=list&force=true&compress=yes"

Unfortunately, this one no longer works after the update. I believe the new URL that we should be querying is:

wget -O uniprot-file-test.txt.gz "https://rest.uniprot.org/uniprotkb/stream?compressed=true&format=list&query=(reviewed:false)"

From this query, we are expecting a list of non-reviewed UniProt IDs.

The documentation says that the limit for the stream is something like 5 million entries (a lot less than the >200 million we need).

Once they make it possible to run this query against the non-legacy site, we should make the switch.

Another way to do this is to follow UniProt's suggestions:

Download
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz
and use the command below to extract all the accessions:
zgrep '>' uniprot_trembl.fasta.gz | cut -d '|' -f 2
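
The extraction step can be wrapped in a small function so it is easy to check on a toy file before running it on the full TrEMBL download. This is only a sketch: the function name `extract_accessions` is made up here, and it assumes the usual UniProt FASTA header layout `>tr|ACCESSION|ENTRY_NAME ...`.

```shell
# Hypothetical helper (not part of the pipeline): print the accession from
# every FASTA header line of a gzip-compressed file.
# Assumes headers look like ">tr|A0A000|A0A000_9ACTN ...".
extract_accessions() {
    gzip -dc -- "$1" | awk -F'|' '/^>/ { print $2 }'
}

# Example: write the accessions to a compressed list file.
# extract_accessions uniprot_trembl.fasta.gz | gzip > uniprot-reviewed:no.list.gz
```

Matching on `^>` rather than any `>` restricts the output to header lines; sequence lines should never contain `>`, so in practice this should give the same result as the zgrep command.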

Or, while the legacy site is up (https://legacy.uniprot.org/), you could continue to use your previous command, just replacing www with legacy.
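
For the legacy route, the host swap could be done with a one-line rewrite so the rest of the old command stays untouched. A rough sketch, assuming the old URL starts with `https://www.uniprot.org/`; the helper name `to_legacy_url` is hypothetical:

```shell
# Hypothetical helper: rewrite a www.uniprot.org URL to the legacy host.
to_legacy_url() {
    printf '%s\n' "$1" | sed 's#^https://www\.uniprot\.org/#https://legacy.uniprot.org/#'
}

# Example (needs network access and only works while the legacy site is up):
# wget -O uniprot-reviewed:no.list.gz \
#   "$(to_legacy_url 'https://www.uniprot.org/uniprot/?query=reviewed:no&format=list&force=true&compress=yes')"
```

The URL is quoted throughout because it contains `&`, which the shell would otherwise treat as a command separator.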
