Ben ensembl patch 1 #58

Open
wants to merge 31 commits into
base: main
badc6cb
Update and rename blast.md to how_to_run_blast.md
Ben-Ensembl Apr 27, 2023
b127c44
Update toc.yml
Ben-Ensembl Apr 27, 2023
ca40209
Update toc.yml
Ben-Ensembl Apr 27, 2023
64d2d77
Create the_different_blast_programs.md
Ben-Ensembl Apr 27, 2023
c2950ed
Update how_to_run_blast.md
Ben-Ensembl Apr 27, 2023
b5d7bdb
Update the_different_blast_programs.md
Ben-Ensembl Apr 27, 2023
250e618
Update toc.yml
Ben-Ensembl Apr 27, 2023
cda11ee
Create error_messages.md
Ben-Ensembl Apr 27, 2023
dadcb58
Update error_messages.md
Ben-Ensembl May 12, 2023
eb71f3b
Update error_messages.md
Ben-Ensembl May 12, 2023
c3ae6a5
Add files via upload
Ben-Ensembl May 12, 2023
f51ef56
Create blast_parameters.md
amushtaq102 May 18, 2023
0566050
Update toc.yml
amushtaq102 May 18, 2023
81e7d04
Create blast_results_table.md
amushtaq102 May 18, 2023
8218f55
Update blast_results_table.md
amushtaq102 May 18, 2023
41dbef7
Update toc.yml
amushtaq102 May 18, 2023
d2dcb65
Add files via upload
amushtaq102 May 18, 2023
c9a4353
Create download_blast_results.md
amushtaq102 May 18, 2023
871c649
Update toc.yml
amushtaq102 May 19, 2023
b68a3b7
Update blast_results_table.md
Ben-Ensembl May 22, 2023
344097b
Update blast_results_table.md
Ben-Ensembl May 22, 2023
a1f6607
Update download_blast_results.md
Ben-Ensembl May 22, 2023
d8aeec9
Update toc.yml
Ben-Ensembl May 22, 2023
a78d3e7
Update blast_parameters.md
Ben-Ensembl May 22, 2023
d29ba79
Update blast_parameters.md
Ben-Ensembl May 22, 2023
4f5434e
Update blast_results_table.md
Ben-Ensembl May 22, 2023
38b26b3
Update download_blast_results.md
Ben-Ensembl May 22, 2023
96724cd
Update blast_parameters.md
Ben-Ensembl May 22, 2023
846a361
Update error_messages.md
Ben-Ensembl May 22, 2023
0183e94
Update how_to_run_blast.md
Ben-Ensembl May 22, 2023
40679ea
Update the_different_blast_programs.md
Ben-Ensembl May 22, 2023
@@ -0,0 +1,56 @@
---
slug: blast-parameters
title: What are the different BLAST parameters and how to use them
description: How to use the different parameters for your BLAST query and what each parameter does
related_articles:
- href: how_to_run_blast.md
- href: the_different_blast_programs.md
- href: error_messages.md
- href: blast_results_table.md
- href: download_blast_results.md
tags:
- blast
status: draft
---

# How to use BLAST parameters?
Clicking on the ‘Parameters’ link expands a normally collapsed section that gives access to additional parameter settings.

# A description of the different BLAST parameters
The Ensembl BLAST tool has several parameters that can affect the search sensitivity. The general and scoring parameters can be modified using the pull-down menus. Each parameter is described below.

## Max. alignments
This parameter sets the maximum number of database alignments displayed for a given query. It ranges from 5 to 1000 and the default is 100.

## Max. scores
This is the number of database hits that are displayed. The actual number of alignments may be greater than this. It ranges from 5 to 1000 and the default is 50.

## E-Threshold
The alignments found by BLAST during a search are scored and assigned a statistical value called the Expect value (E-value). The E-value is the number of times an alignment as good as or better than the one found by BLAST would be expected to occur by chance, given the size of the database searched. This parameter sets the E-value threshold for reporting hits: hits with an E-value above the selected threshold are not reported. It ranges from 1e-200 to 1000 and the default value is 10.
A higher E-value threshold is less stringent, and the BLAST default of “10” is designed to ensure that no biologically significant alignment is overlooked. For significant alignments the E-value should be close to zero.
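
To make the relationship between score, search-space size and E-value more concrete, here is a minimal sketch based on the standard Karlin-Altschul relation E = m × n × 2^(−S′), where S′ is the bit score, m the query length and n the total database length. The function names are illustrative and are not part of the Ensembl BLAST tool, which may apply further corrections (such as effective lengths). Treating a hit exactly at the threshold as reported is also an assumption.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Expected number of chance alignments scoring at least `bit_score`.

    Uses E = m * n * 2^(-S'), with S' the bit score, m the query length
    and n the database length (no effective-length correction).
    """
    return query_length * database_length * 2.0 ** (-bit_score)


def passes_threshold(e_value: float, e_threshold: float = 10.0) -> bool:
    """Assumption: a hit is reported when its E-value does not exceed the threshold."""
    return e_value <= e_threshold
```

The sketch shows why the same alignment scores a higher (less significant) E-value against a larger database, and why raising the bit score by one halves the E-value.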

## Statistical accuracy

## HSPs per hit
A high-scoring segment pair (HSP) is a local alignment with no gaps that achieves one of the highest alignment scores in a given search. It corresponds to the matching region between the query and the database hit sequence.
The HSP distribution can be visualised on the query, which is shown as a chain of black and white boxes. Fragments of the query sequence that hit other places in the genome are shown as red boxes. These fragments are usually small (between 1 and 100 nt) and map to various locations. They are typically low-complexity sequences, such as repeats.

## Drop-off
This is the X drop-off value for the final gapped alignment. If the X value is high, the quality of the alignment might degrade, whereas a smaller value may increase the chance of missing some alignments. This value is set to 0 by default; it can be changed to 2, 4, 6, 8 or 10.
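
The idea behind an X drop-off can be sketched with a simplified, ungapped extension: the alignment is extended one position at a time and abandoned once the running score falls more than X below the best score seen so far. This is an illustration of the principle only. The parameter described above applies to the final gapped alignment, and the function name and scoring defaults here are assumptions.

```python
def xdrop_extend(query: str, target: str, q_start: int, t_start: int,
                 x_drop: int, match: int = 2, mismatch: int = -3) -> tuple[int, int]:
    """Extend to the right without gaps and return (best_score, best_length).

    Extension stops once the running score is more than `x_drop`
    below the best score seen so far.
    """
    score = best = 0
    best_length = 0
    offset = 0
    while q_start + offset < len(query) and t_start + offset < len(target):
        same = query[q_start + offset] == target[t_start + offset]
        score += match if same else mismatch
        offset += 1
        if score > best:
            best, best_length = score, offset
        elif best - score > x_drop:
            break
    return best, best_length
```

With a small X the extension gives up at the first mismatches; with a larger X it can bridge them and reach a longer, higher-scoring alignment, at the cost of exploring more low-quality extensions.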

## Word size
This is the length of the seed that initiates an alignment between the query and the target sequences. It ranges from 2 to 15; the default is 11 (nucleotides) for DNA and 3 (residues) for protein.
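
As a rough sketch of what a seed is, the snippet below lists every overlapping word of a given size in a query. The function name is hypothetical, and real BLAST implementations index and score these words rather than simply listing them.

```python
def seed_words(sequence: str, word_size: int = 11) -> list[str]:
    """Return every overlapping word of length `word_size` in `sequence`."""
    if word_size < 1:
        raise ValueError("word_size must be at least 1")
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]
```

A query shorter than the word size yields no seeds at all, which is one reason a smaller word size is more sensitive for short queries but slower for long ones.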

## Match/mismatch scores
The Match/mismatch scores parameter specifies the reward assigned to an exact match and the penalty assigned to a mismatch. The default is 2,-3.
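
The effect of these two numbers can be illustrated by scoring an ungapped nucleotide alignment position by position. This is a minimal sketch; the function name is illustrative and the defaults mirror the 2,-3 setting described above.

```python
def ungapped_score(query: str, subject: str, match: int = 2, mismatch: int = -3) -> int:
    """Sum the match reward and mismatch penalty over two equal-length sequences."""
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    return sum(match if q == s else mismatch for q, s in zip(query, subject))
```

For example, three matches and one mismatch score 3 × 2 − 3 = 3 under the default setting.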

## Gap penalties
The Gap penalties parameter specifies how gaps that are introduced in the alignment should be penalised.
Gaps may need to be introduced into sequences to compensate for insertions and deletions. The gap penalty should be large enough that gaps are introduced only where needed, and the penalty for extending a gap should take into account the possibility that gaps occur over several residues at a time.
This parameter allows you to choose from several sets of values for scoring gaps. The default is 5,2.
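
The sketch below shows how a pair of gap values could translate into the cost of a gap. It assumes that the first value (5) is the cost of opening a gap and the second (2) the cost of each gapped position, and that a gap of length k costs open + k × extend. Neither assumption is stated in this article, so treat the numbers as illustrative.

```python
def gap_cost(gap_length: int, gap_open: int = 5, gap_extend: int = 2) -> int:
    """Affine gap cost: one opening charge plus a per-position extension charge."""
    if gap_length < 0:
        raise ValueError("gap_length cannot be negative")
    if gap_length == 0:
        return 0
    return gap_open + gap_extend * gap_length
```

Under this scheme a single gap of three positions (cost 11) is cheaper than three separate one-position gaps (cost 21), which reflects the idea that insertions and deletions tend to span several residues at a time.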

## Gap align
This option uses a lower threshold for generating the list of high-scoring matching words; the algorithm uses short matched regions with no insertions or deletions between them and within a certain distance of each other as the starting points for longer ungapped alignments. These joined regions are then extended using the same method as in the original BLAST.

## Filter low complexity regions
Certain sequences, such as low-complexity regions (homopolymeric runs, short-period repeats, and subtler over-representation of one or a few residues), can display significant similarity when there is no significant homology. This option masks or filters low-complexity regions in amino acid queries in order to improve the sensitivity of sequence similarity searches performed with that sequence.
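
To give a feel for what masking low-complexity regions means, here is a simplified sketch that masks any window whose Shannon entropy falls below a cut-off. It is not the filtering algorithm used by BLAST; the window size, cut-off and mask character are arbitrary assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of `window`."""
    total = len(window)
    counts = Counter(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(sequence: str, window: int = 12,
                        min_entropy: float = 1.0, mask_char: str = "X") -> str:
    """Replace every residue inside a low-entropy window with `mask_char`."""
    masked = list(sequence)
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)
```

A homopolymeric run has an entropy of zero and is masked completely, whereas a window with an even mix of residues is left untouched.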
@@ -0,0 +1,50 @@
---
slug: blast-results-table
title: BLAST results table
description: What does the information in the BLAST results table mean
related_articles:
- href: how_to_run_blast.md
- href: the_different_blast_programs.md
- href: blast_parameters.md
- href: error_messages.md
- href: download_blast_results.md
tags:
- blast
status: draft
---
# A description of the BLAST results table
The results page shows summary results for each sequence submitted, with a graphical display of hits on the query sequence. A table of results for each combination of sequence and target database can be found by clicking on the drop-down icon next to the species and genome assembly name. The results table lists the sequence similarity hits in order of E-value, but you can reorder it by any column by clicking on the arrow next to the column heading. You can customise all rows, or select specific hits, by clicking on the box in the first column of each row.

## E-value
This is the number of times a match is expected to occur by chance. A lower E-value indicates a more significant match.

## Length
Length of alignment between query and target sequence.

## Alignment view
Click on this to view the one-to-one alignment between the region of your query sequence and the subject sequence from the database. This alignment will show the length of coverage of your alignment and the presence of any gaps and mismatched regions.

## % ID
This score indicates the extent to which the query sequence and the hit have the same residue at the same position in an alignment.
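
As a minimal sketch of how such a percentage can be derived from a pairwise alignment, the function below counts identical, non-gap positions and divides by the alignment length. Using the full alignment length (including gapped positions) as the denominator is an assumption, and the function name is illustrative.

```python
def percent_identity(aligned_query: str, aligned_hit: str, gap_char: str = "-") -> float:
    """Percentage of alignment columns where both sequences carry the same residue."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(1 for q, h in zip(aligned_query, aligned_hit)
                    if q == h and q != gap_char)
    return 100.0 * identical / len(aligned_query)
```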

## Score
The score gives an indication of how good the alignment is, with a higher score indicating a more exact alignment.

## Genomic location
Shows the genomic location of the hit in the selected species. You can click on the coordinates to view this location in the Ensembl Genome Browser.

## Hit orientation
The orientation of the hit sequence against the query sequence.

## Hit start
The position within the target sequence at which the hit starts.

## Hit end
The position within the target sequence at which the hit ends.

## Query start
The first position within the query sequence that matches the beginning of the hit that BLAST returns.

## Query end
The last position within the query sequence that matches the end of the hit that BLAST returns.
@@ -0,0 +1,31 @@
---
slug: download-blast-results
title: How to download BLAST results
description: The different ways you can download results from your BLAST query
related_articles:
- href: how_to_run_blast.md
- href: the_different_blast_programs.md
- href: blast_parameters.md
- href: blast_results_table.md
- href: error_messages.md
tags:
- blast
status: draft
---
# There are different ways to download your BLAST results
## How to download results for the entire submission
You can download all results as a ZIP file by clicking on the download button on top of the results table.

![Download button](blast/BLAST_Download_1.png)

## How to download individual BLAST results
You can also download an individual BLAST results table as a TSV file by expanding the results table for the selected species, clicking on the ‘Actions’ drop-down menu and selecting ‘Download this table’.

![Download individual row from results table](blast/BLAST_Download_2.png)
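
Once downloaded, a TSV file can be processed with standard tools. The sketch below reads tab-separated text with Python's `csv` module and sorts the rows by E-value. The column name `E-value` is an assumption about the file header, so check it against your own download before using this.

```python
import csv
import io


def load_hits(tsv_text: str, evalue_column: str = "E-value") -> list[dict[str, str]]:
    """Parse tab-separated results and return rows sorted by ascending E-value."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = list(reader)
    # The column name is hypothetical; adjust it to match the downloaded header.
    hits.sort(key=lambda row: float(row[evalue_column]))
    return hits
```

To use it with a file on disk, read the file contents first, for example with `pathlib.Path("results.tsv").read_text()`, where the file name is a placeholder.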

## Customising your results download
The ‘Actions’ drop-down menu also allows you to configure your download file by filtering for specific columns of the table and by showing or hiding selected rows using the eye icon. You can restore the default settings of the table by clicking on ‘Restore defaults’ in the ‘Actions’ drop-down menu.

![Download customised results table](blast/BLAST_Download_3.png)


@@ -0,0 +1,71 @@
---
slug: error-messages
title: I have an error message - what can I do?
description: How to resolve different error messages commonly encountered in BLAST
related_articles:
- href: how_to_run_blast.md
- href: the_different_blast_programs.md
- href: blast_parameters.md
- href: blast_results_table.md
- href: download_blast_results.md
tags:
- blast
status: draft
---

# BLAST error messages

## Sequence input error messages

Sequence input errors are indicated in three ways:
* a red exclamation mark on the bar above the sequence boxes.
* a red exclamation mark above the box with the offending sequence.
* a red keyline around the box with the offending sequence.

![BLAST error message](blast_error_message.png)

Error messages, in the form of red exclamation marks and red keylines, are displayed when mistakes are made in the sequence input.

## Reasons for error messages

Error messages can appear for a variety of reasons when a sequence is entered into the BLAST interface.

### Sequence format

The BLAST sequence input box will only accept sequences (nucleotide or amino acid) in FASTA or plain text format.

If the sequence you have submitted is not in FASTA or plain text format, an error message will be displayed and you will need to change your sequence to one of these formats.

### Length of sequence
The BLAST sequence input box will only accept sequences (nucleotide or amino acid) over 5 bases long.

If the sequence you have submitted is less than 5 bases long, an error message will be displayed and you will need to increase the length of your sequence.

### Special characters

The BLAST sequence input box will not accept sequences (nucleotide or amino acid) that contain special characters.

If the sequence you have submitted contains special characters, an error message will be displayed and you will need to remove the special characters from your sequence.

### Sequence types and BLAST programs

The sequence input interface will only accept the type of sequence required for the BLAST program selected.

For example, if BLASTn is the selected program, the sequence input interface will accept multiple nucleotide sequences. However, if a protein sequence is uploaded amongst the nucleotide sequences, an error message will be displayed.

If a sequence you have submitted is incompatible with the BLAST program you have selected, an error message will be displayed around the incompatible sequence and you will need to delete the sequence.
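
If you want to check sequences before pasting them into the interface, the rules above can be approximated with a short script. This is a hedged sketch, not the validation that the Ensembl BLAST interface performs: the function names, the restricted alphabets, the definition of a special character as any non-letter, and the minimum length of 5 are all assumptions.

```python
# Assumed alphabets; real tools usually accept more ambiguity codes.
NUCLEOTIDE_LETTERS = set("ACGTUN")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBXZ")


def extract_sequence(text: str) -> str:
    """Drop FASTA header lines and join the remaining lines in upper case."""
    lines = (line.strip() for line in text.strip().splitlines())
    return "".join(line for line in lines if line and not line.startswith(">")).upper()


def validate_sequence(text: str, sequence_type: str = "nucleotide",
                      min_length: int = 5) -> list[str]:
    """Return a list of problems found; an empty list means no problem was detected."""
    allowed = {"nucleotide": NUCLEOTIDE_LETTERS, "protein": PROTEIN_LETTERS}[sequence_type]
    sequence = extract_sequence(text)
    problems = []
    if len(sequence) < min_length:
        problems.append("too short")
    if any(not ch.isalpha() for ch in sequence):
        problems.append("special characters")
    if any(ch.isalpha() and ch not in allowed for ch in sequence):
        problems.append("wrong sequence type")
    return problems
```

Note that a sequence made up only of A, C, G and T is also a valid string of amino acid codes, so a check of this kind cannot always tell the two sequence types apart.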

## BLAST program error messages

### Submission failed
The entire submission has failed.

### Run failed
A single job within the submission has failed, i.e. a BLASTn job running a single sequence against a single species has failed.

@@ -1,7 +1,13 @@
---
slug: blast
slug: how-to-run-blast
title: How to run BLAST
description: How to run BLAST queries
related_articles:
- href: blast_parameters.md
- href: the_different_blast_programs.md
- href: error_messages.md
- href: blast_results_table.md
- href: download_blast_results.md
tags:
- blast
status: draft
@@ -11,17 +17,9 @@ status: draft

BLAST is a sequence similarity search tool that can be used for both DNA and proteins.

## Selecting BLAST parameters

The target databases available for similarity searches are DNA, transcripts and proteins. Click on the ‘Database’ dropdown menu and choose from these options.

The following BLAST programs are available:
- BLASTn: nucleotide sequences against nucleotide databases
- tBLASTn: translated nucleotide sequences against a nucleotide database
- tBLASTx: translated nucleotide sequences against a translated nucleotide database
- BLASTp: peptide sequences against peptide databases
- BLASTx: nucleotide sequences against amino acid databases
## Selecting BLAST database and program

The target databases available for similarity searches are: genomic sequence (softmasked), genomic sequence, transcripts and proteins. Click on the ‘Database’ dropdown menu and choose from these options.

The relevant BLAST programs are selected automatically and displayed in the *Program* drop-down menu when the target database is selected and the Nucleotide or Protein option is selected above the sequence text box. The BLAST programs have pre-configured parameters, which you can view and change by clicking on the ‘Parameters’ option.

@@ -41,27 +39,16 @@ You can select the species you want to run your sequence against by clicking on

You can give a name or description to this BLAST query in the ‘Submission name’ (optional) field. Once your parameters are set, click RUN to start the search.

## What does the result table show
## What does the results view show

The submissions page will show the jobs that are currently running or recently completed. Jobs are divided into two lists: ‘Unviewed Jobs’ and ‘Jobs list’. A submission ID is assigned to each submission and additional information is provided, including the date and time of submission. If you navigate away from the BLAST interface, the status of the query is indicated by the BLAST icon in the top panel changing from red to green to prompt you that it has successfully completed.

You can view the results by clicking on the Results button or you can download the results by clicking on the blue Download icon. A submission in the ‘Unviewed jobs’ list, when viewed, is transferred to the ‘Jobs list’ for future reference. Results are available for 7 days and queries can be rerun for 28 days.

## What does the result table show

The results page shows summary results for each sequence submitted with a graphical display of hits on the query sequence. A table of results for each combination of sequences and target database can be found by clicking on the drop down icon next to the species and genome assembly name.

![BLAST results page](media/Blast_results.png)

The results table lists the sequence similarity hits in order of E-value, but can be customised to reorder the table based on the different columns by clicking on the arrow next to each column heading. You can customise all rows, or select specific hits, by clicking on the box in the first column of each row.

The table provides information on:
- **E-value**: The number of times a match is expected to occur by chance.
- **Length**: Length of alignment between query and target sequence.
- **Alignment**: You can view alignment between the query sequence and the hit sequence.
- **%ID**: Indicates the extent to which the query sequence and the hit have the same residue at the same position in an alignment.
- **Score**: The score gives an indication of how good the alignment is, with a higher score indicating a more exact alignment.
- **Genomic location**: Shows the genomic location of the hit in this species
- **Hit orientation**: The orientation of the hit against the query sequence
- **Hit start**: The position within the target sequence at which the hit started
- **Query start**: The first position within the query sequence that matches the beginning of the hit that BLAST returns
- **Query end**: The last position within the query sequence that matches the beginning of the hit that BLAST returns

@@ -0,0 +1,26 @@
---
slug: the-different-blast-programs
title: The different BLAST programs
description: The different BLAST programs
related_articles:
- href: how_to_run_blast.md
- href: blast_parameters.md
- href: error_messages.md
- href: blast_results_table.md
- href: download_blast_results.md
tags:
- blast
status: draft
---

# The different BLAST programs

The following BLAST programs are available:

- BLASTn: nucleotide sequences against nucleotide databases
- tBLASTn: peptide sequences against a translated nucleotide database
- tBLASTx: translated nucleotide sequences against a translated nucleotide database
- BLASTp: peptide sequences against peptide databases
- BLASTx: translated nucleotide sequences against amino acid databases

The relevant BLAST programs are selected automatically and displayed in the Program drop-down menu when the target database is selected and the Nucleotide or Protein option is selected above the sequence text box. The BLAST programs have pre-configured parameters, which you can view and change by clicking on the ‘Parameters’ option.
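
The way the query type and the database type narrow down the choice of program can be sketched as a simple lookup. The table follows the standard definitions of the BLAST programs; the function name and the two type labels are illustrative, and how the Ensembl interface implements the selection is not described here.

```python
# (query type, database type) -> programs that accept that combination
PROGRAMS = {
    ("nucleotide", "nucleotide"): ["BLASTn", "tBLASTx"],
    ("nucleotide", "protein"): ["BLASTx"],
    ("protein", "nucleotide"): ["tBLASTn"],
    ("protein", "protein"): ["BLASTp"],
}


def available_programs(query_type: str, database_type: str) -> list[str]:
    """Return the BLAST programs that fit the given query and database types."""
    try:
        return PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(f"unknown combination: {query_type!r}, {database_type!r}") from None
```

For example, a protein query against a genomic (nucleotide) database leaves tBLASTn as the only option, because the database has to be translated before it can be compared with peptides.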
12 changes: 11 additions & 1 deletion docs/ensembl-help/using-ensembl/ensembl-apps/blast/toc.yml
@@ -1,2 +1,12 @@
- name: How to run BLAST
href: blast.md
href: how_to_run_blast.md
- name: The different BLAST programs
href: the_different_blast_programs.md
- name: I have an error message - what can I do?
href: error_messages.md
- name: What are the different BLAST parameters and how to use them
href: blast_parameters.md
- name: BLAST results table
href: blast_results_table.md
- name: How to download BLAST results
href: download_blast_results.md