Merge pull request #120 from wurmlab/dev
Small, small updates like: add -f option, use faster jq instead of json, fix typos and pub url.
yeban authored Dec 24, 2017
2 parents 597b946 + fcb35bc commit f9524dd
Showing 10 changed files with 59 additions and 36 deletions.
7 changes: 4 additions & 3 deletions .travis.yml
@@ -1,8 +1,8 @@
language: ruby
rvm:
- "2.0.0"
- "2.1.3"
- "2.2.0"
- "2.1.10"
- "2.2.7"
- "2.3.4"
before_install:
- wget -P ~ http://mafft.cbrc.jp/alignment/software/mafft-7.205-with-extensions-src.tgz
- tar -zxvf ~/mafft-7.205-with-extensions-src.tgz -C ~
@@ -13,6 +13,7 @@ before_install:
cache: bundler
sudo: false
script: bundle exec rake test
after_script: bundle exec codeclimate-test-reporter
addons:
code_climate:
repo_token: 2177997ae2dd26804c32e1ec34a2221f94b71a2170f6c1db2c020f8858cd87f2
5 changes: 4 additions & 1 deletion Gemfile
@@ -1,4 +1,7 @@
source 'http://rubygems.org'

gemspec
gem 'codeclimate-test-reporter', group: :test, require: nil
group :test do
gem "simplecov"
gem "codeclimate-test-reporter", "~> 1.0.0"
end
37 changes: 24 additions & 13 deletions README.md
@@ -14,7 +14,7 @@ If you would like to use GeneValidator on a few sequences, see our online [GeneV


If you use GeneValidator in your work, please cite us as follows:
> [Dragan M<sup>&Dagger;</sup>, Moghul MI<sup>&Dagger;</sup>, Priyam A, Bustos C & Wurm Y. 2016. GeneValidator: identify problems with protein-coding gene predictions. <em>Bioinformatics</em>, doi: 10.1093/bioinformatics/btw015](http://bioinformatics.oxfordjournals.org/content/early/2016/02/26/bioinformatics.btw015).
> [Dragan M<sup>&Dagger;</sup>, Moghul MI<sup>&Dagger;</sup>, Priyam A, Bustos C & Wurm Y. 2016. GeneValidator: identify problems with protein-coding gene predictions. <em>Bioinformatics</em>, doi: 10.1093/bioinformatics/btw015](https://academic.oup.com/bioinformatics/article/32/10/1559/1742817/GeneValidator-identify-problems-with-protein).


@@ -42,7 +42,7 @@ Each analysis of each query returns a binary result (good vs. potential problem)

## Installation
### Installation Requirements
* Ruby (>= 2.0.0)
* Ruby (>= 2.1.0)
* NCBI BLAST+ (>= 2.2.30+) (download [here](http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download)).
* MAFFT installation (>=7.273) (download [here](http://mafft.cbrc.jp/alignment/software/)).
* A web browser - [Mozilla Firefox](https://www.mozilla.org/en-GB/firefox/new/) & Safari are recommended. At the moment, it is not possible to use Chrome to view the results locally (as Chrome does not allow AJAX requests to local files). To avoid this, simply use a different browser (like Firefox or Safari) or start a local server in the results folder.
@@ -205,7 +205,7 @@ genevalidator -d DATABASE_PATH -e -t BLAST_TAB_FILE -o 'qseqid sseqid sacc slen
## If you ran the previous command (i.e. if you produced fasta file for the BLAST hits)
genevalidator -n NUM_THREADS -t BLAST_TAB_FILE -o 'qseqid sseqid sacc slen qstart qend sstart send length qframe pident nident evalue qseq sseq' -r RAW_SEQUENCES_FILE INPUT_FASTA_FILE

## If you did generate the BLAST hits fasta file (this will run the previous command for you)
## If you did not generate the BLAST hits fasta file (this will automatically run the previous command for you)
genevalidator -d DATABASE_PATH -n NUM_THREADS -t BLAST_TAB_FILE -o 'qseqid sseqid sacc slen qstart qend sstart send length qframe pident nident evalue qseq sseq' INPUT_FASTA_FILE

```
@@ -231,33 +231,45 @@ Lastly, a tabular summary of the results is also outputted in the terminal to pr

## Analysing the JSON output

There are numerous methods to analyse the JSON output including the [streamable JSON command line program](http://trentm.com/json/) or [jq](https://stedolan.github.io/jq/). The below examples use the JSON tool.
There are numerous methods to analyse the JSON output including the [streamable JSON command line program](http://trentm.com/json/) or [jq](https://stedolan.github.io/jq/). The examples below use jq 1.5.

### Exemplar JSON CLI Installation
### Exemplar JQ CLI Installation
After installing node:

```bash
$ npm install -g json
# ubuntu
$ sudo apt-get install jq
# brew / linuxbrew
$ brew install jq
```

### Filtering the results

```bash
# Requires jq 1.5

# Extract sequences that have an overall score of 100
$ json -f INPUT_JSON_FILE -c 'this.overall_score == 100' > OUTPUT_JSON_FILE
$ cat INPUT_JSON_FILE | jq '.[] | select(.overall_score == 100)' > OUTPUT_JSON_FILE

# Extract sequences that have an overall score of over 70
$ json -f INPUT_JSON_FILE -c 'this.overall_score > 70' > OUTPUT_JSON_FILE
$ cat INPUT_JSON_FILE | jq '.[] | select(.overall_score > 70)' > OUTPUT_JSON_FILE

# Extract sequences that have more than 50 hits
$ json -f INPUT_JSON_FILE -c 'this.no_hits > 50' > OUTPUT_JSON_FILE
$ cat INPUT_JSON_FILE | jq '.[] | select(.no_hits > 50)' > OUTPUT_JSON_FILE

# Sort the JSON based on the overall score (ascending - 0 to 100)
$ json -f INPUT_JSON_FILE -A -e 'this.sort(function(a,b) {return (a.overall_score > b.overall_score) ? 1 : ((b.overall_score > a.overall_score) ? -1 : 0);} );' > OUTPUT_JSON_FILE

$ cat INPUT_JSON_FILE | jq 'sort_by(.overall_score)' > OUTPUT_JSON_FILE
# Sort the JSON based on the overall score (descending - 100 to 0)
json -f INPUT_JSON_FILE -A -e 'this.sort(function(a,b) {return (a.overall_score < b.overall_score) ? 1 : ((b.overall_score < a.overall_score) ? -1 : 0);} );' > OUTPUT_JSON_FILE
$ cat INPUT_JSON_FILE | jq 'sort_by(- .overall_score)' > OUTPUT_JSON_FILE

# Remove the large graphs objects (note these Graphs objects are required if you wish to pass the json back into GV using the `-j` option - see below)
$ cat INPUT_JSON_FILE | jq -r '[ .[] | del(.validations[].graphs) ]' > OUTPUT_JSON_FILE

# Save JSON as CSV
## Write header first
$ cat INPUT_JSON_FILE | jq -r '.[0] | ["idx", "overall_score", "definition", "no_hits", .validations[].header ] | @csv' > OUTPUT_CSV_FILE
## Write content to the same file
$ cat INPUT_JSON_FILE | jq -r '.[] | [.idx, .overall_score, .definition, .no_hits, .validations[].print ] | @csv ' >> OUTPUT_CSV_FILE
```

The subsetted/sorted JSON file can then be passed back into GeneValidator (using the `-j` command line argument) to generate the HTML report for the sequences in the JSON file.
@@ -266,7 +278,6 @@ The subsetted/sorted JSON file can then be passed back into GeneValidator (using
genevalidator -j SORTED_JSON_FILE
```
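
The jq filters above can also be approximated in plain Ruby before the file is handed back to `-j`. The following is a minimal sketch and not part of GeneValidator: the helper names `load_results`, `filter_by_score` and `sort_by_score_desc` are hypothetical, and it assumes the JSON output is an array of objects that each carry a numeric `overall_score`, as the jq examples imply.

```ruby
require 'json'

# Hypothetical helper: read a GeneValidator JSON file into an array of hashes.
# Assumes the top-level JSON value is an array, as the jq examples suggest.
def load_results(path)
  JSON.parse(File.read(path))
end

# Keep only the entries whose overall score is strictly above min_score
# (mirrors: jq '.[] | select(.overall_score > 70)').
def filter_by_score(results, min_score)
  results.select { |r| r['overall_score'] > min_score }
end

# Sort by overall score, highest first
# (mirrors: jq 'sort_by(- .overall_score)').
def sort_by_score_desc(results)
  results.sort_by { |r| -r['overall_score'] }
end
```

The resulting array could then be written back out with `JSON.pretty_generate` and passed to `genevalidator -j`. Unlike the `.[] | select(...)` jq filters, which emit a stream of objects, these helpers always return a single array.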


## Related projects
[GeneValidatorApp](https://github.com/wurmlab/GeneValidatorApp) - A Web App wrapper for GeneValidator.<br>
[GeneValidatorApp-API](https://github.com/wurmlab/GeneValidatorApp-API) - An easy to use API for GeneValidatorApp to allow you to use GeneValidator within your web applications.
2 changes: 1 addition & 1 deletion aux/json_footer.erb
@@ -5,4 +5,4 @@
<% output_files.each_with_index do |results_html, idx| %>
<li><a href="<%=results_html%>"><%= idx + 1 %></a></li>
<% end %></ul></nav><%end%>
<footer><div class="container center-block"><p class="text-muted text-center">Please cite:<a href="http://bioinformatics.oxfordjournals.org/content/early/2016/02/26/bioinformatics.btw015"> "Dragan M<sup>&Dagger;</sup>, Moghul MI<sup>&Dagger;</sup>, Priyam A, Bustos C &amp; Wurm Y <em>(2016)</em> GeneValidator: identify problematic gene predictions"</a><br/> Developed at <a href="https://wurmlab.github.io" target="_blank">Wurm Lab</a>, <a href="http://www.sbcs.qmul.ac.uk" target="_blank">QMUL</a> with funding by <a href="http://www.bbsrc.ac.uk/home/home.aspx" target="_blank">BBSRC</a> and <a href="https://www.google-melange.com/gsoc/homepage/google/gsoc2013" target="_blank">Google Summer of Code 2013</a><br/>This page was created by <a href="https://github.com/wurmlab/genevalidator" target="_blank" >GeneValidator</a> v<%= GeneValidator::VERSION %></p></div></footer></body></html>
<footer><div class="container center-block"><p class="text-muted text-center">Please cite:<a href="https://academic.oup.com/bioinformatics/article/32/10/1559/1742817/GeneValidator-identify-problems-with-protein"> "Dragan M<sup>&Dagger;</sup>, Moghul MI<sup>&Dagger;</sup>, Priyam A, Bustos C &amp; Wurm Y <em>(2016)</em> GeneValidator: identify problematic gene predictions"</a><br/> Developed at <a href="https://wurmlab.github.io" target="_blank">Wurm Lab</a>, <a href="http://www.sbcs.qmul.ac.uk" target="_blank">QMUL</a> with funding by <a href="http://www.bbsrc.ac.uk/home/home.aspx" target="_blank">BBSRC</a> and <a href="https://www.google-melange.com/gsoc/homepage/google/gsoc2013" target="_blank">Google Summer of Code 2013</a><br/>This page was created by <a href="https://wurmlab.github.io/tools/genevalidator/" target="_blank" >GeneValidator</a> v<%= GeneValidator::VERSION %></p></div></footer></body></html>
2 changes: 1 addition & 1 deletion aux/template_footer.erb
@@ -5,4 +5,4 @@
<% output_files.each_with_index do |results_html, idx| %>
<li><a href="<%=results_html%>"><%= idx + 1 %></a></li>
<% end %></ul></nav><%end%>
<footer><div class="container center-block"><p class="text-muted text-center">Please cite:<a href="http://bioinformatics.oxfordjournals.org/content/early/2016/02/26/bioinformatics.btw015"> "Dragan M<sup>&Dagger;</sup>, Moghul MI<sup>&Dagger;</sup>, Priyam A, Bustos C &amp; Wurm Y <em>(2016)</em> GeneValidator: identify problematic gene predictions"</a><br/> Developed at <a href="https://wurmlab.github.io" target="_blank">Wurm Lab</a>, <a href="http://www.sbcs.qmul.ac.uk" target="_blank">QMUL</a> with funding by <a href="http://www.bbsrc.ac.uk/home/home.aspx" target="_blank">BBSRC</a> and <a href="https://www.google-melange.com/gsoc/homepage/google/gsoc2013" target="_blank">Google Summer of Code 2013</a><br/>This page was created by <a href="https://github.com/wurmlab/genevalidator" target="_blank" >GeneValidator</a> v<%= GeneValidator::VERSION %></p></div></footer></body></html>
<footer><div class="container center-block"><p class="text-muted text-center">Please cite:<a href="https://academic.oup.com/bioinformatics/article/32/10/1559/1742817/GeneValidator-identify-problems-with-protein"> "Dragan M<sup>&Dagger;</sup>, Moghul MI<sup>&Dagger;</sup>, Priyam A, Bustos C &amp; Wurm Y <em>(2016)</em> GeneValidator: identify problematic gene predictions"</a><br/> Developed at <a href="https://wurmlab.github.io" target="_blank">Wurm Lab</a>, <a href="http://www.sbcs.qmul.ac.uk" target="_blank">QMUL</a> with funding by <a href="http://www.bbsrc.ac.uk/home/home.aspx" target="_blank">BBSRC</a> and <a href="https://www.google-melange.com/gsoc/homepage/google/gsoc2013" target="_blank">Google Summer of Code 2013</a><br/>This page was created by <a href="https://wurmlab.github.io/tools/genevalidator/" target="_blank" >GeneValidator</a> v<%= GeneValidator::VERSION %></p></div></footer></body></html>
5 changes: 5 additions & 0 deletions bin/genevalidator
@@ -88,6 +88,11 @@ BANNER
opt[:raw_sequences] = raw_seq
end

opts.on('-f', '--force_rewrite',
'Rewrites over existing output.') do
opt[:force_rewrite] = true
end

opts.on('-b', '--binaries [binaries]', Array,
'Path to BLAST and MAFFT bin folders (is added to $PATH variable)',
'To be provided as follows:',
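
To illustrate how the new `-f` flag is picked up, here is a stripped-down sketch using Ruby's standard `OptionParser`. It is not the real `bin/genevalidator`: the method name `parse_force_flag` is hypothetical, every other option is omitted, and the explicit `false` default is an assumption (the hunk above only ever sets the key to `true`).

```ruby
require 'optparse'

# Hypothetical, minimal parser that reproduces only the -f option added in
# this commit; the real script defines many more options.
def parse_force_flag(argv)
  opt = { force_rewrite: false } # assumed default, not shown in the diff
  parser = OptionParser.new do |opts|
    opts.on('-f', '--force_rewrite', 'Rewrites over existing output.') do
      opt[:force_rewrite] = true
    end
  end
  # OptionParser#parse leaves argv untouched and returns the non-option args.
  remaining = parser.parse(argv)
  [opt, remaining]
end
```

Calling `parse_force_flag(['-f', 'input.fa'])` would yield the options hash with `:force_rewrite` set, plus the leftover arguments (here the FASTA file name).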
29 changes: 15 additions & 14 deletions genevalidator.gemspec
@@ -1,32 +1,29 @@
# coding: utf-8
lib = File.expand_path('../lib', __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require 'genevalidator/version'

Gem::Specification.new do |s|
# meta
s.name = 'genevalidator'
s.version = GeneValidator::VERSION
s.version = GeneValidator::VERSION
s.authors = ['Monica Dragan', 'Ismail Moghul', 'Anurag Priyam',
'Yannick Wurm']
s.email = '[email protected]'
s.homepage = 'https://wurmlab.github.io/tools/genevalidator/'
s.license = 'AGPL'
s.summary = 'Identifying problems with gene predictions.'
s.description = 'The tool validates the input predicted genes and provides' \
' useful information (length validation, gene merge'\
' useful information (length validation, gene merge' \
' validation, sequence duplication checking, ORF finding)' \
' based on the similarities to genes in public databases.'
s.required_ruby_version = '>= 2.1.0'

s.required_ruby_version = '>= 2.0.0'
s.add_development_dependency 'bundler', '~> 1.6'
s.add_development_dependency 'minitest', '~> 5.10'
s.add_development_dependency 'rake', '~>10.3'
s.add_development_dependency 'yard', '~> 0.9.11'
s.add_development_dependency 'codeclimate-test-reporter', '~> 0.4', '>= 0.4.7'
s.add_development_dependency('minitest', '~> 5.4')
s.add_dependency('bio', '~> 1.4')
s.add_dependency('bio-blastxmlparser', '~>2.0')
s.add_dependency('statsample', '2.0.1')

s.add_dependency 'bio', '~> 1.4'
s.add_dependency 'bio-blastxmlparser', '~>2.0'
s.add_dependency 'statsample', '2.0.1'

s.files = `git ls-files -z`.split("\x0")
s.executables = s.files.grep(%r{^bin/}) { |f| File.basename(f) }
@@ -35,15 +32,19 @@ Gem::Specification.new do |s|

s.post_install_message = <<INFO
------------------------------------------------------------------------
----------------------------------------------------------------------------
Thank you for validating your gene predictions with GeneValidator!
To launch GeneValidator execute 'genevalidator' from command line.
$ genevalidator [options] FASTA_FILE
Visit https://github.com/wurmlab/GeneValidator for more information.
------------------------------------------------------------------------
Visit https://wurmlab.github.io/tools/genevalidator/ for more information.
Note there is also an online demo server at:
http://genevalidator.sbcs.qmul.ac.uk
----------------------------------------------------------------------------
INFO
end
2 changes: 2 additions & 0 deletions lib/genevalidator/arg_validation.rb
@@ -71,6 +71,8 @@ def assert_BLAST_output_files
def assert_output_dir_does_not_exist
output_dir = "#{@opt[:input_fasta_file]}.html"
return unless File.exist?(output_dir)
FileUtils.rm_r(output_dir) if @opt[:force_rewrite]
return if @opt[:force_rewrite]
$stderr.puts 'The output directory already exists for this fasta file.'
$stderr.puts "\nPlease remove the following directory: #{output_dir}\n"
$stderr.puts "You can run the following command to remove the folder.\n"
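
The effect of the two added lines can be summarised as a standalone sketch. This is an illustration under assumptions, not the project's code: `output_dir_available?` is a hypothetical name, and it returns a boolean where the real method prints the messages shown in this hunk (the rest of that method is cut off in this view).

```ruby
require 'fileutils'

# Hypothetical standalone version of the check in this hunk.
# Returns true when output_dir can be used, false when it already exists
# and force_rewrite was not requested.
def output_dir_available?(output_dir, force_rewrite)
  return true unless File.exist?(output_dir)
  return false unless force_rewrite
  # With -f, the existing output directory is removed recursively.
  FileUtils.rm_r(output_dir)
  true
end
```

Because `FileUtils.rm_r` deletes the directory and everything under it, `-f` discards any previous results for the same input file.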
2 changes: 1 addition & 1 deletion lib/genevalidator/version.rb
@@ -1,3 +1,3 @@
module GeneValidator
VERSION = '1.6.12'
VERSION = '1.7.2'.freeze
end
4 changes: 2 additions & 2 deletions test/test_helper.rb
@@ -1,2 +1,2 @@
require 'codeclimate-test-reporter'
CodeClimate::TestReporter.start
require "simplecov"
SimpleCov.start
