FR-HIT

FR-HIT is an efficient fragment recruitment algorithm for aligning next-generation sequencing reads against microbial reference genomes. It achieves sensitivity similar to BLASTN but runs roughly 100 times faster. We applied FR-HIT, BLAST, and several other alignment programs to recruit an Illumina dataset of 1 million 75-bp reads, selected from a recent human gut microbiome study (Qin J, et al. Nature 2010, 464:59), to 194 public human gut bacterial reference genomes. BLAST recruited 475,584 reads and found 6,134,663 alignments in 241.5 hours. The mapping program BWA took 0.1 hours but recruited only 212,699 reads. FR-HIT recruited 523,868 reads and identified 5,780,580 alignments in 1.8 hours.
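
The comparison above can be checked with a few lines of arithmetic. The sketch below uses only the read counts and runtimes quoted in this section; the variable and function names are illustrative and not part of FR-HIT.

```python
# Figures quoted above: (reads recruited, runtime in hours) for 1,000,000 input reads.
TOTAL_READS = 1_000_000
results = {
    "BLAST": (475_584, 241.5),
    "BWA": (212_699, 0.1),
    "FR-HIT": (523_868, 1.8),
}

def recruited_fraction(program):
    """Fraction of the 1 million reads recruited by the given program."""
    reads, _hours = results[program]
    return reads / TOTAL_READS

def speedup_over_blast(program):
    """How many times faster the given program ran than BLAST."""
    return results["BLAST"][1] / results[program][1]

for name in results:
    print(f"{name}: {recruited_fraction(name):.1%} of reads recruited, "
          f"{speedup_over_blast(name):.0f}x the speed of BLAST")
```

On this dataset FR-HIT comes out at about 134 times the speed of BLAST (241.5 / 1.8) while recruiting slightly more reads.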

Current version: 0.7

The current version supports:

1. Multithreaded parallel computing using OpenMP.
2. PSL output format.
3. Masking of low-complexity regions, marked as lowercase sequence in the reference database.
4. E-value cutoff.

Two Perl scripts and one program package are included:

1. psl2sam.pl: converts PSL format to SAM format (this script is from the samtools package).
2. frhit2pairend.pl: analyzes FR-HIT output to extract paired-end alignment information.
3. binning-1.1.1: a program package that performs taxonomic binning using the output from FR-HIT. The algorithm is based on LCA (Lowest Common Ancestor).
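
To illustrate the LCA idea behind the binning step, here is a minimal sketch. It is not the code in binning-1.1.1: the function name and the hand-written lineages are assumptions made for illustration, and the real package may weight or filter hits differently. A read that hits several genomes is assigned to the deepest taxon that all of those genomes share.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-to-leaf lineages, or None."""
    if not lineages:
        return None
    common = None
    # Walk the lineages rank by rank, starting from the root.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            common = ranks[0]
        else:
            break
    return common

# Hypothetical lineages for two genomes hit by the same read (illustration only).
prevotella = ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
              "Prevotellaceae", "Prevotella", "Prevotella copri"]
alistipes = ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
             "Rikenellaceae", "Alistipes", "Alistipes putredinis"]

print(lowest_common_ancestor([prevotella, alistipes]))
```

With these two lineages the read is binned at the order level, because the families already differ.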

Usage

Usage: fr-hit v0.7 [options]

    -a   <string>   reads file, *.fasta format
    -d   <string>   reference genome sequences file, *.fasta format
    -o   <string>   output recruitments file
    -e   <double>   e-value cutoff, default=10
    -u   <int>      mask out repeats as lower cased sequence to prevent spurious hits? 1: yes; 0: no; default=1
    -f   <int>      format control for output file, 0:FR-HIT format; 1:PSL format, default=0
    -k   <int>      k-mer size (8<=k<=12), default=11
    -p   <int>      k-mer overlap of index (1<=p<k), use a small overlap for longer reads (454, Sanger), default=8
    -c   <int>      sequence identity threshold(%), default=75
    -g   <int>      use global or local alignment? 1:global; 0:local (need -m), default=0
    -w   <int>      minimal read length at which a 2-bp k-mer index step is used (for long 454 reads), default=1000
    -m   <int>      minimal alignment coverage control for the read (g=0), default=30
    -l   <int>      length of throw_away_reads, default=20
    -t   <int>      maximum number of failed alignment attempts, default=20
    -r   [0,N]      how to report alignment hits, 0:all; N:the best top N hits for one read, default=0
    -n   <int>      do alignment for which chain? 0:both; 1:direct only; 2:complementary only. default=0
    -b   <int>      band_width of alignment, default=4
    -T   [0,N]      number of threads, default 1; with 0, all CPUs will be used
    -h   help

Example:

    ./fr-hit -a 454reads-sample.fa -d 1000bacterialgenomes.fasta -c 90 -m 40 -w 120 -r 0 -o out.sop
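
When FR-HIT is called from a pipeline, it can help to assemble the command line programmatically. The following is a minimal sketch that uses only options listed in the usage text above; the helper names `build_frhit_command` and `run_frhit` and the default binary path `./fr-hit` are assumptions, not part of FR-HIT.

```python
import shlex
import subprocess

def build_frhit_command(reads, reference, output, identity=75, coverage=30,
                        threads=1, psl=False, binary="./fr-hit"):
    """Assemble an fr-hit argument list from options in the usage text."""
    return [
        binary,
        "-a", reads,                # reads file (FASTA)
        "-d", reference,            # reference genome sequences (FASTA)
        "-o", output,               # output recruitments file
        "-c", str(identity),        # sequence identity threshold (%)
        "-m", str(coverage),        # minimal alignment coverage of the read
        "-f", "1" if psl else "0",  # 0: FR-HIT format, 1: PSL format
        "-T", str(threads),         # number of threads (0 = all CPUs)
    ]

def run_frhit(cmd):
    """Run fr-hit; requires the compiled binary and the input files to exist."""
    subprocess.run(cmd, check=True)

cmd = build_frhit_command("454reads-sample.fa", "1000bacterialgenomes.fasta",
                          "out.sop", identity=90, coverage=40, threads=0)
print(shlex.join(cmd))  # inspect the command before calling run_frhit(cmd)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.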

The default output format of FR-HIT recruitment result file looks like:

    ReadName	ReadLength	E-value	AlignmentLength	Begin	End	Strand	Identity	ReferenceSequenceName	Begin	End

    1_lane2_1       75nt    8.3e-25 69      69      1       -       95.01%     Acidaminococcus_D21     486841311       486841379
    1_lane2_1       75nt    8.3e-25 69      69      1       -       95.23%     Ruminococcus_5_1_39B_FAA        3450573 3450641
    1_lane2_9       75nt    2.2e-25 64      1       64      +       98.81%     Acidaminococcus_D21     7901322 7901385
    1_lane2_9       75nt    2.2e-25 64      1       64      +       98.90%     Alistipes_putredinis_DSM_17216  1029618 1029681
    1_lane2_10      75nt    4.5e-23 72      1       72      +       93.32%     Acidaminococcus_D21     453948881       453948952
    1_lane2_10      75nt    4.5e-23 72      1       72      +       93.67%     Prevotella_copri_DSM_18205      3128442 3128513
    1_lane2_11      75nt    1.7e-22 74      75      2       -       91.21%     Acidaminococcus_D21     451839012       451839085
    1_lane2_11      75nt    1.7e-22 74      75      2       -       91.08%     Prevotella_copri_DSM_18205      1018573 1018646
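
The default output is straightforward to post-process. Below is a minimal parsing sketch, assuming that fields are separated by whitespace, that read and reference names contain no whitespace, and that a header line (if present) starts with `ReadName`. The `Hit` field names and helper functions are illustrative. Note that `best_hit_per_read` is a simple post-filter by identity and is not claimed to reproduce the ranking used by the `-r` option.

```python
from collections import namedtuple

# Field names are illustrative; they mirror the header columns shown above.
Hit = namedtuple(
    "Hit",
    "read read_len evalue aln_len read_begin read_end "
    "strand identity ref ref_begin ref_end",
)

def parse_hit(line):
    """Parse one whitespace-separated line of default FR-HIT output."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}: {line!r}")
    return Hit(
        read=f[0],
        read_len=int(f[1][:-2] if f[1].endswith("nt") else f[1]),  # "75nt" -> 75
        evalue=float(f[2]),
        aln_len=int(f[3]),
        read_begin=int(f[4]),
        read_end=int(f[5]),
        strand=f[6],
        identity=float(f[7].rstrip("%")),  # "95.01%" -> 95.01
        ref=f[8],
        ref_begin=int(f[9]),
        ref_end=int(f[10]),
    )

def parse_hits(lines):
    """Yield hits, skipping blank lines and a header line if one is present."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("ReadName"):
            continue
        yield parse_hit(stripped)

def best_hit_per_read(hits):
    """Keep, for each read, the hit with the highest identity."""
    best = {}
    for hit in hits:
        if hit.read not in best or hit.identity > best[hit.read].identity:
            best[hit.read] = hit
    return best

sample = """\
1_lane2_1       75nt    8.3e-25 69      69      1       -       95.01%     Acidaminococcus_D21     486841311       486841379
1_lane2_1       75nt    8.3e-25 69      69      1       -       95.23%     Ruminococcus_5_1_39B_FAA        3450573 3450641
"""
best = best_hit_per_read(parse_hits(sample.splitlines()))
print(best["1_lane2_1"].ref)
```

For the two sample lines, the hit to Ruminococcus_5_1_39B_FAA is kept because its identity (95.23%) is higher.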

FR-HIT can also write PSL output (-f 1), which users can convert to SAM format with psl2sam.pl.

Install

FR-HIT has no external dependencies, so just clone the FR-HIT repo and build the fr-hit binary:

    git clone https://github.com/Beifang/fr-hit.git
    cd fr-hit
    make

Now you can put the resulting binary somewhere on your $PATH. If you have superuser permissions, I recommend moving it to the system directory for locally compiled packages:

    sudo mv fr-hit /usr/local/bin/
