mourisl/README.md

Hi there 👋 I'm Li Song, an Assistant Professor in the Department of Biomedical Data Science at Dartmouth College. My research area is bioinformatics, and my research interest is designing algorithms and developing software to analyze sequencing data. Here is the software that my collaborators and I have developed:

Immunology

  • TRUST4: TCR/BCR assembler for RNA-seq data. TRUST4 can be applied to either bulk or single-cell RNA-seq data. In addition to reporting CDR3s, TRUST4 also assembles full-length TCRs/BCRs.
  • T1K: Genotyper for highly polymorphic genes including KIR and HLA. T1K is versatile and works with RNA-seq, WGS and WES data. T1K also identifies novel SNPs and is compatible with single-cell RNA-seq data.

Microbiome

  • Centrifuger: Fast and memory-efficient classifier for metagenomics sequences using a lossless compressed FM-index with run-block compressed BWT. It can assign a taxonomy ID to each sequencing read by comparing it against a database containing 34,190 prokaryotic genomes with 140 Gbp of sequence, using about 43 Gb of memory.
  • Centrifuge: Fast and memory-efficient classifier for metagenomics sequences using an FM-index. It requires only 4.2 Gb of memory for a database containing ~4300 prokaryotic genomes using lossy representations.
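
Both classifiers above are built on an FM-index. As a rough illustration of that data structure only, here is a textbook, uncompressed FM-index with backward search in Python. It is not the implementation used by Centrifuge or Centrifuger, and it omits the run-block compression and lossy representations those tools rely on; the function names `build_bwt`, `build_fm_index` and `count_occurrences` are hypothetical.

```python
def build_bwt(text):
    """Burrows-Wheeler transform of text, using '$' as the end sentinel.

    Sorting all rotations is quadratic in memory and only suitable for
    tiny demo strings.
    """
    text = text + "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt):
    """Return (c_table, occ) for a BWT string.

    c_table[c] is the number of characters in the text smaller than c.
    occ[c][i] is the number of occurrences of c in bwt[:i].
    """
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1

    c_table = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return c_table, occ


def count_occurrences(pattern, c_table, occ, n):
    """Count exact matches of pattern with backward search.

    n is the length of the BWT. The half-open interval [lo, hi) tracks
    the sorted suffixes that start with the pattern suffix seen so far.
    """
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


bwt = build_bwt("ACGTACGTAC")
c_table, occ = build_fm_index(bwt)
print(count_occurrences("AC", c_table, occ, len(bwt)))
```

A real classifier would also need to map matches back to genomes and taxonomy IDs, which this sketch leaves out.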

RNA-seq

  • CLASS/CLASS2: Efficient and accurate transcript assemblers for RNA-seq data that detect more fine-grained alternative splice variants. The programs combine linear programming algorithms to detect exons from read coverage levels, with splice graph representations of genes and their splice variants, and memory-efficient optimization algorithms for transcript selection. [Also on SourceForge]
  • PsiCLASS: Simultaneous multi-sample transcript assembler for RNA-seq data. It builds a global data structure representing the structure of the transcripts, from which each sample generates its expressed transcripts. The global information allows accurate sample-wise assemblies and final meta-assembly.
  • Rcorrector: Efficient and accurate k-mer-based error correction software for Illumina RNA-seq reads. It can also be applied to data sets where the read coverage is non-uniform, such as single-cell sequencing.
  • Rascaf: Scaffolding with RNA-seq read alignment. It uses information from paired-end and split reads to improve the completeness and contiguity of a draft genome assembly, particularly in the gene regions.

Next-generation sequencing

  • Chromap: Ultrafast alignment and preprocessing for chromatin profiling sequencing data, including ChIP-seq, ATAC-seq and Hi-C. It supports both bulk and single-cell platforms, and is more than 10 times faster than traditional workflows without sacrificing alignment accuracy.
  • Lighter: Fast and memory-efficient k-mer-based software to correct the sequencing errors from whole genome sequencing data without counting. It samples the k-mers in the data set and uses two memory-efficient Bloom filters to obtain solid k-mers.
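
To make the "sampling plus Bloom filter" idea behind Lighter more concrete, here is a minimal Python sketch of the first step only: each k-mer occurrence is inserted into a Bloom filter with probability `alpha`, so k-mers that occur many times are very likely to be stored without any counting. This is an assumption-laden illustration, not Lighter's code; the second Bloom filter that holds the solid k-mers is omitted, and the class and function names, filter size, hash count and the use of BLAKE2b are all hypothetical choices.

```python
import hashlib
import random


class BloomFilter:
    """Small Bloom filter over strings; may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive independent hash functions by salting BLAKE2b differently.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


def kmers(read, k):
    """All overlapping substrings of length k in read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def sample_kmers(reads, k, alpha, rng):
    """Insert each k-mer occurrence into a Bloom filter with probability alpha."""
    sampled = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            if rng.random() < alpha:
                sampled.add(kmer)
    return sampled


reads = ["ACGTACGTAC"] * 20 + ["ACGTTCGTAC"]  # the last read carries one substitution
sampled = sample_kmers(reads, k=4, alpha=0.2, rng=random.Random(0))
print("ACGT" in sampled)
```

With a sampling rate `alpha`, a k-mer seen `n` times is missed with probability `(1 - alpha) ** n`, which is why frequent k-mers almost always end up in the filter while a k-mer created by a single sequencing error usually does not.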

Visualization

Python libraries to help with plotting figures:
  • MSAplot: visualizes multiple sequence alignments.
  • pvalannot: adds p-value annotations to box plots generated by Seaborn.
  • heatmapannot: adds color annotations in the axes to heatmaps or dot plots generated by Seaborn.

Pinned

  1. centrifuger (Public)

    Classifier for metagenomic sequences using FM-index with run-block compressed BWT.

    C++ 49 4

  2. liulab-dfci/TRUST4 (Public)

    TCR and BCR assembly from RNA-seq data

    C 278 49

  3. T1K (Public)

    T1K is a versatile method to genotype highly polymorphic genes (e.g. KIR, HLA) with bulk or single-cell RNA-seq, WGS or WES data.

    C 48 7

  4. splicebox/PsiCLASS (Public)

    Simultaneous multi-sample transcript assembler for RNA-seq data

    C 16 4

  5. Lighter (Public)

    Fast and memory-efficient sequencing error corrector

    C++ 92 17

  6. Rcorrector (Public)

    Error correction for Illumina RNA-seq reads

    C++ 63 18