This repository houses code that maps transcription factor binding sites (TFBS) onto sequence alignments. The purpose is to identify where TFBS occur in an alignment. There are currently three components, located in the /code directory:
- The map_motif.py script, which can be run from the command line.
- motif_scoring_and_extraction.ipynb, a Jupyter notebook version of the script for interactive coding and visualization.
- The /D3_vis portion, an in-progress project to visualize TFBS interactively in the browser.
map_motif.py is a Python script that takes two input files.
File Inputs:
- alignment (fasta)
- TFBS Position Frequency Matrix.
Arguments:
- alignment fasta file
- TFBS Position Frequency Matrix
- optional -threshold score cutoff (outputs only scores greater than the specified threshold)
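As a rough illustration of what scoring a sequence against a Position Frequency Matrix with a threshold can look like, here is a minimal sketch. It is not taken from map_motif.py: the log-odds scoring, the pseudocount, the uniform background, and every function name (pfm_to_log_odds, score_window, reverse_complement, scan) are assumptions made for the example, and the sequence is assumed to have had its alignment gaps removed already.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pfm_to_log_odds(pfm, pseudocount=1.0, background=0.25):
    """Convert a PFM (one dict of base counts per motif position) into
    log2 odds scores against a uniform background (an assumption)."""
    log_odds = []
    for counts in pfm:
        total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
        log_odds.append({
            b: math.log2((counts[b] + pseudocount) / total / background)
            for b in BASES
        })
    return log_odds


def score_window(window, log_odds):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(col[base] for base, col in zip(window, log_odds))


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, pfm, threshold=None):
    """Return (raw_position, strand, motif_found, score) tuples for every
    window on both strands whose score is greater than the threshold."""
    seq = seq.upper()
    log_odds = pfm_to_log_odds(pfm)
    width = len(log_odds)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other symbols
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = score_window(site, log_odds)
            if threshold is None or score > threshold:
                hits.append((start, strand, site, score))
    return hits
```

With a three-column matrix that strongly prefers A, C, G, calling scan("TTACGTT", pfm, threshold=3.0) reports one hit on the forward strand and one on the reverse strand, because the reverse complement of CGT is ACG.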
File Outputs:
- .csv file listing the TFBSs found, if any, at each position in the alignment.
Output data frame includes:
- position
- score
- sequence entry
- raw_position (from each sequence entry)
- strand (the strand on which the motif was found)
- motif_found (sequence motif at each position)
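The relationship between position and raw_position can be sketched as follows, assuming that position is a column in the gapped alignment and raw_position is an index into the ungapped sequence entry. That reading, the 0-based indexing, the gap characters, and the function name raw_to_alignment_positions are all assumptions for illustration, not details confirmed by the script.

```python
def raw_to_alignment_positions(aligned_seq, gap_chars="-."):
    """Return a list whose i-th element is the alignment column (0-based)
    of the i-th non-gap residue in aligned_seq."""
    return [col for col, ch in enumerate(aligned_seq) if ch not in gap_chars]


# Hypothetical aligned entry: residues sit in columns 0, 1, 4, 5 and 7.
mapping = raw_to_alignment_positions("AC--GT-A")
alignment_column = mapping[2]  # raw position 2 (the G) lies in column 4
```

A lookup list like this makes it possible to scan each ungapped sequence for motifs and then report every hit in alignment coordinates.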
The output file is saved in the directory from which the script was run.
Example output file: map_motif-alignment.fa-motif.fm.csv
Example command: python map_motif.py alignment.fa motif.fm 3.2
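Once the .csv file exists, it can be inspected with the Python standard library. The sketch below assumes the file has a header row with a column named score, and that rows with no TFBS found may have an empty score. Both are guesses about the output layout, and load_hits is a hypothetical helper, not part of this repository.

```python
import csv


def load_hits(handle, min_score=None):
    """Read rows from an open CSV handle and keep those whose score is
    greater than min_score. Rows with an empty score are skipped."""
    hits = []
    for row in csv.DictReader(handle):
        if not row.get("score"):
            continue  # assumed: no TFBS reported at this position
        score = float(row["score"])
        if min_score is None or score > min_score:
            row["score"] = score
            hits.append(row)
    return hits


# Usage with the example output file named above:
# with open("map_motif-alignment.fa-motif.fm.csv", newline="") as handle:
#     strong_hits = load_hits(handle, min_score=3.2)
```

Passing an open handle rather than a path keeps the helper easy to try out on an in-memory string before pointing it at a real output file.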