This repository contains a Python package, map_motif2, that maps transcription factor binding sites (TFBSs) onto sequence alignments. The code has three components, located in the map_motif2_code/ directory:
motif_pipeline.py
- INPUT: This script can be run directly from the command line. It takes an alignment file path, a motif file path, and a score threshold.
- OUTPUT: Generates a CSV file with information about each TFBS: the score, species, raw position, alignment position, and motif associated with it.
threshold_setter.py
- Helper script imported by motif_pipeline.py. It helps set an appropriate score threshold for calling TFBSs, depending on the motif being used.
MSE.py
- Helper script imported by motif_pipeline.py. Like threshold_setter.py, it is integrated into the final motif_pipeline.py.
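To illustrate what mapping a TFBS onto an alignment involves, here is a minimal sketch (not the package's actual implementation) that scans one aligned sequence with a position weight matrix and reports each hit's score, raw (ungapped) position, and alignment position. It assumes gaps are written as "-", that the matrix is already in log-odds form, that positions are 0-based, and that only the forward strand is scanned. TOY_PWM, scan_sequence, raw_to_alignment_positions, and map_hits are hypothetical names.

```python
# Hypothetical toy log-odds matrix for a 2-bp motif that prefers "AT".
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]


def scan_sequence(seq, pwm, threshold):
    """Return (raw_start, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        # Sum the per-position log-odds scores of the observed bases.
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def raw_to_alignment_positions(aligned_row):
    """List the alignment column of each non-gap character, in order."""
    return [col for col, char in enumerate(aligned_row) if char != "-"]


def map_hits(aligned_row, pwm, threshold):
    """Return (score, raw_position, alignment_position) for each hit."""
    raw_seq = aligned_row.replace("-", "").upper()  # strip gaps first
    columns = raw_to_alignment_positions(aligned_row)
    return [(score, start, columns[start])
            for start, score in scan_sequence(raw_seq, pwm, threshold)]
```

For the aligned row GA--TTAT, map_hits("GA--TTAT", TOY_PWM, 1.5) returns [(2.0, 1, 1), (2.0, 4, 6)]: the second hit starts at raw index 4 but at alignment column 6 because of the two gap columns before it.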
The motivation for mapping TFBSs onto alignments is to eventually understand how TFBS presence varies across species over evolutionary time.
- map_motif2_code also contains a folder for tests (currently empty; tests can be added there).
Users should run this package through motif_pipeline.py. For example, from inside the map_motif2/map_motif2_code directory, you might type the following at the command line:
python motif_pipeline.py align_outlier_rm_with_length_VT0809.fa ../../TFBS_presence/data/pwm/bcd_FlyReg.fm 3.2
Once the CSV file has been generated in the same folder, a success message is printed.
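If you want to launch the pipeline from another Python script, for example to loop over several motif files, a minimal sketch using the standard-library subprocess module follows. It assumes it is run from the map_motif2_code directory and passes only the three positional arguments shown in the command above; build_command and run_pipeline are hypothetical helper names, not part of the package.

```python
import subprocess
import sys


def build_command(alignment_path, motif_path, threshold):
    """Assemble the argument list for one motif_pipeline.py run."""
    # sys.executable reuses the interpreter running this script.
    return [sys.executable, "motif_pipeline.py",
            alignment_path, motif_path, str(threshold)]


def run_pipeline(alignment_path, motif_path, threshold):
    """Run motif_pipeline.py once; raise CalledProcessError on failure."""
    subprocess.run(build_command(alignment_path, motif_path, threshold),
                   check=True)


# Example call (not executed here):
# run_pipeline("align_outlier_rm_with_length_VT0809.fa",
#              "../../TFBS_presence/data/pwm/bcd_FlyReg.fm", 3.2)
```

Passing an argument list rather than a single shell string avoids quoting problems with paths that contain spaces.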