Marcin Magnus (/my family name/
(under development)
- RNA structural bioinformatics
- RNA utils
- Scientific software
- More
- References
- Books
- Notes
seq = sequence
ss = secondary structure
rmsd = root-mean-square deviation of atomic positions
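To make the rmsd abbreviation concrete: for N paired atoms, the RMSD is the square root of the mean squared distance between corresponding positions. The sketch below is a minimal illustration using only the standard library; the function name `rmsd` and the toy coordinates are made up for this example, and no superposition is done before the distances are measured.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points, without superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates (made up): the second set is the first one shifted by 1 along z
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
native = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(model, native))  # 1.0
```

In practice the two structures are usually superimposed first, so the reported value reflects shape differences rather than placement in space.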
RNA bioinformatics
Chapters #1 Introduction (by Michael Levitt), #2 Modeling RNA Molecules (by Leontis & Westhof), and #5 Template-Based and Template-Free Modeling of RNA 3D Structure: Inspirations from Protein Structure Modeling, all from the book RNA 3D Structure Analysis and Prediction
Python: Managing Your Biological Data with Python, by Allegra Via, Kristian Rother, Anna Tramontano
The sequence:
Which format would you use to save:
- a sequence
- a secondary structure
- a structure?
a) What is the FASTA format?
- write the seq in the FASTA format
b) The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs).
- Which RNA family does the seq belong to?
- How many structures does the family have?
- Which members does the clan (CL00012) contain?
- Download the alignment and view it in Jalview
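For exercise a) above, a FASTA record is a header line starting with `>` followed by one or more lines of sequence. The sketch below is one possible way to produce such a record in Python; the helper name `to_fasta` and the 60-character line width are assumptions made for illustration, and the sequence is the one used in the SimRNAweb example in these notes.

```python
def to_fasta(name, sequence, width=60):
    """Return one FASTA record: a '>' header line, then the sequence in lines of `width`."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Sequence copied from the SimRNAweb example in these notes
seq = "GUUCCCGAAAGGAUGGCGGAAACGCCAGAUGCCUUGUAACCGAAAGGGGGAAU"
record = to_fasta("seq", seq)
print(record, end="")
```

Writing `record` to a file such as seq.fa with `open("seq.fa", "w")` gives a file that the later exercises can read.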
Predict secondary structure for the sequence.
- Use CompaRNA to check what is the best available tool at the moment.
- Get the secondary structure from the structure 3E5C (PDB database)
- Compare the predicted secondary structure to the native secondary structure
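One way to approach the comparison step is to turn each dot-bracket string into a set of base pairs and count the pairs the two sets share. The sketch below assumes pseudoknot-free dot-bracket notation and reports sensitivity (shared pairs / native pairs) and PPV (shared pairs / predicted pairs); the function names are hypothetical. Both strings are copied from the Notes section of this document, with the ClaRNA annotation standing in for a prediction.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string (no pseudoknots)."""
    stack, pairs = [], set()
    for i, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(i)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def compare(predicted, native):
    """Return (sensitivity, PPV) of the predicted base pairs against the native ones."""
    pred, nat = base_pairs(predicted), base_pairs(native)
    shared = len(pred & nat)
    sensitivity = shared / len(nat) if nat else 0.0
    ppv = shared / len(pred) if pred else 0.0
    return sensitivity, ppv


native = "((((((..((((.(((((....)))))....))))....((....))))))))"     # 3E5C.pdb, rnapdbee
predicted = ".(((((..((((.(((((....)))))....))))....((....)))))))."  # clarna, used as a stand-in
sens, ppv = compare(predicted, native)
print("sensitivity = %.2f, PPV = %.2f" % (sens, ppv))
```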
- run SimRNAweb using the seq
- run SimRNAweb using the seq and ss
Edit secondary structure to provide your own secondary structure
Trick to run the tool: >seq|GUUCCCGAAAGGAUGGCGGAAACGCCAGAUGCCUUGUAACCGAAAGGGGGAAU|(((((((((((((((((((..))))))...)))))))..(((..)))))))))&action=Advanced
- use PyMOL to align SimRNAweb models with the native structure
QRNA 0.2 - Quick Refinement of Nucleic Acids 0.2
Tutorial by Magnus
QRNA (RNA & DNA ONLY, incl. modified nts):
- adds missing atoms (esp. hydrogen)
- single-point energy calculations
- energy minimization in all-atom representation (Amber ff ONLY, implicit water, constraints possible)
$ ./QRNA -i pdbfile.pdb -o outfile.pdb
It minimizes pdbfile.pdb and writes outfile.pdb every 100 steps.
All default parameters are used, which should be fine in most cases.
Mine RNA 3D structure motifs and their contacts - both with themselves and with proteins (RNAbricks)
The Biopython Project is an open-source collection of non-commercial Python tools for computational biology and bioinformatics, created by an international association of developers. It contains classes to represent biological sequences and sequence annotations, and it is able to read and write to a variety of file formats. It also allows for a programmatic means of accessing online databases of biological information, such as those at NCBI. Separate modules extend Biopython's capabilities to sequence alignment, protein structure, population genetics, phylogenetics, sequence motifs, and machine learning. Biopython is one of a number of Bio* projects designed to reduce code duplication in computational biology.
- read seq.fa into Biopython
- calculate the rmsd between two structures using Biopython
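A possible starting point for the two exercises above, assuming Biopython and NumPy are installed. `SeqIO.read` parses a single-record FASTA file, and `SVDSuperimposer` computes the RMSD after a least-squares fit. The FASTA file is written to a temporary directory first so the sketch is self-contained, and the coordinates are toy values made up for illustration rather than atoms from a real structure.

```python
import os
import tempfile

import numpy as np
from Bio import SeqIO
from Bio.SVDSuperimposer import SVDSuperimposer

# 1) Read a FASTA file (written here first, so the example does not depend on your files)
fasta_path = os.path.join(tempfile.mkdtemp(), "seq.fa")
with open(fasta_path, "w") as handle:
    handle.write(">seq\nGUUCCCGAAAGGAUGGCGGAAACGCCAGAUGCCUUGUAACCGAAAGGGGGAAU\n")

record = SeqIO.read(fasta_path, "fasta")  # SeqIO.read expects exactly one record
print(record.id, len(record.seq))

# 2) RMSD after optimal superposition. Toy coordinates (made up):
#    `moved` is `reference` rotated by 90 degrees about z and then shifted.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
rotation = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = reference @ rotation.T + np.array([5.0, -2.0, 3.0])

sup = SVDSuperimposer()
sup.set(reference, moved)  # (reference coordinates, coordinates to be fitted)
sup.run()                  # least-squares fit via singular value decomposition
print("RMSD before fit: %.3f" % sup.get_init_rms())
print("RMSD after fit:  %.3f" % sup.get_rms())
```

For real structures, the coordinate arrays would come from matching atoms of the two models, for example atoms loaded with `Bio.PDB.PDBParser`; which atoms to pair is a choice left to the exercise.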
Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license. The name is derived from the term "panel data", an econometrics term for multidimensional structured data sets.
- make a DataFrame, run head() and tail(), and select a column
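A small sketch of the exercise above, assuming pandas is installed. The table has one row per nucleotide, built from the sequence used in the SimRNAweb example and the rnapdbee dot-bracket string from the Notes section; pairing these two strings is an assumption based on their equal length, and the column names `nt` and `ss` are chosen for illustration.

```python
import pandas as pd

seq = "GUUCCCGAAAGGAUGGCGGAAACGCCAGAUGCCUUGUAACCGAAAGGGGGAAU"
ss = "((((((..((((.(((((....)))))....))))....((....))))))))"

# One row per nucleotide, numbered from 1
df = pd.DataFrame({"nt": list(seq), "ss": list(ss)})
df.index = range(1, len(df) + 1)

print(df.head())    # first 5 rows
print(df.tail(3))   # last 3 rows

nucleotides = df["nt"]             # select a single column (a Series)
print(nucleotides.value_counts())  # how often each nucleotide occurs

paired = df[df["ss"] != "."]       # keep only rows for paired positions
print(len(paired))
```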
NumPy (pronounced "Numb Pie" or sometimes "Numb pee") is an extension to the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open source and has many contributors.
- make an array of [2,3,1,0]
- ...
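A minimal sketch for the array exercise above, assuming NumPy is installed. The operations after the array creation are a few suggestions for the open-ended item, not a fixed list.

```python
import numpy as np

arr = np.array([2, 3, 1, 0])

print(arr.shape, arr.dtype)  # dimensions and element type
print(arr.mean())            # arithmetic mean of the elements
print(np.sort(arr))          # sorted copy, the original is unchanged
print(arr.argsort())         # indices that would sort the array
print(arr[arr > 1])          # boolean-mask selection
print(arr.reshape(2, 2))     # same data viewed as a 2 x 2 matrix
```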
matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like wxPython, Qt, or GTK+. There is also a procedural "pylab" interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB. SciPy makes use of matplotlib. matplotlib was originally written by John D. Hunter, has an active development community, and is distributed under a BSD-style license. Michael Droettboom was nominated as matplotlib's lead developer shortly before John Hunter's death in 2012.
- plot [2,3,1,0]
- ...
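A possible sketch for the plotting exercise above, assuming matplotlib is installed. It uses the object-oriented API and the non-interactive Agg backend so it also runs without a display; the output file name `plot.png` and the axis labels are hypothetical choices.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

values = [2, 3, 1, 0]

fig, ax = plt.subplots()
ax.plot(values, marker="o")  # x defaults to the indices 0, 1, 2, 3
ax.set_xlabel("index")
ax.set_ylabel("value")
ax.set_title("plot of [2, 3, 1, 0]")

out_path = os.path.join(tempfile.mkdtemp(), "plot.png")  # hypothetical output file
fig.savefig(out_path)
plt.close(fig)
print(out_path)
```

In an interactive session or a Jupyter Notebook you would typically drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.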
scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. scikit-learn is largely written in Python, with some core algorithms written in Cython to achieve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines by a similar wrapper around LIBLINEAR.
The IPython Notebook is now known as the Jupyter Notebook. It is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots and rich media. For more details on the Jupyter Notebook, please see the Jupyter website.
- select a chain by clicking and from the cmd
- renumber a chain
- show a structure as cartoon and ribbon
- align two structures
H2O is open-source software for big-data analysis. It is produced by the start-up H2O.ai (formerly 0xdata), which launched in 2011 in Silicon Valley. The speed and flexibility of H2O allow users to fit hundreds or thousands of potential models as part of discovering patterns in data. With H2O, users can throw models at data to find usable information, allowing H2O to discover patterns. Using H2O, Cisco estimates each month 20 thousand models of its customers' propensities to buy.
H2O's mathematical core is developed with the leadership of Arno Candel; after H2O was rated as the best "open-source Java machine learning project" by GitHub's programming members, Candel was named to the first class of "Big Data All Stars" by Fortune in 2014. The firm's scientific advisors are experts on statistical learning theory and mathematical optimization.
The H2O software can be called from the statistical package R and other environments. It is used for exploring and analyzing datasets held in cloud computing systems and in the Apache Hadoop Distributed File System as well as in the conventional operating systems Linux, Mac OS, and Microsoft Windows. The H2O software is written in Java, Python, and R. Its graphical user interface is compatible with four popular browsers: Chrome, Safari, Firefox, and Internet Explorer.
Programming languages: H2O was written in Java (6 or later), Python (2.7.x), and R (3.0.0 or later).
- doc for h2o
- doc for h2o and python
- tutorials
err =, shell=True)
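The fragment above looks like the tail of a call that runs a shell command from Python, but the original call is not recoverable. As a sketch only, assuming the intent was to run a command through the shell and capture its output and errors, the standard-library `subprocess.run` can be used like this; the command `echo hello` is a placeholder.

```python
import subprocess

# Placeholder command; replace it with the tool you want to run
cmd = "echo hello"

# shell=True passes the string to the system shell; capture_output collects stdout and stderr
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
out, err = result.stdout, result.stderr

print("exit code:", result.returncode)
print("stdout:", out.strip())
print("stderr:", err.strip())
```

With `shell=True` the command string is interpreted by the shell, so it should not be built from untrusted input.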
The system console, computer console, root console, operator's console, or simply console is the text entry and display device for system administration messages, particularly those from the BIOS or boot loader, the kernel, from the init system and from the system logger. It is a physical device consisting of a keyboard and a screen, and traditionally is a text terminal, but may also be a graphical terminal. System consoles are generalized to computer terminals, which are abstracted respectively by virtual consoles and terminal emulators. Today communication with system consoles is generally done abstractly, via the standard streams (stdin, stdout, and stderr), but there may be system-specific interfaces, for example those used by the system kernel.
- show the content of the seq.fa file in your terminal
- change permission to execute a file
- install biopython/pymol using the terminal
- log in to a remote machine using ssh keys
- mount a drive using sshfs
- download this file from the terminal
- check the version of your system
- add this to your (python) path and reload the file with a new variable
- grep a file...
- open seq.fa in vim and quit ;-)
- take a look at the processes on your machine (htop)
- write a simple bash script to run cat seq.fa
- go to your home and find
- gzip
- get the top of
- get the bottom of
- screen?
- diff?
- rsync?
- crontab?
- run mc and move to your home
- ?
- make an alias
magnus@peyote2:~$ qstat -u '*'
for i in `seq -w 1 10`; do echo "../SimRNA -c config.dat -s 1gid.fas -S -r 1gid_restraints_3_01.txt -E 10 -R $i -o 1gid+restraints_3_01_$i >& 1gid+restraints_3_01_$i.txt" | qsub -cwd -V -pe mpi 10 -l h_vmem=250M; done
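The shell loop above submits ten SimRNA runs, one per replica number, by piping each command line into qsub. The same command lines can be generated from Python, which may be handy when the list of jobs is built programmatically. The sketch below only builds and prints the strings; the function name `simrna_jobs` is hypothetical, and all flags are copied unchanged from the loop above.

```python
def simrna_jobs(n_runs=10):
    """Build the same shell pipelines as the loop above, one per replica."""
    width = len(str(n_runs))  # zero-pad to equal width, like `seq -w 1 10`
    jobs = []
    for i in range(1, n_runs + 1):
        tag = str(i).zfill(width)
        out = "1gid+restraints_3_01_" + tag
        simrna = (
            "../SimRNA -c config.dat -s 1gid.fas -S -r 1gid_restraints_3_01.txt "
            "-E 10 -R %s -o %s >& %s.txt" % (tag, out, out)
        )
        jobs.append('echo "%s" | qsub -cwd -V -pe mpi 10 -l h_vmem=250M' % simrna)
    return jobs


for job in simrna_jobs():
    print(job)  # printed only; run each line in a shell on the cluster to submit it
```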
Git (/ɡɪt/) is a version control system that is widely used for software development and other version control tasks. It is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. Git was created by Linus Torvalds in 2005 for development of the Linux kernel, with other kernel developers contributing to its initial development. (private)
Rfam 12.0: updates to the RNA families database. Eric P. Nawrocki, Sarah W. Burge, Alex Bateman, Jennifer Daub, Ruth Y. Eberhardt, Sean R. Eddy, Evan W. Floden, Paul P. Gardner, Thomas A. Jones, John Tate and Robert D. Finn. Nucleic Acids Research (2014). doi:10.1093/nar/gku1063
Gorodkin, J., & Ruzzo, W. L. (Eds.) (2014). RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods. Series editor: J. M. Walker.
RNA 3D Structure Analysis and Prediction. Neocles Leontis, Eric Westhof, 2012.
RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods. Editors: Gorodkin, Jan; Ruzzo, Walter L., 2014.
Seq in FASTA format:
Crystal Structure of the SMK box (SAM-III) Riboswitch with SAM
The SMK box riboswitch (also known as SAM-III) is an RNA element that regulates gene expression in bacteria. The SMK box riboswitch is found in the 5' UTR of the MetK gene in lactic acid bacteria. The structure of this element changes upon binding to S-adenosyl methionine (SAM) to a conformation that blocks the Shine-Dalgarno sequence and blocks translation of the gene.
There are other known SAM-binding riboswitches such as SAM-I and SAM-II, but these appear to share no similarity in sequence or structure to SAM-III.
Get clusters of 3e5c
Secondary structure:
((((((..((((.(((((....)))))....))))....((....)))))))) # 3E5C.pdb rnapdbee
((((((..((((.(((((....)))))....))))....((....)))))))) # mfold (-24.00) / 1.
.(((((..((((.(((((....)))))....))))....((....))))))). # clarna 2.
(((((((((((((((((((..))))))...)))))))..(((..)))))))))
(((((((((((((((((((..)))))))...))))))..(((..)))))))))
(((((((((((((((((((..))))))..).))))))..(((..)))))))))
SimRNAweb of 3e5c (seq + ss)
- use data from Rhiju to model any RNA