Mutual Information Tools for protein Sequence analysis
MIToS v2.1.0 for Julia 0.6 is out; check out the NEWS!
To install it, run Pkg.add("MIToS"); to update an existing installation, run Pkg.update().
Some breaking changes were introduced in v2.0.0. See the NEWS.md
file and the new documentation to migrate code from an old version
of MIToS. If you need more help migrating code from MIToS 1.0 in Julia 0.4 to MIToS 2.1 in Julia 0.6, you can
send an email to diegozea at gmail dot com asking for assistance.
Documentation for MIToS 1.0 in Julia 0.4
Documentation for MIToS 2.0 or greater in Julia 0.5 or greater
MIToS is an environment for Mutual Information (MI) analysis and implements several useful
tools for the management of Multiple Sequence Alignments (MSAs) and PDB structures in the Julia
language [1]. MI allows determining covariation between positions in an MSA. MI-derived scores
are good predictors of residue contacts and functional sites in proteins [2,3].
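As a rough illustration of the quantity involved, the MI between two columns of an MSA is MI = Σ p(a,b) log(p(a,b) / (p(a) p(b))), summed over the residue pairs (a,b) observed in those columns. The sketch below computes it in plain Julia 1.x from raw frequencies. It does not use the MIToS API, the function name column_mi is hypothetical, and none of the pseudocounts, sequence weighting or corrections described in this README are applied.

```julia
# Mutual information (natural log, so in nats) between two MSA columns,
# estimated from raw frequencies: no pseudocounts, weighting or corrections.
function column_mi(x::AbstractVector{Char}, y::AbstractVector{Char})
    n = length(x)
    @assert n == length(y) "columns must come from the same sequences"
    px = Dict{Char,Float64}()                 # p(a) for column x
    py = Dict{Char,Float64}()                 # p(b) for column y
    pxy = Dict{Tuple{Char,Char},Float64}()    # joint p(a,b)
    for (a, b) in zip(x, y)
        px[a] = get(px, a, 0.0) + 1 / n
        py[b] = get(py, b, 0.0) + 1 / n
        pxy[(a, b)] = get(pxy, (a, b), 0.0) + 1 / n
    end
    mi = 0.0
    for ((a, b), pab) in pxy
        mi += pab * log(pab / (px[a] * py[b]))
    end
    return mi
end

# Toy alignment: each String is one aligned sequence (one row of the MSA).
msa = ["ACDE", "ACDE", "GWDE", "GWKE"]
column(i) = [seq[i] for seq in msa]   # column i as a Vector{Char}

column_mi(column(1), column(2))   # perfectly covarying columns: log(2) ≈ 0.693
column_mi(column(1), column(4))   # column 4 is fully conserved: 0.0
```

A conserved column carries no information about any other column, which is why its MI is zero regardless of what the partner column looks like.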
MIToS's starting point was an improvement of the algorithm published by Buslje et al. [2]. A BLOSUM62-based pseudocount strategy, similar to that of Altschul et al. [4], was implemented
to improve performance on MSAs with a low number of sequences. MIToS offers
all the necessary tools for using, developing and testing MI-based scores.
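A minimal sketch of an Altschul-style pseudocount for a single column, assuming the PSI-BLAST form: pseudofrequencies g[a] = Σ_b f[b] q[a,b] / p[b] are derived from the observed frequencies f through a substitution matrix's joint probabilities q and background probabilities p, and are then mixed as (α f[a] + β g[a]) / (α + β). The 2-letter alphabet and the q matrix are made up to keep the example small (a real implementation would use BLOSUM62 over 20 residues), and α, β and the function names are hypothetical. How MIToS extends this to pair frequencies and chooses its weights is not shown.

```julia
# Made-up joint probabilities q[a, b] for a 2-letter alphabet (symmetric,
# entries sum to 1). In practice this role is played by BLOSUM62 over 20 residues.
q = [0.4 0.1;
     0.1 0.4]
p = vec(sum(q, dims = 2))   # background p[a] = Σ_b q[a, b] (q is symmetric)

# Pseudofrequencies: g[a] = Σ_b f[b] * q[a, b] / p[b]
pseudofreqs(f) = [sum(f[b] * q[a, b] / p[b] for b in eachindex(f)) for a in eachindex(f)]

# Mix observed frequencies f (weight α) with pseudofrequencies (weight β)
mixfreqs(f, α, β) = (α .* f .+ β .* pseudofreqs(f)) ./ (α + β)

f = [1.0, 0.0]              # observed column: only letter 1 is seen
mixfreqs(f, 3.0, 1.0)       # ≈ [0.95, 0.05]
```

With few sequences (α small relative to β) the estimate is pulled toward the substitution-matrix prior, so unseen residues still get a non-zero probability; as α grows, the result approaches the observed frequencies.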
MIToS tools are organized into different modules, each related to a different task.
- MSA: This module defines multiple functions and types for dealing with MSAs and
their annotations. It also includes facilities for sequence clustering.
- PDB: This module defines types and methods to work with protein structures from the PDB.
- SIFTS: This module allows access to SIFTS residue-level mapping of UniProt, Pfam and
other databases with PDB entries.
- Information: This module defines residue contingency tables and methods on them
to estimate information measures from MSAs. It includes functions to estimate corrected
mutual information (ZMIp, ZBLMIp) between MSA columns.
- Pfam: This module uses the previous modules to work with Pfam MSAs. It also has useful parameter
optimization functions to be used with Pfam alignments.
- Utils: MIToS also has a Utils module with common utility functions and types used
in this package.
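The corrected scores listed for the Information module build on MIp, the MI after average product correction (APC): MIp(i,j) = MI(i,j) - MI(i,·) MI(j,·) / MI(·,·), where MI(i,·) is the mean MI of column i with the other columns and MI(·,·) is the mean over all column pairs. Below is a minimal plain-Julia sketch of that step, taking a precomputed symmetric MI matrix. The function name apc is hypothetical, and the Z-score normalization that gives the "Z" in ZMIp and ZBLMIp is not shown.

```julia
# Average product correction (APC) of a symmetric MI matrix (MIp).
# Diagonal entries (a column with itself) are ignored and left as 0.
function apc(mi::AbstractMatrix{<:Real})
    n = size(mi, 1)
    @assert size(mi, 2) == n && n >= 2 "need a square matrix with at least 2 columns"
    # MI(i,·): mean MI of column i with every other column
    colmean = [sum(mi[i, j] for j in 1:n if j != i) / (n - 1) for i in 1:n]
    # MI(·,·): mean MI over all ordered pairs i ≠ j
    overall = sum(colmean) / n
    mip = zeros(n, n)
    for i in 1:n, j in 1:n
        if i != j
            mip[i, j] = mi[i, j] - colmean[i] * colmean[j] / overall
        end
    end
    return mip
end

mi = [0.0 0.5 0.1;
      0.5 0.0 0.1;
      0.1 0.1 0.0]
mip = apc(mi)   # mip[1, 2] ≈ 0.114, mip[1, 3] ≈ -0.029
```

APC subtracts the background MI that a column shares with everything (for example, from phylogeny or high entropy), so pairs that stand out from their columns' average keep a positive score.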
MIToS provides several useful scripts for command-line execution
(no Julia coding required):
- Buslje09.jl : Calculates the corrected MI/MIp described in Buslje et al. 2009 [2].
- BLMI.jl : Calculates corrected mutual information using BLOSUM62-based pseudocounts.
- DownloadPDB.jl : Downloads gzipped files from the PDB.
- Distances.jl : Calculates residue distances in a PDB file.
- SplitStockholm.jl : Splits a Stockholm file with multiple alignments into one
compressed file per MSA.
- AlignedColumns.jl : Creates a Stockholm file with the aligned columns from a Pfam
Stockholm file (insertions are deleted), saving the mapping (residue number in UniProt)
and the column positions in the original MSA.
- PercentIdentity.jl : Calculates the percent identity between all pairs of sequences
of an MSA and saves summary statistics (mean, median, minimum, etc.).
- MSADescription.jl : Given a Stockholm file as input, calculates the number of columns,
sequences and clusters after Hobohm I clustering at 62% identity [5]. It also gives the
mean percent identity, and the mean, standard deviation and quantiles of the sequence
coverage of the MSA and of the gap percentage.
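Several of these scripts rely on pairwise percent identity and on Hobohm I clustering [5]. Below is a minimal plain-Julia sketch of both, under stated assumptions: columns where both sequences have a gap are ignored when computing identity (other gap conventions exist, and MIToS may use a different one), and sequences are clustered greedily in input order, each joining the first cluster whose representative it matches at 62% identity or more. The function names are hypothetical and not the MIToS API.

```julia
# Percent identity between two aligned sequences of the same length.
# Assumption: columns where both sequences have a gap ('-') are skipped,
# and a column is identical when both sequences carry the same residue.
function percent_identity(s1::AbstractString, s2::AbstractString)
    a, b = collect(s1), collect(s2)
    @assert length(a) == length(b) "sequences must be aligned (same length)"
    identical = 0
    compared = 0
    for (x, y) in zip(a, b)
        x == '-' && y == '-' && continue    # skip gap-gap columns
        compared += 1
        identical += (x == y ? 1 : 0)       # x == y here implies non-gap residues
    end
    return compared == 0 ? 0.0 : 100 * identical / compared
end

# Greedy Hobohm I clustering: walk the sequences in input order; each one joins
# the first cluster whose representative it matches at >= threshold percent
# identity, otherwise it becomes the representative of a new cluster.
function hobohm_clusters(msa::Vector{String}, threshold::Real = 62.0)
    reps = Int[]                       # index of each cluster's representative
    labels = zeros(Int, length(msa))   # cluster number assigned to each sequence
    for (i, seq) in enumerate(msa)
        for (c, r) in enumerate(reps)
            if percent_identity(seq, msa[r]) >= threshold
                labels[i] = c
                break
            end
        end
        if labels[i] == 0
            push!(reps, i)
            labels[i] = length(reps)
        end
    end
    return labels
end

msa = ["ACDEFGHIKL", "ACDEFGHIKM", "WWDEFGHQQQ", "ACDEF-----"]
labels = hobohm_clusters(msa)          # [1, 1, 2, 3]
n = length(msa)
pids = [percent_identity(msa[i], msa[j]) for i in 1:n for j in (i + 1):n]
mean_pid = sum(pids) / length(pids)    # mean pairwise percent identity
```

The fragment ACDEF----- forms its own cluster here because, under this gap convention, residue-versus-gap columns count as mismatches; a convention that also skipped those columns would place it in the first cluster.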
If you use MIToS, please cite:
Diego J. Zea, Diego Anfossi, Morten Nielsen, Cristina Marino-Buslje; MIToS.jl: mutual information tools for protein sequence analysis in the Julia language, Bioinformatics, Volume 33, Issue 4, 15 February 2017, Pages 564–565, https://doi.org/10.1093/bioinformatics/btw646
References:
[1] Zea, Diego Javier, et al. "MIToS.jl: mutual information tools for protein sequence
analysis in the Julia language." Bioinformatics 33.4 (2016): 564-565.
[2] Buslje, Cristina Marino, et al. "Correction for phylogeny, small number of
observations and data redundancy improves the identification of coevolving amino acid
pairs using mutual information." Bioinformatics 25.9 (2009): 1125-1131.
[3] Buslje, Cristina Marino, et al. "Networks of high mutual information define the
structural proximity of catalytic sites: implications for catalytic residue
identification." PLoS Comput Biol 6.11 (2010): e1000978.
[4] Altschul, Stephen F., et al. "Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs." Nucleic Acids Research 25.17 (1997): 3389-3402.
[5] Hobohm, Uwe, et al. "Selection of representative protein data sets." Protein Science
1.3 (1992): 409-417.
Structural Bioinformatics Unit