Mutual Information Tools for protein Sequence analysis in Julia

First Commit


Last Touched

30 days ago

Commit Count

415 commits



Mutual Information Tools for protein Sequence analysis

Julia 0.4: MIToS

Linux, OSX: Build Status

Windows: Build status

Code Coverage: Coverage Status codecov.io

Julia version: MIToS only works with Julia 0.4, you can download this Julia version from here. MIToS 2.0 with Julia 0.5 support is under development, we expect to release it soon.

MIToS v1.1.0 is out, check out the NEWS! You need to do Pkg.add("MIToS") to install it or Pkg.update() to update your installed version.

DOCUMENTATION : http://mitos.leloir.org.ar

MIToS is an environment for Mutual Information (MI) analysis and implements several useful tools for Multiple Sequence Alignments (MSAs) and PDB structures management in the Julia language. MI allows determining covariation between positions in a MSA. MI derived scores are good predictors of residue contacts and functional sites in proteins [1,2].

MIToS starting point was an improvement of the algorithm published by Buslje et. al. [1]. A BLOSUM62-based pseudocount strategy, similar to Altschul et. al. [3], was implemented for a better performance in the range of MSAs with low number of sequences. MIToS offers all the necessary tools for using, developing and testing MI based scores, in different modules.


MIToS tools are separated on different modules, related to different tasks.

  • MSA This module defines multiple functions and types for dealing with MSAs and their annotations. It also includes facilities for sequence clustering.
  • PDB This module defines types and methods to work with protein structures from PDB.
  • SIFTS This module allows access to SIFTS residue-level mapping of UniProt, Pfam and other databases with PDB entries.
  • Information This module defines residue contingency tables and methods on them to estimate information measure from MSAs. It includes functions to estimate corrected mutual information (ZMIp, ZBLMIp) between MSA columns.
  • Pfam This module use the previous modules to work with Pfam MSAs. It also has useful parameter optimization functions to be used with Pfam alignments.
  • Utils MIToS has also an Utils module with common utils functions and types used in this package.


MIToS implements several useful scripts for command line execution (without requiring Julia coding):

  • Buslje09.jl : Calculates the corrected MI/MIp described on Buslje et. al. 2009 [1].
  • BLMI.jl : Calculates corrected mutual information using BLOSUM62 based-pseudocounts.
  • DownloadPDB.jl : Downloads gzipped files from PDB.
  • Distances.jl : Calculates residues distances in a PDB file.
  • SplitStockholm.jl : Splits a Stockholm file with multiple alignments into one compressed file per MSA
  • AlignedColumns.jl : Creates a Stockholm file with the aligned columns from a Pfam Stockholm file (insertions are deleted) saving the mapping (residue number in UniProt) and the columns in the original MSA.
  • PercentIdentity.jl : Calculates the percentage identity between all the sequences of an MSA and saves mean, median, minimum, etc.
  • MSADescription.jl : Calculates the number of columns, sequences and clusters after Hobohm I clustering at 62% identity given a stockholm file as imput. It also gives the percent indentity mean and mean, standard deviation and quantiles of: sequence coverage of the MSA and gap percentage.


The mitos-benchmarks repository contains MIToS’ benchmarks and a little comparison against other packages. An extensive benchmark of PDB features is located in the jgreener64/pdb-benchmarks repository.


  1. Buslje, Cristina Marino, et al. "Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information." Bioinformatics 25.9 (2009): 1125-1131.
  2. Buslje, Cristina Marino, et al. "Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification." PLoS Comput Biol 6.11 (2010): e1000978.
  3. Altschul, Stephen F., et al. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic acids research 25.17 (1997): 3389-3402.
  4. Hobohm, Uwe, et al. "Selection of representative protein data sets." Protein Science 1.3 (1992): 409-417.
  5. Dunn, Stanley D., Lindi M. Wahl, and Gregory B. Gloor. "Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction." Bioinformatics 24.3 (2008): 333-340.

Structural Bioinformatics Unit