This package implements the paralog matching technique presented in the paper "Simultaneous identification of specifically interacting paralogs and interprotein contacts by Direct Coupling Analysis" by Thomas Gueudré, Carlo Baldassi, Marco Zamparo, Martin Weigt and Andrea Pagnani, Proc. Natl. Acad. Sci. U.S.A. 113, 12186–12191 (2016), doi:10.1073/pnas.1607570113.
The main idea of the method is to perform a statistical analysis of two given multiple sequence alignments, each containing one protein family. Each family should comprise several species, and each species may have several sequences belonging to the family. The algorithm tries to associate (match) interacting partners from the two families within each species. It belongs to the more general class of Direct Coupling Analysis methods.
The underlying assumption is that the correct matching is the one that maximizes the co-evolution signal. The maximization relies on Bayesian inference of a Gaussian model, which is carried out by inverting the correlation matrix.
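To make the idea concrete, here is a minimal sketch of how a Gaussian model can be used to score a candidate matching. It is not the package's implementation: the function names (`regularized_cov`, `gaussian_couplings`, `matching_score`), the toy binary data and the shrinkage weight `θ` (a crude stand-in for the effect of a prior) are all assumptions made for illustration.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper: regularized covariance of a data matrix whose rows are
# samples (matched sequence pairs) and whose columns are numeric features.
# The shrinkage weight θ is an arbitrary stand-in for the effect of a prior.
function regularized_cov(X::AbstractMatrix{<:Real}; θ::Real = 0.5)
    C = cov(X; corrected = false)      # empirical covariance, columns as variables
    Creg = (1 - θ) * C + θ * I         # shrink towards the identity so it is invertible
    return Creg, C
end

# In a Gaussian model the couplings are minus the inverse covariance matrix.
function gaussian_couplings(X::AbstractMatrix{<:Real}; θ::Real = 0.5)
    Creg, _ = regularized_cov(X; θ = θ)
    return -inv(Creg)
end

# Hypothetical score of a candidate matching `perm` (row i of A is paired with
# row perm[i] of B): Gaussian log-likelihood per sample, up to additive constants.
function matching_score(A, B, perm; θ::Real = 0.5)
    X = hcat(A, B[perm, :])
    Creg, C = regularized_cov(X; θ = θ)
    return -0.5 * (logdet(Creg) + tr(Creg \ C))
end

# Toy data: the second family is an exact copy of the first one,
# so the identity matching carries the strongest co-variation.
A = [1.0 0.0; 0.0 1.0; 1.0 1.0; 0.0 0.0]
B = copy(A)

J = gaussian_couplings(hcat(A, B))
good = matching_score(A, B, [1, 2, 3, 4])   # true pairing
bad = matching_score(A, B, [2, 3, 4, 1])    # shifted pairing
println(good > bad)
```

On this toy input the true pairing receives the higher score. Real alignments would first need to be encoded numerically (for instance with a one-hot encoding of the residues), which is left out of the sketch.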
The code is written in Julia, and the functions are called from within Julia. However, a command-line interface is also provided for those unfamiliar with the language (see the documentation).
The package is tested against Julia 0.7 on Linux, OS X, and Windows.
The package is not registered, so it has to be installed from its repository URL through Julia's package manager. Dependencies will be installed automatically.
The package requires at least one linear programming solver supported by MathProgBase to be installed. By default, it uses GLPK, which is free and open source, but you can choose any other: see the list of available solvers at the JuliaOpt page. Note, however, that solver efficiency is not particularly important for paralog matching, whose computational time is dominated by matrix inversion operations, so you are unlikely to need a particularly fast solver.
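As an illustration of the kind of problem such a solver handles, matching the sequences of two families within one species can be viewed as an assignment problem: given a score for every candidate pair, pick the one-to-one pairing with the highest total score. The sketch below is an assumption-laden toy, not the package's code: `best_assignment` is a hypothetical helper, and exhaustive enumeration is used in place of a linear programming solver, which only works for a handful of sequences.

```julia
# Hypothetical helper: exhaustive search for the one-to-one assignment that
# maximizes the total score. S[i, j] is the score of pairing sequence i of the
# first family with sequence j of the second family. Only feasible for small n.
function best_assignment(S::AbstractMatrix{<:Real})
    n = size(S, 1)
    size(S, 2) == n || throw(ArgumentError("score matrix must be square"))
    n >= 1 || throw(ArgumentError("score matrix must not be empty"))
    perm = collect(1:n)
    best_perm, best_val = copy(perm), -Inf

    # Enumerate all permutations by swapping elements in place.
    function visit(k)
        if k > n
            val = sum(S[i, perm[i]] for i in 1:n)
            if val > best_val
                best_val = val
                best_perm = copy(perm)
            end
            return
        end
        for j in k:n
            perm[k], perm[j] = perm[j], perm[k]
            visit(k + 1)
            perm[k], perm[j] = perm[j], perm[k]   # undo the swap
        end
    end

    visit(1)
    return best_perm, best_val
end

# Toy score matrix for a species with three sequences in each family.
S = [9.0 1.0 1.0;
     1.0 1.0 8.0;
     1.0 7.0 1.0]
pairing, total = best_assignment(S)
println(pairing, " ", total)
```

A linear programming formulation solves the same problem without enumerating all `n!` pairings, which is why a solver is needed for realistic species sizes.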
Contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems or would just like to ask a question.