OMA standalone: orthology inference among public and custom genomes and transcriptomes
- Adrian M. Altenhoff1,2,10,
- Jeremy Levy3,4,10,
- Magdalena Zarowiecki5,
- Bartłomiej Tomiczek4,6,
- Alex Warwick Vesztrocy1,4,
- Daniel A. Dalquen2,
- Steven Müller4,
- Maximilian J. Telford4,
- Natasha M. Glover1,7,8,
- David Dylus1,7,8 and
- Christophe Dessimoz1,4,7,8,9
- 1Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland;
- 2Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland;
- 3Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), University College London, London WC1E 6BT, United Kingdom;
- 4Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London WC1E 6BT, United Kingdom;
- 5Genomics England, Queen Mary University of London, London EC1M 6BQ, United Kingdom;
- 6Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, 80-307 Gdansk, Poland;
- 7Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland;
- 8Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland;
- 9Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
Abstract
Genomes and transcriptomes are now typically sequenced by individual laboratories, but analyzing them often remains challenging. One essential step in many analyses lies in identifying orthologs (corresponding genes across multiple species), but this is far from trivial. The Orthologous MAtrix (OMA) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and precomputed alignments from the OMA database, which currently contains over 2100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference, by inferring a phylogeny of Lophotrochozoa, a challenging clade within the protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families that have undergone duplications or losses in specific clades, and identifying potential drug targets in nonmodel organisms. OMA standalone is available under the permissive open source Mozilla Public License Version 2.0.
Footnotes
- 10 Joint first authors.
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.243212.118.
- Freely available online through the Genome Research Open Access option.
- Received August 22, 2018.
- Accepted May 24, 2019.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.