Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes

Rubert D, Dias Vieira Braga M (2022)
In: 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Boucher C, Rahmann S (Eds); Leibniz International Proceedings in Informatics (LIPIcs), 242. Dagstuhl: Schloss Dagstuhl, Leibniz-Zentrum für Informatik.

Conference paper | Published | English
 
Editors
Boucher, Christina; Rahmann, Sven
Abstract / Note
Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes, our method first computes all pairwise gene similarities. It then runs pairwise ILP comparisons to compute optimal gene matchings, which minimize the weighted rearrangement distance between the analyzed genomes while taking the similarities into account (an NP-hard problem). In the final step, the gene matchings are integrated into gene families. Although the ILP is quite efficient and could conceptually analyze genomes that are not completely assembled but split into several contigs, our tool failed to complete that task. The main reason is that each pairwise ILP comparison includes an optimal capping that connects each end of a linear segment of one genome to an end of a linear segment of the other genome, producing an exponential increase of the search space. In this work, we design and implement a heuristic capping algorithm that replaces the optimal capping by clustering the linear segments (based on their gene content intersections) into m ≥ 1 subsets, whose ends are capped independently. Furthermore, in each subset, instead of allowing all possible connections, we let only the ends of content-related segments be connected. Although there is no guarantee that m is much larger than one, and the heuristic may yield suboptimal instead of optimal gene matchings, it works very well in practice in terms of both speed and the quality of the computed solutions. Our experiments on real data show that we can now efficiently analyze fruit fly genomes with unfinished assemblies distributed over hundreds or even thousands of contigs, obtaining orthologies that are more similar to FlyBase orthologies than those computed by other inference tools. Moreover, for complete assemblies the version with heuristic capping reports orthologies that are very similar to those computed by the optimal version of our tool. Our approach is implemented in a pipeline incorporating the pre-computation of gene similarities.
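The clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, purely for illustration, that each linear segment (contig) is given as a set of gene labels in which related genes across genomes share a label, and that two segments are "content-related" exactly when their label sets intersect. The names `cluster_segments` and `allowed_connections` and the toy data are hypothetical.

```python
from itertools import combinations


def cluster_segments(segments):
    """Group linear segments into subsets by shared gene content.

    segments: dict mapping (genome, segment_name) to a set of gene labels.
    Two segments are linked when their label sets intersect; the returned
    subsets are the connected components of that relation (union-find).
    """
    parent = {name: name for name in segments}

    def find(x):
        # Find the representative of x, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(segments, 2):
        if segments[a] & segments[b]:
            parent[find(a)] = find(b)

    clusters = {}
    for name in segments:
        clusters.setdefault(find(name), set()).add(name)
    # Sort the subsets so the output order is deterministic.
    return sorted(clusters.values(), key=lambda c: sorted(c))


def allowed_connections(segments, cluster):
    """Pairs of segments from different genomes within one subset
    that share gene content, i.e. whose ends may be connected."""
    return [
        (a, b)
        for a, b in combinations(sorted(cluster), 2)
        if a[0] != b[0] and segments[a] & segments[b]
    ]


# Hypothetical toy input: genome A has three contigs, genome B has two.
segments = {
    ("A", "c1"): {"g1", "g2"},
    ("A", "c2"): {"g3"},
    ("A", "c3"): {"g7"},
    ("B", "d1"): {"g2", "g3"},
    ("B", "d2"): {"g7", "g8"},
}

clusters = cluster_segments(segments)          # here m = len(clusters)
connections = [allowed_connections(segments, c) for c in clusters]
```

In this toy input the segments fall into m = 2 subsets, so the capping of each subset can be handled independently, and within a subset only cross-genome pairs with shared content remain as candidate connections. How the actual tool measures content intersection and encodes the restricted capping in the ILP is not specified in the abstract.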
Publication year
2022
Title of the proceedings
22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)
Series or journal title
Leibniz International Proceedings in Informatics (LIPIcs)
Volume
242
Article number
24
Conference
22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)
Conference location
Dagstuhl, Germany
Conference date
2022-09-05 – 2022-09-07
ISBN
978-3-95977-243-3
Page URI
https://pub.uni-bielefeld.de/record/2968237

Cite

Rubert D, Dias Vieira Braga M. Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes. In: Boucher C, Rahmann S, eds. 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs). Vol 242. Dagstuhl: Schloss Dagstuhl, Leibniz-Zentrum für Informatik; 2022.
Rubert, D., & Dias Vieira Braga, M. (2022). Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes. In C. Boucher & S. Rahmann (Eds.), Leibniz International Proceedings in Informatics (LIPIcs): Vol. 242. 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Dagstuhl: Schloss Dagstuhl, Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.WABI.2022.24
Rubert, Diego, and Dias Vieira Braga, Marília. 2022. “Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes”. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022), ed. Christina Boucher and Sven Rahmann. Vol. 242. Leibniz International Proceedings in Informatics (LIPIcs). Dagstuhl: Schloss Dagstuhl, Leibniz-Zentrum für Informatik: 24.
Rubert, D., and Dias Vieira Braga, M. (2022). “Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes” in 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022), Boucher, C., and Rahmann, S. eds. Leibniz International Proceedings in Informatics (LIPIcs), vol. 242, (Dagstuhl: Schloss Dagstuhl, Leibniz-Zentrum für Informatik).
Rubert, D., & Dias Vieira Braga, M., 2022. Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes. In C. Boucher & S. Rahmann, eds. 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), no. 242. Dagstuhl: Schloss Dagstuhl, Leibniz-Zentrum für Informatik.
D. Rubert and M. Dias Vieira Braga, “Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes”, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022), C. Boucher and S. Rahmann, eds., Leibniz International Proceedings in Informatics (LIPIcs), vol. 242, Dagstuhl: Schloss Dagstuhl, Leibniz-Zentrum für Informatik, 2022.
Rubert, D., Dias Vieira Braga, M.: Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes. In: Boucher, C. and Rahmann, S. (eds.) 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs). 242, Schloss Dagstuhl, Leibniz-Zentrum für Informatik, Dagstuhl (2022).
Rubert, Diego, and Dias Vieira Braga, Marília. “Gene Orthology Inference via Large-Scale Rearrangements for Partially Assembled Genomes”. 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Ed. Christina Boucher and Sven Rahmann. Dagstuhl: Schloss Dagstuhl, Leibniz-Zentrum für Informatik, 2022. Vol. 242. Leibniz International Proceedings in Informatics (LIPIcs).
All files are available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC BY 4.0)
Full text(s)
Access Level
OA Open Access
Last uploaded
2023-01-16T13:35:35Z
MD5 checksum
f4991b526e495eb274dced62d435540b


