Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing

Wittkop T, Baumbach J, Lobo FP, Rahmann S (2007)
BMC Bioinformatics 8(1): 396.

Zeitschriftenaufsatz | Veröffentlicht | Englisch
 
Download
OA
Autor/in
; ; ;
Abstract / Bemerkung
Background: Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that has to be efficiently clustered with constant or increased accuracy, at increased speed. Results: We advocate that the model of weighted cluster editing, also known as transitive graph projection is well-suited to protein clustering. We present the FORCE heuristic that is based on transitive graph projection and clusters arbitrary sets of objects, given pairwise similarity measures. In particular, we apply FORCE to the problem of protein clustering and show that it outperforms the most popular existing clustering tools ( Spectral clustering, TribeMCL, GeneRAGE, Hierarchical clustering, and Affinity Propagation). Furthermore, we show that FORCE is able to handle huge datasets by calculating clusters for all 192 187 prokaryotic protein sequences ( 66 organisms) obtained from the COG database. Finally, FORCE is integrated into the corynebacterial reference database CoryneRegNet. Conclusion: FORCE is an applicable alternative to existing clustering algorithms. Its theoretical foundation, weighted cluster editing, can outperform other clustering paradigms on protein homology clustering. FORCE is open source and implemented in Java. The software, including the source code, the clustering results for COG and CoryneRegNet, and all evaluation datasets are available at http://gi.cebitec.uni-bielefeld.de/comet/force/.
Erscheinungsjahr
2007
Zeitschriftentitel
BMC Bioinformatics
Band
8
Ausgabe
1
Seite(n)
396
ISSN
1471-2105
Page URI
https://pub.uni-bielefeld.de/record/1784011

Zitieren

Wittkop T, Baumbach J, Lobo FP, Rahmann S. Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing. BMC Bioinformatics. 2007;8(1):396.
Wittkop, T., Baumbach, J., Lobo, F. P., & Rahmann, S. (2007). Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing. BMC Bioinformatics, 8(1), 396. doi:10.1186/1471-2105-8-396
Wittkop, T., Baumbach, J., Lobo, F. P., and Rahmann, S. (2007). Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing. BMC Bioinformatics 8, 396.
Wittkop, T., et al., 2007. Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing. BMC Bioinformatics, 8(1), p 396.
T. Wittkop, et al., “Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing”, BMC Bioinformatics, vol. 8, 2007, pp. 396.
Wittkop, T., Baumbach, J., Lobo, F.P., Rahmann, S.: Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing. BMC Bioinformatics. 8, 396 (2007).
Wittkop, Tobias, Baumbach, Jan, Lobo, Francisco P., and Rahmann, Sven. “Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing”. BMC Bioinformatics 8.1 (2007): 396.
Alle Dateien verfügbar unter der/den folgenden Lizenz(en):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Volltext(e)
Access Level
OA Open Access
Zuletzt Hochgeladen
2019-09-06T08:48:53Z
MD5 Prüfsumme
c3b69cf1dee5125ff903c896dd7c7fb9

Export

Markieren/ Markierung löschen
Markierte Publikationen

Open Data PUB

Web of Science

Dieser Datensatz im Web of Science®

Quellen

PMID: 17941985
PubMed | Europe PMC

Suchen in

Google Scholar