Solving the protein sequence metric problem

Atchley WR, Zhao J, Fernandes AD, Drüke T (2005)
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA 102(18): 6395-6400.

Journal article | Published | English
 
Author
Atchley, William R.; Zhao, Jieping; Fernandes, Andrew D.; Drüke, Tanja
Abstract / Note
Biological sequences are composed of long strings of alphabetic letters rather than arrays of numerical values. Lack of a natural underlying metric for comparing such alphabetic data significantly inhibits sophisticated statistical analyses of sequences, modeling structural and functional aspects of proteins, and related problems. Herein, we use multivariate statistical analyses on almost 500 amino acid attributes to produce a small set of highly interpretable numeric patterns of amino acid variability. These high-dimensional attribute data are summarized by five multidimensional patterns of attribute covariation that reflect polarity, secondary structure, molecular volume, codon diversity, and electrostatic charge. Numerical scores for each amino acid then transform amino acid sequences for statistical analyses. Relationships between transformed data and amino acid substitution matrices show significant associations for polarity and codon diversity scores. Transformed alphabetic data are used in analysis of variance and discriminant analysis to study DNA binding in the basic helix-loop-helix proteins. The transformed scores offer a general solution for analyzing a wide variety of sequence analysis problems.
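The transformation described in the abstract, replacing each residue letter with a short vector of numeric scores, can be illustrated with a minimal sketch. The score table below is a placeholder with made-up numbers for three residues; it is not the set of values published in the paper, and the names `TOY_SCORES`, `transform_sequence`, and `mean_profile` are hypothetical helpers introduced only for this example. A real analysis would load the five published scores for all 20 amino acids instead.

```python
from typing import Dict, List, Sequence

# The five patterns named in the abstract, in the order listed there.
FACTOR_NAMES = (
    "polarity",
    "secondary_structure",
    "molecular_volume",
    "codon_diversity",
    "electrostatic_charge",
)

# PLACEHOLDER scores: invented numbers for illustration only,
# NOT the values published by Atchley et al. (2005).
TOY_SCORES: Dict[str, Sequence[float]] = {
    "A": (0.1, -0.2, 0.3, 0.4, -0.5),
    "K": (1.0, 0.5, -0.3, 0.2, 1.5),
    "E": (0.9, -0.4, 0.1, -0.6, -1.2),
}


def transform_sequence(
    seq: str, scores: Dict[str, Sequence[float]]
) -> List[List[float]]:
    """Map a sequence of residue letters to one row of scores per position."""
    rows: List[List[float]] = []
    for pos, residue in enumerate(seq.upper()):
        if residue not in scores:
            raise KeyError(f"no scores for residue {residue!r} at position {pos}")
        rows.append(list(scores[residue]))
    return rows


def mean_profile(rows: List[List[float]]) -> List[float]:
    """Average each score column over all positions of the sequence."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


if __name__ == "__main__":
    matrix = transform_sequence("AKE", TOY_SCORES)
    for name, value in zip(FACTOR_NAMES, mean_profile(matrix)):
        print(f"{name}: {value:.3f}")
```

Once every position is a numeric vector, standard procedures such as analysis of variance or discriminant analysis can be applied to aligned sequences column by column, which is the use the abstract describes.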
Keywords
amino acid attributes; multivariate statistics; basic helix-loop-helix; molecular evolution; factor analysis
Publication year
2005
Journal title
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
Volume
102
Issue
18
Page(s)
6395-6400
ISSN
0027-8424
Page URI
https://pub.uni-bielefeld.de/record/1603912

Cite

Atchley, W. R., Zhao, J., Fernandes, A. D., & Drüke, T. (2005). Solving the protein sequence metric problem. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 102(18), 6395-6400. doi:10.1073/pnas.0408677102

Sources

PMID: 15851683