Efficient computation of absent words in genomic sequences
Herold J, Kurtz S, Giegerich R (2008)
BMC Bioinformatics 9(1): 167.
Journal article
| Published | English
Author
Herold, Julia (UniBi);
Kurtz, Stefan;
Giegerich, Robert (UniBi)
Institution
Abstract / Note
Background: Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relative to a random sequence of the same length, unique subsequences are overrepresented in real genomes. Shortest words absent from a genome have been addressed in two recent studies.
Results: We describe a new algorithm and software for the computation of absent words. It is more efficient than previous algorithms and easier to use. It directly computes unwords without the need to specify a length estimate. Moreover, it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package. We compute unwords of human and mouse as well as some other organisms, covering a genome size range from 10^9 down to 10^5 bp.
Conclusion: The new algorithm computes absent words for the human genome in 10 minutes on standard hardware, using only 2.5 Mb of space. This enables us to perform this type of analysis not only for the largest genomes available so far, but also for the emerging pan- and meta-genome data.
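To make the notion of an unword concrete, the following is a minimal brute-force sketch of finding the shortest words absent from a sequence. It is not the algorithm described in the paper, which the abstract says avoids both a length estimate and large index structures. The function name `shortest_absent_words` and the default alphabet are assumptions made for illustration, and the sketch looks at a single strand only.

```python
from itertools import product


def shortest_absent_words(seq, alphabet="ACGT"):
    """Return (k, words): the smallest length k at which some word over
    `alphabet` does not occur in `seq`, and all absent words of that length.

    Naive illustration only: it stores every k-mer seen in a set, so memory
    grows with the number of distinct k-mers. Hypothetical helper, not the
    method from the paper.
    """
    seq = seq.upper()
    k = 1
    while True:
        # Collect every substring of length k that occurs in the sequence.
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        # Enumerate all possible words of length k and keep the missing ones.
        absent = [
            word
            for word in ("".join(p) for p in product(alphabet, repeat=k))
            if word not in present
        ]
        if absent:
            return k, absent
        k += 1


if __name__ == "__main__":
    k, words = shortest_absent_words("ACGT")
    print(k, len(words))
```

Like the approach in the abstract, the sketch needs no length estimate up front, because it increases k until the first absent word appears. Unlike it, the sketch is only practical for short sequences.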
Publication year
2008
Journal title
BMC Bioinformatics
Volume
9
Issue
1
Article no.
167
ISSN
1471-2105
Page URI
https://pub.uni-bielefeld.de/record/1784025
Cite
Herold J, Kurtz S, Giegerich R. Efficient computation of absent words in genomic sequences. BMC Bioinformatics. 2008;9(1): 167.
Herold, J., Kurtz, S., & Giegerich, R. (2008). Efficient computation of absent words in genomic sequences. BMC Bioinformatics, 9(1), 167. https://doi.org/10.1186/1471-2105-9-167
Herold, Julia, Kurtz, Stefan, and Giegerich, Robert. 2008. “Efficient computation of absent words in genomic sequences”. BMC Bioinformatics 9 (1): 167.
Herold, J., Kurtz, S., and Giegerich, R. (2008). Efficient computation of absent words in genomic sequences. BMC Bioinformatics 9:167.
Herold, J., Kurtz, S., & Giegerich, R., 2008. Efficient computation of absent words in genomic sequences. BMC Bioinformatics, 9(1): 167.
J. Herold, S. Kurtz, and R. Giegerich, “Efficient computation of absent words in genomic sequences”, BMC Bioinformatics, vol. 9, 2008, 167.
Herold, J., Kurtz, S., Giegerich, R.: Efficient computation of absent words in genomic sequences. BMC Bioinformatics. 9, 167 (2008).
Herold, Julia, Kurtz, Stefan, and Giegerich, Robert. “Efficient computation of absent words in genomic sequences”. BMC Bioinformatics 9.1 (2008): 167.
All files available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Name
Access Level
Open Access
Last uploaded
2019-09-06T08:48:53Z
MD5 checksum
d2ae09323894f1609aa545bf52e0f47a
Data provided by the European Bioinformatics Institute (EBI)
20 citations in Europe PMC
Data provided by Europe PubMed Central.
Absent words and the (dis)similarity analysis of DNA sequences: an experimental study.
Rahman MS, Alatabbi A, Athar T, Crochemore M, Rahman MS., BMC Res Notes 9(), 2016
PMID: 27004958
The bulk and the tail of minimal absent words in genome sequences.
Aurell E, Innocenti N, Zhou HJ., Phys Biol 13(2), 2016
PMID: 27043075
Spatial distribution of predicted transcription factor binding sites in Drosophila ChIP peaks.
Pettie KP, Dresch JM, Drewell RA., Mech Dev 141(), 2016
PMID: 27264535
Nullomers and High Order Nullomers in Genomic Sequences.
Vergni D, Santoni D., PLoS One 11(12), 2016
PMID: 27906971
Three minimal sequences found in Ebola virus genomes and absent from human DNA.
Silva RM, Pratas D, Castro L, Pinho AJ, Ferreira PJ., Bioinformatics 31(15), 2015
PMID: 25840045
keeSeek: searching distant non-existing words in genomes for PCR-based applications.
Falda M, Fontana P, Barzon L, Toppo S, Lavezzo E., Bioinformatics 30(18), 2014
PMID: 24867942
Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.
Buschmann T, Zhang R, Brash DE, Bystrykh LV., BMC Bioinformatics 15(), 2014
PMID: 25099007
Linear-time computation of minimal absent words using suffix array.
Barton C, Heliou A, Mouchard L, Pissis SP., BMC Bioinformatics 15(), 2014
PMID: 25526884
Comparative analysis of DNA word abundances in four yeast genomes using a novel statistical background model.
Hariharan R, Simon R, Pillai MR, Taylor TD., PLoS One 8(3), 2013
PMID: 23472131
Pervasive sequence patents cover the entire human genome.
Rosenfeld JA, Mason CE., Genome Med 5(3), 2013
PMID: 23522065
Clustering of DNA words and biological function: a proof of principle.
Hackenberg M, Rueda A, Carpena P, Bernaola-Galván P, Barturen G, Oliver JL., J Theor Biol 297(), 2012
PMID: 22226985
Insertion site preference of Mu, Tn5, and Tn7 transposons.
Green B, Bouchier C, Fairhead C, Craig NL, Cormack BP., Mob DNA 3(1), 2012
PMID: 22313799
Minimal absent words in prokaryotic and eukaryotic genomes.
Garcia SP, Pinho AJ, Rodrigues JM, Bastos CA, Ferreira PJ., PLoS One 6(1), 2011
PMID: 21386877
Microbial diversity in saliva of oral squamous cell carcinoma.
Pushalkar S, Mane SP, Ji X, Li Y, Evans C, Crasta OR, Morse D, Meagher R, Singh A, Saxena D., FEMS Immunol Med Microbiol 61(3), 2011
PMID: 21205002
Minimal absent words in four human genome assemblies.
Garcia SP, Pinho AJ., PLoS One 6(12), 2011
PMID: 22220210
On finding minimal absent words.
Pinho AJ, Ferreira PJ, Garcia SP, Rodrigues JM., BMC Bioinformatics 10(), 2009
PMID: 19426495
Word-based characterization of promoters involved in human DNA repair pathways.
Lichtenberg J, Jacox E, Welch JD, Kurz K, Liang X, Yang MQ, Drews F, Ecker K, Lee SS, Elnitski L, Welch LR., BMC Genomics 10 Suppl 1(), 2009
PMID: 19594877
Multiplex primer prediction software for divergent targets.
Gardner SN, Hiddessen AL, Williams PL, Hara C, Wagner MC, Colston BW., Nucleic Acids Res 37(19), 2009
PMID: 19759213
Genomic DNA k-mer spectra: models and modalities.
Chor B, Horn D, Goldman N, Levy Y, Massingham T., Genome Biol 10(10), 2009
PMID: 19814784
The word landscape of the non-coding segments of the Arabidopsis thaliana genome.
Lichtenberg J, Yilmaz A, Welch JD, Kurz K, Liang X, Drews F, Ecker K, Lee SS, Geisler M, Grotewold E, Welch LR., BMC Genomics 10(), 2009
PMID: 19814816
Sources
PMID: 18366790