Volume 17, Issue 1, 2016, Pages

Mash: Fast genome and metagenome distance estimation using MinHash

Author keywords

Alignment; Comparative genomics; Genomic distance; Metagenomics; Nanopore; Sequencing

Indexed keywords

ARTICLE; BIOINFORMATICS; CLUSTER ANALYSIS; GENE CLUSTER; GENE SEQUENCE; GENE STRUCTURE; GENETIC ANALYSIS; GENETIC DATABASE; GENETIC ENGINEERING; GENETIC PROCEDURES; HUMAN; METAGENOME; METAGENOMICS; MINHASH SKETCH; NONHUMAN; PHYLOGENETIC TREE; PHYLOGENY; SEQUENCE ALIGNMENT; GENOME; GENOMICS; MOLECULAR EVOLUTION; NUCLEIC ACID DATABASE; PROCEDURES; SOFTWARE;

EID: 84978998700     PISSN: 14747596     EISSN: 1474760X     Source Type: Journal    
DOI: 10.1186/s13059-016-0997-x     Document Type: Article
Times cited: 1975

References (63)
  • 2
    • 84979016426
    • GenBank and WGS Statistics. http://www.ncbi.nlm.nih.gov/genbank/statistics. Accessed 31 May 2016.
    • (2016)
  • 7
    • 84898444828
    • Near Duplicate Image Detection: min-Hash and tf-idf Weighting
    • Durham, UK: British Machine Vision Association and Society for Pattern Recognition
    • Chum O, Philbin J, Zisserman A. Near Duplicate Image Detection: min-Hash and tf-idf Weighting. In: Proceedings of the British Machine Vision Conference 2008. Durham, UK: British Machine Vision Association and Society for Pattern Recognition; 2008.
    • (2008) Proceedings of the British Machine Vision Conference 2008
    • Chum, O.1    Philbin, J.2    Zisserman, A.3
  • 10
    • 80053243671
    • Parallel metagenomic sequence clustering via sketching and maximal quasi-clique enumeration on map-reduce clouds
    • Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International. IEEE
    • Yang X, Zola J, Aluru S. Parallel metagenomic sequence clustering via sketching and maximal quasi-clique enumeration on map-reduce clouds. In: Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International. IEEE. 2011. p. 1223-33.
    • (2011) , pp. 1223-1233
    • Yang, X.1    Zola, J.2    Aluru, S.3
  • 12
    • 84899755646
    • A Map-Reduce Framework for Clustering Metagenomes
    • 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum: IEEE
    • Rasheed Z, Rangwala H. A Map-Reduce Framework for Clustering Metagenomes. In: 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum: IEEE. 2013.
    • (2013)
    • Rasheed, Z.1    Rangwala, H.2
  • 13
    • 0037342499
    • Alignment-free sequence comparison-a review
    • Vinga S, Almeida J. Alignment-free sequence comparison-a review. Bioinformatics. 2003;19:513-23.
    • (2003) Bioinformatics , vol.19 , pp. 513-523
    • Vinga, S.1    Almeida, J.2
  • 14
    • 84900808883
    • Alignment-free phylogenetics and population genetics
    • Haubold B. Alignment-free phylogenetics and population genetics. Brief Bioinform. 2014;15:407-18.
    • (2014) Brief Bioinform , vol.15 , pp. 407-418
    • Haubold, B.1
  • 15
    • 0022743812
    • A measure of the similarity of sets of sequences not requiring sequence alignment
    • Blaisdell BE. A measure of the similarity of sets of sequences not requiring sequence alignment. Proc Natl Acad Sci U S A. 1986;83:5155-9.
    • (1986) Proc Natl Acad Sci U S A , vol.83 , pp. 5155-5159
    • Blaisdell, B.E.1
  • 16
    • 0000977008
    • Computation of d2: a measure of sequence dissimilarity
    • Bell GI, Marr TG, editors. Computers and DNA: the proceedings of the Interface between Computation Science and Nucleic Acid Sequencing Workshop, held December 12 to 16, 1988 in Santa Fe, New Mexico. Redwood City: Addison-Wesley Pub. Co
    • Torney DC, Burks C, Davison D, Sirotkin KM. Computation of d2: a measure of sequence dissimilarity. In: Bell GI, Marr TG, editors. Computers and DNA: the proceedings of the Interface between Computation Science and Nucleic Acid Sequencing Workshop, held December 12 to 16, 1988 in Santa Fe, New Mexico. Redwood City: Addison-Wesley Pub. Co; 1990.
    • (1990)
    • Torney, D.C.1    Burks, C.2    Davison, D.3    Sirotkin, K.M.4
  • 17
    • 0037195172
    • Distributional regimes for the number of k-word matches between two random sequences
    • Lippert RA, Huang H, Waterman MS. Distributional regimes for the number of k-word matches between two random sequences. Proc Natl Acad Sci U S A. 2002;99:13980-9.
    • (2002) Proc Natl Acad Sci U S A , vol.99 , pp. 13980-13989
    • Lippert, R.A.1    Huang, H.2    Waterman, M.S.3
  • 18
    • 41149132297
    • Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction
    • Yang K, Zhang L. Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction. Nucleic Acids Res. 2008;36:e33.
    • (2008) Nucleic Acids Res , vol.36
    • Yang, K.1    Zhang, L.2
  • 19
    • 58149527706
    • A genomic distance based on MUM indicates discontinuity between most bacterial species and genera
    • Deloger M, El Karoui M, Petit MA. A genomic distance based on MUM indicates discontinuity between most bacterial species and genera. J Bacteriol. 2009;191:91-9.
    • (2009) J Bacteriol , vol.191 , pp. 91-99
    • Deloger, M.1    Karoui, M.2    Petit, M.A.3
  • 20
    • 84876526790
    • Co-phylog: an assembly-free phylogenomic approach for closely related organisms
    • Yi H, Jin L. Co-phylog: an assembly-free phylogenomic approach for closely related organisms. Nucleic Acids Res. 2013;41:e75.
    • (2013) Nucleic Acids Res , vol.41
    • Yi, H.1    Jin, L.2
  • 21
    • 84927731921
    • andi: fast and accurate estimation of evolutionary distances between closely related genomes
    • Haubold B, Klotzl F, Pfaffelhuber P. andi: fast and accurate estimation of evolutionary distances between closely related genomes. Bioinformatics. 2015;31:1169-75.
    • (2015) Bioinformatics , vol.31 , pp. 1169-1175
    • Haubold, B.1    Klotzl, F.2    Pfaffelhuber, P.3
  • 22
    • 84938907619
    • An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data
    • Fan H, Ives AR, Surget-Groba Y, Cannon CH. An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data. BMC Genomics. 2015;16:522.
    • (2015) BMC Genomics , vol.16 , pp. 522
    • Fan, H.1    Ives, A.R.2    Surget-Groba, Y.3    Cannon, C.H.4
  • 23
    • 14044262978
    • Genomic insights that advance the species definition for prokaryotes
    • Konstantinidis KT, Tiedje JM. Genomic insights that advance the species definition for prokaryotes. Proc Natl Acad Sci U S A. 2005;102:2567-72.
    • (2005) Proc Natl Acad Sci U S A , vol.102 , pp. 2567-2572
    • Konstantinidis, K.T.1    Tiedje, J.M.2
  • 24
    • 84899704095
    • The rise of a digital immune system
    • Schatz MC, Phillippy AM. The rise of a digital immune system. Gigascience. 2012;1:4.
    • (2012) Gigascience , vol.1 , pp. 4
    • Schatz, M.C.1    Phillippy, A.M.2
  • 25
    • 84859436530
    • NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy
    • Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012;40:D130-5.
    • (2012) Nucleic Acids Res , vol.40 , pp. D130-D135
    • Pruitt, K.D.1    Tatusova, T.2    Brown, G.R.3    Maglott, D.R.4
  • 26
    • 0023375195
    • The neighbor-joining method: a new method for reconstructing phylogenetic trees
    • Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406-25.
    • (1987) Mol Biol Evol , vol.4 , pp. 406-425
    • Saitou, N.1    Nei, M.2
  • 27
    • 38849149795
    • 28-way vertebrate alignment and conservation track in the UCSC Genome Browser
    • Miller W, Rosenbloom K, Hardison RC, Hou M, Taylor J, Raney B, et al. 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res. 2007;17:1797-808.
    • (2007) Genome Res , vol.17 , pp. 1797-1808
    • Miller, W.1    Rosenbloom, K.2    Hardison, R.C.3    Hou, M.4    Taylor, J.5    Raney, B.6
  • 29
    • 0028177080
    • A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates
    • Kuhner MK, Felsenstein J. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol. 1994;11:459-68.
    • (1994) Mol Biol Evol , vol.11 , pp. 459-468
    • Kuhner, M.K.1    Felsenstein, J.2
  • 30
    • 84938421951
    • A complete bacterial genome assembled de novo using only nanopore sequencing data
    • Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods. 2015;12:733-5.
    • (2015) Nat Methods , vol.12 , pp. 733-735
    • Loman, N.J.1    Quick, J.2    Simpson, J.T.3
  • 31
    • 84965186660
    • Lighter: fast and memory-efficient sequencing error correction without counting
    • Song L, Florea L, Langmead B. Lighter: fast and memory-efficient sequencing error correction without counting. Genome Biol. 2014;15:509.
    • (2014) Genome Biol , vol.15 , pp. 509
    • Song, L.1    Florea, L.2    Langmead, B.3
  • 32
    • 84907030857
    • Exploration and retrieval of whole-metagenome sequencing samples
    • Seth S, Valimaki N, Kaski S, Honkela A. Exploration and retrieval of whole-metagenome sequencing samples. Bioinformatics. 2014;30:2471-9.
    • (2014) Bioinformatics , vol.30 , pp. 2471-2479
    • Seth, S.1    Valimaki, N.2    Kaski, S.3    Honkela, A.4
  • 34
    • 84922826203
    • COMMET: comparing and combining multiple metagenomic datasets
    • 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): IEEE
    • Maillet N, Collet G, Vannier T, Lavenier D, Peterlongo P. COMMET: comparing and combining multiple metagenomic datasets. In: 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): IEEE. 2014.
    • (2014)
    • Maillet, N.1    Collet, G.2    Vannier, T.3    Lavenier, D.4    Peterlongo, P.5
  • 35
    • 33947238287
    • The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific
    • Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, et al. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. 2007;5:e77.
    • (2007) PLoS Biol , vol.5
    • Rusch, D.B.1    Halpern, A.L.2    Sutton, G.3    Heidelberg, K.B.4    Williamson, S.5    Yooseph, S.6
  • 36
    • 84862276328
    • Structure, function and diversity of the healthy human microbiome
    • Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207-14.
    • (2012) Nature , vol.486 , pp. 207-214
  • 37
    • 77950251400
    • A human gut microbial gene catalogue established by metagenomic sequencing
    • Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464:59-65.
    • (2010) Nature , vol.464 , pp. 59-65
    • Qin, J.1    Li, R.2    Raes, J.3    Arumugam, M.4    Burgdorf, K.S.5    Manichanh, C.6
  • 40
    • 84959923909
    • Large-scale search of transcriptomic read sets with sequence bloom trees
    • Solomon B, Kingsford C. Large-scale search of transcriptomic read sets with sequence bloom trees. bioRxiv. 2015. doi: 10.1101/017087.
    • (2015) bioRxiv
    • Solomon, B.1    Kingsford, C.2
  • 45
    • 84899090573
    • Kraken: ultrafast metagenomic sequence classification using exact alignments
    • Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15:R46.
    • (2014) Genome Biol , vol.15 , pp. R46
    • Wood, D.E.1    Salzberg, S.L.2
  • 46
    • 84864665526
    • The power of simple tabulation hashing
    • Patrascu M, Thorup M. The power of simple tabulation hashing. J ACM. 2012;59:14.
    • (2012) J ACM , vol.59 , pp. 14
    • Patrascu, M.1    Thorup, M.2
  • 47
    • 0027113212
    • Approximate string-matching with Q-grams and maximal matches
    • Ukkonen E. Approximate string-matching with Q-grams and maximal matches. Theor Comput Sci. 1992;92:191-211.
    • (1992) Theor Comput Sci , vol.92 , pp. 191-211
    • Ukkonen, E.1
  • 49
    • 45549109750
    • Genome assembly forensics: finding the elusive mis-assembly
    • Phillippy AM, Schatz MC, Pop M. Genome assembly forensics: finding the elusive mis-assembly. Genome Biol. 2008;9:R55.
    • (2008) Genome Biol , vol.9 , pp. R55
    • Phillippy, A.M.1    Schatz, M.C.2    Pop, M.3
  • 50
    • 0000122573
    • PHYLIP - Phylogeny Inference Package (Version 3.2)
    • Felsenstein J. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989;5:164-6.
    • (1989) Cladistics , vol.5 , pp. 164-166
    • Felsenstein, J.1
  • 51
    • 84979077373
    • UCSC multiz20way. http://hgdownload.cse.ucsc.edu/goldenPath/hg38/multiz20way/. Accessed 31 May 2016.
    • (2016)
  • 52
    • 84979077375
    • HMP Illumina WGS Reads. http://hmpdacc.org/HMIWGS/all/. Accessed 31 May 2016.
    • (2016)
  • 53
    • 84978964778
    • HMP Illumina WGS Assemblies. http://hmpdacc.org/HMASM/all/. Accessed 31 May 2016.
    • (2016)
  • 54
    • 84978925354
    • MetaHIT assemblies. http://www.bork.embl.de/~arumugam/Qin_et_al_2010/. Accessed 31 May 2016.
    • (2016)
  • 55
    • 67649884743
    • Fast and accurate short read alignment with Burrows-Wheeler transform
    • Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754-60.
    • (2009) Bioinformatics , vol.25 , pp. 1754-1760
    • Li, H.1    Durbin, R.2
  • 56
    • 84979088211
    • Cap'n Proto. https://capnproto.org. Accessed 31 May 2016.
    • (2016)
  • 57
    • 84979088213
    • MurmurHash3. https://code.google.com/p/smhasher. Accessed 31 May 2016.
    • (2016)
  • 58
    • 3543116959
    • GNU scientific library reference manual
    • Godalming: Network Theory Ltd.
    • Gough B. GNU scientific library reference manual. Godalming: Network Theory Ltd.; 2009.
    • (2009)
    • Gough, B.1
  • 59
    • 84979077380
    • Open Bloom Filter Library. https://code.google.com/p/bloom. Accessed 31 May 2016.
    • (2016)
  • 60
    • 0011581750
    • The Boost Graph Library: User Guide and Reference Manual
    • New York, NY: Pearson Education
    • Siek JG, Lee L-Q, Lumsdaine A. The Boost Graph Library: User Guide and Reference Manual. New York, NY: Pearson Education; 2001.
    • (2001)
    • Siek, J.G.1    Lee, L.-Q.2    Lumsdaine, A.3
  • 61
    • 0242490780
    • Cytoscape: a software environment for integrated models of biomolecular interaction networks
    • Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498-504.
    • (2003) Genome Res , vol.13 , pp. 2498-2504
    • Shannon, P.1    Markiel, A.2    Ozier, O.3    Baliga, N.S.4    Wang, J.T.5    Ramage, D.6
  • 62
    • 0024640140
    • An algorithm for drawing general undirected graphs
    • Kamada T, Kawai S. An algorithm for drawing general undirected graphs. Inform Process Lett. 1989;31:7-15.
    • (1989) Inform Process Lett , vol.31 , pp. 7-15
    • Kamada, T.1    Kawai, S.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.