Volume 3909 LNBI, 2006, Pages 175-189

Clustering near-identical sequences for fast homology search

Author keywords

[No Author keywords available]

Indexed keywords

DATABASE SYSTEMS; ELECTRONIC DOCUMENT EXCHANGE; INFORMATION RETRIEVAL; PROTEINS; QUERY LANGUAGES; REDUNDANCY; SEARCH ENGINES

EID: 33745767924; Print ISSN: 0302-9743; Electronic ISSN: 1611-3349; Source Type: Book Series
DOI: 10.1007/11732990_16; Document Type: Conference Paper
Times cited: 1

References (40)
  • 2
    • Li, W., Jaroszewski, L., Godzik, A.: Sequence clustering strategies improve remote homology recognitions while reducing search times. Protein Engineering 15 (2002) 643-649. EID: 0036699189
  • 3
    • Park, J., Holm, L., Heger, A., Chothia, C.: RSDB: representative sequence databases have high information content. Bioinformatics 16 (2000) 458-464. EID: 0033940118
  • 6
    • Pearson, W., Lipman, D.: Rapid and sensitive protein similarity searches. Science 227 (1985) 1435-1441. EID: 0021919480
  • 7
    • Li, W., Jaroszewski, L., Godzik, A.: Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18 (2001) 77-82. EID: 0036169928
  • 8
    • Smith, T., Waterman, M.: Identification of common molecular subsequences. Journal of Molecular Biology 147 (1981) 195-197. EID: 0019887799
  • 9
    • Burke, J., Davison, D., Hide, W.: d2-cluster: A validated method for clustering EST and full-length DNA sequences. Genome Research 9 (1999) 1135-1142. EID: 0032703998
  • 10
    • Bleasby, A.J., Wootton, J.C.: Construction of validated, non-redundant composite protein sequence databases. Protein Engineering 3 (1990) 153-159. EID: 0025234646
  • 11
    • Kallberg, Y., Persson, B.: KIND - a non-redundant protein database. Bioinformatics 15 (1999) 260-261. EID: 0032950094
  • 12
  • 13
    • Holm, L., Sander, C.: Removing near-neighbour redundancy from large protein sequence collections. Bioinformatics 14 (1998) 423-429. EID: 0031829372
  • 14
    • Li, W., Jaroszewski, L., Godzik, A.: Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17 (2001) 282-283. EID: 0035072551
  • 15
    • Malde, K., Coward, E., Jonassen, I.: Fast sequence clustering using a suffix array algorithm. Bioinformatics 19 (2003) 1221-1226. EID: 0038494602
  • 16
    • Gracy, J., Argos, P.: Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment. Bioinformatics 14 (1998) 164-173. EID: 0031876699
  • 18
    • Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing 22 (1993) 935-948. EID: 0027681165
  • 20
    • Lee, C., Grasso, C., Sharlow, M.F.: Multiple sequence alignment using partial order graphs. Bioinformatics 18 (2002) 452-464. EID: 0036203448
  • 23
    • Heintze, N.: Scalable document fingerprinting. In: 1996 USENIX Workshop on Electronic Commerce, Oakland, California, USA (1996) 191-200. EID: 0013207911
  • 24
    • Brin, S., Davis, J., García-Molina, H.: Copy detection mechanisms for digital documents. In Carey, M., Schneider, D., eds.: Proceedings of the ACM SIGMOD Annual Conference, San Jose, California, United States, ACM Press (1995) 398-409. EID: 84976810280
  • 26
    • Bernstein, Y., Zobel, J.: A scalable system for identifying co-derivative documents. In Apostolico, A., Melucci, M., eds.: Proc. String Processing and Information Retrieval Symposium (SPIRE), Padova, Italy, Springer (2004) 55-67. EID: 84871101442
  • 28
    • Johnson, S.: Hierarchical clustering schemes. Psychometrika 32 (1967) 241-254. EID: 0014129195
  • 29
    • Robinson, A., Robinson, L.: Distribution of glutamine and asparagine residues and their near neighbors in peptides and proteins. Proceedings of the National Academy of Sciences USA 88 (1991) 8880-8884. EID: 0026008859
  • 31
    • Taylor, W.: The classification of amino-acid conservation. Journal of Theoretical Biology 119 (1986) 205-218. EID: 0022591495
  • 32
    • Murzin, A., Brenner, S., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 247 (1995) 536-540. EID: 0028961335
  • 34
    • Brenner, S., Chothia, C., Hubbard, T.: Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proceedings of the National Academy of Sciences USA 95 (1998) 6073-6078. EID: 0032568596
  • 35
    • Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T., Chothia, C.: Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. Journal of Molecular Biology 284 (1998) 1201-1210. EID: 0032509105
  • 37
    • Gribskov, M., Robinson, N.: Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Computers & Chemistry 20 (1996) 25-33. EID: 0001969211
  • 38
    • Karlin, S., Altschul, S.: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences USA 87 (1990) 2264-2268. EID: 0025259313
  • 40
    • Cameron, M., Williams, H.E., Cannane, A.: A deterministic finite automaton for faster protein hit detection in BLAST. Journal of Computational Biology (2005) To appear. EID: 33745765346


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.