Volume 4, Issue 10, 2009, Pages

A statistical model of protein sequence similarity and function similarity reveals overly-specific function predictions

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; AMINO ACID SEQUENCE; ARTICLE; CONTROLLED STUDY; MATHEMATICAL ANALYSIS; PREDICTION; PROTEIN DATABASE; PROTEIN FUNCTION; SEQUENCE ALIGNMENT; SEQUENCE HOMOLOGY; STATISTICAL MODEL; BIOLOGY; CHEMISTRY; COMPUTER PROGRAM; METHODOLOGY; MOLECULAR GENETICS; PROTEIN ANALYSIS; SEQUENCE ANALYSIS;

EID: 70449552832     PISSN: None     EISSN: 19326203     Source Type: Journal    
DOI: 10.1371/journal.pone.0007546     Document Type: Article
Times cited : (26)

References (52)
  • 1
    • 0032229196
    • Sources of systematic error in functional annotation of genomes: Domain rearrangement, non-orthologous gene displacement and operon disruption
    • Galperin MY, Koonin EV (1998) Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol. 1: 55-67.
    • (1998) In Silico Biol. , vol.1 , pp. 55-67
    • Galperin, M.Y.1    Koonin, E.V.2
  • 2
    • 0031797096
    • What we do not know about sequence analysis and sequence databases
    • Karp PD (1998) What we do not know about sequence analysis and sequence databases. Bioinformatics 14: 753-754.
    • (1998) Bioinformatics , vol.14 , pp. 753-754
    • Karp, P.D.1
  • 3
    • 0033119399
    • Errors in genome annotation
    • doi:10.1016/S0168-9525(99)01706-0
    • Brenner SE (1999) Errors in genome annotation. Trends in Genetics 15: 132-133. doi:10.1016/S0168-9525(99)01706-0.
    • (1999) Trends in Genetics , vol.15 , pp. 132-133
    • Brenner, S.E.1
  • 4
    • 0033388521
    • Completing the E. coli proteome: A database of gene products characterised since the completion of the genome sequence
    • Thomas GH (1999) Completing the E. coli proteome: a database of gene products characterised since the completion of the genome sequence. Bioinformatics 15: 860-861.
    • (1999) Bioinformatics , vol.15 , pp. 860-861
    • Thomas, G.H.1
  • 5
    • 0034060929
    • Powers and pitfalls in sequence analysis: The 70% hurdle
    • Bork P (2000) Powers and pitfalls in sequence analysis: the 70% hurdle. Genome Res 10: 398-400.
    • (2000) Genome Res , vol.10 , pp. 398-400
    • Bork, P.1
  • 6
    • 0034308142
    • Practical limits of function prediction
    • Devos D, Valencia A (2000) Practical limits of function prediction. Proteins: Structure, Function, and Genetics 41: 98-107. doi:10.1002/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S.
  • 7
    • 0034677669
    • Assessing annotation transfer for genomics: Quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores
    • doi:10.1006/jmbi.2000.3550
    • Wilson CA, Kreychman J, Gerstein M (2000) Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. Journal of Molecular Biology 297: 233-249. doi:10.1006/jmbi.2000.3550.
    • (2000) Journal of Molecular Biology , vol.297 , pp. 233-249
    • Wilson, C.A.1    Kreychman, J.2    Gerstein, M.3
  • 8
    • 0034767547
    • Annotation Transfer for Genomics: Measuring Functional Divergence in Multi-Domain Proteins
    • doi:10.1101/gr.183801
    • Hegyi H, Gerstein M (2001) Annotation Transfer for Genomics: Measuring Functional Divergence in Multi-Domain Proteins. Genome Res 11: 1632-1640. doi:10.1101/gr.183801.
    • (2001) Genome Res , vol.11 , pp. 1632-1640
    • Hegyi, H.1    Gerstein, M.2
  • 9
    • 12244283680
    • Modeling the percolation of annotation errors in a database of protein sequences
    • doi:10.1093/bioinformatics/18.12.1641
    • Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA (2002) Modeling the percolation of annotation errors in a database of protein sequences. Bioinformatics 18: 1641-1649. doi:10.1093/bioinformatics/18.12.1641.
    • (2002) Bioinformatics , vol.18 , pp. 1641-1649
    • Gilks, W.R.1    Audit, B.2    De Angelis, D.3    Tsoka, S.4    Ouzounis, C.A.5
  • 10
    • 0036175276
    • The past, present and future of genome-wide reannotation
    • doi:10.1186/gb-2002-3-2-comment2001
    • Ouzounis C, Karp P (2002) The past, present and future of genome-wide reannotation. Genome Biology 3: comment2001.1-comment2001.6. doi:10.1186/gb-2002-3-2-comment2001.
    • (2002) Genome Biology , vol.3
    • Ouzounis, C.1    Karp, P.2
  • 12
    • 0037433717
    • Evaluation of annotation strategies using an entire genome sequence
    • doi:10.1093/bioinformatics/btg077
    • Iliopoulos I, Tsoka S, Andrade MA, Enright AJ, Carroll M, et al. (2003) Evaluation of annotation strategies using an entire genome sequence. Bioinformatics 19: 717-726. doi:10.1093/bioinformatics/btg077.
    • (2003) Bioinformatics , vol.19 , pp. 717-726
    • Iliopoulos, I.1    Tsoka, S.2    Andrade, M.A.3    Enright, A.J.4    Carroll, M.5
  • 13
    • 0038444287
    • Initial Proteome Analysis of Model Microorganism Haemophilus influenzae Strain Rd KW20
    • doi:10.1128/JB.185.15.4593-4602.2003
    • Kolker E, Purvine S, Galperin MY, Stolyar S, Goodlett DR, et al. (2003) Initial Proteome Analysis of Model Microorganism Haemophilus influenzae Strain Rd KW20. J Bacteriol 185: 4593-4602. doi:10.1128/JB.185.15.4593-4602.2003.
    • (2003) J Bacteriol , vol.185 , pp. 4593-4602
    • Kolker, E.1    Purvine, S.2    Galperin, M.Y.3    Stolyar, S.4    Goodlett, D.R.5
  • 14
    • 2342526611
    • Identification and functional analysis of 'hypothetical' genes expressed in Haemophilus influenzae
    • doi:10.1093/nar/gkh555
    • Kolker E, Makarova KS, Shabalina S, Picone AF, Purvine S, et al. (2004) Identification and functional analysis of 'hypothetical' genes expressed in Haemophilus influenzae. Nucl Acids Res 32: 2353-2361. doi:10.1093/nar/gkh555.
    • (2004) Nucl Acids Res , vol.32 , pp. 2353-2361
    • Kolker, E.1    Makarova, K.S.2    Shabalina, S.3    Picone, A.F.4    Purvine, S.5
  • 15
    • 11144354963
    • In Silico Metabolic Model and Protein Expression of Haemophilus influenzae Strain Rd KW20 in Rich Medium
    • doi:10.1089/153623104773547471
    • Raghunathan A, Price ND, Galperin MY, Makarova KS, Purvine S, et al. (2004) In Silico Metabolic Model and Protein Expression of Haemophilus influenzae Strain Rd KW20 in Rich Medium. OMICS 8: 25-41. doi:10.1089/153623104773547471.
    • (2004) OMICS , vol.8 , pp. 25-41
    • Raghunathan, A.1    Price, N.D.2    Galperin, M.Y.3    Makarova, K.S.4    Purvine, S.5
  • 16
    • 19344377263
    • Identifying Protein Function - A Call for Community Action
    • doi:10.1371/journal.pbio.0020042
    • Roberts RJ (2004) Identifying Protein Function - A Call for Community Action. PLoS Biol 2: e42. doi:10.1371/journal.pbio.0020042.
    • (2004) PLoS Biol , vol.2
    • Roberts, R.J.1
  • 18
    • 20444456581
    • Automatic annotation of protein function
    • doi:10.1016/j.sbi.2005.05.010
    • Valencia A (2005) Automatic annotation of protein function. Current Opinion in Structural Biology 15: 267-274. doi:10.1016/j.sbi.2005.05.010.
    • (2005) Current Opinion in Structural Biology , vol.15 , pp. 267-274
    • Valencia, A.1
  • 19
    • 33748981457
    • New metrics for comparative genomics
    • doi:10.1016/j.copbio.2006.08.007
    • Galperin MY, Kolker E (2006) New metrics for comparative genomics. Current Opinion in Biotechnology 17: 440-447. doi:10.1016/j.copbio.2006.08.007.
    • (2006) Current Opinion in Biotechnology , vol.17 , pp. 440-447
    • Galperin, M.Y.1    Kolker, E.2
  • 20
    • 34548246519
    • Protein Annotation at Genomic Scale: The Current Status
    • doi:10.1021/cr068303k
    • Frishman D (2007) Protein Annotation at Genomic Scale: The Current Status. Chemical Reviews 107: 3448-3466. doi:10.1021/cr068303k.
    • (2007) Chemical Reviews , vol.107 , pp. 3448-3466
    • Frishman, D.1
  • 21
    • 46249122830
    • Annotation-based inference of transporter function
    • doi:10.1093/bioinformatics/btn180
    • Lee TJ, Paulsen I, Karp P (2008) Annotation-based inference of transporter function. Bioinformatics 24: i259-267. doi:10.1093/bioinformatics/btn180.
    • (2008) Bioinformatics , vol.24 , pp. i259-267
    • Lee, T.J.1    Paulsen, I.2    Karp, P.3
  • 22
    • 51349145617
    • Validating annotations for uncharacterized proteins in Shewanella oneidensis
    • doi:10.1089/omi.2008.0051
    • Louie B, Tarczy-Hornoch P, Higdon R, Kolker E (2008) Validating annotations for uncharacterized proteins in Shewanella oneidensis. OMICS 12: 211-215. doi:10.1089/omi.2008.0051.
    • (2008) OMICS , vol.12 , pp. 211-215
    • Louie, B.1    Tarczy-Hornoch, P.2    Higdon, R.3    Kolker, E.4
  • 24
    • 0034961746
    • Database verification studies of SWISS-PROT and GenBank
    • doi:10.1093/bioinformatics/17.6.526
    • Karp PD, Paley S, Zhu J (2001) Database verification studies of SWISS-PROT and GenBank. Bioinformatics 17: 526-532. doi:10.1093/bioinformatics/17.6.526.
    • (2001) Bioinformatics , vol.17 , pp. 526-532
    • Karp, P.D.1    Paley, S.2    Zhu, J.3
  • 25
    • 0030801002
    • Gapped BLAST and PSI-BLAST: A new generation of protein database search programs
    • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
    • (1997) Nucleic Acids Res , vol.25 , pp. 3389-3402
    • Altschul, S.F.1    Madden, T.L.2    Schäffer, A.A.3    Zhang, J.4    Zhang, Z.5
  • 26
    • 33747844624
    • BLAST: Improvements for better sequence analysis
    • doi:10.1093/nar/gkl164
    • Ye J, McGinnis S, Madden TL (2006) BLAST: improvements for better sequence analysis. Nucl Acids Res 34: W6-9. doi:10.1093/nar/gkl164.
    • (2006) Nucl Acids Res , vol.34
    • Ye, J.1    McGinnis, S.2    Madden, T.L.3
  • 27
    • 0019887799
    • Identification of common molecular subsequences
    • Smith T, Waterman M (1981) Identification of common molecular subsequences. J Mol Biol 147: 195-197.
    • (1981) J Mol Biol , vol.147 , pp. 195-197
    • Smith, T.1    Waterman, M.2
  • 28
    • 0028118434
    • Protein Family Classification Based on Searching a Database of Blocks
    • doi:10.1006/geno.1994.1018
    • Henikoff S, Henikoff JG (1994) Protein Family Classification Based on Searching a Database of Blocks. Genomics 19: 97-107. doi:10.1006/geno.1994.1018.
    • (1994) Genomics , vol.19 , pp. 97-107
    • Henikoff, S.1    Henikoff, J.G.2
  • 29
    • 0035798406
    • Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure
    • doi:10.1006/jmbi.2001.5080
    • Gough J, Karplus K, Hughey R, Chothia C (2001) Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. Journal of Molecular Biology 313: 903-919. doi:10.1006/jmbi.2001.5080.
    • (2001) Journal of Molecular Biology , vol.313 , pp. 903-919
    • Gough, J.1    Karplus, K.2    Hughey, R.3    Chothia, C.4
  • 30
    • 13244268370
    • GOtcha: A new method for prediction of protein function assessed by the annotation of seven genomes
    • doi:10.1186/1471-2105-5-178
    • Martin D, Berriman M, Barton G (2004) GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinformatics 5: 178. doi:10.1186/1471-2105-5-178.
    • (2004) BMC Bioinformatics , vol.5 , pp. 178
    • Martin, D.1    Berriman, M.2    Barton, G.3
  • 32
    • 58149203231
    • InterPro: The integrative protein signature database
    • doi:10.1093/nar/gkn785
    • Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, et al. (2009) InterPro: the integrative protein signature database. Nucl Acids Res 37: D211-215. doi:10.1093/nar/gkn785.
    • (2009) Nucl Acids Res , vol.37
    • Hunter, S.1    Apweiler, R.2    Attwood, T.K.3    Bairoch, A.4    Bateman, A.5
  • 34
    • 0035424599
    • Intrinsic errors in genome annotation
    • doi:10.1016/S0168-9525(01)02348-4
    • Devos D, Valencia A (2001) Intrinsic errors in genome annotation. Trends in Genetics 17: 429-431. doi:10.1016/S0168-9525(01)02348-4.
    • (2001) Trends in Genetics , vol.17 , pp. 429-431
    • Devos, D.1    Valencia, A.2
  • 36
    • 34250790725
    • Estimating the annotation error rate of curated GO database sequence annotations
    • doi:10.1186/1471-2105-8-170
    • Jones C, Brown A, Baumann U (2007) Estimating the annotation error rate of curated GO database sequence annotations. BMC Bioinformatics 8: 170. doi:10.1186/1471-2105-8-170.
    • (2007) BMC Bioinformatics , vol.8 , pp. 170
    • Jones, C.1    Brown, A.2    Baumann, U.3
  • 37
    • 34548130064
    • Quantitative assessment of relationship between sequence similarity and function similarity
    • doi:10.1186/1471-2164-8-222
    • Joshi T, Xu D (2007) Quantitative assessment of relationship between sequence similarity and function similarity. BMC Genomics 8: 222. doi:10.1186/1471-2164-8-222.
    • (2007) BMC Genomics , vol.8 , pp. 222
    • Joshi, T.1    Xu, D.2
  • 38
    • 34548655343
    • Quantitative sequence-function relationships in proteins based on gene ontology
    • doi:10.1186/1471-2105-8-294
    • Sangar V, Blankenberg DJ, Altman N, Lesk AM (2007) Quantitative sequence-function relationships in proteins based on gene ontology. BMC Bioinformatics 8: 294. doi:10.1186/1471-2105-8-294.
    • (2007) BMC Bioinformatics , vol.8 , pp. 294
    • Sangar, V.1    Blankenberg, D.J.2    Altman, N.3    Lesk, A.M.4
  • 39
    • 61649112657
    • Domain-Based and Family-Specific Sequence Identity Thresholds Increase the Levels of Reliable Protein Function Transfer
    • doi:10.1016/j.jmb.2008.12.045
    • Addou S, Rentzsch R, Lee D, Orengo CA (2009) Domain-Based and Family-Specific Sequence Identity Thresholds Increase the Levels of Reliable Protein Function Transfer. Journal of Molecular Biology 387: 416-430. doi:10.1016/j.jmb.2008.12.045.
    • (2009) Journal of Molecular Biology , vol.387 , pp. 416-430
    • Addou, S.1    Rentzsch, R.2    Lee, D.3    Orengo, C.A.4
  • 40
    • 0032962457
    • Twilight zone of protein sequence alignments
    • doi:10.1093/protein/12.2.85
    • Rost B (1999) Twilight zone of protein sequence alignments. Protein Eng 12: 85-94. doi:10.1093/protein/12.2.85.
    • (1999) Protein Eng , vol.12 , pp. 85-94
    • Rost, B.1
  • 41
    • 6044244615
    • 'Conserved hypothetical' proteins: Prioritization of targets for experimental study
    • doi:10.1093/nar/gkh885
    • Galperin MY, Koonin EV (2004) 'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucl Acids Res 32: 5452-5463. doi:10.1093/nar/gkh885.
    • (2004) Nucl Acids Res , vol.32 , pp. 5452-5463
    • Galperin, M.Y.1    Koonin, E.V.2
  • 42
    • 0034069495
    • Gene Ontology: Tool for the unification of biology
    • doi:10.1038/75556
    • Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene Ontology: tool for the unification of biology. Nat Genet 25: 25-29. doi:10.1038/75556.
    • (2000) Nat Genet , vol.25 , pp. 25-29
    • Ashburner, M.1    Ball, C.A.2    Blake, J.A.3    Botstein, D.4    Butler, H.5
  • 43
    • 0037480738
    • Investigating semantic similarity measures across the Gene Ontology: The relationship between sequence and annotation
    • doi:10.1093/bioinformatics/btg153
    • Lord PW, Stevens RD, Brass A, Goble CA (2003) Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics 19: 1275-1283. doi:10.1093/bioinformatics/btg153.
    • (2003) Bioinformatics , vol.19 , pp. 1275-1283
    • Lord, P.W.1    Stevens, R.D.2    Brass, A.3    Goble, C.A.4
  • 44
    • 3242876311
    • BLAST: At the core of a powerful and diverse set of sequence analysis tools
    • doi:10.1093/nar/gkh435
    • McGinnis S, Madden TL (2004) BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucl Acids Res 32: W20-25. doi:10.1093/nar/gkh435.
    • (2004) Nucl Acids Res , vol.32
    • McGinnis, S.1    Madden, T.L.2
  • 45
    • 13444272087
    • Entrez Gene: Gene-centered information at NCBI
    • doi:10.1093/nar/gki031
    • Maglott D, Ostell J, Pruitt KD, Tatusova T (2005) Entrez Gene: gene-centered information at NCBI. Nucl Acids Res 33: D54-58. doi:10.1093/nar/gki031.
    • (2005) Nucl Acids Res , vol.33
    • Maglott, D.1    Ostell, J.2    Pruitt, K.D.3    Tatusova, T.4
  • 46
    • 9144252016
    • The Gene Ontology Annotation (GOA) Database: Sharing knowledge in Uniprot with Gene Ontology
    • doi:10.1093/nar/gkh021
    • Camon E, Magrane M, Barrell D, Lee V, Dimmer E, et al. (2004) The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucl Acids Res 32: D262-266. doi:10.1093/nar/gkh021.
    • (2004) Nucl Acids Res , vol.32
    • Camon, E.1    Magrane, M.2    Barrell, D.3    Lee, V.4    Dimmer, E.5
  • 47
    • 33846057724
    • NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins
    • doi:10.1093/nar/gkl842
    • Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35: D61-D65. doi:10.1093/nar/gkl842.
    • (2007) Nucleic Acids Res , vol.35
    • Pruitt, K.D.1    Tatusova, T.2    Maglott, D.R.3
  • 49
    • 47149083877
    • Metrics for GO based protein semantic similarity: A systematic evaluation
    • doi:10.1186/1471-2105-9-S5-S4
    • Pesquita C, Faria D, Bastos H, Ferreira A, Falcao A, et al. (2008) Metrics for GO based protein semantic similarity: a systematic evaluation. BMC Bioinformatics 9: S4. doi:10.1186/1471-2105-9-S5-S4.
    • (2008) BMC Bioinformatics , vol.9
    • Pesquita, C.1    Faria, D.2    Bastos, H.3    Ferreira, A.4    Falcao, A.5
  • 50
    • 70449590634
    • An Information-Theoretic Definition of Similarity
    • Lin D (1998) An Information-Theoretic Definition of Similarity. In: Proceedings of the 15th International Conference on Machine Learning: 296-304.
  • 52
    • 0016355478
    • A new look at the statistical model identification
    • Akaike H (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control 19: 716-723.
    • (1974) IEEE Transactions on Automatic Control , vol.19 , pp. 716-723
    • Akaike, H.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.