



Volume 25, Issue 1-2, 2005, Pages 31-52

Knowledge discovery in biology and biotechnology texts: A review of techniques, evaluation strategies, and applications

Author keywords

Bioinformatics; Information extraction; Information retrieval; Knowledge discovery in text; Text mining

Indexed keywords

DATA MINING; INFORMATION RETRIEVAL; KNOWLEDGE ENGINEERING;

EID: 19944403928     PISSN: 07388551     EISSN: None     Source Type: Journal    
DOI: 10.1080/07388550590935571     Document Type: Review
Times cited: 22

References (97)
  • 6
    • 0642368715
    • Information extraction from full text scientific articles: Where are the keywords?
    • Shah, P. K., Perez-Iratxeta, C., Bork, P., and Andrade, M. A. 2003. Information extraction from full text scientific articles: Where are the keywords? BMC Bioinformatics 4:20.
    • (2003) BMC Bioinformatics , vol.4 , pp. 20
    • Shah, P.K.1    Perez-Iratxeta, C.2    Bork, P.3    Andrade, M.A.4
  • 7
    • 19944417354
    • PubMed Central at http://www.pubmedcentral.gov
  • 8
    • 19944418340
    • BioMedCentral (BMC) at http://www.biomedcentral.com
  • 9
    • 19944422847
    • Public Library of Science (PLoS) at http://www.publiclibraryofscience.org/
  • 10
    • 3342901682
    • Finding kinetic parameters using text mining
    • W. Dubitzky (guest ed.), Special Issue on Data Mining meets Integrative Biology
    • Hakenberg, J., Schmeier, S., Kowald, A., Klipp, E., and Leser, U. 2004. Finding Kinetic Parameters Using Text Mining, in W. Dubitzky (guest ed.), Special Issue on Data Mining meets Integrative Biology, OMICS: A Journal of Integrative Biology 8(2): 131-152.
    • (2004) OMICS: A Journal of Integrative Biology , vol.8 , Issue.2 , pp. 131-152
    • Hakenberg, J.1    Schmeier, S.2    Kowald, A.3    Klipp, E.4    Leser, U.5
  • 12
    • 0002000581
    • Untangling text data mining
    • Hearst, M. A. 1999. Untangling text data mining, Proc. of ACL, 37.
    • (1999) Proc. of ACL , pp. 37
    • Hearst, M.A.1
  • 13
    • 45549117987
    • Term weighting approaches in automatic information retrieval
    • Salton, G., and Buckley, C. 1988. Term weighting approaches in automatic information retrieval, Inf. Proc. Man. 24: 513-523.
    • (1988) Inf. Proc. Man. , vol.24 , pp. 513-523
    • Salton, G.1    Buckley, C.2
  • 14
    • 0029989103
    • An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts
    • Wilbur, W. J., and Yang, Y. 1996. An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts, Comput. Biol. Med. 26: 209.
    • (1996) Comput. Biol. Med. , vol.26 , pp. 209
    • Wilbur, W.J.1    Yang, Y.2
  • 15
  • 17
    • 19944375553
    • Medical Subject Heading (MeSH) at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=mesh
  • 24
    • 0003402855
    • The Benjamin/Cummings Publishing Company, Inc. New York
    • Allen, J. 1995. Natural Language Understanding, The Benjamin/Cummings Publishing Company, Inc. New York.
    • (1995) Natural Language Understanding
    • Allen, J.1
  • 25
    • 19944415036
    • Natural language processing and systems biology
    • W. Dubitzky and F. Azuaje (eds), Kluwer Academic Publishers, Boston/Dordrecht/London
    • Cohen, K. B., and Hunter, L. 2004. Natural language processing and systems biology, in W. Dubitzky and F. Azuaje (eds), Artificial Intelligence Methods and Tools for Systems Biology, Kluwer Academic Publishers, Boston/Dordrecht/London, 147-174.
    • (2004) Artificial Intelligence Methods and Tools for Systems Biology , pp. 147-174
    • Cohen, K.B.1    Hunter, L.2
  • 26
    • 0031633368
    • Towards information extraction: Identifying protein names from biological papers
    • Fukuda, K., Tsunoda, T., Tamura, A., and Takagi, T. 1998. Towards information extraction: identifying protein names from biological papers, Pacific Symposium on Biocomputing 707-718.
    • (1998) Pacific Symposium on Biocomputing , pp. 707-718
    • Fukuda, K.1    Tsunoda, T.2    Tamura, A.3    Takagi, T.4
  • 29
    • 0035104866
    • Automated extraction of information on protein-protein interactions from the biological literature
    • Ono, T., Hishigaki, H., Tanigami, A., and Takagi, T. 2001. Automated extraction of information on protein-protein interactions from the biological literature, Bioinformatics 17: 155-161.
    • (2001) Bioinformatics , vol.17 , pp. 155-161
    • Ono, T.1    Hishigaki, H.2    Tanigami, A.3    Takagi, T.4
  • 30
    • 0034707204
    • Using BLAST for identifying gene and protein names in journal articles
    • Krauthammer, M., Rzhetsky, A., Morozov, P., and Friedman, C. 2000. Using BLAST for identifying gene and protein names in journal articles, Gene 259: 245-252.
    • (2000) Gene , vol.259 , pp. 245-252
    • Krauthammer, M.1    Rzhetsky, A.2    Morozov, P.3    Friedman, C.4
  • 34
    • 0002670150
    • Extraction of name of genes and gene products with a Hidden Markov Model
    • Collier, N., Nobata, C., and Tsujii, J. 2000. Extraction of name of genes and gene products with a Hidden Markov Model, COLING Conference Proceedings 201-207.
    • (2000) COLING Conference Proceedings , pp. 201-207
    • Collier, N.1    Nobata, C.2    Tsujii, J.3
  • 36
    • 2442702846
    • Recognizing names in biomedical texts: A machine learning approach
    • Zhou, G., Zhang, J., Su, J., Shen, D., and Tan, C. 2004. Recognizing names in biomedical texts: A machine learning approach, Bioinformatics 20: 1178-1190.
    • (2004) Bioinformatics , vol.20 , pp. 1178-1190
    • Zhou, G.1    Zhang, J.2    Su, J.3    Shen, D.4    Tan, C.5
  • 37
    • 10244245027
    • An entity tagger for recognizing acquired genomic variations in cancer literature
    • McDonald, R. T., Scot Winters, R., Mandel, M., Jin, Y., White, P. S., and Pereira, F. 2004. An entity tagger for recognizing acquired genomic variations in cancer literature, Bioinformatics 20:3249-3251.
    • (2004) Bioinformatics , vol.20 , pp. 3249-3251
    • McDonald, R.T.1    Scot Winters, R.2    Mandel, M.3    Jin, Y.4    White, P.S.5    Pereira, F.6
  • 38
    • 19944369505
    • The Human genome organization
    • HUGO, The Human genome organization at http://www.gene.ucl.ac.uk/hugo/
  • 39
    • 19944378885
    • corpus
    • GENIA, corpus at http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/topics/Corpus/
  • 40
    • 19944406545
    • corpus
    • BioCreative Task1A, corpus at http://www.mitre.org/public/biocreative/
  • 41
    • 19944382574
    • Yapex, corpus
    • Yapex, corpus at http://www.sics.se/humle/projects/prothalt/
  • 42
    • 19944415369
    • Shared task
    • BioCreative, Shared task at http://www.pdg.cnb.uam.es/BioLINK/workshop_BioCreative_04
  • 43
    • 0001413266
    • Towards routine automatic pathway discovery from on-line scientific text abstracts
    • Ng, S-K., and Wong, M. 1999. Towards routine automatic pathway discovery from on-line scientific text abstracts, Proc. of the workshop on Genome Informatics 10: 104-112.
    • (1999) Proc. of the Workshop on Genome Informatics , vol.10 , pp. 104-112
    • Ng, S.-K.1    Wong, M.2
  • 44
    • 0035228039
    • A protein interaction extraction system
    • Wong, L. 2001. A protein interaction extraction system, Pacific Symposium on Biocomputing 6: 520-531.
    • (2001) Pacific Symposium on Biocomputing , vol.6 , pp. 520-531
    • Wong, L.1
  • 45
    • 0035230208
    • Bi-directional incremental parsing for automatic pathway identification with combinatory categorial grammar
    • Park, J. C., Kim, H. S., and Kim, J. J. 2001. Bi-directional incremental parsing for automatic pathway identification with combinatory categorial grammar, Pacific Symposium on Biocomputing 6: 396-407.
    • (2001) Pacific Symposium on Biocomputing , vol.6 , pp. 396-407
    • Park, J.C.1    Kim, H.S.2    Kim, J.J.3
  • 48
    • 0036371839
    • Filling preposition-based templates to capture information from medical abstracts
    • Leroy, G., and Chen, H. 2002. Filling preposition-based templates to capture information from medical abstracts, Pacific Symposium on Biocomputing 7: 350-361.
    • (2002) Pacific Symposium on Biocomputing , vol.7 , pp. 350-361
    • Leroy, G.1    Chen, H.2
  • 49
    • 0000206156
    • Automatic extraction of protein interactions from scientific abstracts
    • Thomas, J., Milward, D., and Ouzounis, C. 2000. Automatic extraction of protein interactions from scientific abstracts, Pacific Symposium on Biocomputing 5: 384-395.
    • (2000) Pacific Symposium on Biocomputing , vol.5 , pp. 384-395
    • Thomas, J.1    Milward, D.2    Ouzounis, C.3
  • 51
    • 0035236048
    • GENIES: A natural language processing system for extraction of molecular pathways from journal articles
    • Friedman, C., Kra, P., Yu, H., Krauthammer, M., and Rzhetsky, A. 2001. GENIES: A natural language processing system for extraction of molecular pathways from journal articles, Bioinformatics Suppl. 1: 74-82.
    • (2001) Bioinformatics Suppl. , vol.1 , pp. 74-82
    • Friedman, C.1    Kra, P.2    Yu, H.3    Krauthammer, M.4    Rzhetsky, A.5
  • 52
    • 12344282809
    • Discovering patterns to extract protein-protein interactions from full texts
    • Huang, M., Zhu, X., Hao, Y., Payan, D. G., Qu, K., and Li, M. 2004. Discovering patterns to extract protein-protein interactions from full texts, Bioinformatics 20(18): 3604-3612.
    • (2004) Bioinformatics , vol.20 , Issue.18 , pp. 3604-3612
    • Huang, M.1    Zhu, X.2    Hao, Y.3    Payan, D.G.4    Qu, K.5    Li, M.6
  • 53
    • 0002284942
    • Identifying the interaction between genes and gene products based on frequently seen verbs in MEDLINE abstracts
    • Sekimizu, T., Park, H. S., and Tsujii, J. 1998. Identifying the interaction between genes and gene products based on frequently seen verbs in MEDLINE abstracts, Proc. of the workshop on Genome Informatics 62-71.
    • (1998) Proc. of the Workshop on Genome Informatics , pp. 62-71
    • Sekimizu, T.1    Park, H.S.2    Tsujii, J.3
  • 54
    • 0000206152
    • Two applications of information extraction to biological science journal articles: Enzyme interactions and protein structure
    • Humphreys, K., Demetriou, G., and Gaizauskas, R. 2000. Two applications of information extraction to biological science journal articles: Enzyme interactions and protein structure, Pacific Symposium on Biocomputing 5: 502-513.
    • (2000) Pacific Symposium on Biocomputing , vol.5 , pp. 502-513
    • Humphreys, K.1    Demetriou, G.2    Gaizauskas, R.3
  • 58
    • 0037250015
    • Protein structure and information extraction from biological texts: The PASTA system
    • Gaizauskas, R., Demetriou, G., Artymiuk, P. J., and Willett, P. 2003. Protein structure and information extraction from biological texts: The PASTA system, Bioinformatics 19(1): 135-143.
    • (2003) Bioinformatics , vol.19 , Issue.1 , pp. 135-143
    • Gaizauskas, R.1    Demetriou, G.2    Artymiuk, P.J.3    Willett, P.4
  • 59
    • 0004694187
    • Knowledge discovery and data mining in biological databases
    • Brusic, V., and Zeleznikow, J. 1999. Knowledge discovery and data mining in biological databases, in The Knowledge Engineering Review 14(3): 257-277.
    • (1999) The Knowledge Engineering Review , vol.14 , Issue.3 , pp. 257-277
    • Brusic, V.1    Zeleznikow, J.2
  • 60
    • 7444228338
    • The CRISP-DM Model: The new blueprint for data mining
    • Fall 2000
    • Shearer, C. 2000. The CRISP-DM Model: The new blueprint for data mining, Journal of Data Warehousing 5(4): 13-22, Fall 2000.
    • (2000) Journal of Data Warehousing , vol.5 , Issue.4 , pp. 13-22
    • Shearer, C.1
  • 61
    • 0002442796
    • Machine learning in automated text categorization
    • Sebastiani, F. 2002. Machine learning in automated text categorization, ACM Computing Surveys 34(1): 1-47.
    • (2002) ACM Computing Surveys , vol.34 , Issue.1 , pp. 1-47
    • Sebastiani, F.1
  • 63
    • 0036358850
    • Predicting the sub-cellular location of proteins from text using support vector machines
    • Stapley, B. J., Kelley, L. A., and Sternberg, M. J. E. 2002. Predicting the sub-cellular location of proteins from text using support vector machines, Pacific Symposium on Biocomputing 7: 374-385.
    • (2002) Pacific Symposium on Biocomputing , vol.7 , pp. 374-385
    • Stapley, B.J.1    Kelley, L.A.2    Sternberg, M.J.E.3
  • 64
    • 0036144742
    • Associating genes with gene ontology codes using maximum entropy analysis of biomedical literature
    • Raychaudhuri, S., Chang, J. T., Sutphin, P. D., and Altman, R. B. 2002. Associating genes with gene ontology codes using maximum entropy analysis of biomedical literature, Genome Research 12: 203-214.
    • (2002) Genome Research , vol.12 , pp. 203-214
    • Raychaudhuri, S.1    Chang, J.T.2    Sutphin, P.D.3    Altman, R.B.4
  • 66
    • 3543147086
    • Recent trends in hierarchic document clustering: A critical review
    • Willett, P. 1988. Recent trends in hierarchic document clustering: A critical review, in Information Processing and Management 24(5): 577-597.
    • (1988) Information Processing and Management , vol.24 , Issue.5 , pp. 577-597
    • Willett, P.1
  • 69
    • 0001940412
    • Model selection in unsupervised learning with applications to document clustering
    • Vaithyanathan, S., and Dom, B. 1999. Model selection in unsupervised learning with applications to document clustering, ICML-99.
    • (1999) ICML-99
    • Vaithyanathan, S.1    Dom, B.2
  • 71
    • 0009815710
    • TEXTQUEST: Document clustering of MEDLINE abstracts for concept discovery in molecular biology
    • Iliopoulos, I., Enright, A. J., and Ouzounis, C. 2001. TEXTQUEST: Document clustering of MEDLINE abstracts for concept discovery in molecular biology, Pacific Symposium on Biocomputing 374-383.
    • (2001) Pacific Symposium on Biocomputing , pp. 374-383
    • Iliopoulos, I.1    Enright, A.J.2    Ouzounis, C.3
  • 72
    • 0031690080
    • Automatic extraction of keywords from scientific text: Application to the knowledge domain of protein families
    • Andrade, M. A., and Valencia, A. 1998. Automatic extraction of keywords from scientific text: Application to the knowledge domain of protein families, Bioinformatics 14: 600-607.
    • (1998) Bioinformatics , vol.14 , pp. 600-607
    • Andrade, M.A.1    Valencia, A.2
  • 73
    • 0022786956
    • Fish oil, Raynaud's syndrome, and undiscovered public knowledge
    • Swanson, D. R. 1986. Fish oil, Raynaud's syndrome, and undiscovered public knowledge, in Perspectives in Biology and Medicine 30: 7-18.
    • (1986) Perspectives in Biology and Medicine , vol.30 , pp. 7-18
    • Swanson, D.R.1
  • 74
    • 0031125707
    • An interactive system for finding complementary literatures: A stimulus to scientific discovery
    • Swanson, D. R., and Smalheiser, N. R. 1997. An interactive system for finding complementary literatures: A stimulus to scientific discovery, Artificial Intelligence 91(1): 183-203.
    • (1997) Artificial Intelligence , vol.91 , Issue.1 , pp. 183-203
    • Swanson, D.R.1    Smalheiser, N.R.2
  • 75
    • 0032795144
    • MedMiner: An Internet text-mining tool for biomedical information with application to gene expression profiling
    • Tanabe, L., Scherf, U., and Smith, L.H. 1999. MedMiner: An Internet text-mining tool for biomedical information with application to gene expression profiling, Biotechniques 27: 1210-1217.
    • (1999) Biotechniques , vol.27 , pp. 1210-1217
    • Tanabe, L.1    Scherf, U.2    Smith, L.H.3
  • 76
    • 0035042776
    • A literature network of human genes for high-throughput analysis of gene expression
    • Jenssen, T. K., Laegreid, A., Komorowski, J., and Hovig, E. 2001. A literature network of human genes for high-throughput analysis of gene expression, Nature Genetics 28: 21-28.
    • (2001) Nature Genetics , vol.28 , pp. 21-28
    • Jenssen, T.K.1    Laegreid, A.2    Komorowski, J.3    Hovig, E.4
  • 77
    • 0033655017
    • Biobibliometrics: Information retrieval and visualization from co-occurrences of gene names in medical abstracts
    • Stapley, B. J., and Benoit, G. 2000. Biobibliometrics: Information retrieval and visualization from co-occurrences of gene names in medical abstracts, Pacific Symposium on Biocomputing 5: 529-540.
    • (2000) Pacific Symposium on Biocomputing , vol.5 , pp. 529-540
    • Stapley, B.J.1    Benoit, G.2
  • 79
    • 0036366771
    • Creating knowledge repositories from biomedical reports: The MEDSYNDIKATE text mining system
    • Hahn, U., Romacker, M., and Schulz, S. 2002. Creating knowledge repositories from biomedical reports: The MEDSYNDIKATE text mining system, Pacific Symposium on Biocomputing 7: 338-349.
    • (2002) Pacific Symposium on Biocomputing , vol.7 , pp. 338-349
    • Hahn, U.1    Romacker, M.2    Schulz, S.3
  • 80
    • 0032211335
    • Using ARROWSMITH: A computer assisted approach to formulating and assessing scientific hypotheses
    • Smalheiser, N. R., and Swanson, D. R. 1998. Using ARROWSMITH: A computer assisted approach to formulating and assessing scientific hypotheses, Computer Methods and Programs in Biomedicine 57: 149-153.
    • (1998) Computer Methods and Programs in Biomedicine , vol.57 , pp. 149-153
    • Smalheiser, N.R.1    Swanson, D.R.2
  • 81
    • 19944410243
    • Online database
    • GeneCards, Online database at http://bioinfo.weizmann.ac.il/cards
  • 83
    • 0000259511
    • Approximate statistical tests for comparing supervised classification learning algorithms
    • Dietterich, T. 1998. Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation 10(7): 1895-1924.
    • (1998) Neural Computation , vol.10 , Issue.7 , pp. 1895-1924
    • Dietterich, T.1
  • 84
    • 0028806048
    • Quantitative monitoring of gene expression patterns with a complementary DNA microarray
    • Schena, M., Shalon, D., Davis, R. W., and Brown, P. 1995. Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270: 467-470.
    • (1995) Science , vol.270 , pp. 467-470
    • Schena, M.1    Shalon, D.2    Davis, R.W.3    Brown, P.4
  • 85
    • 0030669030
    • Exploring the metabolic and genetic control of gene expression on a genomic scale
    • DeRisi, J., Iyer, V., and Brown, P. 1997. Exploring the metabolic and genetic control of gene expression on a genomic scale, Science 278: 680-686.
    • (1997) Science , vol.278 , pp. 680-686
    • DeRisi, J.1    Iyer, V.2    Brown, P.3
  • 86
    • 19944391399
    • OMIM, Online Mendelian Inheritance in Man at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
  • 87
    • 0642282545
    • Automatic ontology construction from the literature
    • Blaschke, C., and Valencia, A. 2002. Automatic Ontology Construction from the Literature, Genome Informatics Series 13: 201-213.
    • (2002) Genome Informatics Series , vol.13 , pp. 201-213
    • Blaschke, C.1    Valencia, A.2
  • 89
    • 19944364616
    • Statistical issues in Microarray Analysis
    • Sabatti, C. 2003. Statistical issues in Microarray Analysis, Current Genomics.
    • (2003) Current Genomics
    • Sabatti, C.1
  • 91
    • 0036796319
    • Using text analysis to identify functionally coherent gene groups
    • Raychaudhuri, S., Schütze, H., and Altman, R. B. 2002. Using text analysis to identify functionally coherent gene groups, Genome Research 12: 1582-1590.
    • (2002) Genome Research , vol.12 , pp. 1582-1590
    • Raychaudhuri, S.1    Schütze, H.2    Altman, R.B.3
  • 95
    • 19944405527
    • Gene Ontology, at http://www.geneontology.org


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.