



Volume 13, Issue 1, 2012

A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

Author keywords

[No Author keywords available]

Indexed keywords

EVALUATION TOOL; JOURNAL ARTICLES; LINGUISTIC ANNOTATIONS; NAMED ENTITY RECOGNITION; NATURAL LANGUAGE PROCESSING; NATURAL LANGUAGE PROCESSING TOOLS; POOR PERFORMANCE; RULE SET; SYNTACTIC PARSING; TOKENIZATION; TRAINABLE SYSTEM

EID: 84865060319     PISSN: None     EISSN: 14712105     Source Type: Journal    
DOI: 10.1186/1471-2105-13-207     Document Type: Article
Times cited: 100

References (73)
  • 3
    • EID: 67651205715
    • Raychaudhuri S, Plenge RM, Rossin EJ, Ng ACY, Purcell SM, Sklar P, Scolnick EM, Xavier RJ, Altshuler D, Daly MJ. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet 2009, 5(6):e1000534. doi:10.1371/journal.pgen.1000534; PMCID: PMC2694358; PMID: 19557189.
  • 4
    • EID: 77957140630
    • Cohen KB, Johnson HL, Verspoor K, Roeder C, Hunter L. The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinf 2010, 11:492. doi:10.1186/1471-2105-11-492.
  • 5
    • EID: 33847095048
    • Clegg A, Shepherd A. Benchmarking natural-language parsers for biological applications using dependency graphs. BMC Bioinf 2007, 8:24. doi:10.1186/1471-2105-8-24.
  • 6
    • EID: 67949095857
    • Verspoor K, Cohen KB, Hunter L. The textual characteristics of traditional and Open Access scientific journals are similar. BMC Bioinf 2009, 10:183. doi:10.1186/1471-2105-10-183.
  • 7
    • EID: 0034868773
    • Blaschke C, Valencia A. Can bibliographic pointers for known biological data be found automatically? Protein interactions as a case study. Comp Funct Genomics 2001, 2(4):196-206. doi:10.1002/cfg.91; PMCID: PMC2447212; PMID: 18628915.
  • 8
    • EID: 0642368715
    • Shah PK, Perez-Iratxeta C, Bork P, Andrade MA. Information extraction from full text scientific articles: where are the keywords? BMC Bioinf 2003, 4:20. doi:10.1186/1471-2105-4-20.
  • 9
    • EID: 10244245424
    • Corney DP, Buxton BF, Langdon WB, Jones DT. BioRAT: extracting biological information from full-length papers. Bioinformatics 2004, 20(17):3206-3213. doi:10.1093/bioinformatics/bth386; PMID: 15231534.
  • 10
    • EID: 0035236048
    • Friedman C, Kra P, Yu H, Krauthammer M, Rzhetsky A. GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics 2001, 17(Suppl 1):S74-S82. doi:10.1093/bioinformatics/17.suppl_1.S74; PMID: 11472995.
  • 11
    • EID: 0036678776
    • Tanabe L, Wilbur WJ. Tagging gene and protein names in biomedical text. Bioinformatics 2002, 18(8):1124-1132. doi:10.1093/bioinformatics/18.8.1124; PMID: 12176836.
  • 15
    • EID: 34247370651
    • Ide NC, Loane RF, Demner-Fushman D. Essie: A concept-based search engine for structured biomedical text. J Am Med Inf Assoc 2007, 14:253-263.
  • 16
    • EID: 59549091756
    • Miyao Y, Sagae K, Saetre R, Matsuzaki T, Tsujii J. Evaluating contributions of natural language parsers to protein-protein interaction extraction. Bioinformatics 2009, 25(3):394-400. doi:10.1093/bioinformatics/btn631; PMCID: PMC2639072; PMID: 19073593.
  • 18
    • EID: 67650428776
    • Grover C, Lapata M, Lascarides A. A comparison of parsing techniques for the biomedical domain. Nat Language Eng 2003, 1:1-38.
  • 19
    • EID: 33646122664
    • Pyysalo S, Ginter F, Pahikkala T, Boberg J, Järvinen J, Salakoski T. Evaluation of two dependency parsers on biomedical corpus targeted at protein-protein interactions. Int J Med Inf 2006, 75(6):430-442.
  • 22
    • EID: 4944229711
    • Kim JD, Ohta T, Tateisi Y, Tsujii J. GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics 2003, 19(Suppl 1):180-182.
  • 24
    • EID: 36048999532
    • Cohen KB, Fox L, Ogren PV, Hunter L. Empirical data on corpus design and usage in biomedical natural language processing. AMIA 2005 symposium proceedings 2005, 156-160.
  • 25
    • EID: 33947250174
    • Tanabe L, Xie N, Thom L, Matten W, Wilbur W. GENETAG: a tagged corpus for gene/protein named entity recognition. BMC Bioinf 2005, 6(Suppl 1):S3.
  • 27
    • EID: 85032610029
    • Szarvas G, Vincze V, Farkas R, Csirik J. The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing 2008, 38-45. Association for Computational Linguistics, Columbus, Ohio. [http://www.aclweb.org/anthology/W/W08/W08-0606].
  • 29
    • EID: 0012941680
    • Santorini B. Part-of-Speech Tagging Guidelines for the Penn Treebank Project, 3rd revision. 1990. [http://repository.upenn.edu/cis_reports/570/].
  • 31
    • EID: 84868152086
    • Mott J, Warner C, Bies A, Taylor A. Supplementary Guidelines for English Translation Treebank 2.0. 2009. [projects.ldc.upenn.edu/gale/task_specifications/ettb_guidelines.pdf].
  • 33
    • EID: 84874603080
    • Taylor A. Treebank 2a guidelines. 2006. [http://www-users.york.ac.uk/lang22/TB2a_Guidelines.htm].
  • 37
    • EID: 22044443709
    • Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol 2005, 6(5):r44. doi:10.1186/gb-2005-6-5-r44; PMCID: PMC1175956; PMID: 15892872.
  • 38
    • EID: 78651317908
    • Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2011, 39(Suppl 1):D52-D57. [http://nar.oxfordjournals.org/content/39/suppl_1/D52.abstract]; PMCID: PMC3013746; PMID: 21115458.
  • 41
    • EID: 41349103793
    • Morgan AA, Cohen KB, Hirschman L, et al. Overview of BioCreative II gene normalization. Genome Biol 2008, 9(Suppl 2):S3. doi:10.1186/gb-2008-9-s2-s3; PMCID: PMC2559987; PMID: 18834494.
  • 42
    • EID: 33947359082
    • Yeh A, Morgan A, Colosimo M, Hirschman L. BioCreative task 1A: gene mention finding evaluation. BMC Bioinf 2005, 6(Suppl 1).
  • 45
    • EID: 84898970714
    • Klein D, Manning C. Fast exact inference with a factored model for natural language parsing. Adv Neural Inf Process Syst 2003, 15:3-10.
  • 51
    • EID: 84874590703
    • PubMed Central Open Access Collection. [http://www.ncbi.nlm.nih.gov/pmc/about/openftlist.html].
  • 52
    • EID: 51449084496
    • Sekine S, Collins MJ. The Evalb software. 1997. [http://cs.nyu.edu/cs/projects/proteus/evalb].
  • 54
    • EID: 85119978248
    • Ogren P. Knowtator: a Protege plugin for annotated corpus construction. HLT-NAACL 2006 Companion Volume 2006a.
  • 55
    • EID: 84958739392
    • Noy NF, Fergerson RW, Musen MA. The knowledge model of Protege-2000: Combining interoperability and flexibility. 2000, 17-32. Springer-Verlag, London, UK.
  • 56
    • EID: 25144520247
    • Settles B. ABNER: an open source tool for automatically tagging genes, proteins, and other entity names in text. Bioinformatics 2005, 21(14):3191-3192. doi:10.1093/bioinformatics/bti475; PMID: 15860559.
  • 57
    • EID: 40549140499
    • Leaman R, Gonzalez G. BANNER: An executable survey of advances in biomedical named entity recognition. Pac Symp Biocomput 2008.
  • 58
    • EID: 84868089570
    • Carpenter B. Phrasal Queries with LingPipe and Lucene. 2004.
  • 60
    • EID: 51849163865
    • Baldridge J, Morton T, Bierner G. The opennlp maximum entropy package. 2002. Technical report, SourceForge.
  • 61
    • EID: 7444255622
    • Ferrucci D, Lally A. Building an example application with the unstructured information management architecture. IBM Syst J 2004, 43(3):455-475.
  • 62
    • EID: 34249852033
    • Marcus MP, Marcinkiewicz MA, Santorini B. Building a large annotated corpus of English: the Penn Treebank. Comput Linguistics 1993, 19(2):313-330.
  • 63
    • EID: 84868122038
    • Apache UIMA ConceptMapper Annotator Documentation. 2009. Tech. rep., The Apache Software Foundation.
  • 65
    • EID: 85117236284
    • Bikel D. A distributional analysis of a lexicalized statistical parsing model. Proc. of EMNLP, Volume 4 2004, 182-189.
  • 68
    • EID: 77954203449
    • McClosky D. Any Domain Parsing: Automatic Domain Adaptation for Natural Language Parsing. 2009. PhD thesis, Brown University, Department of Computer Science.
  • 71
    • EID: 84868134211
    • Choi JD, Nicolov N. K-best, Locally Pruned, Transition-based Dependency Parsing Using Robust Risk Minimization. Collections of Recent Advances in Natural Language Processing V 2009, 205-216. John Benjamins.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.