



Volume 8, 2007, Pages 2297-2345

Harnessing the expertise of 70,000 human editors: Knowledge-based feature generation for text categorization

Author keywords

Background knowledge; Feature generation; Text classification

Indexed keywords

ALGORITHMS; DATA ACQUISITION; FILE EDITORS; KNOWLEDGE BASED SYSTEMS; LEARNING SYSTEMS; ONTOLOGY;

EID: 35748946646     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited : (46)

References (114)
  • 1
    • 27844439373
    • Rie Kubota Ando and Tong Zhang. Framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, pages 1817-1853, 2005a.
  • 2
    • 84859921107
    • Rie Kubota Ando and Tong Zhang. A high-performance semi-supervised learning method for text chunking. In Proceedings of the 43rd Annual Meeting of the ACL, pages 1-9, Ann Arbor, MI, June 2005b.
  • 3
    • 0032264186
    • Douglas Baker and Andrew K. McCallum. Distributional clustering of words for text classification. In Bruce Croft, Alistair Moffat, Cornelis J. Van Rijsbergen, Ross Wilkinson, and Justin Zobel, editors, Proceedings of the 21st ACM International Conference on Research and Development in Information Retrieval, pages 96-103, Melbourne, AU, 1998. ACM Press, New York, US. URL http://www.cs.cmu.edu/mccallum/papers/clustering-sigir98.ps.gz.
  • 8
    • 22044453925
    • Paul N. Bennett, Susan T. Dumais, and Eric Horvitz. The combination of text classifiers using reliability indicators. Information Retrieval, 8(1):67-100, 2005.
  • 11
    • 35748947125
    • Stephan Bloehdorn and Andreas Hotho. Boosting for text classification with semantic features. In Proceedings of the MSW 2004 Workshop at the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 70-87, 2004.
  • 13
    • 84867919822
    • Eric Brill. Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging. Computational Linguistics, 21(4):543-565, 1995.
  • 16
    • 35748977017
    • Maria Fernanda Caropreso, Stan Matwin, and Fabrizio Sebastiani. A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization. In Amita G. Chin, editor, Text Databases and Document Management: Theory and Practice, pages 78-102. Idea Group Publishing, Hershey, US, 2001. URL http://faure.iei.pi.cnr.it/fabrizio/Publications/TD01a/TD01a.pdf.
  • 17
    • 0031189914
    • Rich Caruana. Multitask learning. Machine Learning, 28(1):41-75, 1997.
  • 27
    • 0000017646
    • Gerald DeJong and Raymond Mooney. Explanation-based learning: An alternative view. Machine Learning, 1(2):145-176, 1986.
  • 28
    • 35748937843
    • Melvil Dewey, Joan S. Mitchell, Julianne Beall, Giles Martin, Winton E. Matthews, and Gregory R. New, editors. Dewey Decimal Classification and Relative Index. Online Computer Library Center (OCLC), 22nd edition, 2003.
  • 29
    • 2942723846
    • Inderjit Dhillon, Subramanyam Mallela, and Rahul Kumar. A divisive information-theoretic feature clustering algorithm for text classification. Journal of Machine Learning Research, 3:1265-1287, March 2003. URL http://www.jmlr.org/papers/volume3/dhillon03a/dhillon03a.pdf.
  • 37
    • 0004289791
    • Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998.
  • 39
    • 14344259210
    • Evgeniy Gabrilovich and Shaul Markovitch. Text categorization with many redundant features: Using aggressive feature selection to make SVMs competitive with C4.5. In Proceedings of the 21st International Conference on Machine Learning, pages 321-328, 2004.
  • 41
    • 33750719969
    • Evgeniy Gabrilovich and Shaul Markovitch. Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge. In Proceedings of the 21st National Conference on Artificial Intelligence, pages 1301-1306, July 2006.
  • 45
    • 35748966590
    • Eui-Hong (Sam) Han and George Karypis. Centroid-based document classification: Analysis and experimental results. In Proceedings of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2000.
  • 49
    • 34447620746
    • David A. Hull. Improving text retrieval for the routing problem using latent semantic indexing. In W. Bruce Croft and Cornelis J. Van Rijsbergen, editors, Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval, pages 282-289, Dublin, Ireland, 1994. Springer Verlag, Heidelberg, Germany. URL http://www.acm.org/pubs/articles/proceedings/ir/188490/p282-hull/p282-hull.pdf.
  • 51
    • 84957069814
    • Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the European Conference on Machine Learning, pages 137-142, 1998.
  • 52
    • 0002714543
    • Thorsten Joachims. Making large-scale SVM learning practical. In B. Schoelkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 169-184. The MIT Press, 1999a.
  • 59
    • 0036161242
    • Edda Leopold and Joerg Kindermann. Text categorization with support vector machines: How to represent texts in input space. Machine Learning, 46:423-444, 2002.
  • 63
    • 84876811202
    • David D. Lewis, Yiming Yang, Tony Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361-397, 2004.
  • 64
    • 0036778917
    • Shaul Markovitch and Danny Rosenstein. Feature generation using general constructor functions. Machine Learning, 49(1):59-98, 2002.
  • 67
    • 35748973139
    • Ia McIlwaine. The Universal Decimal Classification: Guide to its Use. UDC Consortium, 2000.
  • 68
    • 36448953563
    • MeSH. Medical subject headings (MeSH). National Library of Medicine, 2003. http://www.nlm.nih.gov/mesh.
  • 69
    • 0141611059
    • Rada Mihalcea. Turning WordNet into an information retrieval resource: Systematic polysemy and conversion to hierarchical codes. International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), 17(1):689-704, 2003.
  • 70
    • 35748942459
    • Andrei Mikheev. Feature lattices and maximum entropy models. Information Retrieval, 1999.
  • 71
    • 0003268983
    • Tom Mitchell, Richard Keller, and Smadar Kedar-Cabelli. Explanation-based generalization: A unifying view. Machine Learning, 1(1):47-80, 1986.
  • 75
    • 85140468046
    • Patrick M. Murphy and Michael J. Pazzani. ID2-of-3: Constructive induction of M-of-N concepts for discriminators in decision trees. In Proceedings of the 8th International Conference on Machine Learning, pages 183-188. Morgan Kaufmann, 1991.
  • 76
    • 0033886806
    • Kamal Nigam, Andrew McCallum, Sebastian Thrun, and Tom Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2-3):103-134, 2000.
  • 77
    • 33750027607
    • Kamal Nigam, Andrew McCallum, and Tom Mitchell. Semi-supervised text classification using EM. In Olivier Chapelle, Bernhard Schoelkopf, and Alexander Zien, editors, Semi-Supervised Learning. MIT Press, Boston, MA, 2006.
  • 78
    • 0025389210
    • Giulia Pagallo and David Haussler. Boolean feature discovery in empirical learning. Machine Learning, 5(1):71-99, 1990. ISSN 0885-6125.
  • 81
    • 3843083955
    • Fuchun Peng, Dale Schuurmans, and Shaojun Wang. Augmenting naive Bayes classifiers with statistical language models. Information Retrieval, 7(3-4):317-345, 2004.
  • 82
    • 84948481845
    • Martin Porter. An algorithm for suffix stripping. Program, 14(3):130-137, 1980.
  • 85
    • 84948187054
    • Bhavani Raskutti, Herman Ferra, and Adam Kowalczyk. Second order features for maximizing text classification performance. In L. De Raedt and P. Flach, editors, Proceedings of the European Conference on Machine Learning (ECML), Lecture notes in Artificial Intelligence (LNAI) 2167, pages 419-430. Springer-Verlag, 2001.
  • 90
    • 0242659655
    • Miguel E. Ruiz and Padmini Srinivasan. Hierarchical text categorization using neural networks. Information Retrieval, 5:87-118, 2002.
  • 91
    • 0346907250
    • Ian Ruthven and Mounia Lalmas. A survey on the use of relevance feedback for information access systems. Knowledge Engineering Review, 18(2):95-145, 2003.
  • 93
    • 45549117987
    • Gerard Salton and Chris Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513-523, 1988.
  • 96
    • 0002442796
    • Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47, 2002. URL http://faure.iei.pi.cnr.it/fabrizio/Publications/ACMCS02.pdf.
  • 98
    • 4043114317
    • Kentaro Toyama and Eric Horvitz. Bayesian modality fusion: Probabilistic integration of multiple vision algorithms for head tracking. In Proceedings of the 4th Asian Conference on Computer Vision, 2000.
  • 101
    • 0002124083
    • Ellen M. Voorhees. Using WordNet for text retrieval. In Christiane Fellbaum, editor, WordNet, an Electronic Lexical Database. The MIT Press, 1998.
  • 107
    • 0003029479
    • Jinxi Xu and W. Bruce Croft. Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems, 18(1):79-112, 2000.
  • 111
    • 84880475213
    • Shipeng Yu, Deng Cai, Ji-Rong Wen, and Wei-Ying Ma. Improving pseudo-relevance feedback in web information retrieval using web page segmentation. In Proceedings of the 12th International World Wide Web Conference (WWW'03), Budapest, Hungary, May 2003. ACM Press.
  • 112
    • 0010253845
    • Sarah Zelikovitz and Haym Hirsh. Improving short-text classification using unlabeled background knowledge to assess document similarity. In Proceedings of the 17th International Conference on Machine Learning, pages 1183-1190, 2000.
  • 114
    • 0002848777
    • Justin Zobel and Alistair Moffat. Exploring the similarity space. ACM SIGIR Forum, 32(1):18-34, 1998.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.