Volume 28, Issue 1, 2007, Pages 37-78

Classifying web documents in a hierarchy of categories: A comprehensive study

Author keywords

Feature selection; Hierarchical models; Performance evaluation; Supervised learning; Text categorization; Web content mining

Indexed keywords

DATA MINING; HIERARCHICAL SYSTEMS; INFORMATION RETRIEVAL SYSTEMS; KNOWLEDGE REPRESENTATION; TEXT PROCESSING; WORLD WIDE WEB

EID: 33846979476     PISSN: 09259902     EISSN: 15737675     Source Type: Journal    
DOI: 10.1007/s10844-006-0003-2     Document Type: Article
Times cited: 99

References (58)
  • 1
    • 0028461417
    • Automated learning of decision rules for text categorization
    • Apté, C., Damerau, F., & Weiss, S. M. (1994). Automated learning of decision rules for text categorization. Information Systems, 12(3), 233-251.
    • (1994) Information Systems , vol.12 , Issue.3 , pp. 233-251
    • Apté, C.1    Damerau, F.2    Weiss, S.M.3
  • 2
    • 0004007675
    • Assessing the calibration of Naive Bayes' posterior estimates
    • CMU-CS-00-155. Technical report, School of Computer Science, Carnegie-Mellon University
    • Bennett, P. (2000). Assessing the calibration of Naive Bayes' posterior estimates. CMU-CS-00-155. Technical report, School of Computer Science, Carnegie-Mellon University.
    • (2000)
    • Bennett, P.1
  • 3
    • 11244325978
    • Hierarchical multi-classification
    • S. Džeroski, L. de Raedt & S. Wrobel (Eds.), Edmonton, Canada: University of Alberta
    • Blockeel, H., Bruynooghe, M., Džeroski, S., Ramon, J., & Struyf, J. (2002). Hierarchical multi-classification. In S. Džeroski, L. de Raedt & S. Wrobel (Eds.), Multi-Relational Data Mining 2002 (pp. 21-35). Edmonton, Canada: University of Alberta.
    • (2002) Multi-Relational Data Mining 2002 , pp. 21-35
    • Blockeel, H.1    Bruynooghe, M.2    Džeroski, S.3    Ramon, J.4    Struyf, J.5
  • 4
    • 35248889441
    • Hierarchical classification of HTML documents with WebClassII
    • F. Sebastiani (Ed.), Berlin Heidelberg New York: Springer
    • Ceci, M., & Malerba, D. (2003). Hierarchical classification of HTML documents with WebClassII. In F. Sebastiani (Ed.), Proceedings of ECIR-03, 25th European Conference on Information Retrieval (pp. 57-72). Berlin Heidelberg New York: Springer.
    • (2003) Proceedings of ECIR-03, 25th European Conference on Information Retrieval , pp. 57-72
    • Ceci, M.1    Malerba, D.2
  • 9
    • 0031269184
    • On the optimality of the simple Bayesian classifier under zero-one loss
    • Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2-3), 103-130.
    • (1997) Machine Learning , vol.29 , Issue.2-3 , pp. 103-130
    • Domingos, P.1    Pazzani, M.2
  • 11
    • 85105809948
    • Inductive learning algorithms and representations for text categorization
    • New York: ACM
    • Dumais, S., Platt, J., Heckerman, D., & Sahami, M. (1998). Inductive learning algorithms and representations for text categorization. In Proceedings of ACM-CIKM98 (pp. 148-155). New York: ACM.
    • (1998) Proceedings of ACM-CIKM98 , pp. 148-155
    • Dumais, S.1    Platt, J.2    Heckerman, D.3    Sahami, M.4
  • 12
    • 33847006369
    • Classical resemblance measures
    • Esposito, F., Malerba, D., Tamma, V., & Bock, H.-H. (2000). Classical resemblance measures. In H.-H. Bock & E. Diday (Eds.), Analysis of Symbolic Data. Exploratory methods for extracting statistical information from complex data, volume 15 of Studies in Classification, Data Analysis, and Knowledge Organization (Chapter 8.1, pp. 139-152). Berlin Heidelberg New York: Springer.
  • 17
    • 0002409860
    • A probabilistic analysis of the Rocchio algorithm with tfidf for text categorization
    • San Mateo, California: Morgan Kaufmann
    • Joachims, T. (1997). A probabilistic analysis of the Rocchio algorithm with tfidf for text categorization. In ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning (pp. 143-151). San Mateo, California: Morgan Kaufmann.
    • (1997) ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning , pp. 143-151
    • Joachims, T.1
  • 19
    • 84957069814
    • Text categorization with support vector machines: Learning with many relevant features
    • Berlin Heidelberg New York: Springer
    • Joachims, T. (1998b). Text categorization with support vector machines: Learning with many relevant features. In ECML '98: Proceedings of the 10th European Conference on Machine Learning (pp. 137-142). Berlin Heidelberg New York: Springer.
    • (1998) ECML '98: Proceedings of the 10th European Conference on Machine Learning , pp. 137-142
    • Joachims, T.1
  • 20
    • 84947928349
    • Effective methods for improving Naive Bayes text classifier
    • 7th International Conference on Artificial Intelligence, volume 2417 of LNAI, Berlin Heidelberg New York: Springer
    • Kim, S., Rim, H., Yook, D., & Lim, H. (2002). Effective methods for improving Naive Bayes text classifier. In 7th International Conference on Artificial Intelligence, volume 2417 of LNAI (pp. 95-106). Berlin Heidelberg New York: Springer.
    • (2002) LNAI , vol.2417 , pp. 95-106
    • Kim, S.1    Rim, H.2    Yook, D.3    Lim, H.4
  • 23
    • 84957069091
    • Naive (Bayes) at forty: The independence assumption in information retrieval
    • Berlin Heidelberg New York: Springer
    • Lewis, D. D. (1998). Naive (Bayes) at forty: The independence assumption in information retrieval. In ECML '98: Proceedings of the 10th European Conference on Machine Learning (pp. 4-15). Berlin Heidelberg New York: Springer.
    • (1998) ECML '98: Proceedings of the 10th European Conference on Machine Learning , pp. 4-15
    • Lewis, D.D.1
  • 26
    • 0001673996
    • A comparison of event models for Naive Bayes text classification
    • Menlo Park, California: AAAI
    • McCallum, A., & Nigam, K. (1998). A comparison of event models for Naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization (pp. 41-48). Menlo Park, California: AAAI.
    • (1998) AAAI-98 Workshop on Learning for Text Categorization , pp. 41-48
    • McCallum, A.1    Nigam, K.2
  • 30
    • 33646775496
    • Conditions for the equivalence of hierarchical and flat Bayesian classifiers
    • Technical report, Center for Automated Learning and Discovery, Carnegie-Mellon University
    • Mitchell, T. (1998). Conditions for the equivalence of hierarchical and flat Bayesian classifiers. Technical report, Center for Automated Learning and Discovery, Carnegie-Mellon University.
    • (1998)
    • Mitchell, T.1
  • 34
    • 0037375142
    • Feature selection on hierarchy of web documents
    • Mladenić, D., & Grobelnik, M. (2003). Feature selection on hierarchy of web documents. Decision Support Systems, 35(1), 45-87.
    • (2003) Decision Support Systems , vol.35 , Issue.1 , pp. 45-87
    • Mladenić, D.1    Grobelnik, M.2
  • 35
    • 0030651099
    • Feature selection, perceptron learning, and a usability case study for text categorization
    • Ng, H. T., Goh, W. B., & Low, K. L. (1997). Feature selection, perceptron learning, and a usability case study for text categorization. SIGIR Forum, 31(SI), 67-73.
    • (1997) SIGIR Forum , vol.31 , Issue.SI , pp. 67-73
    • Ng, H.T.1    Goh, W.B.2    Low, K.L.3
  • 36
    • 0003120218
    • Fast training of Support Vector Machines using sequential minimal optimization
    • B. Schölkopf, C. Burges & A. Smola (Eds.), MIT Press
    • Platt, J. (1998). Fast training of Support Vector Machines using sequential minimal optimization. In B. Schölkopf, C. Burges & A. Smola (Eds.), Advances in Kernel methods - support vector learning. MIT Press.
    • (1998) Advances in Kernel methods - support vector learning
    • Platt, J.1
  • 37
    • 84948481845
    • An algorithm for suffix stripping
    • Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130-137.
    • (1980) Program , vol.14 , Issue.3 , pp. 130-137
    • Porter, M.F.1
  • 39
    • 0242659655
    • Hierarchical text categorization using neural networks
    • Ruiz, M. E., & Srinivasan, P. (2002). Hierarchical text categorization using neural networks. Information Retrieval, 5(1), 87-118.
    • (2002) Information Retrieval , vol.5 , Issue.1 , pp. 87-118
    • Ruiz, M.E.1    Srinivasan, P.2
  • 41
    • 45549117987
    • Term weighting approaches in automatic text retrieval
    • Salton, G., & Buckley, C. (1988). Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513-523.
    • (1988) Information Processing and Management , vol.24 , Issue.5 , pp. 513-523
    • Salton, G.1    Buckley, C.2
  • 42
    • 0033905095
    • Boostexter: A boosting-based system for text categorization
    • Schapire, R. E., & Singer, Y. (2000). Boostexter: A boosting-based system for text categorization. Machine Learning, 39(2-3), 135-168.
    • (2000) Machine Learning , vol.39 , Issue.2-3 , pp. 135-168
    • Schapire, R.E.1    Singer, Y.2
  • 44
    • 0002442796
    • Machine learning in automated text categorization
    • Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1-47.
    • (2002) ACM Computing Surveys , vol.34 , Issue.1 , pp. 1-47
    • Sebastiani, F.1
  • 45
    • 33847000328
    • Improving the performance of Naive Bayes for text classification
    • CS224N spring. Technical report, Stanford University
    • Shen, Y., & Jiang, J. (2003). Improving the performance of Naive Bayes for text classification, CS224N spring. Technical report, Stanford University.
    • (2003)
    • Shen, Y.1    Jiang, J.2
  • 49
    • 4644319661
    • Experiment with a hierarchical text categorization method on the WIPO-alpha patent collection
    • Los Alamitos, California: IEEE Computer Society
    • Tikk, D., & Biro, G. (2003). Experiment with a hierarchical text categorization method on the WIPO-alpha patent collection. In ISUMA '03: Proceedings of the 4th International Symposium on Uncertainty Modelling and Analysis (p. 104). Los Alamitos, California: IEEE Computer Society.
    • (2003) ISUMA '03: Proceedings of the 4th International Symposium on Uncertainty Modelling and Analysis , p. 104
    • Tikk, D.1    Biro, G.2
  • 51
    • 0036498205
    • A probabilistic framework for the hierarchic organisation and classification of document collections
    • Vinokourov, A., & Girolami, M. (2002). A probabilistic framework for the hierarchic organisation and classification of document collections. Journal of Intelligent Information Systems, 18(2-3), 153-172.
    • (2002) Journal of Intelligent Information Systems , vol.18 , Issue.2-3 , pp. 153-172
    • Vinokourov, A.1    Girolami, M.2
  • 53
    • 0030340642
    • An evaluation of statistical approaches to MEDLINE indexing
    • Philadelphia, Pennsylvania: Hanley and Belfus
    • Yang, Y. (1996). An evaluation of statistical approaches to MEDLINE indexing. In Proceedings of the AMIA (pp. 358-362). Philadelphia, Pennsylvania: Hanley and Belfus.
    • (1996) Proceedings of the AMIA , pp. 358-362
    • Yang, Y.1
  • 54
    • 27144441097
    • An evaluation of statistical approaches to text categorization
    • Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1-2), 69-90.
    • (1999) Information Retrieval , vol.1 , Issue.1-2 , pp. 69-90
    • Yang, Y.1
  • 57
    • 1942420344
    • Modified logistic regression: An approximation to SVM and its applications in large-scale text categorization
    • Menlo Park, AAAI Press
    • Zhang, J., Jin, R., Yang, Y., & Hauptmann, A. G. (2003). Modified logistic regression: An approximation to SVM and its applications in large-scale text categorization. In Proceedings of the 20th International Conference on Machine Learning (pp. 888-895). Menlo Park, AAAI Press.
    • (2003) Proceedings of the 20th International Conference on Machine Learning , pp. 888-895
    • Zhang, J.1    Jin, R.2    Yang, Y.3    Hauptmann, A.G.4
  • 58
    • 16644402628
    • Feature selection for text categorization on imbalanced data
    • Zheng, Z., Wu, X., & Srihari, R. (2004). Feature selection for text categorization on imbalanced data. SIGKDD Explorations Newsletter, 6(1), 80-89.
    • (2004) SIGKDD Explorations Newsletter , vol.6 , Issue.1 , pp. 80-89
    • Zheng, Z.1    Wu, X.2    Srihari, R.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.