Volume 58, Issue 12, 2007, Pages 1820-1837

Mining web data for Chinese segmentation

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; IMAGE SEGMENTATION; INDEXING (OF INFORMATION); INFORMATION RETRIEVAL SYSTEMS; LINGUISTICS; WEBSITES;

EID: 35348893799     PISSN: 15322882     EISSN: 15322890     Source Type: Journal    
DOI: 10.1002/asi.20629     Document Type: Article
Times cited: 8

References (61)
  • 1
    • 35348877545
    • Agirre, E., Ansa, O., Hovy, E., & Martinez, D. (2002). Enriching very large Ontologies using the WWW. In Proceedings of the First Workshop on Ontology Learning (OL'2000), Berlin, Germany, August 2000, CEUR Workshop Proceedings Series, Vol. 31, Sun SITE Central Europe (CEUR), Aachen, Germany (pp. 73-77). Held in conjunction with the 14th European Conference on Artificial Intelligence ECAI'2000, Berlin, Germany.
  • 2
    • 35348928330
    • Baidu. (2006). Retrieved from http://www.baidu.com
  • 3
    • 9444233154
    • Banko, M., Brill, E., Dumais, S., & Lin, J. (2002). AskMSR: Question answering using the Worldwide Web. In Proceedings of 2002 AAAI Spring Symposium on Mining Answers from Texts and Knowledge Bases, Palo Alto, California, USA, March 2002, American Association for Artificial Intelligence, Menlo Park, California (pp. 7-8).
  • 4
    • 35348919050
    • Brants, T., & Franz, A. (2006). Web 1T 5-Gram Version 1, Linguistic Data Consortium, Philadelphia. Retrieved from http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13
  • 5
    • 0038589165
    • Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the Seventh International World Wide Web Conference (WWW7), Brisbane, Australia, April 1998 (pp. 107-117). Amsterdam, Netherlands: Elsevier Science B.V.
  • 6
    • 35348915607
    • Cambridge University Library. (2006). Chinese classification system. Retrieved from http://www.lib.cam.ac.uk/mulu/class.html
  • 9
    • 26444433846
    • Chen, K.J., & Ma, W.Y. (2002). Unknown word extraction for Chinese documents. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, August 2002, Association for Computational Linguistics, Morristown, New Jersey, USA (vol. 1, pp. 1-7).
  • 10
    • 85132028005
    • Church, K., & Hanks, P. (1989). Word association norms, mutual information, and lexicography. In Proceedings of 27th Annual Meeting of the Association for Computational Linguistics (ACL-89), Vancouver, British Columbia, Canada, June 1989, Association for Computational Linguistics, Morristown, New Jersey, USA (pp. 76-83).
  • 13
    • 0001368422
    • Edmundson, H. (1968). New method in automatic extraction. Journal of the ACM, 16(2), 264-285.
  • 18
    • 35348894614
    • Google. (2005). Retrieved from http://www.google.com
  • 19
    • 33748088137
    • Hawking, D. (2006). Web search engines: Part 2. Computer, 39(8), 88-90.
  • 21
    • 35348821436
    • Keller, F., Maria, L., & Olga, O. (2002). Using the Web to overcome data sparseness. In Jan Hajic and Yuji Matsumoto (Eds.). In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Philadelphia, Pennsylvania USA, July 2002, Association for Computational Linguistics, Morristown, New Jersey, USA, (Vol. 10, pp. 230-237)
  • 22
    • 35348917794
    • Kit, C., Pan, H., & Chen, H. (2002). Learning case-based knowledge for disambiguating Chinese word segmentation: A preliminary study. In Proceedings of the first SIGHAN Workshop on Chinese Language Processing, Taipei, Taiwan, September 2002, Association for Computational Linguistics, Morristown, New Jersey, USA (Vol. 18, pp. 1-7).
  • 25
    • 35348927690
    • Language Information Sciences Research Center. (2006). The most common Chinese new words in 2004. Hong Kong: City University of Hong Kong.
  • 26
    • 84989404489
    • Leung, C.H., & Kan, W.K. (1996). A statistical learning approach to improving the accuracy of Chinese word segmentation. Literary and Linguistic Computing, 11(2), 87-92.
  • 27
    • 26444536619
    • Li, H.Q., Huang, C.N., Gao, J.F., & Fan, X.Z. (2004). The use of SVM for Chinese new word identification. In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP-04), Sanya City, Hainan Island, China, March 2004, Lecture Notes in Computer Science Series, Vol. 3248, Springer, 2005, Springer, Berlin, Germany, (pp. 723-732).
  • 28
    • 33646429607
    • Li, M., Gao, J., Huang, C., & Li, J. (2003). Unsupervised training for over-lapping ambiguity resolution in Chinese word segmentation. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan, July 2003, Association for Computational Linguistics, Morristown, New Jersey, USA (vol. 17, pp. 1-7).
  • 29
    • 35348919688
    • Library of Congress. (2004). New Chinese romanization guidelines. Retrieved from http://www.loc.gov/catdir/cpso/romanization/chinese.pdf
  • 30
    • 0000880768
    • Luhn, H. (1958). Automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2), 159-165.
  • 33
    • 33749541689
    • Mihalcea, R., & Moldovan, D. (1999). A method for word sense disambiguation of unrestricted text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL-99), College Park, Maryland, USA, June 1999, Association for Computational Linguistics, Morristown, New Jersey, USA (pp. 152-158).
  • 35
    • 35348925587
    • National Library of Australia. (2002). Kinetica Chinese Japanese Korean (CJK) service report. Retrieved from http://www.nla.gov.au/librariesaustralia/cjk/events/cjkreport2002.html
  • 36
    • 35348881690
    • National People's Congress. (1958). Han yu pin yin fang an. Beijing: Author.
  • 37
    • 35348912111
    • Nie, J.Y., Jin, W., Hannaan, M.L. (1994). A hybrid approach to unknown word detection and segmentation of Chinese. In Proceedings of International Conference on Chinese Computing (ICCC-94), Singapore, June 1994, National University of Singapore, Singapore, (pp. 326-335).
  • 40
    • 0004217877
    • Rijsbergen, C.V. (1979). Information retrieval (2nd ed.). Glasgow: Department of Computer Science, University of Glasgow.
  • 41
    • 45549117987
    • Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24, 513-523.
  • 42
    • 34248897035
    • Sampson, G. (1989). How fully does a machine-usable dictionary cover English text? Literary and Linguistic Computing, 4, 29-35.
  • 44
    • 6344253989
    • Sproat, R., & Emerson, T. (2003). The first international Chinese word segmentation bakeoff. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan, July 2003, Association for Computational Linguistics, Morristown, New Jersey, USA (vol. 17, pp. 133-143).
  • 47
    • 0001076101
    • Sproat, R., Shih, C., Gale, W., & Chang, N. (1996). A Stochastic Finite-State Word-Segmentation Algorithm for Chinese. Computational Linguistics, 22(3), 377-404.
  • 49
    • 84872841506
    • Sun, M.S., Shen, D.Y., & T'sou, B.K. (1998). Chinese word segmentation without using lexicon and hand-crafted training data. In Proceedings of the 17th International Conference on Computational Linguistics (COLING 1998), Montreal, Quebec, Canada, August 1998, Association for Computational Linguistics, Morristown, New Jersey, USA (vol. 2, pp. 1265-1271).
  • 50
    • 0001277731
    • Teahan, W.J., Wen, Y.Y., McNab, R., & Witten, I. (2000). A compression-based algorithm for Chinese word segmentation. Computational Linguistics, 26(3), 375-393.
  • 51
    • 35348914947
    • Tsai, J.L., Sung, C.L., & Hsu, W.L. (2003). Chinese word auto-confirmation agent. In Proceedings of Fifth Conference on Computational Linguistics and Speech Processing (ROCLING XV), Hsinchu, Taiwan, September 2003, Association for Computational Linguistics and Chinese Language Processing, Taipei, Taiwan (pp. 175-192).
  • 53
    • 16244408434
    • Wettler, M., & Rapp, R. (1993). Computation of word associations based on the co-occurrences of words in large corpora. In Proceedings of the First Workshop on Very Large Corpora (WVLC-1), Columbus, Ohio, June 1993, Association for Computational Linguistics, Morristown, New Jersey, USA (pp. 84-93).
  • 54
    • 84989592173
    • Wu, Z., & Tseng, G. (1993). Chinese text segmentation for text retrieval: Achievements and problems. Journal of the American Society for Information Science, 44(9), 532-542.
  • 55
    • 35348855226
    • Xia, F. (1999). Segmentation guideline. Chinese Treebank Project. Technical Report, University of Pennsylvania. Retrieved from http://morph/ldc.upenn.edu/ctb/
  • 56
    • 0346955705
    • Yang, C.C., & Li, K.W. (2003). Segmenting Chinese unknown words by heuristic method. In Proceedings of the International Conference on Asia Digital Libraries, Malaysia, December 2003, Lecture Notes in Computer Science, vol. 2911, Springer, Berlin, Germany (pp. 510-520).
  • 57
    • 4944220726
    • Yang, C.C., & Li, K.W. (2004). Error analysis of Chinese text segmentation using statistical approach. In Proceedings of ACM/IEEE Joint Conference on Digital Libraries, Tucson, Arizona, USA, June 2004 (pp. 256-257). New York, USA: ACM Press.
  • 60
    • 35348908554
    • Yang, J., Senellart, J., & Zajac, R. (2003). SYSTRAN'S Chinese word segmentation. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan, July 2003, Association for Computational Linguistics, Morristown, New Jersey, USA (vol. 17, pp. 180-183).


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.