Volume 9, Issue 2, 2003, Pages 127-149

Mostly-unsupervised statistical segmentation of Japanese kanji sequences

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; INFORMATION RETRIEVAL; OPTICAL CHARACTER RECOGNITION; STATISTICAL METHODS; TEXT PROCESSING

EID: 0037600401     PISSN: 13513249     EISSN: None     Source Type: Journal    
DOI: 10.1017/S1351324902002954     Document Type: Article
Times cited: 20

References (45)
  • 1
    • 0037956008
    • Chinese text segmentation with MBDP-I: Making the most of training corpora
    • Brent, M. R. and Tao, X. (2001) Chinese text segmentation with MBDP-I: Making the most of training corpora, Proc. ACL.
    • (2001) Proc. ACL.
    • Brent, M.R.1    Tao, X.2
  • 2
    • 84936824188
    • Word association norms, mutual information, and lexicography
    • Church, K. W. and Hanks, P. (1990) Word association norms, mutual information, and lexicography. Computational Linguistics 16(1): 22-29.
    • (1990) Computational Linguistics , vol.16 , Issue.1 , pp. 22-29
    • Church, K.W.1    Hanks, P.2
  • 4
    • 85029362060
    • A new statistical formula for Chinese text segmentation incorporating contextual information
    • Dai, Y., Loh, T. E. and Khoo, C. S. G. (1999) A new statistical formula for Chinese text segmentation incorporating contextual information. Proc. 22nd SIGIR, pp. 82-89.
    • (1999) Proc. 22nd SIGIR , pp. 82-89
    • Dai, Y.1    Loh, T.E.2    Khoo, C.S.G.3
  • 5
    • 85149140829
    • Japanese morphological analyzer using word co-occurrence - JTAG
    • Fuchi, T. and Takagi, S. (1998) Japanese morphological analyzer using word co-occurrence - JTAG. Proc. COLING-ACL '98, pp. 409-413.
    • (1998) Proc. COLING-ACL '98 , pp. 409-413
    • Fuchi, T.1    Takagi, S.2
  • 6
    • 1542381209
    • Extracting key terms from Chinese and Japanese texts
    • Fung, P. (1998) Extracting key terms from Chinese and Japanese texts. Computer Processing of Oriental Languages 12(1).
    • (1998) Computer Processing of Oriental Languages , vol.12 , Issue.1
    • Fung, P.1
  • 7
    • 84958571083
    • Discovering Chinese words from unsegmented text
    • (Poster abstract.)
    • Ge, X., Pratt, W. and Smyth, P. (1999) Discovering Chinese words from unsegmented text. Proc. 22nd SIGIR, pp. 271-272 (Poster abstract.)
    • (1999) Proc. 22nd SIGIR , pp. 271-272
    • Ge, X.1    Pratt, W.2    Smyth, P.3
  • 8
    • 0041079008
    • Unsupervised learning of the morphology of a natural language
    • Goldsmith, J. (2001) Unsupervised learning of the morphology of a natural language. Computational Linguistics 27(2): 153-198.
    • (2001) Computational Linguistics , vol.27 , Issue.2 , pp. 153-198
    • Goldsmith, J.1
  • 9
    • 0038293492
    • Evaluating parsing strategies using standardized parse files
    • Grishman, R., Macleod, C. and Sterling, J. (1992) Evaluating parsing strategies using standardized parse files. Proc. 3rd ANLP, pp. 156-161.
    • (1992) Proc. 3rd ANLP , pp. 156-161
    • Grishman, R.1    Macleod, C.2    Sterling, J.3
  • 10
    • 0037617943
    • Language modeling by string pattern N-gram for Japanese speech recognition
    • Ito, A. and Kohda, K. (1995) Language modeling by string pattern N-gram for Japanese speech recognition. Proc. ICASSP.
    • (1995) Proc. ICASSP.
    • Ito, A.1    Kohda, K.2
  • 11
    • 0038632333
    • Use of mutual information based character clusters in dictionary-less morphological analysis of Japanese
    • Kashioka, H., Kawata, Y., Kinjo, Y., Finch, A. and Black, E. W. (1998) Use of mutual information based character clusters in dictionary-less morphological analysis of Japanese. Proc. COLING-ACL '98, pp. 658-662.
    • (1998) Proc. COLING-ACL '98 , pp. 658-662
    • Kashioka, H.1    Kawata, Y.2    Kinjo, Y.3    Finch, A.4    Black, E.W.5
  • 12
    • 0038293468
    • Japanese morphological analysis system JUMAN version 3.61 manual
    • In Japanese
    • Kurohashi, S. and Nagao, M. (1999) Japanese morphological analysis system JUMAN version 3.61 manual. In Japanese.
    • (1999)
    • Kurohashi, S.1    Nagao, M.2
  • 13
    • 0008686260
    • Experiments on the use of bigram mutual information for Chinese natural language processing
    • Lua, K.-T. (1995) Experiments on the use of bigram mutual information for Chinese natural language processing. Int. Conf. Computer Processing of Oriental Languages (ICCPOL), pp. 23-25.
    • (1995) Int. Conf. Computer Processing of Oriental Languages (ICCPOL) , pp. 23-25
    • Lua, K.-T.1
  • 15
    • 85158028663
    • Parsing a natural language using information statistics
    • Magerman, D. M. and Marcus, M. P. (1990) Parsing a natural language using information statistics. Proc. AAAI, pp. 984-989.
    • (1990) Proc. AAAI , pp. 984-989
    • Magerman, D.M.1    Marcus, M.P.2
  • 16
    • 0027681165
    • Suffix arrays: A new method for on-line string searches
    • Manber, U. and Myers, G. (1993) Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5): 935-948.
    • (1993) SIAM J. Comput. , vol.22 , Issue.5 , pp. 935-948
    • Manber, U.1    Myers, G.2
  • 20
    • 0037617911
    • Unknown word extraction from corpora using n-gram statistics
    • In Japanese
    • Mori, S. and Nagao, M. (1998) Unknown word extraction from corpora using n-gram statistics. J. Infor. Process. Soc. Japan 39(7): 2093-2100. In Japanese.
    • (1998) J. Infor. Process. Soc. Japan , vol.39 , Issue.7 , pp. 2093-2100
    • Mori, S.1    Nagao, M.2
  • 21
    • 0006702241
    • A new method of N-gram statistics for large number of n and automatic extraction of words and phrases from large text data of Japanese
    • Nagao, M. and Mori, S. (1994) A new method of N-gram statistics for large number of n and automatic extraction of words and phrases from large text data of Japanese. Proc. 15th COLING, pp. 611-615.
    • (1994) Proc. 15th COLING , pp. 611-615
    • Nagao, M.1    Mori, S.2
  • 22
    • 0038293491
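    • A stochastic Japanese morphological analyzer using a forward-DP backward-A* n-best search algorithm
    • Nagata, M. (1994) A stochastic Japanese morphological analyzer using a forward-DP backward-A* n-best search algorithm. Proc. 15th COLING, pp. 201-207.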
    • (1994) Proc. 15th COLING , pp. 201-207
    • Nagata, M.1
  • 23
    • 0037617941
    • Automatic extraction of new words from Japanese texts using generalized forward-backward search
    • Nagata, M. (1996a) Automatic extraction of new words from Japanese texts using generalized forward-backward search. Proc. Conf. Empirical Methods in Natural Language Processing, pp. 48-59.
    • (1996) Proc. Conf. Empirical Methods in Natural Language Processing , pp. 48-59
    • Nagata, M.1
  • 24
    • 0037617940
    • Context-based spelling correction for Japanese OCR
    • Nagata, M. (1996b) Context-based spelling correction for Japanese OCR. Proc. 16th COLING, pp. 806-811.
    • (1996) Proc. 16th COLING , pp. 806-811
    • Nagata, M.1
  • 25
    • 0038632330
    • A self-organizing Japanese word segmenter using heuristic word identification and re-estimation
    • Nagata, M. (1997) A self-organizing Japanese word segmenter using heuristic word identification and re-estimation. Proc. 5th Workshop on Very Large Corpora, pp. 203-215.
    • (1997) Proc. 5th Workshop on Very Large Corpora , pp. 203-215
    • Nagata, M.1
  • 26
    • 0033349349
    • Overlapping statistical segmentation for effective indexing of Japanese text
    • Ogawa, Y. and Matsuda, T. (1999) Overlapping statistical segmentation for effective indexing of Japanese text. Infor. Process. & Manage. 35, 463-480.
    • (1999) Infor. Process. & Manage , vol.35 , pp. 463-480
    • Ogawa, Y.1    Matsuda, T.2
  • 27
    • 85072855288
    • A trainable rule-based algorithm for word segmentation
    • Palmer, D. (1997) A trainable rule-based algorithm for word segmentation. Proc. 35th ACL/8th EACL, pp. 321-328.
    • (1997) Proc. 35th ACL/8th EACL , pp. 321-328
    • Palmer, D.1
  • 30
    • 0035743477
    • The use of predictive dependencies in language learning
    • Saffran, J. R. (2001) The use of predictive dependencies in language learning. J. Memory & Language 44: 493-515.
    • (2001) J. Memory & Language , vol.44 , pp. 493-515
    • Saffran, J.R.1
  • 31
    • 0030451408
    • Statistical learning by 8-month-old infants
    • Saffran, J. R., Aslin, R. N. and Newport, E. L. (1996) Statistical learning by 8-month-old infants. Science 274(5294): 1926-1928.
    • (1996) Science , vol.274 , Issue.5294 , pp. 1926-1928
    • Saffran, J.R.1    Aslin, R.N.2    Newport, E.L.3
  • 33
    • 0001076101
    • A stochastic finite-state word-segmentation algorithm for Chinese
    • Sproat, R., Shih, C., Gale, W. and Chang, N. (1996) A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics 22(3): 377-404.
    • (1996) Computational Linguistics , vol.22 , Issue.3 , pp. 377-404
    • Sproat, R.1    Shih, C.2    Gale, W.3    Chang, N.4
  • 34
    • 0008573508
    • Chinese word segmentation without using lexicon and hand-crafted training data
    • Sun, M., Shen, D. and Tsou, B. K. (1998) Chinese word segmentation without using lexicon and hand-crafted training data. Proc. COLING-ACL, pp. 1265-1271.
    • (1998) Proc. COLING-ACL , pp. 1265-1271
    • Sun, M.1    Shen, D.2    Tsou, B.K.3
  • 35
    • 0037617942
    • Automatic decomposition of kanji compound words using stochastic estimation
    • In Japanese
    • Takeda, K. and Fujisaki, T. (1987) Automatic decomposition of kanji compound words using stochastic estimation. J. Infor. Process. Soc. Japan 28(9): 952-961. In Japanese.
    • (1987) J. Infor. Process. Soc. Japan , vol.28 , Issue.9 , pp. 952-961
    • Takeda, K.1    Fujisaki, T.2
  • 37
    • 0001277731
    • A compression-based algorithm for Chinese word segmentation
    • Teahan, W. J., Wen, Y., McNab, R. and Witten, I. H. (2000) A compression-based algorithm for Chinese word segmentation. Computational Linguistics 26(3): 375-393.
    • (2000) Computational Linguistics , vol.26 , Issue.3 , pp. 375-393
    • Teahan, W.J.1    Wen, Y.2    McNab, R.3    Witten, I.H.4
  • 38
    • 0028587505
    • A probabilistic algorithm for segmenting non-kanji Japanese strings
    • Teller, V. and Batchelder, E. O. (1994) A probabilistic algorithm for segmenting non-kanji Japanese strings. Proc. 12th AAAI, pp. 742-747.
    • (1994) Proc. 12th AAAI , pp. 742-747
    • Teller, V.1    Batchelder, E.O.2
  • 41
    • 0038293493
    • Corpus-based automatic compound extraction with mutual information and relative frequency count
    • Wu, M.-W. and Su, K.-Y. (1993) Corpus-based automatic compound extraction with mutual information and relative frequency count. Proc. R.O.C. Computational Linguistics Conference VI, pp. 207-216.
    • (1993) Proc. R.O.C. Computational Linguistics Conference VI , pp. 207-216
    • Wu, M.-W.1    Su, K.-Y.2
  • 42
    • 84989592173
    • Chinese text segmentation for text retrieval: Achievements and problems
    • Wu, Z. and Tseng, G. (1993) Chinese text segmentation for text retrieval: Achievements and problems. J. Am. Soc. Infor. Sci. 44(9): 532-542.
    • (1993) J. Am. Soc. Infor. Sci. , vol.44 , Issue.9 , pp. 532-542
    • Wu, Z.1    Tseng, G.2
  • 43
    • 0038632297
    • A re-estimation method for stochastic language modeling from ambiguous observations
    • Yamamoto, M. (1996) A re-estimation method for stochastic language modeling from ambiguous observations. Proc. 4th Workshop on Very Large Corpora, pp. 155-167.
    • (1996) Proc. 4th Workshop on Very Large Corpora , pp. 155-167
    • Yamamoto, M.1
  • 44
    • 0038632285
    • Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus
    • Yamamoto, M. and Church, K. W. (2001) Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus. Computational Linguistics 27(1): 1-30.
    • (2001) Computational Linguistics , vol.27 , Issue.1 , pp. 1-30
    • Yamamoto, M.1    Church, K.W.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.