SCOPUS Information Search Platform

Natural Language Engineering

Volume 9, Issue 2, 2003, Pages 127-149

Mostly-unsupervised statistical segmentation of Japanese kanji sequences

(2) Ando, Rie Kubota a Lee, Lillian b

a IBM T J WATSON RESEARCH CENTER (United States)

b Department of Computer Science and School of Operations Research and Information Engineering (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; INFORMATION RETRIEVAL; OPTICAL CHARACTER RECOGNITION; STATISTICAL METHODS; TEXT PROCESSING;

STATISTICAL SEGMENTATION; WORD SEGMENTATION;

WORD PROCESSING;

EID: 0037600401 PISSN: 13513249 EISSN: None Source Type: Journal
DOI: 10.1017/S1351324902002954 Document Type: Article

Times cited : (20)

References (45)

1
- 0037956008
- Chinese text segmentation with MBDP-1: Making the most of training corpora
- Brent, M. R. and Tao, X. (2001) Chinese text segmentation with MBDP-1: Making the most of training corpora, Proc. ACL.
- (2001) Proc. ACL.
- Brent, M.R.¹ Tao, X.²

2
- 84936824188
- Word association norms, mutual information, and lexicography
- Church, K. W. and Hanks, P. (1990) Word association norms, mutual information, and lexicography. Computational Linguistics 16(1): 22-29.
- (1990) Computational Linguistics , vol.16 , Issue.1 , pp. 22-29
- Church, K.W.¹ Hanks, P.²

3
- 84889281816
- Wiley-Interscience, New York
- Cover, T. M. and Thomas, J. A. (1991) Elements of Information Theory, Wiley-Interscience, New York.
- (1991) Elements of Information Theory
- Cover, T.M.¹ Thomas, J.A.²

4
- 85029362060
- A new statistical formula for Chinese text segmentation incorporating contextual information
- Dai, Y., Loh, T. E. and Khoo, C. S. G. (1999) A new statistical formula for Chinese text segmentation incorporating contextual information. Proc. 22nd SIGIR, pp. 82-89.
- (1999) Proc. 22nd SIGIR , pp. 82-89
- Dai, Y.¹ Loh, T.E.² Khoo, C.S.G.³

5
- 85149140829
- Japanese morphological analyzer using word co-occurrence - JTAG
- Fuchi, T. and Takagi, S. (1998) Japanese morphological analyzer using word co-occurrence - JTAG. Proc. COLING-ACL '98, pp. 409-413.
- (1998) Proc. COLING-ACL '98 , pp. 409-413
- Fuchi, T.¹ Takagi, S.²

6
- 1542381209
- Extracting key terms from Chinese and Japanese texts
- Fung, P. (1998) Extracting key terms from Chinese and Japanese texts. Computer Processing of Oriental Languages 12(1).
- (1998) Computer Processing of Oriental Languages , vol.12 , Issue.1
- Fung, P.¹

7
- 84958571083
- Discovering Chinese words from unsegmented text
- (Poster abstract.)
- Ge, X., Pratt, W. and Smyth, P. (1999) Discovering Chinese words from unsegmented text. Proc. 22nd SIGIR, pp. 271-272 (Poster abstract.)
- (1999) Proc. 22nd SIGIR , pp. 271-272
- Ge, X.¹ Pratt, W.² Smyth, P.³

8
- 0041079008
- Unsupervised learning of the morphology of a natural language
- Goldsmith, J. (2001) Unsupervised learning of the morphology of a natural language. Computational Linguistics 27(2): 153-198.
- (2001) Computational Linguistics , vol.27 , Issue.2 , pp. 153-198
- Goldsmith, J.¹

9
- 0038293492
- Evaluating parsing strategies using standardized parse files
- Grishman, R., Macleod, C. and Sterling, J. (1992) Evaluating parsing strategies using standardized parse files. Proc. 3rd ANLP, pp. 156-161.
- (1992) Proc. 3rd ANLP , pp. 156-161
- Grishman, R.¹ Macleod, C.² Sterling, J.³

10
- 0037617943
- Language modeling by string pattern N-gram for Japanese speech recognition
- Ito, A. and Kohda, K. (1995) Language modeling by string pattern N-gram for Japanese speech recognition. Proc. ICASSP.
- (1995) Proc. ICASSP.
- Ito, A.¹ Kohda, K.²

11
- 0038632333
- Use of mutual information based character clusters in dictionary-less morphological analysis of Japanese
- Kashioka, H., Kawata, Y., Kinjo, Y., Finch, A. and Black, E. W. (1998) Use of mutual information based character clusters in dictionary-less morphological analysis of Japanese. Proc. COLING-ACL '98, pp. 658-662.
- (1998) Proc. COLING-ACL '98 , pp. 658-662
- Kashioka, H.¹ Kawata, Y.² Kinjo, Y.³ Finch, A.⁴ Black, E.W.⁵

12
- 0038293468
- Japanese morphological analysis system JUMAN version 3.61 manual
- In Japanese
- Kurohashi, S. and Nagao, M. (1999) Japanese morphological analysis system JUMAN version 3.61 manual. In Japanese.
- (1999)
- Kurohashi, S.¹ Nagao, M.²

13
- 0008686260
- Experiments on the use of bigram mutual information for Chinese natural language processing
- Lua, K.-T. (1995) Experiments on the use of bigram mutual information for Chinese natural language processing. Int. Conf. Computer Processing of Oriental Languages (ICCPOL), pp. 23-25.
- (1995) Int. Conf. Computer Processing of Oriental Languages (ICCPOL) , pp. 23-25
- Lua, K.-T.¹

14
- 0008717844
- An application of information theory in Chinese word segmentation
- Lua, K.-T. and Gan, K.-W. (1994) An application of information theory in Chinese word segmentation. Computer Processing of Chinese and Oriental Languages 8(1): 115-123.
- (1994) Computer Processing of Chinese and Oriental Languages , vol.8 , Issue.1 , pp. 115-123
- Lua, K.-T.¹ Gan, K.-W.²

15
- 85158028663
- Parsing a natural language using information statistics
- Magerman, D. M. and Marcus, M. P. (1990) Parsing a natural language using information statistics. Proc. AAAI, pp. 984-989.
- (1990) Proc. AAAI , pp. 984-989
- Magerman, D.M.¹ Marcus, M.P.²

16
- 0027681165
- Suffix arrays: A new method for on-line string searches
- Manber, U. and Myers, G. (1993) Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5): 935-948.
- (1993) SIAM J. Comput. , vol.22 , Issue.5 , pp. 935-948
- Manber, U.¹ Myers, G.²

17
- 0038293469
- Example-based correction of word segmentation and part of speech labelling
- Matsukawa, T., Miller, S. and Weischedel, R. (1993) Example-based correction of word segmentation and part of speech labelling. Proc. Human Language Technologies Workshop (HLT), pp. 227-232.
- (1993) Proc. Human Language Technologies Workshop (HLT) , pp. 227-232
- Matsukawa, T.¹ Miller, S.² Weischedel, R.³

18
- 0004067802
- Technical Report NAIST-IS-TR97007, Nara Institute of Science and Technology. In Japanese
- Matsumoto, Y., Kitauchi, A., Yamashita, T., Hirano, Y., Imaichi, O., and Imamura, T. (1997) Japanese morphological analysis system ChaSen manual. Technical Report NAIST-IS-TR97007, Nara Institute of Science and Technology. In Japanese.
- (1997) Japanese Morphological Analysis System ChaSen Manual
- Matsumoto, Y.¹ Kitauchi, A.² Yamashita, T.³ Hirano, Y.⁴ Imaichi, O.⁵ Imamura, T.⁶

19
- 33645966594
- Improvements of Japanese morphological analyzer JUMAN
- Matsumoto, Y. and Nagao, M. (1994) Improvements of Japanese morphological analyzer JUMAN. Proc. Int. Workshop on Sharable Natural Language Resources, pp. 22-28.
- (1994) Proc. Int. Workshop on Sharable Natural Language Resources , pp. 22-28
- Matsumoto, Y.¹ Nagao, M.²

20
- 0037617911
- Unknown word extraction from corpora using n-gram statistics
- In Japanese
- Mori, S. and Nagao, M. (1998) Unknown word extraction from corpora using n-gram statistics. J. Infor. Process. Soc. Japan 39(7): 2093-2100. In Japanese.
- (1998) J. Infor. Process. Soc. Japan , vol.39 , Issue.7 , pp. 2093-2100
- Mori, S.¹ Nagao, M.²

21
- 0006702241
- A new method of N-gram statistics for large number of n and automatic extraction of words and phrases from large text data of Japanese
- Nagao, M. and Mori, S. (1994) A new method of N-gram statistics for large number of n and automatic extraction of words and phrases from large text data of Japanese. Proc. 15th COLING, pp. 611-615.
- (1994) Proc. 15th COLING , pp. 611-615
- Nagao, M.¹ Mori, S.²

22
- 0038293491
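- A stochastic Japanese morphological analyzer using a forward-DP backward-A* n-best search algorithm
- Nagata, M. (1994) A stochastic Japanese morphological analyzer using a forward-DP backward-A* n-best search algorithm. Proc. 15th COLING, pp. 201-207.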
- (1994) Proc. 15th COLING , pp. 201-207
- Nagata, M.¹

23
- 0037617941
- Automatic extraction of new words from Japanese texts using generalized forward-backward search
- Nagata, M. (1996a) Automatic extraction of new words from Japanese texts using generalized forward-backward search. Proc. Conf. Empirical Methods in Natural Language Processing, pp. 48-59.
- (1996) Proc. Conf. Empirical Methods in Natural Language Processing , pp. 48-59
- Nagata, M.¹

24
- 0037617940
- Context-based spelling correction for Japanese OCR
- Nagata, M. (1996b) Context-based spelling correction for Japanese OCR. Proc. 16th COLING, pp. 806-811.
- (1996) Proc. 16th COLING , pp. 806-811
- Nagata, M.¹

25
- 0038632330
- A self-organizing Japanese word segmenter using heuristic word identification and re-estimation
- Nagata, M. (1997) A self-organizing Japanese word segmenter using heuristic word identification and re-estimation. Proc. 5th Workshop on Very Large Corpora, pp. 203-215.
- (1997) Proc. 5th Workshop on Very Large Corpora , pp. 203-215
- Nagata, M.¹

26
- 0033349349
- Overlapping statistical segmentation for effective indexing of Japanese text
- Ogawa, Y. and Matsuda, T. (1999) Overlapping statistical segmentation for effective indexing of Japanese text. Infor. Process. & Manage. 35, 463-480.
- (1999) Infor. Process. & Manage , vol.35 , pp. 463-480
- Ogawa, Y.¹ Matsuda, T.²

27
- 85072855288
- A trainable rule-based algorithm for word segmentation
- Palmer, D. (1997) A trainable rule-based algorithm for word segmentation. Proc. 35th ACL/8th EACL, pp. 321-328.
- (1997) Proc. 35th ACL/8th EACL , pp. 321-328
- Palmer, D.¹

28
- 0038293490
- Japanese word segmentation by hidden Markov model
- Papageorgiou, C. P. (1994) Japanese word segmentation by hidden Markov model. Proc. Human Language Technologies Workshop (HLT), pp. 283-288.
- (1994) Proc. Human Language Technologies Workshop (HLT) , pp. 283-288
- Papageorgiou, C.P.¹

29
- 84958533967
- Self-supervised Chinese word segmentation
- Peng, F. and Schuurmans, D. (2001) Self-supervised Chinese word segmentation. Advances in Intelligent Data Analysis (Proc. IDA-01), pp. 238-247.
- (2001) Advances in Intelligent Data Analysis (Proc. IDA-01) , pp. 238-247
- Peng, F.¹ Schuurmans, D.²

30
- 0035743477
- The use of predictive dependencies in language learning
- Saffran, J. R. (2001) The use of predictive dependencies in language learning. J. Memory & Language 44: 493-515.
- (2001) J. Memory & Language , vol.44 , pp. 493-515
- Saffran, J.R.¹

31
- 0030451408
- Statistical learning by 8-month-old infants
- Saffran, J. R., Aslin, R. N. and Newport, E. L. (1996) Statistical learning by 8-month-old infants. Science 274(5294): 1926-1928.
- (1996) Science , vol.274 , Issue.5294 , pp. 1926-1928
- Saffran, J.R.¹ Aslin, R.N.² Newport, E.L.³

32
- 0001465757
- A statistical method for finding word boundaries in Chinese text
- Sproat, R. and Shih, C. (1990) A statistical method for finding word boundaries in Chinese text. Computer Processing of Chinese and Oriental Languages 4: 336-351.
- (1990) Computer Processing of Chinese and Oriental Languages , vol.4 , pp. 336-351
- Sproat, R.¹ Shih, C.²

33
- 0001076101
- A stochastic finite-state word-segmentation algorithm for Chinese
- Sproat, R., Shih, C., Gale, W. and Chang, N. (1996) A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics 22(3): 377-404.
- (1996) Computational Linguistics , vol.22 , Issue.3 , pp. 377-404
- Sproat, R.¹ Shih, C.² Gale, W.³ Chang, N.⁴

34
- 0008573508
- Chinese word segmentation without using lexicon and hand-crafted training data
- Sun, M., Shen, D. and Tsou, B. K. (1998) Chinese word segmentation without using lexicon and hand-crafted training data. Proc. COLING-ACL, pp. 1265-1271.
- (1998) Proc. COLING-ACL , pp. 1265-1271
- Sun, M.¹ Shen, D.² Tsou, B.K.³

35
- 0037617942
- Automatic decomposition of kanji compound words using stochastic estimation
- In Japanese
- Takeda, K. and Fujisaki, T. (1987) Automatic decomposition of kanji compound words using stochastic estimation. J. Infor. Process. Soc. Japan 28(9): 952-961. In Japanese.
- (1987) J. Infor. Process. Soc. Japan , vol.28 , Issue.9 , pp. 952-961
- Takeda, K.¹ Fujisaki, T.²

36
- 0038632299
- HMM parameter learning for Japanese morphological analyzer
- Takeuchi, K. and Matsumoto, Y. (1995) HMM parameter learning for Japanese morphological analyzer. Proc. 10th Pacific Asia Conference on Language, Information and Computation (PACLING), pp. 163-172.
- (1995) Proc. 10th Pacific Asia Conference on Language, Information and Computation (PACLING) , pp. 163-172
- Takeuchi, K.¹ Matsumoto, Y.²

37
- 0001277731
- A compression-based algorithm for Chinese word segmentation
- Teahan, W. J., Wen, Y., McNab, R. and Witten, I. H. (2000) A compression-based algorithm for Chinese word segmentation. Computational Linguistics 26(3): 375-393.
- (2000) Computational Linguistics , vol.26 , Issue.3 , pp. 375-393
- Teahan, W.J.¹ Wen, Y.² McNab, R.³ Witten, I.H.⁴

38
- 0028587505
- A probabilistic algorithm for segmenting non-kanji Japanese strings
- Teller, V. and Batchelder, E. O. (1994) A probabilistic algorithm for segmenting non-kanji Japanese strings. Proc. 12th AAAI, pp. 742-747.
- (1994) Proc. 12th AAAI , pp. 742-747
- Teller, V.¹ Batchelder, E.O.²

39
- 0038632331
- What makes a word: Learning base units in Japanese for speech recognition
- Tomokiyo, L. M. and Ries, K. (1997) What makes a word: learning base units in Japanese for speech recognition. Proc. ACL Special Interest Group in Natural Language Learning (CoNLL), pp. 60-69.
- (1997) Proc. ACL Special Interest Group in Natural Language Learning (CoNLL) , pp. 60-69
- Tomokiyo, L.M.¹ Ries, K.²

40
- 0039669800
- A position statement on Chinese segmentation
- Wu, D. (1998) A position statement on Chinese segmentation. http://www.cs.ust.hk/~dekai/papers/segmentation.html (Presented at the Chinese Language Processing Workshop, University of Pennsylvania.)
- (1998) ( Chinese Language Processing Workshop, University of Pennsylvania.)
- Wu, D.¹

41
- 0038293493
- Corpus-based automatic compound extraction with mutual information and relative frequency count
- Wu, M.-W. and Su, K.-Y. (1993) Corpus-based automatic compound extraction with mutual information and relative frequency count. Proc. R.O.C. Computational Linguistics Conference VI, pp. 207-216.
- (1993) Proc. R.O.C. Computational Linguistics Conference VI , pp. 207-216
- Wu, M.-W.¹ Su, K.-Y.²

42
- 84989592173
- Chinese text segmentation for text retrieval: Achievements and problems
- Wu, Z. and Tseng, G. (1993) Chinese text segmentation for text retrieval: Achievements and problems. J. Am. Soc. Infor. Sci. 44(9): 532-542.
- (1993) J. Am. Soc. Infor. Sci. , vol.44 , Issue.9 , pp. 532-542
- Wu, Z.¹ Tseng, G.²

43
- 0038632297
- A re-estimation method for stochastic language modeling from ambiguous observations
- Yamamoto, M. (1996) A re-estimation method for stochastic language modeling from ambiguous observations. Proc. 4th Workshop on Very Large Corpora, pp. 155-167.
- (1996) Proc. 4th Workshop on Very Large Corpora , pp. 155-167
- Yamamoto, M.¹

44
- 0038632285
- Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus
- Yamamoto, M. and Church, K. W. (2001) Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus. Computational Linguistics 27(1): 1-30.
- (2001) Computational Linguistics , vol.27 , Issue.1 , pp. 1-30
- Yamamoto, M.¹ Church, K.W.²

45
- 0038293494
- LINGSTAT: An interactive machine-aided translation system
- Yamron, J., Baker, J., Bamberg, P., Chevalier, H., Dietzel, T., Elder, J., Kampmann, F., Mandel, M., Manganaro, L., Margolis, T. and Steele, E. (1993) LINGSTAT: An interactive machine-aided translation system. Proc. Human Language Technologies Workshop (HLT), pp. 191-195.
- (1993) Proc. Human Language Technologies Workshop (HLT) , pp. 191-195
- Yamron, J.¹ Baker, J.² Bamberg, P.³ Chevalier, H.⁴ Dietzel, T.⁵ Elder, J.⁶ Kampmann, F.⁷ Mandel, M.⁸ Manganaro, L.⁹ Margolis, T.¹⁰ Steele, E.¹¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.