SCOPUS Information Search Platform

Volume 10, Issue 7, 2008, Pages 1299-1306

Real-time continuous phoneme recognition system using class-dependent tied-mixture HMM with HBT structure for speech-driven lip-sync

(2) Park, Junho a,b; Ko, Hanseok a

a Korea University (South Korea)

b University of Cambridge (United Kingdom)

Author keywords

Head body tail HMM; Phoneme recognition; Real time lip sync

Indexed keywords

MARKOV PROCESSES; SPEECH; SPEECH RECOGNITION;

CODEBOOK; CONTINUOUS HIDDEN MARKOV MODELS; GAUSSIANS; HBT MODELS; HEAD-BODY-TAIL HMM; INDEPENDENT MODELS; MODEL PARAMETERS; PHONEME RECOGNITION; PHONEME RECOGNITION SYSTEMS; PHONEME RECOGNITIONS; REAL-TIME LIP-SYNC; RECOGNITION PERFORMANCES; SPEECH SIGNALS; TRANSIENT PARTS; VOWEL RECOGNITIONS;

HIDDEN MARKOV MODELS;

EID: 56549088313 PISSN: 15209210 EISSN: None Source Type: Journal
DOI: 10.1109/TMM.2008.2004908 Document Type: Article

Times cited : (14)

References (30)

1
- 85016090818
- Automated lip-sync: Direct translation of speech-sound to mouth-shape
- B. E. Koster, R. D. Rodman, and D. Bitzer, "Automated lip-sync: Direct translation of speech-sound to mouth-shape," in Proc. 28th Annu. Asilomar Conf. Signals, 1994, pp. 583-586.
- (1994) Proc. 28th Annu. Asilomar Conf. Signals , pp. 583-586
- Koster, B.E.¹ Rodman, R.D.² Bitzer, D.³

2
- 0003350303
- Lip synchronization for animation
- D. V. McAllister et al., "Lip synchronization for animation," in Proc. SIGGRAPH 97, Los Angeles, CA, 1997, pp. 225-228.
- (1997) Proc. SIGGRAPH 97 , pp. 225-228
- McAllister, D.V.¹

3
- 85133460248
- Visual speech synthesis based on parameter generation from HMM: Speech driven and text-and-speech driven approaches
- M. Tamura et al., "Visual speech synthesis based on parameter generation from HMM: Speech driven and text-and-speech driven approaches," in Proc. AVSP 98, 1998, pp. 221-226.
- (1998) Proc. AVSP 98 , pp. 221-226
- Tamura, M.¹

4
- 0032179320
- Lip movement synthesis from speech based on Hidden Markov models
- E. Yamamoto et al., "Lip movement synthesis from speech based on Hidden Markov models," Speech Commun., vol. 26, no. 1-2, pp. 105-115, 1998.
- (1998) Speech Commun , vol.26 , Issue.1-2 , pp. 105-115
- Yamamoto, E.¹

5
- 0242664388
- Real-time talking head driven by voice and its application to communication and entertainment
- S. Morishima, "Real-time talking head driven by voice and its application to communication and entertainment," in Proc. AVSP 98, 1998, pp. 195-200.
- (1998) Proc. AVSP 98 , pp. 195-200
- Morishima, S.¹

6
- 56549120930
- Lip movements synthesis using time delay neural networks
- S. Curinga, F. Lavagetto, and F. Vignoli, "Lip movements synthesis using time delay neural networks," in Proc. EUSIPCO 96 - Systems and Computers, 1996, pp. 36-46.
- (1996) Proc. EUSIPCO 96 - Systems and Computers , pp. 36-46
- Curinga, S.¹ Lavagetto, F.² Vignoli, F.³

7
- 84995135281
- Automatic lip-sync: Background and techniques
- J. Lewis, "Automatic lip-sync: Background and techniques," Visualiz. and Comput. Anim., vol. 2, no. 4, pp. 118-122, 1991.
- (1991) Visualiz. and Comput. Anim , vol.2 , Issue.4 , pp. 118-122
- Lewis, J.¹

8
- 84963799348
- Achieving real-time lip synch via SVM-based phoneme classification and lip shape refinement
- T. Kim, Y. Kang, and H. Ko, "Achieving real-time lip synch via SVM-based phoneme classification and lip shape refinement," in Proc. ICMI'02, 2002, pp. 299-304.
- (2002) Proc. ICMI'02 , pp. 299-304
- Kim, T.¹ Kang, Y.² Ko, H.³

9
- 85027136924
- Minimum error rate training of inter-word context-dependent acoustic model units in speech recognition
- W. Chou et al., "Minimum error rate training of inter-word context-dependent acoustic model units in speech recognition," in Proc. ICSLP, 1994, pp. 439-442.
- (1994) Proc. ICSLP , pp. 439-442
- Chou, W.¹

10
- 0031643811
- Natural number recognition using MCE trained inter-word context-dependent acoustic models
- M. B. Gandhi and J. Jacob, "Natural number recognition using MCE trained inter-word context-dependent acoustic models," in Proc. ICASSP, 1998, pp. 457-460.
- (1998) Proc. ICASSP , pp. 457-460
- Gandhi, M.B.¹ Jacob, J.²

11
- 85009154926
- Modeling phonetic context using head-body-tail models for connected digit recognition
- J. Sturm and E. Sanders, "Modeling phonetic context using head-body-tail models for connected digit recognition," in Proc. ICSLP, 2000, pp. 429-432.
- (2000) Proc. ICSLP , pp. 429-432
- Sturm, J.¹ Sanders, E.²

12
- 44849099128
- A new state-dependent phonetic tied-mixture model with head-body-tail structured HMM for real-time continuous phoneme recognition system
- J. Park and H. Ko, "A new state-dependent phonetic tied-mixture model with head-body-tail structured HMM for real-time continuous phoneme recognition system," in Proc. INTERSPEECH, 2006, pp. 1583-1586.
- (2006) Proc. INTERSPEECH , pp. 1583-1586
- Park, J.¹ Ko, H.²

13
- 0032074310
- Audio-visual integration in multimodal communication
- T. Chen and R. Rao, "Audio-visual integration in multimodal communication," Proc. IEEE (Special Issue on Multimedia Signal Processing), vol. 86, no. 5, pp. 837-852, May 1998.
- (1998) Proc. IEEE (Special Issue on Multimedia Signal Processing) , vol.86 , Issue.5 , pp. 837-852
- Chen, T.¹ Rao, R.²

14
- 85017188218
- Real-time lip-sync face animation driven by human voice
- F. J. Huang and T. Chen, "Real-time lip-sync face animation driven by human voice," in Proc. IEEE Workshop on Multimedia Signal Processing, 1998, pp. 352-357.
- (1998) Proc. IEEE Workshop on Multimedia Signal Processing , pp. 352-357
- Huang, F.J.¹ Chen, T.²

15
- 84937437186
- Voice puppetry
- M. Brand, "Voice puppetry," in Proc. SIGGRAPH'99, 1999, pp. 21-28.
- (1999) Proc. SIGGRAPH'99 , pp. 21-28
- Brand, M.¹

16
- 33744990170
- Automatic lip sync and its use in the new multimedia services for mobile devices
- G. Zoric and I. S. Pandzic, "Automatic lip sync and its use in the new multimedia services for mobile devices," in Proc. 8th Int. Conf. Telecommunications, 2005, vol. 2, pp. 353-358.
- (2005) Proc. 8th Int. Conf. Telecommunications , vol.2 , pp. 353-358
- Zoric, G.¹ Pandzic, I.S.²

17
- 0035426641
- Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system
- K. Choi et al., "Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system," J. VLSI Signal Process., vol. 29, pp. 51-61, 2001.
- (2001) J. VLSI Signal Process , vol.29 , pp. 51-61
- Choi, K.¹

18
- 10044281988
- Lifelike talking faces for interactive services
- E. Cosatto et al., "Lifelike talking faces for interactive services," Proc. IEEE, vol. 91, no. 9, pp. 1406-1429, Sep. 2003.
- (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1406-1429
- Cosatto, E.¹

19
- 33947583073
- Realistic mouth-synching for speech-driven talking face using articulatory modeling
- L. Xie and Z. Liu, "Realistic mouth-synching for speech-driven talking face using articulatory modeling," IEEE Trans. Multimedia, vol. 9, no. 3, pp. 500-510, Apr. 2007.
- (2007) IEEE Trans. Multimedia , vol.9 , Issue.3 , pp. 500-510
- Xie, L.¹ Liu, Z.²

20
- 0023831656
- A new statistical approach for the automatic segmentation of continuous speech signals
- R. Andre-Obrecht, "A new statistical approach for the automatic segmentation of continuous speech signals," IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 1, pp. 29-40, Jan. 1988.
- (1988) IEEE Trans. Acoust., Speech, Signal Process , vol.36 , Issue.1 , pp. 29-40
- Andre-Obrecht, R.¹

21
- 0020497768
- Detecting and estimating parameters jumps using ladder algorithms and likelihood ratio test
- A. Von Brandt, "Detecting and estimating parameters jumps using ladder algorithms and likelihood ratio test," in Proc. ICASSP, 1983, pp. 1017-1020.
- (1983) Proc. ICASSP , pp. 1017-1020
- Von Brandt, A.¹

22
- 0013871855
- Co-articulation in VCV utterances: Spectrographic measurements
- S. E. G. Ohman, "Co-articulation in VCV utterances: Spectrographic measurements," J. Acoust. Soc. Amer., vol. 39, pp. 151-168, 1966.
- (1966) J. Acoust. Soc. Amer , vol.39 , pp. 151-168
- Ohman, S.E.G.¹

23
- 0025629882
- Tied mixture continuous parameter modeling for speech recognition
- J. R. Bellegarda and D. Nahamoo, "Tied mixture continuous parameter modeling for speech recognition," IEEE Trans. Acoust. Speech Signal Process., vol. 38, no. 12, pp. 2033-2045, Dec. 1990.
- (1990) IEEE Trans. Acoust. Speech Signal Process , vol.38 , Issue.12 , pp. 2033-2045
- Bellegarda, J.R.¹ Nahamoo, D.²

24
- 33646259972
- Achieving a reliable compact acoustic model for embedded speech recognition system with high confusion frequency model handling
- J. Park and H. Ko, "Achieving a reliable compact acoustic model for embedded speech recognition system with high confusion frequency model handling," Speech Commun., vol. 48, no. 6, pp. 737-745, 2006.
- (2006) Speech Commun , vol.48 , Issue.6 , pp. 737-745
- Park, J.¹ Ko, H.²

25
- 0033185227
- Improving continuous speech recognition in Spanish by phone-class semi continuous HMMs with pausing and multiple pronunciations
- J. Ferreiros and J. M. Pardo, "Improving continuous speech recognition in Spanish by phone-class semi continuous HMMs with pausing and multiple pronunciations," Speech Commun., vol. 29, pp. 65-76, 1999.
- (1999) Speech Commun , vol.29 , pp. 65-76
- Ferreiros, J.¹ Pardo, J.M.²

26
- 0001514782
- Modeling coarticulation in synthetic visual speech
- M. M. Cohen and D. W. Massaro, "Modeling coarticulation in synthetic visual speech," in Proc. Computer Animation '93, 1993, pp. 139-156.
- (1993) Proc. Computer Animation '93 , pp. 139-156
- Cohen, M.M.¹ Massaro, D.W.²

27
- 85009277255
- Audiovisual speech synthesis from ground truth to models
- G. Bailly, "Audiovisual speech synthesis from ground truth to models," in Proc. ICSLP, 2002, pp. 1453-1456.
- (2002) Proc. ICSLP , pp. 1453-1456
- Bailly, G.¹

28
- 56549119684
- S. J. Young, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, HTK Book/Tech. Rep. Speech Group, Eng. Dept., Cambridge Univ., Cambridge, U.K., 2002.

29
- 33749429018
- Real-time language independent lip synchronization method using a genetic algorithm
- G. Zoric and I. S. Pandzic, "Real-time language independent lip synchronization method using a genetic algorithm," Signal Process., vol. 86, pp. 3644-3656, 2006.
- (2006) Signal Process , vol.86 , pp. 3644-3656
- Zoric, G.¹ Pandzic, I.S.²

30
- 33845963091
- SVM-based phoneme classification and lip shape refinement in real-time lip-sync system
- H. Ko and D. K. Han, "SVM-based phoneme classification and lip shape refinement in real-time lip-sync system," Int. J. Pattern Recognit. Artific. Intell., vol. 20, no. 7, pp. 1029-1051, Nov. 2006.
- (2006) Int. J. Pattern Recognit. Artific. Intell , vol.20 , Issue.7 , pp. 1029-1051
- Ko, H.¹ Han, D.K.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.