SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 19, Issue 2, 2011, Pages 417-430

Continuous stochastic feature mapping based on trajectory HMMs

Authors (3): Zen, Heiga (a); Nankaku, Yoshihiko (b); Tokuda, Keiichi (b)

(a) TOSHIBA CORPORATION (Japan)

(b) NAGOYA INSTITUTE OF TECHNOLOGY (Japan)

Author keywords

Gaussian mixture model (GMM) based mapping; speech recognition; trajectory hidden Markov model (HMM); voice conversion

Indexed keywords

ARTICULATORY INVERSION; DYNAMIC CHARACTERISTICS; GAUSSIAN MIXTURE MODEL; GAUSSIAN MIXTURE MODEL (GMM)-BASED MAPPING; LEVEL TRANSFORMATION; MAPPING TECHNIQUES; NEW APPROACHES; NOISE-COMPENSATION; STATIC AND DYNAMIC; STOCHASTIC FEATURES; VOICE CONVERSION;

GAUSSIAN DISTRIBUTION; HIDDEN MARKOV MODELS; MAPPING; OBJECT RECOGNITION; SPEECH PROCESSING; STOCHASTIC MODELS; STOCHASTIC SYSTEMS; TRAJECTORIES;

SPEECH RECOGNITION;

EID: 78149260085
PISSN: 15587916
EISSN: None
Source Type: Journal
DOI: 10.1109/TASL.2010.2049685
Document Type: Article

Times cited: 58

References (49)

1
- 85004448479
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization", J. Acoust. Soc. Jpn. (E), vol. 11, no. 2, pp. 71-76, 1990.
- (1990) J. Acoust. Soc. Jpn. (E) , vol.11 , Issue.2 , pp. 71-76
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

2
- 34547550766
- Stereo-based stochastic mapping for robust speech recognition
- M. Afify, X.-D. Cui, and Y. Gao, "Stereo-based stochastic mapping for robust speech recognition", in Proc. ICASSP, 2007, pp. 377-380.
- (2007) Proc. ICASSP , pp. 377-380
- Afify, M.¹ Cui, X.-D.² Gao, Y.³

3
- 84905560807
- Voice conversion with smoothed GMM and MAP adaptation
- Y. Chen, M. Chu, E. Chang, and J. Liu, "Voice conversion with smoothed GMM and MAP adaptation", in Proc. Interspeech, 2003, pp. 2413-2416.
- (2003) Proc. Interspeech , pp. 2413-2416
- Chen, Y.¹ Chu, M.² Chang, E.³ Liu, J.⁴

4
- 51449114531
- MMSE-based stereo feature stochastic mapping for noise robust speech recognition
- X.-D. Cui, M. Afify, and Y. Gao, "MMSE-based stereo feature stochastic mapping for noise robust speech recognition", in Proc. ICASSP, 2008, pp. 4077-4080.
- (2008) Proc. ICASSP , pp. 4077-4080
- Cui, X.-D.¹ Afify, M.² Gao, Y.³

5
- 0034855352
- High-performance robust speech recognition using stereo training data
- L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, "High-performance robust speech recognition using stereo training data", in Proc. ICASSP, 2001, pp. 301-304.
- (2001) Proc. ICASSP , pp. 301-304
- Deng, L.¹ Acero, A.² Jiang, L.³ Droppo, J.⁴ Huang, X.⁵

6
- 0036291376
- Uncertainty decoding with SPLICE for noise robust speech recognition
- J. Droppo, A. Acero, and L. Deng, "Uncertainty decoding with SPLICE for noise robust speech recognition", in Proc. ICASSP, 2002, pp. 57-60.
- (2002) Proc. ICASSP , pp. 57-60
- Droppo, J.¹ Acero, A.² Deng, L.³

7
- 78149261566
- Bandwidth extension of cellular phone speech based on maximum likelihood estimation with GMM
- W. Fujitsuru, H. Sekimoto, T. Toda, H. Saruwatari, and K. Shikano, "Bandwidth extension of cellular phone speech based on maximum likelihood estimation with GMM", in Proc. NCSP, 2008, pp. 283-286.
- (2008) Proc. NCSP , pp. 283-286
- Fujitsuru, W.¹ Sekimoto, H.² Toda, T.³ Saruwatari, H.⁴ Shikano, K.⁵

8
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech", in Proc. ICASSP, 1992, pp. 137-140.
- (1992) Proc. ICASSP , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

9
- 0022667694
- Speaker independent isolated word recognition using dynamic features of speech spectrum
- S. Furui, "Speaker independent isolated word recognition using dynamic features of speech spectrum", IEEE Trans. Acoust., Speech, Signal Process., vol. 34, pp. 52-59, 1986.
- (1986) IEEE Trans. Acoust., Speech, Signal Process. , vol.34 , pp. 52-59
- Furui, S.¹

10
- 2142659020
- Estimation of articulatory movements from speech acoustics using an HMM-based speech production model
- S. Hiroya and M. Honda, "Estimation of articulatory movements from speech acoustics using an HMM-based speech production model", IEEE Trans. Speech Audio Process., vol. 12, no. 2, pp. 175-185, Mar. 2004.
- (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.2 , pp. 175-185
- Hiroya, S.¹ Honda, M.²

11
- 0038669544
- The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions
- H. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions", in Proc. ISCA ITRW ASR'00, 2000, pp. 181-188.
- (2000) Proc. ISCA ITRW ASR'00 , pp. 181-188
- Hirsch, H.¹ Pearce, D.²

12
- 0020596154
- Cepstral analysis synthesis on the mel frequency scale
- S. Imai, "Cepstral analysis synthesis on the mel frequency scale", in Proc. ICASSP, 1983, pp. 93-96.
- (1983) Proc. ICASSP , pp. 93-96
- Imai, S.¹

13
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain and M. Macon, "Spectral voice conversion for text-to-speech synthesis", in Proc. ICASSP, 1998, pp. 285-288.
- (1998) Proc. ICASSP , pp. 285-288
- Kain, A.¹ Macon, M.²

14
- 85133413596
- Formant re-synthesis of dysarthric speech
- A. Kain, X. Niu, J.-P. Hosom, Q. Miao, and J. van Santen, "Formant re-synthesis of dysarthric speech", in Proc. ISCA SSW5, 2004, pp. 25-30.
- (2004) Proc. ISCA SSW5 , pp. 25-30
- Kain, A.¹ Niu, X.² Hosom, J.-P.³ Miao, Q.⁴ Van Santen, J.⁵

15
- 0006682104
- Vector quantization of speech spectral parameters using statistics of dynamic features
- K. Koishida, K. Tokuda, T. Masuko, and T. Kobayashi, "Vector quantization of speech spectral parameters using statistics of dynamic features", in Proc. Int. Conf. Signal Process.'97, 1997, pp. 247-252.
- (1997) Proc. Int. Conf. Signal Process.'97 , pp. 247-252
- Koishida, K.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴

16
- 33646773080
- CMU ARCTIC databases for speech synthesis
- J. Kominek and A. Black, "CMU ARCTIC databases for speech synthesis", Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-LTI-03-177, 2003.
- (2003) Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-LTI-03-177
- Kominek, J.¹ Black, A.²

17
- 34547503417
- HMM-based unit selection using frame sized speech segments
- Z.-H. Ling and R.-H. Wang, "HMM-based unit selection using frame sized speech segments", in Proc. Interspeech, 2006, pp. 2034-2037.
- (2006) Proc. Interspeech , pp. 2034-2037
- Ling, Z.-H.¹ Wang, R.-H.²

18
- 33646887390
- On the limited memory BFGS method for large scale optimization
- D. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization", Math. Program. B, vol. 45, no. 3, pp. 503-528, 1989.
- (1989) Math. Program. B , vol.45 , Issue.3 , pp. 503-528
- Liu, D.¹ Nocedal, J.²

19
- 84867211725
- Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory", in Proc. Interspeech, 2008, pp. 1076-1079.
- (2008) Proc. Interspeech , pp. 1076-1079
- Muramatsu, T.¹ Ohtani, Y.² Toda, T.³ Saruwatari, H.⁴ Shikano, K.⁵

20
- 44949187612
- Improving body transmitted unvoiced speech with statistical voice conversion
- M. Nakagiri, T. Toda, H. Kashioka, and K. Shikano, "Improving body transmitted unvoiced speech with statistical voice conversion", in Proc. Interspeech, 2006, pp. 2270-2273.
- (2006) Proc. Interspeech , pp. 2270-2273
- Nakagiri, M.¹ Toda, T.² Kashioka, H.³ Shikano, K.⁴

21
- 42649146508
- On the use of phonetic information for mapping from articulatory movements to vocal tract spectrum
- K. Nakamura, T. Toda, Y. Nankaku, and K. Tokuda, "On the use of phonetic information for mapping from articulatory movements to vocal tract spectrum", in Proc. ICASSP, 2006, pp. 93-96.
- (2006) Proc. ICASSP , pp. 93-96
- Nakamura, K.¹ Toda, T.² Nankaku, Y.³ Tokuda, K.⁴

22
- 44949265538
- Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech
- K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech", in Proc. Interspeech, 2006, pp. 1395-1398.
- (2006) Proc. Interspeech , pp. 1395-1398
- Nakamura, K.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

23
- 78149241363
- Spectral conversion based on statistical models including time-frequency matching
- Y. Nankaku, K. Nakamura, T. Toda, and K. Tokuda, "Spectral conversion based on statistical models including time-frequency matching", in Proc. ISCA SSW6, 2007, pp. 333-338.
- (2007) Proc. ISCA SSW6 , pp. 333-338
- Nankaku, Y.¹ Nakamura, K.² Toda, T.³ Tokuda, K.⁴

24
- 0033692729
- Narrowband to wideband conversion of speech using GMM based transformation
- K.-Y. Park and H.-S. Kim, "Narrowband to wideband conversion of speech using GMM based transformation", in Proc. ICASSP, 2000, pp. 1847-1850.
- (2000) Proc. ICASSP , pp. 1847-1850
- Park, K.-Y.¹ Kim, H.-S.²

25
- 4243714433
- Estimating articulatory parameters from the acoustic speech signal
- K. Richmond, "Estimating articulatory parameters from the acoustic speech signal", Ph.D. dissertation, Centre for Speech Technol. Res., Edinburgh Univ., Edinburgh, U.K., 2002.
- (2002) Ph.D. dissertation, Centre for Speech Technol. Res., Edinburgh Univ., Edinburgh, U.K.
- Richmond, K.¹

26
- 67650105018
- Trajectory mixture density network with multiple mixtures for acoustic-articulatory inversion
- K. Richmond, "Trajectory mixture density network with multiple mixtures for acoustic-articulatory inversion", in Proc. NOLISP, 2007, pp. 67-70.
- (2007) Proc. NOLISP , pp. 67-70
- Richmond, K.¹

27
- 0038359547
- Modelling the uncertainty in recovering articulation from acoustics
- K. Richmond, S. King, and P. Taylor, "Modelling the uncertainty in recovering articulation from acoustics", Comput. Speech Lang., vol. 17, pp. 153-172, 2003.
- (2003) Comput. Speech Lang. , vol.17 , pp. 153-172
- Richmond, K.¹ King, S.² Taylor, P.³

28
- 77952603871
- Gaussian Markov Random Fields: Theory and Applications
- H. Rue and L. Held, Gaussian Markov Random Fields: Theory and Applications. Boca Raton, FL: Chapman & Hall/CRC, 2005.
- (2005) Boca Raton, FL: Chapman & Hall/CRC
- Rue, H.¹ Held, L.²

29
- 33745199156
- Robust bandwidth extension of noise-corrupted narrowband speech
- M. Seltzer, A. Acero, and J. Droppo, "Robust bandwidth extension of noise-corrupted narrowband speech", in Proc. Interspeech, 2005, pp. 1509-1512.
- (2005) Proc. Interspeech , pp. 1509-1512
- Seltzer, M.¹ Acero, A.² Droppo, J.³

30
- 64149122631
- Accurate spectral envelope estimation for articulation-to-speech synthesis
- Y. Shiga and S. King, "Accurate spectral envelope estimation for articulation-to-speech synthesis", in Proc. ISCA SSW5, 2004, pp. 19-24.
- (2004) Proc. ISCA SSW5 , pp. 19-24
- Shiga, Y.¹ King, S.²

31
- 0032026483
- Continuous probabilistic transform for voice conversion
- Y. Stylianou, O. Cappe, and E. Moulines, "Continuous probabilistic transform for voice conversion", IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

32
- 0001455934
- A robust algorithm for pitch tracking (RAPT)
- D. Talkin, "A robust algorithm for pitch tracking (RAPT)", in Speech Coding and Synthesis, W. Kleijn and K. Paliwal, Eds. Amsterdam, The Netherlands: Elsevier, 1995.
- (1995) Speech Coding and Synthesis
- Talkin, D.¹

33
- 85027459007
- Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis
- T. Toda, A. W. Black, and K. Tokuda, "Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis", in Proc. ISCA SSW5, 2004, pp. 31-36.
- (2004) Proc. ISCA SSW5 , pp. 31-36
- Toda, T.¹ Black, A.W.² Tokuda, K.³

34
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory", IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.² Tokuda, K.³

35
- 38649140222
- Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model
- T. Toda, A. Black, and K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model", Speech Comm., vol. 50, no. 3, pp. 215-227, 2008.
- (2008) Speech Comm. , vol.50 , Issue.3 , pp. 215-227
- Toda, T.¹ Black, A.² Tokuda, K.³

36
- 33745214435
- NAM-to-speech conversion with Gaussian mixture models
- T. Toda and K. Shikano, "NAM-to-speech conversion with Gaussian mixture models", in Proc. Interspeech, 2005, pp. 1957-1960.
- (2005) Proc. Interspeech , pp. 1957-1960
- Toda, T.¹ Shikano, K.²

37
- 0028996993
- Speech parameter generation from HMM using dynamic features
- K. Tokuda, T. Kobayashi, and S. Imai, "Speech parameter generation from HMM using dynamic features", in Proc. ICASSP, 1995, pp. 660-663.
- (1995) Proc. ICASSP , pp. 660-663
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

38
- 85031628788
- An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features
- K. Tokuda, T. Masuko, Y. Yamada, T. Kobayashi, and S. Imai, "An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features", in Proc. Eurospeech, 1995, pp. 757-760.
- (1995) Proc. Eurospeech , pp. 757-760
- Tokuda, K.¹ Masuko, T.² Yamada, Y.³ Kobayashi, T.⁴ Imai, S.⁵

39
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis", in Proc. ICASSP, 2000, pp. 1315-1318.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

40
- 33646815712
- The MOCHA-TIMIT Database
- A. Wrench, The MOCHA-TIMIT Database, 1999. [Online]. Available: http://www.cstr.ed.ac.uk/artic/mocha.html
- (1999) The MOCHA-TIMIT Database
- Wrench, A.¹

41
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis", in Proc. Eurospeech, 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

42
- 78149252505
- The Hidden Markov Model Toolkit (HTK) Version 3.4
- S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X.-Y. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The Hidden Markov Model Toolkit (HTK) Version 3.4, 2006. [Online]. Available: http://htk.eng.cam.ac.uk/
- (2006) The Hidden Markov Model Toolkit (HTK) Version 3.4
- Young, S.¹ Evermann, G.² Gales, M.³ Hain, T.⁴ Kershaw, D.⁵ Liu, X.-Y.⁶ Moore, G.⁷ Odell, J.⁸ Ollason, D.⁹ Povey, D.¹⁰ Valtchev, V.¹¹ Woodland, P.¹²

43
- 67650826180
- Model-space MLLR for trajectory HMMs
- H. Zen, Y. Nankaku, and K. Tokuda, "Model-space MLLR for trajectory HMMs", in Proc. Interspeech, 2007, pp. 2065-2068.
- (2007) Proc. Interspeech , pp. 2065-2068
- Zen, H.¹ Nankaku, Y.² Tokuda, K.³

44
- 44949197937
- Speaker adaptation of trajectory HMMs using feature-space MLLR
- H. Zen, Y. Nankaku, K. Tokuda, and T. Kitamura, "Speaker adaptation of trajectory HMMs using feature-space MLLR", in Proc. Interspeech, 2006, pp. 2274-2277.
- (2006) Proc. Interspeech , pp. 2274-2277
- Zen, H.¹ Nankaku, Y.² Tokuda, K.³ Kitamura, T.⁴

45
- 33947642095
- Estimating trajectory HMM parameters by Monte Carlo EM with Gibbs sampler
- H. Zen, K. Tokuda, and T. Kitamura, "Estimating trajectory HMM parameters by Monte Carlo EM with Gibbs sampler", in Proc. ICASSP, 2006, pp. 1173-1176.
- (2006) Proc. ICASSP , pp. 1173-1176
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

46
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features
- H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features", Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007.
- (2007) Comput. Speech Lang. , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

47
- 67650829355
- Modelling speech dynamics with trajectory-HMMs
- L. Zhang, "Modelling speech dynamics with trajectory-HMMs", Ph.D. dissertation, Univ. of Edinburgh, Edinburgh, U.K., 2009.
- (2009) Ph.D. dissertation, Univ. of Edinburgh, Edinburgh, U.K.
- Zhang, L.¹

48
- 67650153217
- Acoustic-articulatory modelling with the trajectory HMM
- L. Zhang and S. Renals, "Acoustic-articulatory modelling with the trajectory HMM", IEEE Signal Process. Lett., vol. 15, pp. 245-248, 2008.
- (2008) IEEE Signal Process. Lett. , vol.15 , pp. 245-248
- Zhang, L.¹ Renals, S.²

49
- 84946719891
- Air- and bone-conductive integrated microphones for robust speech detection and enhancement
- Y. Zheng, Z. Liu, Z. Zhang, M. Sinclair, J. Droppo, L. Deng, A. Acero, and X. Huang, "Air- and bone-conductive integrated microphones for robust speech detection and enhancement", in Proc. ASRU, 2003, pp. 249-254.
- (2003) Proc. ASRU , pp. 249-254
- Zheng, Y.¹ Liu, Z.² Zhang, Z.³ Sinclair, M.⁴ Droppo, J.⁵ Deng, L.⁶ Acero, A.⁷ Huang, X.⁸

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.