Volume 19, Issue 2, 2011, Pages 417-430

Continuous stochastic feature mapping based on trajectory HMMs

Author keywords

Gaussian mixture model (GMM) based mapping; speech recognition; trajectory hidden Markov model (HMM); voice conversion

Indexed keywords

ARTICULATORY INVERSION; DYNAMIC CHARACTERISTICS; GAUSSIAN MIXTURE MODEL; GAUSSIAN MIXTURE MODEL (GMM)-BASED MAPPING; LEVEL TRANSFORMATION; MAPPING TECHNIQUES; NEW APPROACHES; NOISE-COMPENSATION; STATIC AND DYNAMIC; STOCHASTIC FEATURES; VOICE CONVERSION

EID: 78149260085     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2010.2049685     Document Type: Article
Times cited: 58

References (49)
  • 2
    • 34547550766
    • Stereo-based stochastic mapping for robust speech recognition
    • M. Afify, X.-D. Cui, and Y. Gao, "Stereo-based stochastic mapping for robust speech recognition", in Proc. ICASSP, 2007, pp. 377-380.
    • (2007) Proc. ICASSP , pp. 377-380
    • Afify, M.1    Cui, X.-D.2    Gao, Y.3
  • 3
    • 84905560807
    • Voice conversion with smoothed GMM and MAP adaptation
    • Y. Chen, M. Chu, E. Chang, and J. Liu, "Voice conversion with smoothed GMM and MAP adaptation", in Proc. Interspeech, 2003, pp. 2413-2416.
    • (2003) Proc. Interspeech , pp. 2413-2416
    • Chen, Y.1    Chu, M.2    Chang, E.3    Liu, J.4
  • 4
    • 51449114531
    • MMSE-based stereo feature stochastic mapping for noise robust speech recognition
    • X.-D. Cui, M. Afify, and Y. Gao, "MMSE-based stereo feature stochastic mapping for noise robust speech recognition", in Proc. ICASSP, 2008, pp. 4077-4080.
    • (2008) Proc. ICASSP , pp. 4077-4080
    • Cui, X.-D.1    Afify, M.2    Gao, Y.3
  • 5
    • 0034855352
    • High-performance robust speech recognition using stereo training data
    • L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, "High-performance robust speech recognition using stereo training data", in Proc. ICASSP, 2001, pp. 301-304.
    • (2001) Proc. ICASSP , pp. 301-304
    • Deng, L.1    Acero, A.2    Jiang, L.3    Droppo, J.4    Huang, X.5
  • 6
    • 0036291376
    • Uncertainty decoding with SPLICE for noise robust speech recognition
    • J. Droppo, A. Acero, and L. Deng, "Uncertainty decoding with SPLICE for noise robust speech recognition", in Proc. ICASSP, 2002, pp. 57-60.
    • (2002) Proc. ICASSP , pp. 57-60
    • Droppo, J.1    Acero, A.2    Deng, L.3
  • 7
    • 78149261566
    • Bandwidth extension of cellular phone speech based on maximum likelihood estimation with GMM
    • W. Fujitsuru, H. Sekimoto, T. Toda, H. Saruwatari, and K. Shikano, "Bandwidth extension of cellular phone speech based on maximum likelihood estimation with GMM", in Proc. NCSP, 2008, pp. 283-286.
    • (2008) Proc. NCSP , pp. 283-286
    • Fujitsuru, W.1    Sekimoto, H.2    Toda, T.3    Saruwatari, H.4    Shikano, K.5
  • 8
    • 85016140477
    • An adaptive algorithm for mel-cepstral analysis of speech
    • T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech", in Proc. ICASSP, 1992, pp. 137-140.
    • (1992) Proc. ICASSP , pp. 137-140
    • Fukada, T.1    Tokuda, K.2    Kobayashi, T.3    Imai, S.4
  • 9
    • 0022667694
    • Speaker independent isolated word recognition using dynamic features of speech spectrum
    • S. Furui, "Speaker independent isolated word recognition using dynamic features of speech spectrum", IEEE Trans. Acoust., Speech, Signal Process., vol. 34, pp. 52-59, 1986.
    • (1986) IEEE Trans. Acoust., Speech, Signal Process. , vol.34 , pp. 52-59
    • Furui, S.1
  • 10
    • 2142659020
    • Estimation of articulatory movements from speech acoustics using an HMM-based speech production model
    • S. Hiroya and M. Honda, "Estimation of articulatory movements from speech acoustics using an HMM-based speech production model", IEEE Trans. Speech Audio Process., vol. 12, no. 2, pp. 175-185, Mar. 2004.
    • (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.2 , pp. 175-185
    • Hiroya, S.1    Honda, M.2
  • 11
    • 0038669544
    • The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions
    • H. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions", in Proc. ISCA ITRW ASR'00, 2000, pp. 181-188.
    • (2000) Proc. ISCA ITRW ASR'00 , pp. 181-188
    • Hirsch, H.1    Pearce, D.2
  • 12
    • 0020596154
    • Cepstral analysis synthesis on the mel frequency scale
    • S. Imai, "Cepstral analysis synthesis on the mel frequency scale", in Proc. ICASSP, 1983, pp. 93-96.
    • (1983) Proc. ICASSP , pp. 93-96
    • Imai, S.1
  • 13
    • 0031623661
    • Spectral voice conversion for text-to-speech synthesis
    • A. Kain and M. Macon, "Spectral voice conversion for text-to-speech synthesis", in Proc. ICASSP, 1998, pp. 285-288.
    • (1998) Proc. ICASSP , pp. 285-288
    • Kain, A.1    Macon, M.2
  • 15
    • 0006682104
    • Vector quantization of speech spectral parameters using statistics of dynamic features
    • K. Koishida, K. Tokuda, T. Masuko, and T. Kobayashi, "Vector quantization of speech spectral parameters using statistics of dynamic features", in Proc. Int. Conf. Signal Process.'97, 1997, pp. 247-252.
    • (1997) Proc. Int. Conf. Signal Process.'97 , pp. 247-252
    • Koishida, K.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4
  • 17
    • 34547503417
    • HMM-based unit selection using frame sized speech segments
    • Z.-H. Ling and R.-H. Wang, "HMM-based unit selection using frame sized speech segments", in Proc. Interspeech, 2006, pp. 2034-2037.
    • (2006) Proc. Interspeech , pp. 2034-2037
    • Ling, Z.-H.1    Wang, R.-H.2
  • 18
    • 33646887390
    • On the limited memory BFGS method for large scale optimization
    • D. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization", Math. Program. B, vol. 45, no. 3, pp. 503-528, 1989.
    • (1989) Math. Program. B , vol.45 , Issue.3 , pp. 503-528
    • Liu, D.1    Nocedal, J.2
  • 19
    • 84867211725
    • Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory", in Proc. Interspeech, 2008, pp. 1076-1079.
    • (2008) Proc. Interspeech , pp. 1076-1079
    • Muramatsu, T.1    Ohtani, Y.2    Toda, T.3    Saruwatari, H.4    Shikano, K.5
  • 20
    • 44949187612
    • Improving body transmitted unvoiced speech with statistical voice conversion
    • M. Nakagiri, T. Toda, H. Kashioka, and K. Shikano, "Improving body transmitted unvoiced speech with statistical voice conversion", in Proc. Interspeech, 2006, pp. 2270-2273.
    • (2006) Proc. Interspeech , pp. 2270-2273
    • Nakagiri, M.1    Toda, T.2    Kashioka, H.3    Shikano, K.4
  • 21
    • 42649146508
    • On the use of phonetic information for mapping from articulatory movements to vocal tract spectrum
    • K. Nakamura, T. Toda, Y. Nankaku, and K. Tokuda, "On the use of phonetic information for mapping from articulatory movements to vocal tract spectrum", in Proc. ICASSP, 2006, pp. 93-96.
    • (2006) Proc. ICASSP , pp. 93-96
    • Nakamura, K.1    Toda, T.2    Nankaku, Y.3    Tokuda, K.4
  • 22
    • 44949265538
    • Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech
    • K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech", in Proc. Interspeech, 2006, pp. 1395-1398.
    • (2006) Proc. Interspeech , pp. 1395-1398
    • Nakamura, K.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 23
    • 78149241363
    • Spectral conversion based on statistical models including time-frequency matching
    • Y. Nankaku, K. Nakamura, T. Toda, and K. Tokuda, "Spectral conversion based on statistical models including time-frequency matching", in Proc. ISCA SSW6, 2007, pp. 333-338.
    • (2007) Proc. ISCA SSW6 , pp. 333-338
    • Nankaku, Y.1    Nakamura, K.2    Toda, T.3    Tokuda, K.4
  • 24
    • 0033692729
    • Narrowband to wideband conversion of speech using GMM based transformation
    • K.-Y. Park and H.-S. Kim, "Narrowband to wideband conversion of speech using GMM based transformation", in Proc. ICASSP, 2000, pp. 1847-1850.
    • (2000) Proc. ICASSP , pp. 1847-1850
    • Park, K.-Y.1    Kim, H.-S.2
  • 26
    • 67650105018
    • Trajectory mixture density network with multiple mixtures for acoustic-articulatory inversion
    • K. Richmond, "Trajectory mixture density network with multiple mixtures for acoustic-articulatory inversion", in Proc. NOLISP, 2007, pp. 67-70.
    • (2007) Proc. NOLISP , pp. 67-70
    • Richmond, K.1
  • 27
    • 0038359547
    • Modelling the uncertainty in recovering articulation from acoustics
    • K. Richmond, S. King, and P. Taylor, "Modelling the uncertainty in recovering articulation from acoustics", Comput. Speech Lang., vol. 17, pp. 153-172, 2003.
    • (2003) Comput. Speech Lang. , vol.17 , pp. 153-172
    • Richmond, K.1    King, S.2    Taylor, P.3
  • 29
    • 33745199156
    • Robust bandwidth extension of noise-corrupted narrowband speech
    • M. Seltzer, A. Acero, and J. Droppo, "Robust bandwidth extension of noise-corrupted narrowband speech", in Proc. Interspeech, 2005, pp. 1509-1512.
    • (2005) Proc. Interspeech , pp. 1509-1512
    • Seltzer, M.1    Acero, A.2    Droppo, J.3
  • 30
    • 64149122631
    • Accurate spectral envelope estimation for articulation-to-speech synthesis
    • Y. Shiga and S. King, "Accurate spectral envelope estimation for articulation-to-speech synthesis", in Proc. ISCA SSW5, 2004, pp. 19-24.
    • (2004) Proc. ISCA SSW5 , pp. 19-24
    • Shiga, Y.1    King, S.2
  • 31
    • 0032026483
    • Continuous probabilistic transform for voice conversion
    • Y. Stylianou, O. Cappe, and E. Moulines, "Continuous probabilistic transform for voice conversion", IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
    • (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
    • Stylianou, Y.1    Cappe, O.2    Moulines, E.3
  • 32
    • 0001455934
    • A robust algorithm for pitch tracking (RAPT)
    • D. Talkin, "A robust algorithm for pitch tracking (RAPT)", in Speech Coding and Synthesis, W. Kleijn and K. Paliwal, Eds. Amsterdam, The Netherlands: Elsevier, 1995.
    • (1995) Speech Coding and Synthesis
    • Talkin, D.1
  • 33
    • 85027459007
    • Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis
    • T. Toda, A. W. Black, and K. Tokuda, "Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis", in Proc. ISCA SSW5, 2004, pp. 31-36.
    • (2004) Proc. ISCA SSW5 , pp. 31-36
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 34
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory", IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.2    Tokuda, K.3
  • 35
    • 38649140222
    • Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model
    • T. Toda, A. Black, and K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model", Speech Comm., vol. 50, no. 3, pp. 215-227, 2008.
    • (2008) Speech Comm. , vol.50 , Issue.3 , pp. 215-227
    • Toda, T.1    Black, A.2    Tokuda, K.3
  • 36
    • 33745214435
    • NAM-to-speech conversion with Gaussian mixture models
    • T. Toda and K. Shikano, "NAM-to-speech conversion with Gaussian mixture models", in Proc. Interspeech, 2005, pp. 1957-1960.
    • (2005) Proc. Interspeech , pp. 1957-1960
    • Toda, T.1    Shikano, K.2
  • 37
    • 0028996993
    • Speech parameter generation from HMM using dynamic features
    • K. Tokuda, T. Kobayashi, and S. Imai, "Speech parameter generation from HMM using dynamic features", in Proc. ICASSP, 1995, pp. 660-663.
    • (1995) Proc. ICASSP , pp. 660-663
    • Tokuda, K.1    Kobayashi, T.2    Imai, S.3
  • 38
    • 85031628788
    • An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features
    • K. Tokuda, T. Masuko, Y. Yamada, T. Kobayashi, and S. Imai, "An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features", in Proc. Eurospeech, 1995, pp. 757-760.
    • (1995) Proc. Eurospeech , pp. 757-760
    • Tokuda, K.1    Masuko, T.2    Yamada, Y.3    Kobayashi, T.4    Imai, S.5
  • 39
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis", in Proc. ICASSP, 2000, pp. 1315-1318.
    • (2000) Proc. ICASSP , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 40
    • 33646815712
    • The MOCHA-TIMIT Database
    • A. Wrench, The MOCHA-TIMIT Database, 1999. [Online]. Available: http://www.cstr.ed.ac.uk/artic/mocha.html
    • (1999) The MOCHA-TIMIT Database
    • Wrench, A.1
  • 41
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis", in Proc. Eurospeech, 1999, pp. 2347-2350.
    • (1999) Proc. Eurospeech , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 43
    • 67650826180
    • Model-space MLLR for trajectory HMMs
    • H. Zen, Y. Nankaku, and K. Tokuda, "Model-space MLLR for trajectory HMMs", in Proc. Interspeech, 2007, pp. 2065-2068.
    • (2007) Proc. Interspeech , pp. 2065-2068
    • Zen, H.1    Nankaku, Y.2    Tokuda, K.3
  • 44
    • 44949197937
    • Speaker adaptation of trajectory HMMs using feature-space MLLR
    • H. Zen, Y. Nankaku, K. Tokuda, and T. Kitamura, "Speaker adaptation of trajectory HMMs using feature-space MLLR", in Proc. Interspeech, 2006, pp. 2274-2277.
    • (2006) Proc. Interspeech , pp. 2274-2277
    • Zen, H.1    Nankaku, Y.2    Tokuda, K.3    Kitamura, T.4
  • 45
    • 33947642095
    • Estimating trajectory HMM parameters by Monte Carlo EM with Gibbs sampler
    • H. Zen, K. Tokuda, and T. Kitamura, "Estimating trajectory HMM parameters by Monte Carlo EM with Gibbs sampler", in Proc. ICASSP, 2006, pp. 1173-1176.
    • (2006) Proc. ICASSP , pp. 1173-1176
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 46
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features
    • H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features", Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007.
    • (2007) Comput. Speech Lang. , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 48
    • 67650153217
    • Acoustic-articulatory modelling with the trajectory HMM
    • L. Zhang and S. Renals, "Acoustic-articulatory modelling with the trajectory HMM", IEEE Signal Process. Lett., vol. 15, pp. 245-248, 2008.
    • (2008) IEEE Signal Process. Lett. , vol.15 , pp. 245-248
    • Zhang, L.1    Renals, S.2
  • 49
    • 84946719891
    • Air- and bone-conductive integrated microphones for robust speech detection and enhancement
    • Y. Zheng, Z. Liu, Z. Zhang, M. Sinclair, J. Droppo, L. Deng, A. Acero, and X. Huang, "Air- and bone-conductive integrated microphones for robust speech detection and enhancement", in Proc. ASRU, 2003, pp. 249-254.
    • (2003) Proc. ASRU , pp. 249-254
    • Zheng, Y.1    Liu, Z.2    Zhang, Z.3    Sinclair, M.4    Droppo, J.5    Deng, L.6    Acero, A.7    Huang, X.8


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.