SCOPUS Information Retrieval Platform

IEEE Journal on Selected Topics in Signal Processing

Volume 8, Issue 2, 2014, Pages 285-295

Noise in HMM-based speech synthesis adaptation: Analysis, evaluation methods and experiments

Karhila, Reimaᵃ; Remes, Ulpuᵃ; Kurimo, Mikkoᵃ

ᵃ Aalto University (Finland)

Author keywords

Adaptation; evaluation methods; noise robustness; speech synthesis

Indexed keywords

ADAPTATION; ENVIRONMENTAL NOISE; EVALUATION METHODS; HMM-BASED SPEECH SYNTHESIS; INVESTIGATE EFFECTS; NOISE ROBUSTNESS; PERSONALIZED VOICE; SYNTHESIZED SPEECH;

FEATURE EXTRACTION; SPEECH SYNTHESIS; TREES (MATHEMATICS);

EXPERIMENTS;

EID: 84897869648 PISSN: 19324553 EISSN: None Source Type: Journal
DOI: 10.1109/JSTSP.2013.2278492 Document Type: Article

Times cited : (16)

References (39)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis" Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009
- (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 77953708096
- Thousands of voices for HMM-based speech synthesis-analysis and application of TTS systems built on various ASR corpora
- Jul
- J. Yamagishi, B. Usabaev, S. King, O. Watts, J. Dines, J. Tian, Y. Guan, R. Hu, K. Oura, Y. J. Wu, K. Tokuda, R. Karhila, and M. Kurimo, "Thousands of voices for HMM-based speech synthesis-analysis and application of TTS systems built on various ASR corpora" IEEE Trans. Audio, Speech, Lang. Process, vol. 18, no. 5, pp. 984-1004, Jul. 2010
- (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.5 , pp. 984-1004
- Yamagishi, J.¹ Usabaev, B.² King, S.³ Watts, O.⁴ Dines, J.⁵ Tian, J.⁶ Guan, Y.⁷ Hu, R.⁸ Oura, K.⁹ Wu, Y.J.¹⁰ Tokuda, K.¹¹ Karhila, R.¹² Kurimo, M.¹³

3
- 33847129573
- Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
- DOI 10.1093/ietisy/e90-d.2.533
- J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training" IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533-543, 2007
- (2007) IEICE Transactions on Information and Systems , vol.E90-D , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

4
- 67650803663
- Combining statistical parametric speech synthesis and unit-selection for automatic voice cloning
- M. Aylett and J. Yamagishi, "Combining statistical parametric speech synthesis and unit-selection for automatic voice cloning" in Proc. LangTech, 2008
- (2008) Proc. LangTech
- Aylett, M.¹ Yamagishi, J.²

5
- 84867223798
- Robustness of HMM-based speech synthesis
- J. Yamagishi, Z. Ling, and S. King, "Robustness of HMM-based speech synthesis" in Proc. Interspeech, 2008
- (2008) Proc. Interspeech
- Yamagishi, J.¹ Ling, Z.² King, S.³

6
- 84890528712
- HMM-based speech synthesis adaptation using noisy data: Analysis and evaluation methods
- R. Karhila, U. Remes, and M. Kurimo, "HMM-based speech synthesis adaptation using noisy data: Analysis and evaluation methods" in Proc. ICASSP, 2013
- (2013) Proc. ICASSP
- Karhila, R.¹ Remes, U.² Kurimo, M.³

7
- 67650854725
- Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm" IEEE Trans. Audio, Speech, Lang. Process, vol. 17, no. 1, pp. 66-83, 2009
- (2009) IEEE Trans. Audio, Speech, Lang. Process , vol.17 , Issue.1 , pp. 66-83
- Yamagishi, J.¹ Kobayashi, T.² Nakano, Y.³ Ogata, K.⁴ Isogai, J.⁵

8
- 85009097035
- Fast speaker adaptation using eigenspace-based maximum likelihood linear regression
- K.-T. Chen, W.-W. Liau, H.-M. Wang, and L.-S. Lee, "Fast speaker adaptation using eigenspace-based maximum likelihood linear regression" in Proc. ICSLP, 2000
- (2000) Proc. ICSLP
- Chen, K.-T.¹ Liau, W.-W.² Wang, H.-M.³ Lee, L.-S.⁴

9
- 84897843217
- ITU-T
- "Recommendation P.835 (2003/11) Subjective Test Methodology for Evaluating Speech Communication Systems that Include Noise Suppression Algorithm" ITU-T
- Recommendation P.835 (2003/11) Subjective Test Methodology for Evaluating Speech Communication Systems That Include Noise Suppression Algorithm

10
- 80051636048
- Speaker similarity evaluation of foreign-accented speech synthesis using HMM-based speaker adaptation
- M. Wester and R. Karhila, "Speaker similarity evaluation of foreign-accented speech synthesis using HMM-based speaker adaptation" in Proc. ICASSP, 2011
- (2011) Proc. ICASSP
- Wester, M.¹ Karhila, R.²

11
- 79959818117
- Non-negative matrix factorization based compensation of music for automatic speech recognition
- B. Raj, T. Virtanen, S. Chaudhuri, and R. Singh, "Non-negative matrix factorization based compensation of music for automatic speech recognition" in Proc. Interspeech, 2010
- (2010) Proc. Interspeech
- Raj, B.¹ Virtanen, T.² Chaudhuri, S.³ Singh, R.⁴

12
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition" Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

13
- 84865763570
- Rapid adaptation of foreign-accented HMM-based speech synthesis
- R. Karhila and M. Wester, "Rapid adaptation of foreign-accented HMM-based speech synthesis" in Proc. Interspeech, 2011
- (2011) Proc. Interspeech
- Karhila, R.¹ Wester, M.²

14
- 0034320005
- Rapid speaker adaptation in eigenvoice space
- DOI 10.1109/89.876308
- R. Kuhn, J.-C. Junqua, P. Nguyen, and N. Niedzielski, "Rapid speaker adaptation in eigenvoice space" IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 695-707, Nov. 2000
- (2000) IEEE Transactions on Speech and Audio Processing , vol.8 , Issue.6 , pp. 695-707
- Kuhn, R.¹ Junqua, J.-C.² Nguyen, P.³ Niedzielski, N.⁴

15
- 85009257840
- Eigenvoices for HMM-based speech synthesis
- K. Shichiri, A. Sawabe, T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Eigenvoices for HMM-based speech synthesis" in Proc. ICSLP, 2002, vol. 2, pp. 1269-1272
- (2002) Proc. ICSLP , vol.2 , pp. 1269-1272
- Shichiri, K.¹ Sawabe, A.² Yoshimura, T.³ Tokuda, K.⁴ Masuko, T.⁵ Kobayashi, T.⁶ Kitamura, T.⁷

16
- 84898970836
- Kernel PCA and de-noising in feature spaces
- S. Mika, B. Schölkopf, A. Smola, K.-R. Müller, M. Scholz, and G. Rätsch, "Kernel PCA and de-noising in feature spaces" in Proc. NIPS, 1999, pp. 536-542
- (1999) Proc. NIPS , pp. 536-542
- Mika, S.¹ Schölkopf, B.² Smola, A.³ Müller, K.-R.⁴ Scholz, M.⁵ Rätsch, G.⁶

17
- 0034227757
- Cluster adaptive training of hidden Markov models
- Jul
- M. J. F. Gales, "Cluster adaptive training of hidden Markov models" IEEE Trans. Speech Audio Process., vol. 8, no. 4, pp. 417-428, Jul. 2000
- (2000) IEEE Trans. Speech Audio Process , vol.8 , Issue.4 , pp. 417-428
- Gales, M.J.F.¹

18
- 84859765673
- Statistical parametric speech synthesis based on speaker and language factorization
- Aug
- H. Zen, N. Braunschweiler, S. Buchholz, M. J. F. Gales, K. Knill, S. Krstulovic, and J. Latorre, "Statistical parametric speech synthesis based on speaker and language factorization" IEEE Trans. Audio, Speech, Lang. Process, vol. 20, no. 6, pp. 1713-1724, Aug. 2012
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.6 , pp. 1713-1724
- Zen, H.¹ Braunschweiler, N.² Buchholz, S.³ Gales, M.J.F.⁴ Knill, K.⁵ Krstulovic, S.⁶ Latorre, J.⁷

19
- 84878422444
- Combining multiple high quality corpora for improving HMM-TTS
- V. Wan, J. Latorre, K. Chin, L. Chen, M. Gales, H. Zen, K. Knill, and M. Akamine, "Combining multiple high quality corpora for improving HMM-TTS" in Proc. Interspeech, 2012
- (2012) Proc. Interspeech
- Wan, V.¹ Latorre, J.² Chin, K.³ Chen, L.⁴ Gales, M.⁵ Zen, H.⁶ Knill, K.⁷ Akamine, M.⁸

20
- 27644511614
- Kernel eigenvoice speaker adaptation
- DOI 10.1109/TSA.2005.851971
- B. Mak, J. Kwok, and S. Ho, "Kernel eigenvoice speaker adaptation" IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 984-992, Sep. 2005
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.5 , pp. 984-992
- Mak, B.¹ Kwok, J.T.² Ho, S.³

21
- 34047246852
- Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting
- Jul
- B. K.-W. Mak, R. W.-H. Hsiao, S. K.-L. Ho, and J. Kwok, "Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting" IEEE Trans. Audio, Speech, Lang. Process, vol. 14, no. 4, pp. 1267-1280, Jul. 2006
- (2006) IEEE Trans. Audio, Speech, Lang. Process , vol.14 , Issue.4 , pp. 1267-1280
- Mak, B.K.-W.¹ Hsiao, R.W.-H.² Ho, S.K.-L.³ Kwok, J.⁴

22
- 84897910241
- Kernel eigenvoices (revisited) for large-vocabulary speech recognition
- Dec
- Z. Roupakia and M. Gales, "Kernel eigenvoices (revisited) for large-vocabulary speech recognition" IEEE Signal Process. Lett., vol. 18, no. 12, pp. 709-712, Dec. 2011
- (2011) IEEE Signal Process. Lett , vol.18 , Issue.12 , pp. 709-712
- Roupakia, Z.¹ Gales, M.²

23
- 56149122221
- Kernel eigenspace-based MLLR adaptation
- Mar
- B. Mak and R. Hsiao, "Kernel eigenspace-based MLLR adaptation" IEEE Trans. Audio, Speech, Lang. Process, vol. 15, no. 3, pp. 784-795, Mar. 2007
- (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.3 , pp. 784-795
- Mak, B.¹ Hsiao, R.²

24
- 0023965454
- Objective quality evaluation for low-bit-rate speech coding systems
- DOI 10.1109/49.601
- N. Kitawaki, H. Nagabuchi, and K. Itoh, "Objective quality evaluation for low-bit-rate speech coding systems" IEEE J. Sel. Areas Commun., vol. 6, no. 2, pp. 242-248, Feb. 1988
- (1988) IEEE Journal on Selected Areas in Communications , vol.6 , Issue.2 , pp. 242-248
- Kitawaki, N.¹ Nagabuchi, H.² Itoh, K.³

25
- 0017787719
- A study of complexity and quality of speech waveform coders
- J. M. Tribolet, P. Noll, B. J. McDermott, and R. E. Crochiere, "A study of complexity and quality of speech waveform coders" in Proc. ICASSP, 1978, pp. 586-590
- (1978) Proc. ICASSP , pp. 586-590
- Tribolet, J.M.¹ Noll, P.² McDermott, B.J.³ Crochiere, R.E.⁴

26
- 59849095077
- ITU-T
- "Recommendation P.862 (02/2001) perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs" ITU-T
- Recommendation P.862 (02/2001) Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs

27
- 44149106061
- Evaluation of objective quality measures for speech enhancement
- Jan
- Y. Hu and P. Loizou, "Evaluation of objective quality measures for speech enhancement" IEEE Trans. Audio, Speech, Lang. Process, vol. 16, no. 1, pp. 229-238, Jan. 2008
- (2008) IEEE Trans. Audio, Speech, Lang. Process , vol.16 , Issue.1 , pp. 229-238
- Hu, Y.¹ Loizou, P.²

28
- 48349113750
- M. Brookes, VOICEBOX: Speech Processing Toolbox for MATLAB, 1998
- (1998) VOICEBOX: Speech Processing Toolbox for MATLAB
- Brookes, M.¹

29
- 67650832556
- Statistical analysis of the Blizzard Challenge 2007 listening test results
- R. A. J. Clark, M. Podsiadło, M. Fraser, C. Mayo, and S. King, "Statistical analysis of the Blizzard Challenge 2007 listening test results" in Proc. Blizzard Workshop, 2007
- (2007) Proc. Blizzard Workshop
- Clark, R.A.J.¹ Podsiadło, M.² Fraser, M.³ Mayo, C.⁴ King, S.⁵

30
- 80051651104
- The EMIME Bilingual Database
- M. Wester, "The EMIME Bilingual Database" Univ. of Edinburgh, Edinburgh, U.K., 2010, Tech. Rep. EDI-INF-RR-1388
- (2010) The EMIME Bilingual Database
- Wester, M.¹

31
- 0027623210
- Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
- A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems" Speech Commun., vol. 12, no. 3, pp. 247-251, 1993
- (1993) Speech Commun , vol.12 , Issue.3 , pp. 247-251
- Varga, A.¹ Steeneken, H.²

32
- 84865777002
- The CSTR/EMIME HTS system for Blizzard Challenge
- J. Yamagishi and O. Watts, "The CSTR/EMIME HTS system for Blizzard Challenge" in Proc. Blizzard Challenge, 2010
- (2010) Proc. Blizzard Challenge
- Yamagishi, J.¹ Watts, O.²

33
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds" Speech Commun., vol. 27, pp. 187-207, 1999
- (1999) Speech Commun , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

34
- 11144317887
- Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency
- D. Arifianto, T. Tanaka, T. Masuko, and T. Kobayashi, "Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency" IEICE Trans. Inf. Syst., vol. E87-D, no. 12, pp. 2812-2820, 2004
- (2004) IEICE Transactions on Information and Systems , vol.E87-D , Issue.12 , pp. 2812-2820
- Arifianto, D.¹ Tanaka, T.² Masuko, T.³ Kobayashi, T.⁴

35
- 84928118106
- Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity
- H. Kawahara, H. Katayose, A. de Cheveigné, and R. D. Patterson, "Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity" in Proc. Eurospeech, 1999, pp. 2781-2784
- (1999) Proc. Eurospeech , pp. 2781-2784
- Kawahara, H.¹ Katayose, H.² De Cheveigné, A.³ Patterson, R.D.⁴

36
- 0001455934
- A robust algorithm for pitch tracking (RAPT)
- D. Talkin, "A robust algorithm for pitch tracking (RAPT)" Speech Coding Synth., pp. 495-518, 1995
- (1995) Speech Coding Synth , pp. 495-518
- Talkin, D.¹

37
- 84897856883
- Distributed speech recognition ETSI
- "ES 202 050 V1.1.5 speech processing, transmission and quality aspects (STQ), distributed speech recognition" ETSI, 2007
- (2007) ES 202 050 V1.1.5 Speech Processing, Transmission and Quality Aspects (STQ)

38
- 84910032186
- Speecon-speech databases for consumer devices: Database specification and validation
- D. Iskra, B. Grosskopf, K. Marasek, H. van den Heuvel, F. Diehl, and A. Kiessling, "Speecon-speech databases for consumer devices: Database specification and validation" in Proc. LREC, 2002
- (2002) Proc. LREC
- Iskra, D.¹ Grosskopf, B.² Marasek, K.³ van den Heuvel, H.⁴ Diehl, F.⁵ Kiessling, A.⁶

39
- 84897855414
- Objective evaluation measures for speaker-adaptive HMM-TTS systems
- U. Remes, R. Karhila, and M. Kurimo, "Objective evaluation measures for speaker-adaptive HMM-TTS systems" in Proc. SSW.
- Proc. SSW
- Remes, U.¹ Karhila, R.² Kurimo, M.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.