SCOPUS Information Search Platform

IEICE Transactions on Information and Systems

Volume E97-D, Issue 6, 2014, Pages 1411-1418

Noise-robust voice conversion based on sparse spectral mapping using non-negative matrix factorization

(4) Aihara, Ryo a; Takashima, Ryoichi a; Takiguchi, Tetsuya a; Ariki, Yasuo a

a KOBE UNIVERSITY (Japan)

Author keywords

Noise robustness; Non negative matrix factorization; Sparse representation; Voice conversion

Indexed keywords

FACTORIZATION; MATRIX ALGEBRA; PHOTOMAPPING;

GAUSSIAN MIXTURE MODEL; NOISE ROBUSTNESS; NONNEGATIVE MATRIX FACTORIZATION; PARALLEL TRAINING; SPARSE REPRESENTATION; SPEAKER CONVERSION; SPECTRAL CONVERSION; VOICE CONVERSION;

SPEECH PROCESSING;

EID: 84901806271 PISSN: 09168532 EISSN: 17451361 Source Type: Journal
DOI: 10.1587/transinf.E97.D.1411 Document Type: Article

Times cited : (16)

References (24)

1
- 0032026483
- Continuous probabilistic transform for voice conversion
- Y. Stylianou, O. Cappe, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol.6, no.2, pp.131-142, 1998.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

2
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain and M.W. Macon, "Spectral voice conversion for text-to-speech synthesis," ICASSP, vol.1, pp.285-288, 1998.
- (1998) ICASSP , vol.1 , pp. 285-288
- Kain, A.¹ Macon, M.W.²

3
- 84865747520
- Intonation conversion from neutral to expressive speech
- C. Veaux and X. Rodet, "Intonation conversion from neutral to expressive speech," Interspeech, pp.2765-2768, 2011.
- (2011) Interspeech , pp. 2765-2768
- Veaux, C.¹ Rodet, X.²

4
- 84890451203
- GMM-based emotional voice conversion using spectrum and prosody features
- R. Aihara, R. Takashima, T. Takiguchi, and Y. Ariki, "GMM-based emotional voice conversion using spectrum and prosody features," American Journal of Signal Processing, vol.2, no.5, 2012.
- (2012) American Journal of Signal Processing , vol.2 , Issue.5
- Aihara, R.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

5
- 80052698826
- Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech
- K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech," Speech Commun., vol.54, no.1, pp.134-146, 2012.
- (2012) Speech Commun. , vol.54 , Issue.1 , pp. 134-146
- Nakamura, K.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

6
- 84890519936
- Individuality-preserving voice conversion for articulation disorders based on non-negative matrix factorization
- R. Aihara, R. Takashima, T. Takiguchi, and Y. Ariki, "Individuality-preserving voice conversion for articulation disorders based on non-negative matrix factorization," ICASSP, pp.8037-8040, 2013.
- (2013) ICASSP , pp. 8037-8040
- Aihara, R.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

7
- 0023739214
- Esophageal speech enhancement based on statistical voice conversion with Gaussian mixture models
- M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Esophageal speech enhancement based on statistical voice conversion with Gaussian mixture models," ICASSP, pp.655-658, 1988.
- (1988) ICASSP , pp. 655-658
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

8
- 0026880275
- Voice transformation using PSOLA technique
- H. Valbret, E. Moulines, and J.P. Tubach, "Voice transformation using PSOLA technique," Speech Commun., vol.11, no.2-3, pp.175-187, 1992.
- (1992) Speech Commun. , vol.11 , Issue.2-3 , pp. 175-187
- Valbret, H.¹ Moulines, E.² Tubach, J.P.³

9
- 57749193836
- Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio Speech Lang. Process., vol.15, no.8, pp.2222-2235, 2007.
- (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.² Tokuda, K.³

10
- 77953712499
- Voice conversion using partial least squares regression
- E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj, "Voice conversion using partial least squares regression," IEEE Trans. Audio Speech Lang. Process., vol.18, no.5, pp.912-921, 2010.
- (2010) IEEE Trans. Audio Speech Lang. Process. , vol.18 , Issue.5 , pp. 912-921
- Helander, E.¹ Virtanen, T.² Nurminen, J.³ Gabbouj, M.⁴

11
- 44949210554
- MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training
- C.H. Lee and C.H. Wu, "MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training," Interspeech, pp.2254-2257, 2006.
- (2006) Interspeech , pp. 2254-2257
- Lee, C.H.¹ Wu, C.H.²

12
- 34547512822
- Eigenvoice conversion based on Gaussian mixture model
- T. Toda, Y. Ohtani, and K. Shikano, "Eigenvoice conversion based on Gaussian mixture model," Interspeech, pp.2446-2449, 2006.
- (2006) Interspeech , pp. 2446-2449
- Toda, T.¹ Ohtani, Y.² Shikano, K.³

13
- 84865798483
- One-to-many voice conversion based on tensor representation of speaker space
- D. Saito, K. Yamamoto, N. Minematsu, and K. Hirose, "One-to-many voice conversion based on tensor representation of speaker space," Interspeech, pp.653-656, 2011.
- (2011) Interspeech , pp. 653-656
- Saito, D.¹ Yamamoto, K.² Minematsu, N.³ Hirose, K.⁴

14
- 84898964201
- Algorithms for non-negative matrix factorization
- D.D. Lee and H.S. Seung, "Algorithms for non-negative matrix factorization," Neural Information Processing Systems, pp.556-562, 2001.
- (2001) Neural Information Processing Systems , pp. 556-562
- Lee, D.D.¹ Seung, H.S.²

15
- 50249152311
- Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria
- T. Virtanen, "Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria," IEEE Trans. Audio Speech Lang. Process., vol.15, no.3, pp.1066-1074, 2007.
- (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.3 , pp. 1066-1074
- Virtanen, T.¹

16
- 44949110218
- Single-channel speech separation using sparse non-negative matrix factorization
- M.N. Schmidt and R.K. Olsson, "Single-channel speech separation using sparse non-negative matrix factorization," Interspeech, 2006.
- (2006) Interspeech
- Schmidt, M.N.¹ Olsson, R.K.²

17
- 79960657803
- Exemplar-based sparse representations for noise robust automatic speech recognition
- J.F. Gemmeke, T. Virtanen, and A. Hurmalainen, "Exemplar-based sparse representations for noise robust automatic speech recognition," IEEE Trans. Audio Speech Lang. Process., vol.19, no.7, pp.2067-2080, 2011.
- (2011) IEEE Trans. Audio Speech Lang. Process. , vol.19 , Issue.7 , pp. 2067-2080
- Gemmeke, J.F.¹ Virtanen, T.² Hurmalainen, A.³

18
- 84874248255
- Exemplar-based voice conversion in noisy environment
- R. Takashima, T. Takiguchi, and Y. Ariki, "Exemplar-based voice conversion in noisy environment," SLT, pp.313-317, 2012.
- (2012) SLT , pp. 313-317
- Takashima, R.¹ Takiguchi, T.² Ariki, Y.³

19
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol.27, no.3-4, pp.187-207, 1999.
- (1999) Speech Commun. , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigne, A.³

20
- 78049362257
- Nonnegative matrix factorization as noise-robust feature extractor for speech recognition
- B. Schuller, F. Weninger, M. Wollmer, Y. Sun, and G. Rigoll, "Nonnegative matrix factorization as noise-robust feature extractor for speech recognition," ICASSP, 2010.
- (2010) ICASSP
- Schuller, B.¹ Weninger, F.² Wollmer, M.³ Sun, Y.⁴ Rigoll, G.⁵

21
- 0025475528
- ATR Japanese speech database as a tool of speech recognition and synthesis
- A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara, and K. Shikano, "ATR Japanese speech database as a tool of speech recognition and synthesis," Speech Commun., vol.9, pp.357-363, 1990.
- (1990) Speech Commun. , vol.9 , pp. 357-363
- Kurematsu, A.¹ Takeda, K.² Sagisaka, Y.³ Katagiri, S.⁴ Kuwabara, H.⁵ Shikano, K.⁶

22
- 70349094936
- CENSREC-1-C: An evaluation framework for voice activity detection under noisy environments
- N. Kitaoka, T. Yamada, S. Tsuge, C. Miyajima, K. Yamamoto, T. Nishiura, M. Nakayama, Y. Denda, M. Fujimoto, T. Takiguchi, S. Tamura, S. Matsuda, T. Ogawa, S. Kuroiwa, K. Takeda, and S. Nakamura, "CENSREC-1-C: An evaluation framework for voice activity detection under noisy environments," Acoustical Science and Technology, vol.30, no.5, pp.363-371, 2009.
- (2009) Acoustical Science and Technology , vol.30 , Issue.5 , pp. 363-371
- Kitaoka, N.¹ Yamada, T.² Tsuge, S.³ Miyajima, C.⁴ Yamamoto, K.⁵ Nishiura, T.⁶ Nakayama, M.⁷ Denda, Y.⁸ Fujimoto, M.⁹ Takiguchi, T.¹⁰ Tamura, S.¹¹ Matsuda, S.¹² Ogawa, T.¹³ Kuroiwa, S.¹⁴ Takeda, K.¹⁵ Nakamura, S.¹⁶

23
- 84901788410
- Methods for objective and subjective assessment of quality
- International Telecommunication Union, "Methods for objective and subjective assessment of quality," ITU-T Recommendation P.800, 2003.
- (2003) ITU-T Recommendation P.800

24
- 84901803470
- Exemplar-based voice conversion using non-negative spectrogram deconvolution
- Z. Wu, T. Virtanen, T. Kinnunen, E.S. Chng, and H. Li, "Exemplar-based voice conversion using non-negative spectrogram deconvolution," SSW8, 2013.
- (2013) SSW8
- Wu, Z.¹ Virtanen, T.² Kinnunen, T.³ Chng, E.S.⁴ Li, H.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.