SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 08-12-September-2016, 2016, Pages 1642-1646

The USTC system for voice conversion challenge 2016: Neural network based approaches for spectrum, aperiodicity and F0 conversion

(5) Chen, Ling Hui a,b; Liu, Li Juan b; Ling, Zhen Hua a; Jiang, Yuan b; Dai, Li Rong a

a UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA (China)

b IFLYTEK RESEARCH (China)

Author keywords

DNN; Frequency warping; LSTM; RNN; Voice conversion

Indexed keywords

RECURRENT NEURAL NETWORKS; SPEECH COMMUNICATION;

DEEP NEURAL NETWORKS; FREQUENCY WARPING; LONG SHORT TERM MEMORY; LSTM; NETWORK-BASED APPROACH; RECURRENT NEURAL NETWORK (RNN); SPECTRAL ENVELOPES; VOICE CONVERSION;

SPEECH PROCESSING;

EID: 84994337398 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2016-456 Document Type: Conference Paper

Times cited : (7)

References (20)

1
- 84865698185
- Statistical voice conversion techniques for body-conducted unvoiced speech enhancement
- T. Toda, M. Nakagiri, and K. Shikano, "Statistical voice conversion techniques for body-conducted unvoiced speech enhancement," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 9, pp. 2505-2517, 2012.
- (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.9 , pp. 2505-2517
- Toda, T.¹ Nakagiri, M.² Shikano, K.³

2
- 67650657780
- Foreign accent conversion in computer assisted pronunciation training
- D. Felps, H. Bortfeld, and R. Gutierrez-Osuna, "Foreign accent conversion in computer assisted pronunciation training," Speech communication, vol. 51, no. 10, pp. 920-932, 2009.
- (2009) Speech Communication , vol.51 , Issue.10 , pp. 920-932
- Felps, D.¹ Bortfeld, H.² Gutierrez-Osuna, R.³

3
- 0023739214
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization," in Proc. ICASSP, 1988, pp. 655-658.
- (1988) Proc. ICASSP , pp. 655-658
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

4
- 0032026483
- Continuous probabilistic transform for voice conversion
- Y. Stylianou, O. Cappe, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Audio, Speech, and Lang. Process, vol. 6, no. 2, pp. 131-142, mar. 1998.
- (1998) IEEE Trans. Audio, Speech, and Lang. Process , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

5
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain and M. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP, 1998, pp. 285-288.
- (1998) Proc. ICASSP , pp. 285-288
- Kain, A.¹ Macon, M.²

6
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, and Lang. Process, vol. 15, no. 8, pp. 2222-2235, nov. 2007.
- (2007) IEEE Trans. Audio, Speech, and Lang. Process , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.² Tokuda, K.³

7
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰

8
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 7962-7966.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

9
- 84906225084
- Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
- L.-H. Chen, Z.-H. Ling, Y. Song, and L.-R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, 2013, pp. 3052-3056.
- (2013) INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association , pp. 3052-3056
- Chen, L.-H.¹ Ling, Z.-H.² Song, Y.³ Dai, L.-R.⁴

10
- 84906280857
- Voice conversion in high-order eigen space using deep belief nets
- T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets," in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, 2013, pp. 369-372.
- (2013) INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association , pp. 369-372
- Nakashika, T.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

11
- 84923867813
- Voice conversion using RNN pre-trained by recurrent temporal restricted Boltzmann machines
- T. Nakashika, T. Takiguchi, and Y. Ariki, "Voice conversion using RNN pre-trained by recurrent temporal restricted Boltzmann machines," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 3, pp. 580-587, March 2015.
- (2015) IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol.23 , Issue.3 , pp. 580-587
- Nakashika, T.¹ Takiguchi, T.² Ariki, Y.³

12
- 84946027999
- Voice conversion using deep bidirectional long short-term memory based recurrent neural networks
- L. Sun, S. Kang, K. Li, and H. Meng, "Voice conversion using deep bidirectional long short-term memory based recurrent neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4869-4873.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference On. IEEE , pp. 4869-4873
- Sun, L.¹ Kang, S.² Li, K.³ Meng, H.⁴

13
- 84921735339
- Voice conversion using deep neural networks with layer-wise generative training
- L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, "Voice conversion using deep neural networks with layer-wise generative training," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 22, no. 12, pp. 1859-1872, 2014.
- (2014) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.22 , Issue.12 , pp. 1859-1872
- Chen, L.-H.¹ Ling, Z.-H.² Liu, L.-J.³ Dai, L.-R.⁴

14
- 27644522706
- Vocal tract normalization equals linear transformation in cepstral space
- M. Pitz and H. Ney, "Vocal tract normalization equals linear transformation in cepstral space," Speech and Audio Processing, IEEE Transactions on, vol. 13, no. 5, pp. 930-944, sep. 2005.
- (2005) Speech and Audio Processing, IEEE Transactions on , vol.13 , Issue.5 , pp. 930-944
- Pitz, M.¹ Ney, H.²

15
- 84865785753
- Improved bottleneck features using pretrained deep neural networks
- D. Yu and M. L. Seltzer, "Improved bottleneck features using pretrained deep neural networks." Interspeech, pp. 237-240, 2011.
- (2011) Interspeech , pp. 237-240
- Yu, D.¹ Seltzer, M.L.²

16
- 84959118000
- The Fisher corpus: A resource for the next generations of speech-to-text
- C. Cieri, D. Miller, and K. Walker, "The Fisher corpus: a resource for the next generations of speech-to-text," in LREC, vol. 4, 2004, pp. 69-71.
- (2004) LREC , vol.4 , pp. 69-71
- Cieri, C.¹ Miller, D.² Walker, K.³

17
- 84906257669
- Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression
- H. Silén, J. Nurminen, E. Helander, and M. Gabbouj, "Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression," in Interspeech, 2013.
- (2013) Interspeech
- Silén, H.¹ Nurminen, J.² Helander, E.³ Gabbouj, M.⁴

18
- 84946020861
- Sparse representation for frequency warping based voice conversion
- X. Tian, Z. Wu, S. W. Lee, and N. Q. Hy, "Sparse representation for frequency warping based voice conversion," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, 2015.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
- Tian, X.¹ Wu, Z.² Lee, S.W.³ Hy, N.Q.⁴

19
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-208, 1999.
- (1999) Speech Communication , vol.27 , Issue.3 , pp. 187-208
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

20
- 84905283451
- New methods in continuous Mandarin speech recognition
- C. J. Chen, R. A. Gopinath, M. D. Monkowski, M. A. Picheny, and K. Shen, "New methods in continuous mandarin speech recognition." in Eurospeech, 1997.
- (1997) Eurospeech
- Chen, C.J.¹ Gopinath, R.A.² Monkowski, M.D.³ Picheny, M.A.⁴ Shen, K.⁵

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.