SCOPUS Information Search Platform

2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings

Volume , Issue , 2014, Pages 19-23

Voice conversion using deep neural networks with speaker-independent pre-training

(2) Mohammadi, Seyed Hamidreza (a); Kain, Alexander (a)

(a) Oregon Health and Science University (United States)

Author keywords

Autoencoder; Deep neural network; Pre training; Voice conversion

Indexed keywords

BACKPROPAGATION; LEARNING SYSTEMS; NEURAL NETWORKS;

AUTO ENCODERS; COMPACT REPRESENTATION; CONVERSION ACCURACIES; DEEP NEURAL NETWORKS; GAUSSIAN MIXTURE MODEL; PRE-TRAINING; SPEAKER INDEPENDENTS; VOICE CONVERSION;

SPEECH PROCESSING;

EID: 84946685887 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/SLT.2014.7078543 Document Type: Conference Paper

Times cited : (113)

References (30)

1
- 84890475857
- Transmutative voice conversion
- IEEE
- S. H. Mohammadi and A. Kain. Transmutative voice conversion. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 6920-6924. IEEE, 2013.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on , pp. 6920-6924
- Mohammadi, S.H.¹ Kain, A.²

2
- 0032026483
- Continuous probabilistic transform for voice conversion
- March
- Y. Stylianou, O. Cappé, and E. Moulines. Continuous probabilistic transform for voice conversion. IEEE Transactions on Speech and Audio Processing, 6(2):131-142, March 1998.
- (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

3
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- May
- A. Kain and M. Macon. Spectral voice conversion for text-to-speech synthesis. In Proceedings of ICASSP, volume 1, pages 285-299, May 1998.
- (1998) Proceedings of ICASSP , vol.1 , pp. 285-299
- Kain, A.¹ Macon, M.²

4
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- November
- T. Toda, A. W. Black, and K. Tokuda. Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech, and Language Processing Journal, 15(8):2222-2235, November 2007.
- (2007) IEEE Transactions on Audio, Speech, and Language Processing Journal , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

5
- 0029254176
- Transformation of formants for voice conversion using artificial neural networks
- M. Narendranath, H. A. Murthy, S. Rajendran, and B. Yegnanarayana. Transformation of formants for voice conversion using artificial neural networks. Speech Communication, 16(2):207-216, 1995.
- (1995) Speech Communication , vol.16 , Issue.2 , pp. 207-216
- Narendranath, M.¹ Murthy, H.A.² Rajendran, S.³ Yegnanarayana, B.⁴

6
- 38149073264
- Voice transformation by mapping the features at syllable level
- Springer
- K. S. Rao, R. Laskar, and S. G. Koolagudi. Voice transformation by mapping the features at syllable level. In Pattern Recognition and Machine Intelligence, pages 479-486. Springer, 2007.
- (2007) Pattern Recognition and Machine Intelligence , pp. 479-486
- Rao, K.S.¹ Laskar, R.² Koolagudi, S.G.³

7
- 77953707533
- Spectral mapping using artificial neural networks for voice conversion
- S. Desai, A. W. Black, B. Yegnanarayana, and K. Prahallad. Spectral mapping using artificial neural networks for voice conversion. Audio, Speech, and Language Processing, IEEE Transactions on, 18(5):954-964, 2010.
- (2010) Audio, Speech, and Language Processing, IEEE Transactions on , vol.18 , Issue.5 , pp. 954-964
- Desai, S.¹ Black, A.W.² Yegnanarayana, B.³ Prahallad, K.⁴

8
- 84906281619
- Real-time voice conversion using artificial neural networks with rectified linear units
- E. Azarov, M. Vashkevich, D. Likhachov, and A. Petrovsky. Real-time voice conversion using artificial neural networks with rectified linear units. In INTERSPEECH, pages 1032-1036, 2013.
- (2013) INTERSPEECH , pp. 1032-1036
- Azarov, E.¹ Vashkevich, M.² Likhachov, D.³ Petrovsky, A.⁴

9
- 84893244283
- Voice conversion for arbitrary speakers using articulatory-movement to vocal-tract parameter mapping
- IEEE
- N. W. Ariwardhani, Y. Iribe, K. Katsurada, and T. Nitta. Voice conversion for arbitrary speakers using articulatory-movement to vocal-tract parameter mapping. In Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on, pages 1-6. IEEE, 2013.
- (2013) Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on , pp. 1-6
- Ariwardhani, N.W.¹ Iribe, Y.² Katsurada, K.³ Nitta, T.⁴

10
- 84905223323
- Using bidirectional associative memories for joint spectral envelope modeling in voice conversion
- L. J. Liu, L. H. Chen, Z. H. Ling, and L. R. Dai. Using bidirectional associative memories for joint spectral envelope modeling in voice conversion. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference On. IEEE
- Liu, L.J.¹ Chen, L.H.² Ling, Z.H.³ Dai, L.R.⁴

11
- 84906225084
- Joint spectral distribution modeling using restricted boltzmann machines for voice conversion
- L. H. Chen, Z. H. Ling, Y. Song, and L. R. Dai. Joint spectral distribution modeling using restricted boltzmann machines for voice conversion. In INTERSPEECH, 2013.
- (2013) INTERSPEECH
- Chen, L.H.¹ Ling, Z.H.² Song, Y.³ Dai, L.R.⁴

12
- 84889579519
- Conditional restricted boltzmann machine for voice conversion
- IEEE
- Z. Wu, E. S. Chng, and H. Li. Conditional restricted boltzmann machine for voice conversion. In Signal and Information Processing (ChinaSIP), 2013 IEEE China Summit & International Conference on, pages 104-108. IEEE, 2013.
- (2013) Signal and Information Processing (ChinaSIP), 2013 IEEE China Summit & International Conference on , pp. 104-108
- Wu, Z.¹ Chng, E.S.² Li, H.³

13
- 84906280857
- Voice conversion in high-order eigen space using deep belief nets
- T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki. Voice conversion in high-order eigen space using deep belief nets. In INTERSPEECH, pages 369-372, 2013.
- (2013) INTERSPEECH , pp. 369-372
- Nakashika, T.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

14
- 0003747605
- Wiley New York
- D. M. Titterington, A. F. Smith, U. E. Makov, et al. Statistical analysis of finite mixture distributions, volume 7. Wiley New York, 1985.
- (1985) Statistical Analysis of Finite Mixture Distributions , vol.7
- Titterington, D.M.¹ Smith, A.F.² Makov, U.E.³

15
- 0024880831
- Multilayer feedforward networks are universal approximators
- K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359-366, 1989.
- (1989) Neural Networks , vol.2 , Issue.5 , pp. 359-366
- Hornik, K.¹ Stinchcombe, M.² White, H.³

16
- 84865847955
- Comparing ANN and GMM in a voice conversion framework
- R. Laskar, D. Chakrabarty, F. Talukdar, K. S. Rao, and K. Banerjee. Comparing ANN and GMM in a voice conversion framework. Applied Soft Computing, 12(11):3332-3342, 2012.
- (2012) Applied Soft Computing , vol.12 , Issue.11 , pp. 3332-3342
- Laskar, R.¹ Chakrabarty, D.² Talukdar, F.³ Rao, K.S.⁴ Banerjee, K.⁵

17
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. Signal Processing Magazine, IEEE, 29(6):82-97, 2012.
- (2012) Signal Processing Magazine, IEEE , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰

18
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- IEEE
- H. Ze, A. Senior, and M. Schuster. Statistical parametric speech synthesis using deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 7962-7966. IEEE, 2013.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on , pp. 7962-7966
- Ze, H.¹ Senior, A.² Schuster, M.³

19
- 84929157442
- Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
- Barcelona, Spain, August
- H. Lu, S. King, and O. Watts. Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis. In 8th ISCA Workshop on Speech Synthesis, pages 281-285, Barcelona, Spain, August 2013.
- (2013) 8th ISCA Workshop on Speech Synthesis , pp. 281-285
- Lu, H.¹ King, S.² Watts, O.³

20
- 84901237776
- Modeling spectral envelopes using restricted boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Z. H. Ling, L. Deng, and D. Yu. Modeling spectral envelopes using restricted boltzmann machines and deep belief networks for statistical parametric speech synthesis. Audio, Speech, and Language Processing, IEEE Transactions on, 21(10):2129-2139, 2013.
- (2013) Audio, Speech, and Language Processing, IEEE Transactions on , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.H.¹ Deng, L.² Yu, D.³

21
- 77949522811
- Why does unsupervised pre-training help deep learning?
- D. Erhan, Y. Bengio, A. Courville, P. A. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pre-training help deep learning? The Journal of Machine Learning Research, 11:625-660, 2010.
- (2010) The Journal of Machine Learning Research , vol.11 , pp. 625-660
- Erhan, D.¹ Bengio, Y.² Courville, A.³ Manzagol, P.A.⁴ Vincent, P.⁵ Bengio, S.⁶

22
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

23
- 79551480483
- Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
- P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, 11:3371-3408, 2010.
- (2010) The Journal of Machine Learning Research , vol.11 , pp. 3371-3408
- Vincent, P.¹ Larochelle, H.² Lajoie, I.³ Bengio, Y.⁴ Manzagol, P.-A.⁵

24
- 84928144072
- Speech signal processing toolkit (sptk)
- Speech signal processing toolkit (sptk). URL http://sp-tk.sourceforge.net/.

25
- 80053460450
- Contractive auto-encoders: Explicit invariance during feature extraction
- S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 833-840, 2011.
- (2011) Proceedings of the 28th International Conference on Machine Learning (ICML-11) , pp. 833-840
- Rifai, S.¹ Vincent, P.² Muller, X.³ Glorot, X.⁴ Bengio, Y.⁵

26
- 34547496196
- Towards a voice conversion system based on frame selection
- IEEE
- T. Dutoit, A. Holzapfel, M. Jottrand, A. Moinet, J. Perez, and Y. Stylianou. Towards a voice conversion system based on frame selection. In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, volume 4, pages IV-513. IEEE, 2007.
- (2007) Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on , vol.4 , pp. IV-513
- Dutoit, T.¹ Holzapfel, A.² Jottrand, M.³ Moinet, A.⁴ Perez, J.⁵ Stylianou, Y.⁶

27
- 84893401626
- arXiv preprint arXiv:1308.4214
- I. J. Goodfellow, D. Warde-Farley, P. Lamblin, V. Dumoulin, M. Mirza, R. Pascanu, J. Bergstra, F. Bastien, and Y. Bengio. Pylearn2: A machine learning research library. arXiv preprint arXiv:1308.4214, 2013.
- (2013) Pylearn2: A Machine Learning Research Library
- Goodfellow, I.J.¹ Warde-Farley, D.² Lamblin, P.³ Dumoulin, V.⁴ Mirza, M.⁵ Pascanu, R.⁶ Bergstra, J.⁷ Bastien, F.⁸ Bengio, Y.⁹

28
- 79960392344
- Amazon's mechanical turk-A new source of inexpensive, yet high-quality, data?
- January
- M. Buhrmester, T. Kwang, and S. D. Gosling. Amazon's mechanical turk-a new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1):3-5, January 2011.
- (2011) Perspectives on Psychological Science , vol.6 , Issue.1 , pp. 3-5
- Buhrmester, M.¹ Kwang, T.² Gosling, S.D.³

29
- 4444285698
- PhD thesis, OGI School of Science & Engineering at Oregon Health & Science University
- A. Kain. High Resolution Voice Transformation. PhD thesis, OGI School of Science & Engineering at Oregon Health & Science University, 2001.
- (2001) High Resolution Voice Transformation
- Kain, A.¹

30
- 0002322469
- On a test of whether one of two random variables is stochastically larger than the other
- H. B. Mann and D. R. Whitney. On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics, pages 50-60, 1947.
- (1947) The Annals of Mathematical Statistics , pp. 50-60
- Mann, H.B.¹ Whitney, D.R.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.