SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2018-September, 2018, Pages 887-891

Machine speech chain with one-shot speaker adaptation

Authors (3): Tjandra, Andros (a, b); Sakti, Sakriani (a, b); Nakamura, Satoshi (a, b)

a Nara Institute of Science and Technology (Japan)

b Disaster Resilience Science Team (Japan)

Author keywords

Deep learning; Semi-supervised learning; Speech chain; Speech recognition; Speech synthesis

Indexed keywords

CHAINS; CHARACTER RECOGNITION; DEEP LEARNING; MECHANISMS; SPEECH COMMUNICATION; SPEECH SYNTHESIS; SUPERVISED LEARNING

AUTOMATIC SPEECH RECOGNITION; LABELED AND UNLABELED DATA; RECOGNITION RATES; SEMI-SUPERVISED LEARNING; SPEAKER ADAPTATION; SPEAKER RECOGNITION; TEXT TO SPEECH SYNTHESES (TTS); VOICE CHARACTERISTICS

SPEECH RECOGNITION

EID: 85055003209 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2018-1558 Document Type: Conference Paper

Times cited: 49

References (26)

1
- P. Denes and E. Pinson, The Speech Chain, ser. Anchor Books. Worth Publishers, 1993. [Online]. Available: https://books.google.co.jp/books?id=ZMTm3nlDfroC

2
- A. Tjandra, S. Sakti, and S. Nakamura, “Listening while speaking: Speech chain by deep learning,” in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2017, pp. 301-308.

3
- D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, “End-to-end attention-based large vocabulary speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 4945-4949.

4
- W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 4960-4964.

5
- Y. Wang, R. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio et al., “Tacotron: A fully end-to-end text-to-speech synthesis model,” arXiv preprint arXiv:1703.10135, 2017.

6
- I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence-to-Sequence learning with neural networks,” in Advances in Neural Information Processing Systems, 2014, pp. 3104-3112.

7
- S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.

8
- J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.

9
- D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.

10
- C. Li, X. Ma, B. Jiang, X. Li, X. Zhang, X. Liu, Y. Cao, A. Kannan, and Z. Zhu, “Deep speaker: an end-to-end neural speaker embedding system,” arXiv preprint arXiv:1705.02304, 2017.

11
- B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” arXiv preprint arXiv:1505.00853, 2015.

12
- A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, “Autoencoding beyond pixels using a learned similarity metric,” in International Conference on Machine Learning, 2016, pp. 1558-1566.

13
- J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision. Springer, 2016, pp. 694-711.

14
- D. B. Paul and J. M. Baker, “The design for the Wall Street Journal-based CSR corpus,” in Proceedings of the Workshop on Speech and Natural Language. Association for Computational Linguistics, 1992, pp. 357-362.

15
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, “The Kaldi speech recognition toolkit,” in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, Dec. 2011, IEEE Catalog No.: CFP11SRW-USB.

16
- B. McFee, M. McVicar, O. Nieto, S. Balke, C. Thome, D. Liang, E. Battenberg, J. Moore, R. Bittner, R. Yamamoto, et al., “librosa 0.5.0,” Feb. 2017.

17
- A. Graves, “Supervised sequence labelling,” in Supervised Sequence Labelling with Recurrent Neural Networks. Springer, 2012, pp. 5-13.

18
- A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” 2017.

19
- S. Kim, T. Hori, and S. Watanabe, “Joint CTC-attention based end-to-end speech recognition using multi-task learning,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 4835-4839.

20
- A. Tjandra, S. Sakti, and S. Nakamura, “Attention-based wav2text with feature transfer learning,” in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2017, pp. 309-315.

21
- X. Zhu and Z. Ghahramani, “Learning from labeled and unlabeled data with label propagation,” Tech. Rep., 2002.

22
- A. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.

23
- P. Swietojanski and S. Renals, “Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models,” in Spoken Language Technology Workshop (SLT), 2014 IEEE. IEEE, 2014, pp. 171-176.

24
- Z. Wu, P. Swietojanski, C. Veaux, S. Renals, and S. King, “A study of speaker adaptation for DNN-based speech synthesis,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015.

25
- N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-end factor analysis for speaker verification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788-798, 2011.

26
- T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.