



Volume 2016-May, 2016, Pages 5145-5149

Deep neural network-guided unit selection synthesis

Author keywords

deep neural networks; embedding; hybrid synthesis; speech synthesis; unit selection


EID: 84973402504     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2016.7472658     Document Type: Conference Paper
Times cited: 46

References (36)
  • 4
    • 84910105608
    • Measuring a decade of progress in text-to-speech
    • Simon King, "Measuring a decade of progress in text-to-speech," Loquens, vol. 1, no. 1, 2014.
    • (2014) Loquens , vol.1 , Issue.1
    • King, S.1
  • 6
    • 84910070288
    • Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis
    • Thomas Merritt, Tuomo Raitio, and Simon King, "Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis," in Proc. Interspeech, 2014, pp. 1509-1513.
    • (2014) Proc. Interspeech , pp. 1509-1513
    • Merritt, T.1    Raitio, T.2    King, S.3
  • 7
    • 84910028520
    • Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
    • Gustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo, and Simon King, "Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech," in Proc. Interspeech, 2014, pp. 1504-1508.
    • (2014) Proc. Interspeech , pp. 1504-1508
    • Henter, G.E.1    Merritt, T.2    Shannon, M.3    Mayo, C.4    King, S.5
  • 8
    • 84946042252
    • Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech
    • Thomas Merritt, Javier Latorre, and Simon King, "Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech," in Proc. ICASSP, 2015.
    • (2015) Proc. ICASSP
    • Merritt, T.1    Latorre, J.2    King, S.3
  • 9
    • 84959122693
    • Deep neural network context embeddings for model selection in rich-context HMM synthesis
    • Thomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, and Simon King, "Deep neural network context embeddings for model selection in rich-context HMM synthesis," in Proc. Interspeech, 2015.
    • (2015) Proc. Interspeech
    • Merritt, T.1    Yamagishi, J.2    Wu, Z.3    Watts, O.4    King, S.5
  • 10
    • 0029765811
    • Unit selection in a concatenative speech synthesis system using a large speech database
    • Andrew J. Hunt and Alan W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP, 1996, pp. 373-376.
    • (1996) Proc. ICASSP , pp. 373-376
    • Hunt, A.J.1    Black, A.W.2
  • 11
    • 85133526552
    • Automatically clustering similar units for unit selection in speech synthesis
    • Alan W. Black and Paul A. Taylor, "Automatically clustering similar units for unit selection in speech synthesis," in Proc. Eurospeech, 1997.
    • (1997) Proc. Eurospeech
    • Black, A.W.1    Taylor, P.A.2
  • 13
    • 44949153641
    • The target cost formulation in unit selection speech synthesis
    • Paul Taylor, "The target cost formulation in unit selection speech synthesis," in Proc. Interspeech, 2006, pp. 2038-2041.
    • (2006) Proc. Interspeech , pp. 2038-2041
    • Taylor, P.1
  • 16
    • 78049399368
    • Rich-context unit selection (RUS) approach to high quality TTS
    • Zhi-Jie Yan, Yao Qian, and Frank K. Soong, "Rich-context unit selection (RUS) approach to high quality TTS," in Proc. ICASSP, 2010, pp. 4798-4801.
    • (2010) Proc. ICASSP , pp. 4798-4801
    • Yan, Z.1    Qian, Y.2    Soong, F.K.3
  • 17
    • 84867217260
    • Synthesis by generation and concatenation of multiform segments
    • Vincent Pollet and Andrew Breen, "Synthesis by generation and concatenation of multiform segments," in Proc. Interspeech, 2008, pp. 1825-1828.
    • (2008) Proc. Interspeech , pp. 1825-1828
    • Pollet, V.1    Breen, A.2
  • 18
    • 84865718211
    • Uniform speech parameterization for multi-form segment synthesis
    • Alexander Sorin, Slava Shechtman, and Vincent Pollet, "Uniform Speech Parameterization for Multi-form Segment Synthesis," in Proc. Interspeech, 2011, pp. 337-340.
    • (2011) Proc. Interspeech , pp. 337-340
    • Sorin, A.1    Shechtman, S.2    Pollet, V.3
  • 19
    • 84878557723
    • Psychoacoustic segment scoring for multi-form speech synthesis
    • Alexander Sorin, Slava Shechtman, and Vincent Pollet, "Psychoacoustic Segment Scoring for Multi-Form Speech Synthesis," in Proc. Interspeech, 2012, pp. 2214-2217.
    • (2012) Proc. Interspeech , pp. 2214-2217
    • Sorin, A.1    Shechtman, S.2    Pollet, V.3
  • 20
    • 84910091105
    • Refined intersegment joining in multi-form speech synthesis
    • Alexander Sorin, Slava Shechtman, and Vincent Pollet, "Refined Intersegment Joining in Multi-Form Speech Synthesis," in Proc. Interspeech, 2014, pp. 790-794.
    • (2014) Proc. Interspeech , pp. 790-794
    • Sorin, A.1    Shechtman, S.2    Pollet, V.3
  • 21
    • 84959124410
    • Using deep bidirectional recurrent neural networks for prosodic-target prediction in a unit-selection text-to-speech system
    • Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, and Ron Hoory, "Using Deep Bidirectional Recurrent Neural Networks for Prosodic-Target Prediction in a Unit-Selection Text-to-Speech System," in Proc. Interspeech, 2015.
    • (2015) Proc. Interspeech
    • Fernandez, R.1    Rendel, A.2    Ramabhadran, B.3    Hoory, R.4
  • 22
    • 84946033275
    • Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
    • Zhizheng Wu, Cassia Valentini-Botinhao, Oliver Watts, and Simon King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Proc. ICASSP, 2015.
    • (2015) Proc. ICASSP
    • Wu, Z.1    Valentini-Botinhao, C.2    Watts, O.3    King, S.4
  • 23
    • 85032750981
    • Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
    • Zhen-Hua Ling, Shi-Yin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen M. Meng, and Li Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 35-52, 2015.
    • (2015) IEEE Signal Processing Magazine , vol.32 , Issue.3 , pp. 35-52
    • Ling, Z.1    Kang, S.2    Zen, H.3    Senior, A.4    Schuster, M.5    Qian, X.6    Meng, H.M.7    Deng, L.8
  • 24
    • 84973282956
    • Acoustic modeling in statistical parametric speech synthesis-from HMM to LSTM-RNN
    • Heiga Zen, "Acoustic Modeling in Statistical Parametric Speech Synthesis-From HMM to LSTM-RNN," in Proc. MLSLP, 2015.
    • (2015) Proc. MLSLP
    • Zen, H.1
  • 25
    • 0032073761
    • An RNN-based prosodic information synthesizer for Mandarin text-to-speech
    • Sin-Horng Chen, Shaw-Hwa Hwang, and Yih-Ru Wang, "An RNN-based prosodic information synthesizer for Mandarin text-to-speech," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 3, pp. 226-239, 1998.
    • (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.3 , pp. 226-239
    • Chen, S.1    Hwang, S.2    Wang, Y.3
  • 26
    • 70450161678
    • Rich context modeling for high quality HMM-based TTS
    • Zhi-Jie Yan, Yao Qian, and Frank K. Soong, "Rich context modeling for high quality HMM-based TTS," in Proc. Interspeech, 2009, pp. 1755-1758.
    • (2009) Proc. Interspeech , pp. 1755-1758
    • Yan, Z.1    Qian, Y.2    Soong, F.K.3
  • 27
    • 84973381105
    • Festival 2-build your own general purpose unit selection speech synthesiser
    • Robert A. J. Clark, Korin Richmond, and Simon King, "Festival 2-build your own general purpose unit selection speech synthesiser," in Proc. SSW5, 2004.
    • (2004) Proc. SSW5
    • Clark, R.A.J.1    Richmond, K.2    King, S.3
  • 28
    • 34047123652
    • Multisyn: Open-domain unit selection for the Festival speech synthesis system
    • Robert A. J. Clark, Korin Richmond, and Simon King, "Multisyn: Open-domain unit selection for the Festival speech synthesis system," Speech Communication, vol. 49, no. 4, pp. 317-330, 2007.
    • (2007) Speech Communication , vol.49 , Issue.4 , pp. 317-330
    • Clark, R.A.J.1    Richmond, K.2    King, S.3
  • 29
    • 34547516258
    • Approximating the Kullback-Leibler divergence between Gaussian mixture models
    • John R. Hershey and Peder A. Olsen, "Approximating the Kullback-Leibler divergence between Gaussian mixture models," in Proc. ICASSP, 2007.
    • (2007) Proc. ICASSP
    • Hershey, J.R.1    Olsen, P.A.2
  • 31
    • 84959135757
    • Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features
    • Zhizheng Wu and Simon King, "Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features," in Proc. Interspeech, 2015.
    • (2015) Proc. Interspeech
    • Wu, Z.1    King, S.2
  • 35
    • 84973403758
    • Listening test materials for "Deep neural network-guided unit selection synthesis"
    • [dataset] University of Edinburgh, The Centre for Speech Technology Research (CSTR)
    • Thomas Merritt, Robert A. J. Clark, Zhizheng Wu, Junichi Yamagishi, and Simon King, "Listening test materials for 'Deep neural network-guided unit selection synthesis'," 2016 [dataset], University of Edinburgh, The Centre for Speech Technology Research (CSTR), doi: 10.7488/ds/1313.
    • (2016) Deep Neural Network-guided Unit Selection Synthesis
    • Merritt, T.1    Clark, R.A.J.2    Wu, Z.3    Yamagishi, J.4    King, S.5
  • 36
    • 0002609530
    • Optimal coupling of diphones
    • Springer
    • Alistair D. Conkie and Stephen Isard, "Optimal coupling of diphones," in Progress in Speech Synthesis, pp. 293-304. Springer, 1997.
    • (1997) Progress in Speech Synthesis , pp. 293-304
    • Conkie, A.D.1    Isard, S.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.