-
1
-
-
67651002140
-
Statistical parametric speech synthesis
-
H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
-
(2009)
Speech Communication
, vol.51
, Issue.11
, pp. 1039-1064
-
-
Zen, H.1
Tokuda, K.2
Black, A.W.3
-
2
-
-
84876687945
-
Speech synthesis based on hidden Markov models
-
May
-
K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models," Proceedings of the IEEE, vol. 101, no. 5, pp. 1234-1252, May 2013.
-
(2013)
Proceedings of the IEEE
, vol.101
, Issue.5
, pp. 1234-1252
-
-
Tokuda, K.1
Nankaku, Y.2
Toda, T.3
Zen, H.4
Yamagishi, J.5
Oura, K.6
-
3
-
-
84890490547
-
Statistical parametric speech synthesis using deep neural networks
-
May
-
H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. of ICASSP, May 2013, pp. 7962-7966.
-
(2013)
Proc. of ICASSP
, pp. 7962-7966
-
-
Zen, H.1
Senior, A.2
Schuster, M.3
-
4
-
-
85032750981
-
Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
-
May
-
Z.-H. Ling, S.-Y. Kang, H. Zen, A. Senior, M. Schuster, X.-J. Qian, H. Meng, and L. Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," Signal Processing Magazine, IEEE, vol. 32, no. 3, pp. 35-52, May 2015.
-
(2015)
Signal Processing Magazine, IEEE
, vol.32
, Issue.3
, pp. 35-52
-
-
Ling, Z.-H.1
Kang, S.-Y.2
Zen, H.3
Senior, A.4
Schuster, M.5
Qian, X.-J.6
Meng, H.7
Deng, L.8
-
5
-
-
84910047819
-
TTS synthesis with bidirectional LSTM based recurrent neural networks
-
Y. Fan, Y. Qian, F.-L. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Interspeech, 2014, pp. 1964-1968.
-
(2014)
Interspeech
, pp. 1964-1968
-
-
Fan, Y.1
Qian, Y.2
Xie, F.-L.3
Soong, F.K.4
-
6
-
-
84946045510
-
Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
-
IEEE
-
H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4470-4474.
-
(2015)
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
, pp. 4470-4474
-
-
Zen, H.1
Sak, H.2
-
7
-
-
84874199000
-
Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
-
H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in MAVEBA, 2001.
-
(2001)
MAVEBA
-
-
Kawahara, H.1
Estill, J.2
Fujimura, O.3
-
8
-
-
85023745327
-
An autoregressive recurrent mixture density network for parametric speech synthesis
-
IEEE
-
X. Wang, S. Takaki, and J. Yamagishi, "An autoregressive recurrent mixture density network for parametric speech synthesis," in Proc. of ICASSP. IEEE, 2017, pp. 4895-4899.
-
(2017)
Proc. of ICASSP
, pp. 4895-4899
-
-
Wang, X.1
Takaki, S.2
Yamagishi, J.3
-
9
-
-
85023752230
-
Generative adversarial network-based postfilter for statistical parametric speech synthesis
-
March
-
T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu, and K. Kashino, "Generative adversarial network-based postfilter for statistical parametric speech synthesis," in Proc. of ICASSP, March 2017, pp. 4910-4914.
-
(2017)
Proc. of ICASSP
, pp. 4910-4914
-
-
Kaneko, T.1
Kameoka, H.2
Hojo, N.3
Ijima, Y.4
Hiramatsu, K.5
Kashino, K.6
-
10
-
-
84946077883
-
Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis
-
April
-
K. Tokuda and H. Zen, "Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis," in Proc. of ICASSP, April 2015, pp. 4215-4219.
-
(2015)
Proc. of ICASSP
, pp. 4215-4219
-
-
Tokuda, K.1
Zen, H.2
-
11
-
-
84973307947
-
Directly modeling voiced and unvoiced components in speech waveforms by neural networks
-
March
-
K. Tokuda and H. Zen, "Directly modeling voiced and unvoiced components in speech waveforms by neural networks," in Proc. of ICASSP, March 2016, pp. 5640-5644.
-
(2016)
Proc. of ICASSP
, pp. 5640-5644
-
-
-
12
-
-
84973309345
-
A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis
-
March
-
S. Takaki and J. Yamagishi, "A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis," in Proc. of ICASSP, March 2016, pp. 5535-5539.
-
(2016)
Proc. of ICASSP
, pp. 5535-5539
-
-
Takaki, S.1
Yamagishi, J.2
-
13
-
-
85039167413
-
Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis
-
submitted
-
S. Takaki, H. Kameoka, and J. Yamagishi, "Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis," in Interspeech (submitted), 2017.
-
(2017)
Interspeech
-
-
Takaki, S.1
Kameoka, H.2
Yamagishi, J.3
-
14
-
-
85011070895
-
-
Pre-print
-
A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," Pre-print, 2016, https://arxiv.org/pdf/1609.03499.pdf.
-
(2016)
WaveNet: A Generative Model for Raw Audio
-
-
Van den Oord, A.1
Dieleman, S.2
Zen, H.3
Simonyan, K.4
Vinyals, O.5
Graves, A.6
Kalchbrenner, N.7
Senior, A.8
Kavukcuoglu, K.9
-
15
-
-
85088227413
-
SampleRNN: An unconditional end-to-end neural audio generation model
-
submission
-
S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. C. Courville, and Y. Bengio, "SampleRNN: An unconditional end-to-end neural audio generation model," in ICLR 2017 (submission), 2017. [Online]. Available: http://arxiv.org/abs/1612.07837
-
(2017)
ICLR 2017
-
-
Mehri, S.1
Kumar, K.2
Gulrajani, I.3
Kumar, R.4
Jain, S.5
Sotelo, J.6
Courville, A.C.7
Bengio, Y.8
-
16
-
-
85122685393
-
Char2Wav: End-to-end speech synthesis
-
submission
-
J. Sotelo, S. Mehri, K. Kumar, J. F. Santos, K. Kastner, A. Courville, and Y. Bengio, "Char2Wav: End-to-end speech synthesis," in ICLR 2017 workshop (submission), 2017, https://openreview.net/pdf?id=B1VWyySKx.
-
(2017)
ICLR 2017 Workshop
-
-
Sotelo, J.1
Mehri, S.2
Kumar, K.3
Santos, J.F.4
Kastner, K.5
Courville, A.6
Bengio, Y.7
-
17
-
-
85039156048
-
Deep voice: Real-time neural text-to-speech
-
submission
-
S. O. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman, S. Sengupta, and M. Shoeybi, "Deep voice: Real-time neural text-to-speech," in ICML 2017 (submission), 2017, https://arxiv.org/pdf/1702.07825.pdf.
-
(2017)
ICML 2017
-
-
Arik, S.O.1
Chrzanowski, M.2
Coates, A.3
Diamos, G.4
Gibiansky, A.5
Kang, Y.6
Li, X.7
Miller, J.8
Ng, A.9
Raiman, J.10
Sengupta, S.11
Shoeybi, M.12
-
18
-
-
84973293681
-
High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
-
Mar
-
L. Juvela, B. Bollepalli, M. Airaksinen, and P. Alku, "High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network," in Proc. of ICASSP, Mar. 2016, pp. 5120-5124.
-
(2016)
Proc. of ICASSP
, pp. 5120-5124
-
-
Juvela, L.1
Bollepalli, B.2
Airaksinen, M.3
Alku, P.4
-
19
-
-
84994338062
-
GlottDNN - a full-band glottal vocoder for statistical parametric speech synthesis
-
M. Airaksinen, B. Bollepalli, L. Juvela, Z. Wu, S. King, and P. Alku, "GlottDNN - a full-band glottal vocoder for statistical parametric speech synthesis," in Proc. of Interspeech, 2016.
-
(2016)
Proc. of Interspeech
-
-
Airaksinen, M.1
Bollepalli, B.2
Juvela, L.3
Wu, Z.4
King, S.5
Alku, P.6
-
20
-
-
84994228803
-
Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks
-
L. Juvela, X. Wang, S. Takaki, M. Airaksinen, J. Yamagishi, and P. Alku, "Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks," in Proc. of Interspeech, Sep. 2016, pp. 2283-2287.
-
(2016)
Proc. of Interspeech
, pp. 2283-2287
-
-
Juvela, L.1
Wang, X.2
Takaki, S.3
Airaksinen, M.4
Yamagishi, J.5
Alku, P.6
-
22
-
-
70450180978
-
Robust LTS rules with the Combilex speech technology lexicon
-
September
-
K. Richmond, R. A. Clark, and S. Fitt, "Robust LTS rules with the Combilex speech technology lexicon," in Proc. of Interspeech, Brighton, September 2009, pp. 1295-1298.
-
(2009)
Proc. of Interspeech, Brighton
, pp. 1295-1298
-
-
Richmond, K.1
Clark, R.A.2
Fitt, S.3
-
23
-
-
85133720638
-
The HMM-based speech synthesis system version 2.0
-
Bonn, Germany August
-
H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. W. Black, and K. Tokuda, "The HMM-based speech synthesis system version 2.0," in Proc. of ISCA SSW6, Bonn, Germany, August 2007, pp. 294-299.
-
(2007)
Proc. of ISCA SSW6
, pp. 294-299
-
-
Zen, H.1
Nose, T.2
Yamagishi, J.3
Sako, S.4
Masuko, T.5
Black, A.W.6
Tokuda, K.7
-
24
-
-
84898074254
-
Quasi closed phase glottal inverse filtering analysis with weighted linear prediction
-
March
-
M. Airaksinen, T. Raitio, B. Story, and P. Alku, "Quasi closed phase glottal inverse filtering analysis with weighted linear prediction," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 22, no. 3, pp. 596-607, March 2014.
-
(2014)
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
, vol.22
, Issue.3
, pp. 596-607
-
-
Airaksinen, M.1
Raitio, T.2
Story, B.3
Alku, P.4
-
25
-
-
0001455934
-
A robust algorithm for pitch tracking (RAPT)
-
D. Talkin, "A robust algorithm for pitch tracking (RAPT)," Speech coding and synthesis, pp. 495-518, 1995.
-
(1995)
Speech Coding and Synthesis
, pp. 495-518
-
-
Talkin, D.1
-
27
-
-
77957744515
-
HMM-based speech synthesis utilizing glottal inverse filtering
-
January
-
T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku, "HMM-based speech synthesis utilizing glottal inverse filtering," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 153-165, January 2011.
-
(2011)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.19
, Issue.1
, pp. 153-165
-
-
Raitio, T.1
Suni, A.2
Yamagishi, J.3
Pulakka, H.4
Nurminen, J.5
Vainio, M.6
Alku, P.7
-
28
-
-
84865718521
-
Comparison of formant enhancement methods for HMM-based speech synthesis
-
T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "Comparison of formant enhancement methods for HMM-based speech synthesis," in SSW, 2010, pp. 334-339.
-
(2010)
SSW
, pp. 334-339
-
-
Raitio, T.1
Suni, A.2
Pulakka, H.3
Vainio, M.4
Alku, P.5
-
29
-
-
0029254163
-
Non-parametric techniques for pitch-scale and time-scale modification of speech
-
E. Moulines and J. Laroche, "Non-parametric techniques for pitch-scale and time-scale modification of speech," Speech Communication, vol. 16, no. 2, pp. 175-205, 1995.
-
(1995)
Speech Communication
, vol.16
, Issue.2
, pp. 175-205
-
-
Moulines, E.1
Laroche, J.2
-
30
-
-
85039167524
-
ETSI TS 126 090: Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions
-
"ETSI TS 126 090: Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions," European Telecommunications Standards Institute, Technical specification, 2016.
-
(2016)
European Telecommunications Standards Institute, Technical Specification
-
-
-
31
-
-
84930639546
-
Introducing CURRENNT: The Munich open-source CUDA recurrent neural network toolkit
-
F. Weninger, "Introducing CURRENNT: The Munich open-source CUDA recurrent neural network toolkit," Journal of Machine Learning Research, vol. 16, pp. 547-551, 2015.
-
(2015)
Journal of Machine Learning Research
, vol.16
, pp. 547-551
-
-
Weninger, F.1
-
32
-
-
85039147615
-
-
CrowdFlower Inc. (2017) Crowd-sourcing platform. [Online]. Available: https://www.crowdflower.com/
-
(2017)
Crowd-sourcing Platform
-
-