-
1
-
-
67651002140
-
Statistical parametric speech synthesis
-
H. Zen, K. Tokuda, A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
-
(2009)
Speech Commun.
, vol.51
, Issue.11
, pp. 1039-1064
-
-
Zen, H.1
Tokuda, K.2
Black, A.W.3
-
2
-
-
84910105608
-
Measuring a decade of progress in text-to-speech
-
S. King, "Measuring a decade of progress in text-to-speech," Loquens, vol. 1, no. 1, 2014.
-
Loquens
, vol.1
, Issue.1
, pp. 2014
-
-
King, S.1
-
3
-
-
0029765811
-
Unit selection in a concatenative speech synthesis system using a large speech database
-
A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 1, 1996, pp. 373-376.
-
(1996)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, vol.1
, pp. 373-376
-
-
Hunt, A.J.1
Black, A.W.2
-
4
-
-
33846429403
-
Minimum generation error training for HMM-based speech synthesis
-
Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2006, pp. 89-92.
-
(2006)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 89-92
-
-
Wu, Y.-J.1
Wang, R.-H.2
-
5
-
-
33749573927
-
Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
-
H. Zen, K. Tokuda, T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences," Comput. Speech Language, vol. 21, no. 1, pp. 153-173, 2007.
-
(2007)
Comput. Speech Language
, vol.21
, Issue.1
, pp. 153-173
-
-
Zen, H.1
Tokuda, K.2
Kitamura, T.3
-
6
-
-
38549096029
-
A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
-
T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inform. Syst., vol. 90, no. 5, pp. 816-824, 2007.
-
(2007)
IEICE Trans. Inform. Syst.
, vol.90
, Issue.5
, pp. 816-824
-
-
Toda, T.1
Tokuda, K.2
-
7
-
-
84905234422
-
A postfilter to modify the modulation spectrum in HMM-based speech synthesis
-
S. Takamichi, T. Toda, G. Neubig, S. Sakti, S. Nakamura, "A postfilter to modify the modulation spectrum in HMM-based speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2014, pp. 290-294.
-
(2014)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 290-294
-
-
Takamichi, S.1
Toda, T.2
Neubig, G.3
Sakti, S.4
Nakamura, S.5
-
8
-
-
84946042252
-
Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech
-
T. Merritt, J. Latorre, S. King, "Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 4220-4224.
-
(2015)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 4220-4224
-
-
Merritt, T.1
Latorre, J.2
King, S.3
-
9
-
-
85032751458
-
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
-
Nov.
-
G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012.
-
(2012)
IEEE Signal Process. Mag.
, vol.29
, Issue.6
, pp. 82-97
-
-
Hinton, G.1
-
10
-
-
85032750981
-
Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
-
May
-
Z.-H. Ling et al., "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," IEEE Signal Process. Mag., vol. 32, no. 3, pp. 35-52, May 2015.
-
(2015)
IEEE Signal Process. Mag.
, vol.32
, Issue.3
, pp. 35-52
-
-
Ling, Z.-H.1
-
11
-
-
17644409283
-
A high quality text-to-speech system composed of multiple neural networks
-
O. Karaali, G. Corrigan, N. Massey, C. Miller, O. Schnurr, A. Mackie, "A high quality text-to-speech system composed of multiple neural networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 2, 1998, pp. 1237-1240.
-
(1998)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, vol.2
, pp. 1237-1240
-
-
Karaali, O.1
Corrigan, G.2
Massey, N.3
Miller, C.4
Schnurr, O.5
Mackie, A.6
-
12
-
-
0027246817
-
Speech synthesis with artificial neural networks
-
T. Weijters and J. Thole, "Speech synthesis with artificial neural networks," in Proc. Int. Conf. Neural Netw., 1993, pp. 1764-1769.
-
(1993)
Proc. Int. Conf. Neural Netw.
, pp. 1764-1769
-
-
Weijters, T.1
Thole, J.2
-
14
-
-
84976450820
-
Speech synthesis using artificial neural networks trained on cepstral coefficients
-
C. Tuerk and T. Robinson, "Speech synthesis using artificial neural networks trained on cepstral coefficients," in Proc. Eur. Conf. Speech Commun. Technol., 1993, pp. 4-7.
-
(1993)
Proc. Eur. Conf. Speech Commun. Technol.
, pp. 4-7
-
-
Tuerk, C.1
Robinson, T.2
-
15
-
-
0012235593
-
A neural-network-based model of segmental duration for speech synthesis
-
M. Riedi, "A neural-network-based model of segmental duration for speech synthesis," in Proc. Eur. Conf. Speech Commun. Technol., 1995, pp. 599-602.
-
(1995)
Proc. Eur. Conf. Speech Commun. Technol.
, pp. 599-602
-
-
Riedi, M.1
-
16
-
-
84901237776
-
Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
-
Oct.
-
Z.-H. Ling, L. Deng, D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129-2139, Oct. 2013.
-
(2013)
IEEE Trans. Audio, Speech, Lang. Process.
, vol.21
, Issue.10
, pp. 2129-2139
-
-
Ling, Z.-H.1
Deng, L.2
Yu, D.3
-
17
-
-
84890527090
-
Multi-distribution deep belief network for speech synthesis
-
S. Kang, X. Qian, H. Meng, "Multi-distribution deep belief network for speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 8012-8016.
-
(2013)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 8012-8016
-
-
Kang, S.1
Qian, X.2
Meng, H.3
-
18
-
-
84910030421
-
Statistical parametric speech synthesis using weighted multi-distribution deep belief network
-
S. Kang and H. Meng, "Statistical parametric speech synthesis using weighted multi-distribution deep belief network," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2014, pp. 1959-1963.
-
(2014)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 1959-1963
-
-
Kang, S.1
Meng, H.2
-
19
-
-
84905262874
-
Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
-
H. Zen and A. Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2014, pp. 3844-3848.
-
(2014)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 3844-3848
-
-
Zen, H.1
Senior, A.2
-
20
-
-
84946036894
-
Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE
-
B. Uria, I. Murray, S. Renals, C. Valentini, "Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 4465-4469.
-
(2015)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 4465-4469
-
-
Uria, B.1
Murray, I.2
Renals, S.3
Valentini, C.4
-
21
-
-
84942607168
-
A deep generative architecture for postfiltering in statistical parametric speech synthesis
-
Nov.
-
L.-H. Chen, T. Raitio, C. Valentini-Botinhao, Z.-H. Ling, J. Yamagishi, "A deep generative architecture for postfiltering in statistical parametric speech synthesis," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 11, pp. 2003-2014, Nov. 2015.
-
(2015)
IEEE/ACM Trans. Audio, Speech, Lang. Process.
, vol.23
, Issue.11
, pp. 2003-2014
-
-
Chen, L.-H.1
Raitio, T.2
Valentini-Botinhao, C.3
Ling, Z.-H.4
Yamagishi, J.5
-
22
-
-
84890490547
-
Statistical parametric speech synthesis using deep neural networks
-
H. Zen, A. Senior, M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 7962-7966.
-
(2013)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 7962-7966
-
-
Zen, H.1
Senior, A.2
Schuster, M.3
-
23
-
-
84929157442
-
Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
-
H. Lu, S. King, O. Watts, "Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis," in Proc. 8th Int. Speech Commun. Assoc. Speech Synthesis Workshop, 2013, pp. 281-285.
-
(2013)
Proc. 8th Int. Speech Commun. Assoc. Speech Synthesis Workshop
, pp. 281-285
-
-
Lu, H.1
King, S.2
Watts, O.3
-
24
-
-
84905251808
-
On the training aspects of deep neural network (DNN) for parametric TTS synthesis
-
Y. Qian, Y. Fan, W. Hu, F. K. Soong, "On the training aspects of deep neural network (DNN) for parametric TTS synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2014, pp. 3829-3833.
-
(2014)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 3829-3833
-
-
Qian, Y.1
Fan, Y.2
Hu, W.3
Soong, F.K.4
-
25
-
-
84946033275
-
Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
-
Z. Wu, C. Valentini-Botinhao, O. Watts, S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 4460-4464.
-
(2015)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 4460-4464
-
-
Wu, Z.1
Valentini-Botinhao, C.2
Watts, O.3
King, S.4
-
26
-
-
84973359646
-
From HMMs to DNNs: Where do the improvements come from?
-
O. Watts, G. E. Henter, T. Merritt, Z. Wu, S. King, "From HMMs to DNNs: Where do the improvements come from?" in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2016.
-
(2016)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
-
-
Watts, O.1
Henter, G.E.2
Merritt, T.3
Wu, Z.4
King, S.5
-
27
-
-
84978101616
-
A reading list of recent advances in speech synthesis
-
S. King, "A reading list of recent advances in speech synthesis," in Proc. 18th Int. Congr. Phon. Sci., 2015.
-
(2015)
Proc. 18th Int. Congr. Phon. Sci.
-
-
King, S.1
-
28
-
-
84959108894
-
Towards minimum perceptual error training for DNN-based speech synthesis
-
C. Valentini-Botinhao, Z. Wu, S. King, "Towards minimum perceptual error training for DNN-based speech synthesis," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 869-873.
-
(2015)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 869-873
-
-
Valentini-Botinhao, C.1
Wu, Z.2
King, S.3
-
29
-
-
84959144342
-
Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning
-
Q. Hu, Z. Wu, K. Richmond, J. Yamagishi, Y. Stylianou, R. Maia, "Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 854-858.
-
(2015)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 854-858
-
-
Hu, Q.1
Wu, Z.2
Richmond, K.3
Yamagishi, J.4
Stylianou, Y.5
Maia, R.6
-
30
-
-
84910047819
-
TTS synthesis with bidirectional LSTM based recurrent neural networks
-
Y. Fan, Y. Qian, F. Xie, F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2014, pp. 1964-1968.
-
(2014)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 1964-1968
-
-
Fan, Y.1
Qian, Y.2
Xie, F.3
Soong, F.K.4
-
31
-
-
84946045510
-
Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
-
H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 4470-4474.
-
(2015)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 4470-4474
-
-
Zen, H.1
Sak, H.2
-
33
-
-
84966348891
-
An HMM-based speech synthesis system applied to English
-
K. Tokuda, H. Zen, A. W. Black, "An HMM-based speech synthesis system applied to English," in Proc. IEEE Workshop Speech Synthesis, 2002, pp. 227-230.
-
(2002)
Proc. IEEE Workshop Speech Synthesis
, pp. 227-230
-
-
Tokuda, K.1
Zen, H.2
Black, A.W.3
-
34
-
-
84959135757
-
Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features
-
Z. Wu and S. King, "Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 309-313.
-
(2015)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 309-313
-
-
Wu, Z.1
King, S.2
-
35
-
-
84959172579
-
Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis
-
Y. Fan, Y. Qian, F. K. Soong, L. He, "Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 864-868.
-
(2015)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 864-868
-
-
Fan, Y.1
Qian, Y.2
Soong, F.K.3
He, L.4
-
36
-
-
84910087395
-
Sequence error (SE) minimization training of neural network for voice conversion
-
F.-L. Xie, Y. Qian, Y. Fan, F. K. Soong, H. Li, "Sequence error (SE) minimization training of neural network for voice conversion," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2014, pp. 2283-2287.
-
(2014)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 2283-2287
-
-
Xie, F.-L.1
Qian, Y.2
Fan, Y.3
Soong, F.K.4
Li, H.5
-
37
-
-
0033708106
-
Speech parameter generation algorithms for HMM-based speech synthesis
-
K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2000, vol. 3, pp. 1315-1318.
-
(2000)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, vol.3
, pp. 1315-1318
-
-
Tokuda, K.1
Yoshimura, T.2
Masuko, T.3
Kobayashi, T.4
Kitamura, T.5
-
38
-
-
0022471098
-
Learning representations by back-propagating errors
-
D. E. Rumelhart, G. E. Hinton, R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986.
-
(1986)
Nature
, vol.323
, Issue.6088
, pp. 533-536
-
-
Rumelhart, D.E.1
Hinton, G.E.2
Williams, R.J.3
-
39
-
-
78650474133
-
-
Univ. Toronto, Toronto, ON, Canada, Tech. Rep. UTML TR 2010-003
-
G. Hinton, "A practical guide to training restricted Boltzmann machines," Univ. Toronto, Toronto, ON, Canada, Tech. Rep. UTML TR 2010-003, 2010.
-
(2010)
A Practical Guide to Training Restricted Boltzmann Machines
-
-
Hinton, G.1
-
40
-
-
79959858685
-
A perceptual study of acceleration parameters in HMM-based TTS
-
Y. Chen, Z.-J. Yan, F. K. Soong, "A perceptual study of acceleration parameters in HMM-based TTS," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2010, pp. 426-429.
-
(2010)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 426-429
-
-
Chen, Y.1
Yan, Z.-J.2
Soong, F.K.3
-
41
-
-
84946074523
-
The effect of neural networks in statistical parametric speech synthesis
-
K. Hashimoto, K. Oura, Y. Nankaku, K. Tokuda, "The effect of neural networks in statistical parametric speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 4455-4459.
-
(2015)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 4455-4459
-
-
Hashimoto, K.1
Oura, K.2
Nankaku, Y.3
Tokuda, K.4
-
42
-
-
51449103447
-
Optimizing bottle-neck features for LVCSR
-
F. Grézl and P. Fousek, "Optimizing bottle-neck features for LVCSR," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2008, pp. 4729-4732.
-
(2008)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 4729-4732
-
-
Grézl, F.1
Fousek, P.2
-
43
-
-
84865785753
-
Improved bottleneck features using pretrained deep neural networks
-
D. Yu and M. L. Seltzer, "Improved bottleneck features using pretrained deep neural networks," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2011, pp. 237-240.
-
(2011)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 237-240
-
-
Yu, D.1
Seltzer, M.L.2
-
44
-
-
84867593213
-
Auto-encoder bottleneck features using deep belief networks
-
T. N. Sainath, B. Kingsbury, B. Ramabhadran, "Auto-encoder bottleneck features using deep belief networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2012, pp. 4153-4156.
-
(2012)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
, pp. 4153-4156
-
-
Sainath, T.N.1
Kingsbury, B.2
Ramabhadran, B.3
-
45
-
-
0032673049
-
Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
-
H. Kawahara, I. Masuda-Katsuse, A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3, pp. 187-207, 1999.
-
(1999)
Speech Commun.
, vol.27
, Issue.3
, pp. 187-207
-
-
Kawahara, H.1
Masuda-Katsuse, I.2
De Cheveigné, A.3
-
46
-
-
84894152556
-
The voice bank corpus: Design, collection and data analysis of a large regional accent speech database
-
C. Veaux, J. Yamagishi, S. King, "The voice bank corpus: Design, collection and data analysis of a large regional accent speech database," in Proc. Int. Conf. Oriental COCOSDA, 2013, pp. 1-4.
-
(2013)
Proc. Int. Conf. Oriental COCOSDA
, pp. 1-4
-
-
Veaux, C.1
Yamagishi, J.2
King, S.3
-
47
-
-
0012330750
-
The design for the Wall Street Journal-based CSR corpus
-
D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proc. Workshop Speech Natural Lang., 1992, pp. 357-362.
-
(1992)
Proc. Workshop Speech Natural Lang.
, pp. 357-362
-
-
Paul, D.B.1
Baker, J.M.2
-
48
-
-
84959122693
-
Deep neural network context embeddings for model selection in rich-context HMM synthesis
-
T. Merritt, J. Yamagishi, Z. Wu, O. Watts, S. King, "Deep neural network context embeddings for model selection in rich-context HMM synthesis," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 2207-2211.
-
(2015)
Proc. Annu. Conf. Int. Speech Commun. Assoc.
, pp. 2207-2211
-
-
Merritt, T.1
Yamagishi, J.2
Wu, Z.3
Watts, O.4
King, S.5
-
49
-
-
84973402504
-
Deep neural network-guided unit selection synthesis
-
T. Merritt, R. A. Clark, Z. Wu, J. Yamagishi, S. King, "Deep neural network-guided unit selection synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2016.
-
(2016)
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
-
-
Merritt, T.1
Clark, R.A.2
Wu, Z.3
Yamagishi, J.4
King, S.5