-
1
-
-
85009139544
-
Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
-
Sep.
-
T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,” in Proc. EUROSPEECH-99, Sep. 1999, pp. 2347–2350.
-
(1999)
Proc. EUROSPEECH-99
, pp. 2347-2350
-
-
Yoshimura, T.1
Tokuda, K.2
Masuko, T.3
Kobayashi, T.4
Kitamura, T.5
-
2
-
-
7044242284
-
Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
-
in Japanese, Nov.
-
T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,” (in Japanese) IEICE Trans., vol. J83-D-II, no. 11, pp. 2099–2107, Nov. 2000.
-
(2000)
IEICE Trans.
, vol.J83-D-II
, Issue.11
, pp. 2099-2107
-
-
Yoshimura, T.1
Tokuda, K.2
Masuko, T.3
Kobayashi, T.4
Kitamura, T.5
-
3
-
-
34547526960
-
Statistical parametric speech synthesis
-
Apr.
-
A. Black, H. Zen, and K. Tokuda, “Statistical parametric speech synthesis,” in Proc. ICASSP 2007, Apr. 2007, pp. 1229–1232.
-
(2007)
Proc. ICASSP 2007
, pp. 1229-1232
-
-
Black, A.1
Zen, H.2
Tokuda, K.3
-
4
-
-
79952258981
-
-
[Online]. Available: http://www.hts.sp.nitech.ac.jp/
-
K. Tokuda, H. Zen, J. Yamagishi, T. Masuko, S. Sako, A. Black, and T. Nose, The HMM-Based Speech Synthesis System (HTS) Version 2.0.1 [Online]. Available: http://www.hts.sp.nitech.ac.jp/
-
The HMM-Based Speech Synthesis System (HTS) Version 2.0.1
-
-
Tokuda, K.1
Zen, H.2
Yamagishi, J.3
Masuko, T.4
Sako, S.5
Black, A.6
Nose, T.7
-
5
-
-
0028996993
-
Speech parameter generation from HMM using dynamic features
-
May
-
K. Tokuda, T. Kobayashi, and S. Imai, “Speech parameter generation from HMM using dynamic features,” in Proc. ICASSP-95, May 1995, pp. 660–663.
-
(1995)
Proc. ICASSP-95
, pp. 660-663
-
-
Tokuda, K.1
Kobayashi, T.2
Imai, S.3
-
6
-
-
0038582234
-
An algorithm for speech parameter generation from HMM using dynamic features
-
in Japanese, Mar.
-
K. Tokuda, T. Masuko, T. Kobayashi, and S. Imai, “An algorithm for speech parameter generation from HMM using dynamic features,” (in Japanese) J. Acoust. Soc. Japan, vol. 53, no. 3, pp. 192–200, Mar. 1997.
-
(1997)
J. Acoust. Soc. Japan
, vol.53
, Issue.3
, pp. 192-200
-
-
Tokuda, K.1
Masuko, T.2
Kobayashi, T.3
Imai, S.4
-
7
-
-
0029725605
-
Speech synthesis using HMMs with dynamic features
-
May
-
T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, “Speech synthesis using HMMs with dynamic features,” in Proc. ICASSP-96, May 1996, pp. 389–392.
-
(1996)
Proc. ICASSP-96
, pp. 389-392
-
-
Masuko, T.1
Tokuda, K.2
Kobayashi, T.3
Imai, S.4
-
8
-
-
0002025578
-
HMM-based speech synthesis using dynamic features
-
in Japanese, Dec.
-
T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, “HMM-based speech synthesis using dynamic features,” (in Japanese) IEICE Trans., vol. J79-D-II, no. 12, pp. 2184–2190, Dec. 1996.
-
(1996)
IEICE Trans.
, vol.J79-D-II
, Issue.12
, pp. 2184-2190
-
-
Masuko, T.1
Tokuda, K.2
Kobayashi, T.3
Imai, S.4
-
9
-
-
0033708106
-
Speech parameter generation algorithms for HMM-based speech synthesis
-
Jun.
-
K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, “Speech parameter generation algorithms for HMM-based speech synthesis,” in Proc. ICASSP 2000, Jun. 2000, pp. 1315–1318.
-
(2000)
Proc. ICASSP 2000
, pp. 1315-1318
-
-
Tokuda, K.1
Yoshimura, T.2
Masuko, T.3
Kobayashi, T.4
Kitamura, T.5
-
10
-
-
0036522887
-
Multi-space probability distribution HMM
-
Mar.
-
K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, “Multi-space probability distribution HMM,” IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455–464, Mar. 2002.
-
(2002)
IEICE Trans. Inf. Syst.
, vol.E85-D
, Issue.3
, pp. 455-464
-
-
Tokuda, K.1
Masuko, T.2
Miyazaki, N.3
Kobayashi, T.4
-
11
-
-
44449177634
-
A hidden semi-Markov model-based speech synthesis system
-
May
-
H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “A hidden semi-Markov model-based speech synthesis system,” IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825–834, May 2007.
-
(2007)
IEICE Trans. Inf. Syst.
, vol.E90-D
, Issue.5
, pp. 825-834
-
-
Zen, H.1
Tokuda, K.2
Masuko, T.3
Kobayashi, T.4
Kitamura, T.5
-
13
-
-
0022234383
-
Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition
-
Mar.
-
M. Russell and R. Moore, “Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition,” in Proc. ICASSP-85, Mar. 1985, pp. 5–8.
-
(1985)
Proc. ICASSP-85
, pp. 5-8
-
-
Russell, M.1
Moore, R.2
-
14
-
-
0022685753
-
Continuously variable duration hidden Markov models for automatic speech recognition
-
S. Levinson, “Continuously variable duration hidden Markov models for automatic speech recognition,” Comput. Speech Lang., vol. 1, no. 1, pp. 29–45, 1986.
-
(1986)
Comput. Speech Lang.
, vol.1
, Issue.1
, pp. 29-45
-
-
Levinson, S.1
-
15
-
-
0029341719
-
A mixed excitation LPC vocoder model for low bit rate speech coding
-
Jul.
-
A. McCree and T. Barnwell, III, “A mixed excitation LPC vocoder model for low bit rate speech coding,” IEEE Trans. Speech Audio Process., vol. 3, no. 4, pp. 242–250, Jul. 1995.
-
(1995)
IEEE Trans. Speech Audio Process.
, vol.3
, Issue.4
, pp. 242-250
-
-
McCree, A.1
Barnwell, T.2
-
16
-
-
84874199000
-
Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
-
Sep.
-
H. Kawahara, J. Estill, and O. Fujimura, “Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT,” in Proc. 2nd MAVEBA, Sep. 2001, pp. 13–15.
-
(2001)
Proc. 2nd MAVEBA
, pp. 13-15
-
-
Kawahara, H.1
Estill, J.2
Fujimura, O.3
-
17
-
-
0024060644
-
Multiband excitation vocoder
-
Aug.
-
D. W. Griffin and J. S. Lim, “Multiband excitation vocoder,” IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 8, pp. 1223–1235, Aug. 1988.
-
(1988)
IEEE Trans. Acoust., Speech, Signal Process.
, vol.36
, Issue.8
, pp. 1223-1235
-
-
Griffin, D.W.1
Lim, J.S.2
-
18
-
-
85009097254
-
Mixed excitation for HMM-based speech synthesis
-
Sep.
-
T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Mixed excitation for HMM-based speech synthesis,” in Proc. Eurospeech'01, Sep. 2001, pp. 2263–2266.
-
(2001)
Proc. Eurospeech'01
-
-
Yoshimura, T.1
Tokuda, K.2
Masuko, T.3
Kobayashi, T.4
Kitamura, T.5
-
19
-
-
78049361102
-
Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
-
in Japanese, Aug.
-
T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis,” (in Japanese) IEICE Trans., vol. J87-D-II, no. 8, pp. 1565–1571, Aug. 2004.
-
(2004)
IEICE Trans.
, vol.J87-D-II
, Issue.8
, pp. 1565-1571
-
-
Yoshimura, T.1
Tokuda, K.2
Masuko, T.3
Kobayashi, T.4
Kitamura, T.5
-
20
-
-
33846405723
-
Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
-
Jan.
-
H. Zen, T. Toda, M. Nakamura, and K. Tokuda, “Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005,” IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325–333, Jan. 2007.
-
(2007)
IEICE Trans. Inf. Syst.
, vol.E90-D
, Issue.1
, pp. 325-333
-
-
Zen, H.1
Toda, T.2
Nakamura, M.3
Tokuda, K.4
-
21
-
-
34547542349
-
Improving Arabic HMM based speech synthesis quality
-
Sep.
-
A.-H. Ossama, A. S. Mahdy, and R. Mohsen, “Improving Arabic HMM based speech synthesis quality,” in Proc. Interspeech 2006, Sep. 2006, pp. 1332–1335.
-
(2006)
Proc. Interspeech 2006
, pp. 1332-1335
-
-
Ossama, A.-H.1
Mahdy, A.S.2
Mohsen, R.3
-
22
-
-
38549096029
-
A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
-
May
-
T. Toda and K. Tokuda, “A speech parameter generation algorithm considering global variance for HMM-based speech synthesis,” IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816–824, May 2007.
-
(2007)
IEICE Trans. Inf. Syst.
, vol.E90-D
, Issue.5
, pp. 816-824
-
-
Toda, T.1
Tokuda, K.2
-
23
-
-
68249104241
-
The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006
-
Jun.
-
H. Zen, T. Toda, and K. Tokuda, “The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006,” IEICE Trans. Inf. Syst., vol. E91-D, no. 6, pp. 1764–1773, Jun. 2008.
-
(2008)
IEICE Trans. Inf. Syst.
, vol.E91-D
, Issue.6
, pp. 1764-1773
-
-
Zen, H.1
Toda, T.2
Tokuda, K.3
-
24
-
-
67650851754
-
USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method
-
Sep.
-
Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, “USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method,” in Proc. Blizzard Challenge 2006, Sep. 2006.
-
(2006)
Proc. Blizzard Challenge 2006
-
-
Ling, Z.-H.1
Wu, Y.-J.2
Wang, Y.-P.3
Qin, L.4
Wang, R.-H.5
-
25
-
-
77953693469
-
Speaker-independent HMM-based speech synthesis system—HTS-2007 system for the Blizzard Challenge 2007
-
Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_008.html, paper 008
-
J. Yamagishi, H. Zen, T. Toda, and K. Tokuda, “Speaker-independent HMM-based speech synthesis system—HTS-2007 system for the Blizzard Challenge 2007,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_008.html, paper 008.
-
(2007)
Proc. BLZ3-2007 (in Proc. SSW6)
-
-
Yamagishi, J.1
Zen, H.2
Toda, T.3
Tokuda, K.4
-
26
-
-
33745216749
-
The Blizzard Challenge—2005: Evaluating corpus-based speech synthesis on common datasets
-
Sep.
-
A. Black and K. Tokuda, “The Blizzard Challenge—2005: Evaluating corpus-based speech synthesis on common datasets,” in Proc. Eurospeech 2005, Sep. 2005, pp. 77–80.
-
(2005)
Proc. Eurospeech 2005
, pp. 77-80
-
-
Black, A.1
Tokuda, K.2
-
27
-
-
68249083782
-
The Blizzard Challenge 2006
-
Sep., [Online]. Available: http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf
-
C. Bennett and A. Black, “The Blizzard Challenge 2006,” in Proc. Blizzard Challenge 2006, Sep. 2006 [Online]. Available: http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf
-
(2006)
Proc. Blizzard Challenge 2006
-
-
Bennett, C.1
Black, A.2
-
28
-
-
79952269421
-
The Blizzard Challenge 2007
-
Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_001.html, paper 001
-
M. Fraser and S. King, “The Blizzard Challenge 2007,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_001.html, paper 001.
-
(2007)
Proc. BLZ3-2007 (in Proc. SSW6)
-
-
Fraser, M.1
King, S.2
-
29
-
-
0032673049
-
Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
-
H. Kawahara, I. Masuda-Katsuse, and A. Cheveigne, “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds,” Speech Commun., vol. 27, pp. 187–207, 1999.
-
(1999)
Speech Commun.
, vol.27
, pp. 187-207
-
-
Kawahara, H.1
Masuda-Katsuse, I.2
Cheveigne, A.3
-
30
-
-
0032638856
-
Semi-tied covariance matrices for hidden Markov models
-
Mar.
-
M. Gales, “Semi-tied covariance matrices for hidden Markov models,” IEEE Trans. Speech Audio Process., vol. 7, pp. 272–281, Mar. 1999.
-
(1999)
IEEE Trans. Speech Audio Process.
, vol.7
, pp. 272-281
-
-
Gales, M.1
-
31
-
-
84892187452
-
Maximum likelihood modeling with Gaussian distributions for classification
-
May
-
R. Gopinath, “Maximum likelihood modeling with Gaussian distributions for classification,” in Proc. ICASSP-98, May 1998, pp. 661–664.
-
(1998)
Proc. ICASSP-98
, pp. 661-664
-
-
Gopinath, R.1
-
32
-
-
33847129573
-
Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
-
Feb.
-
J. Yamagishi and T. Kobayashi, “Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training,” IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533–543, Feb. 2007.
-
(2007)
IEICE Trans. Inf. Syst.
, vol.E90-D
, Issue.2
, pp. 533-543
-
-
Yamagishi, J.1
Kobayashi, T.2
-
33
-
-
67650854725
-
Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
-
Jan. 2009
-
J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, “Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm,” IEEE Trans. Speech, Audio, Lang. Process., vol. 17, no. 1, pp. 66–83, Jan. 2009.
-
(2009)
IEEE Trans. Speech, Audio, Lang. Process
, vol.17
, Issue.1
, pp. 66-83
-
-
Yamagishi, J.1
Kobayashi, T.2
Nakano, Y.3
Ogata, K.4
Isogai, J.5
-
34
-
-
0007985533
-
Speaker adaptation for HMM-based speech synthesis system using MLLR
-
Nov.
-
M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “Speaker adaptation for HMM-based speech synthesis system using MLLR,” in Proc. 3rd ESCA/COCOSDA Workshop Speech Synth., Nov. 1998, pp. 273–276.
-
(1998)
Proc. 3rd ESCA/COCOSDA Workshop Speech Synth.
, pp. 273-276
-
-
Tamura, M.1
Masuko, T.2
Tokuda, K.3
Kobayashi, T.4
-
35
-
-
0029288633
-
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
-
C. Leggetter and P. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models,” Comput. Speech Lang., vol. 9, no. 2, pp. 171–185, 1995.
-
(1995)
Comput. Speech Lang.
, vol.9
, Issue.2
, pp. 171-185
-
-
Leggetter, C.1
Woodland, P.2
-
36
-
-
0034842740
-
Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
-
May
-
M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR,” in Proc. ICASSP-01, May 2001, pp. 805–808.
-
(2001)
Proc. ICASSP-01
, pp. 805-808
-
-
Tamura, M.1
Masuko, T.2
Tokuda, K.3
Kobayashi, T.4
-
37
-
-
85008066911
-
Speaker adaptation of pitch and spectrum for HMM-based speech synthesis
-
in Japanese, Apr.
-
M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “Speaker adaptation of pitch and spectrum for HMM-based speech synthesis,” (in Japanese) IEICE Trans., vol. J85-D-II, no. 4, pp. 545–553, Apr. 2002.
-
(2002)
IEICE Trans.
, vol.J85-D-II
, Issue.4
, pp. 545-553
-
-
Tamura, M.1
Masuko, T.2
Tokuda, K.3
Kobayashi, T.4
-
38
-
-
0030362995
-
A compact model for speaker-adaptive training
-
Oct.
-
T. Anastasakos, J. McDonough, R. Schwartz, and J. Makhoul, “A compact model for speaker-adaptive training,” in Proc. ICSLP-96, Oct. 1996, pp. 1137–1140.
-
(1996)
Proc. ICSLP-96
, pp. 1137-1140
-
-
Anastasakos, T.1
McDonough, J.2
Schwartz, R.3
Makhoul, J.4
-
39
-
-
0142007308
-
A training method of average voice model for HMM-based speech synthesis
-
Aug.
-
J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “A training method of average voice model for HMM-based speech synthesis,” IEICE Trans. Fundamentals, vol. E86-A, no. 8, pp. 1956–1963, Aug. 2003.
-
(2003)
IEICE Trans. Fundamentals
, vol.E86-A
, Issue.8
, pp. 1956-1963
-
-
Yamagishi, J.1
Tamura, M.2
Masuko, T.3
Tokuda, K.4
Kobayashi, T.5
-
40
-
-
33645768204
-
A style adaptation technique for speech synthesis using HSMM and suprasegmental features
-
Mar.
-
M. Tachibana, J. Yamagishi, T. Masuko, and T. Kobayashi, “A style adaptation technique for speech synthesis using HSMM and suprasegmental features,” IEICE Trans. Inf. Syst., vol. E89-D, no. 3, pp. 1092–1099, Mar. 2006.
-
(2006)
IEICE Trans. Inf. Syst.
, vol.E89-D
, Issue.3
, pp. 1092-1099
-
-
Tachibana, M.1
Yamagishi, J.2
Masuko, T.3
Kobayashi, T.4
-
41
-
-
70350485779
-
HMM-based emotional speech synthesis using average emotion model
-
Dec.
-
L. Qin, Z. Ling, Y. Wu, B. Zhang, and R. Wang, “HMM-based emotional speech synthesis using average emotion model,” in Proc. ISCSLP-06 (Springer LNAI Book), Dec. 2006, pp. 233–240.
-
(2006)
Proc. ISCSLP-06 (Springer LNAI Book)
, pp. 233-240
-
-
Qin, L.1
Ling, Z.2
Wu, Y.3
Zhang, B.4
Wang, R.5
-
42
-
-
33748468338
-
New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer
-
J. Latorre, K. Iwano, and S. Furui, “New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer,” Speech Commun., vol. 48, no. 10, pp. 1227–1242, 2006.
-
(2006)
Speech Commun.
, vol.48
, Issue.10
, pp. 1227-1242
-
-
Latorre, J.1
Iwano, K.2
Furui, S.3
-
43
-
-
0030189744
-
Speaker adaptation using combined transformation and Bayesian methods
-
Jul.
-
V. Digalakis and L. Neumeyer, “Speaker adaptation using combined transformation and Bayesian methods,” IEEE Trans. Speech Audio Process., vol. 4, pp. 294–300, Jul. 1996.
-
(1996)
IEEE Trans. Speech Audio Process.
, vol.4
, pp. 294-300
-
-
Digalakis, V.1
Neumeyer, L.2
-
44
-
-
0035279111
-
A structural Bayes approach to speaker adaptation
-
Mar.
-
K. Shinoda and C. Lee, “A structural Bayes approach to speaker adaptation,” IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 276–287, Mar. 2001.
-
(2001)
IEEE Trans. Speech Audio Process.
, vol.9
, Issue.3
, pp. 276-287
-
-
Shinoda, K.1
Lee, C.2
-
45
-
-
11144317887
-
Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency
-
Dec.
-
D. Arifianto, T. Tanaka, T. Masuko, and T. Kobayashi, “Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency,” IEICE Trans. Inf. Syst., vol. E87-D, no. 12, pp. 2812–2820, Dec. 2004.
-
(2004)
IEICE Trans. Inf. Syst.
, vol.E87-D
, Issue.12
, pp. 2812-2820
-
-
Arifianto, D.1
Tanaka, T.2
Masuko, T.3
Kobayashi, T.4
-
46
-
-
84928118106
-
Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity
-
Sep.
-
H. Kawahara, H. Katayose, A. Cheveigne, and R. Patterson, “Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity,” in Proc. Eurospeech 1999, Sep. 1999, pp. 2781–2784.
-
(1999)
Proc. Eurospeech 1999
, pp. 2781-2784
-
-
Kawahara, H.1
Katayose, H.2
Cheveigne, A.3
Patterson, R.4
-
47
-
-
0001455934
-
A robust algorithm for pitch tracking (RAPT)
-
W. Kleijn and K. Paliwal, Eds. New York: Elsevier
-
D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis, W. Kleijn and K. Paliwal, Eds. New York: Elsevier, 1995, pp. 495–518.
-
(1995)
Speech Coding and Synthesis
, pp. 495-518
-
-
Talkin, D.1
-
49
-
-
84966348891
-
An HMM-based speech synthesis system applied to English
-
Sep.
-
K. Tokuda, H. Zen, and A. Black, “An HMM-based speech synthesis system applied to English,” in Proc. IEEE Speech Synth. Workshop, Sep. 2002, pp. 227–230.
-
(2002)
Proc. IEEE Speech Synth. Workshop
, pp. 227-230
-
-
Tokuda, K.1
Zen, H.2
Black, A.3
-
50
-
-
0002985991
-
Mora and syllable
-
N. Tsujimura, Ed. Chichester, U.K.: Blackwell
-
H. Kubozono, “Mora and syllable,” in The handbook of Japanese Linguistics, N. Tsujimura, Ed. Chichester, U.K.: Blackwell, 1995, pp. 31–61.
-
(1995)
The handbook of Japanese Linguistics
, pp. 31-61
-
-
Kubozono, H.1
-
51
-
-
0032050110
-
Maximum likelihood linear transformations for HMM-based speech recognition
-
M. Gales, “Maximum likelihood linear transformations for HMM-based speech recognition,” Comput. Speech Lang., vol. 12, no. 2, pp. 75–98, 1998.
-
(1998)
Comput. Speech Lang.
, vol.12
, Issue.2
, pp. 75-98
-
-
Gales, M.1
-
52
-
-
0029375590
-
Speaker adaptation using constrained reestimation of Gaussian mixtures
-
Sep.
-
V. Digalakis, D. Rtischev, and L. Neumeyer, “Speaker adaptation using constrained reestimation of Gaussian mixtures,” IEEE Trans. Speech Audio Process., vol. 3, no. 5, pp. 357–366, Sep. 1995.
-
(1995)
IEEE Trans. Speech Audio Process.
, vol.3
, Issue.5
, pp. 357-366
-
-
Digalakis, V.1
Rtischev, D.2
Neumeyer, L.3
-
53
-
-
85008042245
-
Maximum likelihood from incomplete data via the EM algorithm
-
A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Statist. Soc., Series B, vol. 39, no. 1, pp. 1–38, 1977.
-
(1977)
J. R. Statist. Soc., Series B
, vol.39
, Issue.1
, pp. 1-38
-
-
Dempster, A.1
Laird, N.2
Rubin, D.3
-
54
-
-
24144497811
-
Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis
-
Mar.
-
J. Yamagishi, K. Onishi, T. Masuko, and T. Kobayashi, “Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis,” IEICE Trans. Inf. Syst., vol. E88-D, no. 3, pp. 503–509, Mar. 2005.
-
(2005)
IEICE Trans. Inf. Syst.
, vol.E88-D
, Issue.3
, pp. 503-509
-
-
Yamagishi, J.1
Onishi, K.2
Masuko, T.3
Kobayashi, T.4
-
55
-
-
0033906251
-
MDL-based context-dependent subword modeling for speech recognition
-
Mar.
-
K. Shinoda and T. Watanabe, “MDL-based context-dependent subword modeling for speech recognition,” J. Acoust. Soc. Japan (E), vol. 21, pp. 79–86, Mar. 2000.
-
(2000)
J. Acoust. Soc. Japan (E)
, vol.21
, pp. 79-86
-
-
Shinoda, K.1
Watanabe, T.2
-
56
-
-
0025543906
-
Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
-
E. Moulines and F. Charpentier, “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones,” Speech Commun., vol. 9, no. 5–6, pp. 453–468, 1990.
-
(1990)
Speech Commun.
, vol.9
, Issue.5-6
, pp. 453-468
-
-
Moulines, E.1
Charpentier, F.2
-
57
-
-
44949143155
-
Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
-
Sep.
-
Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, “Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation,” in Proc. Interspeech 2006, Sep. 2006, pp. 2266–2269.
-
(2006)
Proc. Interspeech 2006
, pp. 2266-2269
-
-
Ohtani, Y.1
Toda, T.2
Saruwatari, H.3
Shikano, K.4
-
58
-
-
84959174906
-
HMM-based synthesis of child speech
-
Oct.
-
O. Watts, J. Yamagishi, K. Berkling, and S. King, “HMM-based synthesis of child speech,” in Proc. 1st Workshop Child, Comput., Interaction (ICMI'08 Post-Conf. Workshop), Oct. 2008.
-
(2008)
Proc. 1st Workshop Child, Comput., Interaction (ICMI'08 Post-Conf. Workshop)
-
-
Watts, O.1
Yamagishi, J.2
Berkling, K.3
King, S.4
-
59
-
-
34547529978
-
Model adaptation approach to speech synthesis with diverse voices and styles
-
Apr.
-
J. Yamagishi, T. Kobayashi, M. Tachibana, K. Ogata, and Y. Nakano, “Model adaptation approach to speech synthesis with diverse voices and styles,” in Proc. ICASSP-07, Apr. 2007, pp. 1233–1236.
-
(2007)
Proc. ICASSP-07
, pp. 1233-1236
-
-
Yamagishi, J.1
Kobayashi, T.2
Tachibana, M.3
Ogata, K.4
Nakano, Y.5
-
60
-
-
85008037473
-
ATRECSS—ATR English speech corpus for speech synthesis
-
Aug.
-
J. Ni, T. Hirai, H. Kawai, T. Toda, K. Tokuda, M. Tsuzaki, S. Sakai, R. Maia, and S. Nakamura, “ATRECSS—ATR English speech corpus for speech synthesis,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007.
-
(2007)
Proc. BLZ3-2007 (in Proc. SSW6)
-
-
Ni, J.1
Hirai, T.2
Kawai, H.3
Toda, T.4
Tokuda, K.5
Tsuzaki, M.6
Sakai, S.7
Maia, R.8
Nakamura, S.9
-
62
-
-
0037278070
-
An efficient forward-backward algorithm for an explicit-duration hidden Markov model
-
Jan.
-
S.-Z. Yu and H. Kobayashi, “An efficient forward-backward algorithm for an explicit-duration hidden Markov model,” IEEE Signal Process. Lett., vol. 10, no. 1, pp. 11–14, Jan. 2003.
-
(2003)
IEEE Signal Process. Lett.
, vol.10
, Issue.1
, pp. 11-14
-
-
Yu, S.-Z.1
Kobayashi, H.2
-
63
-
-
0000176621
-
On the complexity of explicit duration HMM's
-
May
-
C. Mitchell, M. Harper, and L. Jamieson, “On the complexity of explicit duration HMM's,” IEEE Trans. Speech Audio Process., vol. 3, no. 3, pp. 213–217, May 1995.
-
(1995)
IEEE Trans. Speech Audio Process.
, vol.3
, Issue.3
, pp. 213-217
-
-
Mitchell, C.1
Harper, M.2
Jamieson, L.3
-
64
-
-
33947110905
-
State duration modeling for HMM-based speech synthesis
-
Mar.
-
H. Zen, K. Tokuda, T. Masuko, T. Yoshimura, T. Kobayashi, and T. Kitamura, “State duration modeling for HMM-based speech synthesis,” IEICE Trans. Inf. Syst., vol. E90-D, no. 3, pp. 692–693, Mar. 2007.
-
(2007)
IEICE Trans. Inf. Syst.
, vol.E90-D
, Issue.3
, pp. 692-693
-
-
Zen, H.1
Tokuda, K.2
Masuko, T.3
Yoshimura, T.4
Kobayashi, T.5
Kitamura, T.6
-
66
-
-
67650832556
-
Statistical analysis of the Blizzard Challenge 2007 listening test results
-
Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_003.html, paper 003
-
R. Clark, M. Podsiadlo, M. Fraser, C. Mayo, and S. King, “Statistical analysis of the Blizzard Challenge 2007 listening test results,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_003.html, paper 003.
-
(2007)
Proc. BLZ3-2007 (in Proc. SSW6)
-
-
Clark, R.1
Podsiadlo, M.2
Fraser, M.3
Mayo, C.4
King, S.5
-
67
-
-
85008031526
-
The USTC and iFlytek speech synthesis systems for Blizzard Challenge 2007
-
Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_017.html, paper 017
-
Z.-H. Ling, L. Qin, H. Lu, Y. Gao, L.-R. Dai, R.-H. Wang, Y. Jiang, Z.-W. Zhao, J.-H. Yang, J. Chen, and G.-P. Hu, “The USTC and iFlytek speech synthesis systems for Blizzard Challenge 2007,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_017.html, paper 017.
-
(2007)
Proc. BLZ3-2007 (in Proc. SSW6)
-
-
Ling, Z.-H.1
Qin, L.2
Lu, H.3
Gao, Y.4
Dai, L.-R.5
Wang, R.-H.6
Jiang, Y.7
Zhao, Z.-W.8
Yang, J.-H.9
Chen, J.10
Hu, G.-P.11
-
68
-
-
51449101140
-
Festival Multisyn voices for the 2007 Blizzard Challenge
-
Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_006.html, paper 006
-
K. Richmond, V. Strom, R. Clark, J. Yamagishi, and S. Fitt, “Festival Multisyn voices for the 2007 Blizzard Challenge,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_006.html, paper 006.
-
(2007)
Proc. BLZ3-2007 (in Proc. SSW6)
-
-
Richmond, K.1
Strom, V.2
Clark, R.3
Yamagishi, J.4
Fitt, S.5
-
69
-
-
0029765811
-
Unit selection in a concatenative speech synthesis system using a large speech database
-
May
-
A. Hunt and A. Black, “Unit selection in a concatenative speech synthesis system using a large speech database,” in Proc. ICASSP-96, May 1996, pp. 373–376.
-
(1996)
Proc. ICASSP-96
, pp. 373-376
-
-
Hunt, A.1
Black, A.2
-
70
-
-
34547503417
-
HMM-based unit selection using frame sized speech segments
-
Sep.
-
Z.-H. Ling and R.-H. Wang, “HMM-based unit selection using frame sized speech segments,” in Proc. Interspeech 2006, Sep. 2006, pp. 2034–2037.
-
(2006)
Proc. Interspeech 2006
, pp. 2034-2037
-
-
Ling, Z.-H.1
Wang, R.-H.2
-
71
-
-
34547612590
-
HMM-based hierarchical unit selection combining Kullback-Leibler divergence with likelihood criterion
-
Apr.
-
Z.-H. Ling and R.-H. Wang, “HMM-based hierarchical unit selection combining Kullback-Leibler divergence with likelihood criterion,” in Proc. ICASSP-07, Apr. 2007, pp. 1245–1248.
-
(2007)
Proc. ICASSP-07
, pp. 1245-1248
-
-
Ling, Z.-H.1
Wang, R.-H.2
-
72
-
-
34047123652
-
Multisyn: Open-domain unit selection for the Festival speech synthesis system
-
R. A. J. Clark, K. Richmond, and S. King, “Multisyn: Open-domain unit selection for the Festival speech synthesis system,” Speech Commun., vol. 49, no. 4, pp. 317–330, 2007.
-
(2007)
Speech Commun.
, vol.49
, Issue.4
, pp. 317-330
-
-
Clark, R.A.J.1
Richmond, K.2
King, S.3
-
73
-
-
33846429403
-
Minimum generation error training for HMM-based speech synthesis
-
May, [Online]. Available: http://festvox.org/blizzard/bc2008/hts_Blizzard2008.pdf
-
Y. Wu and R.-H. Wang, “Minimum generation error training for HMM-based speech synthesis,” in Proc. ICASSP-06, May 2006, pp. 89–92 [Online]. Available: http://festvox.org/blizzard/bc2008/hts_Blizzard2008.pdf
-
(2006)
Proc. ICASSP-06
, pp. 89-92
-
-
Wu, Y.1
Wang, R.-H.2
-
74
-
-
0030166343
-
The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences
-
C. Benoit, M. Grice, and V. Hazan, “The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences,” Speech Commun., vol. 18, no. 4, pp. 381–392, 1996.
-
(1996)
Speech Commun.
, vol.18
, Issue.4
, pp. 381-392
-
-
Benoit, C.1
Grice, M.2
Hazan, V.3
-
75
-
-
85030493378
-
Synthesis of regional English using a keyword lexicon
-
Sep.
-
S. Fitt and S. Isard, “Synthesis of regional English using a keyword lexicon,” in Proc. Eurospeech 1999, Sep. 1999, vol. 2, pp. 823–826.
-
(1999)
Proc. Eurospeech 1999
, vol.2
, pp. 823-826
-
-
Fitt, S.1
Isard, S.2
-
76
-
-
70449126171
-
The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge
-
Sep.
-
J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, and K. Tokuda, “The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge,” in Proc. Blizzard Challenge 2008, Sep. 2008.
-
(2008)
Proc. Blizzard Challenge 2008
-
-
Yamagishi, J.1
Zen, H.2
Wu, Y.-J.3
Toda, T.4
Tokuda, K.5
-
77
-
-
67650803663
-
Combining statistical parametric speech synthesis and unit-selection for automatic voice cloning
-
Feb., [Online]. Available: http://www.langtech.it/en/poster/03_AYLETT.pdf
-
M. Aylett and J. Yamagishi, “Combining statistical parametric speech synthesis and unit-selection for automatic voice cloning,” in Proc. LangTech 2008, Feb. 2008 [Online]. Available: http://www.langtech.it/en/poster/03_AYLETT.pdf
-
(2008)
Proc. LangTech 2008
-
-
Aylett, M.1
Yamagishi, J.2