Volume 14, Issue 3, 2006, Pages 990-997

A global, boundary-centric framework for unit selection text-to-speech synthesis

Author keywords

Discontinuity perception; Distance measure; Join cost; Modal analysis; Segment concatenation; Text to speech synthesis; Unit selection

Indexed keywords

DISCONTINUITY PERCEPTION; DISTANCE MEASURE; JOIN COST; SEGMENT CONCATENATION; UNIT SELECTION;

EID: 33846261409     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TSA.2005.858048     Document Type: Article
Times cited : (17)

References (34)
  • 1
    • 0003058857
    • On the basic scheme and algorithms in nonuniform unit speech synthesis
    • G. Bailly and C. Benoit, Eds. Amsterdam, The Netherlands: North-Holland
    • K. Takeda, K. Abe, and Y. Sagisaka, "On the basic scheme and algorithms in nonuniform unit speech synthesis," in Talking Machines, G. Bailly and C. Benoit, Eds. Amsterdam, The Netherlands: North-Holland, 1992, pp. 93-105.
    • (1992) Talking Machines , pp. 93-105
    • Takeda, K.1    Abe, K.2    Sagisaka, Y.3
  • 2
    • 0029765811
    • Unit selection in a concatenative speech synthesis system using a large speech database
    • Atlanta, GA
    • A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, Atlanta, GA, 1996, pp. 373-376.
    • (1996) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 373-376
    • Hunt, A.1    Black, A.2
  • 4
    • 0000237685
    • Prosody and the selection of source units for concatenative synthesis
    • J. van Santen, R. Sproat, J. Hirschberg, and J. Olive, Eds. New York: Springer-Verlag
    • W. N. Campbell and A. Black, "Prosody and the selection of source units for concatenative synthesis," in Progress in Speech Synthesis, J. van Santen, R. Sproat, J. Hirschberg, and J. Olive, Eds. New York: Springer-Verlag, 1997, pp. 279-292.
    • (1997) Progress in Speech Synthesis , pp. 279-292
    • Campbell, W.N.1    Black, A.2
  • 5
    • 0035127703
    • Applying the harmonic plus noise model in concatenative speech synthesis
    • Jan
    • Y. Stylianou, "Applying the harmonic plus noise model in concatenative speech synthesis," IEEE Trans. Speech Audio Process., vol. 9, no. 1, pp. 21-29, Jan. 2001.
    • (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.1 , pp. 21-29
    • Stylianou, Y.1
  • 6
    • 0035124445
    • Control of spectral dynamics in concatenative speech synthesis
    • Jan
    • J. Wouters and M. Macon, "Control of spectral dynamics in concatenative speech synthesis," IEEE Trans. Speech Audio Process., vol. 9, no. 1, pp. 30-38, Jan. 2001.
    • (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.1 , pp. 30-38
    • Wouters, J.1    Macon, M.2
  • 9
    • 85133526552
    • Automatically clustering similar units for unit selection in speech synthesis
    • Rhodes, Greece, Sep
    • A. W. Black and P. Taylor, "Automatically clustering similar units for unit selection in speech synthesis," in Proc. 5th European Conf. Speech Communication Technology, Rhodes, Greece, Sep. 1997, pp. 601-604.
    • (1997) Proc. 5th European Conf. Speech Communication Technology , pp. 601-604
    • Black, A.W.1    Taylor, P.2
  • 11
    • 81155152572
    • A perceptual evaluation of distance measures for concatenative speech synthesis
    • Sydney, Australia, Dec
    • J. Wouters and M. W. Macon, "A perceptual evaluation of distance measures for concatenative speech synthesis," in Proc. Int. Conf. Spoken Language Processing, vol. 6, Sydney, Australia, Dec. 1998, pp. 159-163.
    • (1998) Proc. Int. Conf. Spoken Language Processing , vol.6 , pp. 159-163
    • Wouters, J.1    Macon, M.W.2
  • 12
    • 0035127353
    • Reducing audible spectral discontinuities
    • Jan
    • E. Klabbers and R. Veldhuis, "Reducing audible spectral discontinuities," IEEE Trans. Speech Audio Process., vol. 9, no. 1, pp. 39-51, Jan. 2001.
    • (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.1 , pp. 39-51
    • Klabbers, E.1    Veldhuis, R.2
  • 13
    • 85009282243
    • Feature extraction for unit selection in concatenative speech synthesis: Comparison between AIM, LPC, and MFCC
    • Denver, CO, Sep
    • M. Tsuzaki and H. Kawai, "Feature extraction for unit selection in concatenative speech synthesis: Comparison between AIM, LPC, and MFCC," in Proc. Int. Conf. Spoken Language Processing, Denver, CO, Sep. 2002, pp. 137-140.
    • (2002) Proc. Int. Conf. Spoken Language Processing , pp. 137-140
    • Tsuzaki, M.1    Kawai, H.2
  • 14
    • 85133501802
    • Removing phase mismatches in concatenative speech synthesis
    • Jenolan Caves, Australia, Nov
    • Y. Stylianou, "Removing phase mismatches in concatenative speech synthesis," in Proc. 3rd ESCA Speech Synthesis Workshop, Jenolan Caves, Australia, Nov. 1998, pp. 267-272.
    • (1998) Proc. 3rd ESCA Speech Synthesis Workshop , pp. 267-272
    • Stylianou, Y.1
  • 15
    • 85032421249
    • A novel discontinuity metric for unit selection text-to-speech synthesis
    • Pittsburgh, PA, Jun
    • J. R. Bellegarda, "A novel discontinuity metric for unit selection text-to-speech synthesis," in Proc. 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, Jun. 2004, pp. 133-138.
    • (2004) Proc. 5th ISCA Speech Synthesis Workshop , pp. 133-138
    • Bellegarda, J.R.1
  • 16
    • 85135263974
    • Objective distance measures for assessing concatenative speech synthesis
    • Budapest, Hungary, Sep
    • J.-D. Chen and N. Campbell, "Objective distance measures for assessing concatenative speech synthesis," in Proc. 6th Euro. Conf. Speech Comm. Technology, Budapest, Hungary, Sep. 1999, pp. 611-614.
    • (1999) Proc. 6th Euro. Conf. Speech Comm. Technology , pp. 611-614
    • Chen, J.-D.1    Campbell, N.2
  • 17
    • 0036497601
    • A comparison of spectral smoothing methods for segment concatenation based speech synthesis
    • D. Chappell and J. H. L. Hansen, "A comparison of spectral smoothing methods for segment concatenation based speech synthesis," Speech Commun., vol. 36, no. 3-4, pp. 343-373, 2002.
    • (2002) Speech Commun , vol.36 , Issue.3-4 , pp. 343-373
    • Chappell, D.1    Hansen, J.H.L.2
  • 18
    • 0034854702
    • Perceptual and objective detection of discontinuities in concatenative speech synthesis
    • Salt Lake City, UT
    • Y. Stylianou and A. K. Syrdal, "Perceptual and objective detection of discontinuities in concatenative speech synthesis," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, Salt Lake City, UT, 2001, pp. 837-840.
    • (2001) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 837-840
    • Stylianou, Y.1    Syrdal, A.K.2
  • 19
    • 80051612889
    • A new distance measure for costing discontinuities in concatenative speech synthesis
    • Perthshire, U.K., Sep
    • R. E. Donovan, "A new distance measure for costing discontinuities in concatenative speech synthesis," in Proc. 4th ISCA Speech Synthesis Workshop, Perthshire, U.K., Sep. 2001, pp. 59-62.
    • (2001) Proc. 4th ISCA Speech Synthesis Workshop , pp. 59-62
    • Donovan, R.E.1
  • 20
    • 33750275922
    • Join cost for unit selection speech synthesis
    • S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall
    • J. Vepa and S. King, "Join cost for unit selection speech synthesis," in Text to Speech Synthesis: New Paradigms and Advances, S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall, 2004, pp. 35-62.
    • (2004) Text to Speech Synthesis: New Paradigms and Advances , pp. 35-62
    • Vepa, J.1    King, S.2
  • 22
    • 0000274403
    • Exploiting latent semantic information in statistical language modeling
    • Aug
    • J. R. Bellegarda, "Exploiting latent semantic information in statistical language modeling," Proc. IEEE, vol. 88, no. 8, pp. 1279-1296, Aug. 2000.
    • (2000) Proc. IEEE , vol.88 , Issue.8 , pp. 1279-1296
    • Bellegarda, J.R.1
  • 23
    • 2742532435
    • Voicing epoch detection determination with dynamic programming
    • D. Talkin, "Voicing epoch detection determination with dynamic programming," J. Acoust. Soc. Amer., vol. 85, 1989.
    • (1989) J. Acoust. Soc. Amer , vol.85
    • Talkin, D.1
  • 24
    • 0024924999
    • Automatic and reliable estimation of glottal closure instant and period
    • Dec
    • Y. M. Cheng and D. O'Shaughnessy, "Automatic and reliable estimation of glottal closure instant and period," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 12, pp. 1805-1815, Dec. 1989.
    • (1989) IEEE Trans. Acoust., Speech, Signal Process , vol.37 , Issue.12 , pp. 1805-1815
    • Cheng, Y.M.1    O'Shaughnessy, D.2
  • 25
    • 34047276215
    • J. K. Cullum and R. A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations - Vol. I Theory. Boston, MA: Birkhäuser, 1985, ch. 5.
  • 27
    • 34047274885
    • J. O. Smith III, "Why sinusoids are important," in Mathematics of the Discrete Fourier Transform (DFT): W3K Publishing, 2003, sec. 4.1.2.
  • 28
    • 0035121063
    • Statistical prosodic modeling: From corpus design to parameter estimation
    • Jan
    • J. R. Bellegarda, K. E. A. Silverman, K. A. Lenzo, and V. Anderson, "Statistical prosodic modeling: From corpus design to parameter estimation," IEEE Trans. Speech Audio Process., vol. SAP-9, no. 1, pp. 52-66, Jan. 2001.
    • (2001) IEEE Trans. Speech Audio Process , vol.SAP-9 , Issue.1 , pp. 52-66
    • Bellegarda, J.R.1    Silverman, K.E.A.2    Lenzo, K.A.3    Anderson, V.4
  • 29
    • 34047258092
    • private communication
    • C. Waast, private communication, 2004.
    • (2004)
    • Waast, C.1
  • 30
    • 0002609530
    • Optimal coupling of diphones
    • J. van Santen, R. Sproat, J. Hirschberg, and J. Olive, Eds. New York: Springer-Verlag
    • A. Conkie and S. Isard, "Optimal coupling of diphones," in Progress in Speech Synthesis, J. van Santen, R. Sproat, J. Hirschberg, and J. Olive, Eds. New York: Springer-Verlag, 1997, pp. 293-304.
    • (1997) Progress in Speech Synthesis , pp. 293-304
    • Conkie, A.1    Isard, S.2
  • 32
    • 34047272434
    • Speech Assessment Methods Phonetic Alphabet (SAMPA). Standard Machine-Readable Encoding of Phonetic Notation. ESPRIT project 1541, 1987-89, cf. [Online] Available: http://www.phon.ucl.ac.uk/home/sampa/home.htm
  • 33
    • 0025543906
    • Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
    • E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Commun., vol. 9, pp. 453-467, 1990.
    • (1990) Speech Commun , vol.9 , pp. 453-467
    • Moulines, E.1    Charpentier, F.2
  • 34
    • 34047257276
    • private communication
    • K. E. A. Silverman, private communication, 2003.
    • (2003)
    • Silverman, K.E.A.1


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.