SCOPUS Information Search Platform

IEEE Transactions on Speech and Audio Processing

Volume 13, Issue 5, 2005, Pages 930-944

Vocal tract normalization equals linear transformation in cepstral space

(2) Pitz, Michael b,c Ney, Hermann a,b

a IEEE (Germany)

b RWTH AACHEN UNIVERSITY (Germany)

c BMW GROUP (Germany)

Author keywords

Linear transformation; Speaker adaptive modeling and training; Speaker adaptive recognition; Speech recognition; Vocal tract (length) normalization

Indexed keywords

LINEAR TRANSFORMATION; SPEAKER ADAPTIVE MODELING AND TRAINING; SPEAKER ADAPTIVE RECOGNITION; VOCAL TRACT (LENGTH) NORMALIZATION;

APPROXIMATION THEORY; MATHEMATICAL TRANSFORMATIONS; MATRIX ALGEBRA; MAXIMUM LIKELIHOOD ESTIMATION; PROBABILITY; REGRESSION ANALYSIS;

SPEECH RECOGNITION;

EID: 27644522706 PISSN: 10636676 EISSN: None Source Type: Journal
DOI: 10.1109/TSA.2005.848881 Document Type: Article

Times cited : (136)

References (24)

1
- 0346528936
- Speaker adaptation for continuous density HMMs: A review
- Sophia Antipolis, France, Aug.
- P. C. Woodland, "Speaker adaptation for continuous density HMMs: A review," in Proc. ISCA ITR-Workshop on Adaptation Methods for Speech Recognition, Sophia Antipolis, France, Aug. 2001, pp. 11-19.
- (2001) Proc. ISCA ITR-Workshop on Adaptation Methods for Speech Recognition , pp. 11-19
- Woodland, P.C.¹

2
- 0029725604
- A parametric approach to vocal tract length normalization
- Atlanta, GA, May
- E. Eide and H. Gish, "A parametric approach to vocal tract length normalization," in IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, Atlanta, GA, May 1996, pp. 346-349.
- (1996) IEEE Int. Conf. Acoustics, Speech, Signal Processing , vol.1 , pp. 346-349
- Eide, E.¹ Gish, H.²

3
- 0029747183
- Speaker normalization using efficient frequency warping procedures
- Atlanta, GA, May
- L. Lee and R. Rose, "Speaker normalization using efficient frequency warping procedures," in IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, Atlanta, GA, May 1996, pp. 353-356.
- (1996) IEEE Int. Conf. Acoustics, Speech, Signal Processing , vol.1 , pp. 353-356
- Lee, L.¹ Rose, R.²

4
- 0017482612
- Normalization of vowels by vocal tract length and its application to vowel identification
- Apr.
- H. Wakita, "Normalization of vowels by vocal tract length and its application to vowel identification," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25, pp. 183-192, Apr. 1977.
- (1977) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-25 , pp. 183-192
- Wakita, H.¹

5
- 0029764708
- Speaker normalization on conversational telephone speech
- Atlanta, GA, May
- S. Wegmann, D. McAllaster, J. Orloff, and B. Peskin, "Speaker normalization on conversational telephone speech," in IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, Atlanta, GA, May 1996, pp. 339-341.
- (1996) IEEE Int. Conf. Acoustics, Speech, Signal Processing , vol.1 , pp. 339-341
- Wegmann, S.¹ McAllaster, D.² Orloff, J.³ Peskin, B.⁴

6
- 0004319975
- Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, PA
- A. Acero, "Acoustical and environmental robustness in automatic speech recognition," Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, PA, 1990.
- (1990) Acoustical and Environmental Robustness in Automatic Speech Recognition
- Acero, A.¹

7
- 84937324786
- Ph.D. dissertation, Johns Hopkins Univ., Baltimore, MD
- J. W. McDonough, "Speaker compensation with all-pass transforms," Ph.D. dissertation, Johns Hopkins Univ., Baltimore, MD, 2000.
- (2000) Speaker Compensation with All-pass Transforms
- McDonough, J.W.¹

8
- 0029725858
- An approach to speaker adaptation based on analytic functions
- Atlanta, GA, May
- J. W. McDonough, G. Zavaliagkos, and H. Gish, "An approach to speaker adaptation based on analytic functions," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, Atlanta, GA, May 1996, pp. 721-724.
- (1996) IEEE Int. Conf. on Acoustics, Speech, and Signal Processing , vol.2 , pp. 721-724
- McDonough, J.W.¹ Zavaliagkos, G.² Gish, H.³

9
- 85009075256
- Speaker normalization in the MFCC domain
- Beijing, China, Oct.
- S. Cox, "Speaker normalization in the MFCC domain," in Int. Conf. on Spoken Language Processing, vol. 2, Beijing, China, Oct. 2000, pp. 853-856.
- (2000) Int. Conf. on Spoken Language Processing , vol.2 , pp. 853-856
- Cox, S.¹

10
- 85135261079
- An investigation into vocal tract length normalization
- Budapest, Hungary, Sep.
- L. F. Uebel and P. C. Woodland, "An investigation into vocal tract length normalization," in Proc. ISCA Eur. Conf. on Speech Communication and Technology, vol. 6, Budapest, Hungary, Sep. 1999, pp. 2527-2530.
- (1999) Proc. ISCA Eur. Conf. on Speech Communication and Technology , vol.6 , pp. 2527-2530
- Uebel, L.F.¹ Woodland, P.C.²

11
- 0022227187
- Comparative study of several distortion measures for speech recognition
- Atlanta, GA, Apr.
- F. K. Nocerino, L. R. Rabiner, and D. H. Klatt, "Comparative study of several distortion measures for speech recognition," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, Atlanta, GA, Apr. 1985, pp. 25-28.
- (1985) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing , vol.1 , pp. 25-28
- Nocerino, F.K.¹ Rabiner, L.R.² Klatt, D.H.³

12
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Aug.
- S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. 28, pp. 357-366, Aug. 1980.
- (1980) IEEE Trans. Acoust., Speech, Signal Process. , vol.28 , pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

13
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- Jun.
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, no. 4, pp. 1738-1752, Jun. 1990.
- (1990) J. Acoust. Soc. Amer. , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

14
- 0034843057
- Computing mel-frequency cepstral coefficients on the power spectrum
- Salt Lake City, UT, Jun.
- S. Molau, M. Pitz, R. Schlüter, and H. Ney, "Computing mel-frequency cepstral coefficients on the power spectrum," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, Salt Lake City, UT, Jun. 2001, pp. 73-76.
- (2001) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing , vol.1 , pp. 73-76
- Molau, S.¹ Pitz, M.² Schlüter, R.³ Ney, H.⁴

15
- 0032629626
- Improved methods for vocal tract normalization
- Phoenix, AZ, Apr.
- L. Welling, S. Kanthak, and H. Ney, "Improved methods for vocal tract normalization," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, Phoenix, AZ, Apr. 1999, pp. 761-764.
- (1999) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing , vol.2 , pp. 761-764
- Welling, L.¹ Kanthak, S.² Ney, H.³

16
- 0003483593
- Cambridge, U.K.
- S. J. Young, HTK: Hidden Markov Model Toolkit V1.4. User Manual Cambridge, U.K., 1993.
- (1993) HTK: Hidden Markov Model Toolkit V1.4. User Manual
- Young, S.J.¹

17
- 0036753897
- Speaker adaptive modeling by vocal tract normalization
- Sep.
- L. Welling, H. Ney, and S. Kanthak, "Speaker adaptive modeling by vocal tract normalization," IEEE Trans. Speech Audio Process., vol. 10, no. 6, pp. 415-426, Sep. 2002.
- (2002) IEEE Trans. Speech Audio Process. , vol.10 , Issue.6 , pp. 415-426
- Welling, L.¹ Ney, H.² Kanthak, S.³

18
- 0029375590
- Speaker adaptation using constrained estimation of Gaussian mixtures
- Sep.
- V. Digalakis, D. Rtischev, and L. Neumeyer, "Speaker adaptation using constrained estimation of Gaussian mixtures," IEEE Trans. Speech Audio Process., vol. 3, no. 5, pp. 357-366, Sep. 1995.
- (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.5 , pp. 357-366
- Digalakis, V.¹ Rtischev, D.² Neumeyer, L.³

19
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- Apr.
- M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, Apr. 1998.
- (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

20
- 0030149866
- A maximum-likelihood approach to stochastic matching for robust speech recognition
- May
- A. Sankar and C.-H. Lee, "A maximum-likelihood approach to stochastic matching for robust speech recognition," IEEE Trans. Speech Audio Process., vol. 4, no. 3, pp. 190-202, May 1996.
- (1996) IEEE Trans. Speech Audio Process. , vol.4 , Issue.3 , pp. 190-202
- Sankar, A.¹ Lee, C.-H.²

21
- 85009064348
- Constrained maximum likelihood linear regression for speaker adaptation
- Beijing, China, Oct.
- M. Afify and O. Siohan, "Constrained maximum likelihood linear regression for speaker adaptation," in Proc. Int. Conf. Spoken Language Processing, vol. 3, Beijing, China, Oct. 2000, pp. 861-864.
- (2000) Proc. Int. Conf. Spoken Language Processing , vol.3 , pp. 861-864
- Afify, M.¹ Siohan, O.²

22
- 84966252352
- Decay rates for inverse band matrices
- Oct.
- S. Demko, W. F. Moss, and P. W. Smith, "Decay rates for inverse band matrices," Math. Comput., vol. 43, no. 168, pp. 491-499, Oct. 1984.
- (1984) Math. Comput. , vol.43 , Issue.168 , pp. 491-499
- Demko, S.¹ Moss, W.F.² Smith, P.W.³

23
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- Apr.
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, no. 2, pp. 171-185, Apr. 1995.
- (1995) Comput. Speech Lang. , vol.9 , Issue.2 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

24
- 0036556171
- From within-word model search to across-word model search in large vocabulary continuous speech recognition
- May
- A. Sixtus and H. Ney, "From within-word model search to across-word model search in large vocabulary continuous speech recognition," Comput. Speech Lang., vol. 16, no. 2, pp. 245-271, May 2002.
- (2002) Comput. Speech Lang. , vol.16 , Issue.2 , pp. 245-271
- Sixtus, A.¹ Ney, H.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.