SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 16, Issue 3, 2008, Pages 508-518

Combining spectral representations for large-vocabulary continuous speech recognition

a UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

Feature combination; Heteroscedastic linear discriminant analysis (HLDA); Large vocabulary continuous speech recognition (LVCSR); Pitch synchronous; ROVER; STRAIGHT; Vocal tract length normalization (VTLN)

Indexed keywords

FEATURE COMBINATION; HETEROSCEDASTIC LINEAR DISCRIMINANT ANALYSIS (HLDA); LARGE-VOCABULARY CONTINUOUS SPEECH RECOGNITION (LVCSR); PITCH-SYNCHRONOUS; ROVER; STRAIGHT; VOCAL TRACT LENGTH NORMALIZATION (VTLN);

ACOUSTICS; DISCRIMINANT ANALYSIS; FEATURE EXTRACTION; IMAGE RETRIEVAL; SPEECH ANALYSIS; SPEECH PROCESSING; TRANSCRIPTION;

CONTINUOUS SPEECH RECOGNITION;

EID: 59849090295 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2008.916519 Document Type: Article

Times cited : (39)

References (41)

1
- 0033693063
- Conversational speech recognition using acoustic and articulatory input
- K. Kirchhoff, G. A. Fink, and G. Sagerer, "Conversational speech recognition using acoustic and articulatory input," in Proc. IEEE ICASSP, 2000, pp. 1435-1438.
- (2000) Proc. IEEE ICASSP , pp. 1435-1438
- Kirchhoff, K.¹ Fink, G.A.² Sagerer, G.³

2
- 85009097225
- On using MLP features for LVCSR
- Q. Zhu, B. Chen, N. Morgan, and A. Stolcke, "On using MLP features for LVCSR," in Proc. Eurospeech, 2004, pp. 921-924.
- (2004) Proc. Eurospeech , pp. 921-924
- Zhu, Q.¹ Chen, B.² Morgan, N.³ Stolcke, A.⁴

3
- 34250015828
- Using multiple acoustic feature sets for speech recognition
- A. Zolnay, D. Kocharov, R. Schlüter, and H. Ney, "Using multiple acoustic feature sets for speech recognition," Speech Commun., vol. 49, pp. 514-525, 2007.
- (2007) Speech Commun , vol.49 , pp. 514-525
- Zolnay, A.¹ Kocharov, D.² Schlüter, R.³ Ney, H.⁴

4
- 34547539413
- R. Schlüter, I. Bezrukov, H. Wagner, and H. Ney, Gammatone features and feature combination for large vocabulary speech recognition, in Proc. IEEE ICASSP, 2007, pp. IV-649-IV-652.
- R. Schlüter, I. Bezrukov, H. Wagner, and H. Ney, "Gammatone features and feature combination for large vocabulary speech recognition," in Proc. IEEE ICASSP, 2007, pp. IV-649-IV-652.

5
- 85119697883
- iROVER: Improving system combination with classification
- D. Hillard, B. Hoffmeister, M. Ostendorf, R. Schlüter, and H. Ney, "iROVER: Improving system combination with classification," in Proc. NAACL-HLT Companion Volume Short Papers, 2007, pp. 65-68.
- (2007) Proc. NAACL-HLT Companion Volume Short Papers , pp. 65-68
- Hillard, D.¹ Hoffmeister, B.² Ostendorf, M.³ Schlüter, R.⁴ Ney, H.⁵

6
- 85080018809
- J. Cohen, T. Kamm, and A. Andreou, Vocal tract normalization in speech recognition: Compensating for systematic speaker variability, J. Acoust. Soc. Amer., 97, no. 5, pt. 2, pp. 3246-3247, 1995.
- J. Cohen, T. Kamm, and A. Andreou, "Vocal tract normalization in speech recognition: Compensating for systematic speaker variability," J. Acoust. Soc. Amer., vol. 97, no. 5, pt. 2, pp. 3246-3247, 1995.

7
- 0029747183
- Speaker normalization using efficient frequency warping procedures
- L. Lee and R. Rose, "Speaker normalization using efficient frequency warping procedures," Proc. IEEE ICASSP, pp. 353-356, 1996.
- (1996) Proc. IEEE ICASSP , pp. 353-356
- Lee, L.¹ Rose, R.²

8
- 0034847002
- The 1998 HTK system for transcription of conversational telephone speech
- T. Hain, P. Woodland, T. Niesler, and E. Whittaker, "The 1998 HTK system for transcription of conversational telephone speech," in Proc. IEEE ICASSP, 1999, pp. 57-60.
- (1999) Proc. IEEE ICASSP , pp. 57-60
- Hain, T.¹ Woodland, P.² Niesler, T.³ Whittaker, E.⁴

9
- 0036753897
- Speaker adaptive modeling by vocal tract normalization
- Sep
- L. Welling, H. Ney, and S. Kanthak, "Speaker adaptive modeling by vocal tract normalization," IEEE Trans. Speech Audio Process., vol. 10, no. 6, pp. 415-426, Sep. 2002.
- (2002) IEEE Trans. Speech Audio Process , vol.10 , Issue.6 , pp. 415-426
- Welling, L.¹ Ney, H.² Kanthak, S.³

10
- 0029764708
- Speaker normalization on conversational telephone speech
- S. Wegmann, D. McAllaster, J. Orloff, and B. Peskin, "Speaker normalization on conversational telephone speech," in Proc. IEEE ICASSP, 1996, pp. 339-341.
- (1996) Proc. IEEE ICASSP , pp. 339-341
- Wegmann, S.¹ McAllaster, D.² Orloff, J.³ Peskin, B.⁴

11
- 0029725604
- A parametric approach to vocal tract length normalization
- E. Eide and H. Gish, "A parametric approach to vocal tract length normalization," in Proc. IEEE ICASSP, 1996, pp. 346-348.
- (1996) Proc. IEEE ICASSP , pp. 346-348
- Eide, E.¹ Gish, H.²

12
- 0032673049
- Restructuring speech representations using pitch adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: Possible role of repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using pitch adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: Possible role of repetitive structure in sounds," Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999.
- (1999) Speech Commun , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² de Cheveigné, A.³

13
- 0003927842
- Upper Saddle River, NJ: Prentice-Hall
- T. F. Quatieri, Discrete Time Speech Signal Processing: Principles and Practice. Upper Saddle River, NJ: Prentice-Hall, 2001.
- (2001) Discrete Time Speech Signal Processing: Principles and Practice
- Quatieri, T.F.¹

14
- 0034841228
- Perceptual harmonic cepstral coefficients for speech recognition in noisy environments
- L. Gu and K. Rose, "Perceptual harmonic cepstral coefficients for speech recognition in noisy environments," in Proc. IEEE ICASSP, 2001, pp. 125-128.
- (2001) Proc. IEEE ICASSP , pp. 125-128
- Gu, L.¹ Rose, K.²

15
- 0347968155
- Pitch adaptive windows for improved excitation coding in low-rate CELP coders
- Nov
- A. V. Rao, S. Ahmadi, J. Linden, A. Gersho, V. Cuperman, and R. Heidari, "Pitch adaptive windows for improved excitation coding in low-rate CELP coders," IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 648-659, Nov. 2003.
- (2003) IEEE Trans. Speech Audio Process , vol.11 , Issue.6 , pp. 648-659
- Rao, A.V.¹ Ahmadi, S.² Linden, J.³ Gersho, A.⁴ Cuperman, V.⁵ Heidari, R.⁶

16
- 85135241246
- Comparison of MFCC and pitch synchronous AM, FM parameters for speaker identification
- H. Ezzaidi and J. Rouat, "Comparison of MFCC and pitch synchronous AM, FM parameters for speaker identification," in Proc. ICSLP, 2000, vol. 8, pp. 318-321.
- (2000) Proc. ICSLP , vol.8 , pp. 318-321
- Ezzaidi, H.¹ Rouat, J.²

17
- 4544386226
- A pitch synchronous feature extraction method for speaker recognition
- pp. I-405-I-408
- S. Kim, T. Eriksson, H. Kang, and D. H. Youn, "A pitch synchronous feature extraction method for speaker recognition," in Proc. IEEE ICASSP, 2004, pp. I-405-I-408.
- (2004) Proc. IEEE ICASSP , pp. I-405-I-408
- Kim, S.¹ Eriksson, T.² Kang, H.³ Youn, D.H.⁴

18
- 85143191773
- R. Zilca, J. Navratil, and G. N. Ramaswamy, Depitch and the role of fundamental frequency in speaker recognition, in Proc. IEEE ICASSP, 2003, pp. II-81-II-84.
- R. Zilca, J. Navratil, and G. N. Ramaswamy, "Depitch and the role of fundamental frequency in speaker recognition," in Proc. IEEE ICASSP, 2003, pp. II-81-II-84.

19
- 7544241146
- Dynamic Bayesian network based speech recognition with pitch and energy as auxiliary variables
- T. A. Stephenson, J. Escofet, M. Magimai-Doss, and H. Bourlard, "Dynamic Bayesian network based speech recognition with pitch and energy as auxiliary variables," in Proc. IEEE Workshop Neural Netw. Signal Process., 2002, pp. 637-646.
- (2002) Proc. IEEE Workshop Neural Netw. Signal Process , pp. 637-646
- Stephenson, T.A.¹ Escofet, J.² Magimai-Doss, M.³ Bourlard, H.⁴

20
- 84863687026
- On the use of phase information for speech recognition
- Online, Available
- B. Bozkurt and L. Couvreur, "On the use of phase information for speech recognition," in Proc. EUSIPCO, 2005 [Online]. Available: http://www.eurasip.org/Proeeedings/Eusipco/Eusipco2005/de-fevent/papers/cr 1390.pdf
- (2005) Proc. EUSIPCO
- Bozkurt, B.¹ Couvreur, L.²

21
- 85009135069
- Improving the representation of time structure in front-ends for automatic speech recognition
- W. J. Holmes, "Improving the representation of time structure in front-ends for automatic speech recognition," in Proc. ICSLP, 2000, vol. 2, pp. 1073-1076.
- (2000) Proc. ICSLP , vol.2 , pp. 1073-1076
- Holmes, W.J.¹

22
- 85009118262
- A noise-robust feature extraction method based on pitch-synchronous ZCPA for ASR
- M. Ghulam, T. Fukuda, I. Horikawa, and T. Nitta, "A noise-robust feature extraction method based on pitch-synchronous ZCPA for ASR," in Proc. ICSLP, 2004, vol. 1, pp. 133-136.
- (2004) Proc. ICSLP , vol.1 , pp. 133-136
- Ghulam, M.¹ Fukuda, T.² Horikawa, I.³ Nitta, T.⁴

23
- 33645781551
- Evaluation of a speech recognition/generation method based on HMM and STRAIGHT
- T. Irino, Y. Minami, T. Nakatani, M. Tsuzaki, and H. Tagawa, "Evaluation of a speech recognition/generation method based on HMM and STRAIGHT," in Proc. ICSLP, 2002, pp. 2545-2548.
- (2002) Proc. ICSLP , pp. 2545-2548
- Irino, T.¹ Minami, Y.² Nakatani, T.³ Tsuzaki, M.⁴ Tagawa, H.⁵

24
- 0001455934
- A robust algorithm for pitch tracking (RAPT)
- W. B. Kleijn and K. K. Paliwal, Eds., New York: Elsevier
- D. Talkin, "A robust algorithm for pitch tracking (RAPT)," in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds. New York: Elsevier, 1995, pp. 495-518.
- (1995) Speech Coding and Synthesis , pp. 495-518

25
- 85009141768
- Combination of speech features using smoothed heteroscedastic linear discriminant analysis
- L. Burget, "Combination of speech features using smoothed heteroscedastic linear discriminant analysis," in Proc. ICSLP, 2004, pp. 2549-2552.
- (2004) Proc. ICSLP , pp. 2549-2552
- Burget, L.¹

26
- 0030638031
- A post-processing system to yield reduced word error rates: Recognition Output Voting Error Reduction (ROVER)
- J. G. Fiscus, "A post-processing system to yield reduced word error rates: Recognition Output Voting Error Reduction (ROVER)," in Proc. IEEE Workshop ASRU, 1997, pp. 347-354.
- (1997) Proc. IEEE Workshop ASRU , pp. 347-354
- Fiscus, J.G.¹

27
- 0038133932
- A statistical approach to metrics for word and syllable recognition
- M. J. Hunt, "A statistical approach to metrics for word and syllable recognition," J. Acoust. Soc. Amer., vol. 66, pp. S535-536, 1979.
- (1979) J. Acoust. Soc. Amer , vol.66
- Hunt, M.J.¹

28
- 0023867341
- Speaker dependent and independent speech recognition experiments with an auditory model
- M. J. Hunt and C. Lefebvre, "Speaker dependent and independent speech recognition experiments with an auditory model," in Proc. IEEE ICASSP, 1988, vol. 1, pp. 215-218.
- (1988) Proc. IEEE ICASSP , vol.1 , pp. 215-218
- Hunt, M.J.¹ Lefebvre, C.²

29
- 0032289099
- Heteroscedastic discriminant analysis and reduced rank HMMs for improved recognition
- N. Kumar and A. G. Andreou, "Heteroscedastic discriminant analysis and reduced rank HMMs for improved recognition," Speech Commun., vol. 26, pp. 283-297, 1998.
- (1998) Speech Commun , vol.26 , pp. 283-297
- Kumar, N.¹ Andreou, A.G.²

30
- 33947643073
- Complementarity of speech recognition systems and system combination
- Ph.D. dissertation, Brno Univ. of Technol, Brno, Czech Republic
- L. Burget, "Complementarity of speech recognition systems and system combination," Ph.D. dissertation, Brno Univ. of Technol., Brno, Czech Republic, 2004.
- (2004)
- Burget, L.¹

31
- 0032638856
- Semi-tied covariance matrices for hidden Markov models
- May
- M. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 272-281, May 1999.
- (1999) IEEE Trans. Speech Audio Process , vol.7 , Issue.3 , pp. 272-281
- Gales, M.¹

32
- 33745199182
- Applying vocal tract length normalization to meeting recordings
- G. Garau, S. Renals, and T. Hain, "Applying vocal tract length normalization to meeting recordings," in Proc. Eurospeech, 2005, pp. 265-268.
- (2005) Proc. Eurospeech , pp. 265-268
- Garau, G.¹ Renals, S.² Hain, T.³

33
- 85079978469
- Feature combination using linear discriminant analysis and its pitfalls
- R. Schlüter, A. Zolnay, and H. Ney, "Feature combination using linear discriminant analysis and its pitfalls," in Proc. Interspeech, 2006, pp. 1077-1081.
- (2006) Proc. Interspeech , pp. 1077-1081
- Schlüter, R.¹ Zolnay, A.² Ney, H.³

34
- 0003822743
- Cambridge, U.K, Cambridge Univ. Press, Dec
- S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK book (v3.4). Cambridge, U.K.: Cambridge Univ. Press, Dec. 2006.
- (2006) The HTK book (v3.4)
- Young, S.¹ Evermann, G.² Gales, M.³ Hain, T.⁴ Kershaw, D.⁵ Liu, X.⁶ Moore, G.⁷ Odell, J.⁸ Ollason, D.⁹ Povey, D.¹⁰ Valtchev, V.¹¹ Woodland, P.¹²

35
- 33745536025
- The 2005 AMI system for the transcription of speech in meetings
- Proc. MLMI'05 Machine Learning for Multimodal Interaction, Springer
- T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiat, M. Lincoln, I. McCowan, D. Moore, V. Wan, R. Ordelman, and S. Renals, "The 2005 AMI system for the transcription of speech in meetings," in Proc. MLMI'05 Machine Learning for Multimodal Interaction, 2006, no. 3869, pp. 450-462, ser. Lecture Notes in Computer Science. Springer.
- (2006) ser. Lecture Notes in Computer Science , vol.3869 , pp. 450-462
- Hain, T.¹ Burget, L.² Dines, J.³ Garau, G.⁴ Karafiat, M.⁵ Lincoln, M.⁶ McCowan, I.⁷ Moore, D.⁸ Wan, V.⁹ Ordelman, R.¹⁰ Renals, S.¹¹

36
- 85090317334
- A pitch extraction reference database
- F. Plante, G. F. Meyer, and W. A. Ainsworth, "A pitch extraction reference database," in Proc. Eurospeech, 1995, pp. 837-840.
- (1995) Proc. Eurospeech , pp. 837-840
- Plante, F.¹ Meyer, G.F.² Ainsworth, W.A.³

37
- 0028996854
- WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition
- Detroit, MI
- T. Robinson, J. Fransen, D. Pye, J. Foote, and S. Renals, "WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition," in Proc. IEEE ICASSP, Detroit, MI, 1995, pp. 81-84.
- (1995) Proc. IEEE ICASSP , pp. 81-84
- Robinson, T.¹ Fransen, J.² Pye, D.³ Foote, J.⁴ Renals, S.⁵

38
- 33745531876
- An investigation into transcription of conference room meetings
- T. Hain, J. Dines, G. Garau, M. Karafiat, D. Moore, V. Wan, R. Ordelman, I. McCowan, J. Vepa, and S. Renals, "An investigation into transcription of conference room meetings," in Proc. Eurospeech, 2005, pp. 1661-1664.
- (2005) Proc. Eurospeech , pp. 1661-1664
- Hain, T.¹ Dines, J.² Garau, G.³ Karafiat, M.⁴ Moore, D.⁵ Wan, V.⁶ Ordelman, R.⁷ McCowan, I.⁸ Vepa, J.⁹ Renals, S.¹⁰

39
- 0017097478
- A comparative performance study of several pitch detection algorithms
- Oct
- L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, and C. A. McGonegal, "A comparative performance study of several pitch detection algorithms," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24, no. 5, pp. 399-418, Oct. 1976.
- (1976) IEEE Trans. Acoust., Speech, Signal Process , vol.ASSP-24 , Issue.5 , pp. 399-418
- Rabiner, L.R.¹ Cheng, M.J.² Rosenberg, A.E.³ McGonegal, C.A.⁴

40
- 34547548247
- T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiat, M. Lincoln, J. Vepa, and V. Wan, The AMI system for the transcription of speech in meetings, in Proc. IEEE ICASSP, 2007, pp. IV-357-IV-360.
- T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiat, M. Lincoln, J. Vepa, and V. Wan, "The AMI system for the transcription of speech in meetings," in Proc. IEEE ICASSP, 2007, pp. IV-357-IV-360.

41
- 77249114287
- The Rich Transcription 2006 spring meeting recognition evaluation
- Proc. MLMI'06 Machine Learning for Multimodal Interaction, Springer
- J. G. Fiscus, J. Ajot, M. Michel, and J. S. Garofolo, "The Rich Transcription 2006 spring meeting recognition evaluation," in Proc. MLMI'06 Machine Learning for Multimodal Interaction, 2006, no. 4299, pp. 309-322, ser. Lecture Notes in Computer Science. Springer.
- (2006) ser. Lecture Notes in Computer Science , vol.4299 , pp. 309-322
- Fiscus, J.G.¹ Ajot, J.² Michel, M.³ Garofolo, J.S.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.