Volume 16, Issue 3, 2008, Pages 508-518

Combining spectral representations for large-vocabulary continuous speech recognition

Author keywords

Feature combination; Heteroscedastic linear discriminant analysis (HLDA); Large vocabulary continuous speech recognition (LVCSR); Pitch synchronous; ROVER; STRAIGHT; Vocal tract length normalization (VTLN)

Indexed keywords

FEATURE COMBINATION; HETEROSCEDASTIC LINEAR DISCRIMINANT ANALYSIS (HLDA); LARGE-VOCABULARY CONTINUOUS SPEECH RECOGNITION (LVCSR); PITCH-SYNCHRONOUS; ROVER; STRAIGHT; VOCAL TRACT LENGTH NORMALIZATION (VTLN)

EID: 59849090295; PISSN: 1558-7916; EISSN: None; Source Type: Journal
DOI: 10.1109/TASL.2008.916519     Document Type: Article
Times cited: 39

References (41)
  • 1
    • 0033693063
    • K. Kirchhoff, G. A. Fink, and G. Sagerer, "Conversational speech recognition using acoustic and articulatory input," in Proc. IEEE ICASSP, 2000, pp. 1435-1438.
  • 3
    • 34250015828
    • A. Zolnay, D. Kocharov, R. Schlüter, and H. Ney, "Using multiple acoustic feature sets for speech recognition," Speech Commun., vol. 49, pp. 514-525, 2007.
  • 4
    • 34547539413
    • R. Schlüter, I. Bezrukov, H. Wagner, and H. Ney, "Gammatone features and feature combination for large vocabulary speech recognition," in Proc. IEEE ICASSP, 2007, pp. IV-649-IV-652.
  • 6
    • 85080018809
    • J. Cohen, T. Kamm, and A. Andreou, "Vocal tract normalization in speech recognition: Compensating for systematic speaker variability," J. Acoust. Soc. Amer., vol. 97, no. 5, pt. 2, pp. 3246-3247, 1995.
  • 7
    • 0029747183
    • L. Lee and R. Rose, "Speaker normalization using efficient frequency warping procedures," in Proc. IEEE ICASSP, 1996, pp. 353-356.
  • 8
    • 0034847002
    • T. Hain, P. Woodland, T. Niesler, and E. Whittaker, "The 1998 HTK system for transcription of conversational telephone speech," in Proc. IEEE ICASSP, 1999, pp. 57-60.
  • 9
    • 0036753897
    • L. Welling, H. Ney, and S. Kanthak, "Speaker adaptive modeling by vocal tract normalization," IEEE Trans. Speech Audio Process., vol. 10, no. 6, pp. 415-426, Sep. 2002.
  • 10
  • 11
    • 0029725604
    • E. Eide and H. Gish, "A parametric approach to vocal tract length normalization," in Proc. IEEE ICASSP, 1996, pp. 346-348.
  • 12
    • 0032673049
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using pitch adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: Possible role of repetitive structure in sounds," Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999.
  • 14
    • 0034841228
    • L. Gu and K. Rose, "Perceptual harmonic cepstral coefficients for speech recognition in noisy environments," in Proc. IEEE ICASSP, 2001, pp. 125-128.
  • 16
    • 85135241246
    • H. Ezzaidi and J. Rouat, "Comparison of MFCC and pitch synchronous AM, FM parameters for speaker identification," in Proc. ICSLP, 2000, vol. 8, pp. 318-321.
  • 17
    • 4544386226
    • S. Kim, T. Eriksson, H. Kang, and D. H. Youn, "A pitch synchronous feature extraction method for speaker recognition," in Proc. IEEE ICASSP, 2004, pp. I-405-I-408.
  • 18
    • 85143191773
    • R. Zilca, J. Navratil, and G. N. Ramaswamy, "Depitch and the role of fundamental frequency in speaker recognition," in Proc. IEEE ICASSP, 2003, pp. II-81-II-84.
  • 20
    • 84863687026
    • B. Bozkurt and L. Couvreur, "On the use of phase information for speech recognition," in Proc. EUSIPCO, 2005. [Online]. Available: http://www.eurasip.org/Proceedings/Eusipco/Eusipco2005/defevent/papers/cr1390.pdf
  • 21
    • 85009135069
    • W. J. Holmes, "Improving the representation of time structure in front-ends for automatic speech recognition," in Proc. ICSLP, 2000, vol. 2, pp. 1073-1076.
  • 22
    • 85009118262
    • M. Ghulam, T. Fukuda, I. Horikawa, and T. Nitta, "A noise-robust feature extraction method based on pitch-synchronous ZCPA for ASR," in Proc. ICSLP, 2004, vol. 1, pp. 133-136.
  • 23
    • 33645781551
    • T. Irino, Y. Minami, T. Nakatani, M. Tsuzaki, and H. Tagawa, "Evaluation of a speech recognition/generation method based on HMM and STRAIGHT," in Proc. ICSLP, 2002, pp. 2545-2548.
  • 24
    • 0001455934
    • D. Talkin, "A robust algorithm for pitch tracking (RAPT)," in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds. New York: Elsevier, 1995, pp. 495-518.
  • 25
    • 85009141768
    • L. Burget, "Combination of speech features using smoothed heteroscedastic linear discriminant analysis," in Proc. ICSLP, 2004, pp. 2549-2552.
  • 26
    • 0030638031
    • J. G. Fiscus, "A post-processing system to yield reduced word error rates: Recognition Output Voting Error Reduction (ROVER)," in Proc. IEEE Workshop ASRU, 1997, pp. 347-354.
  • 27
    • 0038133932
    • M. J. Hunt, "A statistical approach to metrics for word and syllable recognition," J. Acoust. Soc. Amer., vol. 66, pp. S535-S536, 1979.
  • 28
    • 0023867341
    • M. J. Hunt and C. Lefebvre, "Speaker dependent and independent speech recognition experiments with an auditory model," in Proc. IEEE ICASSP, 1988, vol. 1, pp. 215-218.
  • 29
    • 0032289099
    • N. Kumar and A. G. Andreou, "Heteroscedastic discriminant analysis and reduced rank HMMs for improved recognition," Speech Commun., vol. 26, pp. 283-297, 1998.
  • 30
    • 33947643073
    • L. Burget, "Complementarity of speech recognition systems and system combination," Ph.D. dissertation, Brno Univ. of Technol., Brno, Czech Republic, 2004.
  • 31
    • 0032638856
    • M. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 272-281, May 1999.
  • 32
    • 33745199182
    • G. Garau, S. Renals, and T. Hain, "Applying vocal tract length normalization to meeting recordings," in Proc. Eurospeech, 2005, pp. 265-268.
  • 33
    • 85079978469
    • R. Schlüter, A. Zolnay, and H. Ney, "Feature combination using linear discriminant analysis and its pitfalls," in Proc. Interspeech, 2006, pp. 1077-1081.
  • 37
    • 0028996854
    • T. Robinson, J. Fransen, D. Pye, J. Foote, and S. Renals, "WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition," in Proc. IEEE ICASSP, Detroit, MI, 1995, pp. 81-84.
  • 40
    • 34547548247
    • T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiat, M. Lincoln, J. Vepa, and V. Wan, "The AMI system for the transcription of speech in meetings," in Proc. IEEE ICASSP, 2007, pp. IV-357-IV-360.
  • 41
    • 77249114287
    • J. G. Fiscus, J. Ajot, M. Michel, and J. S. Garofolo, "The Rich Transcription 2006 spring meeting recognition evaluation," in Proc. MLMI'06 Machine Learning for Multimodal Interaction, ser. Lecture Notes in Computer Science, vol. 4299. Springer, 2006, pp. 309-322.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.