SCOPUS information retrieval platform

Volume 49, Issue 10-11, 2007, Pages 763-786

Automatic speech recognition and speech variability: A review

(13) Benzeghiba, M.ᵃ De Mori, R.ᵃ Deroo, O.ᵃ Dupont, S.ᵃ Erbes, T.ᵃ Jouvet, D.ᵃ Fissore, L.ᵃ Laface, P.ᵃ Mertins, A.ᵃ Ris, C.ᵃ Rose, R.ᵃ Tyagi, V.ᵃ Wellekens, C.ᵃ

ᵃ Multitel (Belgium)

Author keywords

Speech analysis; Speech intrinsic variations; Speech modeling; Speech recognition

Indexed keywords

ACOUSTIC NOISE; LEARNING SYSTEMS; POPULATION DYNAMICS; SEMANTICS; SPEECH ANALYSIS; SPEECH INTRINSIC VARIATIONS; SPEECH MODELING; SPOKEN LANGUAGE SYSTEMS; SPEECH RECOGNITION

EID: 34547941599 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2007.02.006 Document Type: Article

Times cited : (440)

References (328)

1
- 85009110440
- Aalburg, S., Hoege, H., 2004. Foreign-accented speaker-independent speech recognition. In: Proceedings of ICSLP, Jeju Island, Korea, pp. 1465-1468.

2
- 4544386378
- Abdel-Haleem, Y.H., Renals, S., Lawrence, N.D., 2004. Acoustic space dimensionality selection and combination using the maximum entropy principle. In: Proceedings of ICASSP, Montreal, Canada, pp. 637-640.

3
- 0029762799
- Abrash, V., Sankar, A., Franco, H., Cohen, M., 1996. Acoustic adaptation using nonlinear transformations of HMM parameters. In: Proceedings of ICASSP, Atlanta, GA, pp. 729-732.

4
- 34547939955
- Achan, K., Roweis, S., Hertzmann, A., Frey, B., 2004. A segmental HMM for speech waveforms. UTML Technical Report 2004-001, University of Toronto, Toronto, Canada.

5
- 19944379822
- Investigating syllabic structures and their variation in spontaneous French
- Adda-Decker M., Boula de Mareuil P., Adda G., and Lamel L. Investigating syllabic structures and their variation in spontaneous French. Speech Communication 46 2 (2005) 119-139
- (2005) Speech Communication , vol.46 , Issue.2 , pp. 119-139
- Adda-Decker, M.¹ Boula de Mareuil, P.² Adda, G.³ Lamel, L.⁴

6
- 0033354260
- Pronunciation variants across system configuration, language and speaking style
- Adda-Decker M., and Lamel L. Pronunciation variants across system configuration, language and speaking style. Speech Communication 29 2 (1999) 83-98
- (1999) Speech Communication , vol.29 , Issue.2 , pp. 83-98
- Adda-Decker, M.¹ Lamel, L.²

7
- 0023831656
- A new statistical approach for the automatic segmentation of continuous speech signals
- Andre-Obrecht R. A new statistical approach for the automatic segmentation of continuous speech signals. IEEE Transactions on Acoustics, Speech and Signal Processing 36 1 (1988) 29-40
- (1988) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.36 , Issue.1 , pp. 29-40
- Andre-Obrecht, R.¹

8
- 85009145332
- Prosody-based automatic detection of annoyance and frustration in human-computer
- Denver, Colorado
- Ang J., Dhillon R., Krupski A., Shriberg E., and Stolcke A. Prosody-based automatic detection of annoyance and frustration in human-computer. Proceedings of ICSLP (2002), Denver, Colorado 2037-2040
- (2002) Proceedings of ICSLP , pp. 2037-2040
- Ang, J.¹ Dhillon, R.² Krupski, A.³ Shriberg, E.⁴ Stolcke, A.⁵

9
- 0030165438
- Language accent classification in American English
- Arslan L.M., and Hansen J.H.L. Language accent classification in American English. Speech Communication 18 4 (1996) 353-367
- (1996) Speech Communication , vol.18 , Issue.4 , pp. 353-367
- Arslan, L.M.¹ Hansen, J.H.L.²

10
- 0020602364
- Atal, B., 1983. Efficient coding of LPC parameters by temporal decomposition. In: Proceedings of ICASSP, Boston, USA, pp. 81-84.

11
- 0016962193
- A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition
- Atal B., and Rabiner L. A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 24 3 (1976) 201-212
- (1976) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.24 , Issue.3 , pp. 201-212
- Atal, B.¹ Rabiner, L.²

12
- 84863773378
- Frequency domain linear prediction for temporal features
- St. Thomas, US Virgin Islands, USA
- Athineos M., and Ellis D. Frequency domain linear prediction for temporal features. Proceedings of ASRU (2003), St. Thomas, US Virgin Islands, USA 261-266
- (2003) Proceedings of ASRU , pp. 261-266
- Athineos, M.¹ Ellis, D.²

13
- 0035202534
- Taking the hit: leaving some lexical competition to be resolved post-lexically
- Bard E.G., Sotillo C., Kelly M.L., and Aylett M.P. Taking the hit: leaving some lexical competition to be resolved post-lexically. Language and Cognitive Processes 15 5-6 (2001) 731-737
- (2001) Language and Cognitive Processes , vol.15 , Issue.5-6 , pp. 731-737
- Bard, E.G.¹ Sotillo, C.² Kelly, M.L.³ Aylett, M.P.⁴

14
- 33745201418
- Barrault, L., de Mori, R., Gemello, R., Mana, F., Matrouf, D., 2005. Variability of automatic speech recognition systems using different features. In: Proceedings of Interspeech, Lisboa, Portugal, pp. 221-224.

15
- 34547942942
- Bartkova, K., 2003. Generating proper name pronunciation variants for automatic speech recognition. In: Proceedings of ICPhS, Barcelona, Spain.

16
- 34547945689
- Bartkova, K., Jouvet, D., 1999. Language based phone model combination for ASR adaptation to foreign accent. In: Proceedings of ICPhS, San Francisco, USA, pp. 1725-1728.

17
- 34547933367
- Bartkova, K., Jouvet, D., 2004. Multiple models for improved speech recognition for non-native speakers. In: Proceedings of SPECOM, Saint Petersburg, Russia.

18
- 34547931925
- Beattie, V., Edmondson, S., Miller, D., Patel, Y., Talvola, G., 1995. An integrated multidialect speech recognition system with optional speaker adaptation. In: Proceedings of Eurospeech, Madrid, Spain, pp. 1123-1126.

19
- 34547960250
- Beauford, J.Q., 1999. Compensating for variation in speaking rate. PhD thesis, Electrical Engineering, University of Pittsburgh.

20
- 0037324538
- Effects of disfluencies, predictability, and utterance position on word form variation in English conversation
- Bell A., Jurafsky D., Fosler-Lussier E., Girand C., Gregory M., and Gildea D. Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. The Journal of the Acoustical Society of America 113 2 (2003) 1001-1024
- (2003) The Journal of the Acoustical Society of America , vol.113 , Issue.2 , pp. 1001-1024
- Bell, A.¹ Jurafsky, D.² Fosler-Lussier, E.³ Girand, C.⁴ Gregory, M.⁵ Gildea, D.⁶

21
- 70249086510
- Benitez, C., Burget, L., Chen, B., Dupont, S., Garudadri, H., Hermansky, H., Jain, P., Kajarekar, S., Sivadas, S., 2001. Robust ASR front-end using spectral based and discriminant features: experiments on the Aurora task. In: Proceedings of Eurospeech, Aalborg, Denmark, pp. 429-432.

22
- 0026402806
- Adaptation to a speaker's voice in a speech recognition system based on synthetic phoneme references
- Blomberg M. Adaptation to a speaker's voice in a speech recognition system based on synthetic phoneme references. Speech Communication 10 5-6 (1991) 453-461
- (1991) Speech Communication , vol.10 , Issue.5-6 , pp. 453-461
- Blomberg, M.¹

23
- 34547939007
- Bonaventura, P., Gallochio, F., Mari, J., Micca, G., 1998. Speech recognition methods for non-native pronunciation variants. In: Proceedings ISCA Workshop on modelling pronunciation variations for automatic speech recognition, Rolduc, Netherlands, pp. 17-23.

24
- 34547949602
- Bonaventura, P., Gallochio, F., Micca, G., 1997. Multilingual speech recognition for flexible vocabularies. In: Proceedings of Eurospeech, Rhodes, Greece, pp. 355-358.

25
- 85032661370
- Bou-Ghazale, S.E., Hansen, J.H.L., 1994. Duration and spectral based stress token generation for HMM speech recognition under stress. In: Proceedings of ICASSP, Adelaide, Australia, pp. 413-416.

26
- 34547953239
- Bou-Ghazale, S.E., Hansen, J.H.L., 1995. Improving recognition and synthesis of stressed speech via feature perturbation in a source generator framework. In: ECSA-NATO Proceedings Speech Under Stress Workshop, Lisbon, Portugal, pp. 45-48.

27
- 0030643240
- Bourlard, H., Dupont, S., 1997. Sub-band based speech recognition. In: Proceedings of ICASSP, Munich, Germany, pp. 1251-1254.

28
- 84863687026
- Bozkurt, B., Couvreur, L., 2005. On the use of phase information for speech recognition. In: Proceedings of Eusipco, Antalya, Turkey.

29
- 17644365443
- Zeros of z-transform representation with application to source-filter separation in speech
- Bozkurt B., Doval B., d'Alessandro C., and Dutoit T. Zeros of z-transform representation with application to source-filter separation in speech. IEEE Signal Processing Letters 12 4 (2005) 344-347
- (2005) IEEE Signal Processing Letters , vol.12 , Issue.4 , pp. 344-347
- Bozkurt, B.¹ Doval, B.² d'Alessandro, C.³ Dutoit, T.⁴

30
- 33747701692
- Brugnara, F., De Mori, R., Giuliani, D., Omologo, M., 1992. A family of parallel Hidden Markov Models. In: Proceedings of ICASSP, vol. 1. pp. 377-380.

31
- 3042820894
- Automatic recognition of spontaneous speech for access to multilingual oral history archives
- Byrne W., Doermann D., Franz M., Gustman S., Hajic J., Oard D., Picheny M., Psutka J., Ramabhadran B., Soergel D., Ward T., and Wei-Jin Z. Automatic recognition of spontaneous speech for access to multilingual oral history archives. IEEE Transactions on Speech and Audio Processing 12 4 (2004) 420-435
- (2004) IEEE Transactions on Speech and Audio Processing , vol.12 , Issue.4 , pp. 420-435
- Byrne, W.¹ Doermann, D.² Franz, M.³ Gustman, S.⁴ Hajic, J.⁵ Oard, D.⁶ Picheny, M.⁷ Psutka, J.⁸ Ramabhadran, B.⁹ Soergel, D.¹⁰ Ward, T.¹¹ Wei-Jin, Z.¹²

32
- 0030376663
- Carey, M., Parris, E., Lloyd-Thomas, H., Bennett, S., 1996. Robust prosodic features for speaker identification. In: Proceedings of ICSLP, Philadelphia, Pennsylvania, USA, pp. 1800-1803.

33
- 33947656987
- Carlson, B., Clements, M., 1992. Speech recognition in noise using a projection-based likelihood measure for mixture density HMMs. In: Proceedings of ICASSP, San Francisco, CA, pp. 237-240.

34
- 34547936985
- Chase, L., 1997. Error-responsive feedback mechanisms for speech recognizers. PhD thesis, Carnegie Mellon University.

35
- 34547949195
- Chen, C.J., Gopinath, R.A., Monkowski, M.D., Picheny, M.A., Shen, K., 1997. New methods in continuous Mandarin speech recognition. In: Proceedings of Eurospeech, pp. 1543-1546.

36
- 0034842303
- Chen, C.J., Li, H., Shen, L., Fu, G., 2001. Recognize tone languages using pitch information on the main vowel of each syllable. In: Proceedings of ICASSP, vol. 1. pp. 61-64.

37
- 0023168987
- Chen, Y., 1987. Cepstral domain stress compensation for robust speech recognition. In: Proceedings of ICASSP, Dallas, TX, pp. 717-720.

38
- 0032627220
- Chesta, C., Laface, P., Ravera, F., 1999. Connected digit recognition using short and long duration models. In: Proceedings of ICASSP, vol. 2. pp. 557-560.

39
- 34547928199
- Chesta, C., Siohan, O., Lee, C.-H., 1999. Maximum a posteriori linear regression for Hidden Markov Model adaptation. In: Proceedings of Eurospeech, Budapest, Hungary, pp. 211-214.

40
- 0019680114
- Chollet, G.F., Astier, A.B.P., Rossi, M., 1981. Evaluating the performance of speech recognizers at the acoustic-phonetic level. In: Proceedings of ICASSP, Atlanta, USA, pp. 758-761.

41
- 79551471715
- Cincarek, T., Gruhn, R., Nakamura, S., 2004. Speech recognition for multiple non-native accent groups with speaker-group-dependent acoustic models. In: Proceedings of ICSLP, Jeju Island, Korea, pp. 1509-1512.

42
- 0026686048
- Entropy based algorithms for best basis selection
- Coifman R.R., and Wickerhauser M.V. Entropy based algorithms for best basis selection. IEEE Transactions on Information Theory 38 2 (1992) 713-718
- (1992) IEEE Transactions on Information Theory , vol.38 , Issue.2 , pp. 713-718
- Coifman, R.R.¹ Wickerhauser, M.V.²

43
- 33745190977
- Colibro, D., Fissore, L., Popovici, C., Vair, C., Laface, P., 2005. Learning pronunciation and formulation variants in continuous speech applications. In: Proceedings of ICASSP, Philadelphia, PA, pp. 1001-1004.

44
- 0037382510
- Describing the emotional states that are expressed in speech
- Cowie R., and Cornelius R.R. Describing the emotional states that are expressed in speech. Speech Communication Special Issue on Speech and Emotions 40 1-2 (2003) 5-32
- (2003) Speech Communication Special Issue on Speech and Emotions , vol.40 , Issue.1-2 , pp. 5-32
- Cowie, R.¹ Cornelius, R.R.²

45
- 0034140299
- Different aspects of expert pronunciation quality ratings and their relation to scores produced by speech recognition algorithms
- Cucchiarini C., Strik H., and Boves L. Different aspects of expert pronunciation quality ratings and their relation to scores produced by speech recognition algorithms. Speech Communication 30 2-3 (2000) 109-119
- (2000) Speech Communication , vol.30 , Issue.2-3 , pp. 109-119
- Cucchiarini, C.¹ Strik, H.² Boves, L.³

46
- 34547948254
- Dalsgaard, P., Andersen, O., Barry, W., 1998. Cross-language merged speech units and their descriptive phonetic correlates. In: Proceedings of ICSLP, Sydney, Australia, pp. 482-485.

47
- 85009136762
- D'Arcy, S.M., Wong, L.P., Russell, M.J., 2004. Recognition of read and spontaneous children's speech using two new corpora. In: Proceedings of ICSLP, Jeju Island, Korea.

48
- 34547935449
- Das, S., Lubensky, D., Wu, C., 1999. Towards robust speech recognition in the telephony network environment - cellular and landline conditions. In: Proceedings of Eurospeech, Budapest, Hungary, pp. 1959-1962.

49
- 0031644298
- Das, S., Nix, D., Picheny, M., 1998. Improvements in children speech recognition performance. In: Proceedings of ICASSP, vol. 1. Seattle, USA, pp. 433-436.

50
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Davis S.B., and Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing 28 (1980) 357-366
- (1980) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.28 , pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

51
- 4444238905
- Evaluation of formant-like features on an automatic vowel classification task
- de Wet F., Weber K., Boves L., Cranen B., Bengio S., and Bourlard H. Evaluation of formant-like features on an automatic vowel classification task. The Journal of the Acoustical Society of America 116 3 (2004) 1781-1792
- (2004) The Journal of the Acoustical Society of America , vol.116 , Issue.3 , pp. 1781-1792
- de Wet, F.¹ Weber, K.² Boves, L.³ Cranen, B.⁴ Bengio, S.⁵ Bourlard, H.⁶

52
- 0343867184
- Recognition of syllables in a tone language
- Demeechai T., and Mäkeläinen K. Recognition of syllables in a tone language. Speech Communication 33 3 (2001) 241-254
- (2001) Speech Communication , vol.33 , Issue.3 , pp. 241-254
- Demeechai, T.¹ Mäkeläinen, K.²

53
- 84988831195
- Demuynck, K., Garcia, O., Van Compernolle, D., 2004. Synthesizing speech from speech recognition parameters. In: Proceedings of ICSLP'04, Jeju Island, Korea.

54
- 85009171168
- Deng, Y., Mahajan, M., Acero, A., 2003. Estimating speech recognition error rate without acoustic test data. In: Proceedings of Eurospeech, Geneva, Switzerland, pp. 929-932.

55
- 34547946960
- Di Benedetto, M.-G., Liénard, J.-S., 1992. Extrinsic normalization of vowel formant values based on cardinal vowels mapping. In: Proceedings of ICSLP, Alberta, Canada, pp. 579-582.

56
- 34547962914
- Disfluency in spontaneous speech (diss'05). 2005. Aix-en-Provence, France.

57
- 62249192231
- Doddington, G., 2003. Word alignment issues in ASR scoring. In: Proceedings of ASRU, US Virgin Islands, pp. 630-633.

58
- 34547956910
- Draxler, C., Burger, S., 1997. Identification of regional variants of High German from digit sequences in German telephone speech. In: Proceedings of Eurospeech, pp. 747-750.

59
- 0003472470
- Wiley, New York
- Duda R.O., and Hart P.E. Pattern Classification and Scene Analysis (1973), Wiley, New York
- (1973) Pattern Classification and Scene Analysis
- Duda, R.O.¹ Hart, P.E.²

60
- 34547959626
- Dupont, S., Ris, C., Couvreur, L., Boite, J.-M., 2005. A study of implicit and explicit modeling of coarticulation and pronunciation variation. In: Proceedings of Interspeech, Lisboa, Portugal, pp. 1353-1356.

61
- 33846194657
- Dupont, S., Ris, C., Deroo, O., Poitoux, S., 2005. Feature extraction and acoustic modeling: an approach for improved generalization across languages and accents. In: Proceedings of ASRU, San Juan, Puerto Rico, pp. 29-34.

62
- 84871614151
- Eide, E., 2001. Distinctive features for use in automatic speech recognition. In: Proceedings of Eurospeech, Aalborg, Denmark, pp. 1613-1616.

63
- 0029725604
- Eide, E., Gish, H., 1996. A parametric approach to vocal tract length normalization. In: Proceedings of ICASSP, Atlanta, GA, pp. 346-348.

64
- 0029725604
- Eide, E., Gish, H., 1996. A parametric approach to vocal tract length normalization. In: Proceedings of ICASSP, Atlanta, GA, pp. 346-349.

65
- 0028996886
- Eide, E., Gish, H., Jeanrenaud, P., Mielke, A., 1995. Understanding and improving speech recognition performance through the use of diagnostic tools. In: Proceedings of ICASSP, Detroit, Michigan, pp. 221-224.

66
- 0035426930
- Xenophones: an investigation of phone set expansion in Swedish and implications for speech recognition and speech synthesis
- Eklund R., and Lindström A. Xenophones: an investigation of phone set expansion in Swedish and implications for speech recognition and speech synthesis. Speech Communication 35 1-2 (2001) 81-102
- (2001) Speech Communication , vol.35 , Issue.1-2 , pp. 81-102
- Eklund, R.¹ Lindström, A.²

67
- 34547945866
- Elenius, D., Blomberg, M., 2004. Comparing speech recognition for adults and children. In: Proceedings of FONETIK, Stockholm, Sweden, pp. 156-159.

68
- 0034848926
- Ellis, D., Singh, R., Sivadas, S., 2001. Tandem acoustic modeling in large-vocabulary recognition. In: Proceedings of ICASSP, Salt Lake City, USA, pp. 517-520.

69
- 34547939006
- ESCA Workshop on Modeling Pronunciation Variation for Automatic Speech Recognition, 1998.

70
- 0030373675
- Eskenazi, M., 1996. Detection of foreign speakers' pronunciation errors for second language training-preliminary results. In: Proceedings of ICSLP, Philadelphia, PA, pp. 1465-1468.

71
- 33745197755
- Kids: a database of children's speech
- Eskenazi M. Kids: a database of children's speech. The Journal of the Acoustical Society of America (1996) 2759
- (1996) The Journal of the Acoustical Society of America , pp. 2759
- Eskenazi, M.¹

72
- 34547942941
- Eskenazi, M., Pelton, G., 2002. Pinpointing pronunciation errors in children speech: examining the role of the speech recognizer. In: Proceedings of the PMLA Workshop, Colorado, USA.

73
- 0033692966
- Falthauser, R., Pfau, T., Ruske, G., 2000. On-line speaking rate estimation using Gaussian mixture models. In: Proceedings of ICASSP, Istanbul, Turkey, pp. 1355-1358.

74
- 0003418124
- Mouton, The Hague
- Fant G. Acoustic Theory of Speech Production (1960), Mouton, The Hague
- (1960) Acoustic Theory of Speech Production
- Fant, G.¹

75
- 0030638031
- Fiscus, J.G., 1997. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER). In: Proceedings of ASRU, pp. 347-354.

76
- 34547943338
- Fitt, S., 1995. The pronunciation of unfamiliar native and non-native town names. In: Proceedings of Eurospeech, Madrid, Spain, pp. 2227-2230.

77
- 0003757962
- Springer-Verlag, Berlin-Heidelberg-New York
- Flanagan J. Speech Analysis and Synthesis and Perception (1972), Springer-Verlag, Berlin-Heidelberg-New York
- (1972) Speech Analysis and Synthesis and Perception
- Flanagan, J.¹

78
- 0038188722
- Interaction between the native and second language phonetic subsystems
- Flege J.E., Schirru C., and MacKay I.R.A. Interaction between the native and second language phonetic subsystems. Speech Communication 40 (2003) 467-491
- (2003) Speech Communication , vol.40 , pp. 467-491
- Flege, J.E.¹ Schirru, C.² MacKay, I.R.A.³

79
- 19944369427
- A framework for predicting speech recognition errors
- Fosler-Lussier E., Amdal I., and Kuo H.-K.J. A framework for predicting speech recognition errors. Speech Communication 46 2 (2005) 153-170
- (2005) Speech Communication , vol.46 , Issue.2 , pp. 153-170
- Fosler-Lussier, E.¹ Amdal, I.² Kuo, H.-K.J.³

80
- 0033321442
- Effects of speaking rate and word predictability on conversational pronunciations
- Fosler-Lussier E., and Morgan N. Effects of speaking rate and word predictability on conversational pronunciations. Speech Communication 29 2-4 (1999) 137-158
- (1999) Speech Communication , vol.29 , Issue.2-4 , pp. 137-158
- Fosler-Lussier, E.¹ Morgan, N.²

81
- 0034140838
- Combination of machine scores for automatic grading of pronunciation quality
- Franco H., Neumeyer L., Digalakis V., and Ronen O. Combination of machine scores for automatic grading of pronunciation quality. Speech Communication 30 2-3 (2000) 121-130
- (2000) Speech Communication , vol.30 , Issue.2-3 , pp. 121-130
- Franco, H.¹ Neumeyer, L.² Digalakis, V.³ Ronen, O.⁴

82
- 0034855363
- Fujinaga, K., Nakai, M., Shimodaira, H., Sagayama, S., 2001. Multiple-regression Hidden Markov Model. In: Proceedings of ICASSP, vol. 1. Salt Lake City, USA, pp. 513-516.

83
- 0032097263
- Academic Press, New York
- Fukunaga K. Introduction to Statistical Pattern Recognition (1972), Academic Press, New York
- (1972) Introduction to Statistical Pattern Recognition
- Fukunaga, K.¹

84
- 0032677425
- Fung, P., Liu, W.K., 1999. Fast accent identification and accented speech recognition. In: Proceedings of ICASSP, Phoenix, Arizona, USA, pp. 221-224.

85
- 3042697131
- Introduction to the special issue on spontaneous speech processing
- Furui S., Beckman M., Hirschberg J.B., Itahashi S., Kawahara T., Nakamura S., and Narayanan S. Introduction to the special issue on spontaneous speech processing. IEEE Transactions on Speech and Audio Processing 12 4 (2004) 349-350
- (2004) IEEE Transactions on Speech and Audio Processing , vol.12 , Issue.4 , pp. 349-350
- Furui, S.¹ Beckman, M.² Hirschberg, J.B.³ Itahashi, S.⁴ Kawahara, T.⁵ Nakamura, S.⁶ Narayanan, S.⁷

86
- 34547956384
- Gales, M.J.F., 1998. Cluster adaptive training for speech recognition. In: Proceedings of ICSLP, Sydney, Australia, pp. 1783-1786.

87
- 0032638856
- Semi-tied covariance matrices for hidden Markov models
- Gales M.J.F. Semi-tied covariance matrices for hidden Markov models. IEEE Transactions on Speech and Audio Processing 7 (1999) 272-281
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , pp. 272-281
- Gales, M.J.F.¹

88
- 34547928595
- Gales, M.J.F., 2001. Acoustic factorization. In: Proceedings of ASRU, Madonna di Campiglio, Italy.

89
- 0034853390
- Gales, M.J.F., 2001. Multiple-cluster adaptive training schemes. In: Proceedings of ICASSP, Salt Lake City, Utah, USA, pp. 361-364.

90
- 34547955597
- Gao, Y., Ramabhadran, B., Chen, J., Erdogan, H., Picheny, M., 2001. Innovative approaches for large vocabulary name recognition. In: Proceedings of ICASSP, Salt Lake City, Utah, pp. 333-336.

91
- 0031632620
- Garner, P., Holmes, W., 1998. On the robust incorporation of formant features into hidden Markov models for automatic speech recognition. In: Proceedings of ICASSP, pp. 1-4.

92
- 0000673740
- Speaker identification and message identification in speech recognition
- Garvin P.L., and Ladefoged P. Speaker identification and message identification in speech recognition. Phonetica 9 (1963) 193-199
- (1963) Phonetica , vol.9 , pp. 193-199
- Garvin, P.L.¹ Ladefoged, P.²

93
- 27744486572
- Multiple resolution analysis for robust automatic speech recognition
- Gemello R., Mana F., Albesano D., and De Mori R. Multiple resolution analysis for robust automatic speech recognition. Computer Speech and Language 20 (2006) 2-21
- (2006) Computer Speech and Language , vol.20 , pp. 2-21
- Gemello, R.¹ Mana, F.² Albesano, D.³ De Mori, R.⁴

94
- 34547931739
- Girardi, A., Shikano, K., Nakamura, S., 1998. Creating speaker independent HMM models for restricted database using straight-tempo morphing. In: Proceedings of ICSLP, Sydney, Australia, pp. 687-690.

95
- 85143190854
- Giuliani, D., Gerosa, M., 2003. Investigating recognition of children speech. In: Proceedings of ICASSP, Hong Kong, pp. 137-140.

96
- 2442562479
- Segmental minimum Bayes-risk decoding for automatic speech recognition
- Goel V., Kumar S., and Byrne W. Segmental minimum Bayes-risk decoding for automatic speech recognition. IEEE Transactions on Speech and Audio Processing 12 3 (2004) 234-249
- (2004) IEEE Transactions on Speech and Audio Processing , vol.12 , Issue.3 , pp. 234-249
- Goel, V.¹ Kumar, S.² Byrne, W.³

97
- 84892187452
- Gopinath, R.A., 1998. Maximum likelihood modeling with Gaussian distributions for classification. In: Proceedings of ICASSP, Seattle, WA, pp. 661-664.

98
- 0346008163
- Generating non-native pronunciation variants for lexicon adaptation
- Goronzy S., Rapp S., and Kompe R. Generating non-native pronunciation variants for lexicon adaptation. Speech Communication 42 1 (2004) 109-123
- (2004) Speech Communication , vol.42 , Issue.1 , pp. 109-123
- Goronzy, S.¹ Rapp, S.² Kompe, R.³

99
- 4544351495
- Graciarena, M., Franco, H., Zheng, J., Vergyri, D., Stolcke, A., 2004. Voicing feature integration in SRI's DECIPHER LVCSR system. In: Proceedings of ICASSP, Montreal, Canada, pp. 921-924.

100
- 34547952311
- Greenberg, S., Chang, S., 2000. Linguistic dissection of switchboard-corpus automatic speech recognition systems. In: Proceedings of ISCA Workshop on Automatic Speech Recognition: Challenges for the New Millenium, Paris, France.

101
- 34547950990
- Greenberg, S., Fosler-Lussier, E., 2000. The uninvited guest: information's role in guiding the production of spontaneous speech. In: Proceedings of the Crest Workshop on Models of Speech Production: Motor Planning and Articulatory Modelling, Kloster Seeon, Germany.

102
- 0029726002
- Gupta, S.K., Soong, F., Haimi-Cohen, R., 1996. High-accuracy connected digit recognition for mobile applications. In: Proceedings of ICASSP, vol. 1, pp. 57-60.

103
- 85017287487
- Haeb-Umbach, R., Ney, H., 1992. Linear discriminant analysis for improved large vocabulary continuous speech recognition. In: Proceedings of ICASSP, San Francisco, CA, pp. 13-16.

104
- 84946707630
- Children speech recognition with application to interactive books and tutors
- St. Thomas, US Virgin Islands
- Hagen A., Pellom B., and Cole R. Children speech recognition with application to interactive books and tutors. Proceedings of ASRU (2003), St. Thomas, US Virgin Islands 186-191
- (2003) Proceedings of ASRU , pp. 186-191
- Hagen, A.¹ Pellom, B.² Cole, R.³

105
- 19944415893
- Implicit modelling of pronunciation variation in automatic speech recognition
- Hain T. Implicit modelling of pronunciation variation in automatic speech recognition. Speech Communication 46 2 (2005) 171-188
- (2005) Speech Communication , vol.46 , Issue.2 , pp. 171-188
- Hain, T.¹

106
- 34547934712
- Hain, T., Woodland, P.C., 1999. Dynamic HMM selection for continuous speech recognition. In: Proceedings of Eurospeech, Budapest, Hungary, pp. 1327-1330.

107
- 0024914051
- Hansen, J.H.L., 1989. Evaluation of acoustic correlates of speech under stress for robust speech recognition. In: IEEE Proceedings 15th Northeast Bioengineering Conference, Boston, MA, pp. 31-32.

108
- 0027307739
- Hansen, J.H.L., 1993. Adaptive source generator compensation and enhancement for speech recognition in noisy stressful environments. In: Proceedings of ICASSP, Minneapolis, Minnesota, pp. 95-98.

109
- 34547948253
- A source generator framework for analysis of acoustic correlates of speech under stress. Part I: pitch, duration, and intensity effects
- Hansen J.H.L. A source generator framework for analysis of acoustic correlates of speech under stress. Part I: pitch, duration, and intensity effects. The Journal of the Acoustical Society of America (1995)
- (1995) The Journal of the Acoustical Society of America
- Hansen, J.H.L.¹

110
- 0030283741
- Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition
- Hansen J.H.L. Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition. Speech Communication, Special Issue on Speech Under Stress 20 2 (1996) 151-170
- (1996) Speech Communication, Special Issue on Speech Under Stress , vol.20 , Issue.2 , pp. 151-170
- Hansen, J.H.L.¹

111
- 0025681006
- Hanson, B.A., Applebaum, T., 1990. Robust speaker-independent word recognition using instantaneous, dynamic and acceleration features: experiments with Lombard and noisy speech. In: Proceedings of ICASSP, Albuquerque, New Mexico, pp. 857-860.

112
- 0035510539
- Noise robust speech parameterization using multiresolution feature extraction
- Hariharan R., Kiss I., and Viikki O. Noise robust speech parameterization using multiresolution feature extraction. IEEE Transactions on Speech and Audio Processing 9 8 (2001) 856-865
- (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.8 , pp. 856-865
- Hariharan, R.¹ Kiss, I.² Viikki, O.³

113
- 0003807773
- Prentice-Hall Publishers, NJ, USA
- Haykin S. Adaptive Filter Theory (1993), Prentice-Hall Publishers, NJ, USA
- (1993) Adaptive Filter Theory
- Haykin, S.¹

114
- 0003396123
- John Wiley and Sons, New York, USA
- Haykin S. Communication Systems. third ed. (1994), John Wiley and Sons, New York, USA
- (1994) Communication Systems. third ed.
- Haykin, S.¹

115
- 19944423811
- Pronunciation modeling using a finite-state transducer representation
- Hazen T.J., Hetherington I.L., Shu H., and Livescu K. Pronunciation modeling using a finite-state transducer representation. Speech Communication 46 2 (2005) 189-203
- (2005) Speech Communication , vol.46 , Issue.2 , pp. 189-203
- Hazen, T.J.¹ Hetherington, I.L.² Shu, H.³ Livescu, K.⁴

116
- 0042594404
- Fast model selection based speaker adaptation for nonnative speech
- He X., and Zhao Y. Fast model selection based speaker adaptation for nonnative speech. IEEE Transactions on Speech and Audio Processing 11 4 (2003) 298-307
- (2003) IEEE Transactions on Speech and Audio Processing , vol.11 , Issue.4 , pp. 298-307
- He, X.¹ Zhao, Y.²

117
- 85009067731
- Hegde, R.M., Murthy, H.A., Gadde, V.R.R., 2004. Continuous speech recognition using joint features derived from the modified group delay function and MFCC. In: Proceedings of ICSLP, Jeju, Korea, pp. 905-908.

118
- 33646756506
- Hegde, R.M., Murthy, H.A., Rao, G.V.R., 2005. Speech processing using joint features derived from the modified group delay function. In: Proceedings of ICASSP, vol. I. Philadelphia, PA, pp. 541-544.

119
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- Hermansky H. Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America 87 4 (1990) 1738-1752
- (1990) The Journal of the Acoustical Society of America , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

120
- 0028517164
- RASTA processing of speech
- Hermansky H., and Morgan N. RASTA processing of speech. IEEE Transactions on Speech and Audio Processing 2 4 (1994) 578-589
- (1994) IEEE Transactions on Speech and Audio Processing , vol.2 , Issue.4 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

121
- 34547961189
- Hermansky, H., Sharma, S., 1998. TRAPS: classifiers of temporal patterns. In: Proceedings of ICSLP, Sydney, Australia, pp. 1003-1006.

122
- 34547945010
- Hetherington, L., 1995. New words: Effect on recognition performance and incorporation issues. In: Proceedings of Eurospeech, Madrid, Spain, pp. 1645-1648.

123
- 2942568545
- Prosodic and other cues to speech recognition failures
- Hirschberg J., Litman D., and Swerts M. Prosodic and other cues to speech recognition failures. Speech Communication 43 1-2 (2004) 155-175
- (2004) Speech Communication , vol.43 , Issue.1-2 , pp. 155-175
- Hirschberg, J.¹ Litman, D.² Swerts, M.³

124
- 34547949601
- Holmes, J.N., Holmes, W.J., Garner, P.N., 1997. Using formant frequencies in speech recognition. In: Proceedings of Eurospeech, Rhodes, Greece, pp. 2083-2086.

125
- 34547960808
- Combining frame and segment based models for large vocabulary continuous speech recognition
- Keystone, Colorado
- Hon H.W., and Wang K. Combining frame and segment based models for large vocabulary continuous speech recognition. Proceedings of ASRU (1999), Keystone, Colorado
- (1999) Proceedings of ASRU
- Hon, H.W.¹ Wang, K.²

126
- 85009113198
- Huang, C., Chen, T., Li, S., Chang, E., Zhou, J., 2001. Analysis of speaker variability. In: Proceedings of Eurospeech, Aalborg, Denmark, pp. 1377-1380.

127
- 0026388713
- Huang, X., Lee, K., 1991. On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition. In: Proceedings of ICASSP, Toronto, Canada, pp. 877-880.

128
- 0033677211
- Huang, H.C.-H., Seide, F., 2000. Pitch tracking and tone features for Mandarin speech recognition. In: Proceedings of ICASSP, vol. 3. pp. 1523-1526.

129
- 34547955984
- Humphries, J.J., Woodland, P.C., Pearce, D., 1996. Using accent-specific pronunciation modelling for robust speech recognition. In: Proceedings of ICSLP, Rhodes, Greece, pp. 2367-2370.

130
- 0141517832
- Spectral signal processing for ASR
- Keystone, Colorado
- Hunt M.J. Spectral signal processing for ASR. Proceedings of ASRU (1999), Keystone, Colorado
- (1999) Proceedings of ASRU
- Hunt, M.J.¹

131
- 85009080383
- Hunt, M.J., 2004. Speech recognition, syllabification and statistical phonetics. In: Proceedings of ICSLP, Jeju Island, Korea.

132
- 0024905238
- Hunt, M.J., Lefebvre, C., 1989. A comparison of several acoustic representations for speech recognition with degraded and undegraded speech. In: Proceedings of ICASSP, Glasgow, UK, pp. 262-265.

133
- 34547962708
- Iivonen, A., Harinen, K., Keinanen, L., Kirjavainen, J., Meister, E., Tuuri, L., 2003. Development of a multiparametric speaker profile for speaker recognition. In: Proceedings of ICPhS, Barcelona, Spain, pp. 695-698.

134
- 34547956725
- ISCA Tutorial and Research Workshop, 2002. Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (PMLA-2002).

135
- 1142288445
- Word perception in fast speech: artificially time-compressed vs. naturally produced fast speech
- Janse E. Word perception in fast speech: artificially time-compressed vs. naturally produced fast speech. Speech Communication 42 2 (2004) 155-173
- (2004) Speech Communication , vol.42 , Issue.2 , pp. 155-173
- Janse, E.¹

136
- 34547954621
- Acoustic feature selection using speech recognizers
- Keystone, Colorado
- Jiang K., and Huang X. Acoustic feature selection using speech recognizers. Proceedings of ASRU (1999), Keystone, Colorado
- (1999) Proceedings of ASRU
- Jiang, K.¹ Huang, X.²

137
- 0023165215
- On the use of bandpass liftering in speech recognition
- Juang B.-H., Rabiner L.R., and Wilpon J.G. On the use of bandpass liftering in speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 35 (1987) 947-953
- (1987) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.35 , pp. 947-953
- Juang, B.-H.¹ Rabiner, L.R.² Wilpon, J.G.³

138
- 0034853397
- Jurafsky, D., Ward, W., Jianping, Z., Herold, K., Xiuyang, Y., Sen, Z., 2001. What kind of pronunciation variation is hard for triphones to model? In: Proceedings of ICASSP, Salt Lake City, Utah, pp. 577-580.

139
- 34547958273
- Kajarekar, S., Malayath, N., Hermansky, H., 1999. Analysis of sources of variability in speech. In: Proceedings of Eurospeech, Budapest, Hungary, pp. 343-346.

140
- 33745191628
- Analysis of speaker and channel variability in speech
- Keystone, Colorado
- Kajarekar S., Malayath N., and Hermansky H. Analysis of speaker and channel variability in speech. Proceedings of ASRU (1999), Keystone, Colorado
- (1999) Proceedings of ASRU
- Kajarekar, S.¹ Malayath, N.² Hermansky, H.³

141
- 18744386134
- Eigenvoice modeling with sparse training data
- Kenny P., Boulianne G., and Dumouchel P. Eigenvoice modeling with sparse training data. IEEE Transactions on Speech and Audio Processing 13 3 (2005) 345-354
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.3 , pp. 345-354
- Kenny, P.¹ Boulianne, G.² Dumouchel, P.³

142
- 0030371812
- Köhler, J., 1996. Multilingual phonemes recognition exploiting acoustic-phonetic similarities of sounds. In: Proceedings of ICSLP, Philadelphia, PA, pp. 2195-2198.

143
- 34547960807
- Konig, Y., Morgan, N., 1992. GDNN: a gender-dependent neural network for continuous speech recognition. In: Proceedings of Int. Joint Conf. on Neural Networks, vol. 2. Baltimore, Maryland, pp. 332-337.

144
- 1842580255
- Rapid online adaptation using speaker space model evolution
- Kim D.K., and Kim N.S. Rapid online adaptation using speaker space model evolution. Speech Communication 42 3-4 (2004) 467-478
- (2004) Speech Communication , vol.42 , Issue.3-4 , pp. 467-478
- Kim, D.K.¹ Kim, N.S.²

145
- 0032136330
- Robust speech recognition using the modulation spectrogram
- Kingsbury B., Morgan N., and Greenberg S. Robust speech recognition using the modulation spectrogram. Speech Communication 25 1-3 (1998) 117-132
- (1998) Speech Communication , vol.25 , Issue.1-3 , pp. 117-132
- Kingsbury, B.¹ Morgan, N.² Greenberg, S.³

146
- 17344389852
- Kingsbury, B., Saon, G., Mangu, L., Padmanabhan, M., Sarikaya, R., 2002. Robust speech recognition in noisy environments: the 2001 IBM SPINE evaluation system. In: Proceedings of ICASSP, vol. I. Orlando, FL, pp. 53-56.

147
- 34547928978
- Kirchhoff, K., 1998. Combining articulatory and acoustic information for speech recognition in noise and reverberant environments. In: Proceedings of ICSLP, Sydney, Australia, pp. 891-894.

148
- 44849123618
- Kitaoka, N., Yamada, D., Nakagawa, S., 2002. Speaker independent speech recognition using features based on glottal sound source. In: Proceedings of ICSLP, Denver, USA, pp. 2125-2128.

149
- 85009233038
- Kleinschmidt, M., Gelbart, D., 2002. Improving word accuracy with Gabor feature extraction. In: Proceedings of ICSLP, Denver, Colorado, pp. 25-28.

150
- 0030672089
- Korkmazskiy, F., Juang, B.-H., Soong, F., 1997. Generalized mixture of HMMs for continuous speech recognition. In: Proceedings of ICASSP, vol. 2. pp. 1443-1446.

151
- 85009062725
- Korkmazsky, F., Deviren, M., Fohr, D., Illina, I., 2004. Hidden factor dynamic Bayesian networks for speech recognition. In: Proceedings of ICSLP, Jeju Island, Korea.

152
- 0039670397
- Kubala, F., Anastasakos, A., Makhoul, J., Nguyen, L., Schwartz, R., Zavaliagkos, E., 1994. Comparative experiments on large vocabulary speech recognition. In: Proceedings of ICASSP, Adelaide, Australia, pp. 561-564.

153
- 0032289099
- Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition
- Kumar N., and Andreou A.G. Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition. Speech Communication 26 4 (1998) 283-297
- (1998) Speech Communication , vol.26 , Issue.4 , pp. 283-297
- Kumar, N.¹ Andreou, A.G.²

154
- 0032187138
- An inverse signal approach to computing the envelope of a real valued signal
- Kumaresan R. An inverse signal approach to computing the envelope of a real valued signal. IEEE Signal Processing Letters 5 10 (1998) 256-259
- (1998) IEEE Signal Processing Letters , vol.5 , Issue.10 , pp. 256-259
- Kumaresan, R.¹

155
- 0033004349
- Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications
- Kumaresan R., and Rao A. Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications. The Journal of the Acoustical Society of America 105 3 (1999) 1912-1924
- (1999) The Journal of the Acoustical Society of America , vol.105 , Issue.3 , pp. 1912-1924
- Kumaresan, R.¹ Rao, A.²

156
- 0030363012
- Kumpf, K., King, R.W., 1996. Automatic accent classification of foreign accented Australian English speech. In: Proceedings of ICSLP, Philadelphia, PA, pp. 1740-1743.

157
- 34547939546
- Kuwabara, H., 1997. Acoustic and perceptual properties of phonemes in continuous speech as a function of speaking rate. In: Proceedings of Eurospeech, Rhodes, Greece, pp. 1003-1006.

158
- 0001870986
- Information conveyed by vowels
- Ladefoged P., and Broadbent D.E. Information conveyed by vowels. The Journal of the Acoustical Society of America 29 (1957) 98-104
- (1957) The Journal of the Acoustical Society of America , vol.29 , pp. 98-104
- Ladefoged, P.¹ Broadbent, D.E.²

159
- 33646805430
- Lamel, L., Gauvain, J.-L., 2005. Alternate phone models for conversational speech. In: Proceedings of ICASSP, Philadelphia, Pennsylvania, pp. 1005-1008.

160
- 80054370614
- Cambridge University Press, Cambridge
- Laver J. Principles of Phonetics (1994), Cambridge University Press, Cambridge
- (1994) Principles of Phonetics
- Laver, J.¹

161
- 34548727590
- Lawson, A.D., Harris, D.M., Grieco, J.J., 2003. Effect of foreign accent on speech recognition in the NATO N-4 corpus. In: Proceedings of Eurospeech, Geneva, Switzerland, pp. 1505-1508.

162
- 0026142334
- A study on speaker adaptation of the parameters of continuous density Hidden Markov Models
- Lee C., Lin C., and Juang B. A study on speaker adaptation of the parameters of continuous density Hidden Markov Models. IEEE Transactions on Signal Processing 39 4 (1991) 806-813
- (1991) IEEE Transactions on Signal Processing , vol.39 , Issue.4 , pp. 806-813
- Lee, C.¹ Lin, C.² Juang, B.³

163
- 0027192618
- Lee, C.-H., Gauvain, J.-L., 1993. Speaker adaptation based on MAP estimation of HMM parameters. In: Proceedings of ICASSP, vol. 2. pp. 558-561.

164
- 0029747183
- Lee, L., Rose, R.C., 1996. Speaker normalization using efficient frequency warping procedures. In: Proceedings of ICASSP, vol. 1. Atlanta, Georgia, pp. 353-356.

165
- 0032969462
- Acoustics of children's speech: developmental changes of temporal and spectral parameters
- Lee S., Potamianos A., and Narayanan S. Acoustics of children's speech: developmental changes of temporal and spectral parameters. The Journal of the Acoustical Society of America 105 (1999) 1455-1468
- (1999) The Journal of the Acoustical Society of America , vol.105 , pp. 1455-1468
- Lee, S.¹ Potamianos, A.² Narayanan, S.³

166
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density Hidden Markov Models
- Leggetter C., and Woodland P. Maximum likelihood linear regression for speaker adaptation of continuous density Hidden Markov Models. Computer Speech and Language 9 2 (1995) 171-185
- (1995) Computer Speech and Language , vol.9 , Issue.2 , pp. 171-185
- Leggetter, C.¹ Woodland, P.²

167
- 34547960603
- Leonard, R.G., 1984. A database for speaker independent digit recognition. In: Proceedings of ICASSP, San Diego, US, pp. 328-331.

168
- 21644484352
- Lin, X., Simske, S., 2004. Phoneme-less hierarchical accent classification. In: Proceedings of Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, vol. 2. Pacific Grove, CA, pp. 1801-1804.

169
- 34547953237
- Lincoln, M., Cox, S.J., Ringland, S., 1997. A fast method of speaker normalisation using formant estimation. In: Proceedings of Eurospeech, Rhodes, Greece, pp. 2095-2098.

170
- 0000665734
- Explaining phonetic variation: a sketch of the H&H theory
- Hardcastle W.J., and Marchal A. (Eds), Kluwer Academic Publishers
- Lindblom B. Explaining phonetic variation: a sketch of the H&H theory. In: Hardcastle W.J., and Marchal A. (Eds). Speech Production and Speech Modelling (1990), Kluwer Academic Publishers
- (1990) Speech Production and Speech Modelling
- Lindblom, B.¹

171
- 0031187171
- Speech recognition by machines and humans
- Lippmann R.P. Speech recognition by machines and humans. Speech Communication 22 1 (1997) 1-15
- (1997) Speech Communication , vol.22 , Issue.1 , pp. 1-15
- Lippmann, R.P.¹

172
- 0023263708
- Lippmann, R.P., Martin, E.A., Paul, D.B., 1987. Multi-style training for robust isolated-word speech recognition. In: Proceedings of ICASSP, Dallas, TX, pp. 705-708.

173
- 0031220487
- Effects of phase on the perception of intervocalic stop consonants
- Liu L., He J., and Palm G. Effects of phase on the perception of intervocalic stop consonants. Speech Communication 22 4 (1997) 403-417
- (1997) Speech Communication , vol.22 , Issue.4 , pp. 403-417
- Liu, L.¹ He, J.² Palm, G.³

174
- 34547960249
- Liu, S., Doyle, S., Morris, A., Ehsani, F., 1998. The effect of fundamental frequency on Mandarin speech recognition. In: Proceedings of ICSLP, vol. 6. Sydney, Australia, pp. 2647-2650.

175
- 85126499605
- Liu, W.K., Fung, P., 2000. MLLR-based accent model adaptation without accented data. In: Proceedings of ICSLP, vol. 3. Beijing, China, pp. 738-741.

176
- 0033705981
- Livescu, K., Glass, J., 2000. Lexical modeling of non-native speech for automatic speech recognition. In: Proceedings of ICASSP, vol. 3. Istanbul, Turkey, pp. 1683-1686.

177
- 85009101259
- Ljolje, A., 2002. Speech recognition using fundamental frequency and voicing in acoustic modeling. In: Proceedings of ICSLP, Denver, USA, pp. 2137-2140.

178
- 84995620255
- Llitjos, A.F., Black, A.W., 2001. Knowledge of language origin improves pronunciation accuracy of proper names. In: Proceedings of Eurospeech, Aalborg, Denmark.

179
- 0000874053
- Le signe de l'élévation de la voix
- Lombard E. Le signe de l'élévation de la voix. Ann. Maladies Oreille, Larynx, Nez, Pharynx 37 (1911)
- (1911) Ann. Maladies Oreille, Larynx, Nez, Pharynx , vol.37
- Lombard, E.¹

180
- 3042673775
- Linear dimensionality reduction via a heteroscedastic extension of LDA: The Chernoff criterion
- Loog M., and Duin R.P.W. Linear dimensionality reduction via a heteroscedastic extension of LDA: The Chernoff criterion. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 6 (2004) 732-739
- (2004) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.26 , Issue.6 , pp. 732-739
- Loog, M.¹ Duin, R.P.W.²

181
- 85009070826
- Magimai-Doss, M., Stephenson, T.A., Ikbal, S., Bourlard, H., 2004. Modelling auxiliary features in tandem systems. In: Proceedings of ICSLP, Jeju Island, Korea.

182
- 84946742164
- Maison, B., 2003. Pronunciation modeling for names of foreign origin. In: Proceedings of ASRU, US Virgin Islands, pp. 429-434.

183
- 33646780066
- Mak, B., Hsiao, R., 2004. Improving eigenspace-based MLLR adaptation by kernel PCA. In: Proceedings of ICSLP, Jeju Island, Korea.

184
- 0016495091
- Makhoul, J., 1975. Linear prediction: a tutorial review. In: Proceedings of the IEEE, vol. 63(4), pp. 561-580.

185
- 0034296009
- Finding consensus in speech recognition: Word-error minimization and other applications of confusion networks
- Mangu L., Brill E., and Stolcke A. Finding consensus in speech recognition: Word-error minimization and other applications of confusion networks. Computer Speech and Language 14 4 (2000) 373-400
- (2000) Computer Speech and Language , vol.14 , Issue.4 , pp. 373-400
- Mangu, L.¹ Brill, E.² Stolcke, A.³

186
- 0024766457
- A family of distortion measures based upon projection operation for robust speech recognition
- Mansour D., and Juang B.H. A family of distortion measures based upon projection operation for robust speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 37 (1989) 1659-1671
- (1989) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.37 , pp. 1659-1671
- Mansour, D.¹ Juang, B.H.²

187
- 0017626219
- Long-term feature averaging for speaker recognition
- Markel J., Oshika B., and Gray A.H. Long-term feature averaging for speaker recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 25 (1977) 330-337
- (1977) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.25 , pp. 330-337
- Markel, J.¹ Oshika, B.² Gray, A.H.³

188
- 0141590455
- Markov, K., Nakamura, S., 2003. Hybrid HMM/BN LVCSR system integrating multiple acoustic features. In: Proceedings of ICASSP, vol. 1. pp. 840-843.

189
- 85009228917
- Martin, A., Mauuary, L., 2003. Voicing parameter and energy-based speech/non-speech detection for speech recognition in adverse conditions. In: Proceedings of Eurospeech, Geneva, Switzerland, pp. 3069-3072.

190
- 0347211260
- Martinez, F., Tapias, D., Alvarez, J., 1998. Towards speech rate independence in large vocabulary continuous speech recognition. In: Proceedings of ICASSP, Seattle, Washington, pp. 725-728.

191
- 34547931329
- Martinez, F., Tapias, D., Alvarez, J., Leon, P., 1997. Characteristics of slow, average and fast speech and their effects in large vocabulary continuous speech recognition. In: Proceedings of Eurospeech, Rhodes, Greece, pp. 469-472.

192
- 85009152776
- Matsuda, S., Jitsuhiro, T., Markov, K., Nakamura, S., 2004. Speech recognition system robust to noise and speaking styles. In: Proceedings of ICSLP, Jeju Island, Korea.

193
- 34547928785
- Mertins, A., Rademacher, J., 2005. Vocal tract length invariant features for automatic speech recognition. In: Proceedings of ASRU, Cancun, Mexico, pp. 308-312.

194
- 85009152057
- Messina, R., Jouvet, D., 2004. Context dependent long units for speech recognition. In: Proceedings of ICSLP, Jeju Island, Korea.

195
- 0030369274
- Milner, B.P., 1996. Inclusion of temporal information into features for speech recognition. In: Proceedings of ICSLP, Philadelphia, PA, pp. 256-259.

196
- 34547933351
- Mirghafori, N., Fosler, E., Morgan, N., 1995. Fast speakers in large vocabulary continuous speech recognition: analysis & antidotes. In: Proceedings of Eurospeech, Madrid, Spain, pp. 491-494.

197
- 0029748337
- Mirghafori, N., Fosler, E., Morgan, N., 1996. Towards robustness to fast speech in ASR. In: Proceedings of ICASSP, Atlanta, Georgia, pp. 335-338.

198
- 0031247679
- Towards improving ASR robustness for PSN and GSM telephone applications
- Mokbel C., Mauuary L., Karray L., Jouvet D., Monné J., Simonin J., and Bartkova K. Towards improving ASR robustness for PSN and GSM telephone applications. Speech Communication 23 1-2 (1997) 141-159
- (1997) Speech Communication , vol.23 , Issue.1-2 , pp. 141-159
- Mokbel, C.¹ Mauuary, L.² Karray, L.³ Jouvet, D.⁴ Monné, J.⁵ Simonin, J.⁶ Bartkova, K.⁷

199
- 34547949789
- Mokhtari, P., 1998. An acoustic-phonetic and articulatory study of speech-speaker dichotomy. PhD thesis, The University of New South Wales, Canberra, Australia.

200
- 4544224866
- Morgan, N., Chen, B., Zhu, Q., Stolcke, A., 2004. TRAPping conversational speech: extending TRAP/tandem approaches to conversational telephone speech recognition. In: Proceedings of ICASSP, vol. 1. Montreal, Canada, pp. 536-539.

201
- 34547944260
- Morgan, N., Fosler, E., Mirghafori, N., 1997. Speech recognition using on-line estimation of speaking rate. In: Proceedings of Eurospeech, vol. 4. Rhodes, Greece, pp. 2079-2082.

202
- 84892163293
- Morgan, N., Fosler-Lussier, E., 1998. Combining multiple estimators of speaking rate. In: Proceedings of ICASSP, Seattle, pp. 729-732.

203
- 0027447292
- Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion
- Murray I.R., and Arnott J.L. Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. The Journal of the Acoustical Society of America 93 2 (1993) 1097-1108
- (1993) The Journal of the Acoustical Society of America , vol.93 , Issue.2 , pp. 1097-1108
- Murray, I.R.¹ Arnott, J.L.²

204
- 34547961576
- Masaki Naito, Y.S., Li Deng, 1998. Speaker clustering for speech recognition using the parameters characterizing vocal-tract dimensions. In: Proceedings of ICASSP, Seattle, WA, pp. 1889-1893.

205
- 0036289926
- Nanjo, H., Kawahara, T., 2002. Speaking-rate dependent decoding and adaptation for spontaneous lecture speech recognition. In: Proceedings of ICASSP, vol. 1. Orlando, FL, pp. 725-728.

206
- 3042704466
- Language model and speaking rate adaptation for spontaneous presentation speech recognition
- Nanjo H., and Kawahara T. Language model and speaking rate adaptation for spontaneous presentation speech recognition. IEEE Transactions on Speech and Audio Processing 12 4 (2004) 391-400
- (2004) IEEE Transactions on Speech and Audio Processing , vol.12 , Issue.4 , pp. 391-400
- Nanjo, H.¹ Kawahara, T.²

207
- 34547954217
- National Institute of Standards and Technology, 2001. SCLITE scoring software. ftp://jaguar.ncls.nist.gov/pub/sctk-1.2.tar.Z.

208
- 34547952674
- Nearey, T.M., 1978. Phonetic feature systems for vowels. Indiana University Linguistics Club, Bloomington, Indiana, USA.

209
- 0030635350
- Neti, C., Roukos, S., 1997. Phone-context specific gender-dependent acoustic-models for continuous speech recognition. In: Proceedings of ASRU, Santa Barbara, CA, pp. 192-198.

210
- 0034140837
- Automatic scoring of pronunciation quality
- Neumeyer L., Franco H., Digalakis V., and Weintraub M. Automatic scoring of pronunciation quality. Speech Communication 30 2-3 (2000) 83-93
- (2000) Speech Communication , vol.30 , Issue.2-3 , pp. 83-93
- Neumeyer, L.¹ Franco, H.² Digalakis, V.³ Weintraub, M.⁴

211
- 0030374920
- Neumeyer, L., Franco, H., Weintraub, M., Price, P., 1996. Automatic text-independent pronunciation scoring of foreign language student speech. In: Proceedings of ICSLP, Philadelphia, PA, pp. 1457-1460.

212
- 0033707757
- Eigenvoices: a compact representation of speakers in a model space
- Nguyen P., Kuhn R., Junqua J.-C., Niedzielski N., and Wellekens C. Eigenvoices: a compact representation of speakers in a model space. Annales des Télécommunications 55 3-4 (2000)
- (2000) Annales des Télécommunications , vol.55 , Issue.3-4
- Nguyen, P.¹ Kuhn, R.² Junqua, J.-C.³ Niedzielski, N.⁴ Wellekens, C.⁵

213
- 3042569988
- Nguyen, P., Rigazio, L., Junqua, J.-C., 2003. Large corpus experiments for broadcast news recognition. In: Proceedings of Eurospeech, Geneva, Switzerland, pp. 1837-1840.

214
- 0003542797
- Cambridge University Press, Cambridge
- Nolan F. The Phonetic Bases of Speaker Recognition (1983), Cambridge University Press, Cambridge
- (1983) The Phonetic Bases of Speaker Recognition
- Nolan, F.¹

215
- 33947688163
- Kamal Omar, M., Chen, K., Hasegawa-Johnson, M., Bradman, Y., 2002. An evaluation of using mutual information for selection of acoustic features representation of phonemes for speech recognition. In: Proceedings of ICSLP, Denver, CO, pp. 2129-2132.

216
- 0036298107
- Kamal Omar, M., Hasegawa-Johnson, M., 2002. Maximum mutual information based acoustic features representation of phonological features for speech recognition. In: Proceedings of ICASSP, vol. 1. Montreal, Canada, pp. 81-84.

217
- 84926060821
- Odell, J.J., Woodland, P.C., Valtchev, V., Young, S.J., 1994. Large vocabulary continuous speech recognition using HTK. In: Proceedings of ICASSP, vol. 2. Adelaide, Australia, pp. 125-128.

218
- 34547938564
- Ono, Y., Wakita, H., Zhao, Y., 1993. Speaker normalization using constrained spectra shifts in auditory filter domain. In: Proceedings of Eurospeech, Berlin, Germany, pp. 355-358.

219
- 0003522449
- Addison-Wesley
- O'Shaughnessy D. Speech Communication - Human and Machine (1987), Addison-Wesley
- (1987) Speech Communication - Human and Machine
- O'Shaughnessy, D.¹

220
- 0032627056
- O'Shaughnessy, D., Tolba, H., 1999. Towards a robust/fast continuous speech recognition system using a voiced-unvoiced decision. In: Proceedings of ICASSP, vol. 1. Phoenix, Arizona, pp. 413-416.

221
- 0029747051
- Padmanabhan, M., Bahl, L., Nahamoo, D., Picheny, M., 1996. Speaker clustering and transformation for speaker adaptation in large-vocabulary speech recognition systems. In: Proceedings of ICASSP, Atlanta, GA, pp. 701-704.

222
- 7544235206
- Maximum-likelihood nonlinear transformation for acoustic adaptation
- Padmanabhan M., and Dharanipragada S. Maximum-likelihood nonlinear transformation for acoustic adaptation. IEEE Transactions on Speech and Audio Processing 12 6 (2004) 572-578
- (2004) IEEE Transactions on Speech and Audio Processing , vol.12 , Issue.6 , pp. 572-578
- Padmanabhan, M.¹ Dharanipragada, S.²

223
- 22544475024
- Maximizing information content in feature extraction
- Padmanabhan M., and Dharanipragada S. Maximizing information content in feature extraction. IEEE Transactions on Speech and Audio Processing 13 4 (2005) 512-519
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.4 , pp. 512-519
- Padmanabhan, M.¹ Dharanipragada, S.²

224
- 85009100883
- Paliwal, K.K., Alsteris, L., 2003. Usefulness of phase spectrum in human speech perception. In: Proceedings of Eurospeech, Geneva, Switzerland, pp. 2117-2120.

225
- 85009192384
- Paliwal, K.K., Atal, B.S., 2003. Frequency-related representation of speech. In: Proceedings of Eurospeech, Geneva, Switzerland, pp. 65-68.

226
- 0023246158
- Paul, D.B., 1987. A speaker-stress resistant HMM isolated word recognizer. In: Proceedings of ICASSP, Dallas, Texas, pp. 713-716.

227
- 0030682299
- Paul, D.B., 1997. Extensions to phone-state decision-tree clustering: single tree and tagged clustering. In: Proceedings of ICASSP, vol. 2. Munich, Germany, pp. 1487-1490.

228
- 0032665650
- Peters, S.D., Stubley, P., Valin, J.-M., 1999. On the limits of speech recognition in noise. In: Proceedings of ICASSP'99. Phoenix, Arizona, pp. 365-368.

229
- 84941328385
- Control methods used in a study of the vowels
- Peterson G.E., and Barney H.L. Control methods used in a study of the vowels. The Journal of the Acoustical Society of America 24 (1952) 175-184
- (1952) The Journal of the Acoustical Society of America , vol.24 , pp. 175-184
- Peterson, G.E.¹ Barney, H.L.²

230
- 34547936772
- Pfau, T., Ruske, G., 1998. Creating Hidden Markov Models for fast speech. In: Proceedings of ICSLP, Sydney, Australia.

231
- 34547948996
- Pitz, M., 2005. Investigations on Linear Transformations for Speaker Adaptation and Normalization. PhD thesis, RWTH Aachen University.

232
- 0032595183
- Modeling of the glottal flow derivative waveform with application to speaker identification
- Plumpe M., Quatieri T., and Reynolds D. Modeling of the glottal flow derivative waveform with application to speaker identification. IEEE Transactions on Speech and Audio Processing 7 5 (1999) 569-586
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , Issue.5 , pp. 569-586
- Plumpe, M.¹ Quatieri, T.² Reynolds, D.³

233
- 0014557625
- Perceptual and physical space of vowel sounds
- Pols L.C.W., Van der Kamp L.J.T., and Plomp R. Perceptual and physical space of vowel sounds. The Journal of the Acoustical Society of America 46 (1969) 458-467
- (1969) The Journal of the Acoustical Society of America , vol.46 , pp. 458-467
- Pols, L.C.W.¹ Van der Kamp, L.J.T.² Plomp, R.³

234
- 0347338002
- Robust recognition of children's speech
- Potamianos G., and Narayanan S. Robust recognition of children's speech. IEEE Transactions on Speech and Audio Processing 11 (2003) 603-616
- (2003) IEEE Transactions on Speech and Audio Processing , vol.11 , pp. 603-616
- Potamianos, G.¹ Narayanan, S.²

235
- 34547952870
- Potamianos, G., Narayanan, S., Lee, S., 1997. Analysis of children's speech: duration, pitch and formants. In: Proceedings of Eurospeech, Rhodes, Greece, pp. 473-476.

236
- 34547962145
- Potamianos, G., Narayanan, S., Lee, S., 1997. Automatic speech recognition for children. In: Proceedings of Eurospeech, Rhodes, Greece, pp. 2371-2374.

237
- 0001255743
- Toward the specification of speech
- Potter R.K., and Steinberg J.C. Toward the specification of speech. The Journal of the Acoustical Society of America 22 (1950) 807-820
- (1950) The Journal of the Acoustical Society of America , vol.22 , pp. 807-820
- Potter, R.K.¹ Steinberg, J.C.²

238
- 0036460984
- Theory and practice of acoustic confusability
- Printz H., and Olsen P.A. Theory and practice of acoustic confusability. Computer Speech and Language 16 1 (2002) 131-164
- (2002) Computer Speech and Language , vol.16 , Issue.1 , pp. 131-164
- Printz, H.¹ Olsen, P.A.²

239
- 11144222882
- Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system
- Pujol P., Pol S., Nadeu C., Hagen A., and Bourlard H. Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system. IEEE Transactions on Speech and Audio Processing SAP-13 1 (2005) 14-22
- (2005) IEEE Transactions on Speech and Audio Processing , vol.SAP-13 , Issue.1 , pp. 14-22
- Pujol, P.¹ Pol, S.² Nadeu, C.³ Hagen, A.⁴ Bourlard, H.⁵

240
- 0004244302
- Prentice Hall PTR, Englewood Cliffs, NJ, USA (pp. 20-37, Chapter 2)
- Rabiner L., and Juang B.H. Fundamentals of speech recognition (1993), Prentice Hall PTR, Englewood Cliffs, NJ, USA (pp. 20-37, Chapter 2)
- (1993) Fundamentals of speech recognition
- Rabiner, L.¹ Juang, B.H.²

241
- 0024922972
- Rabiner, L.R., Lee, C.H., Juang, B.H., Wilpon, J.G., 1989. HMM clustering for connected word recognition. In: Proceedings of ICASSP, vol. 1. Glasgow, Scotland, pp. 405-408.

242
- 85009097014
- Raux, A., 2004. Automated lexical adaptation and speaker clustering based on pronunciation habits for non-native speech recognition. In: Proceedings of ICSLP, Jeju Island, Korea.

243
- 0020779884
- Frequency spectrum deviation between speakers
- Saito S., and Itakura F. Frequency spectrum deviation between speakers. Speech Communication 2 (1983) 149-152
- (1983) Speech Communication , vol.2 , pp. 149-152
- Saito, S.¹ Itakura, F.²

244
- 85009085052
- Sakauchi, S., Yamaguchi, Y., Takahashi, S., Kobashikawa, S., 2004. Robust speech recognition based on HMM composition and modified Wiener filter. In: Proceedings of Interspeech. Jeju Island, Korea, pp. 2053-2056.

245
- 0033677121
- Saon, G., Padmanabhan, M., Gopinath, R., Chen, S., 2000. Maximum likelihood discriminant feature spaces. In: Proceedings of ICASSP, pp. 1129-1132.

246
- 0030715426
- Schaaf, T., Kemp, T., 1997. Confidence measures for spontaneous speech recognition. In: Proceedings of ICASSP, Munich, Germany, pp. 875-878.

247
- 0037384712
- Vocal communication of emotion: A review of research paradigms
- Scherer K.R. Vocal communication of emotion: A review of research paradigms. Speech Communication Special Issue on Speech and Emotions 40 1-2 (2003) 227-256
- (2003) Speech Communication Special Issue on Speech and Emotions , vol.40 , Issue.1-2 , pp. 227-256
- Scherer, K.R.¹

248
- 30644477788
- Schimmel, S., Atlas, L., 2005. Coherent envelope detection for modulation filtering of speech. In: Proceedings of ICASSP, vol. 1. Philadelphia, USA, pp. 221-224.

249
- 0022459781
- Flat-spectrum speech
- Schroeder M.R., and Strube H.W. Flat-spectrum speech. The Journal of the Acoustical Society of America 79 5 (1986) 1580-1583
- (1986) The Journal of the Acoustical Society of America , vol.79 , Issue.5 , pp. 1580-1583
- Schroeder, M.R.¹ Strube, H.W.²

250
- 34547941307
- Schötz, S., 2001. A perceptual study of speaker age. In: Working paper 49, Lund University, Dept. of Linguistics, pp. 136-139.

251
- 34547926704
- Schultz, T., Waibel, A., 1998. Language independent and language adaptive large vocabulary speech recognition. In: Proceedings of ICSLP, vol. 5. Sydney, Australia, pp. 1819-1822.

252
- 34547950397
- Schwartz, R., Barry, C., Chow, Y.-L., Deft, A., Feng, M.-W., Kimball, O., Kubala, F., Makhoul, J., Vandegrift, J., 1989. The BBN BYBLOS continuous speech recognition system. In: Proceedings of Speech and Natural Language Workshop. Philadelphia, Pennsylvania, pp. 21-23.

253
- 34547958446
- Selouani, S.-A., Tolba, H., O'Shaughnessy, D., 2002. Distinctive features, formants and cepstral coefficients to improve automatic speech recognition. In: Conference on Signal Processing, Pattern Recognition and Applications, IASTED. Crete, Greece, pp. 530-535.

254
- 19944366274
- Statistical modeling of phonological rules through linguistic hierarchies
- Seneff S., and Wang C. Statistical modeling of phonological rules through linguistic hierarchies. Speech Communication 46 2 (2005) 204-216
- (2005) Speech Communication , vol.46 , Issue.2 , pp. 204-216
- Seneff, S.¹ Wang, C.²

255
- 79960308044
- Shi, Y.Y., Liu, J., Liu, R.S., 2002. Discriminative HMM stream model for Mandarin digit string speech recognition. In: Proceedings of Int. Conf. on Signal Processing, vol. 1. Beijing, China, pp. 528-531.

256
- 33645586600
- Shinozaki, T., Furui, S., 2003. Hidden mode HMM using Bayesian network for modeling speaking rate fluctuation. In: Proceedings of ASRU. US Virgin Islands, pp. 417-422.

257
- 85009080958
- Shinozaki, T., Furui, S., 2004. Spontaneous speech recognition using a massively parallel decoder. In: Proceedings of ICSLP, Jeju Island, Korea, pp. 1705-1708.

258
- 85009064115
- Shobaki, K., Hosom, J.-P., Cole, R., 2000. The OGI kids speech corpus and recognizers. In: Proceedings of ICSLP, Beijing, China, pp. 564-567.

259
- 34547962510
- Siegler, M.A., 1995. Measuring and compensating for the effects of speech rate in large vocabulary continuous speech recognition. PhD thesis, Carnegie Mellon University.

260
- 0028996973
- Siegler, M.A., Stern, R.M., 1995. On the effect of speech rate in large vocabulary speech recognition system. In: Proceedings of ICASSP, Detroit, Michigan, pp. 612-615.

261
- 85009242921
- Singer, H., Sagayama, S., 1992. Pitch dependent phone modelling for HMM based speech recognition. In: Proceedings of ICASSP, vol. 1. San Francisco, CA, pp. 273-276.

262
- 0028996988
- Slifka, J., Anderson, T.R., 1995. Speaker modification with LPC pole analysis. In: Proceedings of ICASSP, Detroit, MI, pp. 644-647.

263
- 34547959054
- Song, M.G., Jung, H.I., Shim, K.-J., Kim, H.S., 1998. Speech recognition in car noise environments using multiple models according to noise masking levels. In: Proceedings of ICSLP.

264
- 34547946786
- Sotillo, C., Bard, E.G., 1998. Is hypo-articulation lexically constrained? In: Proceedings of SPoSS. Aix-en-Provence, pp. 109-112.

265
- 15844428932
- Human and machine consonant recognition
- Sroka J.J., and Braida L.D. Human and machine consonant recognition. Speech Communication 45 4 (2005) 401-423
- (2005) Speech Communication , vol.45 , Issue.4 , pp. 401-423
- Sroka, J.J.¹ Braida, L.D.²

266
- 0024945022
- Steeneken, H.J.M., van Velden, J.G., 1989. Objective and diagnostic assessment of (isolated) word recognizers. In: Proceedings of ICASSP, vol. 1. Glasgow, UK, pp. 540-543.

267
- 85009135204
- Stephenson, T.A., Bourlard, H., Bengio, S., Morris, A.C., 2000. Automatic speech recognition using dynamic Bayesian networks with both acoustic and articulatory variables. In: Proceedings of ICSLP, vol. 2. Beijing, China, pp. 951-954.

268
- 2442598130
- Speech recognition with auxiliary information
- Stephenson T.A., Doss M.M., and Bourlard H. Speech recognition with auxiliary information. IEEE Transactions on Speech and Audio Processing SAP-12 3 (2004) 189-203
- (2004) IEEE Transactions on Speech and Audio Processing , vol.SAP-12 , Issue.3 , pp. 189-203
- Stephenson, T.A.¹ Doss, M.M.² Bourlard, H.³

269
- 33947619591
- Stolcke, A., Grezl, F., Hwang, M.-Y., Morgan, N., Vergyri, D., 2006. Cross-domain and cross-language portability of acoustic features estimated by multilayer perceptrons. In: Proceedings of ICASSP, vol. 1. Toulouse, France, pp. 321-324.

270
- 0033335618
- Modeling pronunciation variation for ASR: a survey of the literature
- Strik H., and Cucchiarini C. Modeling pronunciation variation for ASR: a survey of the literature. Speech Communication 29 2-4 (1999) 225-246
- (1999) Speech Communication , vol.29 , Issue.2-4 , pp. 225-246
- Strik, H.¹ Cucchiarini, C.²

271
- 0028996878
- Sun, D.X., Deng, L., 1995. Analysis of acoustic-phonetic variations in fluent speech using TIMIT. In: Proceedings of ICASSP, Detroit, Michigan, pp. 201-204.

272
- 0141813642
- Suzuki, H., Zen, H., Nankaku, Y., Miyajima, C., Tokuda, K., Kitamura, T., 2003. Speech recognition using voice-characteristic-dependent acoustic models. In: Proceedings of ICASSP, vol. 1. Hong-Kong (canceled), pp. 740-743.

273
- 0024934062
- Svendsen, T., Paliwal, K.K., Harborg, E., Husoy, P.O., 1989. An improved sub-word based speech recognizer. In: Proceedings of ICASSP, Glasgow, UK, pp. 108-111.

274
- 0023211850
- Svendsen, T., Soong, F., 1987. On the automatic segmentation of speech signals. In: Proceedings of ICASSP, Dallas, Texas, pp. 77-80.

275
- 0030359630
- Teixeira, C., Trancoso, I., Serralheiro, A., 1996. Accent identification. In: Proceedings of ICSLP, vol. 3. Philadelphia, PA, pp. 1784-1787.

276
- 0031643033
- Thomson, D.L., Chengalvarayan, R., 1998. Use of periodicity and jitter as speech recognition feature. In: Proceedings of ICASSP, vol. 1. Seattle, WA, pp. 21-24.

277
- 0036642777
- Use of voicing features in HMM-based speech recognition
- Thomson D.L., and Chengalvarayan R. Use of voicing features in HMM-based speech recognition. Speech Communication 37 3-4 (2002) 197-211
- (2002) Speech Communication , vol.37 , Issue.3-4 , pp. 197-211
- Thomson, D.L.¹ Chengalvarayan, R.²

278
- 0030682291
- Tibrewala, S., Hermansky, H., 1997. Sub-band based recognition of noisy speech. In: Proceedings of ICASSP, Munich, Germany, pp. 1255-1258.

279
- 17544404143
- Tolba, H., Selouani, S.A., O'Shaughnessy, D., 2002. Auditory-based acoustic distinctive features and spectral cues for automatic speech recognition using a multi-stream paradigm. In: Proceedings of ICASSP, Orlando, FL, pp. 837-840.

280
- 85009162945
- Tolba, H., Selouani, S.A., O'Shaughnessy, D., 2003. Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for robust automatic speech recognition in low-SNR car environments. In: Proceedings of Eurospeech, Geneva, Switzerland, pp. 3085-3088.

281
- 0030643684
- Tomlinson, M.J., Russell, M.J., Moore, R.K., Buckland, A.P., Fawley, M.A., 1997. Modelling asynchrony in speech using elementary single-signal decomposition. In: Proceedings of ICASSP, Munich, Germany, pp. 1247-1250.

282
- 34547928782
- Townshend, B., Bernstein, J., Todic, O., Warren, E., 1998. Automatic text-independent pronunciation scoring of foreign language student speech. In: Proceedings of STiLL-1998, Stockholm, pp. 179-182.

283
- 34547953041
- Traunmüller, H., 1997. Perception of speaker sex, age and vocal effort. Technical Report, Institutionen för lingvistik, Stockholm Universitet.

284
- 18744406714
- Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation
- Tsakalidis S., Doumpiotis V., and Byrne W. Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation. IEEE Transactions on Speech and Audio Processing 13 3 (2005) 367-376
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.3 , pp. 367-376
- Tsakalidis, S.¹ Doumpiotis, V.² Byrne, W.³

285
- 18744411268
- Segmental eigenvoice with delicate eigenspace for improved speaker adaptation
- Tsao Y., Lee S.-M., and Lee L.-S. Segmental eigenvoice with delicate eigenspace for improved speaker adaptation. IEEE Transactions on Speech and Audio Processing 13 3 (2005) 399-411
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.3 , pp. 399-411
- Tsao, Y.¹ Lee, S.-M.² Lee, L.-S.³

286
- 34547949384
- Tuerk, A., Young, S., 1999. Modeling speaking rate using a between frame distance metric. In: Proceedings of Eurospeech, vol. 1. Budapest, Hungary, pp. 419-422.

287
- 34547947654
- Tuerk, C., Robinson, T., 1993. A new frequency shift function for reducing inter-speaker variance. In: Proceedings of Eurospeech, Berlin, Germany, pp. 351-354.

288
- 13544261390
- Combining active and semi-supervised learning for spoken language understanding
- Tur G., Hakkani-Tür D., and Schapire R.E. Combining active and semi-supervised learning for spoken language understanding. Speech Communication 45 2 (2005) 171-186
- (2005) Speech Communication , vol.45 , Issue.2 , pp. 171-186
- Tur, G.¹ Hakkani-Tür, D.² Schapire, R.E.³

289
- 34547955587
- Mel-cepstrum modulation spectrum (MCMS) features for robust ASR
- Tyagi V., McCowan I., Bourlard H., and Misra H. Mel-cepstrum modulation spectrum (MCMS) features for robust ASR. Proceedings of ASRU, St. Thomas, US Virgin Islands (2003) 381-386
- (2003) Proceedings of ASRU , pp. 381-386
- Tyagi, V.¹ McCowan, I.² Bourlard, H.³ Misra, H.⁴

290
- 33846197546
- Tyagi, V., Wellekens, C., 2005. Cepstrum representation of speech. In: Proceedings of ASRU, Cancun, Mexico.

291
- 33745214472
- Tyagi, V., Wellekens, C., Bourlard, H., 2005. On variable-scale piecewise stationary spectral analysis of speech signals for ASR. In: Proceedings of Interspeech, Lisbon, Portugal, pp. 209-212.

292
- 0035427182
- Multilingual speech recognition in seven languages
- Uebler U. Multilingual speech recognition in seven languages. Speech Communication 35 1-2 (2001) 53-69
- (2001) Speech Communication , vol.35 , Issue.1-2 , pp. 53-69
- Uebler, U.¹

293
- 34547941680
- Uebler, U., Boros, M., 1999. Recognition of non-native German speech with multilingual recognizers. In: Proceedings of Eurospeech, vol. 2. Budapest, Hungary, pp. 911-914.

294
- 0032761999
- Scale transform in speech analysis
- Umesh S., Cohen L., Marinovic N., and Nelson D. Scale transform in speech analysis. IEEE Transactions on Speech and Audio Processing 7 1 (1999) 40-45
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , Issue.1 , pp. 40-45
- Umesh, S.¹ Cohen, L.² Marinovic, N.³ Nelson, D.⁴

295
- 0141701264
- Utsuro, T., Harada, T., Nishizaki, H., Nakagawa, S., 2002. A confidence measure based on agreement among multiple LVCSR models - correlation between pair of acoustic models and confidence. In: Proceedings of ICSLP, Denver, Colorado, pp. 701-704.

296
- 0035427204
- Recognizing speech of goats, wolves, sheep and ... non-natives
- VanCompernolle D. Recognizing speech of goats, wolves, sheep and ... non-natives. Speech Communication 35 1-2 (2001) 71-79
- (2001) Speech Communication , vol.35 , Issue.1-2 , pp. 71-79
- VanCompernolle, D.¹

297
- 34547931326
- VanCompernolle, D., Smolders, J., Jaspers, P., Hellemans, T., 1991. Speaker clustering for dialectic robustness in speaker independent speech recognition. In: Proceedings of Eurospeech, Genova, Italy, pp. 723-726.

298
- 0030643685
- Vaseghi, S.V., Harte, N., Miller, B., 1997. Multi resolution phonetic/segmental features and models for HMM-based speech recognition. In: Proceedings of ICASSP, Munich, Germany, pp. 1263-1266.

299
- 84907336951
- Venkataraman, A., Stolcke, A., Wang, W., Vergyri, D., Ramana Rao Gadde, V., Zheng, J., 2004. An efficient repair procedure for quick transcriptions. In: Proceedings of ICSLP, Jeju Island, Korea.

300
- 0017482612
- Normalization of vowels by vocal-tract length and its application to vowel identification
- Wakita H. Normalization of vowels by vocal-tract length and its application to vowel identification. IEEE Transactions on Acoustics, Speech and Signal Processing 25 (1977) 183-192
- (1977) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.25 , pp. 183-192
- Wakita, H.¹

301
- 0027206092
- Speaker normalization and adaptation using second-order connectionist networks
- Watrous R. Speaker normalization and adaptation using second-order connectionist networks. IEEE Transactions Neural Networks 4 1 (1993) 21-30
- (1993) IEEE Transactions Neural Networks , vol.4 , Issue.1 , pp. 21-30
- Watrous, R.¹

302
- 34547956537
- Weintraub, M., Taussig, K., Hunicke-Smith, K., Snodgrass, A., 1996. Effect of speaking style on LVCSR performance. In: Proceedings Addendum of ICSLP, Philadelphia, PA, USA.

303
- 0036753897
- Speaker adaptive modeling by vocal tract normalization
- Welling L., Ney H., and Kanthak S. Speaker adaptive modeling by vocal tract normalization. IEEE Transactions on Speech and Audio Processing 10 6 (2002) 415-426
- (2002) IEEE Transactions on Speech and Audio Processing , vol.10 , Issue.6 , pp. 415-426
- Welling, L.¹ Ney, H.² Kanthak, S.³

304
- 34547958444
- Weng, F., Bratt, H., Neumeyer, L., Stolcke, A., 1997. A study of multilingual speech recognition. In: Proceedings of Eurospeech, vol. 1. Rhodes, Greece, pp. 359-362.

305
- 33745183789
- Wesker, T., Meyer, B., Wagener, K., Anemüller, J., Mertins, A., Kollmeier, B., 2005. Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines. In: Proceedings of Interspeech. Lisboa, Portugal, pp. 1273-1276.

306
- 34547946949
- Westphal, M., 1997. The use of cepstral means in conversational speech recognition. In: Proceedings of Eurospeech, vol. 3. Rhodes, Greece, pp. 1143-1146.

307
- 34547929803
- Williams, D.A.G., 1999. Knowing what you don't know: Roles for confidence measures in automatic speech recognition. PhD thesis, University of Sheffield.

308
- 0029747582
- Wilpon, J.G., Jacobsen, C.N., 1996. A study of speech recognition for children and the elderly. In: Proceedings of ICASSP, vol. 1. Atlanta, Georgia, pp. 349-352.

309
- 34547962144
- Witt, S.M., Young, S.J., 1999. Off-line acoustic modelling of non-native accents. In: Proceedings of Eurospeech, vol. 3. Budapest, Hungary, pp. 1367-1370.

310
- 0034140966
- Phone-level pronunciation scoring and assessment for interactive language learning
- Witt S.M., and Young S.J. Phone-level pronunciation scoring and assessment for interactive language learning. Speech Communication 30 2-3 (2000) 95-108
- (2000) Speech Communication , vol.30 , Issue.2-3 , pp. 95-108
- Witt, S.M.¹ Young, S.J.²

311
- 4544358972
- Wong, P.-F., Siu, M.-H., 2004. Decision tree based tone modeling for Chinese speech recognition. In: Proceedings of ICASSP, vol. 1. Montreal, Canada, pp. 905-908.

312
- 84892320894
- Wrede, B., Fink, G.A., Sagerer, G., 2001. An investigation of modeling aspects for rate-dependent speech recognition. In: Proceedings of Eurospeech, Aalborg, Denmark.

313
- 2142715607
- Speaker adaptation using constrained transformation
- Wu X., and Yan Y. Speaker adaptation using constrained transformation. IEEE Transactions on Speech and Audio Processing 12 2 (2004) 168-174
- (2004) IEEE Transactions on Speech and Audio Processing , vol.12 , Issue.2 , pp. 168-174
- Wu, X.¹ Yan, Y.²

314
- 0024053854
- Yang, W.-J., Lee, J.-C., Chang, Y.-C., Wang, H.-C., 1988. Hidden Markov Model for Mandarin lexical tone recognition. In: IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36. pp. 988-992.

315
- 0029745232
- Zavaliagkos, G., Schwartz, R., McDonough, J., 1996. Maximum a posteriori adaptation for large scale HMM recognizers. In: Proceedings of ICASSP, Atlanta, Georgia, pp. 725-728.

316
- 33646793688
- Zhang, B., Matsoukas, S., 2005. Minimum phoneme error based heteroscedastic linear discriminant analysis for speech recognition. In: Proceedings of ICASSP, vol. 1. Philadelphia, PA, pp. 925-928.

317
- 34547947108
- Zhan, P., Waibel, A., 1997. Vocal tract length normalization for large vocabulary continuous speech recognition. Technical Report CMU-CS-97-148, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.

318
- 0030705337
- Zhan, P., Westphal, M., 1997. Speaker normalization based on frequency warping. In: Proceedings of ICASSP, vol. 2. Munich, Germany, pp. 1039-1042.

319
- 0028444748
- Zhang, Y., Desilva, C.J.S., Togneri, A., Alder, M., Attikiouzel, Y., 1994. Speaker-independent isolated word recognition using multiple Hidden Markov Models. In: Proceedings IEE Vision, Image and Signal Processing, vol. 141(3). pp. 197-202.

320
- 34547937771
- Zheng, J., Franco, H., Stolcke, A., 2000. Rate of speech modeling for large vocabulary conversational speech recognition. In: Proceedings of ISCA tutorial and research workshop on automatic speech recognition: challenges for the new Millenium. Paris, France, pp. 145-149.

321
- 85009146015
- Zheng, J., Franco, H., Stolcke, A., 2004. Effective acoustic modeling for rate-of-speech variation in large vocabulary conversational speech recognition. In: Proceedings of ICSLP, Jeju Island, Korea, pp. 401-404.

322
- 22544443963
- Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation
- Zhou B., and Hansen J.H.L. Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation. IEEE Transactions on Speech and Audio Processing 13 4 (2005) 554-564
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.4 , pp. 554-564
- Zhou, B.¹ Hansen, J.H.L.²

323
- 19244369058
- Zhou, G., Deisher, M.E., Sharma, S., 2002. Causal analysis of speech recognition failure in adverse environments. In: Proceedings of ICASSP, vol. 4. Orlando, Florida, pp. 3816-3819.

324
- 85009110489
- Zhu, Q., Alwan, A., 2000. AM-demodulation of speech spectra and its application to noise robust speech recognition. In: Proceedings of ICSLP, vol. 1. Beijing, China, pp. 341-344.

325
- 4544257926
- Zhu, D., Paliwal, K.K., 2004. Product of power spectrum and group delay function for speech recognition. In: Proceedings of ICASSP, pp. 125-128.

326
- 85009097225
- Zhu, Q., Chen, B., Morgan, N., Stolcke, A., 2004. On using MLP features in LVCSR. In: Proceedings of ICSLP, Jeju Island, Korea.

327
- 85009247781
- Zolnay, A., Schlüter, R., Ney, H., 2002. Robust speech recognition using a voiced-unvoiced feature. In: Proceedings of ICSLP, vol. 2. Denver, CO, pp. 1065-1068.

328
- 33646767079
- Zolnay, A., Schlüter, R., Ney, H., 2005. Acoustic feature combination for robust speech recognition. In: Proceedings of ICASSP, vol. I. Philadelphia, PA, pp. 457-460.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.