SCOPUS Information Retrieval Platform

ICMI'04 - Sixth International Conference on Multimodal Interfaces

Volume , Issue , 2004, Pages 152-158

Articulatory features for robust visual speech recognition

(3) Saenko, Kate (a); Darrell, Trevor (a); Glass, James (a)

(a) Massachusetts Institute of Technology (United States)

Author keywords

Articulatory features; Audio visual speech recognition; Multimodal interfaces; Speechreading; Support vector machines; Visual feature extraction

Indexed keywords

FEATURE EXTRACTION; HUMAN COMPUTER INTERACTION; IMAGE ANALYSIS; INTERFACES (COMPUTER); SPEECH ANALYSIS; VISUAL COMMUNICATION;

ARTICULATORY FEATURES; AUDIO-VISUAL SPEECH RECOGNITION; MULTIMODAL INTERFACES; SPEECHREADING; SUPPORT VECTOR MACHINES; VISUAL FEATURE EXTRACTION;

SPEECH RECOGNITION;

EID: 14944351246 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1027933.1027960 Document Type: Conference Paper

Times cited: (30)

References (33)

1
- 0001432664
- On the integration of auditory and visual parameters in HMM-based ASR
- D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer
- A. Adjoudani and C. Benoit, "On the integration of auditory and visual parameters in HMM-based ASR," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer, pp. 461-471, 1996.
- (1996) Speechreading by Humans and Machines , pp. 461-471
- Adjoudani, A.¹ Benoit, C.²

2
- 0003152968
- Speech enhancement in the 1980s: Noise suppression with pattern matching
- Dekker
- S. Boll, "Speech enhancement in the 1980s: noise suppression with pattern matching," In Advances in Speech Signal Processing, pp. 309-325, Dekker, 1992.
- (1992) Advances in Speech Signal Processing , pp. 309-325
- Boll, S.¹

3
- 85013597845
- Eigenlips for robust speech recognition
- C. Bregler and Y. Konig, "Eigenlips for Robust Speech Recognition," In Proc. ICASSP, 1994.
- (1994) Proc. ICASSP
- Bregler, C.¹ Konig, Y.²

4
- 84925639646
- Real-time lip tracking and bimodal continuous speech recognition
- Redondo Beach, CA
- M. Chan, Y. Zhang, and T. Huang, "Real-time lip tracking and bimodal continuous speech recognition," in Proc. Works. Multimedia Signal Processing, pp. 65-70, Redondo Beach, CA, 1998.
- (1998) Proc. Works. Multimedia Signal Processing , pp. 65-70
- Chan, M.¹ Zhang, Y.² Huang, T.³

5
- 0003710380
- C. Chang and C. Lin, LIBSVM: A Library For Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
- (2001) LIBSVM: A Library for Support Vector Machines
- Chang, C.¹ Lin, C.²

6
- 0004119259
- Harper and Row, New York
- N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, New York, 1968.
- (1968) The Sound Pattern of English
- Chomsky, N.¹ Halle, M.²

7
- 85009135946
- Bimodal speech recognition using coupled hidden Markov models
- Beijing, China
- S. Chu and T. Huang, "Bimodal speech recognition using coupled hidden Markov models," In Proc. Int. Conf. Spoken Lang. Processing, vol. II, Beijing, China, pp. 747-750, 2000.
- (2000) Proc. Int. Conf. Spoken Lang. Processing , vol.2 , pp. 747-750
- Chu, S.¹ Huang, T.²

8
- 84957810778
- Active appearance models
- Germany
- T. Cootes, G. Edwards, and C. Taylor, "Active appearance models," In Proc. Europ. Conf. Computer Vision, Germany, pp. 484-498, 1998.
- (1998) Proc. Europ. Conf. Computer Vision , pp. 484-498
- Cootes, T.¹ Edwards, G.² Taylor, C.³

9
- 0029182228
- Active shape models - Their training and application
- T. Cootes, C. Taylor, D. Cooper, and J. Graham, "Active shape models - their training and application," Computer Vision Image Understanding, vol. 61, no. 1, pp. 38-59, 1995.
- (1995) Computer Vision Image Understanding , vol.61 , Issue.1 , pp. 38-59
- Cootes, T.¹ Taylor, C.² Cooper, D.³ Graham, J.⁴

10
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

11
- 0003418124
- Netherlands: Mouton and Co.
- G. Fant, Acoustic Theory of Speech Production, Netherlands: Mouton and Co., 1960.
- (1960) Acoustic Theory of Speech Production
- Fant, G.¹

12
- 0036875002
- A support vector machine based dynamic network for visual speech recognition applications
- M. Gordan, C. Kotropoulos, and I. Pitas, "A support vector machine based dynamic network for visual speech recognition applications," EURASIP J. Appl. Signal Processing, vol. 2002, no. 11, pp. 1248-1259, 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , Issue.11 , pp. 1248-1259
- Gordan, M.¹ Kotropoulos, C.² Pitas, I.³

13
- 0034841727
- Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition
- Salt Lake City, UT
- S. Gurbuz, Z. Tufekci, E. Patterson, and J. Gowdy, "Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition," in Proc. Int. Conf. Acoust., Speech, Signal Processing, pp. 177-180, Salt Lake City, UT, 2001.
- (2001) Proc. Int. Conf. Acoust., Speech, Signal Processing , pp. 177-180
- Gurbuz, S.¹ Tufekci, Z.² Patterson, E.³ Gowdy, J.⁴

14
- 34250090755
- Snakes: Active contour models
- M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," Int. J. Computer Vision, vol. 1, no. 4, pp. 321-331, 1988.
- (1988) Int. J. Computer Vision , vol.1 , Issue.4 , pp. 321-331
- Kass, M.¹ Witkin, A.² Terzopoulos, D.³

15
- 79952968027
- Speech recognition via phonetically featured syllables
- Sydney
- S. King, T. Stephenson, S. Isard, P. Taylor and A. Strachan, "Speech recognition via phonetically featured syllables," In Proc. ICSLP, Sydney, 1998.
- (1998) Proc. ICSLP
- King, S.¹ Stephenson, T.² Isard, S.³ Taylor, P.⁴ Strachan, A.⁵

16
- 0038193561
- Combining acoustic and articulatory-feature information for robust speech recognition
- Sydney
- K. Kirchhoff, G. Fink and G. Sagerer, "Combining Acoustic and Articulatory-feature Information for Robust Speech Recognition," In Proc. ICSLP, pp. 891-894, Sydney, 1998.
- (1998) Proc. ICSLP , pp. 891-894
- Kirchhoff, K.¹ Fink, G.² Sagerer, G.³

17
- 14944340400
- Neural architectures for sensor fusion in speech recognition
- Greece
- G. Krone, B. Talle, A. Wichert, and G. Palm, "Neural architectures for sensor fusion in speech recognition," In Proc. Europ. Tut. Works. Audio-Visual Speech Processing, pp. 57-60, Greece, 1997.
- (1997) Proc. Europ. Tut. Works. Audio-visual Speech Processing , pp. 57-60
- Krone, G.¹ Talle, B.² Wichert, A.³ Palm, G.⁴

18
- 14944341906
- Feature-based pronunciation modeling for speech recognition
- Boston
- K. Livescu and J. Glass, "Feature-based Pronunciation Modeling for Speech Recognition," In Proc. HLT/NAACL, Boston, 2004.
- (2004) Proc. HLT/NAACL
- Livescu, K.¹ Glass, J.²

19
- 0025750892
- Automatic lipreading by optical flow analysis
- K. Mase and A. Pentland, "Automatic Lipreading by optical flow analysis," Systems and Computers in Japan, vol. 22, no. 6, pp. 67-76, 1991.
- (1991) Systems and Computers in Japan , vol.22 , Issue.6 , pp. 67-76
- Mase, K.¹ Pentland, A.²

20
- 0036472941
- Extraction of visual features for lipreading
- I. Matthews, T. Cootes, A. Bangham, S. Cox and R. Harvey, "Extraction of Visual Features for Lipreading," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, 2002.
- (2002) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.24 , Issue.2
- Matthews, I.¹ Cootes, T.² Bangham, A.³ Cox, S.⁴ Harvey, R.⁵

21
- 85009240321
- A flexible stream architecture for ASR using articulatory features
- Denver
- F. Metze, and A. Waibel, "A Flexible Stream Architecture for ASR Using Articulatory Features," In Proc. ICSLP, Denver, 2002.
- (2002) Proc. ICSLP
- Metze, F.¹ Waibel, A.²

22
- 84955023511
- An analysis of perceptual confusions among some English consonants
- G. Miller and P. Nicely, "An Analysis of Perceptual Confusions among some English Consonants," J. Acoustical Society America, vol. 27, no. 2, pp. 338-352, 1955.
- (1955) J. Acoustical Society America , vol.27 , Issue.2 , pp. 338-352
- Miller, G.¹ Nicely, P.²

23
- 0035790960
- Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins summer 2000 workshop
- Cannes, France
- C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, and D. Vergyri, "Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins Summer 2000 Workshop," In Proc. Works. Signal Processing, pp. 619-624, Cannes, France, 2001.
- (2001) Proc. Works. Signal Processing , pp. 619-624
- Neti, C.¹ Potamianos, G.² Luettin, J.³ Matthews, I.⁴ Glotin, H.⁵ Vergyri, D.⁶

24
- 0033676801
- Denoising of human speech using combined acoustic and EM sensor signal processing
- Istanbul, Turkey
- L. Ng, G. Burnett, J. Holzrichter, and T. Gable, "Denoising of Human Speech Using Combined Acoustic and EM Sensor Signal Processing," In Proc. ICASSP, Istanbul, Turkey, 2000.
- (2000) Proc. ICASSP
- Ng, L.¹ Burnett, G.² Holzrichter, J.³ Gable, T.⁴

25
- 84900117327
- Feature based representation for audio-visual speech recognition
- Santa Cruz, CA
- P. Niyogi, E. Petajan, and J. Zhong, "Feature Based Representation for Audio-Visual Speech Recognition", Proceedings of the Audio Visual Speech Conference, Santa Cruz, CA, 1999.
- (1999) Proceedings of the Audio Visual Speech Conference
- Niyogi, P.¹ Petajan, E.² Zhong, J.³

26
- 0021541159
- Automatic lipreading to enhance speech recognition
- Atlanta, GA
- E. Petajan, "Automatic lipreading to enhance speech recognition," In Proc. Global Telecomm. Conf., pp. 265-272, Atlanta, GA, 1984.
- (1984) Proc. Global Telecomm. Conf. , pp. 265-272
- Petajan, E.¹

27
- 85009230873
- Audio-visual speech recognition in challenging environments
- Geneva
- G. Potamianos and C. Neti, "Audio-visual speech recognition in challenging environments," In Proc. Eur. Conf. Speech Comm. Tech., pp. 1293-1296, Geneva, 2003.
- (2003) Proc. Eur. Conf. Speech Comm. Tech. , pp. 1293-1296
- Potamianos, G.¹ Neti, C.²

28
- 4544290191
- Recent advances in the automatic recognition of audio-visual speech
- G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. Senior, "Recent Advances in the Automatic Recognition of Audio-Visual Speech", In Proc. IEEE, 2003.
- (2003) Proc. IEEE
- Potamianos, G.¹ Neti, C.² Gravier, G.³ Garg, A.⁴ Senior, A.⁵

29
- 0034517163
- A cascade image transform for speaker-independent automatic speechreading
- New York
- G. Potamianos, A. Verma, C. Neti, G. Iyengar, and S. Basu, "A Cascade Image Transform for Speaker-Independent Automatic Speechreading," In Proc. ICME, volume II, pp. 1097-1100, New York, 2000.
- (2000) Proc. ICME , vol.2 , pp. 1097-1100
- Potamianos, G.¹ Verma, A.² Neti, C.³ Iyengar, G.⁴ Basu, S.⁵

30
- 0001048664
- Visual contribution to speech intelligibility in noise
- W. Sumby, and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoustical Society America, vol. 26, no. 2, pp. 212-215, 1954.
- (1954) J. Acoustical Society America , vol.26 , Issue.2 , pp. 212-215
- Sumby, W.¹ Pollack, I.²

31
- 0036165806
- An overlapping-feature based phonological model incorporating linguistic constraints: Applications to speech recognition
- J. Sun and L. Deng, "An Overlapping-Feature Based Phonological Model Incorporating Linguistic Constraints: Applications to Speech Recognition", J. Acoustic Society of America, vol. 111, No. 2, pp. 1086-1101, 2002.
- (2002) J. Acoustic Society of America , vol.111 , Issue.2 , pp. 1086-1101
- Sun, J.¹ Deng, L.²

32
- 0003770986
- Comparing models for audiovisual fusion in a noisy-vowel recognition task
- P. Teissier, J. Robert-Ribes, and J. Schwartz, "Comparing models for audiovisual fusion in a noisy-vowel recognition task," IEEE Trans. Speech Audio Processing, vol. 7, no. 6, pp. 629-642, 1999.
- (1999) IEEE Trans. Speech Audio Processing , vol.7 , Issue.6 , pp. 629-642
- Teissier, P.¹ Robert-Ribes, J.² Schwartz, J.³

33
- 0003991806
- J. Wiley, New York
- V. Vapnik, Statistical Learning Theory, J. Wiley, New York, 1998.
- (1998) Statistical Learning Theory
- Vapnik, V.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.