SCOPUS Information Retrieval Platform

Visual Speech Recognition: Lip Segmentation and Mapping

Volume , Issue , 2009, Pages 1-38

Audio-visual and visual-only speech and speaker recognition: Issues about theory, system design, and implementation

(4) Shiell, Derek J. (a); Terry, Louis H. (a); Aleksic, Petar S. (b); Katsaggelos, Aggelos K. (c)

(a) Northwestern University (United States)

(b) Google Inc (United States)

(c) Northwestern University (United States)

EID: 84870243209
PISSN: None
EISSN: None
Source Type: Book
DOI: 10.4018/978-1-60566-186-5.ch001
Document Type: Chapter

Times cited : (6)

References (109)

1
- 84900182369
- The AAM-API, Retrieved November, 2007, from
- The AAM-API. (2008). Retrieved November, 2007, from http://www2.imm.dtu.dk/~aam/aamapi/
- (2008)

2
- 84900171816
- December, Workshop on Multimedia User Authentication (MMUA), Santa Barbara, CA
- Aleksic, P. S., & Katsaggelos, A. K. (2003a, December). An audio-visual person identification and verification system using FAPs as visual features. Workshop on Multimedia User Authentication (MMUA), (pp. 80-84), Santa Barbara, CA.
- (2003) An Audio-visual Person Identification and Verification System Using FAPs as Visual Features , pp. 80-84
- Aleksic, P.S.¹ Katsaggelos, A.K.²

3
- 34247584561
- Product HMMs for audio-visual continuous speech recognition using facial animation parameters
- July, In, Baltimore, MD
- Aleksic, P. S., & Katsaggelos, A. K. (2003b, July). Product HMMs for Audio-Visual Continuous Speech Recognition Using Facial Animation Parameters. In Proceedings of IEEE Int. Conf. on Multimedia & Expo (ICME), Vol. 2, (pp. 481-484), Baltimore, MD.
- (2003) Proceedings of IEEE Int. Conf. on Multimedia & Expo (ICME) , vol.2 , pp. 481-484
- Aleksic, P.S.¹ Katsaggelos, A.K.²

4
- 2542499812
- Speech-to-video synthesis using MPEG-4 compliant visual features
- Aleksic, P. S., & Katsaggelos, A. K. (2004). Speech-to-video synthesis using MPEG-4 compliant visual features. IEEE Trans. CSVT, Special Issue Audio Video Analysis for Multimedia Interactive Services, 682-692.
- (2004) IEEE Trans. CSVT, Special Issue Audio Video Analysis for Multimedia Interactive Services , pp. 682-692
- Aleksic, P.S.¹ Katsaggelos, A.K.²

5
- 33947384963
- Audio-visual biometrics
- Aleksic, P. S., & Katsaggelos, A. K. (2006). Audio-Visual Biometrics. IEEE Proceedings, 94(11), 2025-2044.
- (2006) IEEE Proceedings , vol.94 , Issue.11 , pp. 2025-2044
- Aleksic, P.S.¹ Katsaggelos, A.K.²

6
- 33947376624
- Exploiting visual information in automatic speech processing
- In, Academic Press
- Aleksic, P. S., Potamianos, G., & Katsaggelos, A. K. (2005). Exploiting visual information in automatic speech processing. In Handbook of Image and Video Processing (pp. 1263-1289): Academic Press.
- (2005) Handbook of Image and Video Processing , pp. 1263-1289
- Aleksic, P.S.¹ Potamianos, G.² Katsaggelos, A.K.³

7
- 0036874915
- Audio-visual speech recognition using MPEG-4 compliant visual features
- Aleksic, P. S., Williams, J. J., Wu, Z., & Katsaggelos, A. K. (2002). Audio-visual speech recognition using MPEG-4 compliant visual features. EURASIP Journal on Applied Signal Processing, 1213-1227.
- (2002) EURASIP Journal on Applied Signal Processing , pp. 1213-1227
- Aleksic, P.S.¹ Williams, J.J.² Wu, Z.³ Katsaggelos, A.K.⁴

8
- 0042003432
- Carnegie Mellon University Robotics Institute
- Baker, S., Gross, R., & Matthews, I. (2003a). Lucas-Kanade 20 years on: A unifying framework: Part 2. Carnegie Mellon University Robotics Institute.
- (2003) Lucas-Kanade 20 Years On: A Unifying Framework: Part 2
- Baker, S.¹ Gross, R.² Matthews, I.³

9
- 0042003432
- Carnegie Mellon University Robotics Institute
- Baker, S., Gross, R., & Matthews, I. (2003b). Lucas-Kanade 20 years on: A unifying framework: Part 3. Carnegie Mellon University Robotics Institute.
- (2003) Lucas-Kanade 20 Years On: A Unifying Framework: Part 3
- Baker, S.¹ Gross, R.² Matthews, I.³

10
- 1542285823
- Lucas-Kanade 20 years on: A unifying framework
- Baker, S., & Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. Int. J. Comput. Vision, 56(3), 221-255.
- (2004) Int. J. Comput. Vision , vol.56 , Issue.3 , pp. 221-255
- Baker, S.¹ Matthews, I.²

11
- 0034853042
- Paper presented at the Int. Conf. Acoustics, Speech Signal Processing
- Barbosa, A. V., & Yehia, H. C. (2001). Measuring the relation between speech acoustics and 2-D facial motion. Paper presented at the Int. Conf. Acoustics, Speech Signal Processing.
- (2001) Measuring the Relation Between Speech Acoustics and 2-D Facial Motion
- Barbosa, A.V.¹ Yehia, H.C.²

12
- 22944438473
- Evaluation of a boosted cascade of Haar-like features in the presence of partial occlusions and shadows for real time face detection
- In, Berlin, Germany: Springer
- Barczak, A. L. C. (2004). Evaluation of a Boosted Cascade of Haar-Like Features in the Presence of Partial Occlusions and Shadows for Real Time Face Detection. In PRICAI 2004: Trends in Artificial Intelligence, 3157, 969-970. Berlin, Germany: Springer.
- (2004) PRICAI 2004: Trends in Artificial Intelligence , vol.3157 , pp. 969-970
- Barczak, A.L.C.¹

13
- 84900241350
- Paper presented at the Int. Conf. Auditory Visual Speech Processing
- Barker, J. P., & Berthommier, F. (1999). Estimation of speech acoustics from visual speech features: A comparison of linear and non-linear models. Paper presented at the Int. Conf. Auditory Visual Speech Processing.
- (1999) Estimation of Speech Acoustics from Visual Speech Features: A Comparison of Linear and Non-linear Models
- Barker, J.P.¹ Berthommier, F.²

14
- 0032594952
- Fusion of face and speech data for person identity verification
- Ben-Yacoub, S., Abdeljaoued, Y., & Mayoraz, E. (1999). Fusion of face and speech data for person identity verification. IEEE Trans. Neural Networks, 10, 1065-1074.
- (1999) IEEE Trans. Neural Networks , vol.10 , pp. 1065-1074
- Ben-Yacoub, S.¹ Abdeljaoued, Y.² Mayoraz, E.³

15
- 84900164299
- Paper presented at the Speaker and Language Recognition Workshop (Odyssey)
- Bengio, S., & Mariethoz, J. (2004). A statistical significance test for person authentication. Paper presented at the Speaker and Language Recognition Workshop (Odyssey).
- (2004) A Statistical Significance Test for Person Authentication
- Bengio, S.¹ Mariethoz, J.²

16
- 84900120370
- Paper presented at the Int. Conf. Machine Learning, Workshop ROC Analysis Machine Learning
- Bengio, S., Mariethoz, J., & Keller, M. (2005). The expected performance curve. Paper presented at the Int. Conf. Machine Learning, Workshop ROC Analysis Machine Learning.
- (2005) The Expected Performance Curve
- Bengio, S.¹ Mariethoz, J.² Keller, M.³

17
- 0344044794
- Gallaudet University, Washington, D.C
- Bernstein, L. E. (1991). Lipreading Corpus V-VI: Disc 3. Gallaudet University, Washington, D.C.
- (1991) Lipreading Corpus V-VI: Disc 3
- Bernstein, L.E.¹

18
- 69949145831
- Northwestern University, Evanston
- Biffiger, R. (2005). Audio-Visual Automatic Isolated Digits Recognition. Northwestern University, Evanston.
- (2005) Audio-Visual Automatic Isolated Digits Recognition
- Biffiger, R.¹

19
- 24644442878
- Paper presented at the Computer Vision Pattern Recognition
- Blanz, V., Grother, P., Phillips, P. J., & Vetter, T. (2005). Face recognition based on frontal views generated from non-frontal images. Paper presented at the Computer Vision Pattern Recognition.
- (2005) Face Recognition Based on Frontal Views Generated from Non-frontal Images
- Blanz, V.¹ Grother, P.² Phillips, P.J.³ Vetter, T.⁴

20
- 27844534088
- A survey of approaches and challenges in 3-D and multi-modal 3-D face recognition
- Bowyer, K. W., Chang, K., & Flynn, P. (2006). A survey of approaches and challenges in 3-D and multi-modal 3-D face recognition. Computer Vision Image Understanding, 101(1), 1-15.
- (2006) Computer Vision Image Understanding , vol.101 , Issue.1 , pp. 1-15
- Bowyer, K.W.¹ Chang, K.² Flynn, P.³

21
- 0029393187
- Person identification using multiple cues
- Brunelli, R., & Falavigna, D. (1995). Person identification using multiple cues. IEEE Trans. Pattern Anal. Machine Intell., 17(10), 955-965.
- (1995) IEEE Trans. Pattern Anal. Machine Intell , vol.17 , Issue.10 , pp. 955-965
- Brunelli, R.¹ Falavigna, D.²

22
- 0031233424
- Speaker recognition: A tutorial
- Campbell, J. P. (1997). Speaker recognition: A tutorial. Proceedings of the IEEE, 85(9), 1437-1462.
- (1997) Proceedings of the IEEE , vol.85 , Issue.9 , pp. 1437-1462
- Campbell, J.P.¹

23
- 0003835127
- Hove, U.K., Psychology Press
- Campbell, R., Dodd, B., & Burnham, D. (Eds.). (1998). Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory Visual Speech. Hove, U.K.: Psychology Press.
- (1998) Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory Visual Speech
- Campbell, R.¹ Dodd, B.² Burnham, D.³

24
- 17144413799
- An evaluation of multimodal 2D + 3D face biometrics
- Chang, K. I., Bowyer, K. W., & Flynn, P. J. (2005). An evaluation of multimodal 2D + 3D face biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4), 619-624.
- (2005) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.27 , Issue.4 , pp. 619-624
- Chang, K.I.¹ Bowyer, K.W.² Flynn, P.J.³

25
- 84900237502
- Paper presented at the Int. Conf. Multimedia Expo
- Chaudhari, U. V., & Ramaswamy, G. N. (2003). Information fusion and decision cascading for audio-visual speaker recognition based on time-varying stream reliability prediction. Paper presented at the Int. Conf. Multimedia Expo.
- (2003) Information Fusion and Decision Cascading for Audio-visual Speaker Recognition Based on Time-varying Stream Reliability Prediction
- Chaudhari, U.V.¹ Ramaswamy, G.N.²

26
- 85143190944
- Paper presented at the Int. Conf. Acoustics, Speech Signal Processing, Hong Kong, China
- Chaudhari, U. V., Ramaswamy, G. N., Potamianos, G., & Neti, C. (2003). Audio-visual speaker recognition using time-varying stream reliability prediction. Paper presented at the Int. Conf. Acoustics, Speech Signal Processing, Hong Kong, China.
- (2003) Audio-visual Speaker Recognition Using Time-varying Stream Reliability Prediction
- Chaudhari, U.V.¹ Ramaswamy, G.N.² Potamianos, G.³ Neti, C.⁴

27
- 85032752352
- Audiovisual speech processing
- Chen, T. (2001). Audiovisual speech processing. IEEE Signal Processing Mag., 18, 9-21.
- (2001) IEEE Signal Processing Mag , vol.18 , pp. 9-21
- Chen, T.¹

28
- 84900222377
- Paper presented at the IEEE Int. Symp. Multimedia Technologies Future Appl., Southampton, U.K
- Chibelushi, C. C., Deravi, F., & Mason, J. S. (1993). Voice and facial image integration for speaker recognition. Paper presented at the IEEE Int. Symp. Multimedia Technologies Future Appl., Southampton, U.K.
- (1993) Voice and Facial Image Integration for Speaker Recognition
- Chibelushi, C.C.¹ Deravi, F.² Mason, J.S.³

29
- 0031335829
- Paper presented at the Eur. Conf. Security Detection, London, U.K
- Chibelushi, C. C., Deravi, F., & Mason, J. S. (1997). Audio-visual person recognition: An evaluation of data fusion strategies. Paper presented at the Eur. Conf. Security Detection, London, U.K.
- (1997) Audio-visual Person Recognition: An Evaluation of Data Fusion Strategies
- Chibelushi, C.C.¹ Deravi, F.² Mason, J.S.³

30
- 0036502797
- A review of speech-based bimodal recognition
- Chibelushi, C. C., Deravi, F., & Mason, J. S. (2002). A review of speech-based bimodal recognition. IEEE Trans. Multimedia, 4(1), 23-37.
- (2002) IEEE Trans. Multimedia , vol.4 , Issue.1 , pp. 23-37
- Chibelushi, C.C.¹ Deravi, F.² Mason, J.S.³

31
- 46149146042
- Design issues for a digital audio-visual integrated database
- Paper presented at the Integrated Audio-Visual Processing for Recognition, Synthesis and Communication (Digest No: 1996/213), IEE Colloquium on
- Chibelushi, C. C., Gandon, S., Mason, J. S. D., Deravi, F., & Johnston, R. D. (1996). Design issues for a digital audio-visual integrated database. Paper presented at the Integrated Audio-Visual Processing for Recognition, Synthesis and Communication (Digest No: 1996/213), IEE Colloquium on.
- (1996)
- Chibelushi, C.C.¹ Gandon, S.² Mason, J.S.D.³ Deravi, F.⁴ Johnston, R.D.⁵

32
- 80052878189
- Retrieved November, 2007, from
- Cootes, T. (2008). Modelling and Search Software. Retrieved November, 2007, from http://www.isbe.man.ac.uk/~bim/software/am_tools_doc/index.html
- (2008) Modelling and Search Software
- Cootes, T.¹

33
- 84900072132
- Paper presented at the British Machine Vision Conference
- Cootes, T., Edwards, G., & Taylor, C. (1998). A comparative evaluation of active appearance models algorithms. Paper presented at the British Machine Vision Conference.
- (1998) A Comparative Evaluation of Active Appearance Models Algorithms
- Cootes, T.¹ Edwards, G.² Taylor, C.³

34
- 0035363218
- Active appearance models
- Cootes, T., Edwards, G., & Taylor, C. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 681-685.
- (2001) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.23 , pp. 681-685
- Cootes, T.¹ Edwards, G.² Taylor, C.³

35
- 0003424145
- Englewood Cliffs, NJ, Macmillan
- Deller, J. R., Jr., Proakis, J. G., & Hansen, J. H. L. (1993). Discrete-Time Processing of Speech Signals. Englewood Cliffs, NJ: Macmillan.
- (1993) Discrete-Time Processing of Speech Signals
- Deller Jr., J.R.¹ Proakis, J.G.² Hansen, J.H.L.³

36
- 0003922190
- Hoboken, NJ: Wiley
- Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification. Hoboken, NJ: Wiley.
- (2001) Pattern Classification
- Duda, R.O.¹ Hart, P.E.² Stork, D.G.³

37
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- Dupont, S., & Luettin, J. (2000). Audio-visual speech modeling for continuous speech recognition. IEEE Trans. Multimedia, 2(3), 141-151.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

38
- 0003757962
- Berlin, Germany, Springer-Verlag
- Flanagan, J. L. (1965). Speech Analysis, Synthesis, and Perception. Berlin, Germany: Springer-Verlag.
- (1965) Speech Analysis, Synthesis, and Perception
- Flanagan, J.L.¹

39
- 84921606034
- Paper presented at the ACM SIGMM 2003 Multimedia Biometrics Methods and Applications Workshop (WBMA'03), Berkeley, CA
- Fox, N. A., Gross, R., de Chazal, P., Cohn, J. F., & Reilly, R. B. (2003). Person identification using automatic integration of speech, lip, and face experts. Paper presented at the ACM SIGMM 2003 Multimedia Biometrics Methods and Applications Workshop (WBMA'03), Berkeley, CA.
- (2003) Person Identification Using Automatic Integration of Speech, Lip, and Face Experts
- Fox, N.A.¹ Gross, R.² de Chazal, P.³ Cohn, J.F.⁴ Reilly, R.B.⁵

40
- 84900082399
- Paper presented at the 5th International Conference on Audio- and Video-Based Biometric Person Authentication
- Fox, N. A., O'Mullane, B., & Reilly, R. B. (2005). The realistic multi-modal VALID database and visual speaker identification comparison experiments. Paper presented at the 5th International Conference on Audio- and Video-Based Biometric Person Authentication.
- (2005) The Realistic Multi-modal VALID Database and Visual Speaker Identification Comparison Experiments
- Fox, N.A.¹ O'Mullane, B.² Reilly, R.B.³

41
- 0012745879
- Rationale for phoneme-viseme mapping and feature selection in visual speech recognition
- In D. G. Stork & M. E. Hennecke (Eds.), Berlin, Germany, Springer
- Goldschen, A. J., Garcia, O. N., & Petajan, E. D. (1996). Rationale for phoneme-viseme mapping and feature selection in visual speech recognition. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines (pp. 505-515). Berlin, Germany: Springer.
- (1996) Speechreading by Humans and Machines , pp. 505-515
- Goldschen, A.J.¹ Garcia, O.N.² Petajan, E.D.³

42
- 0036875002
- A support vector machine-based dynamic network for visual speech recognition applications
- Gordan, M., Kotropoulos, C., & Pitas, I. (2002). A support vector machine-based dynamic network for visual speech recognition applications. EURASIP J. Appl. Signal Processing, 2002(11), 1248-1259.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , Issue.11 , pp. 1248-1259
- Gordan, M.¹ Kotropoulos, C.² Pitas, I.³

43
- 84900162843
- Paper presented at the Human Language Techn. Conf
- Gravier, G., Potamianos, G., & Neti, C. (2002). Asynchrony modeling for audio-visual speech recognition. Paper presented at the Human Language Techn. Conf.
- (2002) Asynchrony Modeling for Audio-visual Speech Recognition
- Gravier, G.¹ Potamianos, G.² Neti, C.³

44
- 85199274825
- Paper presented at the IEEE Workshop on Face Processing in Video
- Gross, R., Matthews, I., & Baker, S. (2004). Constructing and fitting active appearance models with occlusion. Paper presented at the IEEE Workshop on Face Processing in Video.
- (2004) Constructing and Fitting Active Appearance Models with Occlusion
- Gross, R.¹ Matthews, I.² Baker, S.³

45
- 33745007363
- Active appearance models with occlusion
- Gross, R., Matthews, I., & Baker, S. (2006). Active appearance models with occlusion. Image and Vision Computing, 24, 593-604.
- (2006) Image and Vision Computing , vol.24 , pp. 593-604
- Gross, R.¹ Matthews, I.² Baker, S.³

46
- 0003807773
- 4th Edition, Upper Saddle River, NJ, Prentice Hall
- Haykin, S. (2002). Adaptive Filter Theory: 4th Edition. Upper Saddle River, NJ: Prentice Hall.
- (2002) Adaptive Filter Theory
- Haykin, S.¹

47
- 14944353581
- Paper presented at the International Conference on Multimodal Interfaces
- Hazen, T. J., Saenko, K., La, C.-H., & Glass, J. (2004). A segment-based audio-visual speech recognizer: Data collection, development and initial experiments. Paper presented at the International Conference on Multimodal Interfaces.
- (2004) A Segment-based Audio-visual Speech Recognizer: Data Collection, Development and Initial Experiments
- Hazen, T.J.¹ Saenko, K.² La, C.-H.³ Glass, J.⁴

48
- 0035438492
- Face detection: A survey
- Hjelmas, E., & Low, B. K. (2001). Face detection: A survey. Computer Vision Image Understanding, 83(3), 236-274.
- (2001) Computer Vision Image Understanding , vol.83 , Issue.3 , pp. 236-274
- Hjelmas, E.¹ Low, B.K.²

49
- 0032295436
- Integrating faces and fingerprints for personal identification
- Hong, L., & Jain, A. (1998). Integrating faces and fingerprints for personal identification. IEEE Trans. Pattern Anal. Machine Intell., 20, 1295-1307.
- (1998) IEEE Trans. Pattern Anal. Machine Intell , vol.20 , pp. 1295-1307
- Hong, L.¹ Jain, A.²

50
- 84900181001
- HTK Speech Recognition Toolkit, Retrieved November, 2007, from
- HTK Speech Recognition Toolkit. (2008). Retrieved November, 2007, from http://htk.eng.cam.ac.uk/
- (2008)

51
- 84900240618
- Paper presented at the British Machine Vision Conference
- Hu, C., Xiao, J., Matthews, I., Baker, S., Cohn, J., & Kanade, T. (2004). Fitting a single active appearance model simultaneously to multiple images. Paper presented at the British Machine Vision Conference.
- (2004) Fitting a Single Active Appearance Model Simultaneously to Multiple Images
- Hu, C.¹ Xiao, J.² Matthews, I.³ Baker, S.⁴ Cohn, J.⁵ Kanade, T.⁶

52
- 27944492024
- Sequential mean field variational analysis of structured deformable shapes
- Hua, G., & Wu, Y. (2006). Sequential mean field variational analysis of structured deformable shapes. Computer Vision and Image Understanding, 101, 87-99.
- (2006) Computer Vision and Image Understanding , vol.101 , pp. 87-99
- Hua, G.¹ Wu, Y.²

53
- 0742290133
- An introduction to biometric recognition
- Jain, A. K., Ross, A., & Prabhakar, S. (2004). An introduction to biometric recognition. IEEE Trans. Circuits Systems Video Technol., 14(1), 4-20.
- (2004) IEEE Trans. Circuits Systems Video Technol , vol.14 , Issue.1 , pp. 4-20
- Jain, A.K.¹ Ross, A.² Prabhakar, S.³

54
- 0025680225
- NTIMIT: A phonetically balanced continuous speech telephone bandwidth speech database
- Jankowski, C., Kalyanswamy, A., Basson, S., & Spitz, J. (1990). NTIMIT: A Phonetically Balanced Continuous Speech Telephone Bandwidth Speech Database. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 1, 109-112.
- (1990) IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP) , vol.1 , pp. 109-112
- Jankowski, C.¹ Kalyanswamy, A.² Basson, S.³ Spitz, J.⁴

55
- 0036874551
- On the relationship between face movements, tongue movements, and speech acoustics
- Jiang, J., Alwan, A., Keating, P. A., Auer, E. T., Jr., & Bernstein, L. E. (2002). On the relationship between face movements, tongue movements, and speech acoustics. EURASIP J. Appl. Signal Processing, 2002(11), 1174-1188.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , Issue.11 , pp. 1174-1188
- Jiang, J.¹ Alwan, A.² Keating, P.A.³ Auer Jr., E.T.⁴ Bernstein, L.E.⁵

56
- 84900160763
- Paper presented at the 5th Eur. Conf. Speech Communication Technology, Rhodes, Greece
- Jourlin, P., Luettin, J., Genoud, D., & Wassner, H. (1997). Integrating acoustic and labial information for speaker identification and verification. Paper presented at the 5th Eur. Conf. Speech Communication Technology, Rhodes, Greece.
- (1997) Integrating Acoustic and Labial Information for Speaker Identification and Verification
- Jourlin, P.¹ Luettin, J.² Genoud, D.³ Wassner, H.⁴

57
- 34250090755
- Snakes: Active contour models
- Kass, M., Witkin, A., & Terzopoulos, D. (1988). Snakes: Active contour models. Int. J. Comput. Vision, 4(4), 321-331.
- (1988) Int. J. Comput. Vision , vol.4 , Issue.4 , pp. 321-331
- Kass, M.¹ Witkin, A.² Terzopoulos, D.³

58
- 84900192191
- Paper presented at the European Conference on Computer Vision
- Kaucic, R., Dalton, B., & Blake, A. (1996). Real-time lip tracking for audio-visual speech recognition applications. Paper presented at the European Conference on Computer Vision.
- (1996) Real-time Lip Tracking for Audio-visual Speech Recognition Applications
- Kaucic, R.¹ Dalton, B.² Blake, A.³

59
- 11144226973
- Recent advances in visual and infrared face recognition - a review
- Kong, S. G., Heo, J., Abidi, B. R., Paik, J., & Abidi, M. A. (2005). Recent advances in visual and infrared face recognition - A review. Computer Vision Image Understanding, 97(1), 103-135.
- (2005) Computer Vision Image Understanding , vol.97 , Issue.1 , pp. 103-135
- Kong, S.G.¹ Heo, J.² Abidi, B.R.³ Paik, J.⁴ Abidi, M.A.⁵

60
- 84900286638
- Paper presented at the Tenth IEEE International Conference on Computer Vision
- Koterba, S., Baker, S., Matthews, I., Hu, C., Xiao, J., Cohn, J., et al. (2005). Multi-view AAM fitting and camera calibration. Paper presented at the Tenth IEEE International Conference on Computer Vision.
- (2005) Multi-view AAM Fitting and Camera Calibration
- Koterba, S.¹ Baker, S.² Matthews, I.³ Hu, C.⁴ Xiao, J.⁵ Cohn, J.⁶

61
- 84900210954
- Paper presented at the Conf. Spoken Language
- Lee, B., Hasegawa-Johnson, M., Goudeseune, C., Kamdar, S., Borys, S., Liu, M., et al. (2004). AVICAR: Audio-visual speech corpus in a car environment. Paper presented at the Conf. Spoken Language.
- (2004) AVICAR: Audio-visual Speech Corpus in a Car Environment
- Lee, B.¹ Hasegawa-Johnson, M.² Goudeseune, C.³ Kamdar, S.⁴ Borys, S.⁵ Liu, M.⁶

62
- 17744406666
- An extended set of haar-like features for rapid object detection
- Lienhart, R., & Maydt, J. (2002). An Extended Set of Haar-like Features for Rapid Object Detection. IEEE ICIP, 1, 900-903.
- (2002) IEEE ICIP , vol.1 , pp. 900-903
- Lienhart, R.¹ Maydt, J.²

63
- 0031187171
- Speech perception by humans and machines
- Lippmann, R. (1997). Speech perception by humans and machines. Speech Communication, 22, 1-15.
- (1997) Speech Communication , vol.22 , pp. 1-15
- Lippmann, R.¹

64
- 0003794291
- Unpublished Ph.D. dissertation, University of Sheffield, Sheffield, U.K
- Luettin, J. (1997). Visual speech and speaker recognition. Unpublished Ph.D. dissertation, University of Sheffield, Sheffield, U.K.
- (1997) Visual Speech and Speaker Recognition
- Luettin, J.¹

65
- 0030366433
- Paper presented at the Int. Conf. Speech and Language Processing
- Luettin, J., Thacker, N., & Beet, S. (1996). Speaker identification by lipreading. Paper presented at the Int. Conf. Speech and Language Processing.
- (1996) Speaker Identification by Lipreading
- Luettin, J.¹ Thacker, N.² Beet, S.³

66
- 34547497793
- Paper presented at the Int. Conf. Acoust. Speech Signal Process
- Marcheret, E., Libal, V., & Potamianos, G. (2007). Dynamic stream weight modeling for audio-visual speech recognition. Paper presented at the Int. Conf. Acoust. Speech Signal Process.
- (2007) Dynamic Stream Weight Modeling for Audio-visual Speech Recognition
- Marcheret, E.¹ Libal, V.² Potamianos, G.³

67
- 3042791915
- Active appearance models revisited
- Matthews, I., & Baker, S. (2004). Active appearance models revisited. Int. J. Comput. Vision, 60(2), 135-164.
- (2004) Int. J. Comput. Vision , vol.60 , Issue.2 , pp. 135-164
- Matthews, I.¹ Baker, S.²

68
- 0017199877
- Hearing lips and seeing voices
- McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
- (1976) Nature , vol.264 , pp. 746-748
- McGurk, H.¹ MacDonald, J.²

69
- 84900069970
- Paper presented at the 2nd Int. Conf. Audio- and Video-Based Biometric Person Authentication
- Messer, K., Matas, J., Kittler, J., Luettin, J., & Maitre, G. (1999). XM2VTSDB: The extended M2VTS database. Paper presented at the 2nd Int. Conf. Audio- and Video-Based Biometric Person Authentication.
- (1999) XM2VTSDB: The Extended M2VTS Database
- Messer, K.¹ Matas, J.² Kittler, J.³ Luettin, J.⁴ Maitre, G.⁵

70
- 0000886386
- Visual speech recognition with stochastic networks
- In G. Tesauro, D. Touretzky & T. Leen (Eds.), Cambridge, MA: MIT Press
- Movellan, J. R. (1995). Visual speech recognition with stochastic networks. In G. Tesauro, D. Touretzky & T. Leen (Eds.), Advances in Neural Information Processing Systems (Vol. 7). Cambridge, MA: MIT Press.
- (1995) Advances in Neural Information Processing Systems , vol.7
- Movellan, J.R.¹

71
- 84900200800
- Paper presented at the Audio- and Video-Based Biometric Person Authentication (AVBPA)
- Nakamura, S. (2001). Fusion of Audio-Visual Information for Integrated Speech Processing. Paper presented at the Audio- and Video-Based Biometric Person Authentication (AVBPA).
- (2001) Fusion of Audio-Visual Information for Integrated Speech Processing
- Nakamura, S.¹

72
- 84955044992
- Effect of visual factors on the intelligibility of speech
- Neely, K. K. (1956). Effect of visual factors on the intelligibility of speech. J. Acoustic. Soc. Amer., 28, 1275.
- (1956) J. Acoustic. Soc. Amer , vol.28 , pp. 1275
- Neely, K.K.¹

73
- 0036874999
- Dynamic Bayesian networks for audio-visual speech recognition
- Nefian, A., Liang, L., Pi, X., Liu, X., & Murphy, K. (2002). Dynamic Bayesian networks for audio-visual speech recognition. EURASIP J. Appl. Signal Processing, 11, 1274-1288.
- (2002) EURASIP J. Appl. Signal Processing , vol.11 , pp. 1274-1288
- Nefian, A.¹ Liang, L.² Pi, X.³ Liu, X.⁴ Murphy, K.⁵

74
- 0004052871
- Audio-visual speech recognition
- Johns Hopkins University, Baltimore
- Neti, C., Potamianos, G., Luettin, J., Matthews, I., Glotin, H., Vergyri, D., et al. (2000). Audio-visual speech recognition, Technical Report. Johns Hopkins University, Baltimore.
- (2000) Technical Report
- Neti, C.¹ Potamianos, G.² Luettin, J.³ Matthews, I.⁴ Glotin, H.⁵ Vergyri, D.⁶

75
- 84900261148
- Open Computer Vision Library, Retrieved November, 2007, from
- Open Computer Vision Library. (2008). Retrieved November, 2007, from http://sourceforge.net/projects/opencvlibrary/
- (2008)

76
- 85143190363
- Paper presented at the Int. Conf. Acoustics, Speech and Signal Processing
- Patterson, E. K., Gurbuz, S., Tufekci, Z., & Gowdy, J. N. (2002). CUAVE: A new audio-visual database for multimodal human-computer interface research. Paper presented at the Int. Conf. Acoustics, Speech and Signal Processing.
- (2002) CUAVE: A New Audio-visual Database for Multimodal Human-computer Interface Research
- Patterson, E.K.¹ Gurbuz, S.² Tufekci, Z.³ Gowdy, J.N.⁴

77
- 0022228262
- Paper presented at the IEEE Conference on CVPR
- Petajan, E. (1985). Automatic lipreading to enhance speech recognition. Paper presented at the IEEE Conference on CVPR.
- (1985) Automatic Lipreading to Enhance Speech Recognition
- Petajan, E.¹

78
- 84947917954
- Paper presented at the 1st Int. Conf. Audio- and Video-Based Biometric Person Authentication
- Pigeon, S., & Vandendorpe, L. (1997). The M2VTS multimodal face database (release 1.00). Paper presented at the 1st Int. Conf. Audio- and Video-Based Biometric Person Authentication.
- (1997) The M2VTS Multimodal Face Database (release 1.00)
- Pigeon, S.¹ Vandendorpe, L.²

79
- 44949227080
- Paper presented at the INTERSPEECH
- Pitsikalis, V., Katsamanis, A., Papandreou, G., & Maragos, P. (2006). Adaptive Multimodal Fusion by Uncertainty Compensation. Paper presented at the INTERSPEECH 2006.
- (2006) Adaptive Multimodal Fusion by Uncertainty Compensation
- Pitsikalis, V.¹ Katsamanis, A.² Papandreou, G.³ Maragos, P.⁴

80
- 84900073512
- Paper presented at the 4th International Conference on Audio- and Video-Based Biometric Person Authentication
- Popovici, V., Thiran, J., Bailly-Bailliere, E., Bengio, S., Bimbot, F., Hamouz, M., et al. (2003). The BANCA Database and Evaluation Protocol. Paper presented at the 4th International Conference on Audio- and Video-Based Biometric Person Authentication.
- (2003) The BANCA Database and Evaluation Protocol
- Popovici, V.¹ Thiran, J.² Bailly-Bailliere, E.³ Bengio, S.⁴ Bimbot, F.⁵ Hamouz, M.⁶

81
- 4544290191
- Recent advances in the automatic recognition of audio-visual speech
- Potamianos, G., Neti, C., Gravier, G., Garg, A., & Senior, A. W. (2003). Recent advances in the automatic recognition of audio-visual speech. Proceedings of the IEEE, 91, 1306-1326.
- (2003) Proceedings of the IEEE , vol.91 , pp. 1306-1326
- Potamianos, G.¹ Neti, C.² Gravier, G.³ Garg, A.⁴ Senior, A.W.⁵

82
- 15044345504
- Audio-visual automatic speech recognition: An overview
- In G. Bailly, E. Vatikiotis-Bateson & P. Perrier (Eds.), MIT Press
- Potamianos, G., Neti, C., Luettin, J., & Matthews, I. (2004). Audio-visual automatic speech recognition: An overview. In G. Bailly, E. Vatikiotis-Bateson & P. Perrier (Eds.), Issues in Visual and Audio-Visual Speech Processing: MIT Press.
- (2004) Issues in Visual and Audio-Visual Speech Processing
- Potamianos, G.¹ Neti, C.² Luettin, J.³ Matthews, I.⁴

83
- 0004244302
- Englewood Cliffs, Prentice Hall
- Rabiner, L., & Juang, B.-H. (1993). Fundamentals of Speech Recognition. Englewood Cliffs: Prentice Hall.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.-H.²

84
- 0038343934
- Information fusion in biometrics
- Ross, A., & Jain, A. (2003). Information fusion in biometrics. Pattern Recogn. Lett., 24, 2115-2125.
- (2003) Pattern Recogn. Lett , vol.24 , pp. 2115-2125
- Ross, A.¹ Jain, A.²

85
- 27744546990
- On transforming statistical models for non-frontal face verification
- Sanderson, C., Bengio, S., & Gao, Y. (2006). On transforming statistical models for non-frontal face verification. Pattern Recognition, 39(2), 288-302.
- (2006) Pattern Recognition , vol.39 , Issue.2 , pp. 288-302
- Sanderson, C.¹ Bengio, S.² Gao, Y.³

86
- 0036487270
- Noise compensation in a person verification system using face and multiple speech features
- Sanderson, C., & Paliwal, K. K. (2003). Noise compensation in a person verification system using face and multiple speech features. Pattern Recognition, 36(2), 293-302.
- (2003) Pattern Recognition , vol.36 , Issue.2 , pp. 293-302
- Sanderson, C.¹ Paliwal, K.K.²

87
- 4544228318
- Identity verification using speech and face information
- Sanderson, C., & Paliwal, K. K. (2004). Identity verification using speech and face information. Digital Signal Processing, 14(5), 449-480.
- (2004) Digital Signal Processing , vol.14 , Issue.5 , pp. 449-480
- Sanderson, C.¹ Paliwal, K.K.²

88
- 33947376189
- May, Paper presented at the Int. Conf. Acoustics, Speech Signal Processing, Toulouse, France
- Sargin, M. E., Erzin, E., Yemez, Y., & Tekalp, A. M. (2006, May). Multimodal speaker identification using canonical correlation analysis. Paper presented at the Int. Conf. Acoustics, Speech Signal Processing, Toulouse, France.
- (2006) Multimodal Speaker Identification Using Canonical Correlation Analysis
- Sargin, M.E.¹ Erzin, E.² Yemez, Y.³ Tekalp, A.M.⁴

89
- 84940668557
- September, Paper presented at the Forty-Fifth Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL
- Shiell, D. J., Terry, L. H., Aleksic, P. S., & Katsaggelos, A. K. (2007, September). An Automated System for Visual Biometrics. Paper presented at the Forty-Fifth Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL.
- (2007) An Automated System for Visual Biometrics
- Shiell, D.J.¹ Terry, L.H.² Aleksic, P.S.³ Katsaggelos, A.K.⁴

90
- 0018701386
- Use of visual information in phonetic perception
- Summerfield, Q. (1979). Use of visual information in phonetic perception. Phonetica, 36, 314-331.
- (1979) Phonetica , vol.36 , pp. 314-331
- Summerfield, Q.¹

91
- 0002028032
- Some preliminaries to a comprehensive account of audio-visual speech perception
- In R. Campbell & B. Dodd (Eds.), London, U.K., Lawrence Erlbaum
- Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In R. Campbell & B. Dodd (Eds.), Hearing by Eye: The Psychology of Lip-Reading (pp. 3-51). London, U.K.: Lawrence Erlbaum.
- (1987) Hearing by Eye: The Psychology of Lip-Reading , pp. 3-51
- Summerfield, Q.¹

92
- 0027128576
- Lipreading and audio-visual speech perception
- Summerfield, Q. (1992). Lipreading and audio-visual speech perception. Philosophical Transactions: Biological Sciences, 335(1273), 71-78.
- (1992) Philosophical Transactions: Biological Sciences , vol.335 , Issue.1273 , pp. 71-78
- Summerfield, Q.¹

93
- 33646814706
- A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization
- Tamura, S., Iwano, K., & Furui, S. (2005). A Stream-Weight Optimization Method for Multi-Stream HMMs Based on Likelihood Value Normalization. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP '05), 1, 469-472.
- (2005) Int. Conf. Acoustics, Speech and Signal Processing (ICASSP '05) , vol.1 , pp. 469-472
- Tamura, S.¹ Iwano, K.² Furui, S.³

94
- 84900124666
- Unknown, Paper presented at the 2nd Int. Conf. Audio- and Video-Based Biometric Person Authentication, Washington, D. C
- Unknown. (1999). Robust speaker verification via asynchronous fusion of speech and lip information. Paper presented at the 2nd Int. Conf. Audio- and Video-Based Biometric Person Authentication, Washington, D. C.
- (1999) Robust Speaker Verification Via Asynchronous Fusion of Speech and Lip Information

95
- 0035680116
- Paper presented at the IEEE Conf. on Computer Vision and Pattern Recognition
- Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Paper presented at the IEEE Conf. on Computer Vision and Pattern Recognition.
- (2001) Rapid Object Detection Using a Boosted Cascade of Simple Features
- Viola, P.¹ Jones, M.²

96
- 84900197162
- Paper presented at the 2nd Int. Conf. Audio- and Video-Based Biometric Person Authentication, Washington, D. C
- Wark, T., Sridharan, S., & Chandran, V. (1999). Robust speaker verification via asynchronous fusion of speech and lip information. Paper presented at the 2nd Int. Conf. Audio- and Video-Based Biometric Person Authentication, Washington, D. C.
- (1999) Robust Speaker Verification Via Asynchronous Fusion of Speech and Lip Information
- Wark, T.¹ Sridharan, S.² Chandran, V.³

97
- 0033692608
- Paper presented at the Int. Conf. Acoustics, Speech Signal Processing, Istanbul, Turkey
- Wark, T., Sridharan, S., & Chandran, V. (2000). The use of temporal speech and lip information for multi-modal speaker identification via multi-stream HMMs. Paper presented at the Int. Conf. Acoustics, Speech Signal Processing, Istanbul, Turkey.
- (2000) The Use of Temporal Speech and Lip Information for Multi-modal Speaker Identification Via Multi-stream HMMs
- Wark, T.¹ Sridharan, S.² Chandran, V.³

98
- 0032178399
- Frame rate and viseme analysis for multimedia applications
- Williams, J. J., Rutledge, J. C., Garstecki, D. C., & Katsaggelos, A. K. (1998). Frame rate and viseme analysis for multimedia applications. VLSI Signal Processing Systems, 23(1/2), 7-23.
- (1998) VLSI Signal Processing Systems , vol.23 , Issue.1-2 , pp. 7-23
- Williams, J.J.¹ Rutledge, J.C.² Garstecki, D.C.³ Katsaggelos, A.K.⁴

99
- 84963799333
- October, Paper presented at the Int. Conf. on Multimodal Interfaces, Pittsburgh, PA
- Wu, Z., Aleksic, P. S., & Katsaggelos, A. K. (2002, October). Lip tracking for MPEG-4 facial animation. Paper presented at the Int. Conf. on Multimodal Interfaces, Pittsburgh, PA.
- (2002) Lip Tracking for MPEG-4 Facial Animation
- Wu, Z.¹ Aleksic, P.S.² Katsaggelos, A.K.³

100
- 4544321778
- May, Paper presented at the Int. Conf. Acoust., Speech, Signal Processing, Montreal, Canada
- Wu, Z., Aleksic, P. S., & Katsaggelos, A. K. (2004, May). Inner lip feature extraction for MPEG-4 facial animation. Paper presented at the Int. Conf. Acoust., Speech, Signal Processing, Montreal, Canada.
- (2004) Inner Lip Feature Extraction for MPEG-4 Facial Animation
- Wu, Z.¹ Aleksic, P.S.² Katsaggelos, A.K.³

101
- 5044240235
- Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition
- Xiao, J., Baker, S., Matthews, I., & Kanade, T. (2004). Real-time combined 2D+3D active appearance models. Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition.
- (2004) Real-time Combined 2D+3D Active Appearance Models
- Xiao, J.¹ Baker, S.² Matthews, I.³ Kanade, T.⁴

102
- 0036223025
- Detecting faces in images: A survey
- Yang, M.-H., Kriegman, D., & Ahuja, N. (2002). Detecting faces in images: A survey. IEEE Trans. Pattern Anal. Machine Intell., 24(1), 34-58.
- (2002) IEEE Trans. Pattern Anal. Machine Intell , vol.24 , Issue.1 , pp. 34-58
- Yang, M.-H.¹ Kriegman, D.² Ahuja, N.³

103
- 84900083985
- Paper presented at the 14th Int. Congr. Phonetic Sciences
- Yehia, H. C., Kuratate, T., & Vatikiotis-Bateson, E. (1999). Using speech acoustics to drive facial motion. Paper presented at the 14th Int. Congr. Phonetic Sciences.
- (1999) Using Speech Acoustics to Drive Facial Motion
- Yehia, H.C.¹ Kuratate, T.² Vatikiotis-Bateson, E.³

104
- 0032178592
- Quantitative association of vocal-tract and facial behavior
- Yehia, H. C., Rubin, P., & Vatikiotis-Bateson, E. (1998). Quantitative association of vocal-tract and facial behavior. Speech Communication, 26(1-2), 23-43.
- (1998) Speech Communication , vol.26 , Issue.1-2 , pp. 23-43
- Yehia, H.C.¹ Rubin, P.² Vatikiotis-Bateson, E.³

105
- 37849038203
- Retrieved November, 2007, from
- Young, S. (2008). The ATK Real-Time API for HTK. Retrieved November, 2007, from http://mi.eng.cam.ac.uk/research/dialogue/atk_home
- (2008) The ATK Real-Time API for HTK
- Young, S.¹

106
- 0003822743
- London, U.K.: Entropic
- Young, S., Evermann, G., Hain, T., Kershaw, D., Moore, G., Odell, J., et al. (2005). The HTK Book. London, U.K.: Entropic.
- (2005) The HTK Book
- Young, S.¹ Evermann, G.² Hain, T.³ Kershaw, D.⁴ Moore, G.⁵ Odell, J.⁶

107
- 34948876367
- Paper presented at the IEEE Conf. on Computer Vision and Pattern Recognition
- Yuan, J., Wu, Y., & Yang, M. (2007). Discovery of Collocation Patterns: from Visual Words to Visual Phrases. Paper presented at the IEEE Conf. on Computer Vision and Pattern Recognition.
- (2007) Discovery of Collocation Patterns: From Visual Words to Visual Phrases
- Yuan, J.¹ Wu, Y.² Yang, M.³

108
- 0026903014
- Feature extraction from faces using deformable templates
- Yuille, A. L., Hallinan, P. W., & Cohen, D. S. (1992). Feature extraction from faces using deformable templates. Int. J. Comput. Vision, 8(2), 99-111.
- (1992) Int. J. Comput. Vision , vol.8 , Issue.2 , pp. 99-111
- Yuille, A.L.¹ Hallinan, P.W.² Cohen, D.S.³

109
- 1842499650
- Face recognition: A literature survey
- Zhao, W.-Y., Chellappa, R., Phillips, P. J., & Rosenfeld, A. (2003). Face recognition: A literature survey. ACM Computing Surveys, 35(4), 399-458.
- (2003) ACM Computing Surveys , vol.35 , Issue.4 , pp. 399-458
- Zhao, W.-Y.¹ Chellappa, R.² Phillips, P.J.³ Rosenfeld, A.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.