Volume , Issue , 2009, Pages 1-38

Audio-visual and visual-only speech and speaker recognition: Issues about theory, system design, and implementation

Author keywords

[No Author keywords available]

Indexed keywords


EID: 84870243209     PISSN: None     EISSN: None     Source Type: Book    
DOI: 10.4018/978-1-60566-186-5.ch001     Document Type: Chapter
Times cited : (6)

References (109)
  • 1
    • 84900182369
    • The AAM-API, Retrieved November, 2007, from
    • The AAM-API. (2008). Retrieved November, 2007, from http://www2.imm.dtu.dk/~aam/aamapi/
    • (2008)
  • 3
    • 34247584561
    • Product HMMs for audio-visual continuous speech recognition using facial animation parameters
    • July, In, Baltimore, MD
    • Aleksic, P. S., & Katsaggelos, A. K. (2003b, July). Product HMMs for Audio-Visual Continuous Speech Recognition Using Facial Animation Parameters. In Proceedings of IEEE Int. Conf. on Multimedia & Expo (ICME), Vol. 2, (pp. 481-484), Baltimore, MD.
    • (2003) Proceedings of IEEE Int. Conf. on Multimedia & Expo (ICME) , vol.2 , pp. 481-484
    • Aleksic, P.S.1    Katsaggelos, A.K.2
  • 5
  • 10
    • 1542285823
    • Lucas-Kanade 20 years on: A unifying framework
    • Baker, S., & Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. Int. J. Comput. Vision, 56(3), 221-255.
    • (2004) Int. J. Comput. Vision , vol.56 , Issue.3 , pp. 221-255
    • Baker, S.1    Matthews, I.2
  • 12
    • 22944438473
    • Evaluation of a boosted cascade of Haar-like features in the presence of partial occlusions and shadows for real time face detection
    • In, Berlin, Germany: Springer
    • Barczak, A. L. C. (2004). Evaluation of a Boosted Cascade of Haar-Like Features in the Presence of Partial Occlusions and Shadows for Real Time Face Detection. In PRICAI 2004: Trends in Artificial Intelligence, 3157, 969-970. Berlin, Germany: Springer.
    • (2004) PRICAI 2004: Trends in Artificial Intelligence , vol.3157 , pp. 969-970
    • Barczak, A.L.C.1
  • 14
    • 0032594952
    • Fusion of face and speech data for person identity verification
    • Ben-Yacoub, S., Abdeljaoued, Y., & Mayoraz, E. (1999). Fusion of face and speech data for person identity verification. IEEE Trans. Neural Networks, 10, 1065-1074.
    • (1999) IEEE Trans. Neural Networks , vol.10 , pp. 1065-1074
    • Ben-Yacoub, S.1    Abdeljaoued, Y.2    Mayoraz, E.3
  • 16
    • 84900120370
    • Paper presented at the Int. Conf. Machine Learning, Workshop ROC Analysis Machine Learning
    • Bengio, S., Mariethoz, J., & Keller, M. (2005). The expected performance curve. Paper presented at the Int. Conf. Machine Learning, Workshop ROC Analysis Machine Learning.
    • (2005) The Expected Performance Curve
    • Bengio, S.1    Mariethoz, J.2    Keller, M.3
  • 20
    • 27844534088
    • A survey of approaches and challenges in 3-D and multi-modal 3-D face recognition
    • Bowyer, K. W., Chang, K., & Flynn, P. (2006). A survey of approaches and challenges in 3-D and multi-modal 3-D face recognition. Computer Vision Image Understanding, 101(1), 1-15.
    • (2006) Computer Vision Image Understanding , vol.101 , Issue.1 , pp. 1-15
    • Bowyer, K.W.1    Chang, K.2    Flynn, P.3
  • 22
    • 0031233424
    • Speaker recognition: A tutorial
    • Campbell, J. P. (1997). Speaker recognition: A tutorial. Proceedings of the IEEE, 85(9), 1437-1462.
    • (1997) Proceedings of the IEEE , vol.85 , Issue.9 , pp. 1437-1462
    • Campbell, J.P.1
  • 27
    • 85032752352
    • Audiovisual speech processing
    • Chen, T. (2001). Audiovisual speech processing. IEEE Signal Processing Mag., 18, 9-21.
    • (2001) IEEE Signal Processing Mag , vol.18 , pp. 9-21
    • Chen, T.1
  • 31
    • 46149146042
    • Design issues for a digital audio-visual integrated database
    • Paper presented at the Integrated Audio-Visual Processing for Recognition, Synthesis and Communication (Digest No: 1996/213), IEE Colloquium on
    • Chibelushi, C. C., Gandon, S., Mason, J. S. D., Deravi, F., & Johnston, R. D. (1996). Design issues for a digital audio-visual integrated database. Paper presented at the Integrated Audio-Visual Processing for Recognition, Synthesis and Communication (Digest No: 1996/213), IEE Colloquium on.
    • (1996)
    • Chibelushi, C.C.1    Gandon, S.2    Mason, J.S.D.3    Deravi, F.4    Johnston, R.D.5
  • 32
    • 80052878189
    • Retrieved November, 2007, from
    • Cootes, T. (2008). Modelling and Search Software. Retrieved November, 2007, from http://www.isbe.man.ac.uk/~bim/software/am_tools_doc/index.html
    • (2008) Modelling and Search Software
    • Cootes, T.1
  • 37
    • 0034270644
    • Audio-visual speech modeling for continuous speech recognition
    • Dupont, S., & Luettin, J. (2000). Audio-visual speech modeling for continuous speech recognition. IEEE Trans. Multimedia, 2(3), 141-151.
    • (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
    • Dupont, S.1    Luettin, J.2
  • 41
    • 0012745879
    • Rationale for phoneme-viseme mapping and feature selection in visual speech recognition
    • In D. G. Stork & M. E. Hennecke (Eds.), Berlin, Germany, Springer
    • Goldschen, A. J., Garcia, O. N., & Petajan, E. D. (1996). Rationale for phoneme-viseme mapping and feature selection in visual speech recognition. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines (pp. 505-515). Berlin, Germany: Springer.
    • (1996) Speechreading by Humans and Machines , pp. 505-515
    • Goldschen, A.J.1    Garcia, O.N.2    Petajan, E.D.3
  • 42
    • 0036875002
    • A support vector machine-based dynamic network for visual speech recognition applications
    • Gordan, M., Kotropoulos, C., & Pitas, I. (2002). A support vector machine-based dynamic network for visual speech recognition applications. EURASIP J. Appl. Signal Processing, 2002(11), 1248-1259.
    • (2002) EURASIP J. Appl. Signal Processing , vol.2002 , Issue.11 , pp. 1248-1259
    • Gordan, M.1    Kotropoulos, C.2    Pitas, I.3
  • 46
    • 0003807773
    • 4th Edition, Upper Saddle River, NJ, Prentice Hall
    • Haykin, S. (2002). Adaptive Filter Theory: 4th Edition. Upper Saddle River, NJ: Prentice Hall.
    • (2002) Adaptive Filter Theory
    • Haykin, S.1
  • 49
    • 0032295436
    • Integrating faces and fingerprints for personal identification
    • Hong, L., & Jain, A. (1998). Integrating faces and fingerprints for personal identification. IEEE Trans. Pattern Anal. Machine Intell., 20, 1295-1307.
    • (1998) IEEE Trans. Pattern Anal. Machine Intell , vol.20 , pp. 1295-1307
    • Hong, L.1    Jain, A.2
  • 50
    • 84900181001
    • HTK Speech Recognition Toolkit, Retrieved November, 2007, from
    • HTK Speech Recognition Toolkit. (2008). Retrieved November, 2007, from http://htk.eng.cam.ac.uk/
    • (2008)
  • 52
    • 27944492024
    • Sequential mean field variational analysis of structured deformable shapes
    • Hua, G., & Wu, Y. (2006). Sequential mean field variational analysis of structured deformable shapes. Computer Vision and Image Understanding, 101, 87-99.
    • (2006) Computer Vision and Image Understanding , vol.101 , pp. 87-99
    • Hua, G.1    Wu, Y.2
  • 62
    • 17744406666
    • An extended set of Haar-like features for rapid object detection
    • Lienhart, R., & Maydt, J. (2002). An Extended Set of Haar-like Features for Rapid Object Detection. IEEE ICIP, 1, 900-903.
    • (2002) IEEE ICIP , vol.1 , pp. 900-903
    • Lienhart, R.1    Maydt, J.2
  • 63
    • 0031187171
    • Speech perception by humans and machines
    • Lippmann, R. (1997). Speech perception by humans and machines. Speech Communication, 22, 1-15.
    • (1997) Speech Communication , vol.22 , pp. 1-15
    • Lippmann, R.1
  • 64
    • 0003794291
    • Unpublished Ph.D. dissertation, University of Sheffield, Sheffield, U.K
    • Luettin, J. (1997). Visual speech and speaker recognition. Unpublished Ph.D. dissertation, University of Sheffield, Sheffield, U.K.
    • (1997) Visual Speech and Speaker Recognition
    • Luettin, J.1
  • 67
    • 3042791915
    • Active appearance models revisited
    • Matthews, I., & Baker, S. (2004). Active appearance models revisited. Int. J. Comput. Vision, 60(2), 135-164.
    • (2004) Int. J. Comput. Vision , vol.60 , Issue.2 , pp. 135-164
    • Matthews, I.1    Baker, S.2
  • 68
    • 0017199877
    • Hearing lips and seeing voices
    • McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
    • (1976) Nature , vol.264 , pp. 746-748
    • McGurk, H.1    MacDonald, J.2
  • 70
    • 0000886386
    • Visual speech recognition with stochastic networks
    • In G. Tesauro, D. Touretzky & T. Leen (Eds.), Cambridge, MA: MIT Press
    • Movellan, J. R. (1995). Visual speech recognition with stochastic networks. In G. Tesauro, D. Touretzky & T. Leen (Eds.), Advances in Neural Information Processing Systems (Vol. 7). Cambridge, MA: MIT Press.
    • (1995) Advances in Neural Information Processing Systems , vol.7
    • Movellan, J.R.1
  • 72
    • 84955044992
    • Effect of visual factors on the intelligibility of speech
    • Neely, K. K. (1956). Effect of visual factors on the intelligibility of speech. J. Acoustic. Soc. Amer., 28, 1275.
    • (1956) J. Acoustic. Soc. Amer , vol.28 , pp. 1275
    • Neely, K.K.1
  • 75
    • 84900261148
    • Open Computer Vision Library Retrieved November, 2007, from
    • Open Computer Vision Library. (2008). Retrieved November, 2007, from http://sourceforge.net/projects/opencvlibrary/
    • (2008)
  • 82
  • 84
    • 0038343934
    • Information fusion in biometrics
    • 2115-2125
    • Ross, A., & Jain, A. (2003). Information fusion in biometrics. Pattern Recogn. Lett., 24, 2115-2125.
    • (2003) Pattern Recogn. Lett , vol.24
    • Ross, A.1    Jain, A.2
  • 85
    • 27744546990
    • On transforming statistical models for non-frontal face verification
    • Sanderson, C., Bengio, S., & Gao, Y. (2006). On transforming statistical models for non-frontal face verification. Pattern Recognition, 39(2), 288-302.
    • (2006) Pattern Recognition , vol.39 , Issue.2 , pp. 288-302
    • Sanderson, C.1    Bengio, S.2    Gao, Y.3
  • 86
    • 0036487270
    • Noise compensation in a person verification system using face and multiple speech features
    • Sanderson, C., & Paliwal, K. K. (2003). Noise compensation in a person verification system using face and multiple speech features. Pattern Recognition, 36(2), 293-302.
    • (2003) Pattern Recognition , vol.36 , Issue.2 , pp. 293-302
    • Sanderson, C.1    Paliwal, K.K.2
  • 87
    • 4544228318
    • Identity verification using speech and face information
    • Sanderson, C., & Paliwal, K. K. (2004). Identity verification using speech and face information. Digital Signal Processing, 14(5), 449-480.
    • (2004) Digital Signal Processing , vol.14 , Issue.5 , pp. 449-480
    • Sanderson, C.1    Paliwal, K.K.2
  • 89
    • 84940668557
    • September, Paper presented at the Forty-Fifth Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL
    • Shiell, D. J., Terry, L. H., Aleksic, P. S., & Katsaggelos, A. K. (2007, September). An Automated System for Visual Biometrics. Paper presented at the Forty-Fifth Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL.
    • (2007) An Automated System for Visual Biometrics
    • Shiell, D.J.1    Terry, L.H.2    Aleksic, P.S.3    Katsaggelos, A.K.4
  • 90
    • 0018701386
    • Use of visual information in phonetic perception
    • Summerfield, Q. (1979). Use of visual information in phonetic perception. Phonetica, 36, 314-331.
    • (1979) Phonetica , vol.36 , pp. 314-331
    • Summerfield, Q.1
  • 91
    • 0002028032
    • Some preliminaries to a comprehensive account of audio-visual speech perception
    • In R. Campbell & B. Dodd (Eds.), London, U.K., Lawrence Erlbaum
    • Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In R. Campbell & B. Dodd (Eds.), Hearing by Eye: The Psychology of Lip-Reading (pp. 3-51). London, U.K.: Lawrence Erlbaum.
    • (1987) Hearing by Eye: The Psychology of Lip-Reading , pp. 3-51
    • Summerfield, Q.1
  • 94
    • 84900124666
    • Unknown, Paper presented at the 2nd Int. Conf. Audio- and Video-Based Biometric Person Authentication, Washington, D. C
    • Unknown. (1999). Robust speaker verification via asynchronous fusion of speech and lip information. Paper presented at the 2nd Int. Conf. Audio- and Video-Based Biometric Person Authentication, Washington, D. C.
    • (1999) Robust Speaker Verification Via Asynchronous Fusion of Speech and Lip Information
  • 104
    • 0032178592
    • Quantitative association of vocal-tract and facial behavior
    • Yehia, H. C., Rubin, P., & Vatikiotis-Bateson, E. (1998). Quantitative association of vocal-tract and facial behavior. Speech Communication, 26(1-2), 23-43.
    • (1998) Speech Communication , vol.26 , Issue.1-2 , pp. 23-43
    • Yehia, H.C.1    Rubin, P.2    Vatikiotis-Bateson, E.3
  • 105
    • 37849038203
    • Retrieved November, 2007, from
    • Young, S. (2008). The ATK Real-Time API for HTK. Retrieved November, 2007, from http://mi.eng.cam.ac.uk/research/dialogue/atk_home
    • (2008) The ATK Real-Time API for HTK
    • Young, S.1
  • 108
    • 0026903014
    • Feature extraction from faces using deformable templates
    • Yuille, A. L., Hallinan, P. W., & Cohen, D. S. (1992). Feature extraction from faces using deformable templates. Int. J. Comput. Vision, 8(2), 99-111.
    • (1992) Int. J. Comput. Vision , vol.8 , Issue.2 , pp. 99-111
    • Yuille, A.L.1    Hallinan, P.W.2    Cohen, D.S.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.