-
1
-
-
0017199877
-
Hearing lips and seeing voices
-
H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, no. 5588, pp. 746-748, 1976.
-
(1976)
Nature
, vol.264
, Issue.5588
, pp. 746-748
-
-
McGurk, H.1
MacDonald, J.2
-
2
-
-
0001432664
-
On the integration of auditory and visual parameters in an HMM-based ASR
-
A. Adjoudani and C. Benoît, "On the integration of auditory and visual parameters in an HMM-based ASR," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer, 1996, pp. 461-471.
-
(1996)
Speechreading by Humans and Machines
, pp. 461-471
-
-
Adjoudani, A.1
Benoît, C.2
-
3
-
-
85032752352
-
Audiovisual speech processing: Lip reading and lip synchronization
-
DOI 10.1109/79.911195
-
T. Chen, "Audiovisual speech processing: Lip reading and lip synchronization," IEEE Signal Process. Mag., vol. 18, no. 1, pp. 9-21, Jan. 2001. (Pubitemid 32287667)
-
(2001)
IEEE Signal Processing Magazine
, vol.18
, Issue.1
, pp. 9-21
-
-
Chen, T.1
-
4
-
-
0034853041
-
Hierarchical discriminant features for audio-visual LVCSR
-
G. Potamianos, J. Luettin, and C. Neti, "Hierarchical discriminant features for audio-visual LVCSR," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 2001, pp. 165-168. (Pubitemid 32839213)
-
(2001)
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
, vol.1
, pp. 165-168
-
-
Potamianos, G.1
Luettin, J.2
Neti, C.3
-
5
-
-
0036295990
-
Noisy audio feature enhancement using audio-visual speech data
-
R. Goecke, G. Potamianos, and C. Neti, "Noisy audio feature enhancement using audio-visual speech data," in Proc. Int. Conf. Acoust. Speech Signal Process., 2002, pp. 2025-2028.
-
(2002)
Proc. Int. Conf. Acoust. Speech Signal Process
, pp. 2025-2028
-
-
Goecke, R.1
Potamianos, G.2
Neti, C.3
-
6
-
-
0034842342
-
Asynchronous stream modeling for large vocabulary audio-visual speech recognition
-
J. Luettin, G. Potamianos, and C. Neti, "Asynchronous stream modeling for large vocabulary audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 2001, pp. 169-172. (Pubitemid 32839214)
-
(2001)
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
, vol.1
, pp. 169-172
-
-
Luettin, J.1
Potamianos, G.2
Neti, C.3
-
7
-
-
80051637579
-
A multi-stream ASR framework for BLSTM modeling of conversational speech
-
M. Wöllmer, F. Eyben, B. Schuller, and G. Rigoll, "A multi-stream ASR framework for BLSTM modeling of conversational speech," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 2011, pp. 4860-4863.
-
(2011)
Proc. IEEE Int. Conf. Acoust. Speech Signal Process
, pp. 4860-4863
-
-
Wöllmer, M.1
Eyben, F.2
Schuller, B.3
Rigoll, G.4
-
8
-
-
77949373348
-
Improved decision trees for multistream HMM-based audio-visual continuous speech recognition
-
J. Huang and K. Visweswariah, "Improved decision trees for multistream HMM-based audio-visual continuous speech recognition," in Proc. Workshop IEEE Autom. Speech Recognit. Understanding, Nov. 2009, pp. 228-231.
-
(2009)
Proc. Workshop IEEE Autom. Speech Recognit. Understanding
, pp. 228-231
-
-
Huang, J.1
Visweswariah, K.2
-
9
-
-
84890568355
-
A novel algorithm for acoustic and visual classifiers decision fusion in audio-visual speech recognition system
-
R. Rajavel and P. S. Sathidevi, "A novel algorithm for acoustic and visual classifiers decision fusion in audio-visual speech recognition system," Signal Process. Int. J., vol. 4, no. 1, pp. 23-37, 2010.
-
(2010)
Signal Process. Int. J.
, vol.4
, Issue.1
, pp. 23-37
-
-
Rajavel, R.1
Sathidevi, P.S.2
-
10
-
-
84897584045
-
On dynamic stream weighting for audio-visual speech recognition
-
V. Estellers, M. Gurban, and J. Thiran, "On dynamic stream weighting for audio-visual speech recognition," IEEE Trans. Audio Speech Language Process., vol. 20, no. 4, pp. 1145-1157, May 2012.
-
(2012)
IEEE Trans. Audio Speech Language Process.
, vol.20
, Issue.4
, pp. 1145-1157
-
-
Estellers, V.1
Gurban, M.2
Thiran, J.3
-
11
-
-
56149109954
-
Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition
-
D. B. Dean, P. J. Lucey, S. Sridharan, and T. J. Wark, "Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition," in Proc. 8th Annu. Conf. Int. Speech Commun. Assoc., 2007, pp. 666-669.
-
(2007)
Proc. 8th Annu. Conf. Int. Speech Commun. Assoc.
, pp. 666-669
-
-
Dean, D.B.1
Lucey, P.J.2
Sridharan, S.3
Wark, T.J.4
-
12
-
-
0034270644
-
Audio-visual speech modeling for continuous speech recognition
-
S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
-
(2000)
IEEE Trans. Multimedia
, vol.2
, Issue.3
, pp. 141-151
-
-
Dupont, S.1
Luettin, J.2
-
13
-
-
0036874999
-
Dynamic Bayesian networks for audio-visual speech recognition
-
A. V. Nefian, L. Liang, X. Pi, X. Liu, and K. Murphy, "Dynamic Bayesian networks for audio-visual speech recognition," EURASIP J. Appl. Signal Process., pp. 1274-1288, Nov. 2002.
-
(2002)
EURASIP J. Appl. Signal Process.
, pp. 1274-1288
-
-
Nefian, A.V.1
Liang, L.2
Pi, X.3
Liu, X.4
Murphy, K.5
-
14
-
-
0036874527
-
Noise adaptive stream weighting in audio-visual speech recognition
-
M. Heckmann, F. Berthommier, and K. Kroschel, "Noise adaptive stream weighting in audio-visual speech recognition," EURASIP J. Appl. Signal Process., vol. 11, pp. 1260-1273, Nov. 2002.
-
(2002)
EURASIP J. Appl. Signal Process.
, vol.11
, pp. 1260-1273
-
-
Heckmann, M.1
Berthommier, F.2
Kroschel, K.3
-
15
-
-
0034848499
-
Optimal weighting of posteriors for audio-visual speech recognition
-
M. Heckmann, F. Berthommier, and K. Kroschel, "Optimal weighting of posteriors for audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1. May 2001, pp. 161-164. (Pubitemid 32839212)
-
(2001)
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
, vol.1
, pp. 161-164
-
-
Heckmann, M.1
Berthommier, F.2
Kroschel, K.3
-
16
-
-
69949118452
-
Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition
-
L. Terry, D. Shiell, and A. Katsaggelos, "Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition," in Proc. 15th IEEE Int. Conf. Image Process., Oct. 2008, pp. 1316-1319.
-
(2008)
Proc. 15th IEEE Int. Conf. Image Process
, pp. 1316-1319
-
-
Terry, L.1
Shiell, D.2
Katsaggelos, A.3
-
17
-
-
0036875048
-
Automatic speechreading with applications to human-computer interfaces
-
X. Zhang, C. C. Broun, R. M. Mersereau, and M. A. Clements, "Automatic speechreading with applications to human-computer interfaces," EURASIP J. Appl. Signal Process., vol. 11, pp. 1228-1247, Nov. 2002.
-
(2002)
EURASIP J. Appl. Signal Process.
, vol.11
, pp. 1228-1247
-
-
Zhang, X.1
Broun, C.C.2
Mersereau, R.M.3
Clements, M.A.4
-
18
-
-
85009154155
-
Stream weight optimization of speech and lip image sequence for audiovisual speech recognition
-
S. Nakamura, H. Ito, and K. Shikano, "Stream weight optimization of speech and lip image sequence for audiovisual speech recognition," in Proc. Int. Conf. Spoken Language Process., vol. 3. 2000, pp. 20-23.
-
(2000)
Proc. Int. Conf. Spoken Language Process.
, vol.3
, pp. 20-23
-
-
Nakamura, S.1
Ito, H.2
Shikano, K.3
-
19
-
-
17344376380
-
Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR
-
G. Gravier, S. Axelrod, G. Potamianos, and C. Neti, "Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1. May 2002, pp. 853-856.
-
(2002)
Proc. IEEE Int. Conf. Acoust. Speech Signal Process
, vol.1
, pp. 853-856
-
-
Gravier, G.1
Axelrod, S.2
Potamianos, G.3
Neti, C.4
-
20
-
-
0141814785
-
Frame-dependent multi-stream reliability indicators for audio-visual speech recognition
-
A. Garg, G. Potamianos, C. Neti, and T. S. Huang, "Frame-dependent multi-stream reliability indicators for audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1. Apr. 2003, pp. 24-27.
-
(2003)
Proc. IEEE Int. Conf. Acoust. Speech Signal Process
, vol.1
, pp. 24-27
-
-
Garg, A.1
Potamianos, G.2
Neti, C.3
Huang, T.S.4
-
21
-
-
75749106784
-
Audio-visual integration for robust speech recognition using maximum weighted stream posteriors
-
R. Seymour, D. Stewart, and J. Ming, "Audio-visual integration for robust speech recognition using maximum weighted stream posteriors," in Proc. Interspeech, 2007, pp. 654-657.
-
(2007)
Proc. Interspeech
, pp. 654-657
-
-
Seymour, R.1
Stewart, D.2
Ming, J.3
-
22
-
-
84885728886
-
'Your word is my command': Google search by voice: A case study
-
J. Schalkwyk, D. Beeferman, F. Beaufays, B. Byrne, C. Chelba, M. Cohen, M. Kamvar, and B. Strope, "'Your word is my command': Google search by voice: A case study," in Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, 2010, ch. 4, pp. 61-90.
-
(2010)
Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics
, pp. 61-90
-
-
Schalkwyk, J.1
Beeferman, D.2
Beaufays, F.3
Byrne, B.4
Chelba, C.5
Cohen, M.6
Kamvar, M.7
Strope, B.8
-
23
-
-
33745224761
-
A new posterior based audio-visual integration method for robust speech recognition
-
R. Seymour, J. Ming, and D. Stewart, "A new posterior based audio-visual integration method for robust speech recognition," in Proc. Interspeech-Eurospeech, Sep. 2005, pp. 1229-1232. (Pubitemid 43908290)
-
(2005)
9th European Conference on Speech Communication and Technology
, pp. 1229-1232
-
-
Seymour, R.1
Ming, J.2
Stewart, D.3
-
24
-
-
33646410695
-
A posterior union model with applications to robust speech and speaker recognition
-
J. Ming, J. Lin, and F. J. Smith, "A posterior union model with applications to robust speech and speaker recognition," EURASIP J. Applied Signal Process., Apr. 2006, pp. 1-12.
-
(2006)
EURASIP J. Applied Signal Process.
, pp. 1-12
-
-
Ming, J.1
Lin, J.2
Smith, F.J.3
-
25
-
-
69449094603
-
Robust face recognition using posterior union model based neural networks
-
J. Lin, J. Ming, and D. Crookes, "Robust face recognition using posterior union model based neural networks," Comput. Vision, IET, vol. 3, no. 3, pp. 130-142, Sep. 2009.
-
(2009)
Comput. Vision, IET
, vol.3
, Issue.3
, pp. 130-142
-
-
Lin, J.1
Ming, J.2
Crookes, D.3
-
26
-
-
0001935972
-
XM2VTSDB: The extended M2VTS database
-
K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "XM2VTSDB: The extended M2VTS database," in Proc. Audio Video-Based Biometric Person Authentication, Mar. 1999, pp. 72-77.
-
(1999)
Proc. Audio Video-Based Biometric Person Authentication
, pp. 72-77
-
-
Messer, K.1
Matas, J.2
Kittler, J.3
Luettin, J.4
Maitre, G.5
-
27
-
-
0003822743
-
-
S. Young. (2000). The HTK Book (for HTK Version 3.0), Microsoft Corporation [Online]. Available: http://htk.eng.cam.ac.uk/docs/docs.shtml
-
(2000)
The HTK Book
-
-
Young, S.1
-
29
-
-
0032314380
-
An image transform approach for HMM based automatic lipreading
-
G. Potamianos, H. P. Graf, and E. Cosatto, "An image transform approach for HMM based automatic lipreading," in Proc. Int. Conf. Image Process., vol. 3. 1998, pp. 173-177.
-
(1998)
Proc. Int. Conf. Image Process
, vol.3
, pp. 173-177
-
-
Potamianos, G.1
Graf, H.P.2
Cosatto, E.3
-
30
-
-
43949091431
-
Comparison of image transform-based features for visual speech recognition in clean and corrupted videos
-
R. Seymour, D. Stewart, and J. Ming, "Comparison of image transform-based features for visual speech recognition in clean and corrupted videos," EURASIP J. Image Video Process., vol. 2008, article 14, Apr. 2008.
-
(2008)
EURASIP J. Image Video Process
, vol.2008
-
-
Seymour, R.1
Stewart, D.2
Ming, J.3
-
31
-
-
70349494073
-
Dynamic visual features for audio-visual speaker verification
-
D. Dean and S. Sridharan, "Dynamic visual features for audio-visual speaker verification," Comput. Speech Language, vol. 24, no. 2, pp. 136-149, 2010.
-
(2010)
Comput. Speech Language
, vol.24
, Issue.2
, pp. 136-149
-
-
Dean, D.1
Sridharan, S.2
-
32
-
-
85009284526
-
DCT-based video features for audio-visual speech recognition
-
M. Heckmann, K. Kroschel, C. Savariaux, and F. Berthommier, "DCT-based video features for audio-visual speech recognition," in Proc. Int. Conf. Spoken Language Process., Denver, CO, USA, Sep. 2002, pp. 1925-1928.
-
(2002)
Proc. Int. Conf. Spoken Language Process
, pp. 1925-1928
-
-
Heckmann, M.1
Kroschel, K.2
Savariaux, C.3
Berthommier, F.4
-
33
-
-
84893419257
-
An examination of audio-visual fused HMMs for speaker recognition
-
D. B. Dean, T. J. Wark, and S. Sridharan. (2006). "An examination of audio-visual fused HMMs for speaker recognition," in Proc. 2nd Workshop Multimodal User Authentication, Toulouse, France [Online]. Available: http://eprints.qut.edu.au/5343/
-
(2006)
Proc. 2nd Workshop Multimodal User Authentication
-
-
Dean, D.B.1
Wark, T.J.2
Sridharan, S.3