-
2
-
-
84867605836
-
Penn G (2012) Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
-
Speech, and Signal Processing, Kyoto
-
Abdel-Hamid O, rahman Mohamed A, Jiang H, Penn G (2012) Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto, pp 4277–4280
-
Proceedings of the IEEE International Conference on Acoustics
, pp. 4277-4280
-
-
Abdel-Hamid, O.1
rahman Mohamed, A.2
Jiang, H.3
-
3
-
-
4544329810
-
Comparison of low- and high-level visual features for audio-visual continuous automatic speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 5, Montreal
-
Aleksic PS, Katsaggelos AK (2004) Comparison of low- and high-level visual features for audio-visual continuous automatic speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 5, Montreal, pp 917–920
-
(2004)
pp 917–920
-
-
Aleksic, P.S.1
Katsaggelos, A.K.2
-
4
-
-
84977800621
-
Evidence of correlation between acoustic and visual features of speech. In: Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco
-
Barker J, Berthommier F (1999) Evidence of correlation between acoustic and visual features of speech. In: Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, pp 5–9
-
(1999)
pp 5–9
-
-
Barker, J.1
Berthommier, F.2
-
5
-
-
69349090197
-
Learning deep architectures for AI
-
Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127
-
(2009)
Found Trends Mach Learn
, vol.2
, Issue.1
-
-
Bengio, Y.1
-
6
-
-
0030355935
-
A new ASR approach based on independent processing and recombination of partial frequency bands. In: Proceedings of the 4th International Conference on Spoken Language Processing, vol 1, Philadelphia
-
Bourlard H, Dupont S (1996) A new ASR approach based on independent processing and recombination of partial frequency bands. In: Proceedings of the 4th International Conference on Spoken Language Processing, vol 1, Philadelphia, pp 426–429
-
(1996)
pp 426–429
-
-
Bourlard, H.1
Dupont, S.2
-
7
-
-
84939943424
-
-
Ris C: Multi-stream speech recognition. IDIAP research report
-
Bourlard H, Dupont S, Ris C (1996) Multi-stream speech recognition. IDIAP research report
-
(1996)
IDIAP research report
-
-
Bourlard, H.1
-
9
-
-
0022920273
-
(1986) Seeing speech: Investigations into the synthesis and recognition of visible speech movements using automatic image processing and computer graphics
-
Techniques and Applications, London
-
Brooke N, Petajan ED (1986) Seeing speech: Investigations into the synthesis and recognition of visible speech movements using automatic image processing and computer graphics. In: Proceedings of the International Conference on Speech Input and Output, Techniques and Applications, London, pp 104–109
-
Proceedings of the International Conference on Speech Input and Output
, pp. 104-109
-
-
Brooke, N.1
Petajan, E.D.2
-
10
-
-
84894294885
-
Deep learning with COTS HPC. In: Proceedings of the 30th international conference on machine learning, Atlanta
-
Coates A, Huval B, Wang T, Wu DJ, Ng AY, Catanzaro B (2013) Deep learning with COTS HPC. In: Proceedings of the 30th international conference on machine learning, Atlanta, pp 1337–1345
-
(2013)
pp 1337–1345
-
-
Coates, A.1
Huval, B.2
Wang, T.3
Wu, D.J.4
Ng, A.Y.5
Catanzaro, B.6
-
12
-
-
84055222005
-
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
-
Dahl GE, Acero A (2012) Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans Audio Speech Lang Process 20(1):30–42
-
(2012)
IEEE Trans Audio Speech Lang Process
, vol.20
, Issue.1
, pp. 30-42
-
-
Dahl, G.E.1
Acero, A.2
-
13
-
-
84905259759
-
Glass J (2014) Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition
-
Speech, and Signal Processing, Florence
-
Feng X, Zhang Y, Glass J (2014) Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Florence, pp 1759–1763
-
Proceedings of the IEEE International Conference on Acoustics
, pp. 1759-1763
-
-
Feng, X.1
Zhang, Y.2
-
14
-
-
63449120701
-
Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition. In: Proceedings of the 10th International Conference on Multimodal Interfaces, Chania
-
Gurban M, Thiran JP, Drugman T, Dutoit T (2008) Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition. In: Proceedings of the 10th International Conference on Multimodal Interfaces, Chania, pp 237–240
-
(2008)
pp 237–240
-
-
Gurban, M.1
Thiran, J.P.2
Drugman, T.3
Dutoit, T.4
-
15
-
-
85009284526
-
DCT-based video features for audio-visual speech recognition. In: Proceedings of the 7th International Conference on Spoken Language Processing, vol 3, Denver
-
Heckmann M, Kroschel K, Savariaux C (2002) DCT-based video features for audio-visual speech recognition. In: Proceedings of the 7th International Conference on Spoken Language Processing, vol 3, Denver, pp 1925–1928
-
(2002)
pp 1925–1928
-
-
Heckmann, M.1
Kroschel, K.2
Savariaux, C.3
-
16
-
-
0033709098
-
Tandem connectionist feature extraction for conventional HMM systems. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 3, Istanbul
-
Hermansky H, Ellis D, Sharma S (2000) Tandem connectionist feature extraction for conventional HMM systems. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 3, Istanbul, pp 1635–1638
-
(2000)
pp 1635–1638
-
-
Hermansky, H.1
Ellis, D.2
Sharma, S.3
-
17
-
-
85032751458
-
Deep neural networks for acoustic modeling in speech recognition
-
Hinton G, Deng L, Yu D, Dahl G, Mohamed A, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T, Kingsbury B (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Proc Mag 29:82–97
-
(2012)
IEEE Signal Proc Mag
, vol.29
, pp. 82-97
-
-
Hinton, G.1
Deng, L.2
Yu, D.3
Dahl, G.4
Mohamed, A.5
Jaitly, N.6
Senior, A.7
Vanhoucke, V.8
Nguyen, P.9
Sainath, T.10
Kingsbury, B.11
-
18
-
-
33746600649
-
Reducing the dimensionality of data with neural networks
-
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
-
(2006)
Science
, vol.313
, Issue.5786
, pp. 504-507
-
-
Hinton, G.E.1
Salakhutdinov, R.R.2
-
19
-
-
84890465549
-
Kingsbury B (2013) Audio-visual deep learning for noise robust speech recognition
-
Speech, and Signal Processing, Vancouver
-
Huang J, Kingsbury B (2013) Audio-visual deep learning for noise robust speech recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vancouver, pp 7596–7599
-
Proceedings of the IEEE International Conference on Acoustics
, pp. 7596-7599
-
-
Huang, J.1
-
22
-
-
84939939168
-
-
Hinton G: Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems
-
Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems
-
(2012)
Advances in Neural Information Processing Systems
-
-
Krizhevsky, A.1
-
23
-
-
0024874951
-
Watanabe T (1989) Construction of a large-scale Japanese speech database and its management system
-
Speech, and Signal Processing, Glasgow
-
Kuwabara H, Takeda K, Sagisaka Y, Katagiri S, Morikawa S, Watanabe T (1989) Construction of a large-scale Japanese speech database and its management system. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Glasgow, pp 560–563
-
Proceedings of the IEEE International Conference on Acoustics
, pp. 560-563
-
-
Kuwabara, H.1
Takeda, K.2
Sagisaka, Y.3
Katagiri, S.4
Morikawa, S.5
-
24
-
-
82955182641
-
Improving visual features for lip-reading
-
Proceedings of the International Conference on Auditory-Visual Speech Processing, Hakone, Japan
-
Lan Y, Theobald BJ, Harvey R, Ong EJ, Bowden R (2010) Improving visual features for lip-reading. In: Proceedings of the International Conference on Auditory-Visual Speech Processing, Hakone, Japan
-
(2010)
Proceedings of the International Conference on Auditory-Visual Speech Processing
-
-
Lan, Y.1
Theobald, B.J.2
Harvey, R.3
Ong, E.J.4
Bowden, R.5
-
25
-
-
84867135575
-
Building high-level features using large scale unsupervised learning. In: Proceedings of the 29th International Conference on Machine Learning, Edinburgh
-
Le QV, Ranzato M, Monga R, Devin M, Chen K, Corrado GS, Dean J, Ng AY (2012) Building high-level features using large scale unsupervised learning. In: Proceedings of the 29th International Conference on Machine Learning, Edinburgh, pp 81–88
-
(2012)
pp 81–88
-
-
Le, Q.V.1
Ranzato, M.2
Monga, R.3
Devin, M.4
Chen, K.5
Corrado, G.S.6
Dean, J.7
Ng, A.Y.8
-
26
-
-
5044231640
-
Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 2, Washington
-
LeCun Y, Bottou L (2004) Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 2, Washington, pp 97–104
-
(2004)
pp 97–104
-
-
LeCun, Y.1
Bottou, L.2
-
27
-
-
0032203257
-
Gradient-based learning applied to document recognition
-
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
-
(1998)
Proc IEEE
, vol.86
, Issue.11
, pp. 2278-2324
-
-
LeCun, Y.1
Bottou, L.2
Bengio, Y.3
Haffner, P.4
-
28
-
-
71149119164
-
Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th International Conference on Machine Learning, Montreal
-
Lee H, Grosse R, Ranganath R, Ng AY (2009) Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th International Conference on Machine Learning, Montreal, pp 609–616
-
(2009)
pp 609–616
-
-
Lee, H.1
Grosse, R.2
Ranganath, R.3
Ng, A.Y.4
-
29
-
-
84863380535
-
Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Proceedings of the Advances in Neural Information Processing Systems 22, Vancouver
-
Lee H, Pham P, Largman Y, Ng AY (2009) Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Proceedings of the Advances in Neural Information Processing Systems 22, Vancouver, pp 1096–1104
-
(2009)
pp 1096–1104
-
-
Lee, H.1
Pham, P.2
Largman, Y.3
Ng, A.Y.4
-
30
-
-
0032822143
-
A comparative study of neural network based feature extraction paradigms
-
Lerner B, Guterman H, Aladjem M, Dinstein I (1999) A comparative study of neural network based feature extraction paradigms. Pattern Recogn Lett 20(1):7–14
-
(1999)
Pattern Recogn Lett
, vol.20
, Issue.1
, pp. 7-14
-
-
Lerner, B.1
Guterman, H.2
Aladjem, M.3
Dinstein, I.4
-
31
-
-
0029765665
-
Visual speech recognition using active shape models and hidden Markov models. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 2, Atlanta
-
Luettin J, Thacker N, Beet S (1996) Visual speech recognition using active shape models and hidden Markov models. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol 2, Atlanta, pp 817–820
-
(1996)
pp 817–820
-
-
Luettin, J.1
Thacker, N.2
Beet, S.3
-
32
-
-
84977820250
-
Recurrent neural network feature enhancement: The 2nd CHiME challenge. In: Proceedings of the 2nd International Workshop on Machine Listening in Multisource Environments, Vancouver
-
Maas AL, O’Neil TM, Hannun AY, Ng AY (2013) Recurrent neural network feature enhancement: The 2nd CHiME challenge. In: Proceedings of the 2nd International Workshop on Machine Listening in Multisource Environments, Vancouver, Canada
-
(2013)
Canada
-
-
Maas, A.L.1
O’Neil, T.M.2
Hannun, A.Y.3
Ng, A.Y.4
-
34
-
-
0036472941
-
Extraction of visual features for lipreading
-
Matthews I, Cootes T, Bangham J, Cox S, Harvey R (2002) Extraction of visual features for lipreading. IEEE Trans Pattern Anal Mach Intell 24(2):198–213
-
(2002)
IEEE Trans Pattern Anal Mach Intell
, vol.24
, Issue.2
, pp. 198-213
-
-
Matthews, I.1
Cootes, T.2
Bangham, J.3
Cox, S.4
Harvey, R.5
-
37
-
-
80053437179
-
Multimodal deep learning
-
Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning
-
(2011)
In: Proceedings of the 28th International Conference on Machine Learning
-
-
Ngiam, J.1
Khosla, A.2
Kim, M.3
Nam, J.4
Lee, H.5
Ng, A.Y.6
-
40
-
-
0000255539
-
Fast exact multiplication by the Hessian
-
Pearlmutter B (1994) Fast exact multiplication by the Hessian. Neural Comput 6(1):147–160
-
(1994)
Neural Comput
, vol.6
, Issue.1
, pp. 147-160
-
-
Pearlmutter, B.1
-
41
-
-
0028194709
-
-
Renals S, Morgan N, Bourlard H, Cohen M, Franco H (1994) Connectionist probability estimators in HMM speech recognition. IEEE Trans Speech Audio Process 2(1):161–174
-
(1994)
Connectionist probability estimators in HMM speech recognition
, vol.2
, Issue.1
, pp. 161-174
-
-
Renals, S.1
Morgan, N.2
Bourlard, H.3
Cohen, M.4
Franco, H.5
-
42
-
-
0004762797
-
Exploiting sensor fusion architectures and stimuli complementarity in av speech recognition
-
Stork D, Hennecke M, (eds), Springer, Berlin Heidelberg
-
Robert-Ribes J, Piquemal M, Schwartz JL, Escudier P (1996) Exploiting sensor fusion architectures and stimuli complementarity in av speech recognition. In: Stork D, Hennecke M (eds) Speechreading by Humans and Machines. Springer, Berlin Heidelberg, pp 193–210
-
(1996)
Speechreading by Humans and Machines
, pp. 193-210
-
-
Robert-Ribes, J.1
Piquemal, M.2
Schwartz, J.L.3
Escudier, P.4
-
43
-
-
84867593213
-
Auto-encoder bottleneck features using deep belief networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto
-
Sainath TN, Kingsbury B, Ramabhadran B (2012) Auto-encoder bottleneck features using deep belief networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto, pp 4153–4156
-
(2012)
pp 4153–4156
-
-
Sainath, T.N.1
Kingsbury, B.2
Ramabhadran, B.3
-
44
-
-
0035791204
-
Reilly R (2001) Feature analysis for automatic speechreading
-
Multimedia Signal, Cannes
-
Scanlon P, Reilly R (2001) Feature analysis for automatic speechreading. In: Proceedings of the IEEE 4th Workshop on Multimedia Signal Processing, Cannes, pp 625–630
-
Proceedings of the IEEE 4th Workshop on Multimedia Signal Processing
, pp. 625-630
-
-
Scanlon, P.1
-
45
-
-
0036631778
-
Fast curvature matrix-vector products for second-order gradient descent
-
Schraudolph NN (2002) Fast curvature matrix-vector products for second-order gradient descent. Neural Comput 14(7):1723–1738
-
(2002)
Neural Comput
, vol.14
, Issue.7
, pp. 1723-1738
-
-
Schraudolph, N.N.1
-
46
-
-
0004213132
-
Auditory toolbox: A MATLAB toolbox for auditory modeling work version 2
-
Slaney M (1998) Auditory toolbox: A MATLAB toolbox for auditory modeling work version 2. Interval Research Corporation
-
(1998)
Interval Research Corporation
-
-
Slaney, M.1
-
47
-
-
80053459857
-
Generating text with recurrent neural networks. In: Proceedings of the 28th International Conference on Machine Learning, Bellevue
-
Sutskever I, Martens J, Hinton G (2011) Generating text with recurrent neural networks. In: Proceedings of the 28th International Conference on Machine Learning, Bellevue, pp 1017–1024
-
(2011)
pp 1017–1024
-
-
Sutskever, I.1
Martens, J.2
Hinton, G.3
-
48
-
-
56449089103
-
Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning, New York
-
Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning, New York, pp 1096–1103
-
(2008)
pp 1096–1103
-
-
Vincent, P.1
Larochelle, H.2
Bengio, Y.3
Manzagol, P.A.4
-
49
-
-
79551480483
-
Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
-
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408
-
(2010)
J Mach Learn Res
, vol.11
, pp. 3371-3408
-
-
Vincent, P.1
Larochelle, H.2
Lajoie, I.3
Bengio, Y.4
Manzagol, P.A.5
-
50
-
-
0032178592
-
Quantitative association of vocal-tract and facial behavior
-
Yehia H, Rubin P, Vatikiotis-Bateson E (1998) Quantitative association of vocal-tract and facial behavior. Speech Comm 26:23–43
-
(1998)
Speech Comm
, vol.26
, pp. 23-43
-
-
Yehia, H.1
Rubin, P.2
Vatikiotis-Bateson, E.3
-
51
-
-
77950563943
-
Automatic speech recognition improved by two-layered audio-visual integration for robot audition. In: Proceedings of the 9th IEEE-RAS International Conference on Humanoid Robots, Paris
-
Yoshida T, Nakadai K, Okuno HG (2009) Automatic speech recognition improved by two-layered audio-visual integration for robot audition. In: Proceedings of the 9th IEEE-RAS International Conference on Humanoid Robots, Paris, pp 604–609
-
(2009)
pp 604–609
-
-
Yoshida, T.1
Nakadai, K.2
Okuno, H.G.3
-
52
-
-
84939954092
-
-
Young S, Evermann G, Gales M, Hain T, Liu XA, Moore G, Odell J, Ollason D, Povey D, Valtchev V, Woodland P (2009) The HTK Book (for HTK Version 3.4). Cambridge University Engineering Department
-
Young S, Evermann G, Gales M, Hain T, Liu XA, Moore G, Odell J, Ollason D, Povey D, Valtchev V, Woodland P (2009) The HTK Book (for HTK Version 3.4). Cambridge University Engineering Department
-
-
-