SCOPUS information retrieval platform

Speech and Audio Processing for Coding, Enhancement and Recognition

Volume , Issue , 2015, Pages 153-195

Deep dynamic models for learning hidden representations of speech features

(2) Deng, Li a Togneri, Roberto b

a MICROSOFT RESEARCH (United States)

b UNIVERSITY OF WESTERN AUSTRALIA (Australia)

Author keywords

[No Author keywords available]

Indexed keywords

EID: 84944075741 PISSN: None EISSN: None Source Type: Book
DOI: 10.1007/978-1-4939-1456-2_6 Document Type: Book

Times cited : (16)

References (112)

1
- 85009113852
- Hmm adaptation using vector taylor series for noisy speech recognition
- A. Acero, L. Deng, T. Kristjansson, J. Zhang, HMM adaptation using vector taylor series for noisy speech recognition, in Proceedings of International Conference on Spoken Language Processing (2000), pp. 869–872
- (2000) Proceedings of International Conference on Spoken Language Processing , pp. 869-872
- Acero, A.¹ Deng, L.² Kristjansson, T.³ Zhang, J.⁴

2
- 0040856612
- Stochastic modeling for automatic speech recognition
- Academic, New York
- J. Baker, Stochastic modeling for automatic speech recognition, in Speech Recognition, ed. by D. Reddy (Academic, New York, 1976)
- (1976) Speech Recognition
- Baker, J.¹

3
- 85032751593
- Research developments and directions in speech recognition and understanding, part i
- J. Baker, L. Deng, J. Glass, S. Khudanpur, C.-H. Lee, N. Morgan, D. O’Shaughnessy, Research developments and directions in speech recognition and understanding, part i. IEEE Signal Process. Mag. 26(3), 75–80 (2009)
- (2009) IEEE Signal Process. Mag , vol.26 , Issue.3 , pp. 75-80
- Baker, J.¹ Deng, L.² Glass, J.³ Khudanpur, S.⁴ Lee, C.-H.⁵ Morgan, N.⁶ O’Shaughnessy, D.⁷

4
- 85032759066
- Updated minds report on speech recognition and understanding
- J. Baker, L. Deng, J. Glass, S. Khudanpur, C.-H. Lee, N. Morgan, D. O’Shaughnessy, Updated MINDS report on speech recognition and understanding. IEEE Signal Process. Mag. 26(4), 78–85 (2009)
- (2009) IEEE Signal Process. Mag , vol.26 , Issue.4 , pp. 78-85
- Baker, J.¹ Deng, L.² Glass, J.³ Khudanpur, S.⁴ Lee, C.-H.⁵ Morgan, N.⁶ O’Shaughnessy, D.⁷

5
- 0000342467
- Statistical inference for probabilistic functions of finite state markov chains
- L. Baum, T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37(6), 1554–1563 (1966)
- (1966) Ann. Math. Stat , vol.37 , Issue.6 , pp. 1554-1563
- Baum, L.¹ Petrie, T.²

6
- 84890543516
- Advances in optimizing recurrent networks
- Y. Bengio, N. Boulanger, R. Pascanu, Advances in optimizing recurrent networks, in Proceedings of ICASSP, Vancouver, 2013
- (2013) Proceedings of ICASSP, Vancouver
- Bengio, Y.¹ Boulanger, N.² Pascanu, R.³

7
- 84890543516
- Advances in optimizing recurrent networks
- Y. Bengio, N. Boulanger-Lewandowski, R. Pascanu, Advances in optimizing recurrent networks, in Proceedings of ICASSP, Vancouver, 2013
- (2013) Proceedings of ICASSP, Vancouver
- Bengio, Y.¹ Boulanger-Lewandowski, N.² Pascanu, R.³

8
- 0038021376
- Buried markov models: A graphical modeling approach to automatic speech recognition
- J. Bilmes, Buried markov models: a graphical modeling approach to automatic speech recognition. Comput. Speech Lang. 17, 213–231 (2003)
- (2003) Comput. Speech Lang , vol.17 , pp. 213-231
- Bilmes, J.¹

9
- 33645791324
- What hmms can do
- J. Bilmes, What HMMs can do. IEICE Trans. Inf. Syst. E89-D(3), 869–891 (2006)
- (2006) IEICE Trans. Inf. Syst. E89-D(3) , pp. 869-891
- Bilmes, J.¹

10
- 33745718966
- Tech. rep., T2002:03, SICS
- M. Boden, A guide to recurrent neural networks and backpropagation. Tech. rep., T2002:03, SICS (2002)
- (2002) A Guide to Recurrent Neural Networks and Backpropagation
- Boden, M.¹

11
- 0009296228
- Connectionist speech recognition: A hybrid approach
- Kluwer Academic, Boston
- H. Bourlard, N. Morgan, Connectionist Speech Recognition: A Hybrid Approach. The Kluwer International Series in Engineering and Computer Science, vol. 247 (Kluwer Academic, Boston, 1994)
- (1994) The Kluwer International Series in Engineering and Computer Science , vol.247
- Bourlard, H.¹ Morgan, N.²

12
- 84944117044
- Final Report for 1998 Workshop on Language Engineering, CLSP (Johns Hopkins
- J. Bridle, L. Deng, J. Picone, H. Richards, J. Ma, T. Kamm, M. Schuster, S. Pike, R. Reagan, An investigation of segmental hidden dynamic models of speech coarticulation for automatic speech recognition. Final Report for 1998 Workshop on Language Engineering, CLSP (Johns Hopkins, 1998)
- (1998) An Investigation of Segmental Hidden Dynamic Models of Speech Coarticulation for Automatic Speech Recognition
- Bridle, J.¹ Deng, L.² Picone, J.³ Richards, H.⁴ Ma, J.⁵ Kamm, T.⁶ Schuster, M.⁷ Pike, S.⁸ Reagan, R.⁹

13
- 85083950550
- A primal-dual method for training recurrent neural networks constrained by the echo-state property
- J. Chen, L. Deng, A primal-dual method for training recurrent neural networks constrained by the echo-state property, in Proceedings of ICLR (2014)
- (2014) Proceedings of ICLR
- Chen, J.¹ Deng, L.²

14
- 78149256857
- Dirichlet class language models for speech recognition
- J.-T. Chien, C.-H. Chueh, Dirichlet class language models for speech recognition. IEEE Trans. Audio Speech Lang. Process. 27, 43–54 (2011)
- (2011) IEEE Trans. Audio Speech Lang. Process , vol.27 , pp. 43-54
- Chien, J.-T.¹ Chueh, C.-H.²

15
- 80051616844
- Large vocabulary continuous speech recognition with context-dependent dbn-hmms
- G. Dahl, D. Yu, L. Deng, A. Acero, Large vocabulary continuous speech recognition with context-dependent DBN-HMMs, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (2011)
- (2011) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

16
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. Dahl, D. Yu, L. Deng, A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 30–42 (2012)
- (2012) IEEE Trans. Audio Speech Lang. Process , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

17
- 0002629270
- Maximum-likelihood from incomplete data via the em algorithm
- A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum-likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B. 39, 1–38 (1977)
- (1977) J. R. Stat. Soc. Ser. B , vol.39 , pp. 1-38
- Dempster, A.P.¹ Laird, N.M.² Rubin, D.B.³

18
- 0026854213
- A generalized hidden markov model with state-conditioned trend functions of time for the speech signal
- L. Deng, A generalized hidden markov model with state-conditioned trend functions of time for the speech signal. Signal Process. 27(1), 65–78 (1992)
- (1992) Signal Process , vol.27 , Issue.1 , pp. 65-78
- Deng, L.¹

19
- 0032119268
- A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition
- L. Deng, A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition. Speech Commun. 24(4), 299–323 (1998)
- (1998) Speech Commun , vol.24 , Issue.4 , pp. 299-323
- Deng, L.¹

20
- 0039503389
- Articulatory features and associated production models in statistical speech recognition
- Springer, New York
- L. Deng, Articulatory features and associated production models in statistical speech recognition, in Computational Models of Speech Pattern Processing (Springer, New York, 1999), pp. 214–224
- (1999) Computational Models of Speech Pattern Processing , pp. 214-224
- Deng, L.¹

21
- 0039503389
- Computational models for speech production
- Springer, New York
- L. Deng, Computational models for speech production, in Computational Models of Speech Pattern Processing (Springer, New York, 1999), pp. 199–213
- (1999) Computational Models of Speech Pattern Processing , pp. 199-213
- Deng, L.¹

22
- 33744966595
- Switching dynamic system models for speech articulation and acoustics
- Springer, New York
- L. Deng, Switching dynamic system models for speech articulation and acoustics, in Mathematical Foundations of Speech and Language Processing (Springer, New York, 2003), pp. 115–134
- (2003) Mathematical Foundations of Speech and Language Processing , pp. 115-134
- Deng, L.¹

23
- 34547507549
- Morgan and Claypool, San Rafael
- L. Deng, Dynamic Speech Models—Theory, Algorithm, and Applications (Morgan and Claypool, San Rafael, 2006)
- (2006) Dynamic Speech Models—Theory, Algorithm, and Applications
- Deng, L.¹

24
- 0028516022
- Speech recognition using hidden markov models with polynomial regression functions as non-stationary states
- L. Deng, M. Aksmanovic, D. Sun, J. Wu, Speech recognition using hidden Markov models with polynomial regression functions as non-stationary states. IEEE Trans. Acoust. Speech Signal Process. 2(4), 101–119 (1994)
- (1994) IEEE Trans. Acoust. Speech Signal Process , vol.2 , Issue.4 , pp. 101-119
- Deng, L.¹ Aksmanovic, M.² Sun, D.³ Wu, J.⁴

25
- 84905280906
- Sequence classification using high-level features extracted from deep neural networks
- L. Deng, J. Chen, Sequence classification using high-level features extracted from deep neural networks, in Proceedings of ICASSP (2014)
- (2014) Proceedings of ICASSP
- Deng, L.¹ Chen, J.²

26
- 0036299277
- A bayesian approach to speech feature enhancement using the dynamic cepstral prior
- L. Deng, J. Droppo, A. Acero, A Bayesian approach to speech feature enhancement using the dynamic cepstral prior, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1 (2002), pp. I-829–I-832
- (2002) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , vol.1
- Deng, L.¹ Droppo, J.² Acero, A.³

27
- 0028256706
- Analysis of the correlation structure for a neural predictive model with application to speech recognition
- L. Deng, K. Hassanein, M. Elmasry, Analysis of the correlation structure for a neural predictive model with application to speech recognition. Neural Netw. 7(2), 331–339 (1994)
- (1994) Neural Netw , vol.7 , Issue.2 , pp. 331-339
- Deng, L.¹ Hassanein, K.² Elmasry, M.³

28
- 84890526837
- New types of deep neural network learning for speech recognition and related applications: An overview
- L. Deng, G. Hinton, B. Kingsbury, New types of deep neural network learning for speech recognition and related applications: an overview, in Proceedings of IEEE ICASSP, Vancouver, 2013
- (2013) Proceedings of IEEE ICASSP, Vancouver
- Deng, L.¹ Hinton, G.² Kingsbury, B.³

29
- 84890468916
- Deep learning for speech recognition and related applications
- L. Deng, G. Hinton, D. Yu, Deep learning for speech recognition and related applications, in NIPS Workshop, Whistler, 2009
- (2009) NIPS Workshop, Whistler
- Deng, L.¹ Hinton, G.² Yu, D.³

30
- 0026189555
- Phonemic hidden markov models with continuous mixture output densities for large vocabulary word recognition
- L. Deng, P. Kenny, M. Lennig, V. Gupta, F. Seitz, P. Mermelstein, Phonemic hidden markov models with continuous mixture output densities for large vocabulary word recognition. IEEE Trans. Acoust. Speech Signal Process. 39(7), 1677–1681 (1991)
- (1991) IEEE Trans. Acoust. Speech Signal Process , vol.39 , Issue.7 , pp. 1677-1681
- Deng, L.¹ Kenny, P.² Lennig, M.³ Gupta, V.⁴ Seitz, F.⁵ Mermelstein, P.⁶

31
- 34547517867
- Adaptive kalman filtering and smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model
- L. Deng, L. Lee, H. Attias, A. Acero, Adaptive kalman filtering and smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model. IEEE Trans. Audio Speech Lang. Process. 15(1), 13–23 (2007)
- (2007) IEEE Trans. Audio Speech Lang. Process , vol.15 , Issue.1 , pp. 13-23
- Deng, L.¹ Lee, L.² Attias, H.³ Acero, A.⁴

32
- 10244257175
- Large vocabulary word recognition using context-dependent allophonic hidden markov models
- L. Deng, M. Lennig, F. Seitz, P. Mermelstein, Large vocabulary word recognition using context-dependent allophonic hidden markov models. Comput. Speech Lang. 4, 345–357 (1991)
- (1991) Comput. Speech Lang , vol.4 , pp. 345-357
- Deng, L.¹ Lennig, M.² Seitz, F.³ Mermelstein, P.⁴

33
- 84876672166
- Machine learning paradigms in speech recognition: An overview
- L. Deng, X. Li, Machine learning paradigms in speech recognition: an overview. IEEE Trans. Audio Speech Lang. Process. 21(5), 1060–1089 (2013)
- (2013) IEEE Trans. Audio Speech Lang. Process , vol.21 , Issue.5 , pp. 1060-1089
- Deng, L.¹ Li, X.²

34
- 0003911245
- A statistical coarticulatory model for the hidden vocal-tract-resonance dynamics
- L. Deng, J. Ma, A statistical coarticulatory model for the hidden vocal-tract-resonance dynamics, in EUROSPEECH (1999), pp. 1499–1502
- (1999) EUROSPEECH , pp. 1499-1502
- Deng, L.¹ Ma, J.²

35
- 0033623527
- Spontaneous speech recognition using a statistical coarticulatory model for the hidden vocal-tract-resonance dynamics
- L. Deng, J. Ma, Spontaneous speech recognition using a statistical coarticulatory model for the hidden vocal-tract-resonance dynamics. J. Acoust. Soc. Am. 108, 3036–3048 (2000)
- (2000) J. Acoust. Soc. Am , vol.108 , pp. 3036-3048
- Deng, L.¹ Ma, J.²

36
- 4243117872
- Marcel Dekker, New York
- L. Deng, D. O’Shaughnessy, Speech Processing—A Dynamic and Optimization-Oriented Approach (Marcel Dekker, New York, 2003)
- (2003) Speech Processing—A Dynamic and Optimization-Oriented Approach
- Deng, L.¹ O’Shaughnessy, D.²

37
- 0031198059
- Production models as a structural basis for automatic speech recognition
- L. Deng, G. Ramsay, D. Sun, Production models as a structural basis for automatic speech recognition. Speech Commun. 33(2–3), 93–111 (1997)
- (1997) Speech Commun , vol.33 , Issue.2-3 , pp. 93-111
- Deng, L.¹ Ramsay, G.² Sun, D.³

38
- 34547551709
- Use of differential cepstra as acoustic features in hidden trajectory modelling for phonetic recognition
- L. Deng, D. Yu, Use of differential cepstra as acoustic features in hidden trajectory modelling for phonetic recognition, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (2007), pp. 445–448
- (2007) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , pp. 445-448
- Deng, L.¹ Yu, D.²

39
- 33744966561
- A bidirectional target filtering model of speech coarticulation: Two-stage implementation for phonetic recognition
- L. Deng, D. Yu, A. Acero, A bidirectional target filtering model of speech coarticulation: two-stage implementation for phonetic recognition. IEEE Trans. Speech Audio Process. 14, 256–265 (2006)
- (2006) IEEE Trans. Speech Audio Process , vol.14 , pp. 256-265
- Deng, L.¹ Yu, D.² Acero, A.³

40
- 34047266395
- Structured speech modeling
- L. Deng, D. Yu, A. Acero, Structured speech modeling. IEEE Trans. Speech Audio Process. 14, 1492–1504 (2006)
- (2006) IEEE Trans. Speech Audio Process , vol.14 , pp. 1492-1504
- Deng, L.¹ Yu, D.² Acero, A.³

41
- 55849151300
- IOS Press, Amsterdam
- P. Divenyi, S. Greenberg, G. Meyer, Dynamics of Speech Production and Perception (IOS Press, Amsterdam, 2006)
- (2006) Dynamics of Speech Production and Perception
- Divenyi, P.¹ Greenberg, S.² Meyer, G.³

42
- 4544236840
- Noise robust speech recognition with a switching linear dynamic model
- J. Droppo, A. Acero, Noise robust speech recognition with a switching linear dynamic model, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1 (2004), pp. I-953–I-956
- (2004) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , vol.1
- Droppo, J.¹ Acero, A.²

43
- 85032752250
- Bayesian nonparametric methods for learning markov switching processes
- E. Fox, E. Sudderth, M. Jordan, A. Willsky, Bayesian nonparametric methods for learning markov switching processes. IEEE Signal Process. Mag. 27(6), 43–54 (2010)
- (2010) IEEE Signal Process. Mag , vol.27 , Issue.6 , pp. 43-54
- Fox, E.¹ Sudderth, E.² Jordan, M.³ Willsky, A.⁴

44
- 85009074657
- Algonquin: Iterating laplace’s method to remove multiple types of acoustic distortion for robust speech recognition
- B. Frey, L. Deng, A. Acero, T. Kristjansson, Algonquin: iterating Laplace’s method to remove multiple types of acoustic distortion for robust speech recognition, in Proceedings of Eurospeech (2000)
- (2000) Proceedings of Eurospeech
- Frey, B.¹ Deng, L.² Acero, A.³ Kristjansson, T.⁴

45
- 0030245128
- Robust continuous speech recognition using parallel model combination
- M. Gales, S. Young, Robust continuous speech recognition using parallel model combination. IEEE Trans. Speech Audio Process. 4(5), 352–359 (1996)
- (1996) IEEE Trans. Speech Audio Process , vol.4 , Issue.5 , pp. 352-359
- Gales, M.¹ Young, S.²

46
- 0034170950
- Variational learning for switching state-space models
- Z. Ghahramani, G.E. Hinton, Variational learning for switching state-space models. Neural Comput. 12, 831–864 (2000)
- (2000) Neural Comput , vol.12 , pp. 831-864
- Ghahramani, Z.¹ Hinton, G.E.²

47
- 0030372585
- Modeling long term variability information in mixture stochastic trajectory framework
- Y. Gong, I. Illina, J.-P. Haton, Modeling long term variability information in mixture stochastic trajectory framework, in Proceedings of International Conference on Spoken Language Processing (1996)
- (1996) Proceedings of International Conference on Spoken Language Processing
- Gong, Y.¹ Illina, I.² Haton, J.-P.³

48
- 84897549167
- Sequence transduction with recurrent neural networks
- A. Graves, Sequence transduction with recurrent neural networks, in Representation Learning Workshop, ICML (2012)
- (2012) Representation Learning Workshop, ICML
- Graves, A.¹

49
- 84890543083
- Speech recognition with deep recurrent neural networks
- A. Graves, A. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in Proceedings of ICASSP, Vancouver, 2013
- (2013) Proceedings of ICASSP, Vancouver
- Graves, A.¹ Mohamed, A.² Hinton, G.³

50
- 78650474133
- A practical guide to training restricted boltzmann machines
- Machine Learning Group, University of Toronto, 2010
- G. E. Hinton, “A practical guide to training restricted Boltzmann machines,” in Technical report 2010-003, Machine Learning Group, University of Toronto, 2010.
- Technical Report 2010-003
- Hinton, G.E.¹

51
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, B. Kingsbury, Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
- (2012) IEEE Signal Process. Mag , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

52
- 33745805403
- A fast learning algorithm for deep belief nets
- G. Hinton, S. Osindero, Y. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
- (2006) Neural Comput , vol.18 , pp. 1527-1554
- Hinton, G.¹ Osindero, S.² Teh, Y.³

53
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. Hinton, R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.¹ Salakhutdinov, R.²

54
- 0032673963
- Probabilistic-trajectory segmental hmms
- W. Holmes, M. Russell, Probabilistic-trajectory segmental HMMs. Comput. Speech Lang. 13, 3–37 (1999)
- (1999) Comput. Speech Lang , vol.13 , pp. 3-37
- Holmes, W.¹ Russell, M.²

55
- 0004056285
- (Upper Saddle River, New Jersey 07458)
- X. Huang, A. Acero, H.-W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development (Upper Saddle River, New Jersey 07458)
- Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
- Huang, X.¹ Acero, A.² Hon, H.-W.³

56
- 33749833931
- Tutorial on training recurrent neural networks, covering bppt, rtrl, ekf and the “echo state network” approach. Gmd report 159
- H. Jaeger, Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the “echo state network” approach. GMD Report 159, GMD - German National Research Institute for Computer Science (2002)
- (2002) GMD - German National Research Institute for Computer Science
- Jaeger, H.¹

57
- 0016939124
- Continuous speech recognition by statistical methods
- F. Jelinek, Continuous speech recognition by statistical methods. Proc. IEEE 64(4), 532–557 (1976)
- (1976) Proc. IEEE , vol.64 , Issue.4 , pp. 532-557
- Jelinek, F.¹

58
- 0022691022
- Maximum likelihood estimation for mixture multivariate stochastic observations of markov chains
- B.-H. Juang, S.E. Levinson, M.M. Sondhi, Maximum likelihood estimation for mixture multivariate stochastic observations of markov chains. IEEE Trans. Inf. Theory 32(2), 307–309 (1986)
- (1986) IEEE Trans. Inf. Theory , vol.32 , Issue.2 , pp. 307-309
- Juang, B.-H.¹ Levinson, S.E.² Sondhi, M.M.³

59
- 84878379108
- Scalable minimum bayes risk training of deep neural network acoustic models using distributed hessian-free optimization
- B. Kingsbury, T. Sainath, H. Soltau, Scalable minimum Bayes risk training of deep neural network acoustic models using distributed hessian-free optimization, in Proceedings of Interspeech (2012)
- (2012) Proceedings of Interspeech
- Kingsbury, B.¹ Sainath, T.² Soltau, H.³

60
- 56449110012
- Classification using discriminative restricted boltzmann machines
- ACM, New York
- H. Larochelle, Y. Bengio, Classification using discriminative restricted Boltzmann machines, in Proceedings of the 25th International Conference on Machine learning (ACM, New York, 2008), pp. 536–543
- (2008) Proceedings of the 25th International Conference on Machine Learning , pp. 536-543
- Larochelle, H.¹ Bengio, Y.²

61
- 0141813573
- Variational inference and learning for segmental switching state space models of hidden speech dynamics
- L. Lee, H. Attias, L. Deng, Variational inference and learning for segmental switching state space models of hidden speech dynamics, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1 (2003), pp. I-872–I-875
- (2003) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , vol.1
- Lee, L.¹ Attias, H.² Deng, L.³

62
- 0034842603
- A functional articulatory dynamic model for speech production
- L.J. Lee, P. Fieguth, L. Deng, A functional articulatory dynamic model for speech production, in Proceedings of ICASSP, Salt Lake City, vol. 2, 2001, pp. 797–800
- (2001) Proceedings of ICASSP, Salt Lake City , vol.2 , pp. 797-800
- Lee, L.J.¹ Fieguth, P.² Deng, L.³

63
- 84897953008
- Temporally varying weight regression: A semi-parametric trajectory model for automatic speech recognition
- S. Liu, K. Sim, Temporally varying weight regression: a semi-parametric trajectory model for automatic speech recognition. IEEE Trans. Audio Speech Lang. Process. 22(1), 151–160 (2014)
- (2014) IEEE Trans. Audio Speech Lang. Process , vol.22 , Issue.1 , pp. 151-160
- Liu, S.¹ Sim, K.²

64
- 84875405186
- Exploiting deep neural networks for detection-based speech recognition
- S.M. Siniscalchi, D. Yu, L. Deng, C.-H. Lee, Exploiting deep neural networks for detection-based speech recognition. Neurocomputing 106, 148–157 (2013)
- (2013) Neurocomputing , vol.106 , pp. 148-157
- Siniscalchi, S.M.¹ Yu, D.² Deng, L.³ Lee, C.-H.⁴

65
- 0001523807
- A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamic model of speech
- J. Ma, L. Deng, A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamic model of speech. Comput. Speech Lang. 14, 101–104 (2000)
- (2000) Comput. Speech Lang , vol.14 , pp. 101-104
- Ma, J.¹ Deng, L.²

66
- 0347968275
- Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model
- J. Ma, L. Deng, Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model. IEEE Trans. Audio Speech Process. 11(6), 590–602 (2003)
- (2003) IEEE Trans. Audio Speech Process , vol.11 , Issue.6 , pp. 590-602
- Ma, J.¹ Deng, L.²

67
- 0347968275
- Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model
- J. Ma, L. Deng, Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model. IEEE Trans. Audio Speech Lang. Process 11(6), 590–602 (2004)
- (2004) IEEE Trans. Audio Speech Lang. Process , vol.11 , Issue.6 , pp. 590-602
- Ma, J.¹ Deng, L.²

68
- 0742307392
- Target-directed mixture dynamic models for spontaneous speech recognition
- J. Ma, L. Deng, Target-directed mixture dynamic models for spontaneous speech recognition. IEEE Trans. Audio Speech Process. 12(1), 47–58 (2004)
- (2004) IEEE Trans. Audio Speech Process , vol.12 , Issue.1 , pp. 47-58
- Ma, J.¹ Deng, L.²

69
- 84878409063
- Recurrent neural networks for noise reduction in robust asr
- A.L. Maas, Q. Le, T.M. O’Neil, O. Vinyals, P. Nguyen, A.Y. Ng, Recurrent neural networks for noise reduction in robust asr, in Proceedings of INTERSPEECH, Portland, 2012
- (2012) Proceedings of INTERSPEECH, Portland
- Maas, A.L.¹ Le, Q.² O’Neil, T.M.³ Vinyals, O.⁴ Nguyen, P.⁵ Ng, A.Y.⁶

70
- 80053451847
- Learning recurrent neural networks with hessian-free optimization
- J. Martens, I. Sutskever, Learning recurrent neural networks with hessian-free optimization, in Proceedings of ICML, Bellevue, 2011, pp. 1033–1040
- (2011) Proceedings of ICML, Bellevue , pp. 1033-1040
- Martens, J.¹ Sutskever, I.²

71
- 84906237242
- Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding
- G. Mesnil, X. He, L. Deng, Y. Bengio, Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding, in Proceedings of INTERSPEECH, Lyon, 2013
- (2013) Proceedings of INTERSPEECH, Lyon
- Mesnil, G.¹ He, X.² Deng, L.³ Bengio, Y.⁴

72
- 54349106040
- Switching linear dynamical systems for noise robust speech recognition
- B. Mesot, D. Barber, Switching linear dynamical systems for noise robust speech recognition. IEEE Trans. Audio Speech Lang. Process. 15(6), 1850–1858 (2007)
- (2007) IEEE Trans. Audio Speech Lang. Process , vol.15 , Issue.6 , pp. 1850-1858
- Mesot, B.¹ Barber, D.²

73
- 84874250121
- Ph.D. thesis, Brno University of Technology
- T. Mikolov, Statistical language models based on neural networks, Ph.D. thesis, Brno University of Technology, 2012
- (2012) Statistical Language Models Based on Neural Networks
- Mikolov, T.¹

74
- 84858966958
- Strategies for training large scale neural network language models
- IEEE, Honolulu
- T. Mikolov, A. Deoras, D. Povey, L. Burget, J. Cernocky, Strategies for training large scale neural network language models, in Proceedings of IEEE ASRU (IEEE, Honolulu, 2011), pp. 196–201
- (2011) Proceedings of IEEE ASRU , pp. 196-201
- Mikolov, T.¹ Deoras, A.² Povey, D.³ Burget, L.⁴ Cernocky, J.⁵

75
- 79959829092
- Recurrent neural network based language model
- T. Mikolov, M. Karafiát, L. Burget, J. Cernocky, S. Khudanpur, Recurrent neural network based language model, in Proceedings of INTERSPEECH, Makuhari, 2010, pp. 1045–1048
- (2010) Proceedings of INTERSPEECH, Makuhari , pp. 1045-1048
- Mikolov, T.¹ Karafiát, M.² Burget, L.³ Cernocky, J.⁴ Khudanpur, S.⁵

76
- 80051643236
- Extensions of recurrent neural network language model
- T. Mikolov, S. Kombrink, L. Burget, J. Cernocky, S. Khudanpur, Extensions of recurrent neural network language model, in Proceedings of IEEE ICASSP, Prague, 2011, pp. 5528–5531
- (2011) Proceedings of IEEE ICASSP, Prague , pp. 5528-5531
- Mikolov, T.¹ Kombrink, S.² Burget, L.³ Cernocky, J.⁴ Khudanpur, S.⁵

77
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. Dahl, G. Hinton, Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2012)
- (2012) IEEE Trans. Audio Speech Lang. Process , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.² Hinton, G.³

78
- 79959828738
- Deep belief networks for phone recognition
- A. Mohamed, G.E. Dahl, G.E. Hinton, Deep belief networks for phone recognition, in NIPS Workshop on Deep Learning for Speech Recognition and Related Applications (2009)
- (2009) NIPS Workshop on Deep Learning for Speech Recognition and Related Applications
- Mohamed, A.¹ Dahl, G.E.² Hinton, G.E.³

79
- 80051654263
- Deep belief networks using discriminative features for phone recognition
- A. Mohamed, T. Sainath, G. Dahl, B. Ramabhadran, G. Hinton, M. Picheny, Deep belief networks using discriminative features for phone recognition, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (2011), pp. 5060–5063
- (2011) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , pp. 5060-5063
- Mohamed, A.¹ Sainath, T.² Dahl, G.³ Ramabhadran, B.⁴ Hinton, G.⁵ Picheny, M.⁶

80
- 84255177123
- Deep and wide: Multiple layers in automatic speech recognition
- N. Morgan, Deep and wide: multiple layers in automatic speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 7–13 (2012)
- (2012) IEEE Trans. Audio Speech Lang. Process , vol.20 , Issue.1 , pp. 7-13
- Morgan, N.¹

81
- 0030245363
- From hmm’s to segment models: A unified view of stochastic modeling for speech recognition
- M. Ostendorf, V. Digalakis, O. Kimball, From HMM’s to segment models: a unified view of stochastic modeling for speech recognition. IEEE Trans. Speech Audio Process. 4(5), 360–378 (1996)
- (1996) IEEE Trans. Speech Audio Process , vol.4 , Issue.5 , pp. 360-378
- Ostendorf, M.¹ Digalakis, V.² Kimball, O.³

82
- 33645771960
- Continuous word recognition based on the stochastic segment model
- M. Ostendorf, A. Kannan, O. Kimball, J. Rohlicek, Continuous word recognition based on the stochastic segment model, in Proceedings of DARPA Workshop CSR (1992)
- (1992) Proceedings of DARPA Workshop CSR
- Ostendorf, M.¹ Kannan, A.² Kimball, O.³ Rohlicek, J.⁴

83
- 69249099357
- Dynamic speech spectrum representation and tracking variable number of vocal tract resonance frequencies with time-varying dirichlet process mixture models
- E. Ozkan, I. Ozbek, M. Demirekler, Dynamic speech spectrum representation and tracking variable number of vocal tract resonance frequencies with time-varying dirichlet process mixture models. IEEE Trans. Audio Speech Lang. Process. 17(8), 1518–1532 (2009)
- (2009) IEEE Trans. Audio Speech Lang. Process , vol.17 , Issue.8 , pp. 1518-1532
- Ozkan, E.¹ Ozbek, I.² Demirekler, M.³

84
- 84897497795
- On the difficulty of training recurrent neural networks
- R. Pascanu, T. Mikolov, Y. Bengio, On the difficulty of training recurrent neural networks, in Proceedings of ICML, Atlanta, 2013
- (2013) Proceedings of ICML, Atlanta
- Pascanu, R.¹ Mikolov, T.² Bengio, Y.³

85
- 0141698849
- Variational learning in mixed-state dynamic graphical models
- V. Pavlovic, B. Frey, T. Huang, Variational learning in mixed-state dynamic graphical models, in Proceedings of UAI, Stockholm, 1999, pp. 522–530
- (1999) Proceedings of UAI, Stockholm , pp. 522-530
- Pavlovic, V.¹ Frey, B.² Huang, T.³

86
- 0032639922
- Initial evaluation of hidden dynamic models on conversational speech
- J. Picone, S. Pike, R. Regan, T. Kamm, J. Bridle, L. Deng, Z. Ma, H. Richards, M. Schuster, Initial evaluation of hidden dynamic models on conversational speech, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (1999)
- (1999) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing
- Picone, J.¹ Pike, S.² Regan, R.³ Kamm, T.⁴ Bridle, J.⁵ Deng, L.⁶ Ma, Z.⁷ Richards, H.⁸ Schuster, M.⁹

87
- 0028401031
- Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks
- G. Puskorius, L. Feldkamp, Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks. IEEE Trans. Neural Netw. 5(2), 279–297 (1994)
- (1994) IEEE Trans. Neural Netw , vol.5 , Issue.2 , pp. 279-297
- Puskorius, G.¹ Feldkamp, L.²

88
- 0004244302
- Prentice-Hall, Upper Saddle River
- L. Rabiner, B.-H. Juang, Fundamentals of Speech Recognition (Prentice-Hall, Upper Saddle River, 1993)
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.-H.²

89
- 85032751986
- Single-channel multitalker speech recognition: graphical modeling approaches
- S. Rennie, J. Hershey, P. Olsen, Single-channel multitalker speech recognition: graphical modeling approaches. IEEE Signal Process. Mag. 27, 66–80 (2010)
- (2010) IEEE Signal Process. Mag , vol.27 , pp. 66-80
- Rennie, S.¹ Hershey, J.² Olsen, P.³

90
- 0028392167
- An application of recurrent nets to phone probability estimation
- A.J. Robinson, An application of recurrent nets to phone probability estimation. IEEE Trans. Neural Netw. 5(2), 298–305 (1994)
- (1994) IEEE Trans. Neural Netw , vol.5 , Issue.2 , pp. 298-305
- Robinson, A.J.¹

91
- 4544302569
- Rao-Blackwellised Gibbs sampling for switching linear dynamical systems
- A. Rosti, M. Gales, Rao-Blackwellised Gibbs sampling for switching linear dynamical systems, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1 (2004), pp. I-809–I-812
- (2004) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , vol.1
- Rosti, A.¹ Gales, M.²

92
- 10844250035
- A multiple-level linear/linear segmental HMM with a formant-based intermediate layer
- M. Russell, P. Jackson, A multiple-level linear/linear segmental HMM with a formant-based intermediate layer. Comput. Speech Lang. 19, 205–225 (2005)
- (2005) Comput. Speech Lang , vol.19 , pp. 205-225
- Russell, M.¹ Jackson, P.²

93
- 84886829539
- Optimization techniques to improve training speed of deep neural networks for large speech tasks
- T. Sainath, B. Kingsbury, H. Soltau, B. Ramabhadran, Optimization techniques to improve training speed of deep neural networks for large speech tasks. IEEE Trans. Audio Speech Lang. Process. 21(11), 2267–2276 (2013)
- (2013) IEEE Trans. Audio Speech Lang. Process , vol.21 , Issue.11 , pp. 2267-2276
- Sainath, T.¹ Kingsbury, B.² Soltau, H.³ Ramabhadran, B.⁴

94
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- Waikoloa, HI, USA
- F. Seide, G. Li, X. Chen, D. Yu, Feature engineering in context-dependent deep neural networks for conversational speech transcription, in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011 (Waikoloa, HI, USA), pp. 24–29
- (2011) IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

95
- 0031074957
- Maximum likelihood in statistical estimation of dynamical systems: Decomposition algorithm and simulation results
- X. Shen, L. Deng, Maximum likelihood in statistical estimation of dynamical systems: decomposition algorithm and simulation results. Signal Process. 57, 65–79 (1997)
- (1997) Signal Process , vol.57 , pp. 65-79
- Shen, X.¹ Deng, L.²

96
- 0004129646
- MIT Press
- K.N. Stevens, Acoustic Phonetics, vol. 30 (MIT Press, 2000)
- (2000) Acoustic phonetics , vol.30
- Stevens, K.N.¹

97
- 84883148756
- Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure
- V. Stoyanov, A. Ropson, J. Eisner, Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure, in Proceedings of AISTATS (2011)
- (2011) Proceedings of AISTATS
- Stoyanov, V.¹ Ropson, A.² Eisner, J.³

98
- 80053459857
- Generating text with recurrent neural networks
- I. Sutskever, J. Martens, G.E. Hinton, Generating text with recurrent neural networks, in Proceedings of 28th International Conference on Machine Learning (2011)
- (2011) Proceedings of 28th International Conference on Machine Learning
- Sutskever, I.¹ Martens, J.² Hinton, G.E.³

99
- 84884966819
- Ph.D. thesis, University of Toronto
- I. Sutskever, Training recurrent neural networks, Ph.D. thesis, University of Toronto, 2013
- (2013) Training Recurrent Neural Networks
- Sutskever, I.¹

100
- 80053459857
- Generating text with recurrent neural networks
- I. Sutskever, J. Martens, G.E. Hinton, Generating text with recurrent neural networks, in Proceedings of ICML, Bellevue, 2011, pp. 1017–1024
- (2011) Proceedings of ICML, Bellevue , pp. 1017-1024
- Sutskever, I.¹ Martens, J.² Hinton, G.E.³

101
- 0344443787
- Joint state and parameter estimation for a target-directed nonlinear dynamic system model
- R. Togneri, L. Deng, Joint state and parameter estimation for a target-directed nonlinear dynamic system model. IEEE Trans. Signal Process. 51(12), 3061–3070 (2003)
- (2003) IEEE Trans. Signal Process , vol.51 , Issue.12 , pp. 3061-3070
- Togneri, R.¹ Deng, L.²

102
- 33745373922
- A state-space model with neural-network prediction for recovering vocal tract resonances in fluent speech from mel-cepstral coefficients
- R. Togneri, L. Deng, A state-space model with neural-network prediction for recovering vocal tract resonances in fluent speech from mel-cepstral coefficients. Speech Commun. 48(8), 971–988 (2006)
- (2006) Speech Commun , vol.48 , Issue.8 , pp. 971-988
- Togneri, R.¹ Deng, L.²

103
- 84886714036
- Acoustic modeling with hierarchical reservoirs
- F. Triefenbach, A. Jalalvand, K. Demuynck, J.-P. Martens, Acoustic modeling with hierarchical reservoirs. IEEE Trans. Audio Speech Lang. Process. 21(11), 2439–2450 (2013)
- (2013) IEEE Trans. Audio Speech Lang. Process , vol.21 , Issue.11 , pp. 2439-2450
- Triefenbach, F.¹ Jalalvand, A.² Demuynck, K.³ Martens, J.-P.⁴

104
- 84887037596
- Optimization algorithms and applications for speech and language processing
- S. Wright, D. Kanevsky, L. Deng, X. He, G. Heigold, H. Li, Optimization algorithms and applications for speech and language processing. IEEE Trans. Audio Speech Lang. Process. 21(11), 2231–2243 (2013)
- (2013) IEEE Trans. Audio Speech Lang. Process , vol.21 , Issue.11 , pp. 2231-2243
- Wright, S.¹ Kanevsky, D.² Deng, L.³ He, X.⁴ Heigold, G.⁵ Li, H.⁶

105
- 3242679207
- A generalized mean field algorithm for variational inference in exponential families
- E. Xing, M. Jordan, S. Russell, A generalized mean field algorithm for variational inference in exponential families, in Proceedings of UAI (2003)
- (2003) Proceedings of UAI
- Xing, E.¹ Jordan, M.² Russell, S.³

106
- 33749541517
- Speaker-adaptive learning of resonance targets in a hidden trajectory model of speech coarticulation
- D. Yu, L. Deng, Speaker-adaptive learning of resonance targets in a hidden trajectory model of speech coarticulation. Comput. Speech Lang. 21, 72–87 (2007)
- (2007) Comput. Speech Lang , vol.21 , pp. 72-87
- Yu, D.¹ Deng, L.²

107
- 84867789985
- US Patent 20130138436 A1
- D. Yu, L. Deng, Discriminative pretraining of deep neural networks, US Patent 20130138436 A1, 2013
- (2013) Discriminative Pretraining of Deep Neural Networks
- Yu, D.¹ Deng, L.²

108
- 84865713025
- Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition
- D. Yu, L. Deng, G. Dahl, Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition, in NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2010)
- (2010) NIPS Workshop on Deep Learning and Unsupervised Feature Learning
- Yu, D.¹ Deng, L.² Dahl, G.³

109
- 84867606668
- Exploiting sparseness in deep neural networks for large vocabulary speech recognition
- D. Yu, F. Seide, G. Li, L. Deng, Exploiting sparseness in deep neural networks for large vocabulary speech recognition, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (2012), pp. 4409–4412
- (2012) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing , pp. 4409-4412
- Yu, D.¹ Seide, F.² Li, G.³ Deng, L.⁴

110
- 84867329143
- Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition
- D. Yu, S. Siniscalchi, L. Deng, C. Lee, Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (2012)
- (2012) Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing
- Yu, D.¹ Siniscalchi, S.² Deng, L.³ Lee, C.⁴

111
- 85133439657
- An introduction of trajectory model into HMM-based speech synthesis
- H. Zen, K. Tokuda, T. Kitamura, An introduction of trajectory model into HMM-based speech synthesis, in Proceedings of ISCA SSW5 (2004), pp. 191–196
- (2004) Proceedings of ISCA SSW5 , pp. 191-196
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

112
- 67650153217
- Acoustic-articulatory modelling with the trajectory HMM
- L. Zhang, S. Renals, Acoustic-articulatory modelling with the trajectory HMM. IEEE Signal Process. Lett. 15, 245–248 (2008)
- (2008) IEEE Signal Process. Lett , vol.15 , pp. 245-248
- Zhang, L.¹ Renals, S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.