2011, Pages 1233-1236

Feature frame stacking in RNN-based Tandem ASR systems - Learned vs. predefined context

Author keywords

Automatic speech recognition; Context modeling; Long short term memory; Recurrent neural networks

Indexed keywords

AUTOMATIC SPEECH RECOGNITION; AUTOMATIC SPEECH RECOGNITION SYSTEM; CONTEXT MODELING; CONTEXTUAL INFORMATION; EMPIRICAL EVIDENCE; FEATURE LEVEL; FEATURE VECTORS; MULTI-LAYER PERCEPTRONS; MULTI-STREAM; PHONEME RECOGNITION; SHORT TERM MEMORY; TANDEM SYSTEM

EID: 84865748400     PISSN: None     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (11)

References (15)
  • 1
    • 24144487688
    • Tandem connectionist feature extraction for conversational speech recognition
    • Springer
    • Q. Zhu, B. Chen, N. Morgan, and A. Stolcke, "Tandem connectionist feature extraction for conversational speech recognition," in Machine Learning for Multimodal Interaction. Springer, 2005, pp. 223-231.
    • (2005) Machine Learning for Multimodal Interaction , pp. 223-231
    • Zhu, Q.1    Chen, B.2    Morgan, N.3    Stolcke, A.4
  • 2
    • 51449103447
    • Optimizing bottle-neck features for LVCSR
    • Las Vegas, NV
    • F. Grezl and P. Fousek, "Optimizing bottle-neck features for LVCSR," in Proc. of ICASSP, Las Vegas, NV, 2008, pp. 4729-4732.
    • (2008) Proc. of ICASSP , pp. 4729-4732
    • Grezl, F.1    Fousek, P.2
  • 3
    • 78651563436
    • Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework
    • M. Wöllmer, F. Eyben, A. Graves, B. Schuller, and G. Rigoll, "Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework," Cognitive Computation, vol. 2, no. 3, pp. 180-190, 2010.
    • (2010) Cognitive Computation , vol.2 , Issue.3 , pp. 180-190
    • Wöllmer, M.1    Eyben, F.2    Graves, A.3    Schuller, B.4    Rigoll, G.5
  • 4
    • 70349212558
    • Phoneme recognition using spectral envelope and modulation frequency features
    • Taipei, Taiwan
    • S. Thomas, S. Ganapathy, and H. Hermansky, "Phoneme recognition using spectral envelope and modulation frequency features," in Proc. of ICASSP, Taipei, Taiwan, 2009, pp. 4453-4456.
    • (2009) Proc. of ICASSP , pp. 4453-4456
    • Thomas, S.1    Ganapathy, S.2    Hermansky, H.3
  • 5
    • 0031573117
    • Long short-term memory
    • S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
    • (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
    • Hochreiter, S.1    Schmidhuber, J.2
  • 6
    • 27744588611
    • Framewise phoneme classification with bidirectional LSTM and other neural network architectures
    • A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5-6, pp. 602-610, 2005.
    • (2005) Neural Networks , vol.18 , Issue.5-6 , pp. 602-610
    • Graves, A.1    Schmidhuber, J.2
  • 7
    • 38149014113
    • An application of recurrent neural networks to discriminative keyword spotting
    • Porto, Portugal
    • S. Fernandez, A. Graves, and J. Schmidhuber, "An application of recurrent neural networks to discriminative keyword spotting," in Proc. of ICANN, Porto, Portugal, 2007, pp. 220-229.
    • (2007) Proc. of ICANN , pp. 220-229
    • Fernandez, S.1    Graves, A.2    Schmidhuber, J.3
  • 8
    • 80051637579
    • A multi-stream ASR framework for BLSTM modeling of conversational speech
    • Prague, Czech Republic
    • M. Wöllmer, F. Eyben, B. Schuller, and G. Rigoll, "A multi-stream ASR framework for BLSTM modeling of conversational speech," in Proc. of ICASSP, Prague, Czech Republic, 2011.
    • (2011) Proc. of ICASSP
    • Wöllmer, M.1    Eyben, F.2    Schuller, B.3    Rigoll, G.4
  • 9
    • 84865753113
    • The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments
    • to appear
    • A. Stupakov, E. Hanusa, D. Vijaywargi, D. Fox, and J. Bilmes, "The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments," Computer Speech and Language, 2011, to appear.
    • (2011) Computer Speech and Language
    • Stupakov, A.1    Hanusa, E.2    Vijaywargi, D.3    Fox, D.4    Bilmes, J.5
  • 10
    • 0041914606
    • Gradient flow in recurrent nets: The difficulty of learning long-term dependencies
    • S. C. Kremer and J. F. Kolen, Eds. IEEE Press
    • S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Neural Networks, S. C. Kremer and J. F. Kolen, Eds. IEEE Press, 2001, pp. 1-15.
    • (2001) A Field Guide to Dynamical Recurrent Neural Networks , pp. 1-15
    • Hochreiter, S.1    Bengio, Y.2    Frasconi, P.3    Schmidhuber, J.4
  • 11
    • 77956721304
    • Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening
    • M. Wöllmer, B. Schuller, F. Eyben, and G. Rigoll, "Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 5, pp. 867-881, 2010.
    • (2010) IEEE Journal of Selected Topics in Signal Processing , vol.4 , Issue.5 , pp. 867-881
    • Wöllmer, M.1    Schuller, B.2    Eyben, F.3    Rigoll, G.4
  • 14
    • 78650977476
    • openSMILE - The Munich versatile and fast open-source audio feature extractor
    • Firenze, Italy
    • F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE - the Munich versatile and fast open-source audio feature extractor," in Proc. of ACM Multimedia, Firenze, Italy, 2010, pp. 1459-1462.
    • (2010) Proc. of ACM Multimedia , pp. 1459-1462
    • Eyben, F.1    Wöllmer, M.2    Schuller, B.3
  • 15
    • 79959821052
    • Recognition of spontaneous conversational speech using long short-term memory phoneme predictions
    • Makuhari, Japan
    • M. Wöllmer, F. Eyben, B. Schuller, and G. Rigoll, "Recognition of spontaneous conversational speech using long short-term memory phoneme predictions," in Proc. of Interspeech, Makuhari, Japan, 2010, pp. 1946-1949.
    • (2010) Proc. of Interspeech , pp. 1946-1949
    • Wöllmer, M.1    Eyben, F.2    Schuller, B.3    Rigoll, G.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.