2011, Pages 36-41

A novel bottleneck-BLSTM front-end for feature-level context modeling in conversational speech recognition

Author keywords

[No Author keywords available]

Indexed keywords

AUTOMATIC SPEECH RECOGNITION; CONTEXT MODELING; CONVERSATIONAL SPEECH RECOGNITION; FEATURE GENERATION; FEATURE LEVEL; FEATURE VECTORS; MULTI-LAYER PERCEPTRONS; MULTI-STREAM; NETWORK TRAINING; PHONEME RECOGNITION; RECURRENT NETWORKS; SHORT TERM MEMORY; SPEECH FEATURES;

EID: 84858961864     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ASRU.2011.6163902     Document Type: Conference Paper
Times cited: 11

References (17)
  • 1
    • 0033709098
    • Tandem connectionist feature extraction for conventional HMM systems
    • Istanbul, Turkey
    • H. Hermansky, D. P. W. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. of ICASSP, Istanbul, Turkey, 2000, pp. 1635-1638.
    • (2000) Proc. of ICASSP , pp. 1635-1638
    • Hermansky, H.1    Ellis, D.P.W.2    Sharma, S.3
  • 3
    • 77949350062
    • Robust vocabulary independent keyword spotting with graphical models
    • Merano, Italy
    • M. Wöllmer, F. Eyben, B. Schuller, and G. Rigoll, "Robust vocabulary independent keyword spotting with graphical models," in Proc. of ASRU, Merano, Italy, 2009, pp. 349-353.
    • (2009) Proc. of ASRU , pp. 349-353
    • Wöllmer, M.1    Eyben, F.2    Schuller, B.3    Rigoll, G.4
  • 4
    • 70349212558
    • Phoneme recognition using spectral envelope and modulation frequency features
    • Taipei, Taiwan
    • S. Thomas, S. Ganapathy, and H. Hermansky, "Phoneme recognition using spectral envelope and modulation frequency features," in Proc. of ICASSP, Taipei, Taiwan, 2009, pp. 4453-4456.
    • (2009) Proc. of ICASSP , pp. 4453-4456
    • Thomas, S.1    Ganapathy, S.2    Hermansky, H.3
  • 5
    • 27744588611
    • Framewise phoneme classification with bidirectional LSTM and other neural network architectures
    • DOI 10.1016/j.neunet.2005.06.042, PII S0893608005001206
    • A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5-6, pp. 602-610, 2005. (Pubitemid 43186580)
    • (2005) Neural Networks , vol.18 , Issue.5-6 , pp. 602-610
    • Graves, A.1    Schmidhuber, J.2
  • 6
    • 0031573117
    • Long short-term memory
    • S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997. (Pubitemid 127462305)
    • (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
    • Hochreiter, S.1    Schmidhuber, J.2
  • 7
    • 78651563436
    • Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework
    • M. Wöllmer, F. Eyben, A. Graves, B. Schuller, and G. Rigoll, "Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework," Cognitive Computation, vol. 2, no. 3, pp. 180-190, 2010.
    • (2010) Cognitive Computation , vol.2 , Issue.3 , pp. 180-190
    • Wöllmer, M.1    Eyben, F.2    Graves, A.3    Schuller, B.4    Rigoll, G.5
  • 8
    • 79959821052
    • Recognition of spontaneous conversational speech using long short-term memory phoneme predictions
    • Makuhari, Japan
    • M. Wöllmer, F. Eyben, B. Schuller, and G. Rigoll, "Recognition of spontaneous conversational speech using long short-term memory phoneme predictions," in Proc. of Interspeech, Makuhari, Japan, 2010, pp. 1946-1949.
    • (2010) Proc. of Interspeech , pp. 1946-1949
    • Wöllmer, M.1    Eyben, F.2    Schuller, B.3    Rigoll, G.4
  • 9
    • 80051637579
    • A multi-stream ASR framework for BLSTM modeling of conversational speech
    • Prague, Czech Republic
    • M. Wöllmer, F. Eyben, B. Schuller, and G. Rigoll, "A multi-stream ASR framework for BLSTM modeling of conversational speech," in Proc. of ICASSP, Prague, Czech Republic, 2011, pp. 4860-4863.
    • (2011) Proc. of ICASSP , pp. 4860-4863
    • Wöllmer, M.1    Eyben, F.2    Schuller, B.3    Rigoll, G.4
  • 10
    • 84865748400
    • Feature frame stacking in RNN-based tandem ASR systems - Learned vs. predefined context
    • Florence, Italy
    • M. Wöllmer, B. Schuller, and G. Rigoll, "Feature frame stacking in RNN-based Tandem ASR systems - learned vs. predefined context," in Proc. of Interspeech, Florence, Italy, 2011.
    • (2011) Proc. of Interspeech
    • Wöllmer, M.1    Schuller, B.2    Rigoll, G.3
  • 11
    • 0041914606
    • Gradient flow in recurrent nets: The difficulty of learning long-term dependencies
    • S. C. Kremer and J. F. Kolen, Eds. IEEE Press
    • S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Neural Networks, S. C. Kremer and J. F. Kolen, Eds. IEEE Press, 2001, pp. 1-15.
    • (2001) A Field Guide to Dynamical Recurrent Neural Networks , pp. 1-15
    • Hochreiter, S.1    Bengio, Y.2    Frasconi, P.3    Schmidhuber, J.4
  • 12
    • 0031268931
    • Bidirectional recurrent neural networks
    • PII S1053587X97080550
    • M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997. (Pubitemid 127766336)
    • (1997) IEEE Transactions on Signal Processing , vol.45 , Issue.11 , pp. 2673-2681
    • Schuster, M.1    Paliwal, K.K.2
  • 13
    • 79959404069
    • The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments
    • A. Stupakov, E. Hanusa, D. Vijaywargi, D. Fox, and J. Bilmes, "The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments," Computer Speech and Language, vol. 26, no. 1, pp. 52-66, 2011.
    • (2011) Computer Speech and Language , vol.26 , Issue.1 , pp. 52-66
    • Stupakov, A.1    Hanusa, E.2    Vijaywargi, D.3    Fox, D.4    Bilmes, J.5
  • 15
    • 84858960416
    • [www.buckeyecorpus.osu.edu].
  • 16
    • 80051621128
    • Localization of non-linguistic events in spontaneous speech by non-negative matrix factorization and long short-term memory
    • Prague, Czech Republic
    • F. Weninger, B. Schuller, M. Wöllmer, and G. Rigoll, "Localization of non-linguistic events in spontaneous speech by non-negative matrix factorization and Long Short-Term Memory," in Proc. of ICASSP, Prague, Czech Republic, 2011, pp. 5840-5843.
    • (2011) Proc. of ICASSP , pp. 5840-5843
    • Weninger, F.1    Schuller, B.2    Wöllmer, M.3    Rigoll, G.4
  • 17
    • 78650977476
    • openSMILE - The Munich versatile and fast open-source audio feature extractor
    • Firenze, Italy
    • F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE - the Munich versatile and fast open-source audio feature extractor," in Proc. of ACM Multimedia, Firenze, Italy, 2010, pp. 1459-1462.
    • (2010) Proc. of ACM Multimedia , pp. 1459-1462
    • Eyben, F.1    Wöllmer, M.2    Schuller, B.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.