SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume , Issue , 2013, Pages 1668-1672

Detecting overlapping speech with long short-term memory recurrent neural networks

Authors (4): Geiger, Jürgen T. (a); Eyben, Florian (a); Schuller, Björn (a, b); Rigoll, Gerhard (a)

a TECHNICAL UNIVERSITY OF MUNICH (Germany)

b UNIVERSITY OF PASSAU (Germany)

Author keywords

Long short term memory; Neural networks; Speaker diarization; Speech overlap detection

Indexed keywords

BRAIN; FORECASTING; NEURAL NETWORKS; RECURRENT NEURAL NETWORKS;

AUDIO FEATURES; FEATURE SETS; HMM-BASED SYSTEMS; LONG SHORT-TERM MEMORY; OVERLAP DETECTIONS; SPEAKER DIARIZATION; SPEECH-OVERLAPS;

SPEECH RECOGNITION;

EID: 84906242216; PISSN: 2308457X; EISSN: 19909772; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited : (49)

References (28)

1
- 33745224103
- Spontaneous speech: How people really talk and why engineers should care
- Lisbon, Portugal
- E. Shriberg, "Spontaneous Speech: How People Really Talk and Why Engineers Should Care," in Proc. Eurospeech, Lisbon, Portugal, 2005, pp. 1781-1784.
- (2005) Proc. Eurospeech , pp. 1781-1784
- Shriberg, E.¹

2
- 84878379997
- On the dynamics of overlap in multi-party conversation
- Portland, OR, USA
- K. Laskowski, M. Heldner, and J. Edlund, "On the Dynamics of Overlap in Multi-Party Conversation," in Proc. Interspeech, Portland, OR, USA, 2012.
- (2012) Proc. Interspeech
- Laskowski, K.¹ Heldner, M.² Edlund, J.³

3
- 84878390551
- Temporal entrainment in overlapped speech: Cross-linguistic study
- Portland, OR, USA
- M. Wlodarczak, J. Simko, and P. Wagner, "Temporal entrainment in overlapped speech: Cross-linguistic study," in Proc. Interspeech, Portland, OR, USA, 2012.
- (2012) Proc. Interspeech
- Wlodarczak, M.¹ Simko, J.² Wagner, P.³

4
- 79952619484
- Turn-taking cues in task-oriented dialogue
- A. Gravano and J. Hirschberg, "Turn-taking cues in task-oriented dialogue," Computer Speech & Language, vol. 25, no. 3, pp. 601-634, 2011.
- (2011) Computer Speech & Language , vol.25 , Issue.3 , pp. 601-634
- Gravano, A.¹ Hirschberg, J.²

5
- 85009145345
- Observations on overlap: Findings and implications for automatic processing of multi-party conversation
- Aalborg, Denmark
- E. Shriberg, A. Stolcke, and D. Baron, "Observations on Overlap: Findings and Implications for Automatic Processing of Multi-Party Conversation," in Proc. Eurospeech, Aalborg, Denmark, 2001, pp. 1359-1362.
- (2001) Proc. Eurospeech , pp. 1359-1362
- Shriberg, E.¹ Stolcke, A.² Baron, D.³

6
- 84878407920
- Analysis of the characteristics of talk-show TV programs
- Portland, OR, USA
- F. Brugnara, D. Falavigna, D. Giuliani, and R. Gretter, "Analysis of the Characteristics of Talk-show TV Programs," in Proc. Interspeech, Portland, OR, USA, 2012.
- (2012) Proc. Interspeech
- Brugnara, F.¹ Falavigna, D.² Giuliani, D.³ Gretter, R.⁴

7
- 84865726897
- Study of overlapped speech detection for NIST SRE summed channel speaker recognition
- Florence, Italy
- H. Sun and B. Ma, "Study of Overlapped Speech Detection for NIST SRE Summed Channel Speaker Recognition," in Proc. Interspeech, Florence, Italy, 2011, pp. 2345-2348.
- (2011) Proc. Interspeech , pp. 2345-2348
- Sun, H.¹ Ma, B.²

8
- 85008561901
- Speaker diarization error analysis using oracle components
- M. Huijbregts, D. van Leeuwen, and C. Wooters, "Speaker Diarization Error Analysis Using Oracle Components," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 393-403, 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.2 , pp. 393-403
- Huijbregts, M.¹ Van Leeuwen, D.² Wooters, C.³

9
- 51449111990
- Overlapped speech detection for improved diarization in multi-party meetings
- Las Vegas, NV, USA
- K. Boakye, B. Trueba-Hornero, O. Vinyals, and G. Friedland, "Overlapped Speech Detection for Improved Diarization in Multi-Party Meetings," in Proc. ICASSP, Las Vegas, NV, USA, 2008, pp. 4353-4356.
- (2008) Proc. ICASSP , pp. 4353-4356
- Boakye, K.¹ Trueba-Hornero, B.² Vinyals, O.³ Friedland, G.⁴

10
- 84865798489
- The detection of overlapping speech with prosodic features for speaker diarization
- Florence, Italy
- M. Zelenak and J. Hernando, "The Detection of Overlapping Speech with Prosodic Features for Speaker Diarization," in Proc. Interspeech, Florence, Italy, 2011, pp. 1041-1044.
- (2011) Proc. Interspeech , pp. 1041-1044
- Zelenak, M.¹ Hernando, J.²

11
- 85008556062
- Simultaneous speech detection with spatial features for speaker diarization
- M. Zelenak, C. Segura, J. Luque, and J. Hernando, "Simultaneous speech detection with spatial features for speaker diarization," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 436-446, 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.2 , pp. 436-446
- Zelenak, M.¹ Segura, C.² Luque, J.³ Hernando, J.⁴

12
- 84867599879
- Speech overlap detection and attribution using convolutive non-negative sparse coding
- Kyoto, Japan
- R. Vipperla, J. Geiger, S. Bozonnet, D. Wang, N. Evans, B. Schuller, and G. Rigoll, "Speech Overlap Detection and Attribution Using Convolutive Non-Negative Sparse Coding," in Proc. ICASSP, Kyoto, Japan, 2012, pp. 4181-4184.
- (2012) Proc. ICASSP , pp. 4181-4184
- Vipperla, R.¹ Geiger, J.² Bozonnet, S.³ Wang, D.⁴ Evans, N.⁵ Schuller, B.⁶ Rigoll, G.⁷

13
- 84878541035
- Convolutive non-negative sparse coding and new features for speech overlap handling in speaker diarization
- Portland, OR, USA
- J. Geiger, R. Vipperla, S. Bozonnet, N. Evans, B. Schuller, and G. Rigoll, "Convolutive Non-Negative Sparse Coding and New Features for Speech Overlap Handling in Speaker Diarization," in Proc. Interspeech, Portland, OR, USA, 2012.
- (2012) Proc. Interspeech
- Geiger, J.¹ Vipperla, R.² Bozonnet, S.³ Evans, N.⁴ Schuller, B.⁵ Rigoll, G.⁶

14
- 84869791486
- Speech overlap detection and attribution using convolutive non-negative sparse coding: New improvements and insights
- Bucharest, Romania
- J. Geiger, R. Vipperla, N. Evans, B. Schuller, and G. Rigoll, "Speech Overlap Detection and Attribution Using Convolutive Non-Negative Sparse Coding: New Improvements and Insights," in Proc. EUSIPCO, Bucharest, Romania, 2012, pp. 340-344.
- (2012) Proc. EUSIPCO , pp. 340-344
- Geiger, J.¹ Vipperla, R.² Evans, N.³ Schuller, B.⁴ Rigoll, G.⁵

15
- 84878382961
- Speaker diarization of overlapping speech based on silence distribution in meeting recordings
- Portland, OR, USA
- S. H. Yella and F. Valente, "Speaker Diarization of Overlapping Speech based on Silence Distribution in Meeting Recordings," in Proc. Interspeech, Portland, OR, USA, 2012.
- (2012) Proc. Interspeech
- Yella, S.H.¹ Valente, F.²

16
- 84890508572
- Improved overlap speech diarization of meeting recordings using long-term conversational features
- Vancouver, Canada
- S. H. Yella and H. Bourlard, "Improved Overlap Speech Diarization of Meeting Recordings using Long-Term Conversational Features," in Proc. ICASSP, Vancouver, Canada, 2013.
- (2013) Proc. ICASSP
- Yella, S.H.¹ Bourlard, H.²

17
- 27144509179
- Learning long-term temporal features in LVCSR using neural networks
- Jeju Island, Korea
- B. Chen, Q. Zhu, and N. Morgan, "Learning long-term temporal features in LVCSR using neural networks," in Proc. Interspeech, Jeju Island, Korea, 2004, pp. 612-615.
- (2004) Proc. Interspeech , pp. 612-615
- Chen, B.¹ Zhu, Q.² Morgan, N.³

18
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

19
- 84890443834
- Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies
- Vancouver, Canada
- F. Eyben, F. Weninger, S. Squartini, and B. Schuller, "Real-life Voice Activity Detection with LSTM Recurrent Neural Networks and an Application to Hollywood Movies," in Proc. ICASSP, Vancouver, Canada, 2013.
- (2013) Proc. ICASSP
- Eyben, F.¹ Weninger, F.² Squartini, S.³ Schuller, B.⁴

20
- 78650977476
- openSMILE: The Munich versatile and fast open-source audio feature extractor
- Florence, Italy
- F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor," in Proc. ACM Multimedia (MM), Florence, Italy, 2010, pp. 1459-1462.
- (2010) Proc. ACM Multimedia (MM) , pp. 1459-1462
- Eyben, F.¹ Wöllmer, M.² Schuller, B.³

21
- 84900510076
- Non-negative matrix factorization with sparseness constraints
- P. O. Hoyer, "Non-negative Matrix Factorization with Sparseness Constraints," Journal of Machine Learning Research, vol. 5, pp. 1457-1469, 2004.
- (2004) Journal of Machine Learning Research , vol.5 , pp. 1457-1469
- Hoyer, P.O.¹

22
- 38049021850
- Convolutive speech bases and their application to supervised speech separation
- P. Smaragdis, "Convolutive Speech Bases and Their Application to Supervised Speech Separation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 1, pp. 1-12, 2007.
- (2007) IEEE Transactions on Audio, Speech, and Language Processing , vol.15 , Issue.1 , pp. 1-12
- Smaragdis, P.¹

23
- 84865777049
- Online pattern learning for non-negative convolutive sparse coding
- Florence, Italy
- D. Wang, R. Vipperla, and N. Evans, "Online pattern learning for non-negative convolutive sparse coding," in Proc. Interspeech, Florence, Italy, 2011, pp. 65-68.
- (2011) Proc. Interspeech , pp. 65-68
- Wang, D.¹ Vipperla, R.² Evans, N.³

24
- 84870974763
- Online non-negative convolutive pattern learning for speech signals
- D. Wang, R. Vipperla, N. Evans, and T. F. Zheng, "Online non-negative convolutive pattern learning for speech signals," IEEE Transactions on Signal Processing, vol. 61, no. 1, pp. 44-56, 2013.
- (2013) IEEE Transactions on Signal Processing , vol.61 , Issue.1 , pp. 44-56
- Wang, D.¹ Vipperla, R.² Evans, N.³ Zheng, T.F.⁴

25
- 78049378635
- The LIA-EURECOM RT09 speaker diarization system: Enhancements in speaker modelling and cluster purification
- Dallas, TX, USA
- S. Bozonnet, N. Evans, and C. Fredouille, "The LIA-EURECOM RT09 Speaker Diarization System: Enhancements in Speaker Modelling and Cluster Purification," in Proc. ICASSP, Dallas, TX, USA, 2010, pp. 4958-4961.
- (2010) Proc. ICASSP , pp. 4958-4961
- Bozonnet, S.¹ Evans, N.² Fredouille, C.³

26
- 70349287581
- Multidimensional recurrent neural networks
- Porto, Portugal
- A. Graves, S. Fernández, and J. Schmidhuber, "Multidimensional recurrent neural networks," in Proc. of the 2007 International Conference on Artificial Neural Networks, Porto, Portugal, 2007.
- (2007) Proc. of the 2007 International Conference on Artificial Neural Networks
- Graves, A.¹ Fernández, S.² Schmidhuber, J.³

27
- 33745530242
- The AMI meeting corpus: A preannouncement
- J. Carletta, S. Ashby, S. Bourban, M. Flynn, M. Guillemot, T. Hain, J. Kadlec, V. Karaiskos, W. Kraaij, M. Kronenthal, et al., "The AMI meeting corpus: A preannouncement," Machine Learning for Multimodal Interaction, pp. 28-39, 2006.
- (2006) Machine Learning for Multimodal Interaction , pp. 28-39
- Carletta, J.¹ Ashby, S.² Bourban, S.³ Flynn, M.⁴ Guillemot, M.⁵ Hain, T.⁶ Kadlec, J.⁷ Karaiskos, V.⁸ Kraaij, W.⁹ Kronenthal, M.¹⁰

28
- 84865747509
- Improved overlapped speech handling for speaker diarization
- Florence, Italy
- K. Boakye, O. Vinyals, and G. Friedland, "Improved Overlapped Speech Handling for Speaker Diarization," in Proc. Interspeech, Florence, Italy, 2011, pp. 941-944.
- (2011) Proc. Interspeech , pp. 941-944
- Boakye, K.¹ Vinyals, O.² Friedland, G.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.