SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 20, Issue 2, 2012, Pages 436-446

Simultaneous Speech Detection with Spatial Features for Speaker Diarization

(4) Zelenak, Martinᵃ; Segura, Carlosᵃ; Luque, Jordiᵃ; Hernando, Javierᵃ

ᵃ Universitat Politècnica de Catalunya (Spain)

Author keywords

[No Author keywords available]

Indexed keywords

EID: 85008556062 PISSN: 15587916 EISSN: 15587924 Source Type: Journal
DOI: 10.1109/TASL.2011.2160167 Document Type: Article

Times cited : (31)

References (35)

1
- 85009145345
- Observations on overlap: Findings and implications for automatic processing of multi-party conversation
- E. Shriberg, A. Stolcke, and D. Baron, “Observations on overlap: Findings and implications for automatic processing of multi-party conversation,” in Proc. Eurospeech′01, Aalborg, Denmark, 2001, vol. 2, pp. 1359–1362.
- (2001) Proc. Eurospeech′01, Aalborg, Denmark , vol.2 , pp. 1359-1362
- Shriberg, E.¹ Stolcke, A.² Baron, D.³

2
- 33745224103
- Spontaneous speech: How people really talk and why engineers should care
- E. Shriberg, “Spontaneous speech: How people really talk and why engineers should care,” in Proc. Interspeech′05, Lisbon, Portugal, 2005, pp. 1781–1784.
- (2005) Proc. Interspeech′05, Lisbon, Portugal , pp. 1781-1784
- Shriberg, E.¹

3
- 44849101173
- Efficient use of overlap information in speaker diarization
- S. Otterson and M. Ostendorf, “Efficient use of overlap information in speaker diarization,” in Proc. ASRU′07 Workshop, Kyoto, Japan, 2007, pp. 683–686.
- (2007) Proc. ASRU′07 Workshop, Kyoto, Japan , pp. 683-686
- Otterson, S.¹ Ostendorf, M.²

4
- 0141469852
- Multispeaker speech activity detection for the ICSI meeting recorder
- T. Pfau, D. Ellis, and A. Stolcke, “Multispeaker speech activity detection for the ICSI meeting recorder,” in Proc. ASRU′01 Workshop, Madonna di Campiglio, Italy, 2001, pp. 107–110.
- (2001) Proc. ASRU′01 Workshop, Madonna di Campiglio, Italy , pp. 107-110
- Pfau, T.¹ Ellis, D.² Stolcke, A.³

5
- 85009097062
- Crosscorrelation-based multispeaker speech activity detection
- K. Laskowski, Q. Jin, and T. Schultz, “Crosscorrelation-based multispeaker speech activity detection,” in Proc. Interspeech′04-ICSLP, Jeju Island, Korea, 2004, pp. 973–976.
- (2004) Proc. Interspeech′04-ICSLP, Jeju Island, Korea , pp. 973-976
- Laskowski, K.¹ Jin, Q.² Schultz, T.³

6
- 11144232847
- Speech and crosstalk detection in multichannel audio
- Jan.
- S. Wrigley, G. Brown, V. Wan, and S. Renals, “Speech and crosstalk detection in multichannel audio,” IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 84–91, Jan. 2005.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.1 , pp. 84-91
- Wrigley, S.¹ Brown, G.² Wan, V.³ Renals, S.⁴

7
- 33947615205
- Unsupervised learning of overlapped speech model parameters for multichannel speech activity detection in meetings
- K. Laskowski and T. Schultz, “Unsupervised learning of overlapped speech model parameters for multichannel speech activity detection in meetings,” in Proc. ICASSP′06, Toulouse, France, 2006, vol. I, pp. 993–996.
- (2006) Proc. ICASSP′06, Toulouse, France , vol.1 , pp. 993-996
- Laskowski, K.¹ Schultz, T.²

8
- 84867222811
- Multi-speaker meeting audio segmentation
- T. Nwe, M. Dong, S. Khine, and H. Li, “Multi-speaker meeting audio segmentation,” in Proc. Interspeech′08, Brisbane, Australia, 2008, pp. 2522–2525.
- (2008) Proc. Interspeech′08, Brisbane, Australia , pp. 2522-2525
- Nwe, T.¹ Dong, M.² Khine, S.³ Li, H.⁴

9
- 0344425668
- Location based speaker segmentation
- G. Lathoud and I. A. McCowan, “Location based speaker segmentation,” in Proc. 2003 Int. Conf. Multimedia and Expo (ICME′03), Baltimore, MD, 2003, vol. 3, pp. III-621-III-624.
- (2003) Proc. 2003 Int. Conf. Multimedia and Expo (ICME′03), Baltimore, MD , vol.3 , pp. III-621-III-624
- Lathoud, G.¹ McCowan, I.A.²

10
- 77249176190
- The AMI speaker diarization system for NIST RT06s meeting data
- D. van Leeuwen and M. Huijbregts, “The AMI speaker diarization system for NIST RT06s meeting data,” Mach. Learn. Multimodal Interact., vol. 4299/2006, pp. 371–384, 2006.
- (2006) Mach. Learn. Multimodal Interact. , vol.4299-2006 , pp. 371-384
- van Leeuwen, D.¹ Huijbregts, M.²

11
- 70450179489
- Speech Overlap Detection in a Two-Pass Speaker Diarization System
- M. Huijbregts, D. van Leeuwen, and F. de Jong, “Speech Overlap Detection in a Two-Pass Speaker Diarization System,” in Proc. Interspeech′09, Brighton, U.K., 2009, pp. 1063–1066.
- (2009) Proc. Interspeech′09, Brighton, U.K. , pp. 1063-1066
- Huijbregts, M.¹ van Leeuwen, D.² de Jong, F.³

12
- 84867228708
- Two's a crowd: Improving speaker diarization by automatically identifying and excluding overlapped speech
- K. Boakye, O. Vinyals, and G. Friedland, “Two's a crowd: Improving speaker diarization by automatically identifying and excluding overlapped speech,” in Proc. Interspeech′08, Brisbane, Australia, 2008, pp. 32–35.
- (2008) Proc. Interspeech′08, Brisbane, Australia , pp. 32-35
- Boakye, K.¹ Vinyals, O.² Friedland, G.³

13
- 50449086237
- Acoustic beamforming for speaker diarization of meetings
- Sep.
- X. Anguera, C. Wooters, and J. Hernando, “Acoustic beamforming for speaker diarization of meetings,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 7, pp. 2011–2022, Sep. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.7 , pp. 2011-2022
- Anguera, X.¹ Wooters, C.² Hernando, J.³

14
- 0346707503
- Source localization in reverberant environments: Modeling and statistical analysis
- Nov.
- T. Gustafsson, B. Rao, and M. Trivedi, “Source localization in reverberant environments: Modeling and statistical analysis,” IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 791–803, Nov. 2003.
- (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.6 , pp. 791-803
- Gustafsson, T.¹ Rao, B.² Trivedi, M.³

15
- 79959829540
- Overlap detection for speaker diarization by fusing spectral and spatial features
- M. Zelenak, C. Segura, and J. Hernando, “Overlap detection for speaker diarization by fusing spectral and spatial features,” in Proc. Interspeech′10, Makuhari, Japan, 2010, pp. 2302–2305.
- (2010) Proc. Interspeech′10, Makuhari, Japan , pp. 2302-2305
- Zelenak, M.¹ Segura, C.² Hernando, J.³

16
- 51449111990
- Overlapped speech detection for improved speaker diarization in multiparty meetings
- K. Boakye, B. Trueba-Hornero, O. Vinyals, and G. Friedland, “Overlapped speech detection for improved speaker diarization in multiparty meetings,” in Proc. ICASSP′08, Las Vegas, NV, 2008, pp. 4353–4356.
- (2008) Proc. ICASSP′08, Las Vegas, NV , pp. 4353-4356
- Boakye, K.¹ Trueba-Hornero, B.² Vinyals, O.³ Friedland, G.⁴

17
- 11144286121
- The Spectral Autocorrelation Peak Valley Ratio (SAPVR) --A usable speech measure employed as a co-channel detection system
- R. Yantorno, et al., “The Spectral Autocorrelation Peak Valley Ratio (SAPVR) --A usable speech measure employed as a co-channel detection system,” in Proc. IEEE Int. Workshop Intell. Signal Process. (WISP), Budapest, Hungary, 2001.
- (2001) Proc. IEEE Int. Workshop Intell. Signal Process. (WISP), Budapest, Hungary
- Yantorno, R.¹

18
- 84857343608
- Usable speech detection using linear predictive analysis--A model based approach
- N. Sundaram, R. Yantorno, B. Smolenski, and A. Iyer, “Usable speech detection using linear predictive analysis--A model based approach,” in Proc. ISPACS, Awaji Island, Japan, 2003, pp. 231–235.
- (2003) Proc. ISPACS, Awaji Island, Japan , pp. 231-235
- Sundaram, N.¹ Yantorno, R.² Smolenski, B.³ Iyer, A.⁴

19
- 85008549381
- A robust method for speech signal time-delay estimation in reverberant rooms
- P. Svaizer, et al., “A robust method for speech signal time-delay estimation in reverberant rooms,” in Proc. ICASSP′97, Munich, Germany, 1997, pp. 231–234.
- (1997) Proc. ICASSP′97, Munich, Germany , pp. 231-234
- Svaizer, P.¹

20
- 0030701369
- A robust method for speech signal time-delay estimation in reverberant rooms
- M. Brandstein and H. Silverman, “A robust method for speech signal time-delay estimation in reverberant rooms,” in Proc. ICASSP′97, Munich, Germany, 1997, pp. 375–378.
- (1997) Proc. ICASSP′97, Munich, Germany , pp. 375-378
- Brandstein, M.¹ Silverman, H.²

21
- 47749127366
- Speaker diarization for conference room: The UPC RT07s evaluation system
- J. Luque, X. Anguera, A. Temko, and J. Hernando, “Speaker diarization for conference room: The UPC RT07s evaluation system,” Multimodal Technol. Percept. Humans, vol. 4625/2008, pp. 543–553, 2008.
- (2008) Multimodal Technol. Percept. Humans , vol.4625-2008 , pp. 543-553
- Luque, J.¹ Anguera, X.² Temko, A.³ Hernando, J.⁴

22
- 51449113843
- Speaker indexing and speech enhancement in real meetings/conversations
- S. Araki, M. Fujimoto, K. Ishizuka, H. Sawada, and S. Makino, “Speaker indexing and speech enhancement in real meetings/conversations,” in Proc. ICASSP′08, Las Vegas, NV, 2008, vol. 1, pp. 93–96.
- (2008) Proc. ICASSP′08, Las Vegas, NV , vol.1 , pp. 93-96
- Araki, S.¹ Fujimoto, M.² Ishizuka, K.³ Sawada, H.⁴ Makino, S.⁵

23
- 39749173057
- Incremental learning for robust visual tracking
- May
- D. Ross, J. Lim, R. Lin, and M. Yang, “Incremental learning for robust visual tracking,” Int. J. Comput. Vis., vol. 77, no. 1, pp. 125–141, May 2008.
- (2008) Int. J. Comput. Vis. , vol.77 , Issue.1 , pp. 125-141
- Ross, D.¹ Lim, J.² Lin, R.³ Yang, M.⁴

24
- 0034247885
- Sequential Karhunen-Loeve basis extraction and its application to images
- Aug.
- A. Levy and M. Lindenbaum, “Sequential Karhunen-Loeve basis extraction and its application to images,” IEEE Trans. Image Process., vol. 9, no. 8, pp. 1371–1374, Aug. 2000.
- (2000) IEEE Trans. Image Process. , vol.9 , Issue.8 , pp. 1371-1374
- Levy, A.¹ Lindenbaum, M.²

25
- 70349225212
- Improved location features for meeting speaker diarization
- S. Otterson, “Improved location features for meeting speaker diarization,” in Proc. Interspeech′07, Antwerp, Belgium, 2007, pp. 1849–1852.
- (2007) Proc. Interspeech′07, Antwerp, Belgium , pp. 1849-1852
- Otterson, S.¹

26
- 84946742526
- A robust speaker clustering algorithm
- J. Ajmera and C. Wooters, “A robust speaker clustering algorithm,” in Proc. ASRU′03 Workshop, St. Thomas, U. S. Virgin Islands, 2003.
- (2003) Proc. ASRU′03 Workshop, St. Thomas, U. S. Virgin Islands
- Ajmera, J.¹ Wooters, C.²

27
- 85022104971
- Automatic cluster complexity and quantity selection: Towards robust speaker diarization
- X. Anguera, C. Wooters, and J. Hernando, “Automatic cluster complexity and quantity selection: Towards robust speaker diarization,” in Proc. Speaker Odyssey′06, San Juan, Puerto Rico, 2006.
- (2006) Proc. Speaker Odyssey′06, San Juan, Puerto Rico
- Anguera, X.¹ Wooters, C.² Hernando, J.³

28
- 85008563256
- The Rich Transcription 2009 Meeting Recognition Evaluation
- [Online]. Available: http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf
- “The Rich Transcription 2009 Meeting Recognition Evaluation,” [Online]. Available: http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf.

29
- 84867198548
- Clustering initialization based on spatial information for speaker diarization of meetings
- J. Luque, C. Segura, and J. Hernando, “Clustering initialization based on spatial information for speaker diarization of meetings,” in Proc. Interspeech′08, Brisbane, Australia, 2008, pp. 383–386.
- (2008) Proc. Interspeech′08, Brisbane, Australia , pp. 383-386
- Luque, J.¹ Segura, C.² Hernando, J.³

30
- 85009231870
- Qualcomm-ICSI-OGI features for ASR
- A. Adami, et al., “Qualcomm-ICSI-OGI features for ASR,” in Proc. ICSLP-Interspeech′02, Denver, CO, 2002, pp. 21–24.
- (2002) Proc. ICSLP-Interspeech′02, Denver, CO , pp. 21-24
- Adami, A.¹

31
- 0022352370
- Computer-steered microphone arrays for sound transduction in large rooms
- J. Flanagan, J. Johnson, R. Kahn, and G. Elko, “Computer-steered microphone arrays for sound transduction in large rooms,” J. Acoust. Soc. Amer., vol. 78, no. 5, pp. 1508–1518, 1985.
- (1985) J. Acoust. Soc. Amer. , vol.78 , Issue.5 , pp. 1508-1518
- Flanagan, J.¹ Johnson, J.² Kahn, R.³ Elko, G.⁴

32
- 34547526911
- Enhanced SVM training for robust speech activity detection
- A. Temko, D. Macho, and C. Nadeu, “Enhanced SVM training for robust speech activity detection,” in Proc. ICASSP′07, Honolulu, HI, 2007, pp. 1025–1028.
- (2007) Proc. ICASSP′07, Honolulu, HI , pp. 1025-1028
- Temko, A.¹ Macho, D.² Nadeu, C.³

33
- 0035789613
- Proximal support vector machine classifiers
- G. Fung and O. Mangasarian, “Proximal support vector machine classifiers,” in Proc. KDDM, 2001, pp. 77–86.
- (2001) Proc. KDDM , pp. 77-86
- Fung, G.¹ Mangasarian, O.²

34
- 79951598795
- Speaker diarization based on intensity channel contribution
- May
- R. Barra-Chicote, J. M. Pardo, J. Ferreiros, and J. M. Montero, “Speaker diarization based on intensity channel contribution,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 4, pp. 754–761, May 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.4 , pp. 754-761
- Barra-Chicote, R.¹ Pardo, J.M.² Ferreiros, J.³ Montero, J.M.⁴

35
- 47749103773
- Progress in the AMIDA speaker diarization system for meeting data
- D. A. van Leeuwen and M. Konečný, “Progress in the AMIDA speaker diarization system for meeting data,” Multimodal Technol. Percept. Humans, vol. 4625/2008, pp. 475–483, 2008.
- (2008) Multimodal Technol. Percept. Humans , vol.4625-2008 , pp. 475-483
- van Leeuwen, D.A.¹ Konečný, M.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.