SCOPUS Information Search Platform

2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings

Volume , Issue , 2010, Pages 424-429

Real-time meeting recognition and understanding using distant microphones and omni-directional camera

(13) Hori, Takaaki a; Araki, Shoko a; Yoshioka, Takuya a; Fujimoto, Masakiyo a; Watanabe, Shinji a; Oba, Takanobu a; Ogawa, Atsunori a; Otsuka, Kazuhiro a; Mikami, Dan a; Kinoshita, Keisuke a; Nakatani, Tomohiro a; Nakamura, Atsushi a; Yamato, Junji a

a Nippon Telegraph and Telephone Corporation (Japan)

Author keywords

Distant microphones; Meeting analysis; Speaker diarization; Speech enhancement; Speech recognition; Topic tracking

Indexed keywords

AUDIO PROCESSING; DISTANT MICROPHONES; FACE POSE; LOW-LATENCY; MEETING ANALYSIS; MICROPHONE ARRAYS; OMNIDIRECTIONAL CAMERAS; SPEAKER DIARIZATION; SPEECH RECOGNIZER; SPEECH SIGNALS; TOPIC TRACKING;

AUDIO SIGNAL PROCESSING; CAMERAS; INFORMATION ANALYSIS; MICROPHONES; SPEECH ENHANCEMENT; TRANSCRIPTION; WIRELESS SENSOR NETWORKS;

SPEECH RECOGNITION;

EID: 79951797950 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/SLT.2010.5700890 Document Type: Conference Paper

Times cited : (12)

References (24)

1
- 0001979427
- Meeting browser: Tracking and summarizing meetings
- A.Waibel, M. Bett, and M. Finke, "Meeting browser: tracking and summarizing meetings," in Proc. DARPA Broadcast News Transcription and Understanding Workshop, 1998, pp. 281-286.
- Proc. DARPA Broadcast News Transcription and Understanding Workshop, 1998 , pp. 281-286
- Waibel, A.¹ Bett, M.² Finke, M.³

2
- 84857722303
- The ICSI meeting project: Resources and research
- A. Janin, J. Ang, S. Bhagat, R. Dhillon, J. Edwards, J. Macías-Guarasa, N. Morgan, B. Peskin, E. Shriberg, A. Stolcke, C. Wooters, and B. Wrede, "The ICSI meeting project: resources and research," in Proc. ICASSP'04 Meeting Recognition Workshop, 2004.
- Proc. ICASSP'04 Meeting Recognition Workshop, 2004
- Janin, A.¹ Ang, J.² Bhagat, S.³ Dhillon, R.⁴ Edwards, J.⁵ Macías-Guarasa, J.⁶ Morgan, N.⁷ Peskin, B.⁸ Shriberg, E.⁹ Stolcke, A.¹⁰ Wooters, C.¹¹ Wrede, B.¹²

3
- 32044458420
- Browsing recorded meetings with Ferret
- P. Wellner, M. Flynn, and M. Guillemot, "Browsing recorded meetings with Ferret," in Proc. ICMI-MLMI, 2004, pp. 12-21.
- Proc. ICMI-MLMI, 2004 , pp. 12-21
- Wellner, P.¹ Flynn, M.² Guillemot, M.³

4
- 44849090969
- Recognition and understanding of meetings: The AMI and AMIDA projects
- S. Renals, T. Hain, and H. Bourlard, "Recognition and understanding of meetings: The AMI and AMIDA projects," in Proc. ASRU, 2007, pp. 238-247.
- Proc. ASRU, 2007 , pp. 238-247
- Renals, S.¹ Hain, T.² Bourlard, H.³

5
- 67649528017
- The CALO meeting speech recognition and understanding system
- G. Tur, A. Stolcke, L. Voss, J. Dowding, B. Favre, R. Fernandez, M. Frampton, M. Frandsen, C. Frederickson, M. Graciarena, D. Hakkani-Tür, D. Kintzing, K. Leveque, S. Mason, J. Niekrasz, S. Peters, M. Purver, K. Riedhammer, E. Shriberg, J. Tien, D. Vergyri, and F. Yang, "The CALO meeting speech recognition and understanding system," in Proc. SLT, 2008.
- Proc. SLT, 2008
- Tur, G.¹ Stolcke, A.² Voss, L.³ Dowding, J.⁴ Favre, B.⁵ Fernandez, R.⁶ Frampton, M.⁷ Frandsen, M.⁸ Frederickson, C.⁹ Graciarena, M.¹⁰ Hakkani-Tür, D.¹¹ Kintzing, D.¹² Leveque, K.¹³ Mason, S.¹⁴ Niekrasz, J.¹⁵ Peters, S.¹⁶ Purver, M.¹⁷ Riedhammer, K.¹⁸ Shriberg, E.¹⁹ Tien, J.²⁰ Vergyri, D.²¹ Yang, F.²²

6
- 60949097180
- A realtime multimodal system for analyzing group meetings by combining face pose tracking and speaker diarization
- K. Otsuka, S. Araki, K. Ishizuka, M. Fujimoto, M. Heinrich, and J. Yamato, "A realtime multimodal system for analyzing group meetings by combining face pose tracking and speaker diarization," in Proc. ICMI, 2008, pp. 257-264.
- Proc. ICMI, 2008 , pp. 257-264
- Otsuka, K.¹ Araki, S.² Ishizuka, K.³ Fujimoto, M.⁴ Heinrich, M.⁵ Yamato, J.⁶

7
- 74049143046
- A multimedia retrieval system using speech input
- A. Popescu-Belis, P. Poller, and J. Kilgour, "A multimedia retrieval system using speech input," in Proc. ICMI-MLMI, 2009, pp. 223-224.
- Proc. ICMI-MLMI, 2009 , pp. 223-224
- Popescu-Belis, A.¹ Poller, P.² Kilgour, J.³

8
- 70450174924
- Real-time ASR from meetings
- P. N. Garner, J. Dines, T. Hain, E. A. Hannani, M. Karafiát, D. Korchagin, M. Lincoln, V. Wan, and L. Zhang, "Real-time ASR from meetings," in Proc. Interspeech, 2009, pp. 2119-2122.
- Proc. Interspeech, 2009 , pp. 2119-2122
- Garner, P.N.¹ Dines, J.² Hain, T.³ Hannani, E.A.⁴ Karafiát, M.⁵ Korchagin, D.⁶ Lincoln, M.⁷ Wan, V.⁸ Zhang, L.⁹

9
- 70450204727
- A study of mutual front-end processing method based on statistical model for noise robust speech recognition
- M. Fujimoto, K. Ishizuka, and T. Nakatani, "A study of mutual front-end processing method based on statistical model for noise robust speech recognition," in Proc. Interspeech, 2009, pp. 1235-1238.
- Proc. Interspeech, 2009 , pp. 1235-1238
- Fujimoto, M.¹ Ishizuka, K.² Nakatani, T.³

10
- 33645758265
- NTT Speech recognizer with Outlook on the Next generation: SOLON
- T. Hori, "NTT Speech recognizer with Outlook On the Next generation: SOLON," in Proc. NTT Workshop on Communication Scene Analysis, 2004, pp. SP-6. [Online]. Available: www.kecl.ntt.co.jp/icl/signal/hori/publications/thori csa2004.pdf.
- Proc. NTT Workshop on Communication Scene Analysis, 2004
- Hori, T.¹

11
- 77957745677
- Blind separation and dereverberation of speech mixtures by joint optimization
- T. Yoshioka, T. Nakatani, T. Miyoshi, and H. G. Okuno, "Blind separation and dereverberation of speech mixtures by joint optimization," IEEE Transactions on Audio, Speech, and Language Processing, 2010, accepted for publication, doi:10.1109/TASL.2010.2045183.
- (2010) IEEE Transactions on Audio, Speech, and Language Processing
- Yoshioka, T.¹ Nakatani, T.² Miyoshi, T.³ Okuno, H.G.⁴

12
- 51449113843
- Speaker indexing and speech enhancement in real meeting / conversations
- S. Araki, M. Fujimoto, K. Ishizuka, H. Sawada, and S. Makino, "Speaker indexing and speech enhancement in real meeting / conversations," in Proc. ICASSP, 2008, vol. I, pp. 93-96.
- Proc. ICASSP, 2008 , vol.1 , pp. 93-96
- Araki, S.¹ Fujimoto, M.² Ishizuka, K.³ Sawada, H.⁴ Makino, S.⁵

13
- 0016990291
- The generalized correlation method for estimation of time delay
- C. H. Knapp and G. C. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust. Speech and Signal Processing, vol. 24, no. 4, pp. 320-327, 1976.
- (1976) IEEE Trans. Acoust. Speech and Signal Processing , vol.24 , Issue.4 , pp. 320-327
- Knapp, C.H.¹ Carter, G.C.²

14
- 34247223586
- Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors
- S. Araki, H. Sawada, R. Mukai, and S. Makino, "Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors," Signal Processing, vol. 77, no. 8, pp. 1833-1847, Aug 2007.
- (2007) Signal Processing , vol.77 , Issue.8 , pp. 1833-1847
- Araki, S.¹ Sawada, H.² Mukai, R.³ Makino, S.⁴

15
- 50449097931
- Noise robust voice activity detection based on switching Kalman filter
- M. Fujimoto and K. Ishizuka, "Noise robust voice activity detection based on switching Kalman filter," in Proc. Interspeech, 2007, pp. 2933-2936.
- Proc. Interspeech, 2007 , pp. 2933-2936
- Fujimoto, M.¹ Ishizuka, K.²

16
- 0032762471
- A statistical model-based voice activity detection
- J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Processing Letters, vol. 6, no. 1, pp. 1-3, January 1999.
- (1999) IEEE Signal Processing Letters , vol.6 , Issue.1 , pp. 1-3
- Sohn, J.¹ Kim, N.S.² Sung, W.³

17
- 33745207361
- A Japanese national project on spontaneous speech corpus and processing technology
- S. Furui, K. Maekawa, and H. Isahara, "A Japanese national project on spontaneous speech corpus and processing technology," in Proc. ASR, 2000, pp. 244-248.
- Proc. ASR, 2000 , pp. 244-248
- Furui, S.¹ Maekawa, K.² Isahara, H.³

18
- 78049393373
- A comparative study on methods of weighted language model training for reranking LVCSR n-best hypotheses
- T. Oba, T. Hori, and A. Nakamura, "A comparative study on methods of weighted language model training for reranking LVCSR n-best hypotheses," in Proc. ICASSP, 2010, pp. 5126-5129.
- Proc. ICASSP, 2010 , pp. 5126-5129
- Oba, T.¹ Hori, T.² Nakamura, A.³

19
- 45849093239
- Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition
- T. Hori, C. Hori, Y. Minami, and A. Nakamura, "Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1352-1365, 2007.
- (2007) IEEE Transactions on Audio, Speech, and Language Processing , vol.15 , Issue.4 , pp. 1352-1365
- Hori, T.¹ Hori, C.² Minami, Y.³ Nakamura, A.⁴

20
- 33646426591
- Generalized fast on-the-fly composition algorithm for WFST-based speech recognition
- T. Hori and A. Nakamura, "Generalized fast on-the-fly composition algorithm for WFST-based speech recognition," in Proc. Interspeech-Eurospeech, 2005, pp. 557-560.
- Proc. Interspeech-Eurospeech, 2005 , pp. 557-560
- Hori, T.¹ Nakamura, A.²

21
- 85009271609
- Towards automatic closed captioning: Low latency real time broadcast news transcription
- M. Saraclar, M. Riley, E. Bocchieri, and V. Goffin, "Towards automatic closed captioning: low latency real time broadcast news transcription," in Proc. ICSLP, 2002, pp. 1741-1744.
- Proc. ICSLP, 2002 , pp. 1741-1744
- Saraclar, M.¹ Riley, M.² Bocchieri, E.³ Goffin, V.⁴

22
- 38049176869
- CLEAR evaluation of acoustic event detection and classification systems
- A. Temko, R. Malkin, C. Zieger, D. Macho, C. Nadeu, and M. Omologo, "CLEAR evaluation of acoustic event detection and classification systems," Multimodal Technologies for Perception of Humans, pp. 311-322, 2007.
- (2007) Multimodal Technologies for Perception of Humans , pp. 311-322
- Temko, A.¹ Malkin, R.² Zieger, C.³ Macho, D.⁴ Nadeu, C.⁵ Omologo, M.⁶

23
- 77956207114
- Topic tracking model for analyzing consumer purchase behavior
- T. Iwata, S. Watanabe, T. Yamada, and N. Ueda, "Topic tracking model for analyzing consumer purchase behavior," in Proc. IJCAI, 2009, pp. 1427-1432.
- Proc. IJCAI, 2009 , pp. 1427-1432
- Iwata, T.¹ Watanabe, S.² Yamada, T.³ Ueda, N.⁴

24
- 70450162101
- Memory-based particle filter for face pose tracking robust under complex dynamics
- D. Mikami, K. Otsuka, and J. Yamato, "Memory-based particle filter for face pose tracking robust under complex dynamics," in Proc. CVPR, 2009, pp. 999-1006.
- Proc. CVPR, 2009 , pp. 999-1006
- Mikami, D.¹ Otsuka, K.² Yamato, J.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.