SCOPUS Information Search Platform

IEEE Signal Processing Magazine

Volume 29, Issue 6, 2012, Pages 127-140

Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors

(3) Kumatani, Kenichi a; McDonough, John b; Raj, Bhiksha b

a DISNEY RESEARCH (United States)

b Carnegie Mellon University (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ACOUSTICS; ARRAY PROCESSING; DEEP NEURAL NETWORKS; DISTRIBUTED COMPUTER SYSTEMS; HUMAN COMPUTER INTERACTION; MICROPHONES; SPEECH; SPEECH PROCESSING; SPHERES;

DISTANT SPEECH RECOGNITION; EMERGING TECHNOLOGIES; HUMAN COMPUTER INTERFACES; MICROPHONE ARRAY PROCESSING; MICROPHONE ARRAYS; PERFORMANCE COMPARISON; RECOGNITION ACCURACY; SPHERICAL MICROPHONE ARRAY;

SPEECH RECOGNITION;

EID: 85032750883 PISSN: 10535888 EISSN: None Source Type: Journal
DOI: 10.1109/MSP.2012.2205285 Document Type: Article

Times cited : (122)

References (43)

1
- 0032142014
- Environmental conditions and acoustic transduction in hands-free speech recognition
- PII S0167639398000302
- M. Omologo, M. Matassoni, and P. Svaizer, "Environmental conditions and acoustic transduction in hands-free speech recognition," Speech Commun., vol. 25, no. 1-3, pp. 75-95, 1998. (Pubitemid 128413635)
- (1998) Speech Communication , vol.25 , Issue.1-3 , pp. 75-95
- Omologo, M.¹ Svaizer, P.² Matassoni, M.³

2
- 50449083999
- Hoboken, NJ: Wiley
- M. Wölfel and J. McDonough, Distant Speech Recognition. Hoboken, NJ: Wiley, 2009.
- (2009) Distant Speech Recognition
- Wölfel, M.¹ McDonough, J.²

3
- 79958016776
- A prototype of distant-talking interface for control of interactive TV
- M. Omologo, "A prototype of distant-talking interface for control of interactive TV," in Proc. Asilomar Conf. Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA, 2010, pp. 1711-1715.
- (2010) Proc. Asilomar Conf. Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA , pp. 1711-1715
- Omologo, M.¹

4
- 33846217002
- The multi-channel Wall Street Journal audio visual corpus (MCWSJ-AV): Specification and initial experiments
- M. Lincoln, I. McCowan, I. Vepa, and H. K. Maganti, "The multi-channel Wall Street Journal audio visual corpus (MCWSJ-AV): Specification and initial experiments," in Proc. IEEE Workshop Automatic Speech Recognition and Understanding (ASRU), 2005, pp. 357-362.
- (2005) Proc. IEEE Workshop Automatic Speech Recognition and Understanding (ASRU) , pp. 357-362
- Lincoln, M.¹ McCowan, I.² Vepa, I.³ Maganti, H.K.⁴

5
- 40249114843
- To separate speech!: A system for recognizing simultaneous speech
- J. McDonough, K. Kumatani, T. Gehrig, E. Stoimenov, U. Mayer, S. Schacht, M. Wölfel, and D. Klakow, "To separate speech!: A system for recognizing simultaneous speech," in Proc. MLMI, 2007, pp. 283-294.
- (2007) Proc. MLMI , pp. 283-294
- McDonough, J.¹ Kumatani, K.² Gehrig, T.³ Stoimenov, E.⁴ Mayer, U.⁵ Schacht, S.⁶ Wölfel, M.⁷ Klakow, D.⁸

6
- 50449097590
- Distant speech recognition: Bridging the gaps
- J. McDonough and M. Wölfel, "Distant speech recognition: Bridging the gaps," in Proc. IEEE Joint Workshop Hands-free Speech Communication and Microphone Arrays (HSCMA), Trento, Italy, 2008, pp. 108-114.
- (2008) Proc. IEEE Joint Workshop Hands-free Speech Communication and Microphone Arrays (HSCMA), Trento, Italy , pp. 108-114
- McDonough, J.¹ Wölfel, M.²

7
- 50449092852
- Bridging the gap: Towards a unified framework for hands-free speech recognition using microphone arrays
- M. Seltzer, "Bridging the gap: Towards a unified framework for hands-free speech recognition using microphone arrays," in Proc. HSCMA, Trento, Italy, 2008, pp. 104-107.
- (2008) Proc. HSCMA, Trento, Italy , pp. 104-107
- Seltzer, M.¹

8
- 79959845286
- The CHiME corpus: A resource and a challenge for computational hearing in multisource environments
- H. Christensen, J. Barker, N. Ma, and P. Green, "The CHiME corpus: A resource and a challenge for computational hearing in multisource environments," in Proc. Interspeech, Makuhari, Japan, 2010, pp. 1918-1921.
- (2010) Proc. Interspeech, Makuhari, Japan , pp. 1918-1921
- Christensen, H.¹ Barker, J.² Ma, N.³ Green, P.⁴

9
- 84867591985
- Logmax observation model with MFCC-based spectral prior for reduction of highly nonstationary ambient noise
- Kyoto, Japan
- T. Nakatani, T. Yoshioka, S. Araki, M. Delcroix, and M. Fujimoto, "Logmax observation model with MFCC-based spectral prior for reduction of highly nonstationary ambient noise," in Proc. ICASSP 2012, Kyoto, Japan, pp. 4029-4032.
- (2012) Proc. ICASSP , pp. 4029-4032
- Nakatani, T.¹ Yoshioka, T.² Araki, S.³ Delcroix, M.⁴ Fujimoto, M.⁵

10
- 84867600087
- Non-negative matrix factorization for highly noise-robust ASR: To enhance or to recognize?
- Kyoto, Japan
- F. Weninger, M. Wöllmer, J. Geiger, B. Schuller, J. F. Gemmeke, A. Hurmalainen, T. Virtanen, and G. Rigoll, "Non-negative matrix factorization for highly noise-robust ASR: To enhance or to recognize?" in Proc. ICASSP 2012, Kyoto, Japan, pp. 4681-4684.
- (2012) Proc. ICASSP , pp. 4681-4684
- Weninger, F.¹ Wöllmer, M.² Geiger, J.³ Schuller, B.⁴ Gemmeke, J.F.⁵ Hurmalainen, A.⁶ Virtanen, T.⁷ Rigoll, G.⁸

11
- 84867602659
- Integration of beamforming and automatic speech recognition through propagation of the Wiener posterior
- Kyoto, Japan
- R. F. Astudillo, A. Abad, and J. P. S. Neto, "Integration of beamforming and automatic speech recognition through propagation of the Wiener posterior," in Proc. ICASSP 2012, Kyoto, Japan, pp. 4909-4912.
- (2012) Proc. ICASSP , pp. 4909-4912
- Astudillo, R.F.¹ Abad, A.² Neto, J.P.S.³

12
- 51449086836
- A microphone array beamforming approach to blind speech separation
- I. McCowan, I. Himawan, and M. Lincoln, "A microphone array beamforming approach to blind speech separation," in Proc. MLMI, 2007, pp. 295-305.
- (2007) Proc. MLMI , pp. 295-305
- McCowan, I.¹ Himawan, I.² Lincoln, M.³

13
- 77956766546
- Audio-visual fusion and tracking with multilevel iterative decoding: Framework and experimental evaluation
- S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "Audio-visual fusion and tracking with multilevel iterative decoding: Framework and experimental evaluation," J. Sel. Topics Signal Processing, vol. 4, no. 5, pp. 882-894, 2010.
- (2010) J. Sel. Topics Signal Processing , vol.4 , Issue.5 , pp. 882-894
- Shivappa, S.T.¹ Rao, B.D.² Trivedi, M.M.³

14
- 34250174176
- Microphone array driven speech recognition: Influence of localization on the word error rate
- M. Wölfel, K. Nickel, and J. W. McDonough, "Microphone array driven speech recognition: Influence of localization on the word error rate," in Proc. MLMI, 2005, pp. 320-331.
- (2005) Proc. MLMI , pp. 320-331
- Wölfel, M.¹ Nickel, K.² McDonough, J.W.³

15
- 78049384144
- Chichester, UK: Wiley
- I. J. Tashev, Sound Capture and Processing: Practical Approaches. Chichester, UK: Wiley, 2009.
- (2009) Sound Capture and Processing: Practical Approaches
- Tashev, I.J.¹

16
- 0003980102
- Heidelberg, Germany: Springer-Verlag
- M. Brandstein and D. Ward, Eds., Microphone Arrays. Heidelberg, Germany: Springer-Verlag, 2001.
- (2001) Microphone Arrays
- Brandstein, M.¹ Ward, D.²

17
- 0003964055
- New York: Wiley-Interscience
- H. L. Van Trees, Optimum Array Processing. New York: Wiley-Interscience, 2002.
- (2002) Optimum Array Processing
- Van Trees, H.L.¹

18
- 33746653380
- Time delay estimation in room acoustic environments: An overview
- J. Chen, J. Benesty, and Y. Huang, "Time delay estimation in room acoustic environments: An overview," EURASIP J. Adv. Signal Processing, vol. 2006, no. AD-26503, pp. 1-19, 2006.
- (2006) EURASIP J. Adv. Signal Processing , vol.2006 , Issue.AD26503 , pp. 1-19
- Chen, J.¹ Benesty, J.² Huang, Y.³

19
- 50449084235
- Comparison between different sound source localization techniques based on a real data collection
- A. Brutti, M. Omologo, and P. Svaizer, "Comparison between different sound source localization techniques based on a real data collection," in Proc. HSCMA, Trento, Italy, 2008, pp. 69-72.
- (2008) Proc. HSCMA, Trento, Italy , pp. 69-72
- Brutti, A.¹ Omologo, M.² Svaizer, P.³

20
- 33645696863
- Kalman filters for time delay of arrival-based source localization
- U. Klee, T. Gehrig, and J. McDonough, "Kalman filters for time delay of arrival-based source localization," EURASIP J. Adv. Signal Processing, vol. 2006, no. AD-12378, pp. 1-15, 2006.
- (2006) EURASIP J. Adv. Signal Processing , vol.2006 , Issue.AD12378 , pp. 1-15
- Klee, U.¹ Gehrig, T.² McDonough, J.³

21
- 33645672078
- Kalman filters for audio-video source localization
- DOI 10.1109/ASPAA.2005.1540183, 1540183, 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
- T. Gehrig, K. Nickel, H. K. Ekenel, U. Klee, and J. McDonough, "Kalman filters for audio-video source localization," in Proc. IEEE Workshop Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, 2005, pp. 118-121. (Pubitemid 44461808)
- (2005) IEEE Workshop on Applications of Signal Processing to Audio and Acoustics , pp. 118-121
- Gehrig, T.¹ Nickel, K.² Ekenel, H.K.³ Klee, U.⁴ McDonough, J.⁵

22
- 40249109687
- Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters
- T. Gehrig, U. Klee, J. McDonough, S. Ikbal, M. Wölfel, and C. Fügen, "Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters," in Proc. Interspeech, 2006, pp. 2594-2597.
- (2006) Proc. Interspeech , pp. 2594-2597
- Gehrig, T.¹ Klee, U.² McDonough, J.³ Ikbal, S.⁴ Wölfel, M.⁵ Fügen, C.⁶

23
- 84943735747
- Robust adaptive beamforming
- H. Cox, R. M. Zeskind, and M. M. Owen, "Robust adaptive beamforming," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, no. 10, pp. 1365-1376, 1987.
- (1987) IEEE Trans. Acoust., Speech, Signal Processing , vol.ASSP-35 , Issue.10 , pp. 1365-1376
- Cox, H.¹ Zeskind, R.M.² Owen, M.M.³

24
- 0003807773
- 4th ed. New York: Prentice-Hall
- S. Haykin, Adaptive Filter Theory, 4th ed. New York: Prentice-Hall, 2002.
- (2002) Adaptive Filter Theory
- Haykin, S.¹

25
- 0346707504
- Microphone array post-filter based on noise field coherence
- I. A. McCowan and H. Bourlard, "Microphone array post-filter based on noise field coherence," IEEE Trans. Speech Audio Processing, vol. 11, no. 6, pp. 709-716, 2003.
- (2003) IEEE Trans. Speech Audio Processing , vol.11 , Issue.6 , pp. 709-716
- McCowan, I.A.¹ Bourlard, H.²

26
- 84933045103
- A generalized view on microphone array postfilters
- T. Wolff and M. Buck, "A generalized view on microphone array postfilters," in Proc. Int. Workshop Acoustic Echo and Noise Control, Tel Aviv, Israel, 2010.
- (2010) Proc. Int. Workshop Acoustic Echo and Noise Control, Tel Aviv, Israel
- Wolff, T.¹ Buck, M.²

27
- 0032072917
- Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering
- PII S1063667698029034
- C. Marro, Y. Mahieux, and K. U. Simmer, "Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering," IEEE Trans. Speech Audio Processing, vol. 6, pp. 240-259, 1998. (Pubitemid 128720650)
- (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.3 , pp. 240-259
- Marro, C.¹ Mahieux, Y.² Simmer, K.U.³

28
- 67651154520
- Beamforming with a maximum negentropy criterion
- Aug.
- K. Kumatani, J. McDonough, D. Klakow, P. N. Garner, and W. Li, "Beamforming with a maximum negentropy criterion," IEEE Trans. Audio Speech Language Processing, vol. 17, no. 5, pp. 994-1008, Aug. 2008.
- (2008) IEEE Trans. Audio Speech Language Processing , vol.17 , Issue.5 , pp. 994-1008
- Kumatani, K.¹ McDonough, J.² Klakow, D.³ Garner, P.N.⁴ Li, W.⁵

29
- 51449092343
- Filter bank design based on minimization of individual aliasing terms for minimum mutual information subband adaptive beamforming
- K. Kumatani, J. McDonough, S. Schacht, D. Klakow, P. N. Garner, and W. Li, "Filter bank design based on minimization of individual aliasing terms for minimum mutual information subband adaptive beamforming," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV, 2008, pp. 1609-1612.
- (2008) Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV , pp. 1609-1612
- Kumatani, K.¹ McDonough, J.² Schacht, S.³ Klakow, D.⁴ Garner, P.N.⁵ Li, W.⁶

30
- 0003905759
- New York: Wiley-Interscience
- A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley-Interscience, 2001.
- (2001) Independent Component Analysis
- Hyvärinen, A.¹ Karhunen, J.² Oja, E.³

31
- 79958015105
- Maximum negentropy beamforming using complex generalized Gaussian distribution model
- K. Kumatani, J. McDonough, B. Rauch, and D. Klakow, "Maximum negentropy beamforming using complex generalized Gaussian distribution model," in Proc. ASILOMAR, Pacific Grove, CA, 2010, pp. 1420-1424.
- (2010) Proc. ASILOMAR, Pacific Grove, CA , pp. 1420-1424
- Kumatani, K.¹ McDonough, J.² Rauch, B.³ Klakow, D.⁴

32
- 84867218783
- Maximum kurtosis beamforming with the generalized sidelobe canceller
- Brisbane, Australia, Sept.
- K. Kumatani, J. McDonough, B. Rauch, P. N. Garner, W. Li, and J. Dines, "Maximum kurtosis beamforming with the generalized sidelobe canceller," in Proc. Interspeech, Brisbane, Australia, Sept. 2008, pp. 423-426.
- (2008) Proc. Interspeech , pp. 423-426
- Kumatani, K.¹ McDonough, J.² Rauch, B.³ Garner, P.N.⁴ Li, W.⁵ Dines, J.⁶

33
- 84858959884
- Maximum kurtosis beamforming with a subspace filter for distant speech recognition
- K. Kumatani, J. McDonough, and B. Raj, "Maximum kurtosis beamforming with a subspace filter for distant speech recognition," in Proc. ASRU, 2011, pp. 179-184.
- (2011) Proc. ASRU , pp. 179-184
- Kumatani, K.¹ McDonough, J.² Raj, B.³

34
- 79961162572
- Channel selection based on multichannel cross-correlation coefficients for distant speech recognition
- K. Kumatani, J. McDonough, J. Lehman, and B. Raj, "Channel selection based on multichannel cross-correlation coefficients for distant speech recognition," in Proc. HSCMA, Edinburgh, UK, 2011, pp. 1-6.
- (2011) Proc. HSCMA, Edinburgh, UK , pp. 1-6
- Kumatani, K.¹ McDonough, J.² Lehman, J.³ Raj, B.⁴

35
- 51449110329
- Speech enhancement with a new generalized eigenvector blocking matrix for application in a generalized sidelobe canceller
- E. Warsitz, A. Krueger, and R. Haeb-Umbach, "Speech enhancement with a new generalized eigenvector blocking matrix for application in a generalized sidelobe canceller," in Proc. ICASSP, Las Vegas, NV, 2008, pp. 73-76.
- (2008) Proc. ICASSP, Las Vegas, NV , pp. 73-76
- Warsitz, E.¹ Krueger, A.² Haeb-Umbach, R.³

36
- 84863792658
- Maximum negentropy beamforming with superdirectivity
- K. Kumatani, L. Lu, J. McDonough, A. Ghoshal, and D. Klakow, "Maximum negentropy beamforming with superdirectivity," in Proc. European Signal Processing Conf. (EUSIPCO), Aalborg, Denmark, 2010, pp. 2067-2071.
- (2010) Proc. European Signal Processing Conf. (EUSIPCO), Aalborg, Denmark , pp. 2067-2071
- Kumatani, K.¹ Lu, L.² McDonough, J.³ Ghoshal, A.⁴ Klakow, D.⁵

37
- 33645950814
- Spherical microphone arrays for 3D sound recording
- Y. Huang and J. Benesty, Eds. Boston, MA: Kluwer Academic
- J. Meyer and G. W. Elko, "Spherical microphone arrays for 3D sound recording," in Audio Signal Processing for Next-Generation Multimedia Communication Systems. Y. Huang and J. Benesty, Eds. Boston, MA: Kluwer Academic, 2004, pp. 67-90.
- (2004) Audio Signal Processing for Next-Generation Multimedia Communication Systems , pp. 67-90
- Meyer, J.¹ Elko, G.W.²

38
- 34948844018
- Flexible and optimal design of spherical microphone arrays for beamforming
- Z. Li and R. Duraiswami, "Flexible and optimal design of spherical microphone arrays for beamforming," IEEE Trans. Speech Audio Processing, vol. 15, no. 2, pp. 702-714, 2007.
- (2007) IEEE Trans. Speech Audio Processing , vol.15 , Issue.2 , pp. 702-714
- Li, Z.¹ Duraiswami, R.²

39
- 11144229405
- Analysis and design of spherical microphone arrays
- DOI 10.1109/TSA.2004.839244
- B. Rafaely, "Analysis and design of spherical microphone arrays," IEEE Trans. Speech Audio Processing, vol. 13, no. 1, pp. 135-143, 2005. (Pubitemid 40049946)
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.1 , pp. 135-143
- Rafaely, B.¹

40
- 78049289149
- Optimal modal beamforming for spherical microphone arrays
- S. Yan, H. Sun, U. P. Svensson, X. Ma, and J. M. Hovem, "Optimal modal beamforming for spherical microphone arrays," IEEE Trans. Audio Speech Language Processing, vol. 19, no. 2, pp. 361-371, 2011.
- (2011) IEEE Trans. Audio Speech Language Processing , vol.19 , Issue.2 , pp. 361-371
- Yan, S.¹ Sun, H.² Svensson, U.P.³ Ma, X.⁴ Hovem, J.M.⁵

41
- 0003519336
- San Diego, CA: Academic
- E. G. Williams, Fourier Acoustics. San Diego, CA: Academic, 1999.
- (1999) Fourier Acoustics
- Williams, E.G.¹

42
- 78651227376
- Bessel functions
- F. W. J. Olver, D. W. Lozier, R. F. Boisvert, and C. W. Clark, Eds. New York, NY: Cambridge Univ. Press
- F. W. J. Olver and L. C. Maximon, "Bessel functions," in NIST Handbook of Mathematical Functions, F. W. J. Olver, D. W. Lozier, R. F. Boisvert, and C. W. Clark, Eds. New York, NY: Cambridge Univ. Press, 2010.
- (2010) NIST Handbook of Mathematical Functions
- Olver, F.W.J.¹ Maximon, L.C.²

43
- 33750298852
- Heidelberg: Springer
- H. Teutsch, Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition. Heidelberg: Springer, 2007.
- (2007) Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition
- Teutsch, H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.