SCOPUS information retrieval platform

Volume 22, Issue 8, 2015, Pages 1006-1010

Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech? - A dataset, insights, and challenges

Mysore, Gautham J. (a)

(a) Adobe Research (United States)

Author keywords

Automatic production; speech enhancement

Indexed keywords

AUDIO ACOUSTICS; AUDIO RECORDINGS; MOTION PICTURES; PROFESSIONAL ASPECTS; SPEECH ENHANCEMENT; SPEECH INTELLIGIBILITY; STUDIOS

AUDIO EFFECTS; AUTOMATIC PRODUCTION; AUTOMATIC SPEECH RECOGNITION; CONSUMER DEVICES; PRODUCTION QUALITY; REAL WORLD ENVIRONMENTS; RECORDING STUDIOS; SPEECH CONTENT

SPEECH RECOGNITION

EID: 84919935005; PISSN: 10709908; EISSN: None; Source Type: Journal
DOI: 10.1109/LSP.2014.2379648; Document Type: Article

Times cited: 113

References (25)

1. B. Owsinski, The Recording Engineer's Handbook, 3rd ed. Boston, MA, USA: Cengage Learning, 2013.

2. B. Owsinski, The Mixing Engineer's Handbook, 3rd ed. Boston, MA, USA: Cengage Learning, 2013.

3. A. Case, Sound FX: Unlocking the Creative Potential of Recording Studio Effects. New York, NY, USA: Focal, 2007.

4. B. Katz, Mastering Audio: The Art and the Science, 2nd ed. New York, NY, USA: Focal, 2007.

5. Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 6, Dec. 1984.

6. P. Scalart and V. Filho, "Speech enhancement based on a priori signal to noise estimation," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, May 1996.

7. Z. Duan, G. J. Mysore, and P. Smaragdis, "Speech enhancement by online non-negative spectrogram decomposition in non-stationary noise environments," in Proc. Interspeech, Sep. 2012.

8. P. C. Loizou, Speech Enhancement: Theory and Practice, 2nd ed. Boca Raton, FL, USA: CRC, 2013.

9. P. Naylor and N. D. Gaubitch, Speech Dereverberation. Amsterdam, The Netherlands: Springer, 2010.

10. K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, E. Habets, R. Haeb-Umbach, V. Leutnant, A. Sehr, W. Kellermann, R. Maas, S. Gannot, and B. Raj, "The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.

11. D. Liang, D. P. Ellis, M. D. Hoffman, and G. J. Mysore, "Speech decoloration based on the product of filters model," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, May 2014.

12. N. Enbom and B. Kleijn, "Bandwidth expansion of speech based on vector quantization of the mel frequency cepstral coefficients," in Proc. IEEE Workshop on Speech Coding, Jun. 1999.

13. J. Han, G. J. Mysore, and B. Pardo, "Language informed bandwidth expansion," in Proc. IEEE Int. Workshop on Machine Learning for Signal Processing, Sep. 2012.

14. V. Verfaille, U. Zölzer, and D. Arfib, "Adaptive digital audio effects (A-DAFx): A new class of sound transformations," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 5, pp. 1817-1831, Sep. 2006.

15. D. Giannoulis, M. Massberg, and J. D. Reiss, "Parameter automation in a dynamic range compressor," J. Audio Eng. Soc., vol. 61, no. 10, Oct. 2013.

16. Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Patt. Anal. Mach. Intell., vol. 35, no. 8, pp. 1798-1828, 2013.

17. C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge, MA, USA: MIT Press, 2006.

18. E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta, and M. Matassoni, "The second 'CHiME' speech separation and recognition challenge: Datasets, tasks, and baselines," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, May 2013.

19. H.-G. Hirsch and D. Pearce, "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proc. ISCA Workshop ASR2000, Sep. 2000.

20. N. Parihar, J. Picone, D. Pearce, and H.-G. Hirsch, "Performance analysis of the Aurora large vocabulary baseline system," in Proc. Eur. Signal Processing Conf., Sep. 2004.

21. D. Giannoulis, M. Massberg, and J. D. Reiss, "Digital dynamic range compressor design: A tutorial and analysis," J. Audio Eng. Soc., vol. 60, no. 6, Jun. 2012.

22. J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Process. Lett., vol. 6, no. 1, pp. 1-3, Jan. 1999.

23. F. G. Germain, D. Sun, and G. J. Mysore, "Speaker and noise independent voice activity detection," in Proc. Interspeech, Aug. 2013.

24. Y. Hu and P. C. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 1, pp. 229-238, Jan. 2008.

25. V. Emiya, E. Vincent, N. Harlander, and V. Hohmann, "Subjective and objective quality assessment of audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2046-2057, 2011.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.