Volume 25, Issue 12, 2017, Pages 2362-2374

Direct Speech Reconstruction from Articulatory Sensor Data by Machine Learning

Author keywords

articulatory to acoustic mapping; permanent magnet articulography; Silent speech interfaces; speech rehabilitation; speech synthesis

Indexed keywords

ARTIFICIAL INTELLIGENCE; DEEP LEARNING; DEEP NEURAL NETWORKS; LEARNING SYSTEMS; MAGNETIC LEVITATION VEHICLES; MAGNETS; PERMANENT MAGNETS; RECURRENT NEURAL NETWORKS; SPEECH SYNTHESIS

EID: 85040448315     PISSN: 23299290     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2017.2757263     Document Type: Article
Times cited: 78

References (52)
  • 3
    • 84931262369
    • Brain-to-text: Decoding spoken phrases from phone representations in the brain
    • Jun
    • C. Herff et al., "Brain-to-text: Decoding spoken phrases from phone representations in the brain," Frontiers Neurosci., vol. 9, p. 217, Jun. 2015.
    • (2015) Frontiers Neurosci. , vol.9 , p. 217
    • Herff, C.1
  • 4
    • 76849099234
    • Modeling coarticulation in EMG-based continuous speech recognition
    • Apr
    • T. Schultz and M. Wand, "Modeling coarticulation in EMG-based continuous speech recognition," Speech Commun., vol. 52, no. 4, pp. 341-353, Apr. 2010.
    • (2010) Speech Commun. , vol.52 , Issue.4 , pp. 341-353
    • Schultz, T.1    Wand, M.2
  • 5
    • 84907468717
    • Tackling speaking mode varieties in EMG-based speech recognition
    • Oct
    • M. Wand, M. Janke, and T. Schultz, "Tackling speaking mode varieties in EMG-based speech recognition," IEEE Trans. Biomed. Eng., vol. 61, no. 10, pp. 2515-2526, Oct. 2014.
    • (2014) IEEE Trans. Biomed. Eng. , vol.61 , Issue.10 , pp. 2515-2526
    • Wand, M.1    Janke, M.2    Schultz, T.3
  • 6
    • 85009110524
    • An initial investigation into the real-time conversion of facial surface EMG signals to audible speech
    • L. Diener, C. Herff, M. Janke, and T. Schultz, "An initial investigation into the real-time conversion of facial surface EMG signals to audible speech," in 38th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 2016, pp. 888-891.
    • (2016) 38th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. , pp. 888-891
    • Diener, L.1    Herff, C.2    Janke, M.3    Schultz, T.4
  • 7
    • 42949175762
    • Development of a (silent) speech recognition system for patients following laryngectomy
    • M. J. Fagan, S. R. Ell, J. M. Gilbert, E. Sarrazin, and P. M. Chapman, "Development of a (silent) speech recognition system for patients following laryngectomy," Med. Eng. Phys., vol. 30, no. 4, pp. 419-425, 2008.
    • (2008) Med. Eng. Phys. , vol.30 , Issue.4 , pp. 419-425
    • Fagan, M.J.1    Ell, S.R.2    Gilbert, J.M.3    Sarrazin, E.4    Chapman, P.M.5
  • 8
    • 38649140222
    • Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model
    • Mar
    • T. Toda, A. W. Black, and K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model," Speech Commun., vol. 50, no. 3, pp. 215-227, Mar. 2008.
    • (2008) Speech Commun. , vol.50 , Issue.3 , pp. 215-227
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 9
    • 76849104115
    • Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips
    • T. Hueber, E.-L. Benaroya, G. Chollet, B. Denby, G. Dreyfus, and M. Stone, "Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips," Speech Commun., vol. 52, no. 4, pp. 288-300, 2010.
    • (2010) Speech Commun. , vol.52 , Issue.4 , pp. 288-300
    • Hueber, T.1    Benaroya, E.-L.2    Chollet, G.3    Denby, B.4    Dreyfus, G.5    Stone, M.6
  • 10
    • 85039155335
    • Evaluation of a silent speech interface based on magnetic sensing and deep learning for a phonetically rich vocabulary
    • J. A. Gonzalez et al., "Evaluation of a silent speech interface based on magnetic sensing and deep learning for a phonetically rich vocabulary," in Proc. Interspeech, 2017, pp. 3986-3990.
    • (2017) Proc. Interspeech , pp. 3986-3990
    • Gonzalez, J.A.1
  • 11
    • 85019309263
    • Laryngeal cancer: United Kingdom national multidisciplinary guidelines
    • T. M. Jones, M. De, B. Foran, K. Harrington, and S. Mortimore, "Laryngeal cancer: United Kingdom national multidisciplinary guidelines," J. Laryngology Otology, vol. 130, no. Suppl 2, pp. S75-S82, 2016.
    • (2016) J. Laryngology Otology , vol.130 , pp. S75-S82
    • Jones, T.M.1    De, M.2    Foran, B.3    Harrington, K.4    Mortimore, S.5
  • 12
    • 78449253410
    • Isolated word recognition of silent speech using magnetic implants and sensors
    • J. M. Gilbert et al., "Isolated word recognition of silent speech using magnetic implants and sensors," Med. Eng. Phys., vol. 32, no. 10, pp. 1189-1197, 2010.
    • (2010) Med. Eng. Phys. , vol.32 , Issue.10 , pp. 1189-1197
    • Gilbert, J.M.1
  • 13
    • 84870292488
    • Small-vocabulary speech recognition using a silent speech interface based on magnetic sensing
    • R. Hofe et al., "Small-vocabulary speech recognition using a silent speech interface based on magnetic sensing," Speech Commun., vol. 55, no. 1, pp. 22-32, 2013.
    • (2013) Speech Commun. , vol.55 , Issue.1 , pp. 22-32
    • Hofe, R.1
  • 14
    • 84906226748
    • Articulatory synthesis of French connected speech from EMA data
    • A. Toutios and S. Narayanan, "Articulatory synthesis of French connected speech from EMA data," in Proc. Interspeech, 2013, pp. 2738-2742.
    • (2013) Proc. Interspeech , pp. 2738-2742
    • Toutios, A.1    Narayanan, S.2
  • 15
    • 84897952482
    • Modeling the vocal tract transfer function using a 3D digital waveguide mesh
    • Feb
    • M. Speed, D. Murphy, and D. Howard, "Modeling the vocal tract transfer function using a 3D digital waveguide mesh," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 2, pp. 453-464, Feb. 2014.
    • (2014) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.22 , Issue.2 , pp. 453-464
    • Speed, M.1    Murphy, D.2    Howard, D.3
  • 17
    • 84962110277
    • A silent speech system based on permanent magnet articulography and direct synthesis
    • J. A. Gonzalez et al., "A silent speech system based on permanent magnet articulography and direct synthesis," Comput. Speech Lang., vol. 39, pp. 67-87, 2016.
    • (2016) Comput. Speech Lang. , vol.39 , pp. 67-87
    • Gonzalez, J.A.1
  • 18
    • 84949568613
    • Statistical conversion of silent articulation into audible speech using full-covariance HMM
    • T. Hueber and G. Bailly, "Statistical conversion of silent articulation into audible speech using full-covariance HMM," Comput. Speech Lang., vol. 36, pp. 274-293, 2016.
    • (2016) Comput. Speech Lang. , vol.36 , pp. 274-293
    • Hueber, T.1    Bailly, G.2
  • 19
    • 84949568676
    • Data driven articulatory synthesis with deep neural networks
    • S. Aryal and R. Gutierrez-Osuna, "Data driven articulatory synthesis with deep neural networks," Comput. Speech Lang., vol. 36, pp. 260-273, 2016.
    • (2016) Comput. Speech Lang. , vol.36 , pp. 260-273
    • Aryal, S.1    Gutierrez-Osuna, R.2
  • 20
    • 84910066630
    • Robust articulatory speech synthesis using deep neural networks for BCI applications
    • F. Bocquelet, T. Hueber, L. Girin, P. Badin, and B. Yvert, "Robust articulatory speech synthesis using deep neural networks for BCI applications," in Proc. Interspeech, 2014, pp. 2288-2292.
    • (2014) Proc. Interspeech , pp. 2288-2292
    • Bocquelet, F.1    Hueber, T.2    Girin, L.3    Badin, P.4    Yvert, B.5
  • 21
    • 84999828343
    • Real-time control of an articulatory-based speech synthesizer for brain computer interfaces
    • F. Bocquelet, T. Hueber, L. Girin, C. Savariaux, and B. Yvert, "Real-time control of an articulatory-based speech synthesizer for brain computer interfaces," PLoS Comput. Biol., vol. 12, no. 11, 2016, Art. no. e1005119.
    • (2016) PLoS Comput. Biol. , vol.12 , Issue.11
    • Bocquelet, F.1    Hueber, T.2    Girin, L.3    Savariaux, C.4    Yvert, B.5
  • 22
    • 0000877063
    • Delayed auditory feedback
    • A. J. Yates, "Delayed auditory feedback," Psychol. Bull., vol. 60, no. 3, pp. 213-232, 1963.
    • (1963) Psychol. Bull. , vol.60 , Issue.3 , pp. 213-232
    • Yates, A.J.1
  • 23
    • 0036096888
    • Effect of delayed auditory feedback on normal speakers at two speech rates
    • A. Stuart, J. Kalinowski, M. P. Rastatter, and K. Lynch, "Effect of delayed auditory feedback on normal speakers at two speech rates," J. Acoust. Soc. Amer., vol. 111, no. 5, pp. 2237-2241, 2002.
    • (2002) J. Acoust. Soc. Amer. , vol.111 , Issue.5 , pp. 2237-2241
    • Stuart, A.1    Kalinowski, J.2    Rastatter, M.P.3    Lynch, K.4
  • 24
    • 85016157373
    • Restoring speech following total removal of the larynx by a learned transformation from sensor data to acoustics
    • J. M. Gilbert et al., "Restoring speech following total removal of the larynx by a learned transformation from sensor data to acoustics," J. Acoust. Soc. Amer., vol. 141, no. 3, pp. EL307-EL313, 2017.
    • (2017) J. Acoust. Soc. Amer. , vol.141 , Issue.3 , pp. EL307-EL313
    • Gilbert, J.M.1
  • 25
    • 84938864664
    • A user-centric design of permanent magnetic articulography based assistive speech technology
    • L. A. Cheah et al., "A user-centric design of permanent magnetic articulography based assistive speech technology," in Proc. BioSignals, 2015, pp. 109-116.
    • (2015) Proc. BioSignals , pp. 109-116
    • Cheah, L.A.1
  • 26
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • Nov
    • T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 28
    • 84961291190
    • Learning phrase representations using RNN encoder-decoder for statistical machine translation
    • K. Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proc. Conf. Empirical Methods Natural Lang. Process., 2014, pp. 1724-1734.
    • (2014) Proc. Conf. Empirical Methods Natural Lang. Process. , pp. 1724-1734
    • Cho, K.1
  • 30
    • 84865698185
    • Statistical voice conversion techniques for body-conducted unvoiced speech enhancement
    • Nov
    • T. Toda, M. Nakagiri, and K. Shikano, "Statistical voice conversion techniques for body-conducted unvoiced speech enhancement," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 9, pp. 2505-2517, Nov. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.9 , pp. 2505-2517
    • Toda, T.1    Nakagiri, M.2    Shikano, K.3
  • 31
    • 0017968519
    • Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique
    • B. S. Atal, J. J. Chang, M. V. Mathews, and J. W. Tukey, "Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique," J. Acoust. Soc. Amer., vol. 63, no. 5, pp. 1535-1555, 1978.
    • (1978) J. Acoust. Soc. Amer. , vol.63 , Issue.5 , pp. 1535-1555
    • Atal, B.S.1    Chang, J.J.2    Mathews, M.V.3    Tukey, J.W.4
  • 33
    • 67651002140
    • Statistical parametric speech synthesis
    • Nov
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, Nov. 2009.
    • (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 34
    • 85032751458
    • Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
    • Nov
    • G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012.
    • (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 82-97
    • Hinton, G.1
  • 37
    • 84921735339
    • Voice conversion using deep neural networks with layer-wise generative training
    • Dec
    • L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, "Voice conversion using deep neural networks with layer-wise generative training," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1859-1872, Dec. 2014.
    • (2014) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.22 , Issue.12 , pp. 1859-1872
    • Chen, L.-H.1    Ling, Z.-H.2    Liu, L.-J.3    Dai, L.-R.4
  • 38
    • 84930630277
    • Deep learning
    • May
    • Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, May 2015.
    • (2015) Nature , vol.521 , Issue.7553 , pp. 436-444
    • LeCun, Y.1    Bengio, Y.2    Hinton, G.3
  • 39
    • 84867211725
    • Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory," in Proc. Interspeech, 2008, pp. 1076-1079.
    • (2008) Proc. Interspeech , pp. 1076-1079
    • Muramatsu, T.1    Ohtani, Y.2    Toda, T.3    Saruwatari, H.4    Shikano, K.5
  • 40
    • 0027530250
    • SIMPLS: An alternative approach to partial least squares regression
    • S. De Jong, "SIMPLS: An alternative approach to partial least squares regression," Chemometrics Intell. Lab. Syst., vol. 18, no. 3, pp. 251-263, 1993.
    • (1993) Chemometrics Intell. Lab. Syst. , vol.18 , Issue.3 , pp. 251-263
    • De Jong, S.1
  • 41
    • 0000903748
    • Generalization of backpropagation with application to a recurrent gas market model
    • P. J. Werbos, "Generalization of backpropagation with application to a recurrent gas market model," Neural Netw., vol. 1, no. 4, pp. 339-356, 1988.
    • (1988) Neural Netw. , vol.1 , Issue.4 , pp. 339-356
    • Werbos, P.J.1
  • 43
    • 0031573117
    • Long short-term memory
    • S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
    • (1997) Neural Comput. , vol.9 , Issue.8 , pp. 1735-1780
    • Hochreiter, S.1    Schmidhuber, J.2
  • 45
    • 0031268931
    • Bidirectional recurrent neural networks
    • Nov
    • M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673-2681, Nov. 1997.
    • (1997) IEEE Trans. Signal Process. , vol.45 , Issue.11 , pp. 2673-2681
    • Schuster, M.1    Paliwal, K.K.2
  • 46
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • Apr
    • H. Kawahara, I. Masuda-Katsuse, and A. De Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3, pp. 187-207, Apr. 1999.
    • (1999) Speech Commun. , vol.27 , Issue.3 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigne, A.3
  • 51
    • 84910067727
    • Analysis of phonetic similarity in a silent speech interface based on permanent magnetic articulography
    • J. A. Gonzalez et al., "Analysis of phonetic similarity in a silent speech interface based on permanent magnetic articulography," in Proc. Interspeech, 2014, pp. 1018-1022.
    • (2014) Proc. Interspeech , pp. 1018-1022
    • Gonzalez, J.A.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.