SCOPUS Information Search Platform

IEEE/ACM Transactions on Audio Speech and Language Processing

Volume 24, Issue 12, 2016, Pages 2301-2312

Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

(4) Cernak, Milos (a); Lazaridis, Alexandros (a); Asaei, Afsaneh (a); Garner, Philip N. (a)

(a) IDIAP RESEARCH INSTITUTE (Switzerland)

Author keywords

continuous F0 coding; deep neural networks; spiking neural networks; Very low bit rate speech coding

Indexed keywords

CODES (SYMBOLS); CODING ERRORS; CONTINUOUS SPEECH RECOGNITION; DEEP NEURAL NETWORKS; HIDDEN MARKOV MODELS; IMAGE CODING; MARKOV PROCESSES; NETWORK CODING; NEURAL NETWORKS; SIGNAL ENCODING; SPEECH; SPEECH CODING;

ANALYSIS AND SYNTHESIS; CONTINUOUS F0 CODING; FUNDAMENTAL FREQUENCIES; NEURAL NETWORKS (NNS); PHONOLOGICAL FEATURES; SPEECH CODING SYSTEM; SPIKING NEURAL NETWORKS; VERY LOW BIT RATE;

SPEECH RECOGNITION;

EID: 85027056450 PISSN: 23299290 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2016.2604566 Document Type: Article

Times cited : (27)

References (78)

1
- 0028550870
- Current objectives in 4-kb/s wireline-quality speech coding standardization
- Nov.
- S. Dimolitsas, C. Ravishankar, and G. Schroder, "Current objectives in 4-kb/s wireline-quality speech coding standardization," IEEE Signal Process. Lett., vol. 1, no. 11, pp. 157-159, Nov. 1994.
- (1994) IEEE Signal Process. Lett. , vol.1 , Issue.11 , pp. 157-159
- Dimolitsas, S.¹ Ravishankar, C.² Schroder, G.³

2
- 0035397411
- A very low bit rate speech coder based on a recognition/synthesis paradigm
- Jul.
- K.-S. Lee and R. Cox, "A very low bit rate speech coder based on a recognition/synthesis paradigm," IEEE Trans. Audio, Speech, Lang. Process., vol. 9, no. 5, pp. 482-491, Jul. 2001.
- (2001) IEEE Trans. Audio, Speech, Lang. Process. , vol.9 , Issue.5 , pp. 482-491
- Lee, K.-S.¹ Cox, R.²

3
- 0141702209
- Corpus based very low bit rate speech coding
- Piscataway, NJ, USA: IEEE Apr.
- G. V. Baudoin and F. El Chami, "Corpus based very low bit rate speech coding," in Proc. 2003 IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1. Piscataway, NJ, USA: IEEE, Apr. 2003, pp. I-792-I-795.
- (2003) Proc. 2003 IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. I792-I795
- Baudoin, G.V.¹ El Chami, F.²

4
- 0024909981
- A phonetic vocoder
- Piscataway, NJ, USA: IEEE May
- J. Picone and G. R. Doddington, "A phonetic vocoder," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1. Piscataway, NJ, USA: IEEE, May 1989, pp. 580-583.
- (1989) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. 580-583
- Picone, J.¹ Doddington, G.R.²

5
- 84892164904
- A very low bit rate speech coder using HMM-based speech recognition/synthesis techniques
- Piscataway, NJ, USA: IEEE May , vol. 2
- K. Tokuda, T. Masuko, J. Hiroi, T. Kobayashi, and T. Kitamura, "A very low bit rate speech coder using HMM-based speech recognition/synthesis techniques," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 2. Piscataway, NJ, USA: IEEE, May 1998, vol. 2, pp. 609-612.
- (1998) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.2 , pp. 609-612
- Tokuda, K.¹ Masuko, T.² Hiroi, J.³ Kobayashi, T.⁴ Kitamura, T.⁵

6
- 51449103684
- Multisensor very lowbit rate speech coding using segment quantization
- Mar.
- A. McCree, K. Brady, and T. F. Quatieri, "Multisensor very lowbit rate speech coding using segment quantization," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Mar. 2008, pp. 3997-4000.
- (2008) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 3997-4000
- McCree, A.¹ Brady, K.² Quatieri, T.F.³

7
- 84928152663
- Incremental syllable-context phonetic vocoding
- Jun.
- M. Cernak, P. N. Garner, A. Lazaridis, P. Motlicek, and X. Na, "Incremental syllable-context phonetic vocoding," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 6, pp. 1019-1030, Jun. 2015.
- (2015) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.23 , Issue.6 , pp. 1019-1030
- Cernak, M.¹ Garner, P.N.² Lazaridis, A.³ Motlicek, P.⁴ Na, X.⁵

8
- 0038798704
- Linguistic dissection of switchboard-corpus automatic speech recognition systems
- G. Greenberg and S. Chang, "Linguistic dissection of switchboard-corpus automatic speech recognition systems," in Proc. ISCA Autom. Speech Recognit.: Challenges New Millennium, 2000, pp. 195-202.
- (2000) Proc. ISCA Autom. Speech Recognit.: Challenges New Millennium , pp. 195-202
- Greenberg, G.¹ Chang, S.²

9
- 84994347259
- M. Cernak, S. Benus, and A. Lazaridis, "Speech vocoding for laboratory phonology," 2016. [Online]. Available: Http://arxiv.org/abs/1601.05991
- (2016) Speech Vocoding for Laboratory Phonology
- Cernak, M.¹ Benus, S.² Lazaridis, A.³

10
- 0003948389
- Hoboken NJ USA: Wiley, Dec.
- J. Harris, English Sound Structure, 1st ed. Hoboken, NJ, USA: Wiley, Dec. 1994.
- (1994) English Sound Structure 1st Ed.
- Harris, J.¹

11
- 0041385414
- Harlow, Essex, U.K.: Longman
- J. Harris and G. Lindsey, The Elements of Phonological Representation. Harlow, Essex, U.K.: Longman, 1995, pp. 34-79.
- (1995) The Elements of Phonological Representation , pp. 34-79
- Harris, J.¹ Lindsey, G.²

12
- 0004119259
- New York NY USA: Harper & Row
- N. Chomsky and M. Halle, The Sound Pattern of English. New York, NY, USA: Harper & Row, 1968.
- (1968) The Sound Pattern of English
- Chomsky, N.¹ Halle, M.²

13
- 84867329143
- Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition
- Mar.
- D. Yu, S. Siniscalchi, L. Deng, and C.-H. Lee, "Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Mar. 2012, pp. 4169-4172.
- (2012) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 4169-4172
- Yu, D.¹ Siniscalchi, S.² Deng, L.³ Lee, C.-H.⁴

14
- 84862931515
- Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data
- Mar.
- S. M. Siniscalchi, D.-C. Lyu, T. Svendsen, and C.-H. Lee, "Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 3, pp. 875-887, Mar. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.3 , pp. 875-887
- Siniscalchi, S.M.¹ Lyu, D.-C.² Svendsen, T.³ Lee, C.-H.⁴

15
- 84959104818
- On compressibility of neural network phonological features for low bit rate speech coding
- Sep.
- A. Asaei, M. Cernak, and H. Bourlard, "On compressibility of neural network phonological features for low bit rate speech coding," in Proc. Interspeech, Sep. 2015, pp. 418-422.
- (2015) Proc. Interspeech , pp. 418-422
- Asaei, A.¹ Cernak, M.² Bourlard, H.³

16
- 69349090197
- Learning deep architectures for AI
- Jan.
- Y. Bengio, "Learning deep architectures for AI," Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, Jan. 2009.
- (2009) Found. Trends Mach. Learn. , vol.2 , Issue.1 , pp. 1-127
- Bengio, Y.¹

17
- 0003573244
- Boston MA USA: Kluwer
- H. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach. Boston, MA, USA: Kluwer, 1994.
- (1994) Connectionist Speech Recognition: A Hybrid Approach
- Bourlard, H.¹ Morgan, N.²

18
- 78649297301
- Deep belief networks for phone recognition
- A. Mohamed, G. E. Dahl, and G. E. Hinton, "Deep belief networks for phone recognition," in Proc. NIPS'22 Workshop Deep Learn. Speech Recognit., 2009.
- (2009) Proc. NIPS'22 Workshop Deep Learn. Speech Recognit.
- Mohamed, A.¹ Dahl, G.E.² Hinton, G.E.³

19
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- Nov.
- G. Hinton, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012.
- (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹

20
- 84055222005
- Context-dependent pretrained deep neural networks for large vocabulary speech recognition
- Jan.
- G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pretrained deep neural networks for large vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

21
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2013, pp. 7962-7966.
- (2013) Proc. IEEE Int. Conf. Acoust., Speech Signal Process. , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

22
- 34249043508
- Anytime learning of decision trees
- S. Esmeir, S. Markovitch, and C. Sammut, "Anytime learning of decision trees," J. Mach. Learn. Res., vol. 8, pp. 891-933, 2007.
- (2007) J. Mach. Learn. Res. , vol.8 , pp. 891-933
- Esmeir, S.¹ Markovitch, S.² Sammut, C.³

23
- 79955538498
- Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
- Jul.
- K. Yu, H. Zen, F. Mairesse, and S. Young, "Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis," Speech Commun., vol. 53, no. 6, pp. 914-923, Jul. 2011.
- (2011) Speech Commun. , vol.53 , Issue.6 , pp. 914-923
- Yu, K.¹ Zen, H.² Mairesse, F.³ Young, S.⁴

24
- 84905251808
- On the training aspects of deep neural network (DNN) for parametric TTS synthesis
- Y. Qian, Y. Fan, W. Hu, and F. Soong, "On the training aspects of deep neural network (DNN) for parametric TTS synthesis," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2014, pp. 3829-3833.
- (2014) Proc. IEEE Int. Conf. Acoust., Speech Signal Process. , pp. 3829-3833
- Qian, Y.¹ Fan, Y.² Hu, W.³ Soong, F.⁴

25
- 84945929642
- DNN-based speech synthesis: Importance of input features and training data
- A. Lazaridis, B. Potard, and P. N. Garner, "DNN-based speech synthesis: Importance of input features and training data," in Proc. Int. Conf. Speech Comput., 2015, pp. 193-200.
- (2015) Proc. Int. Conf. Speech Comput. , pp. 193-200
- Lazaridis, A.¹ Potard, B.² Garner, P.N.³

26
- 84989426403
- A new model of LPC excitation for producing natural-sounding speech at low bit rates
- Piscataway, NJ, USA: IEEE May
- B. S. Atal and J. R. Remde, "A new model of LPC excitation for producing natural-sounding speech at low bit rates," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 7. Piscataway, NJ, USA: IEEE, May 1982, pp. 614-617. [Online]. Available: Http://dx.doi.org/10.1109/icassp.1982.1171649
- (1982) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.7 , pp. 614-617
- Atal, B.S.¹ Remde, J.R.²

27
- 0022219187
- Code-excited linear prediction (CELP): High-quality speech at very low bit rates
- Piscataway, NJ, USA: IEEE Apr. [Online]. Available:
- M. Schroeder and B. Atal, "Code-excited linear prediction (CELP): High-quality speech at very low bit rates," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process, vol. 10. Piscataway, NJ, USA: IEEE, Apr. 1985, pp. 937-940. [Online]. Available: Http://dx.doi.org/10.1109/icassp.1985.1168147
- (1985) Proc. IEEE Int. Conf. Acoust. Speech Signal Process , vol.10 , pp. 937-940
- Schroeder, M.¹ Atal, B.²

28
- 0025694638
- Speech coding based on a multi-layer neural network
- Apr. [Online]. Available:
- S. Morishima, H. Harashima, and Y. Katayama, "Speech coding based on a multi-layer neural network," in Proc. IEEE Int. Conf. Commun., Incl. Supercomm Tech. Sessions. Conf. Record., Apr. 1990, vol. 2, pp. 429-433. [Online]. Available: Http://dx.doi.org/10.1109/icc.1990.117117
- (1990) Proc. IEEE Int. Conf. Commun., Incl. Supercomm Tech. Sessions. Conf. Record. , vol.2 , pp. 429-433
- Morishima, S.¹ Harashima, H.² Katayama, Y.³

29
- 0030386937
- Prediction in speech coding: The modification of the coding of LPC parameters and nonlinear estimation technique by using ANN
- Oct. [Online]. Available:
- Y. Zhen, "Prediction in speech coding: The modification of the coding of LPC parameters and nonlinear estimation technique by using ANN," in Proc. 3rd Int. Conf. Signal Process., Oct. 1996, vol. 1, pp. 690-693. [Online]. Available: Http://dx.doi.org/10.1109/icsigp.1996.567357
- (1996) Proc. 3rd Int. Conf. Signal Process. , vol.1 , pp. 690-693
- Zhen, Y.¹

30
- 0029727465
- A nonlinear adaptive predictor for speech compression
- Piscataway, NJ, USA: IEEE Jun. [Online]. Available:
- S. Hunt, "A nonlinear adaptive predictor for speech compression," in Proc. IEEE Int. Conf. Neural Netw., vol. 4. Piscataway, NJ, USA: IEEE, Jun. 1996, pp. 1998-2002. [Online]. Available: Http://dx.doi.org/10.1109/icnn.1996.549208
- (1996) Proc. IEEE Int. Conf. Neural Netw. , vol.4 , pp. 1998-2002
- Hunt, S.¹

31
- 0033309597
- Discriminative coding with predictive neural networks
- Hertfordshire, U.K.: IET [Online]. Available:
- C. Chavy, B. Gas, and J. L. Zarader, "Discriminative coding with predictive neural networks," in Proc. 9th Int. Conf. Art. Neural Netw., vol. 1. Hertfordshire, U.K.: IET, 1999, pp. 216-220. [Online]. Available: Http://dx.doi.org/10.1049/cp:19991111
- (1999) Proc. 9th Int. Conf. Art. Neural Netw. , vol.1 , pp. 216-220
- Chavy, C.¹ Gas, B.² Zarader, J.L.³

32
- 33745745020
- Adaptive hybrid speech coding with a MLP/LPC structure
- June 2-4 1999 Proceedings, Volume II. Berlin, Germany: Springer [Online]. Available:
- M. Faúndez-Zanuy, "Adaptive hybrid speech coding with a MLP/LPC structure," in Engineering Applications of Bio-Inspired Artificial Neural Networks: International Work-Conference on Artificial and Natural Neural Networks, IWANN'99 Alicante, Spain, June 2-4, 1999 Proceedings, Volume II. Berlin, Germany: Springer, 1999, pp. 814-823. [Online]. Available: Http://dx.doi.org/10.1007/BFb0100549
- (1999) Engineering Applications of Bio-Inspired Artificial Neural Networks: International Work-Conference On Artificial and Natural Neural Networks, IWANN'99 Alicante, Spain , pp. 814-823
- Faúndez-Zanuy, M.¹

33
- 84962840611
- Packet loss concealment based on deep neural networks for digital speech transmission
- Feb. [Online]. Available:
- B.-K. Lee and J.-H. Chang, "Packet loss concealment based on deep neural networks for digital speech transmission," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 2, pp. 378-387, Feb. 2016. [Online]. Available: Http://dx.doi.org/10.1109/taslp.2015.2509780
- (2016) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.24 , Issue.2 , pp. 378-387
- Lee, B.-K.¹ Chang, J.-H.²

34
- 84864942567
- Complexity reduction of LDCELP speech coding in prediction of gain using neural networks
- M. Sheikhan, V. T. Vakili, and S. Garoucy, "Complexity reduction of LDCELP speech coding in prediction of gain using neural networks," World Appl. Sci. J., vol. 7, no. 7, pp. 38-44, 2009.
- (2009) World Appl. Sci. J. , vol.7 , Issue.7 , pp. 38-44
- Sheikhan, M.¹ Vakili, V.T.² Garoucy, S.³

35
- 0026384943
- A CELP codebook and search technique using a Hopfield net
- Piscataway, NJ, USA: IEEE Apr. vol. 1. [Online]. Available:
- M. G. Easton and C. C. Goodyear, "A CELP codebook and search technique using a Hopfield net," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process, vol. 1. Piscataway, NJ, USA: IEEE, Apr. 1991, pp. 685-688 vol. 1. [Online]. Available: Http://dx.doi.org/10.1109/icassp.1991.150432
- (1991) Proc. IEEE Int. Conf. Acoust. Speech Signal Process , vol.1 , pp. 685-688
- Easton, M.G.¹ Goodyear, C.C.²

36
- 0028516668
- Fully vector-quantized neural network-based code-excited nonlinear predictive speech coding
- Oct. [Online]. Available:
- L. Wu, M. Niranjan, and F. Fallside, "Fully vector-quantized neural network-based code-excited nonlinear predictive speech coding," IEEE Trans. Audio, Speech, Lang. Process., vol. 2, no. 4, pp. 482-489, Oct. 1994. [Online]. Available: Http://dx.doi.org/10.1109/89.326608
- (1994) IEEE Trans. Audio, Speech, Lang. Process. , vol.2 , Issue.4 , pp. 482-489
- Wu, L.¹ Niranjan, M.² Fallside, F.³

37
- 0024060644
- Multiband excitation vocoder
- Aug. [Online]. Available:
- D. W. Griffin and J. S. Lim, "Multiband excitation vocoder," IEEE Trans. Audio, Speech, Lang. Process., vol. 36, no. 8, pp. 1223-1235, Aug. 1988. [Online]. Available: Http://dx.doi.org/10.1109/29.1651
- (1988) IEEE Trans. Audio, Speech, Lang. Process. , vol.36 , Issue.8 , pp. 1223-1235
- Griffin, D.W.¹ Lim, J.S.²

38
- 1642601602
- A robust 800 bps MBE coder with VQ and MLP
- Piscataway, NJ, USA: IEEE, Oct. [Online]. Available:
- H. Cui and H. Jiang, "A robust 800 bps MBE coder with VQ and MLP," in Proc. Int. Conf. Commun. Technol. Proc., vol. 2. Piscataway, NJ, USA: IEEE, Oct. 1998, pp. 4. [Online]. Available: Http://dx.doi.org/10.1109/icct.1998.741011
- (1998) Proc. Int. Conf. Commun. Technol. Proc. , vol.2 , pp. 4
- Cui, H.¹ Jiang, H.²

39
- 0020194708
- An 800 bit/s vector quantization LPC vocoder
- Oct. [Online]. Available:
- D. Wong, B.-H. Juang, and A. Gray, "An 800 bit/s vector quantization LPC vocoder," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-30, no. 5, pp. 770-780, Oct. 1982. [Online]. Available: Http://dx.doi.org/10.1109/tassp.1982.1163960
- (1982) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-30 , Issue.5 , pp. 770-780
- Wong, D.¹ Juang, B.-H.² Gray, A.³

40
- 84871173623
- Segment quantization for very-low-rate speech coding
- Piscataway, NJ, USA: IEEE May [Online]. Available:
- S. Roucos, R. Schwartz, and J. Makhoul, "Segment quantization for very-low-rate speech coding," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process, vol. 7. Piscataway, NJ, USA: IEEE, May 1982, pp. 1565-1568. [Online]. Available: Http://dx.doi.org/10.1109/icassp.1982.1171472
- (1982) Proc. IEEE Int. Conf. Acoust. Speech Signal Process , vol.7 , pp. 1565-1568
- Roucos, S.¹ Schwartz, R.² Makhoul, J.³

41
- 0020550073
- A segment vocoder at 150 b/s
- Piscataway, NJ, USA: IEEE Apr. [Online]. Available:
- S. Roucos, R. Schwartz, and J. Makhoul, "A segment vocoder at 150 b/s," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process, vol. 8. Piscataway, NJ, USA: IEEE, Apr. 1983, pp. 61-64. [Online]. Available: Http://dx.doi.org/10.1109/icassp.1983.1172241
- (1983) Proc. IEEE Int. Conf. Acoust. Speech Signal Process , vol.8 , pp. 61-64
- Roucos, S.¹ Schwartz, R.² Makhoul, J.³

42
- 0020548652
- Very low data rate speech compression with LPC vector and matrix quantization
- Piscataway, NJ, USA: IEEE Apr. [Online]. Available:
- D. Wong, B. Juang, and D. Cheng, "Very low data rate speech compression with LPC vector and matrix quantization," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process, vol. 8. Piscataway, NJ, USA: IEEE, Apr. 1983, pp. 65-68. [Online]. Available: Http://dx.doi.org/10.1109/icassp.1983.1172244
- (1983) Proc. IEEE Int. Conf. Acoust. Speech Signal Process , vol.8 , pp. 65-68
- Wong, D.¹ Juang, B.² Cheng, D.³

43
- 0022084026
- Matrix quantizer design for LPC speech using the generalized Lloyd algorithm
- Jun. [Online]. Available:
- C. Tsao and R. Gray, "Matrix quantizer design for LPC speech using the generalized Lloyd algorithm," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 3, pp. 537-545, Jun. 1985. [Online]. Available: Http://dx.doi.org/10.1109/tassp.1985.1164584
- (1985) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-33 , Issue.3 , pp. 537-545
- Tsao, C.¹ Gray, R.²

44
- 0024075701
- LPC speech coding based on variable-length segment quantization
- Sep. [Online]. Available:
- Y. Shiraki and M. Honda, "LPC speech coding based on variable-length segment quantization," IEEE Trans. Audio, Speech, Lang. Process., vol. 36, no. 9, pp. 1437-1444, Sep. 1988. [Online]. Available: Http://dx.doi.org/10.1109/29.90372
- (1988) IEEE Trans. Audio, Speech, Lang. Process. , vol.36 , Issue.9 , pp. 1437-1444
- Shiraki, Y.¹ Honda, M.²

45
- 84976552353
- Speech Compression
- Jun. [Online]. Available:
- J. Gibson, "Speech Compression," Information, vol. 7, no. 2, p. 32, Jun. 2016. [Online]. Available: Http://dx.doi.org/10.3390/info7020032
- (2016) Information , vol.7 , Issue.2 , pp. 32
- Gibson, J.¹

46
- 84946076199
- Phonological vocoding using artificial neural networks
- Apr.
- M. Cernak, B. Potard, and P. N. Garner, "Phonological vocoding using artificial neural networks," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 2015, pp. 4844-4848.
- (2015) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 4844-4848
- Cernak, M.¹ Potard, B.² Garner, P.N.³

47
- 80052637232
- Demodulation as probabilistic inference
- Nov.
- R. E. Turner and M. Sahani, "Demodulation as probabilistic inference," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 8, pp. 2398-2411, Nov. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.8 , pp. 2398-2411
- Turner, R.E.¹ Sahani, M.²

48
- 84903976036
- A role for amplitude modulation phase relationships in speech rhythm perception
- Jul.
- V. Leong, M. A. Stone, R. E. Turner, and U. Goswami, "A role for amplitude modulation phase relationships in speech rhythm perception." J. Acoust. Soc. Amer., vol. 136, no. 1, pp. 366-381, Jul. 2014.
- (2014) J. Acoust. Soc. Amer. , vol.136 , Issue.1 , pp. 366-381
- Leong, V.¹ Stone, M.A.² Turner, R.E.³ Goswami, U.⁴

49
- 23944484420
- An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex
- Sep.
- P. Lakatos, A. S. Shah, K. H. Knuth, I. Ulbert, G. Karmos, and C. E. Schroeder, "An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex." J. Neurophysiol., vol. 94, no. 3, pp. 1904-1911, Sep. 2005.
- (2005) J. Neurophysiol. , vol.94 , Issue.3 , pp. 1904-1911
- Lakatos, P.¹ Shah, A.S.² Knuth, K.H.³ Ulbert, I.⁴ Karmos, G.⁵ Schroeder, C.E.⁶

50
- 84994272823
- [Online]. Available:
- M. Cernak, A. Asaei, and H. Bourlard, On structured sparsity of phonological posteriors for linguistic parsing, 2016. [Online]. Available: Http://arxiv.org/abs/1601.05647
- (2016) On Structured Sparsity of Phonological Posteriors for Linguistic Parsing
- Cernak, M.¹ Asaei, A.² Bourlard, H.³

51
- 85090774282
- From discontinuous to continuous F0 modelling in HMM-based speech synthesis
- K. Yu, B. Thomson, S. Young, and T. Street, "From discontinuous to continuous F0 modelling in HMM-based speech synthesis," in Proc. 7th ISCA Speech Synthesis Workshop, 2010, pp. 94-99.
- (2010) Proc. 7th ISCA Speech Synthesis Workshop , pp. 94-99
- Yu, K.¹ Thomson, B.² Young, S.³ Street, T.⁴

52
- 85008023596
- Continuous F0 modeling for HMM based statistical parametric speech synthesis
- Jul. [Online]. Available:
- K. Yu and S. Young, "Continuous F0 modeling for HMM based statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1071-1079, Jul. 2011. [Online]. Available: Http://dx.doi.org/10.1109/tasl.2010.2076805
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.5 , pp. 1071-1079
- Yu, K.¹ Young, S.²

53
- 84902953892
- Using noisy speech to study the robustness of a continuous F0 modelling method in HMM-based speech synthesis
- K. U. Ogbureke, J. P. Cabral, and J. Carson-Berndsen, "Using noisy speech to study the robustness of a continuous F0 modelling method in HMM-based speech synthesis," in Proc. Speech Prosody, 2012, pp. 67-70.
- (2012) Proc. Speech Prosody , pp. 67-70
- Ogbureke, K.U.¹ Cabral, J.P.² Carson-Berndsen, J.³

54
- 84959123110
- Neuromorphic based oscillatory device for incremental syllable boundary detection
- Sep.
- A. Hyafil and M. Cernak, "Neuromorphic based oscillatory device for incremental syllable boundary detection," in Proc. Interspeech, Sep. 2015, pp. 1191-1195.
- (2015) Proc. Interspeech , pp. 1191-1195
- Hyafil, A.¹ Cernak, M.²

55
- 84906268958
- Syllable-based pitch encoding for low bit rate speech coding with recognition/synthesis architecture
- Aug.
- M. Cernak, X. Na, and P. N. Garner, "Syllable-based pitch encoding for low bit rate speech coding with recognition/synthesis architecture," in Proc. Interspeech, Aug. 2013, pp. 3449-3452.
- (2013) Proc. Interspeech , pp. 3449-3452
- Cernak, M.¹ Na, X.² Garner, P.N.³

56
- 84994246116
- PhonVoc: A phonetic and phonological vocoding toolkit
- M. Cernak and P. N. Garner, "PhonVoc: A phonetic and phonological vocoding toolkit," in Proc. Interspeech, 2016.
- (2016) Proc. Interspeech
- Cernak, M.¹ Garner, P.N.²

57
- 0012330750
- The design for the wall street journal-based CSR corpus
- D. B. Paul and J. M. Baker, "The design for the wall street journal-based CSR corpus," in Proc. Workshop Speech Nat. Lang., 1992, pp. 357-362.
- (1992) Proc. Workshop Speech Nat. Lang. , pp. 357-362
- Paul, D.B.¹ Baker, J.M.²

58
- 6344222337
- Philadelphia, PA, USA: Linguistic Data Consortium
- J. Garofolo, S. Lamel, L. F. Fisher, W. M. Fiscus, G. Jonathon, and D. S. Pallett, "DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NASA STI/Recon technical report, 93, LDC93S1," Philadelphia, PA, USA: Linguistic Data Consortium, 1993.
- (1993) DARPA TIMIT Acoustic-phonetic Continuous Speech Corpus CD-ROM. NIST Speech Disc 1-1.1. NASA STI/Recon Technical Report, 93, LDC93S1
- Garofolo, J.¹ Lamel, S.² Fisher, L.F.³ Fiscus, W.M.⁴ Jonathon, G.⁵ Pallett, D.S.⁶

59
- 0003571407
- Human Commun. Res. Centre, Univ. of Edinburgh, Tech. Rep.
- A. Black, P. Taylor, and R. Caley, "The festival speech synthesis system," Human Commun. Res. Centre, Univ. of Edinburgh, Tech. Rep., 1997.
- (1997) The Festival Speech Synthesis System
- Black, A.¹ Taylor, P.² Caley, R.³

60
- 70350498327
- The HMM-based speech synthesis system version 2.0
- H. Zen, "The HMM-based speech synthesis system version 2.0," in Proc. 6th ISCA Speech Synthesis Workshop, 2007, pp. 131-136.
- (2007) Proc. 6th ISCA Speech Synthesis Workshop , pp. 131-136
- Zen, H.¹

61
- 85074721580
- Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project
- M. Wester, "Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project," in Proc. 7th ISCA Speech Synthesis Workshop, 2010, pp. 192-197.
- (2010) Proc. 7th ISCA Speech Synthesis Workshop , pp. 192-197
- Wester, M.¹

62
- 33745805403
- A fast learning algorithm for deep belief nets
- Jul.
- G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, Jul. 2006.
- (2006) Neural Comput. , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.W.³

63
- 84858953642
- The Kaldi speech recognition toolkit
- Dec. IEEE Catalog No.: CFP11SRW-USB
- D. Povey et al., "The Kaldi speech recognition toolkit," in Proc. IEEE Workshop Autom. Speech Recognit. Understanding, Dec. 2011, IEEE Catalog No.: CFP11SRW-USB.
- (2011) Proc. IEEE Workshop Autom. Speech Recognit. Understanding
- Povey, D.¹

64
- 84930661557
- Speech encoding by coupled cortical theta and gamma oscillations
- May [Online]. Available:
- A. Hyafil, L. Fontolan, C. Kabdebon, B. Gutkin, A.-L. Giraud, and H. Brownell, "Speech encoding by coupled cortical theta and gamma oscillations," eLife, vol. 2015, no. 4, May 2015, Art. no. e06213. [Online]. Available: Http://dx.doi.org/10.7554/elife.06213
- (2015) ELife , vol.2015 , Issue.4
- Hyafil, A.¹ Fontolan, L.² Kabdebon, C.³ Gutkin, B.⁴ Giraud, A.-L.⁵ Brownell, H.⁶

65
- 84930614319
- [Online]. Available:
- W. M. Fisher, tsylb2. 1996. [Online]. Available: Http://www.nist.gov/speech/tools
- (1996) Tsylb2
- Fisher, W.M.¹

66
- 84946044425
- Idiap research report Idiap-RR-03-2015
- P. N. Garner, M. Cernak, and B. Potard, "A simple continuous excitation model for parametric vocoding," Idiap research report Idiap-RR-03-2015, 2015.
- (2015) A Simple Continuous Excitation Model for Parametric Vocoding
- Garner, P.N.¹ Cernak, M.² Potard, B.³

67
- 0027247004
- Mel-cepstral distance measure for objective speech quality assessment
- Piscataway, NJ, USA: IEEE May
- R. F. Kubichek, "Mel-cepstral distance measure for objective speech quality assessment," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process, vol. 1. Piscataway, NJ, USA: IEEE, May 1993, pp. 125-128.
- (1993) Proc. IEEE Int. Conf. Acoust. Speech Signal Process , vol.1 , pp. 125-128
- Kubichek, R.F.¹

68
- 0003450846
- ITU-T Rec. P.800, Geneva, Switzerland
- Methods for Subjective Determination of Transmission Quality, ITU-T Rec. P.800, Geneva, Switzerland, 1996.
- (1996) Methods for Subjective Determination of Transmission Quality

69
- 85075908665
- Speech quality assessment
- J. Benesty, M. M. Sondhi, and Y. Huang, Eds. Berlin, Germany: Springer
- V. Grancharov and W. B. Kleijn, "Speech quality assessment," in Springer Handbook of Speech Processing, J. Benesty, M. M. Sondhi, and Y. Huang, Eds. Berlin, Germany: Springer, 2008, pp. 83-100.
- (2008) Springer Handbook of Speech Processing , pp. 83-100
- Grancharov, V.¹ Kleijn, W.B.²

70
- 79960916745
- An algorithm for intelligibility prediction of time-frequency weighted noisy speech
- [Online]. Available:
- C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2125-2136, 2011. [Online]. Available: Http://dx.doi.org/10.1109/tasl.2011.2114881
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.7 , pp. 2125-2136
- Taal, C.H.¹ Hendriks, R.C.² Heusdens, R.³ Jensen, J.⁴

71
- 84942636431
- Speech intelligibility evaluation for mobile phones
- [Online]. Available:
- S. Jørgensen, J. Cubick, and T. Dau, "Speech intelligibility evaluation for mobile phones," Acta Acust. United with Acust., vol. 105, pp. 1016-1025. [Online]. Available: Http://dx.doi.org/10.3813/aaa.918896
- Acta Acust. United with Acust. , vol.105 , pp. 1016-1025
- Jørgensen, S.¹ Cubick, J.² Dau, T.³

72
- 78049365405
- A short-time objective intelligibility measure for time-frequency weighted noisy speech
- Mar. [Online]. Available:
- C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Mar. 2010, pp. 4214-4217. [Online]. Available: Http://dx.doi.org/10.1109/icassp.2010.5495701
- (2010) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 4214-4217
- Taal, C.H.¹ Hendriks, R.C.² Heusdens, R.³ Jensen, J.⁴

73
- 84985875274
- G.114, Geneva, Switzerland
- One-Way Transmission Time, ITU-T Rec. G.114, Geneva, Switzerland, 2003.
- (2003) One-Way Transmission Time, ITU-T Rec.

74
- 84959176782
- Mar. [Online]. Available:
- G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," Mar. 2015. [Online]. Available: Http://arxiv.org/abs/1503.02531
- (2015) Distilling the Knowledge in a Neural Network
- Hinton, G.¹ Vinyals, O.² Dean, J.³

75
- 84959104369
- Compressing deep neural networks using a rank-constrained topology
- P. Nakkiran, R. Alvarez, R. Prabhavalkar, and C. Parada, "Compressing deep neural networks using a rank-constrained topology," in Proc. Interspeech, 2015, pp. 1473-1477.
- (2015) Proc. Interspeech , pp. 1473-1477
- Nakkiran, P.¹ Alvarez, R.² Prabhavalkar, R.³ Parada, C.⁴

76
- 85027586796
- Oct. [Online]. Available:
- V. Sindhwani, T. N. Sainath, and S. Kumar, "Structured transforms for small-footprint deep learning," Oct. 2015. [Online]. Available: Http://arxiv.org/abs/1510.01722
- (2015) Structured Transforms for Small-footprint Deep Learning
- Sindhwani, V.¹ Sainath, T.N.² Kumar, S.³

77
- 85027574156
- Small-footprint deep neural networks with highway connections for speech recognition
- vol. abs/1512.04280 [Online]. Available:
- L. Lu and S. Renals, "Small-footprint deep neural networks with highway connections for speech recognition," CoRR, vol. abs/1512.04280, 2015. [Online]. Available: Http://arxiv.org/abs/1512.04280
- (2015) CoRR
- Lu, L.¹ Renals, S.²

78
- 84960944045
- Deepear: Robust smartphone audio sensing in unconstrained acoustic environments using deep learning
- [Online]. Available:
- N. D. Lane, P. Georgiev, and L. Qendro, "Deepear: Robust smartphone audio sensing in unconstrained acoustic environments using deep learning," in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2015, pp. 283-294. [Online]. Available: Http://doi.acm.org/10.1145/2750858.2804262
- (2015) Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. , pp. 283-294
- Lane, N.D.¹ Georgiev, P.² Qendro, L.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.