



Volume 29, Issue 6, 2012, Pages 58-69

Discriminative training for automatic speech recognition: Modeling, criteria, optimization, implementation, and performance

Author keywords

[No Author keywords available]

Indexed keywords

DEEP NEURAL NETWORKS; MAXIMUM LIKELIHOOD;

EID: 85032751713     PISSN: 10535888     EISSN: None     Source Type: Journal    
DOI: 10.1109/MSP.2012.2197232     Document Type: Article
Times cited : (44)

References (54)
  • 1
    • 0001185873
    • An essay towards solving a problem in the doctrine of chances
    • (reprinted in Biometrika, vol. 45, no. 3/4, pp. 293-315, Dec. 1958)
    • T. Bayes, "An essay towards solving a problem in the doctrine of chances," Philos. Trans. R. Soc. Lond., vol. 53, pp. 370-418, 1763 (reprinted in Biometrika, vol. 45, no. 3/4, pp. 293-315, Dec. 1958).
    • (1763) Philos. Trans. R. Soc. Lond. , vol.53 , pp. 370-418
    • Bayes, T.1
  • 2
    • 0027929445
    • On structuring probabilistic dependencies in language modeling
    • H. Ney, U. Essen, and R. Kneser, "On structuring probabilistic dependencies in language modeling," Comput. Speech Lang., vol. 2, no. 8, pp. 1-38, 1994.
    • (1994) Comput. Speech Lang. , vol.2 , Issue.8 , pp. 1-38
    • Ney, H.1    Essen, U.2    Kneser, R.3
  • 3
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoustics Speech Signal Processing, vol. ASSP-28, no. 4, pp. 357-366, Aug. 1980. (Pubitemid 11464930)
    • (1980) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-28 , Issue.4 , pp. 357-366
    • Davis Steven, B.1    Mermelstein Paul2
  • 4
    • 0025041264
    • Perceptual linear predictive (PLP) analysis of speech
    • DOI 10.1121/1.399423
    • H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Am., vol. 87, no. 4, pp. 1738-1752, June 1990. (Pubitemid 20256470)
    • (1990) Journal of the Acoustical Society of America , vol.87 , Issue.4 , pp. 1738-1752
    • Hermansky, H.1
  • 6
    • 0002629270
    • Maximum likelihood from incomplete data via the EM algorithm
    • A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Stat. Soc., vol. 39, no. B, pp. 1-38, 1977.
    • (1977) J. R. Stat. Soc. , vol.39 , Issue.B , pp. 1-38
    • Dempster, A.1    Laird, N.2    Rubin, D.3
  • 7
    • 0020796537
    • Decision theoretic formulation of a training problem in speech recognition and a comparison of training by unconditional versus conditional maximum likelihood
    • A. Nádas, "A decision theoretic formulation of a training problem in speech recognition and a comparison of training by unconditional versus conditional maximum likelihood," IEEE Trans. Acoust. Speech Signal Processing, vol. ASSP-31, pp. 814-817, Aug. 1983. (Pubitemid 14455162)
    • (1983) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-31 , Issue.4 , pp. 814-817
    • Nadas Arthur1
  • 9
    • 0031222490
    • MMIE training of large vocabulary recognition systems
    • PII S0167639397000290
    • V. Valtchev, J. J. Odell, P. C. Woodland, and S. J. Young, "MMIE training of large vocabulary recognition systems," Speech Commun., vol. 22, no. 4, pp. 303-314, 1997. (Pubitemid 127433601)
    • (1997) Speech Communication , vol.22 , Issue.4 , pp. 303-314
    • Valtchev, V.1    Odell, J.J.2    Woodland, P.C.3    Young, S.J.4
  • 10
    • 0036461035
    • Large scale discriminative training of hidden Markov models for speech recognition
    • P. C. Woodland and D. Povey, "Large scale discriminative training of hidden Markov models for speech recognition," Comput. Speech Lang., vol. 16, no. 1, pp. 25-48, 2002.
    • (2002) Comput. Speech Lang. , vol.16 , Issue.1 , pp. 25-48
    • Woodland, P.C.1    Povey, D.2
  • 11
    • 35248845658
    • On the relationship between classification error bounds and training criteria in statistical pattern recognition
    • Puerto de Andratx, Spain, June
    • H. Ney, "On the relationship between classification error bounds and training criteria in statistical pattern recognition," in Proc. Iberian Conf. Pattern Recognition and Image Analysis, Puerto de Andratx, Spain, June 2003, pp. 636-645.
    • (2003) Proc. Iberian Conf. Pattern Recognition and Image Analysis , pp. 636-645
    • Ney, H.1
  • 12
    • 0001492251
    • Minimum Bayes-risk automatic speech recognition
    • V. Goel and W. Byrne, "Minimum Bayes-risk automatic speech recognition," Comput. Speech Lang., vol. 14, no. 2, pp. 115-135, 2000.
    • (2000) Comput. Speech Lang. , vol.14 , Issue.2 , pp. 115-135
    • Goel, V.1    Byrne, W.2
  • 13
    • 0035342391
    • Comparison of discriminative training criteria and optimization methods for speech recognition
    • DOI 10.1016/S0167-6393(00)00035-2, PII S0167639300000352
    • R. Schlüter, W. Macherey, B. Müller, and H. Ney, "Comparison of discriminative training criteria and optimization methods for speech recognition," Speech Commun., vol. 34, no. 3, pp. 287-310, 2001. (Pubitemid 32284868)
    • (2001) Speech Communication , vol.34 , Issue.3 , pp. 287-310
    • Schluter, R.1    Macherey, W.2    Muller, B.3    Ney, H.4
  • 14
    • 0031139839
    • Minimum classification error rate methods for speech recognition
    • PII S1063667697035937
    • B. Juang, W. Chou, and C. Lee, "Minimum classification error rate methods for speech recognition," IEEE Trans. Speech Audio Processing, vol. 5, no. 3, pp. 257-265, 1997. (Pubitemid 127745998)
    • (1997) IEEE Transactions on Speech and Audio Processing , vol.5 , Issue.3 , pp. 257-265
    • Juang, B.-H.1    Chou, W.2    Lee, C.-H.3
  • 16
    • 34547522070
    • Discriminative training for large vocabulary speech recognition using minimum classification error
    • E. McDermott, T. Hazen, J. L. Roux, A. Nakamura, and S. Katagiri, "Discriminative training for large vocabulary speech recognition using minimum classification error," IEEE Trans. Audio Speech Lang. Processing, vol. 15, no. 1, pp. 203-223, 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Processing , vol.15 , Issue.1 , pp. 203-223
    • McDermott, E.1    Hazen, T.2    Roux, J.L.3    Nakamura, A.4    Katagiri, S.5
  • 19
    • 33745203493
    • Improved discriminative training using phone lattices
    • 9th European Conference on Speech Communication and Technology, Eurospeech Interspeech
    • J. Zheng and A. Stolcke, "Improved discriminative training using phone lattices," in Proc. Interspeech, Lisbon, Portugal, Sept. 2005, pp. 2125-2128. (Pubitemid 43908513)
    • (2005) 9th European Conference on Speech Communication and Technology , pp. 2125-2128
    • Zheng, J.1    Stolcke, A.2
  • 22
    • 44849133017
    • A fast optimization method for large margin estimation of HMMs based on second order cone programming
    • Antwerp, Belgium, Aug.
    • Y. Yin and H. Jiang, "A fast optimization method for large margin estimation of HMMs based on second order cone programming," in Proc. Interspeech, Antwerp, Belgium, Aug. 2007, pp. 34-37.
    • (2007) Proc. Interspeech , pp. 34-37
    • Yin, Y.1    Jiang, H.2
  • 23
    • 34547522370
    • Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models
    • Honolulu, HI, Apr.
    • F. Sha and L. Saul, "Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, HI, Apr. 2007, pp. 313-316.
    • (2007) Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP) , pp. 313-316
    • Sha, F.1    Saul, L.2
  • 24
    • 56449091292
    • Modified MMI/MPE: A direct evaluation of the margin in speech recognition
    • Helsinki, Finland, July
    • G. Heigold, T. Deselaers, R. Schlüter, and H. Ney, "Modified MMI/MPE: A direct evaluation of the margin in speech recognition," in Proc. Int. Conf. Machine Learning (ICML), Helsinki, Finland, July 2008, pp. 384-391.
    • (2008) Proc. Int. Conf. Machine Learning (ICML) , pp. 384-391
    • Heigold, G.1    Deselaers, T.2    Schlüter, R.3    Ney, H.4
  • 28
    • 85032750905
    • Discriminative learning in sequential pattern recognition-A unifying review for optimization-oriented speech recognition
    • X. He, L. Deng, and W. Chou, "Discriminative learning in sequential pattern recognition-A unifying review for optimization-oriented speech recognition," IEEE Signal Processing Mag., vol. 25, no. 5, pp. 14-36, 2008.
    • (2008) IEEE Signal Processing Mag. , vol.25 , Issue.5 , pp. 14-36
    • He, X.1    Deng, L.2    Chou, W.3
  • 31
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large vocabulary speech recognition
    • G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large vocabulary speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 1, pp. 30-42, 2012.
    • (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 32
    • 34047246149
    • Maximum entropy direct models for speech recognition
    • DOI 10.1109/TSA.2005.858064
    • H.-K. J. Kuo and Y. Gao, "Maximum entropy direct models for speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 14, no. 3, pp. 873-881, 2006. (Pubitemid 46547649)
    • (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.3 , pp. 873-881
    • Kuo, H.-K.J.1    Gao, Y.2
  • 33
    • 70350435251
    • Speech recognition using augmented conditional random fields
    • Y. Hifny and S. Renals, "Speech recognition using augmented conditional random fields," IEEE Trans. Audio. Speech Lang. Processing, vol. 17, no. 2, pp. 354-365, 2009.
    • (2009) IEEE Trans. Audio. Speech Lang. Processing , vol.17 , Issue.2 , pp. 354-365
    • Hifny, Y.1    Renals, S.2
  • 34
    • 68749108327
    • Using continuous features in the maximum entropy model
    • D. Yu, L. Deng, and A. Acero, "Using continuous features in the maximum entropy model," Pattern Recognit. Lett., vol. 30, no. 14, pp. 1295-1300, 2009.
    • (2009) Pattern Recognit. Lett. , vol.30 , Issue.14 , pp. 1295-1300
    • Yu, D.1    Deng, L.2    Acero, A.3
  • 36
    • 0142192295
    • Conditional random fields: Probabilistic models for segmenting and labeling sequence data
    • San Francisco, CA, June-July
    • J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. Int. Conf. Machine Learning (ICML), San Francisco, CA, June-July 2001, pp. 282-289.
    • (2001) Proc. Int. Conf. Machine Learning (ICML) , pp. 282-289
    • Lafferty, J.1    McCallum, A.2    Pereira, F.3
  • 37
    • 70450209946
    • CRANDEM: Conditional random fields for word recognition
    • Brighton, England, Sept.
    • J. Morris and E. Fosler-Lussier, "CRANDEM: conditional random fields for word recognition," in Proc. Interspeech, Brighton, England, Sept. 2009, pp. 3063-3066.
    • (2009) Proc. Interspeech , pp. 3063-3066
    • Morris, J.1    Fosler-Lussier, E.2
  • 38
    • 33947618431
    • Hidden conditional random fields for phone classification
    • Lisbon, Portugal, Sept.
    • A. Gunawardana, M. Mahajan, A. Acero, and J. Platt, "Hidden conditional random fields for phone classification," in Proc. Interspeech, Lisbon, Portugal, Sept. 2005, pp. 117-120.
    • (2005) Proc. Interspeech , pp. 117-120
    • Gunawardana, A.1    Mahajan, M.2    Acero, A.3    Platt, J.4
  • 40
    • 73649117102
    • Joint acoustic and language modeling for speech recognition
    • Oct.
    • J.-T. Chien and C.-H. Chueh, "Joint acoustic and language modeling for speech recognition," Speech Commun., vol. 52, no. 3, pp. 223-235, Oct. 2010.
    • (2010) Speech Commun. , vol.52 , Issue.3 , pp. 223-235
    • Chien, J.-T.1    Chueh, C.-H.2
  • 41
    • 79956277935
    • Learning a discriminative weighted finite-state transducer for speech recognition
    • July
    • M. Lehr and I. Shafran, "Learning a discriminative weighted finite-state transducer for speech recognition," IEEE Trans. Audio Speech Lang Processing, vol. 19, no. 5, pp. 1360-1367, July 2011.
    • (2011) IEEE Trans. Audio Speech Lang Processing , vol.19 , Issue.5 , pp. 1360-1367
    • Lehr, M.1    Shafran, I.2
  • 43
    • 33745205296
    • Optimization methods for discriminative training
    • Lisbon, Portugal, Sept.
    • J. L. Roux and E. McDermott, "Optimization methods for discriminative training," in Proc. Interspeech, Lisbon, Portugal, Sept. 2005, pp. 3341-3344.
    • (2005) Proc. Interspeech , pp. 3341-3344
    • Roux, J.L.1    McDermott, E.2
  • 46
    • 84943274699
    • A direct adaptive method for faster backpropagation learning: The Rprop algorithm
    • San Francisco, CA, Mar.-Apr.
    • M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The Rprop algorithm," in Proc. IEEE Int. Conf. Neural Networks (ICNN), San Francisco, CA, Mar.-Apr. 1993, pp. 586-591.
    • (1993) Proc. IEEE Int. Conf. Neural Networks (ICNN) , pp. 586-591
    • Riedmiller, M.1    Braun, H.2
  • 47
    • 0037238922
    • Empirical evaluation of the improved Rprop learning algorithms
    • DOI 10.1016/S0925-2312(01)00700-7, PII S0925231201007007
    • C. Igel and M. Hüsken, "Empirical evaluation of the improved Rprop learning algorithms," Neurocomputing, vol. 50, pp. 105-123, Jan. 2003. (Pubitemid 36131033)
    • (2003) Neurocomputing , vol.50 , pp. 105-123
    • Igel, C.1    Husken, M.2
  • 49
    • 70350376504
    • Weighted automata algorithms
    • M. Droste, W. Kuich, and H. Vogler, Eds. New York: Springer-Verlag
    • M. Mohri, "Weighted automata algorithms," in Handbook of Weighted Automata, M. Droste, W. Kuich, and H. Vogler, Eds. New York: Springer-Verlag, 2009, pp. 213-254.
    • (2009) Handbook of Weighted Automata , pp. 213-254
    • Mohri, M.1
  • 50
    • 0030719155
    • A word graph algorithm for large vocabulary continuous speech recognition
    • S. Ortmanns, H. Ney, and X. Aubert, "A word graph algorithm for large vocabulary continuous speech recognition," Comput. Speech Lang., vol. 11, no. 1, pp. 43-72, Jan. 1997. (Pubitemid 127375893)
    • (1997) Computer Speech and Language , vol.11 , Issue.1 , pp. 43-72
    • Ortmanns, S.1    Ney, H.2    Aubert, X.3
  • 53
    • 80053275857
    • First- and second-order expectation semirings with applications to minimum-risk training on translation forests
    • Singapore, Aug.
    • Z. Li and J. Eisner, "First- and second-order expectation semirings with applications to minimum-risk training on translation forests," in Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), Singapore, Aug. 2009, pp. 40-51.
    • (2009) Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP) , pp. 40-51
    • Li, Z.1    Eisner, J.2
  • 54
    • 59549087165
    • On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes
    • Dec.
    • A. Ng and M. Jordan, "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes," in Proc. Advances in Neural Information Processing Systems (NIPS), Dec. 2002, pp. 841-848.
    • (2002) Proc. Advances in Neural Information Processing Systems (NIPS) , pp. 841-848
    • Ng, A.1    Jordan, M.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.