Volume 21, Issue 1, 2007, Pages 153-173

Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences

Author keywords

[No Author keywords available]

Indexed keywords

HIDDEN MARKOV MODEL (HMM); SYNTHETIC SPEECH; VECTOR SEQUENCES; VITERBI TYPE TRAINING ALGORITHMS

EID: 33749573927     PISSN: 08852308     EISSN: 10958363     Source Type: Journal    
DOI: 10.1016/j.csl.2006.01.002     Document Type: Article
Times cited: 112

References (80)
  • 1
    • 33749543418
    • Acero, A., 1999. Formant analysis and synthesis using hidden Markov models. In: Proceedings of European Conference on Speech Communication and Technology'99. pp. 1047-1050.
  • 2
    • 0022890536
    • Bahl, L., Brown, P., de Souza, P., Mercer, R., 1986. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'86. pp. 49-52.
  • 3
    • 0033677172
    • Bilmes, J., 2000. Factored sparse inverse covariance matrices. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing 2000, vol. 2. pp. 1009-1012.
  • 4
    • 0038021376
    • Buried Markov models: a graphical modeling approach for automatic speech recognition
    • Bilmes J. Buried Markov models: a graphical modeling approach for automatic speech recognition. Computer Speech & Language 17 2-3 (2003) 213-231
    • (2003) Computer Speech & Language , vol.17 , Issue.2-3 , pp. 213-231
    • Bilmes, J.1
  • 5
    • 33749552622
    • Black, A., Taylor, P., 1997. The festival speech synthesis system: system documentation. Tech. Rep. HCRC/TR-83, University of Edinburgh.
  • 6
    • 85009062911
    • Bridle, J., 2004. Towards better understanding of the model implied by the use of dynamic features in HMMs. In: Proceedings of International Conference on Spoken Language Processing 2004, vol. 1. pp. 725-728.
  • 7
    • 33749570542
    • Brown, P., 1987. The acoustic modeling problem in automatic speech recognition. Ph.D. thesis, Carnegie Mellon University.
  • 9
    • 0032119268
    • A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition
    • Deng L. A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition. Speech Communication 24 4 (1998) 299-323
    • (1998) Speech Communication , vol.24 , Issue.4 , pp. 299-323
    • Deng, L.1
  • 10
    • 0031185482
    • Speaker-independent phonetic classification using hidden Markov models with mixture of trend functions
    • Deng L., and Aksmanovic M. Speaker-independent phonetic classification using hidden Markov models with mixture of trend functions. IEEE Transactions on Speech & Audio Processing 5 4 (1997) 319-324
    • (1997) IEEE Transactions on Speech & Audio Processing , vol.5 , Issue.4 , pp. 319-324
    • Deng, L.1    Aksmanovic, M.2
  • 11
    • 0033623527
    • Spontaneous speech recognition using a statistical coarticulatory model for the hidden vocal-tract-resonance dynamics
    • Deng L., and Ma J. Spontaneous speech recognition using a statistical coarticulatory model for the hidden vocal-tract-resonance dynamics. Journal of the Acoustical Society of America 108 6 (2000) 3036-3048
    • (2000) Journal of the Acoustical Society of America , vol.108 , Issue.6 , pp. 3036-3048
    • Deng, L.1    Ma, J.2
  • 12
    • 0028516022
    • Speech recognition using hidden Markov models with polynomial regression functions as nonstationary states
    • Deng L., Aksmanovic M., Sun X., and Wu J. Speech recognition using hidden Markov models with polynomial regression functions as nonstationary states. IEEE Transactions on Speech & Audio Processing 2 4 (1994) 507-520
    • (1994) IEEE Transactions on Speech & Audio Processing , vol.2 , Issue.4 , pp. 507-520
    • Deng, L.1    Aksmanovic, M.2    Sun, X.3    Wu, J.4
  • 13
    • 85009211881
    • Deng, L., Bazzi, L., Acero, A., 2003. Tracking vocal tract resonances using an analytical nonlinear predictor and a target-guided temporal constraint. In: Proceedings of European Conference on Speech Communication and Technology, 2003. pp. 73-76.
  • 14
    • 0026397875
    • Digalakis, V., Rohlicek, J., Ostendorf, M., 1991. A dynamical system approach to continuous speech recognition. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'91, pp. 282-292.
  • 15
    • 33749552390
    • Donovan, R., Eide, E., 1998. The IBM trainable speech synthesis system. In: Proceedings of International Conference on Spoken Language Processing'98, vol. 5. pp. 1703-1706.
  • 16
    • 0028996983
    • Donovan, R., Woodland, P., 1995. Automatic speech synthesizer parameter estimation using HMMs. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'95. pp. 640-643.
  • 17
    • 85016140477
    • Fukada, T., Tokuda, K., Kobayashi, T., Imai, S., 1992. An adaptive algorithm for mel-cepstral analysis of speech. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'92, vol. 1. pp. 137-140.
  • 18
    • 0022667694
    • Speaker independent isolated word recognition using dynamic features of speech spectrum
    • Furui S. Speaker independent isolated word recognition using dynamic features of speech spectrum. IEEE Transactions on Acoustics, Speech, & Signal Processing 34 (1986) 52-59
    • (1986) IEEE Transactions on Acoustics, Speech, & Signal Processing , vol.34 , pp. 52-59
    • Furui, S.1
  • 19
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • Gales M. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech & Language 12 2 (1998) 75-98
    • (1998) Computer Speech & Language , vol.12 , Issue.2 , pp. 75-98
    • Gales, M.1
  • 20
    • 0032638856
    • Semi-tied covariance matrices for hidden Markov models
    • Gales M. Semi-tied covariance matrices for hidden Markov models. IEEE Transactions on Speech & Audio Processing 7 3 (1999) 272-281
    • (1999) IEEE Transactions on Speech & Audio Processing , vol.7 , Issue.3 , pp. 272-281
    • Gales, M.1
  • 21
    • 33749543199
    • Gales, M., Airey, S., 2003. Product of Gaussians for speech recognition. Tech. Rep. CUED/F-INFENG/TR.458, Cambridge University.
  • 22
    • 33749555773
    • Gales, M., Young, S., 1993. The theory of segmental hidden Markov models. Tech. Rep. CUED/F-INFENG/TR.133, Cambridge University.
  • 23
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • Gauvain J., and Lee C.-H. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech & Audio Processing 2 2 (1994) 291-298
    • (1994) IEEE Transactions on Speech & Audio Processing , vol.2 , Issue.2 , pp. 291-298
    • Gauvain, J.1    Lee, C.-H.2
  • 24
    • 0030371122
    • Gish, H., Ng, K., 1996. Parametric trajectory models for speech recognition. In: Proceedings of International Conference on Spoken Language Processing'96, vol. 1. pp. 466-469.
  • 25
    • 33749579647
    • Hinton, G., 1999. Product of experts. In: Proceedings of ICANN, vol. 1. pp. 1-6.
  • 26
    • 0032673963
    • Probabilistic-trajectory segmental HMMs
    • Holmes W., and Russell M. Probabilistic-trajectory segmental HMMs. Computer Speech & Language 13 1 (1999) 3-37
    • (1999) Computer Speech & Language , vol.13 , Issue.1 , pp. 3-37
    • Holmes, W.1    Russell, M.2
  • 28
    • 0022097649
    • Maximum likelihood estimation for mixture of multivariate stochastic observations of Markov chains
    • Juang B.-H. Maximum likelihood estimation for mixture of multivariate stochastic observations of Markov chains. AT&T Technical Journal 64 6 (1985) 1235-1249
    • (1985) AT&T Technical Journal , vol.64 , Issue.6 , pp. 1235-1249
    • Juang, B.-H.1
  • 31
    • 0032649321
    • Kobayashi, T., Masumitsu, K., Furuyama, J., 1999. Partly hidden Markov model and its application to speech recognition. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'99, vol. 1. pp. 121-124.
  • 32
    • 33749564954
    • Koishida, K., Tokuda, K., Masuko, T., Kobayashi, T., 1997. Vector quantization of speech spectral parameters using statistics of dynamic features. In: Proceedings of International Conference on Signal Processing'97. pp. 247-252.
  • 33
    • 33749571719
    • Kominek, J., Black, A., 2003. CMU ARCTIC databases for speech synthesis. Tech. Rep. CMU-LTI-03-177, Carnegie Mellon University.
  • 36
    • 0026370307
    • Lee, C.-H., Giachin, E., 1991. Improved acoustic modeling for speaker independent large vocabulary continuous speech recognition. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing, vol. 1. pp. 161-164.
  • 37
    • 4544260432
    • Lee, L., Attias, H., Deng, L., Fieguth, P., 2004. A multimodal variational approach to learning and inference in switching state space models. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing 2004. pp. 505-508.
  • 38
    • 0346892469
    • Automatic speech segmentation for concatenative inventory selection
    • van Santen J., Sproat R., Olive J., and Hirschberg J. (Eds), Springer-Verlag
    • Ljolje A., Hirschberg J., and van Santen J. Automatic speech segmentation for concatenative inventory selection. In: van Santen J., Sproat R., Olive J., and Hirschberg J. (Eds). Progress in Speech Synthesis (1997), Springer-Verlag 305-311
    • (1997) Progress in Speech Synthesis , pp. 305-311
    • Ljolje, A.1    Hirschberg, J.2    van Santen, J.3
  • 39
    • 33749556047
    • Ma, J., 2000. Spontaneous speech recognition using statistical dynamic models for the vocal tract resonance dynamics. Ph.D. thesis, University of Waterloo.
  • 40
    • 0742307392
    • Target-directed mixture linear dynamic models for spontaneous speech recognition
    • Ma J., and Deng L. Target-directed mixture linear dynamic models for spontaneous speech recognition. IEEE Transactions on Speech & Audio Processing 12 1 (2004) 47-58
    • (2004) IEEE Transactions on Speech & Audio Processing , vol.12 , Issue.1 , pp. 47-58
    • Ma, J.1    Deng, L.2
  • 41
    • 0036293703
    • Minami, Y., McDermott, E., Nakamura, A., Katagiri, S., 2002. A recognition method with parametric trajectory synthesized using direct relations between static and dynamic feature vector time series. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing 2002, vol. 1. pp. 957-960.
  • 42
    • 0141480073
    • Minami, Y., McDermott, E., Nakamura, A., Katagiri, S., 2003. Recognition method with parametric trajectory generated from mixture distribution HMMs. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing, 2003, vol. 1. pp. 124-127.
  • 43
    • 33645796901
    • Minami, Y., McDermott, E., Nakamura, A., Katagiri, S., 2004. A theoretical analysis of speech recognition based on feature trajectory models. In: Proceedings of International Conference on Spoken Language Processing 2004, vol. 1. pp. 549-552.
  • 44
    • 0036522866
    • A survey on automatic speech recognition
    • Nakagawa S. A survey on automatic speech recognition. IEICE Transactions on Information and Systems E85-D 3 (2002) 465-486
    • (2002) IEICE Transactions on Information and Systems , vol.E85-D , Issue.3 , pp. 465-486
    • Nakagawa, S.1
  • 46
    • 33749565410
    • Odell, J., 1995. The use of context in large vocabulary speech recognition. Ph.D. thesis, Cambridge University.
  • 50
    • 18544404092
    • Paliwal, K., 1993. Use of temporal correlation between successive frames in hidden Markov model based speech recognizer. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'93. pp. 215-218.
  • 51
    • 0032639922
    • Picone, J., Pike, S., Regan, R., Kamm, T., Bridle, J., Deng, L., Ma, Z., Richards, H., Schuster, M., 1999. Initial evaluation of hidden dynamic models on conversational speech. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing, vol. 1. pp. 109-112.
  • 52
    • 0032627031
    • Qing, G., Fang, Z., Jian, W., Wenhu, W., 1999. A new method used in HMM for modeling frame correlation. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'99, vol. 1. pp. 169-172.
  • 53
    • 0024610919
    • Rabiner, L., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. In: Proceedings of IEEE, vol. 77. pp. 257-285.
  • 54
    • 0032675736
    • Richards, H., Bridle, J., 1999. The HDM: a segmental hidden dynamic model of coarticulation. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'99, vol. 1. pp. 357-360.
  • 56
    • 33749577566
    • Rosti, A., Gales, M., 2003. Switching linear dynamical systems for speech recognition. Tech. Rep. CUED/F-INFENG/TR.461, Cambridge University.
  • 57
    • 0027228741
    • Russell, M., 1993. A segmental HMM for speech pattern modeling. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'93. pp. 499-502.
  • 58
    • 33749580231
    • Sagayama, S., Itakura, F., 1979. On individuality in a dynamic measure of speech. In: Proceedings of Spring Conference of Acoustical Society of Japan. pp. 589-590 (in Japanese).
  • 59
    • 85009257840
    • Shichiri, K., Sawabe, A., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T., 2002. Eigenvoices for HMM-based speech synthesis. In: Proceedings of International Conference on Spoken Language Processing 2002. pp. 1269-1272.
  • 60
    • 33749575074
    • Shinoda, K., Watanabe, T., 1997. Acoustic modeling based on the MDL criterion for speech recognition. In: Proceedings of European Conference on Speech Communication and Technology'97. pp. 99-102.
  • 61
    • 33749563058
    • Sim, K.-C., Gales, M., 2004. Precision matrix modeling for large vocabulary continuous speech recognition. Tech. Rep. CUED/F-INFENG/TR.485, Cambridge University.
  • 62
    • 33745200057
    • Sim, K.-C., Gales, M., 2005. Temporally varying model parameters for large vocabulary continuous speech recognition. In: Proceedings of Interspeech'05. pp. 2137-2140.
  • 63
    • 0027309782
    • Takahashi, S., 1993. Phoneme HMM's constrained by frame correlations. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'93. pp. 219-222.
  • 64
    • 0001455934
    • A robust algorithm for pitch tracking (RAPT)
    • Kleijn W., and Paliwal K. (Eds), Elsevier
    • Talkin D. A robust algorithm for pitch tracking (RAPT). In: Kleijn W., and Paliwal K. (Eds). Speech Coding and Synthesis (1995), Elsevier 497-518
    • (1995) Speech Coding and Synthesis , pp. 497-518
    • Talkin, D.1
  • 65
    • 85143189909
    • Tamura, M., Masuko, T., Tokuda, K., Kobayashi, T., 2001. Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing 2001, vol. 2. pp. 805-808.
  • 66
    • 0028996993
    • Tokuda, K., Kobayashi, T., Imai, S., 1995a. Speech parameter generation from HMM using dynamic features. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'95. pp. 660-663.
  • 67
    • 33749551677
    • Tokuda, K., Masuko, T., Yamada, Y., Kobayashi, T., Imai, S., 1995b. An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features. In: Proceedings of European Conference on Speech Communication and Technology'95. pp. 757-760.
  • 68
    • 0032678076
    • Tokuda, K., Masuko, T., Miyazaki, N., Kobayashi, T., 1999. Hidden Markov models based on multi-space probability distribution for pitch pattern modeling. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'99. pp. 229-232.
  • 69
    • 0033708106
    • Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., Kitamura, T., 2000. Speech parameter generation algorithms for HMM-based speech synthesis. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing 2000, vol. 3. pp. 1315-1318.
  • 70
    • 33749575075
    • Vanhoucke, V., 2003. Mixtures of inverse covariances: covariance modeling for Gaussian mixtures with applications to automatic speech recognition. Ph.D. thesis, Stanford University.
  • 71
    • 84935113569
    • Error bounds for convolutional codes and an asymptotically optimal decoding algorithm
    • Viterbi A. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory 13 (1967) 260-269
    • (1967) IEEE Transactions on Information Theory , vol.13 , pp. 260-269
    • Viterbi, A.1
  • 72
    • 33749541446
    • Wellekens, C., 1987. Explicit correlation in hidden Markov model for speech recognition. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing'87. pp. 383-386.
  • 73
    • 14544300108
    • How to pretend that correlated variables are independent by using difference observations
    • Williams C. How to pretend that correlated variables are independent by using difference observations. Neural Computation 17 1 (2005) 1-7
    • (2005) Neural Computation , vol.17 , Issue.1 , pp. 1-7
    • Williams, C.1
  • 75
    • 0026382580
    • Wilpon, J., Lee, C.-H., Rabiner, L., 1991. Improvements in connected digit recognition using higher order spectral and energy features. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing, vol. 1. pp. 349-352.
  • 76
    • 33749547146
    • Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T., 1997. Speaker interpolation in HMM-based speech synthesis system. In: Proceedings of European Conference on Speech Communication and Technology'97, vol. 5. pp. 2523-2526.
  • 77
    • 33749541221
    • Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T., 1999. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: Proceedings of European Conference on Speech Communication and Technology'99, vol. 5. pp. 2347-2350.
  • 79
    • 0141702226
    • Zhou, J.-L., Seide, F., Deng, L., 2003. Coarticulation modeling by embedding a target-directed hidden trajectory model into HMM-model and training. In: Proceedings of International Conference on Acoustics, Speech, and Signal Processing 2003, vol. 1. pp. 744-747.
  • 80
    • 33749544707
    • Zweig, G., 1998. Speech recognition using dynamic Bayesian networks. Ph.D. thesis, University of California, Berkeley.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.