SCOPUS Information Retrieval Platform

International Journal of Speech Technology

Volume 14, Issue 1, 2011, Pages 19-33

Application of prosody models for developing speech systems in Indian languages

(1) Rao, K Sreenivasa a

a INDIAN INSTITUTE OF TECHNOLOGY (India)

Author keywords

Duration; Feedforward neural network; Intonation; Prosody; Speech systems

Indexed keywords

DURATION; FEED-FORWARD; INTONATION; PROSODY; SPEECH SYSTEMS; FEEDFORWARD NEURAL NETWORKS; SPEECH SYNTHESIS; SPEECH RECOGNITION

EID: 79953168002 PISSN: 13812416 EISSN: 15728110 Source Type: Journal
DOI: 10.1007/s10772-010-9086-9 Document Type: Article

Times cited : (23)

References (48)

1
- 84994310262
- Prosodic models, automatic speech understanding, and speech synthesis: Towards the common ground
- Batliner, A., Mobius, B., Mohler, G., Schweitzer, A., & Noth, E. (2001). Prosodic models, automatic speech understanding, and speech synthesis: towards the common ground. In Eurospeech, Scandinavia.
- (2001) Eurospeech Scandinavia
- Batliner, A.¹ Mobius, B.² Mohler, G.³ Schweitzer, A.⁴ Noth, E.⁵

2
- 67650565075
- J. Benesty M. M. Sondhi Y. Huang (eds). Springer New York
- Benesty, J., Sondhi, M. M., & Huang, Y. (Eds.) (2008). Springer handbook on speech processing. New York: Springer.
- (2008) Springer Handbook on Speech Processing

3
- 0346869541
- From MBROLA to NU-MBROLA
- Bozkurt, B., Bagein, M., & Dutoit, T. (2001). From MBROLA to NU-MBROLA. In Proc. 4th ISCA workshop on speech synthesis, Pitlochry, Scotland, UK (pp. 127-129).
- (2001) Proc. 4th ISCA Workshop on Speech Synthesis Pitlochry, Scotland, UK , pp. 127-129
- Bozkurt, B.¹ Bagein, M.² Dutoit, T.³

4
- 79955636094
- Improving quality of MBROLA synthesis for non-uniform units synthesis
- Bozkurt, B., Dutoit, T., Prudon, R., D'Alessandro, C., & Pagel, V. (2002). Improving quality of MBROLA synthesis for non-uniform units synthesis. In IEEE workshop on speech synthesis, Santa Monica, California, USA, Sept. 2002.
- (2002) IEEE Workshop on Speech Synthesis Santa Monica, California, USA Sept. 2002
- Bozkurt, B.¹ Dutoit, T.² Prudon, R.³ D'Alessandro, C.⁴ Pagel, V.⁵

5
- 33745459765
- Chopde, A. (2009). Itrans Indian language transliteration package version 5.2 source. http://www.aczone.com/itrans/.
- (2009) Itrans Indian Language Transliteration Package Version 5.2 Source
- Chopde, A.¹

6
- 0003424145
- Macmillan Co. New York
- Deller, J. R., Proakis, J. G., & Hansen, J. H. L. (1993). Discrete-time processing of speech signals. New York: Macmillan Co.
- (1993) Discrete-time Processing of Speech Signals
- Deller, J.R.¹ Proakis, J.G.² Hansen, J.H.L.³

7
- 10944227302
- Acoustic model combination for recognition of speech in multiple languages using support vector machines
- Gangashetty, S. V., Sekhar, C. C., & Yegnanarayana, B. (2004). Acoustic model combination for recognition of speech in multiple languages using support vector machines. In Proc. IEEE int. conf. acoust., speech, signal processing.
- (2004) Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing
- Gangashetty, S.V.¹ Sekhar, C.C.² Yegnanarayana, B.³

8
- 1942535983
- Extraction of fixed dimension patterns from varying duration segments of consonant-vowel utterances
- 10.1109/ICISIP.2004.1287644
- Gangashetty, S. V., Sekhar, C. C., & Yegnanarayana, B. (2004). Extraction of fixed dimension patterns from varying duration segments of consonant-vowel utterances. In Proc. IEEE int. conf. intelligent sensing and information processing, Chennai, India, Jan. 2004 (pp. 159-164).
- (2004) Proc. IEEE Int. Conf. Intelligent Sensing and Information Processing Chennai, India Jan. 2004 , pp. 159-164
- Gangashetty, S.V.¹ Sekhar, C.C.² Yegnanarayana, B.³

9
- 0003413187
- Pearson Education Upper Saddle River 0934.68076
- Haykin, S. (1999). Neural networks: a comprehensive foundation. Upper Saddle River: Pearson Education.
- (1999) Neural Networks: A Comprehensive Foundation
- Haykin, S.¹

10
- 0003962869
- Macmillan Co. New York
- Hogg, R. V., & Ledolter, J. (1987). Engineering statistics. New York: Macmillan Co.
- (1987) Engineering Statistics
- Hogg, R.V.¹ Ledolter, J.²

11
- 0004056285
- Prentice-Hall New York
- Huang, X., Acero, A., & Hon, H. W. (2001). Spoken language processing. New York: Prentice-Hall.
- (2001) Spoken Language Processing
- Huang, X.¹ Acero, A.² Hon, H.W.³

12
- 0028996978
- A prosodic model for Mandarin speech and its application to pitch level generation for text-to-speech
- Hwang, S.-H., & Chen, S.-H. (1995). A prosodic model for Mandarin speech and its application to pitch level generation for text-to-speech. In Proc. IEEE int. conf. acoust., speech, signal processing, May 1995 (pp. 616-619).
- (1995) Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing May 1995 , pp. 616-619
- Hwang, S.-H.¹ Chen, S.-H.²

13
- 37549034753
- Syllabic properties of three Indian languages: Implications for speech recognition and language identification
- Khan, A. N., Gangashetty, S. V., & Yegnanarayana, B. (2003). Syllabic properties of three Indian languages: Implications for speech recognition and language identification. In Int. conf. natural language processing, Mysore, India, Dec. 2003 (pp. 125-134).
- (2003) Int. Conf. Natural Language Processing Mysore, India Dec. 2003 , pp. 125-134
- Khan, A.N.¹ Gangashetty, S.V.² Yegnanarayana, B.³

14
- 4544283412
- A data-driven synthesis approach for Indian languages using syllable as basic unit
- Kishore, S. P., Kumar, R., & Sangal, R. (2002). A data-driven synthesis approach for Indian languages using syllable as basic unit. In Int. conf. natural language processing.
- (2002) Int. Conf. Natural Language Processing
- Kishore, S.P.¹ Kumar, R.² Sangal, R.³

15
- 85009088217
- Text-to-speech (TTS) in Indian languages
- Krishna, N. S., Murthy, H. A., & Gonsalves, T. A. (2002). Text-to-speech (TTS) in Indian languages. In Int. conf. natural language processing.
- (2002) Int. Conf. Natural Language Processing
- Krishna, N.S.¹ Murthy, H.A.² Gonsalves, T.A.³

16
- 79953224159
- Master's thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology, Madras, March
- Kumar, S. R. R. (1990). Significance of durational knowledge for a text-to-speech system in an Indian language. Master's thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology, Madras, March.
- (1990) Significance of Durational Knowledge for A Text-to-speech System in An Indian Language
- Kumar, S.R.R.¹

17
- 0027626742
- Intonation component of a text-to-speech system for Hindi
- DOI 10.1006/csla.1993.1015
- Madhukumar, A. S., Rajendran, S., & Yegnanarayana, B. (1993). Intonation component of a text-to-speech system for Hindi. Computer Speech and Language, 7, 283-301. doi:10.1006/csla.1993.1015 (Pubitemid 23705304)
- (1993) Computer Speech and Language , vol.7 , Issue.3 , pp. 283-301
- Madhukumar, A.S.¹ Rajendran, S.² Yegnanarayana, B.³

18
- 79953219507
- Duration knowledge for text-to-speech system for Telugu
- Kumar, K. K., Rao, K. S., & Yegnanarayana, B. (2002). Duration knowledge for text-to-speech system for Telugu. In Proc. int. conf. knowledge based computer systems, Mumbai, India, Dec. 2002 (pp. 563-571).
- (2002) Proc. Int. Conf. Knowledge Based Computer Systems Mumbai, India Dec. 2002 , pp. 563-571
- Kumar, K.K.¹ Rao, K.S.² Yegnanarayana, B.³

19
- 79953184012
- Incorporation of prosodic modules for large vocabulary continuous speech recognition
- Lee, S., Hirose, K., & Minematsu, N. (2001). Incorporation of prosodic modules for large vocabulary continuous speech recognition. In Proc. ISCA workshop on prosody in speech recognition and understanding (pp. 97-101).
- (2001) Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding , pp. 97-101
- Lee, S.¹ Hirose, K.² Minematsu, N.³

20
- 84867217083
- Comparing prosodic models for speaker recognition
- Leung, C.-C., Ferras, M., Barras, C., & Gauvain, J.-L. (2008). Comparing prosodic models for speaker recognition. In Interspeech, Brisbane, Australia, Sept 2008 (pp. 1945-1948).
- (2008) Interspeech Brisbane, Australia Sept 2008 , pp. 1945-1948
- Leung, C.-C.¹ Ferras, M.² Barras, C.³ Gauvain, J.-L.⁴

21
- 79953181342
- PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India, June
- Mary, L. (2006). Multi level implicit features for language and speaker recognition. PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India, June.
- (2006) Multi Level Implicit Features for Language and Speaker Recognition
- Mary, L.¹

22
- 52949094265
- Extraction and representation of prosodic features for language and speaker recognition
- 10.1016/j.specom.2008.04.010
- Mary, L., & Yegnanarayana, B. (2008). Extraction and representation of prosodic features for language and speaker recognition. Speech Communication, 50, 782-796. doi:10.1016/j.specom.2008.04.010
- (2008) Speech Communication , vol.50 , pp. 782-796
- Mary, L.¹ Yegnanarayana, B.²

23
- 0000668614
- Robustness of group-delay-based method for extraction of significant excitation from speech signals
- 10.1109/89.799686
- Murthy, P. S., & Yegnanarayana, B. (1999). Robustness of group-delay-based method for extraction of significant excitation from speech signals. IEEE Transactions on Speech and Audio Processing, 7, 609-619. doi:10.1109/89.799686
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , pp. 609-619
- Murthy, P.S.¹ Yegnanarayana, B.²

24
- 79953218113
- Prosody models for conversational speech recognition
- Ostendorf, M., Shafran, I., & Bates, R. (2003). Prosody models for conversational speech recognition. In Symposium on prosody and speech.
- (2003) Symposium on Prosody and Speech
- Ostendorf, M.¹ Shafran, I.² Bates, R.³

25
- 33745205178
- PhD thesis, Dept. of computer science and engineering, Indian institute of technology, Madras, Chennai, India, March
- Prasanna, S. R. M. (2004). Event-based analysis of speech. PhD thesis, Dept. of computer science and engineering, Indian institute of technology, Madras, Chennai, India, March.
- (2004) Event-based Analysis of Speech
- Prasanna, S.R.M.¹

26
- 4544369752
- Extraction of pitch in adverse conditions
- Prasanna, S. R. M., & Yegnanarayana, B. (2004). Extraction of pitch in adverse conditions. In Proc. IEEE int. conf. acoust., speech, signal processing, Montreal, Canada, May 2004.
- (2004) Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing Montreal, Canada May 2004
- Prasanna, S.R.M.¹ Yegnanarayana, B.²

27
- 0036288088
- Detection of vowel onset point in speech
- Prasanna, S. R. M., & Zachariah, J. M. (2002). Detection of vowel onset point in speech. In Proc. IEEE int. conf. acoust., speech, signal processing, Orlando, Florida, USA, May 2002.
- (2002) Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing Orlando, Florida, USA May 2002
- Prasanna, S.R.M.¹ Zachariah, J.M.²

28
- 65249112285
- Vowel onset point detection using source, spectral peaks, and modulation spectrum energies
- 10.1109/TASL.2008.2010884
- Prasanna, S. R. M., Reddy, B. V. S., & Murthy, P. K. (2009). Vowel onset point detection using source, spectral peaks, and modulation spectrum energies. IEEE Transactions on Audio, Speech and Language Processing, 17, 556-565. doi:10.1109/TASL.2008.2010884
- (2009) IEEE Transactions on Audio, Speech and Language Processing , vol.17 , pp. 556-565
- Prasanna, S.R.M.¹ Reddy, B.V.S.² Murthy, P.K.³

29
- 0004244302
- Englewood Cliffs Prentice Hall
- Rabiner, L. R., & Juang, B. H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice Hall.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.R.¹ Juang, B.H.²

30
- 79953210447
- Syllable duration in broadcast news in Telugu: A preliminary study
- Rajendran, S., Rao, K. S., Yegnanarayana, B., & Reddy, K. N. (2003). Syllable duration in broadcast news in Telugu: A preliminary study. In National conf. language technology tools: implementation of Telugu/Urdu, Hyderabad, India, Oct. 2003.
- (2003) National Conf. Language Technology Tools: Implementation of Telugu/Urdu Hyderabad, India Oct. 2003
- Rajendran, S.¹ Rao, K.S.² Yegnanarayana, B.³ Reddy, K.N.⁴

31
- 37549007588
- Modeling supra-segmental features of syllables using neural networks
- P. B. Prasad S. R. M. Prasanna (eds). Springer New York. 10.1007/978-3-540-75398-8-4
- Rao, K. S. (2008). Modeling supra-segmental features of syllables using neural networks. In P. B. Prasad & S. R. M. Prasanna (Eds.), Speech, audio, image and biomedical signal processing using neural networks (pp. 71-95). New York: Springer.
- (2008) Speech, Audio, Image and Biomedical Signal Processing Using Neural Networks , pp. 71-95
- Rao, K.S.¹

32
- 4544252352
- Prosodic manipulation using instants of significant excitation
- Rao, K. S., & Yegnanarayana, B. (2003). Prosodic manipulation using instants of significant excitation. In Proc. IEEE int. conf. multimedia and expo, Baltimore, Maryland, USA, July 2003 (pp. 389-392).
- (2003) Proc. IEEE Int. Conf. Multimedia and Expo Baltimore, Maryland, USA July 2003 , pp. 389-392
- Rao, K.S.¹ Yegnanarayana, B.²

33
- 34047248058
- Prosody modification using instants of significant excitation
- DOI 10.1109/TSA.2005.858051
- Rao, K. S., & Yegnanarayana, B. (2006). Prosody modification using instants of significant excitation. IEEE Transactions on Audio, Speech and Language Processing, 14, 972-980. doi:10.1109/TSA.2005.858051 (Pubitemid 46547658)
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.3 , pp. 972-980
- Sreenivasa Rao, K.¹ Yegnanarayana, B.²

34
- 33750713338
- Modeling durations of syllables using neural networks
- DOI 10.1016/j.csl.2006.06.003, PII S0885230806000234
- Rao, K. S., & Yegnanarayana, B. (2007). Modeling durations of syllables using neural networks. Computer Speech and Language, 21, 282-295. doi:10.1016/j.csl.2006.06.003 (Pubitemid 44709836)
- (2007) Computer Speech and Language , vol.21 , Issue.2 , pp. 282-295
- Rao, K.S.¹ Yegnanarayana, B.²

35
- 54049142844
- Intonation modeling for Indian languages
- 10.1016/j.csl.2008.06.005
- Rao, K. S., & Yegnanarayana, B. (2009). Intonation modeling for Indian languages. Computer Speech and Language, 23, 240-256. doi:10.1016/j.csl.2008.06.005
- (2009) Computer Speech and Language , vol.23 , pp. 240-256
- Rao, K.S.¹ Yegnanarayana, B.²

36
- 0141741010
- Prosody modeling for automatic speech understanding: An overview of recent research at SRI
- Shriberg, E., & Stolcke, A. (2001). Prosody modeling for automatic speech understanding: An overview of recent research at SRI. In Prosody in speech recognition and understanding, ISCA tutorial and research workshop (ITRW), Molly Pitcher Inn, Red Bank, NJ, USA, Oct. 2001.
- (2001) Prosody in Speech Recognition and Understanding, ISCA Tutorial and Research Workshop (ITRW) Molly Pitcher Inn, Red Bank, NJ, USA Oct. 2001
- Shriberg, E.¹ Stolcke, A.²

37
- 79953165800
- Springer New York
- Shriberg, E., & Stolcke, A. (2004). Mathematical foundations of speech and language processing. New York: Springer.
- (2004) Mathematical Foundations of Speech and Language Processing
- Shriberg, E.¹ Stolcke, A.²

38
- 0029375490
- Determination of instants of significant excitation in speech using group delay function
- 10.1109/89.466662
- Smits, R., & Yegnanarayana, B. (1995). Determination of instants of significant excitation in speech using group delay function. IEEE Transactions on Speech and Audio Processing, 3, 325-333. doi:10.1109/89.466662
- (1995) IEEE Transactions on Speech and Audio Processing , vol.3 , pp. 325-333
- Smits, R.¹ Yegnanarayana, B.²

39
- 37549013057
- A text-to-speech conversion system for Indian languages based on waveform concatenation model
- Dept. of computer science and engineering, Indian institute of technology, Madras, March
- Srikanth, S., Kumar, S. R. R., Sundar, R., & Yegnanarayana, B. (1989). A text-to-speech conversion system for Indian languages based on waveform concatenation model. Technical report No. 11, project VOIS, Dept. of computer science and engineering, Indian institute of technology, Madras, March.
- (1989) Technical Report No. 11, Project VOIS
- Srikanth, S.¹ Kumar, S.R.R.² Sundar, R.³ Yegnanarayana, B.⁴

40
- 0004129646
- MIT Press Cambridge
- Stevens, K. N. (1999). Acoustic phonetics. Cambridge: MIT Press.
- (1999) Acoustic Phonetics
- Stevens, K.N.¹

41
- 79953168127
- PhD thesis, Dept. of computer science and engineering, Indian institute of technology, Madras, Chennai, India, May
- Suryakanth, G. V. (2005). Neural network models for recognition of consonant-vowel units of speech in multiple languages. PhD thesis, Dept. of computer science and engineering, Indian institute of technology, Madras, Chennai, India, May.
- (2005) Neural Network Models for Recognition of Consonant-vowel Units of Speech in Multiple Languages
- Suryakanth, G.V.¹

42
- 33646681559
- PhD thesis, Dept. of phonetics, University of Helsinki, Finland
- Vainio, M. (2001). Artificial neural network based prosody models for Finnish text-to-speech synthesis. PhD thesis, Dept. of phonetics, University of Helsinki, Finland.
- (2001) Artificial Neural Network Based Prosody Models for Finnish Text-to-speech Synthesis
- Vainio, M.¹

43
- 4544268744
- Modeling the microprosody of pitch and loudness for speech synthesis with neural networks
- Vainio, M., & Altosaar, T. (1998). Modeling the microprosody of pitch and loudness for speech synthesis with neural networks. In Proc. int. conf. spoken language processing, Sydney, Australia, Dec. 1998.
- (1998) Proc. Int. Conf. Spoken Language Processing Sydney, Australia Dec. 1998
- Vainio, M.¹ Altosaar, T.²

44
- 0003991806
- Wiley New York
- Vapnik, V. N. (2001). Statistical learning theory. New York: Wiley.
- (2001) Statistical Learning Theory
- Vapnik, V.N.¹

45
- 0036299157
- Using prosodic and lexical information for speaker identification
- Weber, F., Manganaro, L., Peskin, B., & Shriberg, E. (2002). Using prosodic and lexical information for speaker identification. In Proc. IEEE int. conf. acoust., speech, signal processing.
- (2002) Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing
- Weber, F.¹ Manganaro, L.² Peskin, B.³ Shriberg, E.⁴

46
- 33751115438
- Prosodic aspects of speech
- E. Keller (eds). Wiley Chichester
- Werner, S., & Keller, E. (1994). Prosodic aspects of speech. In E. Keller (Ed.), Fundamentals of speech synthesis and speech recognition: basic concepts, state of the art, the future challenges (pp. 23-40). Chichester: Wiley.
- (1994) Fundamentals of Speech Synthesis and Speech Recognition: Basic Concepts, State of the Art, the Future Challenges , pp. 23-40
- Werner, S.¹ Keller, E.²

47
- 0004312284
- Prentice Hall New York
- Yegnanarayana, B. (1999). Artificial neural networks. New York: Prentice Hall.
- (1999) Artificial Neural Networks
- Yegnanarayana, B.¹

48
- 34147129605
- Combining cepstral and prosodic features in language identification
- Yin, B., Ambikairajah, E., & Chen, F. (2006). Combining cepstral and prosodic features in language identification. In 18th international conference on pattern recognition (ICPR'06).
- (2006) 18th International Conference on Pattern Recognition (ICPR'06)
- Yin, B.¹ Ambikairajah, E.² Chen, F.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.