[2] Lalit R. Bahl, Frederick Jelinek, and Robert L. Mercer. A maximum likelihood approach to continuous speech recognition. PAMI, 1983.
[3] Jerome R. Bellegarda. Exploiting latent semantic information in statistical language modeling. Proceedings of the IEEE, 2000.
[8] Noah Coccaro and Daniel Jurafsky. Towards better integration of semantic predictors in statistical language modeling. In ICSLP, 1998.
[10] Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, and Jason Weston. Evaluating prerequisite qualities for learning end-to-end dialog systems. arXiv preprint arXiv:1511.06931, 2015.
[11] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 2011.
[15] Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou. Efficient softmax approximation for GPUs. arXiv preprint arXiv:1609.04309, 2016.
[16] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In ICASSP, 2013.
[18] Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, and Phil Blunsom. Learning to transduce with unbounded memory. In Advances in Neural Information Processing Systems, pp. 1828-1836, 2015.
[19] Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. Pointing the unknown words. arXiv preprint arXiv:1603.08148, 2016.
[20] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. In NIPS, 2015.
[22] Rukmini M. Iyer and Mari Ostendorf. Modeling long distance dependence in language: Topic mixtures versus dynamic cache models. IEEE Transactions on Speech and Audio Processing, 1999.
[23] Frederick Jelinek, Bernard Merialdo, Salim Roukos, and Martin Strauss. A dynamic language model for speech recognition. In HLT, 1991.
[25] Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016.
[27] Slava M. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. ICASSP, 1987.
[28] Sanjeev Khudanpur and Jun Wu. Maximum entropy techniques for exploiting syntactic, semantic and collocational dependencies in language modeling. Computer Speech & Language, 2000.
[29] Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In ICASSP, 1995.
[30] Reinhard Kneser and Volker Steinbiss. On the dynamic adaptation of stochastic language models. In ICASSP, 1993.
[31] Roland Kuhn. Speech recognition and the frequency of recently used words: A modified Markov model for natural language. In Proceedings of the 12th Conference on Computational Linguistics, Volume 1, 1988.
[32] Roland Kuhn and Renato De Mori. A cache-based natural language model for speech recognition. PAMI, 1990.
[34] Raymond Lau, Ronald Rosenfeld, and Salim Roukos. Trigger-based language models: A maximum entropy approach. In ICASSP, 1993.
[37] Tomas Mikolov and Geoffrey Zweig. Context dependent recurrent neural network language model. In SLT, 2012.
[38] Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Černocký, and Sanjeev Khudanpur. Recurrent neural network based language model. In INTERSPEECH, 2010.
[39] Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, and Jan Černocký. Empirical evaluation and combination of advanced language modeling techniques. In INTERSPEECH, 2011.
[40] Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, and Marc'Aurelio Ranzato. Learning longer memory in recurrent neural networks. arXiv preprint arXiv:1412.7753, 2014.
[41] Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, and Raquel Fernández. The LAMBADA dataset: Word prediction requiring a broad discourse context. arXiv preprint arXiv:1606.06031, 2016.
[42] Ronald Rosenfeld. A maximum entropy approach to adaptive statistical language modeling. Computer Speech & Language, 1996.
[43] Andreas Stolcke, Noah Coccaro, Rebecca Bates, Paul Taylor, Carol Van Ess-Dykema, Klaus Ries, Elizabeth Shriberg, Daniel Jurafsky, Rachel Martin, and Marie Meteer. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 2000.
[48] Ronald J. Williams and Jing Peng. An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 1990.