SCOPUS Information Search Platform

Volume 131, Issue 3, 2012, Pages 139-148

An information-theoretic approach to curiosity-driven reinforcement learning

a University of Hawaii at Manoa (United States)

Author keywords

Adaptive behavior; Curiosity; Exploration exploitation trade off; Information theory; Rate distortion theory; Reinforcement learning

Indexed keywords

ALGORITHM; ANIMAL; ARTICLE; EXPLORATORY BEHAVIOR; HUMAN; INFORMATION SCIENCE; LEARNING;

ALGORITHMS; ANIMALS; EXPLORATORY BEHAVIOR; HUMANS; INFORMATION THEORY; LEARNING;

EID: 84865114997 PISSN: 14317613 EISSN: 16117530 Source Type: Journal
DOI: 10.1007/s12064-011-0142-z Document Type: Article

Times cited: 206

References (36)

1
- 49249134328
- Predictive information and explorative behavior of autonomous robots
- 10.1140/epjb/e2008-00175-0 1:CAS:528:DC%2BD1cXpt1Whsbk%3D
- N Ay N Bertschinger R Der F Guttler E Olbrich 2008 Predictive information and explorative behavior of autonomous robots European Physical Journal B 63 329 339 10.1140/epjb/e2008-00175-0 1:CAS:528:DC%2BD1cXpt1Whsbk%3D
- (2008) European Physical Journal B , vol.63 , pp. 329-339
- Ay, N.¹ Bertschinger, N.² Der, R.³ Guttler, F.⁴ Olbrich, E.⁵

2
- 84865137636
- Dynamic policy programming
- MG Azar HJ Kappen 2010 Dynamic policy programming Journal for Machine Learning Research. arXiv:1004.2027 1 26
- (2010) Journal for Machine Learning Research. arXiv:1004.2027 , pp. 1-26
- Azar, M.G.¹ Kappen, H.J.²

3
- 84858765598
- Covariant policy search
- Acapulco, Mexico
- Bagnell JA, Schneider J (2003) Covariant policy search. In: International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico
- (2003) International Joint Conference on Artificial Intelligence (IJCAI)
- Bagnell, J.A.¹ Schneider, J.²

4
- 0035514587
- Predictability, complexity, and learning
- DOI 10.1162/089976601753195969
- W Bialek I Nemenman N Tishby 2001 Predictability, complexity and learning Neural Comput 13 2409 2463 11674845 10.1162/089976601753195969 1:STN:280:DC%2BD3Mrmslyiuw%3D%3D (Pubitemid 33594578)
- (2001) Neural Computation , vol.13 , Issue.11 , pp. 2409-2463
- Bialek, W.¹ Nemenman, I.² Tishby, N.³

5
- 0041965975
- R-max - a general polynomial time algorithm for near-optimal reinforcement learning
- Brafman RI, Tennenholtz M (2002) R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3:213-231
- (2002) J Mach Learn Res , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

6
- 22044458307
- Information bottleneck for Gaussian variables
- G Chechik A Globerson N Tishby Y Weiss 2005 Information bottleneck for Gaussian variables Journal of Machine Learning Research 6 165 188
- (2005) Journal of Machine Learning Research , vol.6 , pp. 165-188
- Chechik, G.¹ Globerson, A.² Tishby, N.³ Weiss, Y.⁴

7
- 84872673342
- Optimal manifold representation of data: An information theoretic perspective
- Thrun S, Saul L, Schölkopf B (eds) MIT Press, Cambridge, MA
- Chigirev DV, Bialek W (2004) Optimal manifold representation of data: an information theoretic perspective. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems 16. MIT Press, Cambridge, MA
- (2004) Advances in Neural Information Processing Systems 16
- Chigirev, D.V.¹ Bialek, W.²

8
- 0042264936
- Synchronizing to the environment: Information theoretic limits on agent learning
- JP Crutchfield DP Feldman 2001 Synchronizing to the environment: Information theoretic limits on agent learning Adv in Complex Systems 4 2 251 264
- (2001) Adv in Complex Systems , vol.4 , Issue.2 , pp. 251-264
- Crutchfield, J.P.¹ Feldman, D.P.²

9
- 0037357392
- Regularities unseen, randomness observed: Levels of entropy convergence
- DOI 10.1063/1.1530990
- JP Crutchfield DP Feldman 2003 Regularities unseen, randomness observed: Levels of entropy convergence Chaos 13 1 25 54 12675408 10.1063/1.1530990 (Pubitemid 36419900)
- (2003) Chaos , vol.13 , Issue.1 , pp. 25-54
- Crutchfield, J.P.¹ Feldman, D.P.²

10
- 11944266539
- Information Theory and Statistical Mechanics
- 10.1103/PhysRev.106.620
- ET Jaynes 1957 Information Theory and Statistical Mechanics Phys Rev 106 4 620 630 10.1103/PhysRev.106.620
- (1957) Phys Rev , vol.106 , Issue.4 , pp. 620-630
- Jaynes, E.T.¹

11
- 0012257655
- Near-optimal reinforcement learning in polynomial time
- Kearns M, Singh S (1998) Near-optimal reinforcement learning in polynomial time. In: Proceedings of the 15th International Conference on Machine Learning, pp 260-268
- (1998) Proceedings of the 15th International Conference on Machine Learning , pp. 260-268
- Kearns, M.¹ Singh, S.²

12
- 84865123260
- arXiv:1112.1125v2
- Little DY, Sommer FT (2011) Learning in embodied action-perception loops through exploration. arXiv:1112.1125v2
- (2011) Learning in Embodied Action-perception Loops Through Exploration
- Little, D.Y.¹ Sommer, F.T.²

13
- 34047267520
- Intrinsic motivation systems for autonomous mental development
- DOI 10.1109/TEVC.2006.890271, Convergent Approaches to the Understanding of Autonomous Mental Development
- P-Y Oudeyer F Kaplan V Hafner 2007 Intrinsic motivation systems for autonomous mental development IEEE Transactions on Evolutionary Computation 11 2 265 286 10.1109/TEVC.2006.890271 (Pubitemid 46547111)
- (2007) IEEE Transactions on Evolutionary Computation , vol.11 , Issue.2 , pp. 265-286
- Oudeyer, P.-Y.¹ Kaplan, F.² Hafner, V.V.³

14
- 85123966307
- Distributional clustering of English words
- Association for Computational Linguistics
- Pereira F, Tishby N, Lee L (1993) Distributional clustering of English words. In: 30th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp 183-190. http://xxx.lanl.gov/pdf/cmp-lg/9408011
- (1993) 30th Annual Meeting of the Association for Computational Linguistics , pp. 183-190
- Pereira, F.¹ Tishby, N.² Lee, L.³

15
- 77958569725
- Relative entropy policy search
- AAAI Press, Menlo Park
- Peters J, Muelling K, Altun Y (2010) Relative entropy policy search. In: Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence (AAAI). AAAI Press, Menlo Park
- (2010) Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence (AAAI)
- Peters, J.¹ Muelling, K.² Altun, Y.³

16
- 44949241322
- Reinforcement learning of motor skills with policy gradients
- 18482830 10.1016/j.neunet.2008.02.003
- J Peters S Schaal 2008 Reinforcement learning of motor skills with policy gradients Neural Networks 21 4 682 697 18482830 10.1016/j.neunet.2008.02.003
- (2008) Neural Networks , vol.21 , Issue.4 , pp. 682-697
- Peters, J.¹ Schaal, S.²

17
- 9444263770
- Using MDP Characteristics to Guide Exploration in Reinforcement Learning
- Machine Learning: ECML 2003
- Ratitch B, Precup D (2003) Using MDP characteristics to guide exploration in reinforcement learning. In: Proceedings of ECML, pp 313-324 (Pubitemid 37230987)
- (2003) Lecture Notes in Computer Science , Issue.2837 , pp. 313-324
- Ratitch, B.¹ Precup, D.²

18
- 0032202775
- Deterministic annealing for clustering, compression, classification, regression, and related optimization problems
- PII S0018921998078608
- K Rose 1998 Deterministic annealing for clustering, compression, classification, regression, and related optimization problems Proc. IEEE 86 11 2210 2239 10.1109/5.726788 (Pubitemid 128720301)
- (1998) Proceedings of the IEEE , vol.86 , Issue.11 , pp. 2210-2239
- Rose, K.¹

19
- 0000389568
- Statistical Mechanics and Phase Transitions in Clustering
- 10043066 10.1103/PhysRevLett.65.945
- K Rose E Gurewitz GC Fox 1990 Statistical Mechanics and Phase Transitions in Clustering Phys. Rev. Lett 65 8 945 948 10043066 10.1103/PhysRevLett.65.945
- (1990) Phys. Rev. Lett , vol.65 , Issue.8 , pp. 945-948
- Rose, K.¹ Gurewitz, E.² Fox, G.C.³

20
- 0026306990
- Curious model-building control systems
- Schmidhuber J (1991) Curious model-building control systems. In Proceedings of IJCNN, pp 1458-1463
- (1991) Proceedings of IJCNN , pp. 1458-1463
- Schmidhuber, J.¹

21
- 77954092659
- Art and science as by-products of the search for novel patterns, or data compressible in unknown yet learnable ways
- Swiss Design Network-et al. Edizioni 2009
- Schmidhuber J (2009) Art and science as by-products of the search for novel patterns, or data compressible in unknown yet learnable ways. In: Multiple ways to design research. Research cases that reshape the design discipline. Swiss Design Network-et al. Edizioni, 2009, pp 98-112
- (2009) Multiple Ways to Design Research. Research Cases That Reshape the Design Discipline , pp. 98-112
- Schmidhuber, J.¹

22
- 84856043672
- A mathematical theory of communication
- 623-656
- Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379-423, 623-656
- (1948) Bell Syst Tech J , vol.27 , pp. 379-423
- Shannon, C.E.¹

23
- 0003847664
- Aerial Press Santa Cruz, California
- Shaw R (1984) The dripping faucet as a model chaotic system. Aerial Press, Santa Cruz, California
- (1984) The Dripping Faucet As A Model Chaotic System
- Shaw, R.¹

24
- 84899031920
- Intrinsically motivated reinforcement learning
- Singh S, Barto AG, Chentanez N (2005) Intrinsically motivated reinforcement learning. In Proceedings of NIPS, pp 1281-1288
- (2005) Proceedings of NIPS , pp. 1281-1288
- Singh, S.¹ Barto, A.G.² Chentanez, N.³

25
- 79051470133
- Information-theoretic approach to interactive learning
- doi: 10.1209/0295-5075/85/28005
- Still S (2009) Information-theoretic approach to interactive learning. EPL 85 28005. doi: 10.1209/0295-5075/85/28005
- (2009) EPL , vol.85 , pp. 28005
- Still, S.¹

26
- 10044254422
- How many clusters? An information-theoretic perspective
- DOI 10.1162/0899766042321751
- S Still W Bialek 2004 How many clusters? An information theoretic perspective Neural Computation 16 12 2483 2506 15516271 10.1162/0899766042321751 (Pubitemid 39604007)
- (2004) Neural Computation , vol.16 , Issue.12 , pp. 2483-2506
- Still, S.¹ Bialek, W.²

27
- 84898998530
- Geometric clustering using the information bottleneck method
- Thrun S, Saul LK, Schölkopf B (eds) MIT Press, Cambridge, MA
- Still S, Bialek W, Bottou L (2004) Geometric clustering using the information bottleneck method. In: Thrun S, Saul LK, Schölkopf B (eds) Advances in neural information processing systems 16. MIT Press, Cambridge, MA
- (2004) Advances in Neural Information Processing Systems 16
- Still, S.¹ Bialek, W.² Bottou, L.³

28
- 34548745051
- Incremental model-based learners with formal learning-time guarantees
- Cambridge, MA
- Strehl AL, Li L, Littman ML (2006) Incremental model-based learners with formal learning-time guarantees. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, Cambridge, MA
- (2006) Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence
- Strehl, A.L.¹ Li, L.² Littman, M.L.³

29
- 0004102479
- MIT Press, Cambridge
- Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

30
- 68949157375
- Transfer learning for reinforcement learning domains: A survey
- ME Taylor P Stone 2009 Transfer learning for reinforcement learning domains: A survey Journal of Machine Learning Research 10 1 1633 1685
- (2009) Journal of Machine Learning Research , vol.10 , Issue.1 , pp. 1633-1685
- Taylor, M.E.¹ Stone, P.²

31
- 0001546350
- Active exploration in dynamic environments
- San Mateo, CA
- Thrun S, Moeller K (1992) Active exploration in dynamic environments. In: Advances in Neural Information Processing Systems (NIPS) 4, San Mateo, CA, pp 531-538
- (1992) Advances in Neural Information Processing Systems (NIPS) 4 , pp. 531-538
- Thrun, S.¹ Moeller, K.²

32
- 0001808038
- The information bottleneck method
- Tishby N, Pereira F, Bialek W (1999) The information bottleneck method. In: Proceedings of the 37th Annual Allerton Conference, pp 363-377
- (1999) Proceedings of the 37th Annual Allerton Conference , pp. 363-377
- Tishby, N.¹ Pereira, F.² Bialek, W.³

33
- 79957591834
- Information theory of decisions and actions
- Springer, New York
- Tishby N, Polani D (2010) Information theory of decisions and actions. In: Perception-reason-action cycle: models, algorithms and systems. Springer, New York
- (2010) Perception-reason-action Cycle: Models, Algorithms and Systems
- Tishby, N.¹ Polani, D.²

34
- 67650915125
- Efficient computation of optimal actions
- 19574462 10.1073/pnas.0710743106 1:CAS:528:DC%2BD1MXptVKjsL8%3D
- E Todorov 2009 Efficient computation of optimal actions PNAS 106 28 11478 11483 19574462 10.1073/pnas.0710743106 1:CAS:528:DC%2BD1MXptVKjsL8%3D
- (2009) PNAS , vol.106 , Issue.28 , pp. 11478-11483
- Todorov, E.¹

35
- 0004049893
- PhD thesis, Cambridge University
- Watkins CJCH (1989) Learning from delayed rewards. PhD thesis, Cambridge University
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

36
- 60349110114
- On discovery and learning of models with predictive representations of state for agents with continuous actions and observations
- Wingate D, Singh S (2007) On discovery and learning of models with predictive representations of state for agents with continuous actions and observations. In Proceedings of International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp 1128-1135
- (2007) Proceedings of International Conference on Autonomous Agents and Multiagent Systems (AAMAS) , pp. 1128-1135
- Wingate, D.¹ Singh, S.²

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.