SCOPUS Information Retrieval Platform

Machine Learning

Volume 81, Issue 3, 2010, Pages 359-397

Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains

(2) Bernstein, Andrey a Shimkin, Nahum a

a TECHNION ISRAEL INSTITUTE OF TECHNOLOGY (Israel)

Author keywords

Adaptive resolution; Efficient exploration; Kernel functions; Reinforcement learning

Indexed keywords

ADAPTIVE APPROXIMATION; ADAPTIVE RESOLUTION; APPROXIMATION SCHEME; COARSE TO FINE; COARSER RESOLUTION; CONTINUOUS STATE SPACE; DETERMINISTIC DOMAINS; EFFICIENT EXPLORATION; EXPLORATION TECHNIQUES; KERNEL FUNCTION; LEARNING RATES; MISTAKE BOUNDS; MODEL-BASED; ONLINE LEARNING; OPTIMAL VALUE FUNCTIONS; ORIGINAL ALGORITHMS; STATE SPACE; UNCERTAINTY INTERVALS;

ADAPTIVE ALGORITHMS; APPROXIMATION ALGORITHMS; COMPUTATIONAL COMPLEXITY; POLYNOMIAL APPROXIMATION; REINFORCEMENT LEARNING;

LEARNING ALGORITHMS;

EID: 78649716899 PISSN: 08856125 EISSN: 15730565 Source Type: Journal
DOI: 10.1007/s10994-010-5186-7 Document Type: Article

Times cited : (32)

References (32)

1
- 0016556021
- A new approach to manipulator control: The cerebellar model articulation controller (CMAC)
- 0314.92007
- J. S. Albus 1975 A new approach to manipulator control: the cerebellar model articulation controller (CMAC) Journal of Dynamic Systems, Measurement and Control 97 220 227 0314.92007
- (1975) Journal of Dynamic Systems, Measurement and Control , vol.97 , pp. 220-227
- Albus, J.S.¹

2
- 40849145988
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- 10.1007/s10994-007-5038-2
- A. Antos C. Szepesvári R. Munos 2008 Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path Machine Learning 71 1 89 129 10.1007/s10994-007-5038-2
- (2008) Machine Learning , vol.71 , Issue.1 , pp. 89-129
- Antos, A.¹ Szepesvári, C.² Munos, R.³

3
- 85151789426
- Logarithmic online regret bounds for undiscounted reinforcement learning
- Auer, P., & Ortner, R. (2006). Logarithmic online regret bounds for undiscounted reinforcement learning. In Proceedings of neural information processing systems conference (NIPS).
- (2006) Proceedings of Neural Information Processing Systems Conference (NIPS)
- Auer, P.¹ Ortner, R.²

4
- 78649714480
- Master's thesis, Technion-Israel Institute of Technology. URL:
- Bernstein, A. (2007). Adaptive state aggregation for reinforcement learning. Master's thesis, Technion-Israel Institute of Technology. URL: http://tx.technion.ac.il/~andreyb/MSc-Thesis-final.pdf.
- (2007) Adaptive State Aggregation for Reinforcement Learning
- Bernstein, A.¹

5
- 84898060153
- Adaptive aggregation for reinforcement learning with efficient exploration: Deterministic domains
- Bernstein, A., & Shimkin, N. (2008). Adaptive aggregation for reinforcement learning with efficient exploration: deterministic domains. In Proceedings of the 21st annual conference on learning theory (COLT 2008).
- (2008) Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008)
- Bernstein, A.¹ Shimkin, N.²

6
- 0003565783
- 3rd ed. Athena Scientific Belmont
- Bertsekas, D. P. (2007). Dynamic programming and optimal control (3rd ed., vol. 2). Belmont: Athena Scientific.
- (2007) Dynamic Programming and Optimal Control , vol.2
- Bertsekas, D.P.¹

7
- 78649707587
- LEAP: Learning entities adaptive partitioning
- Whistler, Canada
- Bonarini, A., Lazaric, A., & Restelli, M. (2005). LEAP: learning entities adaptive partitioning. In Proceedings of neural information processing systems conference (NIPS 2005), workshop on reinforcement learning benchmarks and bake-offs II, Whistler, Canada (pp. 41-47).
- (2005) Proceedings of Neural Information Processing Systems Conference (NIPS 2005), Workshop on Reinforcement Learning Benchmarks and Bake-offs II , pp. 41-47
- Bonarini, A.¹ Lazaric, A.² Restelli, M.³

8
- 0346942368
- Decision-theoretic planning: Structural assumptions and computational leverage
- 0918.68110 1718251
- C. Boutilier T. Dean S. Hanks 1999 Decision-theoretic planning: structural assumptions and computational leverage Journal of Artificial Intelligence Research 11 1 94 0918.68110 1718251
- (1999) Journal of Artificial Intelligence Research , vol.11 , pp. 1-94
- Boutilier, C.¹ Dean, T.² Hanks, S.³

9
- 0041965975
- R-MAX-a general polynomial time algorithm for near-optimal reinforcement learning
- 10.1162/153244303765208377 1971337
- R. I. Brafman M. Tennenholtz 2002 R-MAX-a general polynomial time algorithm for near-optimal reinforcement learning Journal of Machine Learning Research 3 213 231 10.1162/153244303765208377 1971337
- (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

10
- 78649712889
- Master's thesis, Technion-Israel Institute of Technology
- Chapman, H. (2007). Global confidence bound algorithms for the exploration-exploitation tradeoff in reinforcement learning. Master's thesis, Technion-Israel Institute of Technology.
- (2007) Global Confidence Bound Algorithms for the Exploration-exploitation Tradeoff in Reinforcement Learning
- Chapman, H.¹

11
- 0026206780
- An optimal one-way multigrid algorithm for discrete-time stochastic control
- DOI 10.1109/9.133184
- C.-S. Chow J. N. Tsitsiklis 1991 An optimal one-way multigrid algorithm for discrete-time stochastic control IEEE Transactions on Automatic Control 36 8 898 914 0752.93078 10.1109/9.133184 1116447 (Pubitemid 21674882)
- (1991) IEEE Transactions on Automatic Control , vol.36 , Issue.8 , pp. 898-914
- Chow, C.-S.¹ Tsitsiklis, J.N.²

12
- 0033629916
- Reinforcement learning in continuous time and space
- 10.1162/089976600300015961
- K. Doya 2000 Reinforcement learning in continuous time and space Neural Computation 12 219 245 10.1162/089976600300015961
- (2000) Neural Computation , vol.12 , pp. 219-245
- Doya, K.¹

13
- 38049096465
- Kernel-based models for reinforcement learning in continuous state spaces
- Jong, N., & Stone, P. (2006). Kernel-based models for reinforcement learning in continuous state spaces. In 23rd international conference on machine learning (ICML 2006), workshop on kernel machines and reinforcement learning.
- (2006) 23rd International Conference on Machine Learning (ICML 2006), Workshop on Kernel Machines and Reinforcement Learning
- Jong, N.¹ Stone, P.²

14
- 23244466805
- PhD thesis, Gatsby Computational Neuroscience Unit, University College London, UK
- Kakade, S. M. (2003). On the sample complexity of reinforcement learning. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, UK.
- (2003) On the Sample Complexity of Reinforcement Learning
- Kakade, S.M.¹

15
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- DOI 10.1023/A:1017984413808
- M. Kearns S. P. Singh 2002 Near-optimal reinforcement learning in polynomial time Machine Learning 49 209 232 1014.68071 10.1023/A:1017984413808 (Pubitemid 34325687)
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 209-232
- Kearns, M.¹ Singh, S.²

16
- 4043069840
- On actor-critic algorithms
- 1049.93095 10.1137/S0363012901385691 2044789
- V. R. Konda J. N. Tsitsiklis 2003 On actor-critic algorithms SIAM Journal on Control and Optimization 42 4 1143 1166 1049.93095 10.1137/S0363012901385691 2044789
- (2003) SIAM Journal on Control and Optimization , vol.42 , Issue.4 , pp. 1143-1166
- Konda, V.R.¹ Tsitsiklis, J.N.²

17
- 78649707742
- Equi-gradient temporal difference learning
- Loth, M., Davy, M., Coulom, R., & Preux, P. (2006). Equi-gradient temporal difference learning. In 23rd international conference on machine learning (ICML 2006), workshop on kernel machines and reinforcement learning.
- (2006) 23rd International Conference on Machine Learning (ICML 2006), Workshop on Kernel Machines and Reinforcement Learning
- Loth, M.¹ Davy, M.² Coulom, R.³ Preux, P.⁴

18
- 0029514510
- The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
- A. W. Moore C. G. Atkeson 1995 The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces Machine Learning 21 199 233
- (1995) Machine Learning , vol.21 , pp. 199-233
- Moore, A.W.¹ Atkeson, C.G.²

19
- 0036832953
- Variable resolution discretization in optimal control
- DOI 10.1023/A:1017992615625
- R. Munos A. W. Moore 2002 Variable resolution discretization in optimal control Machine Learning 49 291 323 1005.68086 10.1023/A:1017992615625 (Pubitemid 34325691)
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 291-323
- Munos, R.¹ Moore, A.²

20
- 44649189852
- Finite-time bounds for fitted value iteration
- R. Munos C. Szepesvári 2008 Finite-time bounds for fitted value iteration Journal of Machine Learning Research 9 815 857
- (2008) Journal of Machine Learning Research , vol.9 , pp. 815-857
- Munos, R.¹ Szepesvári, C.²

21
- 84858776393
- Multi-resolution exploration in continuous spaces
- Nouri, A., & Littman, M. L. (2008). Multi-resolution exploration in continuous spaces. In Advances in neural information processing systems (NIPS) 21 (pp. 1209-1216).
- (2008) Advances in Neural Information Processing Systems (NIPS) , vol.21 , pp. 1209-1216
- Nouri, A.¹ Littman, M.L.²

22
- 0036832956
- Kernel-based reinforcement learning
- DOI 10.1023/A:1017928328829
- D. Ormoneit S. Sen 2002 Kernel-based reinforcement learning Machine Learning 49 161 178 1014.68069 10.1023/A:1017928328829 (Pubitemid 34325684)
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 161-178
- Ormoneit, D.¹ Sen, S.²

23
- 47349092417
- Wiley New York
- Powell, W. B. (2007). Approximate dynamic programming for operations research: solving the curses of dimensionality. New York: Wiley.
- (2007) Approximate Dynamic Programming for Operations Research: Solving the Curses of Dimensionality
- Powell, W.B.¹

24
- 85102627959
- Wiley New York 0829.90134
- Puterman, M. L. (1994). Markov decision processes: discrete stochastic dynamic programming. New York: Wiley.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

25
- 85153965130
- Reinforcement learning with soft state aggregation
- Singh, S. P., Jaakkola, T., & Jordan, M. I. (1995). Reinforcement learning with soft state aggregation. In Advances in neural information processing systems (NIPS) 7 (pp. 361-368).
- (1995) Advances in Neural Information Processing Systems (NIPS) , vol.7 , pp. 361-368
- Singh, S.P.¹ Jaakkola, T.² Jordan, M.I.³

26
- 31844432138
- A theoretical analysis of model-based interval estimation
- Strehl, A. L., & Littman, M. L. (2005). A theoretical analysis of model-based interval estimation. In Proceedings of the 22nd international conference on machine learning (pp. 857-864).
- (2005) Proceedings of the 22nd International Conference on Machine Learning , pp. 857-864
- Strehl, A.L.¹ Littman, M.L.²

27
- 34548745051
- Incremental model-based learners with formal learning-time guarantees
- Strehl, A. L., Li, L., & Littman, M. L. (2006a). Incremental model-based learners with formal learning-time guarantees. In Proceedings of the 22nd international conference on uncertainty in artificial intelligence (pp. 485-493).
- (2006) Proceedings of the 22nd International Conference on Uncertainty in Artificial Intelligence , pp. 485-493
- Strehl, A.L.¹ Li, L.² Littman, M.L.³

28
- 33749255382
- PAC model-free reinforcement learning
- Strehl, A. L., Wiewiora, E., Langford, J., & Littman, M. L. (2006b). PAC model-free reinforcement learning. In Proceedings of the 23rd international conference on machine learning (pp. 881-888).
- (2006) Proceedings of the 23rd International Conference on Machine Learning , pp. 881-888
- Strehl, A.L.¹ Wiewiora, E.² Langford, J.³ Littman, M.L.⁴

29
- 85156221438
- Generalization in reinforcement learning: Successful examples using sparse coarse coding
- Sutton, R. S. (1996). Generalization in reinforcement learning: successful examples using sparse coarse coding. In Advances in neural information processing systems 8 (NIPS) (pp. 1038-1044).
- (1996) Advances in Neural Information Processing Systems 8 (NIPS) , pp. 1038-1044
- Sutton, R.S.¹

30
- 85042938295
- Optimistic linear programming gives logarithmic regret for irreducible MDPs
- Tewari, A., & Bartlett, P. L. (2007). Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Proceedings of neural information processing systems conference (NIPS).
- (2007) Proceedings of Neural Information Processing Systems Conference (NIPS)
- Tewari, A.¹ Bartlett, P.L.²

31
- 0002999362
- Splines: A perfect fit for signal and image processing
- 10.1109/79.799930
- M. Unser 1999 Splines: A perfect fit for signal and image processing IEEE Signal Processing Magazine 16 22 38 10.1109/79.799930
- (1999) IEEE Signal Processing Magazine , vol.16 , pp. 22-38
- Unser, M.¹

32
- 0017997986
- Approximations of dynamic programs, I
- 0393.90094 10.1287/moor.3.3.231 506661
- W. Whitt 1978 Approximations of dynamic programs, I Mathematics of Operations Research 3 3 231 243 0393.90094 10.1287/moor.3.3.231 506661
- (1978) Mathematics of Operations Research , vol.3 , Issue.3 , pp. 231-243
- Whitt, W.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.