SCOPUS Information Retrieval Platform

Journal of Artificial Intelligence Research

Volume 53, 2015, Pages 375-438

Approximate value iteration with temporally extended actions

(3) Mann, Timothy A. (a); Mannor, Shie (a); Precup, Doina (b)

a TECHNION ISRAEL INSTITUTE OF TECHNOLOGY (Israel)

b MCGILL UNIVERSITY (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

NAVIGATION; REINFORCEMENT LEARNING;

APPROXIMATION ERRORS; CONVERGENCE RATES; DIFFERENT DOMAINS; EFFICIENT PLANNING; FAST CONVERGENCE; FASTER CONVERGENCE; FITTED VALUE ITERATION; PRIMITIVE ACTIONS;

ITERATIVE METHODS;

EID: 84938498958 PISSN: 10769757 EISSN: None Source Type: Journal
DOI: 10.1613/jair.4676 Document Type: Article

Times cited : (47)

References (51)

1
- 84863839050
- DetH∗: Approximate Hierarchical Solution of Large Markov Decision Processes
- Barry, J. L., Kaelbling, L. P., & Lozano-Pérez, T. (2011). DetH∗: Approximate Hierarchical Solution of Large Markov Decision Processes. In International Joint Conference on Artificial Intelligence.
- (2011) International Joint Conference on Artificial Intelligence
- Barry, J.L.¹ Kaelbling, L.P.² Lozano-Pérez, T.³

2
- 0003487482
- Athena Scientific
- Bertsekas, D. P., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Athena Scientific.
- (1996) Neuro-dynamic Programming.
- Bertsekas, D.P.¹ Tsitsiklis, J.²

3
- 70049084399
- CORL: A Continuous-State Offset-Dynamics Reinforcement Learner
- th Conference on Uncertainty in Artificial Intelligence (UAI-08).
- (2008) th Conference on Uncertainty in Artificial Intelligence (UAI-08)
- Brunskill, E.¹ Leffler, B.R.² Li, L.³ Littman, M.L.⁴ Roy, N.⁵

4
- 84904099309
- GridLAB-D: An agent-based simulation framework for smart grids
- Chassin, D. P., Fuller, J. C., & Djilali, N. (2014). GridLAB-D: An agent-based simulation framework for smart grids. Journal of Applied Mathematics, 2014.
- (2014) Journal of Applied Mathematics , vol.2014
- Chassin, D.P.¹ Fuller, J.C.² Djilali, N.³

5
- 80053022338
- Optimal policy switching algorithms for reinforcement learning
- th International Conference on Autonomous Agents and Multiagent Systems, pp. 709-714.
- (2010) th International Conference on Autonomous Agents and Multiagent Systems , pp. 709-714
- Comanici, G.¹ Precup, D.²

6
- 84893371376
- PAC optimal planning for invasive species management: Improved exploration for reinforcement learning from simulator-defined MDPs
- Dietterich, T. G., Taleghan, M. A., & Crowley, M. (2013). PAC optimal planning for invasive species management: Improved exploration for reinforcement learning from simulator-defined MDPs. In Proceedings of the National Conference on Artificial Intelligence.
- (2013) Proceedings of the National Conference on Artificial Intelligence
- Dietterich, T.G.¹ Taleghan, M.A.² Crowley, M.³

7
- 34147120474
- A note on two problems in connexion with graphs
- Dijkstra, E. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1 (1), 269-271.
- (1959) Numerische Mathematik , vol.1 , Issue.1 , pp. 269-271
- Dijkstra, E.¹

8
- 58449110583
- Regularized fitted Q-iteration: Application to planning
- Springer
- Farahmand, A., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2008). Regularized fitted Q-iteration: Application to planning. In Recent Advances in Reinforcement Learning, pp. 55-68. Springer.
- (2008) Recent Advances in Reinforcement Learning , pp. 55-68
- Farahmand, A.¹ Ghavamzadeh, M.² Szepesvári, C.³ Mannor, S.⁴

9
- 85162063395
- Error propagation for approximate policy and value iteration
- Farahmand, A., Munos, R., & Szepesvári, C. (2010). Error propagation for approximate policy and value iteration. In Advances in Neural Information Processing Systems.
- (2010) Advances in Neural Information Processing Systems
- Farahmand, A.¹ Munos, R.² Szepesvári, C.³

10
- 34247199512
- Probabilistic policy reuse in a reinforcement learning agent
- th International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 720-727.
- (2006) th International Joint Conference on Autonomous Agents and Multiagent Systems , pp. 720-727
- Fernández, F.¹ Veloso, M.²

11
- 84899829959
- A formal basis for the heuristic determination of minimum cost paths
- Hart, P., Nilsson, N., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. Systems Science and Cybernetics, IEEE Transactions on, 4 (2), 100-107.
- (1968) Systems Science and Cybernetics, IEEE Transactions on , vol.4 , Issue.2 , pp. 100-107
- Hart, P.¹ Nilsson, N.² Raphael, B.³

12
- 79956361567
- Efficient planning under uncertainty with macroactions
- He, R., Brunskill, E., & Roy, N. (2011). Efficient planning under uncertainty with macroactions. Journal of Artificial Intelligence Research, 40, 523-570.
- (2011) Journal of Artificial Intelligence Research , vol.40 , pp. 523-570
- He, R.¹ Brunskill, E.² Roy, N.³

13
- 0002956570
- SPUDD: Stochastic Planning Using Decision Diagrams
- Hoey, J., St-Aubin, R., Hu, A. J., & Boutilier, C. (1999). SPUDD: Stochastic Planning Using Decision Diagrams. In Proceedings of Uncertainty in Artificial Intelligence, Stockholm, Sweden.
- (1999) Proceedings of Uncertainty in Artificial Intelligence, Stockholm, Sweden
- Hoey, J.¹ St-Aubin, R.² Hu, A.J.³ Boutilier, C.⁴

14
- 0000148778
- A heuristic approach to the discovery of macro-operators
- Iba, G. A. (1989). A heuristic approach to the discovery of macro-operators. Machine Learning, 3, 285-317.
- (1989) Machine Learning , vol.3 , pp. 285-317
- Iba, G.A.¹

15
- 56449090073
- Hierarchical model-based reinforcement learning: Rmax + MAXQ
- th International Conference on Machine Learning.
- (2008) th International Conference on Machine Learning
- Jong, N.K.¹ Stone, P.²

16
- 0036832951
- A sparse sampling algorithm for near-optimal planning in large Markov decision processes
- Kearns, M., Mansour, Y., & Ng, A. Y. (2002). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning, 49 (2-3), 193-208.
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 193-208
- Kearns, M.¹ Mansour, Y.² Ng, A.Y.³

17
- 33750293964
- Bandit based Monte-Carlo Planning
- Springer.
- Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo Planning. In Machine Learning: ECML-2006, pp. 282-293. Springer.
- (2006) Machine Learning: ECML-2006 , pp. 282-293
- Kocsis, L.¹ Szepesvári, C.²

18
- 80055032021
- Skill discovery in continuous reinforcement learning domains using skill chaining
- Konidaris, G., & Barto, A. (2009). Skill discovery in continuous reinforcement learning domains using skill chaining. In Advances in Neural Information Processing Systems 22, pp. 1015-1023.
- (2009) Advances in Neural Information Processing Systems , vol.22 , pp. 1015-1023
- Konidaris, G.¹ Barto, A.²

19
- 84880873347
- Building portable options: Skill transfer in reinforcement learning
- Konidaris, G., & Barto, A. G. (2007). Building portable options: Skill transfer in reinforcement learning. In Proceedings of the International Joint Conference on Artificial Intelligence, Vol. 7, pp. 895-900.
- (2007) Proceedings of the International Joint Conference on Artificial Intelligence , vol.7 , pp. 895-900
- Konidaris, G.¹ Barto, A.G.²

20
- 85162033542
- Constructing skill trees for reinforcement learning agents from demonstration trajectories
- Konidaris, G., Kuindersma, S., Barto, A., & Grupen, R. (2010). Constructing skill trees for reinforcement learning agents from demonstration trajectories. In Advances in Neural Information Processing Systems, pp. 1162-1170.
- (2010) Advances in Neural Information Processing Systems , pp. 1162-1170
- Konidaris, G.¹ Kuindersma, S.² Barto, A.³ Grupen, R.⁴

21
- 30244535903
- Tech. Rep., Stanford University
- Lazanas, A., & Latombe, J.-C. (1992). Landmark-based robot navigation. Tech. rep., Stanford University.
- (1992) Landmark-Based Robot Navigation
- Lazanas, A.¹ Latombe, J.-C.²

22
- 77956523230
- Analysis of a classification-based policy iteration algorithm
- th International Conference on Machine Learning.
- (2010) th International Conference on Machine Learning
- Lazaric, A.¹ Ghavamzadeh, M.² Munos, R.³

23
- 0002290970
- On the complexity of solving Markov decision problems
- th conference on Uncertainty in artificial intelligence, pp. 394-402.
- (1995) th Conference on Uncertainty in Artificial Intelligence , pp. 394-402
- Littman, M.L.¹ Dean, T.L.² Kaelbling, L.P.³

24
- 84919807958
- Time-regularized interrupting options
- st International Conference on Machine Learning.
- (2014) st International Conference on Machine Learning
- Mankowitz, D.J.¹ Mann, T.A.² Mannor, S.³

25
- 84938531572
- Accessed: 2015-06-29
- Mann, T. A. (2014). Cyclic Inventory Management (CIM). https://code.google.com/p/rddlsim/source/browse/trunk/files/rddl2/examples/cim.rddl2. Accessed: 2015-06-29.
- (2014) Cyclic Inventory Management (CIM)
- Mann, T.A.¹

26
- 84919825430
- Scaling up approximate value iteration with options: Better policies with fewer iterations
- st International Conference on Machine Learning.
- (2014) st International Conference on Machine Learning
- Mann, T.A.¹ Mannor, S.²

27
- 14344250635
- Dynamic abstraction in reinforcement learning via clustering
- New York, NY, USA. ACM
- st International Conference on Machine learning, ICML '04, pp. 71-, New York, NY, USA. ACM.
- (2004) st International Conference on Machine Learning, ICML '04 , pp. 71
- Mannor, S.¹ Menache, I.² Hoze, A.³ Klein, U.⁴

28
- 0013465187
- Automatic discovery of subgoals in reinforcement learning using diverse density
- San Francisco, USA
- th International Conference on Machine Learning, pp. 361-368, San Francisco, USA.
- (2001) th International Conference on Machine Learning , pp. 361-368
- McGovern, A.¹ Barto, A.G.²

29
- 18344412081
- Multiple-supplier inventory models in supply chain management: A review
- th International Symposium on Inventories
- th International Symposium on Inventories.
- (2003) International Journal of Production Economics , vol.81-82 , pp. 265-279
- Minner, S.¹

30
- 29344453913
- Error bounds for approximate value iteration
- Munos, R. (2005). Error bounds for approximate value iteration. In Proceedings of the National Conference on Artificial Intelligence.
- (2005) Proceedings of the National Conference on Artificial Intelligence
- Munos, R.¹

31
- 44649189852
- Finite-time bounds for fitted value iteration
- Munos, R., & Szepesvári, C. (2008). Finite-time bounds for fitted value iteration. Journal of Machine Learning Research, 9, 815-857.
- (2008) Journal of Machine Learning Research , vol.9 , pp. 815-857
- Munos, R.¹ Szepesvári, C.²

32
- 0002499613
- Graph spanners
- Peleg, D., & Schäffer, A. A. (1989). Graph spanners. Journal of Graph Theory, 13 (1), 99-116.
- (1989) Journal of Graph Theory , vol.13 , Issue.1 , pp. 99-116
- Peleg, D.¹ Schäffer, A.A.²

33
- 44949241322
- Reinforcement learning of motor skills with policy gradients
- Peters, J., & Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21, 682-691.
- (2008) Neural Networks , vol.21 , pp. 682-691
- Peters, J.¹ Schaal, S.²

34
- 0001167669
- Multi-time models for temporally abstract planning
- Precup, D., & Sutton, R. S. (1997). Multi-time models for temporally abstract planning. In Advances in Neural Information Processing Systems 10.
- (1997) Advances in Neural Information Processing Systems , vol.10
- Precup, D.¹ Sutton, R.S.²

35
- 84957069070
- Theoretical results on reinforcement learning with temporally abstract options
- Springer.
- Precup, D., Sutton, R. S., & Singh, S. (1998). Theoretical results on reinforcement learning with temporally abstract options. In Machine Learning: ECML-1998, pp. 382-393. Springer.
- (1998) Machine Learning: ECML-1998 , pp. 382-393
- Precup, D.¹ Sutton, R.S.² Singh, S.³

36
- 0003998452
- John Wiley & Sons, Inc.
- Puterman, M. L. (1994). Markov Decision Processes - Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc.
- (1994) Markov Decision Processes - Discrete Stochastic Dynamic Programming.
- Puterman, M.L.¹

37
- 33646398129
- Neural fitted Q iteration-first experiences with a data efficient neural reinforcement learning method
- Springer.
- Riedmiller, M. (2005). Neural fitted Q iteration-first experiences with a data efficient neural reinforcement learning method. In Machine Learning: ECML-2005, pp. 317-328. Springer.
- (2005) Machine Learning: ECML-2005 , pp. 317-328
- Riedmiller, M.¹

38
- 27144482716
- Highway hierarchies hasten exact shortest path queries
- Brodal, G., & Leonardi, S. (Eds.) Algorithms: ESA-2005 Springer Berlin Heidelberg
- Sanders, P., & Schultes, D. (2005). Highway hierarchies hasten exact shortest path queries. In Brodal, G., & Leonardi, S. (Eds.), Algorithms: ESA-2005, Vol. 3669 of Lecture Notes in Computer Science, pp. 568-579. Springer Berlin Heidelberg.
- (2005) Lecture Notes in Computer Science , vol.3669 , pp. 568-579
- Sanders, P.¹ Schultes, D.²

39
- 0344821193
- Tech. rep. Office of Naval Research
- Scarf, H. (1959). The optimality of (s,S) policies in the dynamic inventory problem. Tech. rep. NR-047-019, Office of Naval Research.
- (1959) The Optimality of (s,S) Policies in the Dynamic Inventory Problem
- Scarf, H.¹

40
- 84867117249
- Approximate Modified Policy Iteration
- th International Conference on Machine Learning, Edinburgh, United Kingdom.
- (2012) th International Conference on Machine Learning, Edinburgh, United Kingdom
- Scherrer, B.¹ Ghavamzadeh, M.² Gabillon, V.³ Geist, M.⁴

41
- 0031277069
- Optimality of (s,S) policies in inventory models with markovian demand
- Sethi, S. P., & Cheng, F. (1997). Optimality of (s,S) policies in inventory models with markovian demand. Operations Research, 45 (6), 931-939.
- (1997) Operations Research , vol.45 , Issue.6 , pp. 931-939
- Sethi, S.P.¹ Cheng, F.²

42
- 80054721180
- Connectionist reinforcement learning for intelligent unit micro management in starcraft
- IEEE
- Shantia, A., Begue, E., & Wiering, M. (2011). Connectionist reinforcement learning for intelligent unit micro management in starcraft. In Proceedings of the International Joint Conference on Neural Networks, pp. 1794-1801. IEEE.
- (2011) Proceedings of the International Joint Conference on Neural Networks , pp. 1794-1801
- Shantia, A.¹ Begue, E.² Wiering, M.³

43
- 84867135062
- Compositional planning using optimal option models
- th International Conference on Machine Learning, Edinburgh.
- (2012) th International Conference on Machine Learning, Edinburgh
- Silver, D.¹ Ciosek, K.²

44
- 35148862397
- Using relative novelty to identify useful temporal abstractions in reinforcement learning
- New York, NY, USA. ACM
- st International Conference on Machine Learning, pp. 95-102, New York, NY, USA. ACM.
- (2004) st International Conference on Machine Learning , pp. 95-102
- Simsek, O.¹ Barto, A.G.²

45
- 84868298774
- Linear options
- th International Conference on Autonomous Agents and Multiagent Systems, pp. 31-38.
- (2010) th International Conference on Autonomous Agents and Multiagent Systems , pp. 31-38
- Sorg, J.¹ Singh, S.²

46
- 84912073624
- Learning options in reinforcement learning
- Springer
- Stolle, M., & Precup, D. (2002). Learning options in reinforcement learning. In Abstraction, Reformulation, and Approximation, pp. 212-223. Springer.
- (2002) Abstraction, Reformulation, and Approximation , pp. 212-223
- Stolle, M.¹ Precup, D.²

47
- 27544506565
- Reinforcement learning for robocup soccer keepaway
- Stone, P., Sutton, R. S., & Kuhlmann, G. (2005). Reinforcement learning for robocup soccer keepaway. Adaptive Behavior, 13 (3), 165-188.
- (2005) Adaptive Behavior , vol.13 , Issue.3 , pp. 165-188
- Stone, P.¹ Sutton, R.S.² Kuhlmann, G.³

48
- 0033170372
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112 (1), 181-211.
- (1999) Artificial Intelligence , vol.112 , Issue.1 , pp. 181-211
- Sutton, R.S.¹ Precup, D.² Singh, S.³

49
- 84937827694
- TD methods for the variance of the reward-to-go
- th International Conference on Machine Learning.
- (2013) th International Conference on Machine Learning
- Tamar, A.¹ Castro, D.D.² Mannor, S.³

50
- 31844447221
- Identifying useful subgoals in reinforcement learning by local graph partitioning
- nd International Conference on Machine Learning, pp. 816-823.
- (2005) nd International Conference on Machine Learning , pp. 816-823
- Wolfe, A.P.¹ Barto, A.G.²

51
- 58349118462
- FF-Replan: A Baseline for Probabilistic Planning
- Yoon, S. W., Fern, A., & Givan, R. (2007). FF-Replan: A Baseline for Probabilistic Planning. In Proceedings of the International Conference on Automated Planning and Scheduling, Vol. 7, pp. 352-359.
- (2007) Proceedings of the International Conference on Automated Planning and Scheduling , vol.7 , pp. 352-359
- Yoon, S.W.¹ Fern, A.² Givan, R.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.