Volume 6, 2005

Prioritization methods for accelerating MDP solvers

Author keywords

Dynamic programming; Markov Decision Processes; Policy iteration; Prioritized sweeping; Value iteration
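The keywords above center on value iteration accelerated by prioritized sweeping. As an illustration only (not the paper's own code), here is a minimal sketch of prioritized value iteration on a toy four-state MDP; the transition table, action names, and thresholds are all invented for the example:

```python
import heapq

# Toy MDP (hypothetical data): transitions[s][a] = [(prob, next_state, reward), ...]
transitions = {
    0: {"go": [(1.0, 1, 0.0)]},
    1: {"go": [(0.8, 2, 0.0), (0.2, 0, 0.0)]},
    2: {"go": [(1.0, 3, 1.0)]},
    3: {"stay": [(1.0, 3, 0.0)]},
}
gamma = 0.9  # discount factor

def backup(V, s):
    """One Bellman backup: max over actions of expected reward plus discounted value."""
    return max(
        sum(p * (r + gamma * V[ns]) for p, ns, r in outcomes)
        for outcomes in transitions[s].values()
    )

# Predecessor map, needed to propagate value changes backwards.
preds = {s: set() for s in transitions}
for s, acts in transitions.items():
    for outcomes in acts.values():
        for _, ns, _ in outcomes:
            preds[ns].add(s)

V = {s: 0.0 for s in transitions}
# Max-priority queue (priorities negated for Python's min-heap), seeded with every state.
pq = [(-1.0, s) for s in transitions]
heapq.heapify(pq)
while pq:
    neg_prio, s = heapq.heappop(pq)
    if -neg_prio < 1e-6:
        break  # remaining Bellman errors are negligible
    new_v = backup(V, s)
    delta = abs(new_v - V[s])
    V[s] = new_v
    if delta > 1e-6:
        # A state's change can perturb its predecessors by at most gamma * delta.
        for p in preds[s]:
            heapq.heappush(pq, (-gamma * delta, p))
```

Instead of sweeping every state on every pass, the heap backs up the state with the largest pending change first and reinserts only predecessors whose values may have shifted, which is the core idea behind prioritized methods.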

Indexed keywords

ALGORITHMS; CALCULATIONS; DYNAMIC PROGRAMMING; ITERATIVE METHODS; LEARNING SYSTEMS; PROBLEM SOLVING

EID: 21844451909     PISSN: 15337928     EISSN: None     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 70

References (40)
  • 4
    • Andrew G. Barto, S. J. Bradtke, and Satinder P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1):81-138, 1995.
  • 6
    • Dimitri P. Bertsekas. Distributed asynchronous computation of fixed points. Mathematical Programming, 27:107-120, 1983.
  • 14
    • Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49:209-232, 2002.
  • 19
    • Andrew W. Moore and Christopher G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103-130, 1993.
  • 20
    • Andrew W. Moore and Christopher G. Atkeson. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state space. Machine Learning, 21:199-233, 1995.
  • 21
    • Remi Munos and Andrew W. Moore. Variable resolution discretization in optimal control. Machine Learning, 49:291-323, 2002.
  • 26
    • Martin L. Puterman and Moon C. Shin. Modified policy iteration algorithms for discounted Markov Decision Problems. Management Science, 24:1127-1137, 1978.
  • 27
    • Stuart I. Reynolds. Reinforcement Learning with Exploration. PhD thesis, University of Birmingham, Birmingham, United Kingdom, 2002.
  • 28
    • Gavin A. Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge, United Kingdom, 1994.
  • 31
    • Satinder P. Singh and Richard S. Sutton. Reinforcement learning with replacing eligibility traces. Machine Learning, 22:123-158, 1996.
  • 32
    • Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
  • 33
    • Richard S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems, 8:1038-1044, 1996.
  • 35
    • Ronald J. Williams and Leemon C. Baird. Tight performance bounds on greedy policies based on imperfect value functions. Technical Report NU-CCS-93-14, Northeastern University, Boston, MA, 1993.
  • 39
    • Nevin L. Zhang and Weihong Zhang. Speeding up the convergence of value iteration in partially observable Markov Decision Processes. Journal of Artificial Intelligence Research, 14:29-51, 2001.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.