[4] M. Carreras, P. Ridao, R. Garcia, and T. Nicosevici, "Vision-based localization of an underwater robot in a structured environment," in IEEE International Conference on Robotics and Automation, Taipei, Taiwan, 2003.
[6] R. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063, 2000.
[7] C. Anderson, "Approximating a policy can be easier than approximating a value function," Colorado State University, Computer Science Technical Report, 2000.
[8] D. A. Aberdeen, "Policy-gradient algorithms for partially observable Markov decision processes," Ph.D. dissertation, Australian National University, April 2003.
[10] N. Meuleau, K. E. Kim, L. P. Kaelbling, and A. R. Cassandra, "Solving POMDPs by searching the space of finite policies," in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, July 1999, pp. 127-136.
[11] S. Singh, T. Jaakkola, and M. Jordan, "Learning without state-estimation in partially observable Markovian decision processes," in Proceedings of the Eleventh International Conference on Machine Learning, New Jersey, USA, 1994.
[14] P. Marbach and J. N. Tsitsiklis, "Gradient-based optimization of Markov reward processes: Practical variants," Center for Communications Systems Research, University of Cambridge, Tech. Rep., March 2000.
[15] V. Konda and J. Tsitsiklis, "On actor-critic algorithms," SIAM Journal on Control and Optimization, vol. 42, no. 4, pp. 1143-1166, 2003.
[16] N. Meuleau, L. Peshkin, and K. Kim, "Exploration in gradient-based reinforcement learning," Massachusetts Institute of Technology, AI Memo 2001-003, Tech. Rep., April 2001.
[17] R. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, pp. 229-256, 1992.
[18] H. Kimura, K. Miyazaki, and S. Kobayashi, "Reinforcement learning in POMDPs with function approximation," in Fourteenth International Conference on Machine Learning (ICML'97), D. H. Fisher, Ed., 1997, pp. 152-160.
[19] T. Jaakkola, S. Singh, and M. Jordan, "Reinforcement learning algorithms for partially observable Markov decision problems," in Advances in Neural Information Processing Systems, vol. 7. Morgan Kaufmann, 1995, pp. 345-352.
[20] J. Baxter and P. Bartlett, "Direct gradient-based reinforcement learning: I. Gradient estimation algorithms," Australian National University, Tech. Rep., 1999.
[21] P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," Massachusetts Institute of Technology, Tech. Rep. LIDS-P-2411, 1998.
[22] P. Marbach, "Simulation-based methods for Markov decision processes," Ph.D. dissertation, Laboratory for Information and Decision Systems, MIT, 1998.
[26] R. Tedrake, T. W. Zhang, and H. S. Seung, "Stochastic policy gradient reinforcement learning on a simple 3D biped," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'04), Sendai, Japan, September 28 - October 2, 2004.
[27] T. Matsubara, J. Morimoto, J. Nakanishi, M. Sato, and K. Doya, "Learning sensory feedback to CPG with policy gradient for biped locomotion," in Proceedings of the International Conference on Robotics and Automation (ICRA), Barcelona, Spain, April 2005.