SCOPUS Information Search Platform

2010 International Conference on Networking, Sensing and Control, ICNSC 2010

2010, Pages 243-248

Two-time-scale online actor-critic paradigm driven by POMDP

Authors (3): Liu, Bo (a); He, Haibo (a, b); Repperger, Daniel W. (c)

a Stevens Institute of Technology (United States)

b University of Rhode Island (United States)

c Air Force Research Laboratory (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ACTOR CRITIC; ACTOR-CRITIC ALGORITHM; ADAPTIVE DYNAMIC PROGRAMMING; ANALYSIS AND SIMULATION; GREEDY ALGORITHMS; HIDDEN STATE; NONLINEAR FUNCTIONS; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; STATE ESTIMATORS; STOCHASTIC GRADIENT; TEMPORAL DIFFERENCES; TWO TIME SCALE;

MARKOV PROCESSES; STATE ESTIMATION;

NEURAL NETWORKS;

EID: 77953105620
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/ICNSC.2010.546149
Document Type: Conference Paper

Times cited: 4

References (16)

1
- 0029679044
- Reinforcement learning: A survey
- L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
- (1996) Journal of Artificial Intelligence Research , vol.4 , pp. 237-285
- Kaelbling, L.P.¹ Littman, M.L.² Moore, A.W.³

2
- 25644459607
- MIT Press, October
- E. Alpaydin, Introduction to Machine Learning. MIT Press, October 2004.
- (2004) Introduction to Machine Learning
- Alpaydin, E.¹

3
- 85153938292
- Reinforcement learning algorithm for partially observable Markov decision problems
- MIT Press
- T. Jaakkola, S. P. Singh, and M. I. Jordan, "Reinforcement learning algorithm for partially observable Markov decision problems," in Advances in Neural Information Processing Systems 7. MIT Press, 1995, pp. 345-352.
- (1995) Advances in Neural Information Processing Systems 7 , pp. 345-352
- Jaakkola, T.¹ Singh, S.P.² Jordan, M.I.³

4
- 2142812536
- Learning without state-estimation in partially observable Markovian decision processes
- Morgan Kaufmann
- S. P. Singh, T. Jaakkola, and M. I. Jordan, "Learning without state-estimation in partially observable Markovian decision processes," in Proceedings of the Eleventh International Conference on Machine Learning. Morgan Kaufmann, 1994, pp. 284-292.
- (1994) Proceedings of the Eleventh International Conference on Machine Learning , pp. 284-292
- Singh, S.P.¹ Jaakkola, T.² Jordan, M.I.³

5
- 0347502949
- Reinforcement learning in non-Markov environments
- S. D. Whitehead and L. J. Lin, "Reinforcement learning in non-Markov environments," Artificial Intelligence, vol. 8, pp. 3-4, 1992.
- (1992) Artificial Intelligence , vol.8 , pp. 3-4
- Whitehead, S.D.¹ Lin, L.J.²

6
- 77953113428
- Tech. Rep.
- L.-J. Lin and T. M. Mitchell, "Memory approaches to reinforcement learning in non-Markovian domains," Tech. Rep., 1992.
- (1992) Memory Approaches to Reinforcement Learning in Non-Markovian Domains
- Lin, L.-J.¹ Mitchell, T.M.²

7
- 0001770240
- Value-function approximations for partially observable Markov decision processes
- M. Hauskrecht, "Value-function approximations for partially observable Markov decision processes," Journal of Artificial Intelligence Research, vol. 13, pp. 33-94, 2000.
- (2000) Journal of Artificial Intelligence Research , vol.13 , pp. 33-94
- Hauskrecht, M.¹

8
- 34247264897
- Tech. Rep.
- K. P. Murphy, "A Survey of POMDP Solution Techniques," Tech. Rep., 2000.
- (2000) A Survey of POMDP Solution Techniques
- Murphy, K.P.¹

9
- 0035273403
- Online learning control by association and reinforcement
- Mar
- J. Si and Y.-T. Wang, "Online learning control by association and reinforcement," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 264-276, Mar 2001.
- (2001) IEEE Transactions on Neural Networks , vol.12 , Issue.2 , pp. 264-276
- Si, J.¹ Wang, Y.-T.²

10
- 0000439891
- Convergence of stochastic iterative dynamic programming algorithms
- T. Jaakkola, M. I. Jordan, and S. P. Singh, "Convergence of stochastic iterative dynamic programming algorithms," Neural Computation, vol. 6, pp. 1185-1201, 1994.
- (1994) Neural Computation , vol.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.P.³

11
- 0022783899
- Distributed asynchronous deterministic and stochastic gradient optimization algorithms
- Sep
- J. Tsitsiklis, D. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Transactions on Automatic Control, vol. 31, no. 9, pp. 803-812, Sep 1986.
- (1986) IEEE Transactions on Automatic Control , vol.31 , Issue.9 , pp. 803-812
- Tsitsiklis, J.¹ Bertsekas, D.² Athans, M.³

12
- 0003427482
- Wiley-Interscience
- M. Krstic, I. Kanellakopoulos, and P. V. Kokotovic, Nonlinear and Adaptive Control Design. Wiley-Interscience, 1995.
- (1995) Nonlinear and Adaptive Control Design
- Krstic, M.¹ Kanellakopoulos, I.² Kokotovic, P.V.³

13
- 84898938510
- Actor-critic algorithms
- MIT Press
- V. Konda and J. Tsitsiklis, "Actor-critic algorithms," in SIAM Journal on Control and Optimization. MIT Press, 2000, pp. 1008-1014.
- (2000) SIAM Journal on Control and Optimization , pp. 1008-1014
- Konda, V.¹ Tsitsiklis, J.²

14
- 0021376040
- Convergence of an adaptive linear estimation algorithm
- E. Eweda and O. Macchi, "Convergence of an adaptive linear estimation algorithm," IEEE Transactions on Automatic Control, vol. 29, no. 2, pp. 119-127, Feb 1984.
- (1984) IEEE Transactions on Automatic Control , vol.29 , Issue.2 , pp. 119-127
- Eweda, E.¹ Macchi, O.²

15
- 0041510534
- Linear stochastic approximation driven by slowly varying Markov chains
- V. R. Konda and J. N. Tsitsiklis, "Linear stochastic approximation driven by slowly varying Markov chains," Systems and Control Letters, vol. 50, pp. 95-102, 2003.
- (2003) Systems and Control Letters , vol.50 , pp. 95-102
- Konda, V.R.¹ Tsitsiklis, J.N.²

16
- 61349103197
- Generating self-excited oscillations via two-relay controller
- Feb.
- L. Aguilar, I. Boiko, L. Fridman, and R. Iriarte, "Generating self-excited oscillations via two-relay controller," IEEE Transactions on Automatic Control, vol. 54, no. 2, pp. 416-420, Feb. 2009.
- (2009) IEEE Transactions on Automatic Control , vol.54 , Issue.2 , pp. 416-420
- Aguilar, L.¹ Boiko, I.² Fridman, L.³ Iriarte, R.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.