



Volume 131, Issue 4, 2009, Pages 1-8

A real-time computational learning model for sequential decision-making problems under uncertainty

Author keywords

[No Author keywords available]

Indexed keywords

ALTERNATIVE APPROACH; COMPUTATIONAL LEARNING MODEL; CONTROL POLICY; CONTROL PROBLEMS; CONTROLLED MARKOV CHAINS; DECISION-MAKING PROBLEM; DYNAMIC SYSTEMS; EVALUATION FUNCTION; EXISTING METHOD; EXPECTED COSTS; LEARNING MECHANISM; LEARNING MODELS; OPTIMAL CONTROL POLICY; POLE BALANCING; REAL TIME; SIMULATION-BASED; STATE SPACE REPRESENTATION; STATE TRANSITIONS; STOCHASTIC DISTURBANCES; STOCHASTIC FRAMEWORK; SYSTEM RESPONSE;

EID: 77955875058     PISSN: 0022-0434     EISSN: 1528-9028     Source Type: Journal
DOI: 10.1115/1.3117200     Document Type: Article
Times cited: 21

References (41)
  • 2
    • Gosavi, A., 2004, "Reinforcement Learning for Long-Run Average Cost," Eur. J. Oper. Res., 155, pp. 654-674.
  • 3
    • Bertsekas, D. P., and Tsitsiklis, J. N., 1996, Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3), 1st ed., Athena Scientific, Belmont, MA.
  • 6
    • Samuel, A. L., 1959, "Some Studies in Machine Learning Using the Game of Checkers," IBM J. Res. Dev., 3, pp. 210-229.
  • 7
    • Samuel, A. L., 1967, "Some Studies in Machine Learning Using the Game of Checkers. II: Recent Progress," IBM J. Res. Dev., 11, pp. 601-617.
  • 9
    • Sutton, R. S., 1988, "Learning to Predict by the Methods of Temporal Differences," Mach. Learn., 3, pp. 9-44.
  • 14
    • Mahadevan, S., 1996, "Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results," Mach. Learn., 22, pp. 159-195.
  • 15
    • Sutton, R. S., 1990, "Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming," Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, pp. 216-224.
  • 17
    • Moore, A. W., and Atkeson, C. G., 1993, "Prioritized Sweeping: Reinforcement Learning With Less Data and Less Time," Mach. Learn., 13, pp. 103-130.
  • 19
    • Barto, A. G., Bradtke, S. J., and Singh, S. P., 1995, "Learning to Act Using Real-Time Dynamic Programming," Artif. Intell., 72, pp. 81-138.
  • 27
    • Ishii, S., Yoshida, W., and Yoshimoto, J., 2002, "Control of Exploitation-Exploration Meta-Parameter in Reinforcement Learning," Neural Networks, 15, pp. 665-687.
  • 28
    • Chan-Geon, P., and Sung-Bong, Y., 2003, "Implementation of the Agent Using Universal On-Line Q-Learning by Balancing Exploration and Exploitation in Reinforcement Learning," Journal of KISS: Software and Applications, 30, pp. 672-680.
  • 29
    • Miyazaki, K., and Yamamura, M., 1997, "Marco Polo: A Reinforcement Learning System Considering Tradeoff Exploitation and Exploration Under Markovian Environments," Journal of Japanese Society for Artificial Intelligence, 12, pp. 78-89.
  • 31
  • 32
    • Anderson, C. W., 1989, "Learning to Control an Inverted Pendulum Using Neural Networks," IEEE Control Syst. Mag., 9, pp. 31-37.
  • 37
    • Si, J., and Wang, Y. T., 2001, "On-line Learning Control by Association and Reinforcement," IEEE Trans. Neural Netw., 12, pp. 264-276.
  • 38
    • Zhang, B. S., Leigh, I., and Leigh, J. R., 1995, "Learning Control Based on Pattern Recognition Applied to Vehicle Cruise Control Systems," Proceedings of the American Control Conference, Seattle, WA, pp. 3101-3105.
  • 40
    • TESIS, http://www.tesis.de/en/.
  • 41
    • Panait, L., and Luke, S., 2005, "Cooperative Multi-Agent Learning: The State of the Art," Auton. Agents Multi-Agent Syst., 11, pp. 387-434.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.