Volume 45, 2012, Pages 515-564

Safe exploration of state and action spaces in reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

ACTION SPACES; BUSINESS MANAGEMENT; COMPLEX TRANSITIONS; CONTINUOUS STATE; EXPLORATION TECHNIQUES; HIGH-DIMENSIONAL; ROBUST BEHAVIOR; TRIAL-AND-ERROR PROCESS

EID: 84875199879     PISSN: None     EISSN: 10769757     Source Type: Journal    
DOI: 10.1613/jair.3761     Document Type: Article
Times cited: 162

References (62)
  • 1
    • 0028401306
    • Case-based reasoning: Foundational issues, methodological variations, and system approaches
    • Aamodt, A., & Plaza, E. (1994). Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AI Communications, 7 (1), 39-59.
    • (1994) AI Communications , vol.7 , Issue.1 , pp. 39-59
    • Aamodt, A.1    Plaza, E.2
  • 2
    • 84883027643
    • Autonomous Autorotation of an RC Helicopter
    • Abbeel, P., Coates, A., Hunter, T., & Ng, A. Y. (2008). Autonomous Autorotation of an RC Helicopter. In ISER, pp. 385-394.
    • (2008) ISER , pp. 385-394
    • Abbeel, P.1    Coates, A.2    Hunter, T.3    Ng, A.Y.4
  • 3
    • 77955809093
    • Autonomous helicopter aerobatics through apprenticeship learning
    • Abbeel, P., Coates, A., & Ng, A. Y. (2010). Autonomous helicopter aerobatics through apprenticeship learning. International Journal of Robotics Research, 29 (13), 1608-1639.
    • (2010) International Journal of Robotics Research , vol.29 , Issue.13 , pp. 1608-1639
    • Abbeel, P.1    Coates, A.2    Ng, A.Y.3
  • 4
    • 50249164874
    • Behavioral cloning for simulator validation
    • Springer-Verlag, Berlin, Heidelberg
    • Abbott, R. G. (2008). Behavioral Cloning for Simulator Validation. In RoboCup 2007: Robot Soccer World Cup XI, pp. 329-336. Springer-Verlag, Berlin, Heidelberg.
    • (2008) RoboCup 2007: Robot Soccer World Cup XI , pp. 329-336
    • Abbott, R.G.1
  • 5
    • 0000217085
    • Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms
    • Aha, D. W. (1992). Tolerating Noisy, Irrelevant and Novel Attributes in Instance-Based Learning Algorithms. International Journal of Man-Machine Studies, 36 (2), 267-287.
    • (1992) International Journal of Man-Machine Studies , vol.36 , Issue.2 , pp. 267-287
    • Aha, D.W.1
  • 6
    • 0025725905
    • Instance-based learning algorithms
    • Aha, D. W., & Kibler, D. (1991). Instance-based learning algorithms. In Machine Learning, pp. 37-66.
    • (1991) Machine Learning , pp. 37-66
    • Aha, D.W.1    Kibler, D.2
  • 9
    • 84901708832
    • Case-based reasoning: Survey and future directions
    • Puppe, F. (Ed.), Vol. 1570 of Lecture Notes in Computer Science, Springer
    • Bartsch-Spörl, B., Lenz, M., & Hübner, A. (1999). Case-based reasoning: Survey and future directions. In Puppe, F. (Ed.), XPS, Vol. 1570 of Lecture Notes in Computer Science, pp. 67-89. Springer.
    • (1999) XPS , pp. 67-89
    • Bartsch-Spörl, B.1    Lenz, M.2    Hübner, A.3
  • 10
    • 70350352555
    • Improving reinforcement learning by using case-based heuristics
    • Lecture Notes in Artificial Intelligence, Springer
    • Bianchi, R., Ros, R., & de Mántaras, R. L. (2009). Improving reinforcement learning by using case-based heuristics. Lecture Notes in Artificial Intelligence, Vol. 5650, pp. 75-89. Springer.
    • (2009) Lecture Notes in Artificial Intelligence , vol.5650 , pp. 75-89
    • Bianchi, R.1    Ros, R.2    De Mántaras, R.L.3
  • 12
    • 84875147306
    • Proceedings of the workshop on value function approximation, machine learning conference 1995
    • Boyan, J., Moore, A., & Sutton, R. (1995). Proceedings of the workshop on value function approximation, machine learning conference 1995. Technical Report CMU-CS-95-206.
    • (1995) Technical Report CMU-CS-95-206
    • Boyan, J.1    Moore, A.2    Sutton, R.3
  • 15
    • 0007512578
    • Truncating temporal differences: On the efficient implementation of td(lambda) for reinforcement learning
    • Cichosz, P. (1995). Truncating temporal differences: On the efficient implementation of td(lambda) for reinforcement learning. Journal of Artificial Intelligence Research (JAIR), 2, 287-318.
    • (1995) Journal of Artificial Intelligence Research (JAIR) , vol.2 , pp. 287-318
    • Cichosz, P.1
  • 17
    • 0033077715
    • Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes
    • Coraluppi, S. P., & Marcus, S. I. (1999). Risk-Sensitive and Minimax Control of Discrete-Time, Finite-State Markov Decision Processes. Automatica, 35, 301-309.
    • (1999) Automatica , vol.35 , pp. 301-309
    • Coraluppi, S.P.1    Marcus, S.I.2
  • 20
    • 4444312102
    • Integrating guidance into relational reinforcement learning
    • Driessens, K., & Džeroski, S. (2004). Integrating guidance into relational reinforcement learning. Machine Learning, 57 (3), 271-304.
    • (2004) Machine Learning , vol.57 , Issue.3 , pp. 271-304
    • Driessens, K.1    Džeroski, S.2
  • 21
    • 39549117816
    • Local feature weighting in nearest prototype classification
    • Fernandez, F., & Isasi, P. (2008). Local feature weighting in nearest prototype classification. IEEE Transactions on Neural Networks, 19 (1), 40-53.
    • (2008) IEEE Transactions on Neural Networks , vol.19 , Issue.1 , pp. 40-53
    • Fernandez, F.1    Isasi, P.2
  • 28
    • 31144477417
    • Risk-sensitive reinforcement learning applied to control under constraints
    • Geibel, P., & Wysotzki, F. (2005). Risk-sensitive Reinforcement Learning Applied to Control under Constraints. Journal of Artificial Intelligence Research (JAIR), 24, 81-108.
    • (2005) Journal of Artificial Intelligence Research (JAIR) , vol.24 , pp. 81-108
    • Geibel, P.1    Wysotzki, F.2
  • 33
    • 84867438662
    • Essex wizards 2001 team description
    • Birk, A., Coradeschi, S., & Tadokoro, S. (Eds.), Vol. 2377 of Lecture Notes in Computer Science, Springer
    • Hu, H., Kostiadis, K., Hunter, M., & Kalyviotis, N. (2001). Essex wizards 2001 team description. In Birk, A., Coradeschi, S., & Tadokoro, S. (Eds.), RoboCup, Vol. 2377 of Lecture Notes in Computer Science, pp. 511-514. Springer.
    • (2001) RoboCup , pp. 511-514
    • Hu, H.1    Kostiadis, K.2    Hunter, M.3    Kalyviotis, N.4
  • 38
    • 80955137547
    • Neuroevolutionary reinforcement learning for generalized control of simulated helicopters
    • Koppejan, R., & Whiteson, S. (2011). Neuroevolutionary reinforcement learning for generalized control of simulated helicopters. Evolutionary Intelligence, 4, 219-241.
    • (2011) Evolutionary Intelligence , vol.4 , pp. 219-241
    • Koppejan, R.1    Whiteson, S.2
  • 41
    • 9444276079
    • Reinforcement learning for average reward zero-sum games
    • Shawe-Taylor, J., & Singer, Y. (Eds.), Vol. 3120 of Lecture Notes in Computer Science, Springer
    • Mannor, S. (2004). Reinforcement learning for average reward zero-sum games. In Shawe-Taylor, J., & Singer, Y. (Eds.), COLT, Vol. 3120 of Lecture Notes in Computer Science, pp. 49-63. Springer.
    • (2004) COLT , pp. 49-63
    • Mannor, S.1
  • 44
    • 0036832952
    • Risk-Sensitive reinforcement learning
    • Mihatsch, O., & Neuneier, R. (2002). Risk-Sensitive reinforcement learning. Machine Learning, 49 (2-3), 267-290.
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 267-290
    • Mihatsch, O.1    Neuneier, R.2
  • 48
    • 3042583887
    • Autonomous helicopter flight via reinforcement learning
    • Thrun, S., Saul, L. K., & Schölkopf, B. (Eds.), MIT Press
    • Ng, A. Y., Kim, H. J., Jordan, M. I., & Sastry, S. (2003). Autonomous Helicopter Flight via Reinforcement Learning. In Thrun, S., Saul, L. K., & Schölkopf, B. (Eds.), NIPS. MIT Press.
    • (2003) NIPS
    • Ng, A.Y.1    Kim, H.J.2    Jordan, M.I.3    Sastry, S.4
  • 50
    • 0242667271
    • Genetic programming with user-driven selection: Experiments on the evolution of algorithms for image enhancement
    • Morgan Kaufmann
    • Poli, R., & Cagnoni, S. (1997). Genetic programming with user-driven selection: Experiments on the evolution of algorithms for image enhancement. In Genetic Programming 1997: Proceedings of the Second Annual Conference, pp. 269-277. Morgan Kaufmann.
    • (1997) Genetic Programming 1997: Proceedings of the Second Annual Conference , pp. 269-277
    • Poli, R.1    Cagnoni, S.2
  • 52
    • 0031231885
    • Experiments with reinforcement learning in problems with continuous state and action spaces
    • Santamaría, J. C., Sutton, R. S., & Ram, A. (1998). Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior, 6, 163-218.
    • (1998) Adaptive Behavior , vol.6 , pp. 163-218
    • Santamaría, J.C.1    Sutton, R.S.2    Ram, A.3
  • 55
    • 0001898381
    • Practical reinforcement learning in continuous spaces
    • Morgan Kaufmann
    • Smart, W. D., & Kaelbling, L. P. (2000). Practical reinforcement learning in continuous spaces. In Artificial Intelligence, pp. 903-910. Morgan Kaufmann.
    • (2000) Artificial Intelligence , pp. 903-910
    • Smart, W.D.1    Kaelbling, L.P.2
  • 56
    • 0036058423
    • Effective reinforcement learning for mobile robots
    • IEEE
    • Smart, W. D., & Kaelbling, L. P. (2002). Effective reinforcement learning for mobile robots. In ICRA, pp. 3404-3410. IEEE.
    • (2002) ICRA , pp. 3404-3410
    • Smart, W.D.1    Kaelbling, L.P.2
  • 62
    • 0033362601
    • Evolving artificial neural networks
    • Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87, 1423-1447.
    • (1999) Proceedings of the IEEE , vol.87 , pp. 1423-1447
    • Yao, X.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.