Volume 19, 2003, Pages 569-629

Accelerating reinforcement learning through implicit imitation

Author keywords

[No Author keywords available]

Indexed keywords

FORMAL LOGIC; MATHEMATICAL MODELS; MULTI AGENT SYSTEMS;

EID: 27344432348     PISSN: 10769757     EISSN: 10769757     Source Type: Journal    
DOI: 10.1613/jair.898     Document Type: Article
Times cited : (148)

References (73)
  • 5
    • Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton.
  • 9
    • Billard, A., & Hayes, G. (1999). Drama, a connectionist architecture for control and learning in autonomous robots. Adaptive Behavior Journal, 7, 35-64.
  • 12
    • Boutilier, C., Dean, T., & Hanks, S. (1999). Decision theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94.
  • 15
    • Byrne, R. W., & Russon, A. E. (1998). Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences, 21, 667-721.
  • 19
    • Crites, R., & Barto, A. G. (1998). Elevator group control using multiple reinforcement learning agents. Machine Learning, 33(2-3), 235-262.
  • 21
    • Dearden, R., & Boutilier, C. (1997). Abstraction and approximate decision theoretic planning. Artificial Intelligence, 89, 219-283.
  • 26
    • Fiorito, G., & Scotto, P. (1992). Observational learning in Octopus vulgaris. Science, 256, 545-547.
  • 27
    • Forbes, J., & Andre, D. (2000). Practical reinforcement learning in continuous domains. Tech. rep. UCB/CSD-00-1109, Computer Science Division, University of California, Berkeley.
  • 28
    • Friedrich, H., Munch, S., Dillmann, R., Bocionek, S., & Sassin, M. (1996). Robot programming by demonstration (RPD): Supporting the induction by human interaction. Machine Learning, 23, 163-189.
  • 34
    • Kuniyoshi, Y., Inaba, M., & Inoue, H. (1994). Learning by watching: Extracting reusable task knowledge from visual observation of human performance. IEEE Transactions on Robotics and Automation, 10(6), 799-822.
  • 36
    • Lieberman, H. (1993). Mondrian: A teachable graphical editor. In Cypher, A. (Ed.), Watch What I Do: Programming by Demonstration, pp. 340-358. MIT Press, Cambridge, MA.
  • 38
    • Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8, 293-321.
  • 40
    • Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28, 47-66.
  • 41
    • Mataric, M. J. (1998). Using communication to reduce locality in distributed multi-agent learning. Journal of Experimental and Theoretical Artificial Intelligence, 10(3), 357-369.
  • 43
    • Meuleau, N., & Bourgine, P. (1999). Exploration of multi-state environments: Local measures and back-propagation of uncertainty. Machine Learning, 32(2), 117-154.
  • 45
    • Michie, D. (1993). Knowledge, learning and machine intelligence. In Sterling, L. (Ed.), Intelligent Systems. Plenum Press, New York.
  • 47
    • Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13(1), 103-130.
  • 52
    • Oliphant, M. (1999). Cultural transmission of communications systems: Comparing observational and reinforcement learning models. In Proceedings of the AISB'99 Symposium on Imitation in Animals and Artifacts, pp. 47-54, Edinburgh.
  • 55
    • Russon, A., & Galdikas, B. (1993). Imitation in free-ranging rehabilitant orangutans (Pongo pygmaeus). Journal of Comparative Psychology, 107(2), 147-161.
  • 61
    • Singh, S. P., & Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems, pp. 974-980. MIT Press, Cambridge, MA.
  • 62
    • Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21, 1071-1088.
  • 64
    • Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.
  • 66
    • Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In ICML-93, pp. 330-337.
  • 70
    • Visalberghi, E., & Fragazy, D. (1990). Do monkeys ape? In Parker, S., & Gibson, K. (Eds.), Language and Intelligence in Monkeys and Apes, pp. 247-273. Cambridge University Press, Cambridge.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.