Volume 19, 2003, Pages 569-629

Accelerating reinforcement learning through implicit imitation

Author keywords

[No Author keywords available]

Indexed keywords

FORMAL LOGIC; MATHEMATICAL MODELS; MULTI AGENT SYSTEMS;

EID: 27344432348     PISSN: 10769757     EISSN: 10769757     Source Type: Journal    
DOI: 10.1613/jair.898     Document Type: Article
Times cited : (148)

References (73)
  • 5
    • Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton.
  • 9
    • Billard, A., & Hayes, G. (1999). Drama, a connectionist architecture for control and learning in autonomous robots. Adaptive Behavior Journal, 7, 35-64.
  • 12
    • Boutilier, C., Dean, T., & Hanks, S. (1999). Decision theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94.
  • 15
    • Byrne, R. W., & Russon, A. E. (1998). Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences, 21, 667-721.
  • 19
    • Crites, R., & Barto, A. G. (1998). Elevator group control using multiple reinforcement learning agents. Machine Learning, 33(2-3), 235-262.
  • 21
    • Dearden, R., & Boutilier, C. (1997). Abstraction and approximate decision theoretic planning. Artificial Intelligence, 89, 219-283.
  • 26
    • Fiorito, G., & Scotto, P. (1992). Observational learning in Octopus vulgaris. Science, 256, 545-547.
  • 27
    • Forbes, J., & Andre, D. (2000). Practical reinforcement learning in continuous domains. Tech. rep. UCB/CSD-00-1109, Computer Science Division, University of California, Berkeley.
  • 28
    • Friedrich, H., Munch, S., Dillmann, R., Bocionek, S., & Sassin, M. (1996). Robot programming by demonstration (RPD): Supporting the induction by human interaction. Machine Learning, 23, 163-189.
  • 34
    • Kuniyoshi, Y., Inaba, M., & Inoue, H. (1994). Learning by watching: Extracting reusable task knowledge from visual observation of human performance. IEEE Transactions on Robotics and Automation, 10(6), 799-822.
  • 36
    • Lieberman, H. (1993). Mondrian: A teachable graphical editor. In Cypher, A. (Ed.), Watch What I Do: Programming by Demonstration, pp. 340-358. MIT Press, Cambridge, MA.
  • 38
    • Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8, 293-321.
  • 40
    • Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28, 47-66.
  • 41
    • Mataric, M. J. (1998). Using communication to reduce locality in distributed multi-agent learning. Journal of Experimental and Theoretical Artificial Intelligence, 10(3), 357-369.
  • 43
    • Meuleau, N., & Bourgine, P. (1999). Exploration of multi-state environments: Local measures and back-propagation of uncertainty. Machine Learning, 32(2), 117-154.
  • 45
    • Michie, D. (1993). Knowledge, learning and machine intelligence. In Sterling, L. (Ed.), Intelligent Systems. Plenum Press, New York.
  • 47
    • Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13(1), 103-130.
  • 52
    • Oliphant, M. (1999). Cultural transmission of communications systems: Comparing observational and reinforcement learning models. In Proceedings of the AISB'99 Symposium on Imitation in Animals and Artifacts, pp. 47-54, Edinburgh.
  • 55
    • Russon, A., & Galdikas, B. (1993). Imitation in free-ranging rehabilitant orangutans (Pongo pygmaeus). Journal of Comparative Psychology, 107(2), 147-161.
  • 61
    • Singh, S. P., & Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems, pp. 974-980. MIT Press, Cambridge, MA.
  • 62
    • Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21, 1071-1088.
  • 64
    • Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.
  • 66
    • Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In ICML-93, pp. 330-337.
  • 70
    • Visalberghi, E., & Fragazy, D. (1990). Do monkeys ape? In Parker, S., & Gibson, K. (Eds.), Language and Intelligence in Monkeys and Apes, pp. 247-273. Cambridge University Press, Cambridge.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.