Volume 12, Issue 4-5, 1999, Pages 727-753

Multi-agent reinforcement learning: Weighting and partitioning

Author keywords

Averaging; Gating; Neural networks; Partitioning; Reinforcement learning; Weighting

Indexed keywords

APPROXIMATION THEORY; ARTIFICIAL INTELLIGENCE; COMPUTATIONAL COMPLEXITY; FUNCTIONS; HEURISTIC METHODS; MATHEMATICAL MODELS; NEURAL NETWORKS; STATE SPACE METHODS;

EID: 0032772352     PISSN: 08936080     EISSN: None     Source Type: Journal    
DOI: 10.1016/S0893-6080(99)00024-6     Document Type: Article
Times cited: (53)

References (70)
  • 1
    • 0016556021
    • A new approach to manipulator control: The cerebellar model articulation control
    • Albus J. A new approach to manipulator control: the cerebellar model articulation control. Journal of Dynamic Systems, Measurement, and Control. 97:1975;270-277.
    • (1975) Journal of Dynamic Systems, Measurement, and Control , vol.97 , pp. 270-277
    • Albus, J.1
  • 3
    • 0003787146
    • Princeton, NJ: Princeton University Press
    • Bellman R. Dynamic programming. 1957;Princeton University Press, Princeton, NJ.
    • (1957) Dynamic Programming
    • Bellman, R.1
  • 6
    • 85153940465
    • Generalization in reinforcement learning: Safely approximating the value function
    • J. Tesauro, D. Touretzky, T. Leen. Cambridge, MA: MIT Press
    • Boyan J., Moore A. Generalization in reinforcement learning: safely approximating the value function. Tesauro J., Touretzky D., Leen T. Neural Information Processing Systems. 7:1995;369 MIT Press, Cambridge, MA.
    • (1995) Neural Information Processing Systems , vol.7 , pp. 369
    • Boyan, J.1    Moore, A.2
  • 7
    • 0030211964
    • Bagging predictors
    • Breiman L. Bagging predictors. Machine Learning. 24:1996;123-140.
    • (1996) Machine Learning , vol.24 , pp. 123-140
    • Breiman, L.1
  • 8
    • 0030196364
    • Stacked regressions
    • Breiman L. Stacked regressions. Machine Learning. 24:1996;49-64.
    • (1996) Machine Learning , vol.24 , pp. 49-64
    • Breiman, L.1
  • 9
    • 0003619255
    • Bias, variance and arcing classifiers
    • Berkeley: University of California
    • Breiman, L. (1996c). Bias, variance and arcing classifiers. Technical Report 460. Berkeley: University of California.
    • (1996) Technical Report , vol.460
    • Breiman, L.1
  • 11
    • 0026998041
    • Reinforcement learning with perceptual aliasing: The perceptual distinction approach
    • San Francisco, CA: Morgan Kaufmann. pp. 183-188
    • Chrisman L. Reinforcement learning with perceptual aliasing: the perceptual distinction approach. Proceedings of AAAI. 1993;Morgan Kaufmann, San Francisco, CA. pp. 183-188.
    • (1993) Proceedings of AAAI
    • Chrisman, L.1
  • 14
    • 0345064332
    • Ant-Q: A reinforcement learning approach to combinatorial optimization
    • Belgium: Universite Libre de Bruxelles
    • Dorigo, M., and Gambardella, L. (1995). Ant-Q: a reinforcement learning approach to combinatorial optimization. Technical Report 95-01. Belgium: Universite Libre de Bruxelles.
    • (1995) Technical Report 95-01
    • Dorigo, M.1    Gambardella, L.2
  • 15
    • 0000201141
    • Improving regressors using boosting techniques
    • San Francisco, CA: Morgan Kaufmann. pp. 107-115
    • Drucker H. Improving regressors using boosting techniques. Proceedings of ICML'97. 1997;Morgan Kaufmann, San Francisco, CA. pp. 107-115.
    • (1997) Proceedings of ICML'97
    • Drucker, H.1
  • 17
    • 0002978642
    • Experiments with a new boosting algorithm
    • San Francisco, CA: Morgan Kaufmann. pp. 148-156
    • Freund Y., Schapire R. Experiments with a new boosting algorithm. Proceedings of ICML'96. 1996;Morgan Kaufmann, San Francisco, CA. pp. 148-156.
    • (1996) Proceedings of ICML'96
    • Freund, Y.1    Schapire, R.2
  • 20
    • 0007214322
    • W-learning: A simple RL-based society of mind
    • Cambridge, UK: University of Cambridge, Computer Laboratory
    • Humphrys, M. (1996). W-learning: a simple RL-based society of mind. Technical report 362, Cambridge, UK: University of Cambridge, Computer Laboratory.
    • (1996) Technical Report , vol.362
    • Humphrys, M.1
  • 21
    • 0031568357
    • Bias/variance analysis of mixtures-of-experts architectures
    • Jacobs R. Bias/variance analysis of mixtures-of-experts architectures. Neural Computation. 9:1997;369-383.
    • (1997) Neural Computation , vol.9 , pp. 369-383
    • Jacobs, R.1
  • 23
    • 0000262562
    • Hierarchical mixtures of experts and the EM algorithm
    • Jordan M., Jacobs R. Hierarchical mixtures of experts and the EM algorithm. Neural Computation. 6:1994;181-214.
    • (1994) Neural Computation , vol.6 , pp. 181-214
    • Jordan, M.1    Jacobs, R.2
  • 25
    • 85054435084
    • Neural network ensembles, cross validation, and active learning
    • Cambridge, MA: MIT Press. pp. 231-238
    • Krogh A., Vedelsby J. Neural network ensembles, cross validation, and active learning. Neural Information Processing Systems. 1995;MIT Press, Cambridge, MA. pp. 231-238.
    • (1995) Neural Information Processing Systems
    • Krogh, A.1    Vedelsby, J.2
  • 27
    • 0026852133
    • Theory and development of higher-order CMAC neural networks
    • Lane, S., Handelman, D., & Gelfand, J. (1992). Theory and development of higher-order CMAC neural networks. IEEE Control Systems, pp. 23-31.
    • (1992) IEEE Control Systems , pp. 23-31
    • Lane, S.1    Handelman, D.2    Gelfand, J.3
  • 28
    • 0000123778
    • Self-improving reactive agents based on reinforcement learning, planning, and teaching
    • Lin L. Self-improving reactive agents based on reinforcement learning, planning, and teaching. Machine Learning. 8:1992;293-321.
    • (1992) Machine Learning , vol.8 , pp. 293-321
    • Lin, L.1
  • 29
    • 0002289220
    • Pruning adaptive boosting
    • San Francisco, CA: Morgan Kaufmann. pp. 211-218
    • Margineantu D., Dietterich T. Pruning adaptive boosting. Proceedings of ICML. 1997;Morgan Kaufmann, San Francisco, CA. pp. 211-218.
    • (1997) Proceedings of ICML
    • Margineantu, D.1    Dietterich, T.2
  • 31
    • 0002242826
    • Learning to use selective attention and short-term memory in sequential tasks
    • Cambridge, MA: MIT Press. pp. 315-324
    • McCallum A. Learning to use selective attention and short-term memory in sequential tasks. Proceedings of the conference on Simulation of Adaptive Behavior. 1996;MIT Press, Cambridge, MA. pp. 315-324.
    • (1996) Proceedings of the Conference on Simulation of Adaptive Behavior
    • McCallum, A.1
  • 32
    • 85153941282
    • Bias, variance and the combination of least squares estimators
    • Cambridge, MA: MIT Press. pp. 295-302
    • Meir R. Bias, variance and the combination of least squares estimators. Neural Information Processing Systems. 1995;MIT Press, Cambridge, MA. pp. 295-302.
    • (1995) Neural Information Processing Systems
    • Meir, R.1
  • 33
    • 0030352275
    • Reducing variance of committee prediction with resampling techniques
    • Parmanto B., Munro P., Doyle H. Reducing variance of committee prediction with resampling techniques. Connection Science. 8:(3/4):1996;405-426.
    • (1996) Connection Science , vol.8 , Issue.3-4 , pp. 405-426
    • Parmanto, B.1    Munro, P.2    Doyle, H.3
  • 36
    • 0025490985
    • Networks for approximation and learning
    • Poggio T., Girosi F. Networks for approximation and learning. Proceedings of IEEE. 78:(9):1990;1481-1497.
    • (1990) Proceedings of IEEE , vol.78 , Issue.9 , pp. 1481-1497
    • Poggio, T.1    Girosi, F.2
  • 37
    • 33744584654
    • Inductive learning of decision trees
    • Quinlan R. Inductive learning of decision trees. Machine Learning. 1:1986;81-106.
    • (1986) Machine Learning , vol.1 , pp. 81-106
    • Quinlan, R.1
  • 38
    • 0030370417
    • Bagging, Boosting and C4.5
    • San Francisco, CA: Morgan Kaufmann. pp. 725-730
    • Quinlan R. Bagging, Boosting and C4.5. Proceedings of AAAI'96. 1996;Morgan Kaufmann, San Francisco, CA. pp. 725-730.
    • (1996) Proceedings of AAAI'96
    • Quinlan, R.1
  • 39
    • 0030374103
    • Bootstrapping with noise: An effective regularization technique
    • Raviv Y., Intrator N. Bootstrapping with noise: an effective regularization technique. Connection Science. 8:(3/4):1996;355-372.
    • (1996) Connection Science , vol.8 , Issue.3-4 , pp. 355-372
    • Raviv, Y.1    Intrator, N.2
  • 40
    • 13444280906
    • Learning goal-decomposition rules using exercises
    • San Francisco, CA: Morgan Kaufmann. pp. 278-286
    • Reddy C., Tadepalli P. Learning goal-decomposition rules using exercises. Proceedings of ICML'97. 1997;Morgan Kaufmann, San Francisco, CA. pp. 278-286.
    • (1997) Proceedings of ICML'97
    • Reddy, C.1    Tadepalli, P.2
  • 42
    • 0030367578
    • Ensemble learning using decorrelated neural networks
    • Rosen B. Ensemble learning using decorrelated neural networks. Connection Science. 8:(3/4):1996;373-384.
    • (1996) Connection Science , vol.8 , Issue.3-4 , pp. 373-384
    • Rosen, B.1
  • 43
    • 0026118624
    • Tree-structured adaptive networks for function approximation in high-dimensional spaces
    • Sanger T. Tree-structured adaptive networks for function approximation in high-dimensional spaces. IEEE Transactions on Neural Networks. 2:(2):1991;285-293.
    • (1991) IEEE Transactions on Neural Networks , vol.2 , Issue.2 , pp. 285-293
    • Sanger, T.1
  • 44
    • 84964009081
    • From isolation to cooperation: An alternative view of a system of experts
    • Cambridge, MA: MIT Press. pp. 605-611
    • Schaal S., Atkeson C. From isolation to cooperation: an alternative view of a system of experts. Advances in Neural Information Processing Systems. 1996;MIT Press, Cambridge, MA. pp. 605-611.
    • (1996) Advances in Neural Information Processing Systems
    • Schaal, S.1    Atkeson, C.2
  • 47
    • 85153965130
    • Reinforcement learning with soft state aggregation
    • S.L. Hanson, J.C. Cowan, & L. Giles. San Mateo, CA: Morgan Kaufmann
    • Singh S., Jaakkola T., Jordan M. Reinforcement learning with soft state aggregation. Hanson S.L., Cowan J.C., Giles L. Advances in Neural Information Processing Systems. 1994;Morgan Kaufmann, San Mateo, CA.
    • (1994) Advances in Neural Information Processing Systems
    • Singh, S.1    Jaakkola, T.2    Jordan, M.3
  • 48
    • 0345064317
    • Planning from reinforcement learning
    • Tuscaloosa, AL: University of Alabama
    • Sun, R. (1997). Planning from reinforcement learning. Technical report TR-CS-97-0027, Tuscaloosa, AL: University of Alabama.
    • (1997) Technical Report TR-CS-97-0027
    • Sun, R.1
  • 50
    • 0003773492
    • A hybrid agent architecture for reactive sequential decision making
    • R. Sun, & F. Alexandre. Hillsdale, NJ: Lawrence Erlbaum Associates
    • Sun R., Peterson T. A hybrid agent architecture for reactive sequential decision making. Sun R., Alexandre F. Connectionist-symbolic integration. 1997;Lawrence Erlbaum Associates, Hillsdale, NJ.
    • (1997) Connectionist-symbolic Integration
    • Sun, R.1    Peterson, T.2
  • 52
    • 0032203235
    • Some experiments with a hybrid model for learning sequential decision making
    • Sun R., Peterson T. Some experiments with a hybrid model for learning sequential decision making. Information Sciences. 111:1998;83-107.
    • (1998) Information Sciences , vol.111 , pp. 83-107
    • Sun, R.1    Peterson, T.2
  • 53
    • 0001842850
    • Bottom-up skill learning in reactive sequential decision tasks
    • Hillsdale, NJ: Lawrence Erlbaum Associates. pp. 684-690
    • Sun R., Peterson T., Merrill E. Bottom-up skill learning in reactive sequential decision tasks. Proceedings of 18th Cognitive Science Society Conference. 1996;Lawrence Erlbaum Associates, Hillsdale, NJ. pp. 684-690.
    • (1996) Proceedings of 18th Cognitive Science Society Conference
    • Sun, R.1    Peterson, T.2    Merrill, E.3
  • 55
    • 0000723997
    • Generalization in reinforcement learning: Successful examples using sparse coarse coding
    • Cambridge, MA: MIT Press
    • Sutton R. Generalization in reinforcement learning: successful examples using sparse coarse coding. Neural Information Processing Systems. 8:1996;MIT Press, Cambridge, MA.
    • (1996) Neural Information Processing Systems , vol.8
    • Sutton, R.1
  • 57
    • 0021892282
    • Fuzzy identification of systems and its applications to modeling and control
    • Takagi T., Sugeno M. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems Man and Cybernetics. 15:(1):1985;116-132.
    • (1985) IEEE Transactions on Systems Man and Cybernetics , vol.15 , Issue.1 , pp. 116-132
    • Takagi, T.1    Sugeno, M.2
  • 58
    • 0000078841
    • Averaging regularized estimators
    • Taniguchi M., Tresp V. Averaging regularized estimators. Neural Computation. 9:1997;1163-1178.
    • (1997) Neural Computation , vol.9 , pp. 1163-1178
    • Taniguchi, M.1    Tresp, V.2
  • 59
    • 0029390263
    • Reinforcement learning of multiple tasks using a hierarchical CMAC architecture
    • Tham C. Reinforcement learning of multiple tasks using a hierarchical CMAC architecture. Robotics and Autonomous Systems. 15:1995;247-274.
    • (1995) Robotics and Autonomous Systems , vol.15 , pp. 247-274
    • Tham, C.1
  • 60
  • 61
    • 0040639069
    • Stacking bagged and dagged models
    • San Francisco, CA: Morgan Kaufmann. pp. 367-375
    • Ting W.K., Witten I. Stacking bagged and dagged models. Proceedings of ICML'97. 1997;Morgan Kaufmann, San Francisco, CA. pp. 367-375.
    • (1997) Proceedings of ICML'97
    • Ting, W.K.1    Witten, I.2
  • 62
    • 85153970023
    • Combining estimators using non-constant weighting functions
    • Cambridge, MA: MIT Press. pp. 419-426
    • Tresp V., Taniguchi M. Combining estimators using non-constant weighting functions. Neural Information Processing Systems. 7:1995;MIT Press, Cambridge, MA. pp. 419-426.
    • (1995) Neural Information Processing Systems , vol.7
    • Tresp, V.1    Taniguchi, M.2
  • 63
    • 0030365938
    • Error correlation and error reduction in ensemble classifiers
    • Tumer K., Ghosh J. Error correlation and error reduction in ensemble classifiers. Connection Science. 8:(3/4):1996;385-404.
    • (1996) Connection Science , vol.8 , Issue.3-4 , pp. 385-404
    • Tumer, K.1    Ghosh, J.2
  • 66
    • 0004049895
    • PhD Thesis, Cambridge, UK: Cambridge University
    • Watkins, C. (1989). Learning with delayed rewards. PhD Thesis, Cambridge, UK: Cambridge University.
    • (1989) Learning with Delayed Rewards
    • Watkins, C.1
  • 68
    • 85158158334
    • A complexity analysis of cooperative mechanisms in reinforcement learning
    • San Francisco, CA: Morgan Kaufmann. pp. 607-613
    • Whitehead A. A complexity analysis of cooperative mechanisms in reinforcement learning. Proceedings of the AAAI'93. 1993;Morgan Kaufmann, San Francisco, CA. pp. 607-613.
    • (1993) Proceedings of the AAAI'93
    • Whitehead, A.1
  • 69
    • 0026692226
    • Stacked generalization
    • Wolpert D. Stacked generalization. Neural Networks. 5:1992;241-259.
    • (1992) Neural Networks , vol.5 , pp. 241-259
    • Wolpert, D.1
  • 70
    • 85140116568
    • An alternative model for mixtures of experts
    • Cambridge, MA: MIT Press. pp. 633-640
    • Xu L., Jordan M., Hinton G. An alternative model for mixtures of experts. Neural Information Processing Systems. 7:1995;MIT Press, Cambridge, MA. pp. 633-640.
    • (1995) Neural Information Processing Systems , vol.7
    • Xu, L.1    Jordan, M.2    Hinton, G.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.