Volume 15, 2001, Pages 351-381

Experiments with infinite-horizon, policy-gradient estimation

Author keywords

[No Author keywords available]

Indexed keywords

CONJUGATE-GRADIENTS; NATURAL INTERPRETATION; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP)

EID: 0013495368     PISSN: 1076-9757     EISSN: None     Source Type: Journal
DOI: 10.1613/jair.807     Document Type: Article
Times cited: 121
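
The "infinite-horizon, policy-gradient estimation" of the title refers to the GPOMDP family of estimators developed in the companion JAIR article by Baxter and Bartlett: a discounted eligibility trace of the policy's score function is accumulated along a single trajectory, and reward-weighted traces are averaged, via z_{t+1} = beta * z_t + grad log mu(u_t | theta, y_t) and Delta_{t+1} = Delta_t + (r(x_{t+1}) z_{t+1} - Delta_t) / (t + 1). The Python sketch below illustrates that recursion; the two-state chain and logistic policy are illustrative assumptions for this record, not taken from the paper.

    import numpy as np

    def run_gpomdp(theta, beta=0.95, T=50_000, seed=0):
        # GPOMDP-style gradient estimate on an illustrative two-state chain.
        # theta[x] is the logit of taking action 1 in state x (assumed policy).
        rng = np.random.default_rng(seed)
        # P[u, x, :] = next-state distribution: action 0 tends to stay put,
        # action 1 tends to switch states.
        P = np.array([[[0.9, 0.1], [0.1, 0.9]],
                      [[0.1, 0.9], [0.9, 0.1]]])
        r = np.array([0.0, 1.0])              # reward 1 for being in state 1
        x = 0
        z = np.zeros_like(theta)              # eligibility trace z_t
        grad = np.zeros_like(theta)           # running estimate Delta_t
        for t in range(T):
            p1 = 1.0 / (1.0 + np.exp(-theta[x]))   # P(u = 1 | x)
            u = int(rng.random() < p1)
            glog = np.zeros_like(theta)
            glog[x] = u - p1                  # grad of log mu(u | theta, x)
            x = rng.choice(2, p=P[u, x])      # transition to next state
            z = beta * z + glog               # discount old credit, add new
            grad += (r[x] * z - grad) / (t + 1)    # average of r(x_{t+1}) z_{t+1}
        return grad

    print(run_gpomdp(np.zeros(2)))  # roughly [+, -]: switch out of state 0, stay in 1

The discount beta trades the bias of the estimate against its variance, which is the central tension the paper's experiments probe.
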

References (35)
  • 1. Aberdeen, D., & Baxter, J. (2001). Policy-gradient learning of controllers with internal state. Tech. rep., Australian National University.
  • 3. Bartlett, P. L., & Baxter, J. (1999). Hebbian synaptic modifications in spiking neurons that learn. Tech. rep., Research School of Information Sciences and Engineering, Australian National University. http://csl.anu.edu.au/~bartlett/papers/BartlettBaxter-Nov99.ps.gz
  • 9. Baxter, J., Tridgell, A., & Weaver, L. (2000). Learning to play chess using temporal-differences. Machine Learning, 40(3), 243-263.
  • 11. Cao, X.-R., & Chen, H.-F. (1997). Perturbation realization, potentials, and sensitivity analysis of Markov processes. IEEE Transactions on Automatic Control, 42, 1382-1393.
  • 12. Cao, X.-R., & Wan, Y.-W. (1998). Algorithms for sensitivity analysis of Markov chains through potentials and perturbation realization. IEEE Transactions on Control Systems Technology, 6, 482-492.
  • 14. Fu, M. C., & Hu, J. (1994). Smooth perturbation derivative estimation for Markov chains. Operations Research Letters, 15, 241-251.
  • 16. Kimura, H., & Kobayashi, S. (1998). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. In Fifteenth International Conference on Machine Learning, pp. 278-286.
  • 21. Marbach, P., & Tsitsiklis, J. N. (1998). Simulation-based optimization of Markov reward processes. Tech. rep., MIT.
  • 23. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229.
  • 26. Singh, S., Jaakkola, T., & Jordan, M. (1995). Reinforcement learning with soft state aggregation. In Tesauro, G., Touretzky, D., & Leen, T. (Eds.), Advances in Neural Information Processing Systems, Vol. 7. MIT Press, Cambridge, MA.
  • 27. Sutton, R. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.
  • 30. Tao, N., Baxter, J., & Weaver, L. (2001). A multi-agent, policy-gradient approach to network routing. Tech. rep., Australian National University.
  • 31. Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-278.
  • 32. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
  • 33. Weaver, L., & Baxter, J. (1999). Reinforcement learning from state and temporal differences. Tech. rep., Australian National University.
  • 34. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.
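
The "conjugate-gradients" indexed keyword reflects how such gradient estimates are typically consumed: as a noisy oracle inside conjugate-gradient ascent with a gradient-based line search. The sketch below is a generic Polak-Ribiere ascent driven by the run_gpomdp sketch above; it is a simplification under stated assumptions, not the paper's exact CONJPOMDP/GSEARCH procedure.

    def conjgrad_ascent(grad_fn, theta, iters=5):
        # Polak-Ribiere conjugate-gradient ascent on a noisy gradient oracle.
        g = grad_fn(theta)
        d = g.copy()
        for _ in range(iters):
            # Double the step until the directional derivative along d turns
            # negative, then take the midpoint -- a crude stand-in for a
            # gradient-based bracketing line search.
            step = 1.0
            while grad_fn(theta + step * d) @ d > 0 and step < 1e3:
                step *= 2.0
            theta = theta + 0.5 * step * d
            g_new = grad_fn(theta)
            # Clip the PR coefficient at zero so bad directions trigger a restart.
            pr = max(0.0, g_new @ (g_new - g) / (g @ g + 1e-12))
            d = g_new + pr * d
            g = g_new
        return theta

    theta_star = conjgrad_ascent(run_gpomdp, np.zeros(2))

Because the oracle is stochastic, the clipped Polak-Ribiere coefficient effectively restarts the search whenever noise makes successive gradients inconsistent, which keeps the ascent stable.
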


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.