Reinforcement learning of optimal controls

2009, Pages 297-327

Author keywords

[No Author keywords available]

EID: 84860531726     PISSN: None     EISSN: None     Source Type: Book    
DOI: 10.1007/978-1-4020-9119-3_15     Document Type: Chapter
Times cited: 4

References (37)
  • 1. Atlas, D. (1982). Adaptively pointing spaceborne radar for precipitation measurements. Journal of Applied Meteorology, 21, 429-443.
  • 2. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In A. Prieditis & S. J. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 30-37). 9-12 July 1995. Tahoe City, CA/San Francisco: Morgan Kaufmann.
  • 3. Baxter, J., & Bartlett, P. L. (2000). Reinforcement learning in POMDP via direct gradient ascent. Proceedings of the 17th International Conference on Machine Learning (pp. 41-48). 29 June-2 July 2000. Stanford, CA/San Francisco: Morgan Kaufmann.
  • 4. Bellman, R. E. (1957). Dynamic programming (342 pp.). Princeton, NJ: Princeton University Press.
  • 7. Bertsimas, D., & Patterson, S. S. (1998). The air traffic flow management problem with enroute capacities. Operations Research, 46(3), 406-422.
  • 8. Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. Proceedings of the Tenth National Conference on Artificial Intelligence (pp. 183-188). 12-16 July 1992. San Jose/Menlo Park, CA: AAAI Press.
  • 9. Dayan, P., & Sejnowski, T. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295-301.
  • 10. Evans, J. E., Weber, M. E., & Moser, W. R. (2006). Integrating advanced weather forecast technologies into air traffic management decision support. Lincoln Laboratory Journal, 16, 81-96.
  • 12. Jaakkola, T., Jordan, M., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6, 1185-1201.
  • 13. Jaakkola, T., Singh, S., & Jordan, M. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. In G. Tesauro, D. S. Touretzky, & T. Leen (Eds.), Advances in neural information processing systems: Proceedings of the 1994 Conference (pp. 345-352). Cambridge, MA: MIT Press.
  • 15. Krozel, J., Andre, A. D., & Smith, P. (2006). Future air traffic management requirements for dynamic weather avoidance routing. Preprints, 25th Digital Avionics Systems Conference (pp. 1-9). October 2006. Portland, OR: IEEE/AIAA.
  • 17. Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28, 47-66.
  • 20. Peng, J., & Williams, R. J. (1996). Incremental multi-step Q-learning. Machine Learning, 22, 283-290.
  • 21. Precup, D., Sutton, R. S., & Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In C. E. Brodley & A. P. Danyluk (Eds.), Proceedings of the 18th International Conference on Machine Learning (pp. 417-424). 28 June-1 July 2001. Williamstown, MA/San Francisco, CA: Morgan Kaufmann.
  • 24. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 211-229.
  • 26. Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22(1-3), 123-158.
  • 27. Singh, S. P., Jaakkola, T., Littman, M. L., & Szepesvári, C. (2000). Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3), 287-308. doi:10.1023/A:1007678930559
  • 29. Tadic, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42(3), 241-267. doi:10.1023/A:1007609817671
  • 30. Tsitsiklis, J. N. (2002). On the convergence of optimistic policy iteration. Journal of Machine Learning Research, 3, 59-72.
  • 31. Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
  • 32. Turing, A. M. (1948). Intelligent machinery (National Physical Laboratory report). Reprinted in D. C. Ince (Ed.), Collected works of A. M. Turing: Mechanical intelligence (1992, 227 pp.). New York: Elsevier Science.
  • 33. Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.
  • 34. Watkins, C. J. C. H. (1989). Learning from delayed rewards. Ph.D. thesis, King's College, Cambridge University, Cambridge (234 pp.).
  • 37. Williams, J. K., & Singh, S. (1999). Experimental results on learning stochastic memoryless policies for partially observable Markov decision processes. In M. S. Kearns, S. A. Solla, & D. A. Cohn (Eds.), Advances in neural information processing systems 11: Proceedings of the 1998 Conference (pp. 1073-1079). Cambridge, MA: MIT Press.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.