1. Atlas, D. (1982). Adaptively pointing spaceborne radar for precipitation measurements. Journal of Applied Meteorology, 21, 429-443.
2. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In A. Prieditis & S. J. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 30-37). 9-12 July 1995. Tahoe City, CA/San Francisco: Morgan Kaufmann.
3. Baxter, J., & Bartlett, P. L. (2000). Reinforcement learning in POMDP via direct gradient ascent. Proceedings of the 17th International Conference on Machine Learning (pp. 41-48). 29 June-2 July 2000. Stanford, CA/San Francisco: Morgan Kaufmann.
4. Bellman, R. E. (1957). Dynamic programming (342 pp.). Princeton, NJ: Princeton University Press.
7. Bertsimas, D., & Patterson, S. S. (1998). The air traffic flow management problem with enroute capacities. Operations Research, 46, 406-422.
8. Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. Proceedings of the Tenth National Conference on Artificial Intelligence (pp. 183-188). 12-16 July 1992. San Jose/Menlo Park, CA: AAAI Press.
9. Dayan, P., & Sejnowski, T. (1994). TD(0) converges with probability 1. Machine Learning, 14, 295-301.
10. Evans, J. E., Weber, M. E., & Moser, W. R. (2006). Integrating advanced weather forecast technologies into air traffic management decision support. Lincoln Laboratory Journal, 16, 81-96.
12. Jaakkola, T., Jordan, M., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6, 1185-1201.
13. Jaakkola, T., Singh, S., & Jordan, M. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. In G. Tesauro, D. S. Touretzky, & T. Leen (Eds.), Advances in neural information processing systems: Proceedings of the 1994 Conference (pp. 345-352). Cambridge, MA: MIT Press.
15. Krozel, J., Andre, A. D., & Smith, P. (2006). Future air traffic management requirements for dynamic weather avoidance routing. Preprints, 25th Digital Avionics Systems Conference (pp. 1-9). October 2006. Portland, OR: IEEE/AIAA.
17. Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28, 47-66.
18. McLaughlin, D. J., Chandrasekar, V., Droegemeier, K., Frasier, S., Kurose, J., Junyent, F., et al. (2005). Distributed Collaborative Adaptive Sensing (DCAS) for improved detection, understanding, and prediction of atmospheric hazards. Preprints-CD, AMS Ninth Symposium on Integrated Observing and Assimilation Systems for the Atmosphere, Oceans, and Land Surface. 10-13 January 2005. Paper 11.3. San Diego, CA.
20. Peng, J., & Williams, R. J. (1996). Incremental multi-step Q-learning. Machine Learning, 22, 283-290.
21. Precup, D., Sutton, R. S., & Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In C. E. Brodley & A. P. Danyluk (Eds.), Proceedings of the 18th International Conference on Machine Learning (pp. 417-424). 28 June-1 July 2001. Williamstown, MA/San Francisco, CA: Morgan Kaufmann.
24. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 211-229.
25. Si, J., Barto, A. G., Powell, W. B., & Wunsch, D. (Eds.). (2004). Handbook of learning and approximate dynamic programming (644 pp.). Piscataway, NJ: Wiley-Interscience.
26. Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22, 123-158.
27. Singh, S. P., Jaakkola, T., Littman, M. L., & Szepesvari, C. (2000). Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38, 287-308. doi:10.1023/A:1007678930559
29. Tadic, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42, 241-267. doi:10.1023/A:1007609817671
30. Tsitsiklis, J. N. (2002). On the convergence of optimistic policy iteration. Journal of Machine Learning Research, 3, 59-72.
31. Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
32. Turing, A. M. (1948). Intelligent machinery, National Physical Laboratory report. In D. C. Ince (Ed.), Collected works of A. M. Turing: Mechanical intelligence (227 pp.). New York: Elsevier Science, 1992.
33. Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.
34. Watkins, C. J. C. H. (1989). Learning from delayed rewards. Ph.D. thesis, King's College, Cambridge University, Cambridge, 234 pp.
37. Williams, J. K., & Singh, S. (1999). Experimental results on learning stochastic memoryless policies for partially observable Markov decision processes. In M. S. Kearns, S. A. Solla, & D. A. Cohn (Eds.), Advances in neural information processing systems 11: Proceedings of the 1998 Conference (pp. 1073-1079). Cambridge, MA: MIT Press.