Volume , Issue , 2011, Pages 143-150

Bayesian active learning with basis functions

Author keywords

[No Author keywords available]

Indexed keywords

ACTIVE LEARNING; APPROXIMATE DYNAMIC PROGRAMMING; BASIS FUNCTIONS; CURSE OF DIMENSIONALITY; EXPLORATION/EXPLOITATION DILEMMAS; LINEAR COMBINATIONS; NUMERICAL EXPERIMENTS; VALUE FUNCTION APPROXIMATION;

EID: 80052219755     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ADPRL.2011.5967365     Document Type: Conference Paper
Times cited : (9)

References (35)
  • 2
    • 32044460264
    • Understanding the fine structure of electricity prices
    • H. Geman and A. Roncoroni, "Understanding the Fine Structure of Electricity Prices," The Journal of Business, vol. 79, no. 3, 2006.
    • (2006) The Journal of Business , vol.79 , Issue.3
    • Geman, H.1    Roncoroni, A.2
  • 8
    • 0001046225
    • Practical issues in temporal difference learning
    • G. Tesauro, "Practical Issues in Temporal Difference Learning," Machine Learning, vol. 8, no. 3-4, pp. 257-277, 1992.
    • (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 257-277
    • Tesauro, G.1
  • 9
    • 4544257178
    • Approximating Q-values with basis function representations
    • Hillsdale, NJ, M. Mozer, D. Touretzky, and P. Smolensky, Eds.
    • P. Sabes, "Approximating Q-values with basis function representations," in Proceedings of the Fourth Connectionist Models Summer School, Hillsdale, NJ, M. Mozer, D. Touretzky, and P. Smolensky, Eds., 1993, pp. 264-271.
    • (1993) Proceedings of the Fourth Connectionist Models Summer School , pp. 264-271
    • Sabes, P.1
  • 10
    • 17444414191
    • Basis function adaptation in temporal difference reinforcement learning
    • DOI 10.1007/s10479-005-5732-z
    • I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal-difference reinforcement learning," Annals of Operations Research, vol. 134, no. 1, pp. 215-238, 2005.
    • (2005) Annals of Operations Research , vol.134 , Issue.1 , pp. 215-238
    • Menache, I.1    Mannor, S.2    Shimkin, N.3
  • 13
    • 3843131884
    • A new criterion using information gain for action selection strategy in reinforcement learning
    • K. Iwata, K. Ikeda, and H. Sakai, "A new criterion using information gain for action selection strategy in reinforcement learning," IEEE Transactions on Neural Networks, vol. 15, no. 4, pp. 792-799, 2004.
    • (2004) IEEE Transactions on Neural Networks , vol.15 , Issue.4 , pp. 792-799
    • Iwata, K.1    Ikeda, K.2    Sakai, H.3
  • 24
    • 0030590294
    • Bayesian look ahead one-stage sampling allocations for selection of the best population
    • DOI 10.1016/0378-3758(95)00169-7
    • S. Gupta and K. Miescke, "Bayesian look ahead one-stage sampling allocations for selection of the best population," Journal of Statistical Planning and Inference, vol. 54, no. 2, pp. 229-244, 1996.
    • (1996) Journal of Statistical Planning and Inference , vol.54 , Issue.2 , pp. 229-244
    • Gupta, S.S.1    Miescke, K.J.2
  • 25
    • 55549135706
    • A knowledge gradient policy for sequential information collection
    • P. I. Frazier, W. B. Powell, and S. Dayanik, "A knowledge gradient policy for sequential information collection," SIAM Journal on Control and Optimization, vol. 47, no. 5, pp. 2410-2439, 2008.
    • (2008) SIAM Journal on Control and Optimization , vol.47 , Issue.5 , pp. 2410-2439
    • Frazier, P.I.1    Powell, W.B.2    Dayanik, S.3
  • 28
    • 79951586758
    • Optimal learning of transition probabilities in the two-agent newsvendor problem
    • B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan, Eds.
    • I. O. Ryzhov, M. R. Valdez-Vivas, and W. B. Powell, "Optimal Learning of Transition Probabilities in the Two-Agent Newsvendor Problem," in Proceedings of the 2010 Winter Simulation Conference, B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan, Eds., 2010.
    • (2010) Proceedings of the 2010 Winter Simulation Conference
    • Ryzhov, I.O.1    Valdez-Vivas, M.R.2    Powell, W.B.3
  • 30
    • 79961092747
    • The knowledge-gradient algorithm for sequencing experiments in drug discovery
    • to appear
    • D. Negoescu, P. Frazier, and W. Powell, "The Knowledge-Gradient Algorithm for Sequencing Experiments in Drug Discovery," INFORMS J. on Computing (to appear), 2010.
    • (2010) INFORMS J. on Computing
    • Negoescu, D.1    Frazier, P.2    Powell, W.3
  • 31
    • 33748998787
    • Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
    • DOI 10.1007/s10994-006-8365-9
    • A. George and W. B. Powell, "Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming," Machine Learning, vol. 65, no. 1, pp. 167-198, 2006.
    • (2006) Machine Learning , vol.65 , Issue.1 , pp. 167-198
    • George, A.P.1    Powell, W.B.2
  • 33
    • 70449498873
    • The knowledge-gradient policy for correlated normal rewards
    • P. I. Frazier, W. B. Powell, and S. Dayanik, "The knowledge-gradient policy for correlated normal rewards," INFORMS J. on Computing, vol. 21, no. 4, pp. 599-613, 2009.
    • (2009) INFORMS J. on Computing , vol.21 , Issue.4 , pp. 599-613
    • Frazier, P.I.1    Powell, W.B.2    Dayanik, S.3
  • 34
    • 0000792991
    • The stochastic behavior of commodity prices: Implications for valuation and hedging
    • E. Schwartz, "The stochastic behavior of commodity prices: Implications for valuation and hedging," Journal of Finance, vol. 52, no. 3, pp. 923-973, 1997.
    • (1997) Journal of Finance , vol.52 , Issue.3 , pp. 923-973
    • Schwartz, E.S.1
  • 35
    • 77956513316
    • A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation
    • R. Sutton, C. Szepesvári, and H. Maei, "A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation," Advances in Neural Information Processing Systems, vol. 21, pp. 1609-1616, 2008.
    • (2008) Advances in Neural Information Processing Systems , vol.21 , pp. 1609-1616
    • Sutton, R.1    Szepesvári, C.2    Maei, H.3


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.