SCOPUS Information Retrieval Platform

Journal of Machine Learning Research

Volume 9, 2008, Pages 815-857

Finite-time bounds for fitted value iteration

(2) Munos, Rémi (a); Szepesvári, Csaba (b)

(a) INRIA (France)

(b) University of Alberta (Canada)

Author keywords

Discounted Markovian decision processes; Fitted value iteration; Generative model; Optimal control; Pollard's inequality; Regression; Reinforcement learning; Statistical learning theory; Supervised learning

Indexed keywords

CONVERGENCE OF NUMERICAL METHODS; ELECTRIC NETWORK ANALYSIS; MODAL ANALYSIS; RISK ASSESSMENT; ROBOT LEARNING; STATE SPACE METHODS;

CONVERGENCE RATES; FITTED VALUE ITERATION; GENERATIVE MODELING; HIGH PROBABILITY; INFINITE STATES; MARKOVIAN DECISION PROCESSES;

MULTITASKING;

EID: 44649189852 PISSN: 15324435 EISSN: 15337928 Source Type: Journal
DOI: None Document Type: Article

Times cited : (615)

References (66)

1
- 0003924391
- Cambridge University Press, Cambridge, UK
- M. Anthony and P.L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge, UK, 1999.
- (1999) Neural Network Learning: Theoretical Foundations
- Anthony, M.¹ Bartlett, P.L.²

2
- 33746032553
- A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. In G. Lugosi and H.U. Simon, editors, The Nineteenth Annual Conference on Learning Theory, COLT 2006, Proceedings, volume 4005 of LNCS/LNAI, pages 574-588, Berlin, Heidelberg, June 2006. Springer-Verlag. (Pittsburgh, PA, USA, June 22-25, 2006.).

3
- 34548752490
- Value-iteration based fitted policy iteration: Learning with a single trajectory
- IEEE, April, Honolulu, Hawaii, Apr 1-5
- A. Antos, Cs. Szepesvári, and R. Munos. Value-iteration based fitted policy iteration: learning with a single trajectory. In 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), pages 330-337. IEEE, April 2007. (Honolulu, Hawaii, Apr 1-5, 2007.).
- (2007) 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007) , pp. 330-337
- Antos, A.¹ Szepesvári, C.² Munos, R.³

4
- 40849145988
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71:89-129, 2008.
- (2008) Machine Learning , vol.71 , pp. 89-129
- Antos, A.¹ Szepesvári, C.² Munos, R.³

5
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- Armand Prieditis and Stuart Russell, editors, San Francisco, CA, Morgan Kaufmann
- Leemon C. Baird. Residual algorithms: Reinforcement learning with function approximation. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 30-37, San Francisco, CA, 1995. Morgan Kaufmann.
- (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 30-37
- Baird, L.C.¹

6
- 84968519017
- Functional approximation and dynamic programming
- R.E. Bellman and S.E. Dreyfus. Functional approximation and dynamic programming. Math. Tables and other Aids Comp., 13:247-251, 1959.
- (1959) Math. Tables and other Aids Comp , vol.13 , pp. 247-251
- Bellman, R.E.¹ Dreyfus, S.E.²

7
- 0003923091
- Academic Press, New York
- D. P. Bertsekas and S.E. Shreve. Stochastic Optimal Control (The Discrete Time Case). Academic Press, New York, 1978.
- (1978) Stochastic Optimal Control (The Discrete Time Case)
- Bertsekas, D.P.¹ Shreve, S.E.²

8
- 0003487482
- Athena Scientific, Belmont, MA
- D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, 1996.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

9
- 0001523794
- Strict stationarity of generalized autoregressive processes
- P. Bougerol and N. Picard. Strict stationarity of generalized autoregressive processes. Annals of Probability, 20:1714-1730, 1992.
- (1992) Annals of Probability , vol.20 , pp. 1714-1730
- Bougerol, P.¹ Picard, N.²

10
- 0004234484
- McGraw-Hill, London, New York
- E.W. Cheney. Introduction to Approximation Theory. McGraw-Hill, London, New York, 1966.
- (1966) Introduction to Approximation Theory
- Cheney, E.W.¹

11
- 38249024662
- The complexity of dynamic programming
- C.S. Chow and J.N. Tsitsiklis. The complexity of dynamic programming. Journal of Complexity, 5:466-488, 1989.
- (1989) Journal of Complexity , vol.5 , pp. 466-488
- Chow, C.S.¹ Tsitsiklis, J.N.²

12
- 0026206780
- An optimal multigrid algorithm for continuous state discrete time stochastic control
- C.S. Chow and J.N. Tsitsiklis. An optimal multigrid algorithm for continuous state discrete time stochastic control. IEEE Transactions on Automatic Control, 36(8):898-914, 1991.
- (1991) IEEE Transactions on Automatic Control , vol.36 , Issue.8 , pp. 898-914
- Chow, C.S.¹ Tsitsiklis, J.N.²

13
- 0003798635
- Cambridge University Press
- N. Cristianini and J. Shawe-Taylor. An introduction to support vector machines (and other kernel-based learning methods). Cambridge University Press, 2000.
- (2000) An introduction to support vector machines (and other kernel-based learning methods)
- Cristianini, N.¹ Shawe-Taylor, J.²

14
- 0003259931
- Improving elevator performance using reinforcement learning
- R.H. Crites and A.G. Barto. Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 9, 1997.
- (1997) Advances in Neural Information Processing Systems , vol.9
- Crites, R.H.¹ Barto, A.G.²

15
- 0002319896
- Nonlinear Approximation
- R. DeVore. Nonlinear Approximation. Acta Numerica, 1997.
- (1997) Acta Numerica
- DeVore, R.¹

16
- 84899029004
- Batch value function approximation via support vectors
- T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Cambridge, MA, MIT Press
- T. G. Dietterich and X. Wang. Batch value function approximation via support vectors. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
- (2002) Advances in Neural Information Processing Systems 14
- Dietterich, T.G.¹ Wang, X.²

17
- 21844465127
- Tree-based batch mode reinforcement learning
- D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503-556, 2005.
- (2005) Journal of Machine Learning Research , vol.6 , pp. 503-556
- Ernst, D.¹ Geurts, P.² Wehenkel, L.³

18
- 84937398609
- PAC bounds for multi-armed bandit and Markov decision processes
- E. Even-Dar, S. Mannor, and Y. Mansour. PAC bounds for multi-armed bandit and Markov decision processes. In Fifteenth Annual Conference on Computational Learning Theory (COLT), pages 255-270, 2002.
- (2002) Fifteenth Annual Conference on Computational Learning Theory (COLT) , pp. 255-270
- Even-Dar, E.¹ Mannor, S.² Mansour, Y.³

19
- 84880694195
- Stable function approximation in dynamic programming
- Armand Prieditis and Stuart Russell, editors, San Francisco, CA, Morgan Kaufmann
- G.J. Gordon. Stable function approximation in dynamic programming. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 261-268, San Francisco, CA, 1995. Morgan Kaufmann.
- (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 261-268
- Gordon, G.J.¹

20
- 2342446663
- A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis
- A. Gosavi. A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis. Machine Learning, 55:5-29, 2004.
- (2004) Machine Learning , vol.55 , pp. 5-29
- Gosavi, A.¹

21
- 0004058370
- Wiley, New York
- U. Grenander. Abstract Inference. Wiley, New York, 1981.
- (1981) Abstract Inference
- Grenander, U.¹

22
- 0003624357
- Springer-Verlag, New York
- L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A distribution-free theory of nonparametric regression. Springer-Verlag, New York, 2002.
- (2002) A distribution-free theory of nonparametric regression
- Györfi, L.¹ Kohler, M.² Krzyżak, A.³ Walk, H.⁴

23
- 1642437938
- Duality theory and simulation in financial engineering
- M. Haugh. Duality theory and simulation in financial engineering. In Proceedings of the Winter Simulation Conference, pages 327-334, 2003.
- (2003) Proceedings of the Winter Simulation Conference , pp. 327-334
- Haugh, M.¹

24
- 0000996139
- Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension
- D. Haussler. Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory, Series A, 69(2):217-232, 1995.
- (1995) Journal of Combinatorial Theory, Series A , vol.69 , Issue.2 , pp. 217-232
- Haussler, D.¹

25
- 84947403595
- Probability inequalities for sums of bounded random variables
- W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30, 1963.
- (1963) Journal of the American Statistical Association , vol.58 , pp. 13-30
- Hoeffding, W.¹

26
- 22944487667
- Experiments in value function approximation with sparse support vector regression
- T. Jung and T. Uthmann. Experiments in value function approximation with sparse support vector regression. In ECML, pages 180-191, 2004.
- (2004) ECML , pp. 180-191
- Jung, T.¹ Uthmann, T.²

27
- 1942514728
- Approximately optimal approximate reinforcement learning
- San Francisco, CA, USA, Morgan Kaufmann Publishers Inc
- S. Kakade and J. Langford. Approximately optimal approximate reinforcement learning. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 267-274, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.
- (2002) Proceedings of the Nineteenth International Conference on Machine Learning , pp. 267-274
- Kakade, S.¹ Langford, J.²

28
- 23244466805
- PhD thesis, Gatsby Computational Neuroscience Unit, University College London
- S.M. Kakade. On the sample complexity of reinforcement learning. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
- (2003) On the sample complexity of reinforcement learning
- Kakade, S.M.¹

29
- 84880649215
- A sparse sampling algorithm for near-optimal planning in large Markovian decision processes
- M. Kearns, Y. Mansour, and A.Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markovian decision processes. In Proceedings of IJCAI'99, pages 1324-1331, 1999.
- (1999) Proceedings of IJCAI'99 , pp. 1324-1331
- Kearns, M.¹ Mansour, Y.² Ng, A.Y.³

30
- 0015000439
- Some results on Tchebycheffian spline functions
- G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applic., 33:82-95, 1971.
- (1971) J. Math. Anal. Applic , vol.33 , pp. 82-95
- Kimeldorf, G.¹ Wahba, G.²

31
- 4644323293
- Least-squares policy iteration
- M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107-1149, 2003.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.¹ Parr, R.²

32
- 0001556720
- Efficient agnostic learning of neural networks with bounded fan-in
- W.S. Lee, P.L. Bartlett, and R.C. Williamson. Efficient agnostic learning of neural networks with bounded fan-in. IEEE Transactions on Information Theory, 42(6):2118-2132, 1996.
- (1996) IEEE Transactions on Information Theory , vol.42 , Issue.6 , pp. 2118-2132
- Lee, W.S.¹ Bartlett, P.L.² Williamson, R.C.³

33
- 0035578679
- Valuing American options by simulation: A simple least-squares approach
- F. A. Longstaff and E. S. Schwartz. Valuing American options by simulation: A simple least-squares approach. Rev. Financial Studies, 14(1):113-147, 2001.
- (2001) Rev. Financial Studies , vol.14 , Issue.1 , pp. 113-147
- Longstaff, F.A.¹ Schwartz, E.S.²

34
- 0001963197
- Self-improving factory simulation using continuous-time average-reward reinforcement learning
- S. Mahadevan, N. Marchalleck, T. Das, and A. Gosavi. Self-improving factory simulation using continuous-time average-reward reinforcement learning. In Proceedings of the 14th International Conference on Machine Learning (ICML '97), 1997.
- (1997) Proceedings of the 14th International Conference on Machine Learning (ICML '97)
- Mahadevan, S.¹ Marchalleck, N.² Das, T.³ Gosavi, A.⁴

35
- 0345184460
- Computational advances in dynamic programming
- Academic Press
- T.L. Morin. Computational advances in dynamic programming. In Dynamic Programming and its Applications, pages 53-90. Academic Press, 1978.
- (1978) Dynamic Programming and its Applications , pp. 53-90
- Morin, T.L.¹

36
- 1942516880
- Error bounds for approximate policy iteration
- R. Munos. Error bounds for approximate policy iteration. In 19th International Conference on Machine Learning, pages 560-567, 2003.
- (2003) 19th International Conference on Machine Learning , pp. 560-567
- Munos, R.¹

37
- 29344453913
- Error bounds for approximate value iteration
- R. Munos. Error bounds for approximate value iteration. American Conference on Artificial Intelligence, 2005.
- (2005) American Conference on Artificial Intelligence
- Munos, R.¹

38
- 23244437791
- A generalization error for Q-learning
- S.A. Murphy. A generalization error for Q-learning. Journal of Machine Learning Research, 6: 1073-1097, 2005.
- (2005) Journal of Machine Learning Research , vol.6 , pp. 1073-1097
- Murphy, S.A.¹

39
- 0141819580
- PEGASUS: A policy search method for large MDPs and POMDPs
- A.Y. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, pages 406-415, 2000.
- (2000) Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence , pp. 406-415
- Ng, A.Y.¹ Jordan, M.²

40
- 0033480745
- Generalization bounds for function approximation from scattered noisy data
- P. Niyogi and F. Girosi. Generalization bounds for function approximation from scattered noisy data. Advances in Computational Mathematics, 10:51-80, 1999.
- (1999) Advances in Computational Mathematics , vol.10 , pp. 51-80
- Niyogi, P.¹ Girosi, F.²

41
- 0036832956
- Kernel-based reinforcement learning
- D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49:161-178, 2002.
- (2002) Machine Learning , vol.49 , pp. 161-178
- Ormoneit, D.¹ Sen, S.²

42
- 0003998452
- John Wiley & Sons, Inc, New York, NY
- M.L. Puterman. Markov Decision Processes - Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, 1994.
- (1994) Markov Decision Processes - Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

43
- 27144457662
- Approximate solutions of a discounted Markovian decision problem
- 98: Dynamische Optimierungen:77-92
- D. Reetz. Approximate solutions of a discounted Markovian decision problem. Bonner Mathematischer Schriften, 98: Dynamische Optimierungen:77-92, 1977.
- (1977) Bonner Mathematischer Schriften
- Reetz, D.¹

44
- 33646398129
- Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method
- M. Riedmiller. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In 16th European Conference on Machine Learning, pages 317-328, 2005.
- (2005) 16th European Conference on Machine Learning , pp. 317-328
- Riedmiller, M.¹

45
- 0002317013
- Numerical dynamic programming in economics
- H. Amman, D. Kendrick, and J. Rust, editors, Elsevier, North Holland
- J. Rust. Numerical dynamic programming in economics. In H. Amman, D. Kendrick, and J. Rust, editors, Handbook of Computational Economics. Elsevier, North Holland, 1996a.
- (1996) Handbook of Computational Economics
- Rust, J.¹

46
- 0001509947
- Using randomization to break the curse of dimensionality
- J. Rust. Using randomization to break the curse of dimensionality. Econometrica, 65:487-516, 1996b.
- (1996) Econometrica , vol.65 , pp. 487-516
- Rust, J.¹

47
- 0001201756
- Some studies in machine learning using the game of checkers
- A.L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal on Research and Development, pages 210-229, 1959.
- (1959) IBM Journal on Research and Development , pp. 210-229
- Samuel, A.L.¹

48
- 0004242550
- E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York
- Reprinted in Computers and Thought, E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York, 1963.
- (1963) Computers and Thought

49
- 0001201757
- Some studies in machine learning using the game of checkers, II - recent progress
- A.L. Samuel. Some studies in machine learning using the game of checkers, II - recent progress. IBM Journal on Research and Development, pages 601-617, 1967.
- (1967) IBM Journal on Research and Development , pp. 601-617
- Samuel, A.L.¹

50
- 0001703864
- On the density of families of sets
- N. Sauer. On the density of families of sets. Journal of Combinatorial Theory Series A, 13:145-147, 1972.
- (1972) Journal of Combinatorial Theory Series A , vol.13 , pp. 145-147
- Sauer, N.¹

51
- 0004094721
- MIT Press, Cambridge, MA
- B. Schölkopf and A.J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
- (2002) Learning with Kernels
- Schölkopf, B.¹ Smola, A.J.²

52
- 84898972974
- Reinforcement learning for dynamic channel allocation in cellular telephone systems
- S.P. Singh and D.P. Bertsekas. Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems 9, 1997.
- (1997) Advances in Neural Information Processing Systems , vol.9
- Singh, S.P.¹ Bertsekas, D.P.²

53
- 85153965130
- Reinforcement learning with soft state aggregation
- MIT Press
- S.P. Singh, T. Jaakkola, and M.I. Jordan. Reinforcement learning with soft state aggregation. In Proceedings of Neural Information Processing Systems 7, pages 361-368. MIT Press, 1995.
- (1995) Proceedings of Neural Information Processing Systems , vol.7 , pp. 361-368
- Singh, S.P.¹ Jaakkola, T.² Jordan, M.I.³

54
- 0000439527
- Optimal rates of convergence for nonparametric estimators
- C.J. Stone. Optimal rates of convergence for nonparametric estimators. Annals of Statistics, 8: 1348-1360, 1980.
- (1980) Annals of Statistics , vol.8 , pp. 1348-1360
- Stone, C.J.¹

55
- 0000439527
- Optimal global rates of convergence for nonparametric regression
- C.J. Stone. Optimal global rates of convergence for nonparametric regression. Annals of Statistics, 10:1040-1053, 1982.
- (1982) Annals of Statistics , vol.10 , pp. 1040-1053
- Stone, C.J.¹

56
- 0034759906
- Efficient approximate planning in continuous space Markovian decision problems
- Cs. Szepesvári. Efficient approximate planning in continuous space Markovian decision problems. AI Communications, 13:163-176, 2001.
- (2001) AI Communications , vol.13 , pp. 163-176
- Szepesvári, C.¹

57
- 44649150245
- Efficient approximate planning in continuous space Markovian decision problems
- accepted
- Cs. Szepesvári. Efficient approximate planning in continuous space Markovian decision problems. Journal of European Artificial Intelligence Research, 2000. accepted.
- (2000) Journal of European Artificial Intelligence Research
- Szepesvári, C.¹

58
- 31844456754
- Finite time bounds for sampling based fitted value iteration
- Cs. Szepesvári and R. Munos. Finite time bounds for sampling based fitted value iteration. In ICML'2005, pages 881-886, 2005.
- (2005) ICML'2005 , pp. 881-886
- Szepesvári, C.¹ Munos, R.²

59
- 14344263882
- Interpolation-based Q-learning
- D. Schuurmans R. Greiner, editor
- Cs. Szepesvári and W.D. Smart. Interpolation-based Q-learning. In D. Schuurmans R. Greiner, editor, Proceedings of the International Conference on Machine Learning, pages 791-798, 2004.
- (2004) Proceedings of the International Conference on Machine Learning , pp. 791-798
- Szepesvári, C.¹ Smart, W.D.²

60
- 0029276036
- Temporal difference learning and TD-Gammon
- March
- G.J. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38: 58-67, March 1995.
- (1995) Communications of the ACM , vol.38 , pp. 58-67
- Tesauro, G.J.¹

61
- 0035391083
- Regression methods for pricing complex American-style options
- J. N. Tsitsiklis and B. Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12:694-703, 2001.
- (2001) IEEE Transactions on Neural Networks , vol.12 , pp. 694-703
- Tsitsiklis, J.N.¹ Van Roy, B.²

62
- 0029752470
- Feature-based methods for large scale dynamic programming
- J. N. Tsitsiklis and B. Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94, 1996.
- (1996) Machine Learning , vol.22 , pp. 59-94
- Tsitsiklis, J.N.¹ Van Roy, B.²

63
- 0001024505
- On the uniform convergence of relative frequencies of events to their probabilities
- V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16:264-280, 1971.
- (1971) Theory of Probability and its Applications , vol.16 , pp. 264-280
- Vapnik, V.N.¹ Chervonenkis, A.Y.²

64
- 0006292876
- Efficient value function approximation using regression trees
- Stockholm, Sweden
- X. Wang and T.G. Dietterich. Efficient value function approximation using regression trees. In Proceedings of the IJCAI Workshop on Statistical Machine Learning for Large-Scale Optimization, Stockholm, Sweden, 1999.
- (1999) Proceedings of the IJCAI Workshop on Statistical Machine Learning for Large-Scale Optimization
- Wang, X.¹ Dietterich, T.G.²

65
- 0347067948
- Covering number bounds of certain regularized linear function classes
- T. Zhang. Covering number bounds of certain regularized linear function classes. Journal of Machine Learning Research, 2:527-550, 2002.
- (2002) Journal of Machine Learning Research , vol.2 , pp. 527-550
- Zhang, T.¹

66
- 0002769452
- A reinforcement learning approach to job-shop scheduling
- W. Zhang and T. G. Dietterich. A reinforcement learning approach to job-shop scheduling. In Proceedings of the International Joint Conference on Artificial Intelligence, 1995.
- (1995) Proceedings of the International Joint Conference on Artificial Intelligence
- Zhang, W.¹ Dietterich, T.G.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.