SCOPUS Information Retrieval Platform

Journal of Artificial Intelligence Research

Volume 27, 2006, Pages 153-201

Solving factored MDPs with hybrid state and action variables

(3) Kveton, Branislav a Hauskrecht, Milos b Guestrin, Carlos c

a UNIVERSITY OF PITTSBURGH (United States)

b Learning Research and Development Ctr (United States)

c NONE (United States)

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; DECISION SUPPORT SYSTEMS; FUNCTIONS; LINEAR PROGRAMMING; OPTIMIZATION; PROBLEM SOLVING;

DISCRETE VARIABLES; HYBRID APPROXIMATE LINEAR PROGRAMMING (HALP); MARKOV DECISION PROCESS (MDP) MODELS; OPTIMIZATION PROBLEMS;

MARKOV PROCESSES;

EID: 33750586671 PISSN: 10769757 EISSN: 10769757 Source Type: Journal
DOI: 10.1613/jair.2085 Document Type: Article

Times cited : (46)

References (68)

1
- 0037262814
- An introduction to MCMC for machine learning
- Andrieu, C., de Freitas, N., Doucet, A., & Jordan, M. (2003). An introduction to MCMC for machine learning. Machine Learning, 50, 5-43.
- (2003) Machine Learning , vol.50 , pp. 5-43
- Andrieu, C.¹ De Freitas, N.² Doucet, A.³ Jordan, M.⁴

2
- 50549213583
- Optimal control of Markov processes with incomplete state information
- Astrom, K. (1965). Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications, 10(1), 174-205.
- (1965) Journal of Mathematical Analysis and Applications , vol.10 , Issue.1 , pp. 174-205
- Astrom, K.¹

3
- 85012688561
- Princeton University Press, Princeton, NJ
- Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.
- (1957) Dynamic Programming
- Bellman, R.¹

4
- 84968468700
- Polynomial approximation - A new computational technique in dynamic programming: Allocation processes
- Bellman, R., Kalaba, R., & Kotkin, B. (1963). Polynomial approximation - a new computational technique in dynamic programming: Allocation processes. Mathematics of Computation, 17(82), 155-161.
- (1963) Mathematics of Computation , vol.17 , Issue.82 , pp. 155-161
- Bellman, R.¹ Kalaba, R.² Kotkin, B.³

5
- 0000268954
- A counterexample for temporal differences learning
- Bertsekas, D. (1995). A counterexample for temporal differences learning. Neural Computation, 7(2), 270-279.
- (1995) Neural Computation , vol.7 , Issue.2 , pp. 270-279
- Bertsekas, D.¹

6
- 0003487482
- Athena Scientific, Belmont, MA
- Bertsekas, D., & Tsitsiklis, J. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
- (1996) Neuro-dynamic Programming
- Bertsekas, D.¹ Tsitsiklis, J.²

7
- 0003850196
- Athena Scientific, Belmont, MA
- Bertsimas, D., & Tsitsiklis, J. (1997). Introduction to Linear Optimization. Athena Scientific, Belmont, MA.
- (1997) Introduction to Linear Optimization
- Bertsimas, D.¹ Tsitsiklis, J.²

8
- 85166207010
- Exploiting structure in policy construction
- Boutilier, C., Dearden, R., & Goldszmidt, M. (1995). Exploiting structure in policy construction. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 1104-1111.
- (1995) Proceedings of the 14th International Joint Conference on Artificial Intelligence , pp. 1104-1111
- Boutilier, C.¹ Dearden, R.² Goldszmidt, M.³

9
- 3042524845
- Planning under continuous time and resource uncertainty: A challenge for AI
- Bresina, J., Dearden, R., Meuleau, N., Ramakrishnan, S., Smith, D., & Washington, R. (2002). Planning under continuous time and resource uncertainty: A challenge for AI. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pp. 77-84.
- (2002) Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence , pp. 77-84
- Bresina, J.¹ Dearden, R.² Meuleau, N.³ Ramakrishnan, S.⁴ Smith, D.⁵ Washington, R.⁶

10
- 0002205556
- Rao-Blackwellisation of sampling schemes
- Casella, G., & Robert, C. (1996). Rao-Blackwellisation of sampling schemes. Biometrika, 83(1), 81-94.
- (1996) Biometrika , vol.83 , Issue.1 , pp. 81-94
- Casella, G.¹ Robert, C.²

11
- 0026206780
- An optimal one-way multigrid algorithm for discrete-time stochastic control
- Chow, C.-S., & Tsitsiklis, J. (1991). An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36(8), 898-914.
- (1991) IEEE Transactions on Automatic Control , vol.36 , Issue.8 , pp. 898-914
- Chow, C.-S.¹ Tsitsiklis, J.²

12
- 0008586604
- A method for using belief networks as influence diagrams
- Cooper, G. (1988). A method for using belief networks as influence diagrams. In Proceedings of the Workshop on Uncertainty in Artificial Intelligence, pp. 55-63.
- (1988) Proceedings of the Workshop on Uncertainty in Artificial Intelligence , pp. 55-63
- Cooper, G.¹

13
- 85156187730
- Improving elevator performance using reinforcement learning
- Crites, R., & Barto, A. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 8, pp. 1017-1023.
- (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1017-1023
- Crites, R.¹ Barto, A.²

14
- 0348090400
- The linear programming approach to approximate dynamic programming
- de Farias, D. P., & Van Roy, B. (2003). The linear programming approach to approximate dynamic programming. Operations Research, 51(6), 850-856.
- (2003) Operations Research , vol.51 , Issue.6 , pp. 850-856
- De Farias, D.P.¹ Van Roy, B.²

15
- 5544258192
- On constraint sampling for the linear programming approach to approximate dynamic programming
- de Farias, D. P., & Van Roy, B. (2004). On constraint sampling for the linear programming approach to approximate dynamic programming. Mathematics of Operations Research, 29(3), 462-478.
- (2004) Mathematics of Operations Research , vol.29 , Issue.3 , pp. 462-478
- De Farias, D.P.¹ Van Roy, B.²

16
- 84990553353
- A model for reasoning about persistence and causation
- Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5, 142-150.
- (1989) Computational Intelligence , vol.5 , pp. 142-150
- Dean, T.¹ Kanazawa, K.²

17
- 0002251094
- Bucket elimination: A unifying framework for probabilistic inference
- Dechter, R. (1996). Bucket elimination: A unifying framework for probabilistic inference. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, pp. 211-219.
- (1996) Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence , pp. 211-219
- Dechter, R.¹

18
- 4243137056
- Hybrid Monte Carlo
- Duane, S., Kennedy, A. D., Pendleton, B., & Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195(2), 216-222.
- (1987) Physics Letters B , vol.195 , Issue.2 , pp. 216-222
- Duane, S.¹ Kennedy, A.D.² Pendleton, B.³ Roweth, D.⁴

19
- 29344460055
- Dynamic programming for structured continuous Markov decision problems
- Feng, Z., Dearden, R., Meuleau, N., & Washington, R. (2004). Dynamic programming for structured continuous Markov decision problems. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 154-161.
- (2004) Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence , pp. 154-161
- Feng, Z.¹ Dearden, R.² Meuleau, N.³ Washington, R.⁴

20
- 47249139892
- Metrics for Markov decision processes with infinite state spaces
- Ferns, N., Panangaden, P., & Precup, D. (2005). Metrics for Markov decision processes with infinite state spaces. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence.
- (2005) Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence
- Ferns, N.¹ Panangaden, P.² Precup, D.³

21
- 0021518209
- Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images
- Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721-741.
- (1984) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.6 , Issue.6 , pp. 721-741
- Geman, S.¹ Geman, D.²

22
- 0003989207
- Ph.D. thesis, Carnegie Mellon University
- Gordon, G. (1999). Approximate Solutions to Markov Decision Processes. Ph.D. thesis, Carnegie Mellon University.
- (1999) Approximate Solutions to Markov Decision Processes
- Gordon, G.¹

23
- 14344256227
- Ph.D. thesis, Stanford University
- Guestrin, C. (2003). Planning Under Uncertainty in Complex Structured Environments. Ph.D. thesis, Stanford University.
- (2003) Planning under Uncertainty in Complex Structured Environments
- Guestrin, C.¹

24
- 29344475738
- Solving factored MDPs with continuous and discrete variables
- Guestrin, C., Hauskrecht, M., & Kveton, B. (2004). Solving factored MDPs with continuous and discrete variables. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 235-242.
- (2004) Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence , pp. 235-242
- Guestrin, C.¹ Hauskrecht, M.² Kveton, B.³

25
- 84880803349
- Generalizing plans to new environments in relational MDPs
- Guestrin, C., Koller, D., Gearhart, C., & Kanodia, N. (2003). Generalizing plans to new environments in relational MDPs. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp. 1003-1010.
- (2003) Proceedings of the 18th International Joint Conference on Artificial Intelligence , pp. 1003-1010
- Guestrin, C.¹ Koller, D.² Gearhart, C.³ Kanodia, N.⁴

26
- 84880898477
- Max-norm projections for factored MDPs
- Guestrin, C., Koller, D., & Parr, R. (2001). Max-norm projections for factored MDPs. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 673-682.
- (2001) Proceedings of the 17th International Joint Conference on Artificial Intelligence , pp. 673-682
- Guestrin, C.¹ Koller, D.² Parr, R.³

27
- 84899028010
- Multiagent planning with factored MDPs
- Guestrin, C., Koller, D., & Parr, R. (2002). Multiagent planning with factored MDPs. In Advances in Neural Information Processing Systems 14, pp. 1523-1530.
- (2002) Advances in Neural Information Processing Systems , vol.14 , pp. 1523-1530
- Guestrin, C.¹ Koller, D.² Parr, R.³

28
- 4544318426
- Efficient solution algorithms for factored MDPs
- Guestrin, C., Koller, D., Parr, R., & Venkataraman, S. (2003). Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 19, 399-468.
- (2003) Journal of Artificial Intelligence Research , vol.19 , pp. 399-468
- Guestrin, C.¹ Koller, D.² Parr, R.³ Venkataraman, S.⁴

29
- 0036923118
- Context specific multiagent coordination and planning with factored MDPs
- Guestrin, C., Venkataraman, S., & Koller, D. (2002). Context specific multiagent coordination and planning with factored MDPs. In Proceedings of the 18th National Conference on Artificial Intelligence, pp. 253-259.
- (2002) Proceedings of the 18th National Conference on Artificial Intelligence , pp. 253-259
- Guestrin, C.¹ Venkataraman, S.² Koller, D.³

30
- 77956890234
- Monte Carlo sampling methods using Markov chains and their application
- Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their application. Biometrika, 57, 97-109.
- (1970) Biometrika , vol.57 , pp. 97-109
- Hastings, W.K.¹

31
- 0001770240
- Value-function approximations for partially observable Markov decision processes
- Hauskrecht, M. (2000). Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13, 33-94.
- (2000) Journal of Artificial Intelligence Research , vol.13 , pp. 33-94
- Hauskrecht, M.¹

32
- 84898970468
- Linear program approximations for factored continuous-state Markov decision processes
- Hauskrecht, M., & Kveton, B. (2004). Linear program approximations for factored continuous-state Markov decision processes. In Advances in Neural Information Processing Systems 16, pp. 895-902.
- (2004) Advances in Neural Information Processing Systems , vol.16 , pp. 895-902
- Hauskrecht, M.¹ Kveton, B.²

33
- 0032398552
- Auxiliary variable methods for Markov chain Monte Carlo with applications
- Higdon, D. (1998). Auxiliary variable methods for Markov chain Monte Carlo with applications. Journal of the American Statistical Association, 93(442), 585-595.
- (1998) Journal of the American Statistical Association , vol.93 , Issue.442 , pp. 585-595
- Higdon, D.¹

34
- 0000086731
- Influence diagrams
- Strategic Decisions Group, Menlo Park, CA
- Howard, R., & Matheson, J. (1984). Influence diagrams. In Readings on the Principles and Applications of Decision Analysis, Vol. 2, pp. 719-762. Strategic Decisions Group, Menlo Park, CA.
- (1984) Readings on the Principles and Applications of Decision Analysis , vol.2 , pp. 719-762
- Howard, R.¹ Matheson, J.²

35
- 0003598496
- Cambridge University Press, Cambridge, United Kingdom
- Jeffreys, H., & Jeffreys, B. (1988). Methods of Mathematical Physics. Cambridge University Press, Cambridge, United Kingdom.
- (1988) Methods of Mathematical Physics
- Jeffreys, H.¹ Jeffreys, B.²

36
- 0000305280
- From influence diagrams to junction trees
- Jensen, F., Jensen, F., & Dittmer, S. (1994). From influence diagrams to junction trees. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, pp. 367-373.
- (1994) Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence , pp. 367-373
- Jensen, F.¹ Jensen, F.² Dittmer, S.³

37
- 0000564361
- A polynomial algorithm in linear programming
- Khachiyan, L. (1979). A polynomial algorithm in linear programming. Doklady Akademii Nauk SSSR, 244, 1093-1096.
- (1979) Doklady Akademii Nauk SSSR , vol.244 , pp. 1093-1096
- Khachiyan, L.¹

38
- 26444479778
- Optimization by simulated annealing
- Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671-680.
- (1983) Science , vol.220 , Issue.4598 , pp. 671-680
- Kirkpatrick, S.¹ Gelatt, C.D.² Vecchi, M.P.³

39
- 84880688552
- Computing factored value functions for policies in structured MDPs
- Koller, D., & Parr, R. (1999). Computing factored value functions for policies in structured MDPs. In Proceedings of the 16th International Joint Conference on Artificial Intelligence, pp. 1332-1339.
- (1999) Proceedings of the 16th International Joint Conference on Artificial Intelligence , pp. 1332-1339
- Koller, D.¹ Parr, R.²

40
- 13444281230
- Heuristic refinements of approximate linear programming for factored continuous-state Markov decision processes
- Kveton, B., & Hauskrecht, M. (2004). Heuristic refinements of approximate linear programming for factored continuous-state Markov decision processes. In Proceedings of the 14th International Conference on Automated Planning and Scheduling, pp. 306-314.
- (2004) Proceedings of the 14th International Conference on Automated Planning and Scheduling , pp. 306-314
- Kveton, B.¹ Hauskrecht, M.²

41
- 33746031635
- An MCMC approach to solving hybrid factored MDPs
- Kveton, B., & Hauskrecht, M. (2005). An MCMC approach to solving hybrid factored MDPs. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pp. 1346-1351.
- (2005) Proceedings of the 19th International Joint Conference on Artificial Intelligence , pp. 1346-1351
- Kveton, B.¹ Hauskrecht, M.²

42
- 33750595113
- Learning basis functions in hybrid domains
- Kveton, B., & Hauskrecht, M. (2006a). Learning basis functions in hybrid domains. In Proceedings of the 21st National Conference on Artificial Intelligence, pp. 1161-1166.
- (2006) Proceedings of the 21st National Conference on Artificial Intelligence , pp. 1161-1166
- Kveton, B.¹ Hauskrecht, M.²

43
- 33746054938
- Solving factored MDPs with exponential-family transition models
- Kveton, B., & Hauskrecht, M. (2006b). Solving factored MDPs with exponential-family transition models. In Proceedings of the 16th International Conference on Automated Planning and Scheduling, pp. 114-120.
- (2006) Proceedings of the 16th International Conference on Automated Planning and Scheduling , pp. 114-120
- Kveton, B.¹ Hauskrecht, M.²

44
- 29344433509
- Samuel meets Amarel: Automating value function approximation using global state space analysis
- Mahadevan, S. (2005). Samuel meets Amarel: Automating value function approximation using global state space analysis. In Proceedings of the 20th National Conference on Artificial Intelligence, pp. 1000-1005.
- (2005) Proceedings of the 20th National Conference on Artificial Intelligence , pp. 1000-1005
- Mahadevan, S.¹

45
- 77957901577
- Value function approximation with diffusion wavelets and Laplacian eigenfunctions
- Mahadevan, S., & Maggioni, M. (2006). Value function approximation with diffusion wavelets and Laplacian eigenfunctions. In Advances in Neural Information Processing Systems 18, pp. 843-850.
- (2006) Advances in Neural Information Processing Systems , vol.18 , pp. 843-850
- Mahadevan, S.¹ Maggioni, M.²

46
- 33750591731
- Learning representation and control in continuous Markov decision processes
- Mahadevan, S., Maggioni, M., Ferguson, K., & Osentoski, S. (2006). Learning representation and control in continuous Markov decision processes. In Proceedings of the 21st National Conference on Artificial Intelligence.
- (2006) Proceedings of the 21st National Conference on Artificial Intelligence
- Mahadevan, S.¹ Maggioni, M.² Ferguson, K.³ Osentoski, S.⁴

47
- 0001257766
- Linear programming and sequential decisions
- Manne, A. (1960). Linear programming and sequential decisions. Management Science, 6(3), 259-267.
- (1960) Management Science , vol.6 , Issue.3 , pp. 259-267
- Manne, A.¹

48
- 5744249209
- Equation of state calculations by fast computing machines
- Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087-1092.
- (1953) Journal of Chemical Physics , vol.21 , pp. 1087-1092
- Metropolis, N.¹ Rosenbluth, A.² Rosenbluth, M.³ Teller, A.⁴ Teller, E.⁵

49
- 0036832953
- Variable resolution discretization in optimal control
- Munos, R., & Moore, A. (2002). Variable resolution discretization in optimal control. Machine Learning, 49, 291-323.
- (2002) Machine Learning , vol.49 , pp. 291-323
- Munos, R.¹ Moore, A.²

50
- 33750602399
- Ph.D. thesis, Brown University
- Ortiz, L. (2002). Selecting Approximately-Optimal Actions in Complex Structured Domains. Ph.D. thesis, Brown University.
- (2002) Selecting Approximately-optimal Actions in Complex Structured Domains
- Ortiz, L.¹

51
- 2442647933
- Approximating MAP using local search
- Park, J., & Darwiche, A. (2001). Approximating MAP using local search. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pp. 403-410.
- (2001) Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence , pp. 403-410
- Park, J.¹ Darwiche, A.²

52
- 26944480622
- Solving MAP exactly using systematic search
- Park, J., & Darwiche, A. (2003). Solving MAP exactly using systematic search. In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, pp. 459-468.
- (2003) Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence , pp. 459-468
- Park, J.¹ Darwiche, A.²

53
- 0036927202
- Greedy linear value-approximation for factored Markov decision processes
- Patrascu, R., Poupart, P., Schuurmans, D., Boutilier, C., & Guestrin, C. (2002). Greedy linear value-approximation for factored Markov decision processes. In Proceedings of the 18th National Conference on Artificial Intelligence, pp. 285-291.
- (2002) Proceedings of the 18th National Conference on Artificial Intelligence , pp. 285-291
- Patrascu, R.¹ Poupart, P.² Schuurmans, D.³ Boutilier, C.⁴ Guestrin, C.⁵

54
- 85102627959
- John Wiley & Sons, New York, NY
- Puterman, M. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, NY.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.¹

55
- 0001509947
- Using randomization to break the curse of dimensionality
- Rust, J. (1997). Using randomization to break the curse of dimensionality. Econometrica, 65(3), 487-516.
- (1997) Econometrica , vol.65 , Issue.3 , pp. 487-516
- Rust, J.¹

56
- 72949112166
- Approximate linear programming for first-order MDPs
- Sanner, S., & Boutilier, C. (2005). Approximate linear programming for first-order MDPs. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence.
- (2005) Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence
- Sanner, S.¹ Boutilier, C.²

57
- 84898980702
- Direct value-approximation for factored MDPs
- Schuurmans, D., & Patrascu, R. (2002). Direct value-approximation for factored MDPs. In Advances in Neural Information Processing Systems 14, pp. 1579-1586.
- (2002) Advances in Neural Information Processing Systems , vol.14 , pp. 1579-1586
- Schuurmans, D.¹ Patrascu, R.²

58
- 0000273218
- Generalized polynomial approximations in Markovian decision processes
- Schweitzer, P., & Seidmann, A. (1985). Generalized polynomial approximations in Markovian decision processes. Journal of Mathematical Analysis and Applications, 110, 568-582.
- (1985) Journal of Mathematical Analysis and Applications , vol.110 , pp. 568-582
- Schweitzer, P.¹ Seidmann, A.²

59
- 0003871607
- Ph.D. thesis, Stanford University
- Sondik, E. (1971). The Optimal Control of Partially Observable Markov Decision Processes. Ph.D. thesis, Stanford University.
- (1971) The Optimal Control of Partially Observable Markov Decision Processes
- Sondik, E.¹

60
- 0004102479
- MIT Press, Cambridge, MA
- Sutton, R., & Barto, A. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.¹ Barto, A.²

61
- 0001046225
- Practical issues in temporal difference learning
- Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3-4), 257-277.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 257-277
- Tesauro, G.¹

62
- 0000985504
- TD-Gammon, a self-teaching backgammon program, achieves master-level play
- Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215-219.
- (1994) Neural Computation , vol.6 , Issue.2 , pp. 215-219
- Tesauro, G.¹

63
- 0029276036
- Temporal difference learning and TD-Gammon
- Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58-68.
- (1995) Communications of the ACM , vol.38 , Issue.3 , pp. 58-68
- Tesauro, G.¹

64
- 0008832043
- Tech. rep., Carnegie Mellon University
- Trick, M., & Zin, S. (1993). A linear programming approach to solving stochastic dynamic programs. Tech. rep., Carnegie Mellon University.
- (1993) A Linear Programming Approach to Solving Stochastic Dynamic Programs
- Trick, M.¹ Zin, S.²

65
- 33746070253
- Learning and Value Function Approximation in Complex Decision Processes
- Van Roy, B. (1998). Learning and Value Function Approximation in Complex Decision Processes. Ph.D. thesis, Massachusetts Institute of Technology.
- (1998) Ph.D. Thesis, Massachusetts Institute of Technology
- Van Roy, B.¹

66
- 33750590218
- Annealed MAP
- Yuan, C., Lu, T.-C., & Druzdzel, M. (2004). Annealed MAP. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 628-635.
- (2004) Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence , pp. 628-635
- Yuan, C.¹ Lu, T.-C.² Druzdzel, M.³

67
- 84918834208
- A reinforcement learning approach to job-shop scheduling
- Zhang, W., & Dietterich, T. (1995). A reinforcement learning approach to job-shop scheduling. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 1114-1120.
- (1995) Proceedings of the 14th International Joint Conference on Artificial Intelligence , pp. 1114-1120
- Zhang, W.¹ Dietterich, T.²

68
- 85156225449
- High-performance job-shop scheduling with a time-delay TD(λ) network
- Zhang, W., & Dietterich, T. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In Advances in Neural Information Processing Systems 8, pp. 1024-1030.
- (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1024-1030
- Zhang, W.¹ Dietterich, T.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.