Volume 27, 2006, Pages 153-201

Solving factored MDPs with hybrid state and action variables

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; DECISION SUPPORT SYSTEMS; FUNCTIONS; LINEAR PROGRAMMING; OPTIMIZATION; PROBLEM SOLVING

EID: 33750586671     PISSN: 1076-9757     EISSN: 1076-9757     Source Type: Journal
DOI: 10.1613/jair.2085     Document Type: Article
Times cited: 46

References (68)
  • 2
    • Astrom, K. (1965). Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications, 10(1), 174-205.
  • 3
    • Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ.
  • 4
    • Bellman, R., Kalaba, R., & Kotkin, B. (1963). Polynomial approximation - a new computational technique in dynamic programming: Allocation processes. Mathematics of Computation, 17(82), 155-161.
  • 5
    • Bertsekas, D. (1995). A counterexample for temporal differences learning. Neural Computation, 7(2), 270-279.
  • 10
    • Casella, G., & Robert, C. (1996). Rao-Blackwellisation of sampling schemes. Biometrika, 83(1), 81-94.
  • 11
    • Chow, C.-S., & Tsitsiklis, J. (1991). An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36(8), 898-914.
  • 14
    • de Farias, D. P., & Van Roy, B. (2003). The linear programming approach to approximate dynamic programming. Operations Research, 51(6), 850-856.
  • 15
    • de Farias, D. P., & Van Roy, B. (2004). On constraint sampling for the linear programming approach to approximate dynamic programming. Mathematics of Operations Research, 29(3), 462-478.
  • 16
    • Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5, 142-150.
  • 21
    • Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721-741.
  • 30
    • Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their application. Biometrika, 57, 97-109.
  • 31
    • Hauskrecht, M. (2000). Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13, 33-94.
  • 32
    • Hauskrecht, M., & Kveton, B. (2004). Linear program approximations for factored continuous-state Markov decision processes. In Advances in Neural Information Processing Systems 16, pp. 895-902.
  • 33
    • Higdon, D. (1998). Auxiliary variable methods for Markov chain Monte Carlo with applications. Journal of the American Statistical Association, 93(442), 585-595.
  • 37
    • Khachiyan, L. (1979). A polynomial algorithm in linear programming. Doklady Akademii Nauk SSSR, 244, 1093-1096.
  • 38
    • Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671-680.
  • 44
    • Mahadevan, S. (2005). Samuel meets Amarel: Automating value function approximation using global state space analysis. In Proceedings of the 20th National Conference on Artificial Intelligence, pp. 1000-1005.
  • 45
    • Mahadevan, S., & Maggioni, M. (2006). Value function approximation with diffusion wavelets and Laplacian eigenfunctions. In Advances in Neural Information Processing Systems 18, pp. 843-850.
  • 47
    • Manne, A. (1960). Linear programming and sequential decisions. Management Science, 6(3), 259-267.
  • 49
    • Munos, R., & Moore, A. (2002). Variable resolution discretization in optimal control. Machine Learning, 49, 291-323.
  • 55
    • Rust, J. (1997). Using randomization to break the curse of dimensionality. Econometrica, 65(3), 487-516.
  • 61
    • Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3-4), 257-277.
  • 62
    • Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215-219.
  • 63
    • Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58-68.
  • 68


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.