Volume 40, Issue 6, 2011, Pages 1662-1714

Bisimulation metrics for continuous Markov decision processes

Author keywords

Bisimulation; Continuous; Markov decision process; Metrics; Reinforcement learning

Indexed keywords

BEHAVIORAL RESEARCH; LINEAR PROGRAMMING; MARKOV PROCESSES; PRESSES (MACHINE TOOLS); ROBOT PROGRAMMING; ROBOTS; SAMPLING; STOCHASTIC MODELS; STOCHASTIC SYSTEMS

EID: 84855578361     PISSN: 00975397     EISSN: None     Source Type: Journal    
DOI: 10.1137/10080484X     Document Type: Conference Paper
Times cited: 163

References (64)
  • 1
    • 0031176507
    • Scale-sensitive dimensions, uniform convergence, and learnability
    • N. ALON, S. BEN-DAVID, N. CESA-BIANCHI, AND D. HAUSSLER, Scale-sensitive dimensions, uniform convergence, and learnability, J. Assoc. Comput. Mach., 44(1997), pp. 615-631. (Pubitemid 127617707)
    • (1997) Journal of the ACM , vol.44 , Issue.4 , pp. 615-631
    • Alon, N.1    Ben-David, S.2    Cesa-Bianchi, N.3    Haussler, D.4
  • 2
    • 51049096652
    • Uniform Glivenko-Cantelli theorems and concentration of measure in the mathematical modelling of learning
    • London, UK
    • M. ANTHONY, Uniform Glivenko-Cantelli Theorems and Concentration of Measure in the Mathematical Modelling of Learning, Technical Report LSE-CDAM-2002-07, Centre for Discrete and Applicable Mathematics, London, UK, 2002; available online at http://www.maths.lse.ac.uk/Personal/martin/mresearch.html.
    • (2002) Technical Report LSE-CDAM-2002-07, Centre for Discrete and Applicable Mathematics
    • Anthony, M.1
  • 3
    • 0037209806
    • Performance measure sensitive congruences for Markovian process algebras
    • DOI 10.1016/S0304-3975(01)00090-1, PII S0304397501000901
    • M. BERNARDO AND M. BRAVETTI, Performance measure sensitive congruences for Markovian process algebras, Theoret. Comput. Sci., 290(2003), pp. 117-160. (Pubitemid 35264420)
    • (2003) Theoretical Computer Science , vol.290 , Issue.1 , pp. 117-160
    • Bernardo, M.1    Bravetti, M.2
  • 7
    • 0346942368
    • Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
    • C. BOUTILIER, T. DEAN, AND S. HANKS, Decision-theoretic planning: Structural assumptions and computational leverage, J. Artificial Intell. Res., 11(1999), pp. 1-94. (Pubitemid 129628760)
    • (1999) Journal of Artificial Intelligence Research , vol.11 , pp. 1-94
    • Boutilier, C.1    Dean, T.2    Hanks, S.3
  • 9
    • 0034248853
    • Stochastic dynamic programming with factored representations
    • C. BOUTILIER, R. DEARDEN, AND M. GOLDSZMIDT, Stochastic dynamic programming with factored representations, Artificial Intell., 121(2000), pp. 49-107.
    • (2000) Artificial Intell. , vol.121 , pp. 49-107
    • Boutilier, C.1    Dearden, R.2    Goldszmidt, M.3
  • 10
    • 34248363967
    • Nearest-neighbor searching and metric space dimensions
    • G. Shakhnarovich, T. Darrell, and P. Indyk, eds., MIT Press, Cambridge, MA
    • K. L. CLARKSON, Nearest-neighbor searching and metric space dimensions, in Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, G. Shakhnarovich, T. Darrell, and P. Indyk, eds., MIT Press, Cambridge, MA, 2006, pp. 15-59.
    • (2006) Nearest-Neighbor Methods for Learning and Vision: Theory and Practice , pp. 15-59
    • Clarkson, K.L.1
  • 14
    • 0043001071
    • Ph. D. thesis, McGill University, Montreal, Canada
    • J. DESHARNAIS, Labelled Markov Processes, Ph. D. thesis, McGill University, Montreal, Canada, 2000.
    • (2000) Labelled Markov Processes
    • Desharnais, J.1
  • 19
    • 84855599086
    • Lecture notes for a course given at Aarhus University, Aarhus, Denmark
    • R. M. DUDLEY, Notes on Empirical Processes, Lecture notes for a course given at Aarhus University, Aarhus, Denmark, 1999.
    • (1999) Notes on Empirical Processes
    • Dudley, R.M.1
  • 21
    • 0000050198
    • Uniform and universal Glivenko-Cantelli classes
    • R. M. DUDLEY, E. GINÉ, AND J. ZINN, Uniform and universal Glivenko-Cantelli classes, J. Theoret. Probab., 4(1991), pp. 485-510.
    • (1991) J. Theoret. Probab. , vol.4 , pp. 485-510
    • Dudley, R.M.1    Giné, E.2    Zinn, J.3
  • 22
    • 0013215143
    • When Scott is weak on the top
    • A. EDALAT, When Scott is weak on the top, Math. Struct. Comput. Sci., 7(1997), pp. 401-417.
    • (1997) Math. Struct. Comput. Sci. , vol.7 , pp. 401-417
    • Edalat, A.1
  • 23
    • 9444285501
    • Approximate Equivalence of Markov Decision Processes
    • Learning Theory and Kernel Machines
    • E. EVEN-DAR AND Y. MANSOUR, Approximate equivalence of Markov decision processes, in Proceedings of the 16th Annual Conference on Computational Learning Theory and 7th Kernel Workshop, (COLT/Kernel, Washington, DC), Lecture Notes in Comput. Sci. 2777, Springer, Berlin, New York, 2003, pp. 581-594. (Pubitemid 37053231)
    • (2003) Lecture Notes in Computer Science , Issue.2777 , pp. 581-594
    • Even-Dar, E.1    Mansour, Y.2
  • 30
    • 33645578193
    • A computational study of cost reoptimization for min-cost flow problems
    • DOI 10.1287/ijoc.1040.0081
    • A. FRANGIONI AND A. MANCA, A computational study of cost reoptimization for min-cost flow problems, INFORMS J. Comput., 18(2006), pp. 61-70. (Pubitemid 43515946)
    • (2006) INFORMS Journal on Computing , vol.18 , Issue.1 , pp. 61-70
    • Frangioni, A.1    Manca, A.2
  • 31
    • 0036896227
    • On choosing and bounding probability metrics
    • A. L. GIBBS AND F. E. SU, On choosing and bounding probability metrics, Internat. Statist. Rev., 70(2002), pp. 419-435.
    • (2002) Internat. Statist. Rev. , vol.70 , pp. 419-435
    • Gibbs, A.L.1    Su, F.E.2
  • 32
    • 0038517214
    • Equivalence notions and model minimization in Markov decision processes
    • R. GIVAN, T. DEAN, AND M. GREIG, Equivalence notions and model minimization in Markov decision processes, Artificial Intell., 147(2003), pp. 163-223.
    • (2003) Artificial Intell. , vol.147 , pp. 163-223
    • Givan, R.1    Dean, T.2    Greig, M.3
  • 33
    • 0021938963
    • Clustering to minimize the maximum intercluster distance
    • T. F. GONZALEZ, Clustering to minimize the maximum intercluster distance, Theoret. Comput. Sci., 38(1985), pp. 293-306.
    • (1985) Theoret. Comput. Sci. , vol.38 , pp. 293-306
    • Gonzalez, T.F.1
  • 34
    • 1642338607
    • Coffee, tea, or...?: A Markov decision process model for airline meal provisioning
    • J. H. GOTO, M. E. LEWIS, AND M. L. PUTERMAN, Coffee, tea, or...?: A Markov decision process model for airline meal provisioning, Trans. Sci., 38(2004), pp. 107-118.
    • (2004) Trans. Sci. , vol.38 , pp. 107-118
    • Goto, J.H.1    Lewis, M.E.2    Puterman, M.L.3
  • 36
    • 0021974161
    • Algebraic laws for nondeterminism and concurrency
    • M. HENNESSY AND R. MILNER, Algebraic laws for nondeterminism and concurrency, J. Assoc. Comput. Mach., 32(1985), pp. 137-161.
    • (1985) J. Assoc. Comput. Mach. , vol.32 , pp. 137-161
    • Hennessy, M.1    Milner, R.2
  • 37
    • 0032073263
    • Planning and acting in partially observable stochastic domains
    • PII S000437029800023X
    • L. PACK KAELBLING, M. L. LITTMAN, AND A. R. CASSANDRA, Planning and acting in partially observable stochastic domains, Artificial Intell., 101(1998), pp. 99-134. (Pubitemid 128387390)
    • (1998) Artificial Intelligence , vol.101 , Issue.1-2 , pp. 99-134
    • Kaelbling, L.P.1    Littman, M.L.2    Cassandra, A.R.3
  • 41
    • 0026222347
    • Bisimulation through probabilistic testing
    • K. G. LARSEN AND A. SKOU, Bisimulation through probabilistic testing, Inform. Comput., 94(1991), pp. 1-28. (Pubitemid 21694897)
    • (1991) Information and Computation , vol.94 , Issue.1 , pp. 1-28
    • Larsen, K.G.1    Skou, A.2
  • 43
    • 84855593932
    • Planning for Markov decision processes with sparse stochasticity
    • L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, MA
    • M. LIKHACHEV, G. GORDON, AND S. THRUN, Planning for Markov decision processes with sparse stochasticity, in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, MA, 2005, pp. 785-792.
    • (2005) Advances in Neural Information Processing Systems , vol.17 , pp. 785-792
    • Likhachev, M.1    Gordon, G.2    Thrun, S.3
  • 44
    • 0003276135
    • A calculus of communicating systems
    • Springer-Verlag, New York
    • R. MILNER, A calculus of Communicating Systems, Lecture Notes in Comput. Sci. 92, Springer-Verlag, New York, 1980.
    • (1980) Lecture Notes in Comput. Sci. , vol.92
    • Milner, R.1
  • 45
    • 0003954103
    • Prentice-Hall International, Englewood Cliffs, NJ
    • R. MILNER, Communication and Concurrency, Prentice-Hall International, Englewood Cliffs, NJ, 1989.
    • (1989) Communication and Concurrency
    • Milner, R.1
  • 46
    • 0031272080
    • How does the value function of a Markov decision process depend on the transition probabilities?
    • A. MÜLLER, How does the value function of a Markov decision process depend on the transition probabilities?, Math. Oper. Res., 22(1997), pp. 872-885. (Pubitemid 127653365)
    • (1997) Mathematics of Operations Research , vol.22 , Issue.4 , pp. 872-885
    • Müller, A.1
  • 48
    • 38149055725
    • Pseudometrics for state aggregation in average reward Markov decision processes
    • Sendai, Japan, Lecture Notes in Comput. Sci, M. Hutter, R. A. Servedio, and E. Takimoto, eds., Springer-Verlag, Berlin, New York
    • R. ORTNER, Pseudometrics for state aggregation in average reward Markov decision processes, in Proceedings of Algorithmic Learning Theory: 18th International Conference (ALT), Sendai, Japan, Lecture Notes in Comput. Sci. 4754, M. Hutter, R. A. Servedio, and E. Takimoto, eds., Springer-Verlag, Berlin, New York, 2007, pp. 373-387.
    • (2007) Proceedings of Algorithmic Learning Theory: 18th International Conference (ALT) , vol.4754 , pp. 373-387
    • Ortner, R.1
  • 54
    • 0003696856
    • Mass transportation problems. Vol. 1: Theory. Vol. 2: Applications
    • Springer, New York
    • S. T. RACHEV AND L. RUESCHENDORF, Mass Transportation Problems. Vol. 1: Theory. Vol. 2: Applications, Springer Ser. Statis. Probab. Appl., Springer, New York, 1998.
    • (1998) Springer Ser. Statis. Probab. Appl.
    • Rachev, S.T.1    Rueschendorf, L.2
  • 57
    • 37149022033
    • Approximating a behavioural pseudometric without discount for probabilistic systems
    • Foundations of Software Science and Computational Structures - 10th International Conference, FOSSACS 2007. Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2007
    • F. VAN BREUGEL, B. SHARMA, AND J. WORRELL, Approximating a behavioural pseudometric without discount for probabilistic systems, in Proceedings of Foundations of Software Science and Computational Structures, 10th International Conference (FOSSACS), Held as Part of the Joint European Conferences on Theory and Practice of Software (ETAPS Braga, Portugal), Lecture Notes in Comput. Sci. 4423, Helmut Seidl, ed., Springer, Berlin, New York, 2007, pp. 123-137. (Pubitemid 350259700)
    • (2007) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , vol.4423 , pp. 123-137
    • Van Breugel, F.1    Sharma, B.2    Worrell, J.3
  • 58
    • 84879525367
    • Towards Quantitative Verification of Probabilistic Transition Systems
    • Automata, Languages and Programming
    • F. VAN BREUGEL AND J. WORRELL, Towards quantitative verification of probabilistic transition systems, in ICALP '01: Proceedings of the 28th International Colloquium on Automata, Languages and Programming, Springer-Verlag, London, UK, 2001, pp. 421-432. (Pubitemid 33300818)
    • (2001) Lecture Notes in Computer Science , Issue.2076 , pp. 421-432
    • Van Breugel, F.1    Worrell, J.2
  • 59
    • 84944050216
    • An Algorithm for Quantitative Verification of Probabilistic Transition Systems
    • CONCUR 2001 - Concurrency Theory
    • F. VAN BREUGEL AND J. WORRELL, An algorithm for quantitative verification of probabilistic transition systems, in CONCUR '01: Proceedings of the 12th International Conference on Concurrency Theory, Springer-Verlag, London, UK, 2001, pp. 336-350. (Pubitemid 33326666)
    • (2001) Lecture Notes in Computer Science , Issue.2154 , pp. 336-350
    • Van Breugel, F.1    Worrell, J.2
  • 60
    • 1542342359
    • Topics in optimal transportation
    • AMS, Providence, RI
    • C. VILLANI, Topics in Optimal Transportation, Grad. Stud. Math. 58, AMS, Providence, RI, 2003.
    • (2003) Grad. Stud. Math. , vol.58
    • Villani, C.1
  • 62
    • 27744508529
    • Finite approximation of measure and integration
    • DOI 10.1016/j.apal.2005.05.033, PII S0168007205000813
    • J. WEBSTER, Finite approximation of measure and integration, Ann. Pure Appl. Logic, 137(2006), pp. 439-449. (Pubitemid 41614400)
    • (2006) Annals of Pure and Applied Logic , vol.137 , Issue.1-3 , pp. 439-449
    • Webster, J.1
  • 63
    • 0004273499
    • The formal semantics of programming languages
    • MIT Press, Cambridge, MA
    • G. WINSKEL, The Formal Semantics of Programming Languages, Found. Comput. Sci. Ser., MIT Press, Cambridge, MA, 1993.
    • (1993) Found. Comput. Sci. Ser.
    • Winskel, G.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.