Volume 1, Issue 4, 2008, Pages 403-565

Learning representation and control in Markov decision processes: New frontiers

Author keywords

[No Author keywords available]

Indexed keywords

DECISION PROBLEMS; DIAGONALIZATIONS; DIMENSIONALITY REDUCTION; DRAZIN INVERSE; EXACT SOLUTION; GENERIC ALGORITHM; LAPLACIAN OPERATOR; LAPLACIANS; LOW-DIMENSIONAL REPRESENTATION; MACHINE-LEARNING; MARKOV DECISION PROCESSES; MATHEMATICAL FRAMEWORKS; MATRIX REPRESENTATION; MODEL FREE; MODEL-BASED; OFF-DIAGONAL ELEMENTS; OPTIMAL CONTROLS; OPTIMAL POLICIES; POLICY ITERATION; ROW SUMS;

EID: 70349322784     PISSN: 19358237     EISSN: 19358245     Source Type: Journal    
DOI: 10.1561/2200000003     Document Type: Article
Times cited : (46)

References (144)
  • 2
    • 14644405640
    • On the spectra of nonsymmetric Laplacian matrices
    • R. Agaev and P. Chebotarev, "On the spectra of nonsymmetric Laplacian matrices," Linear Algebra and Its Applications, vol.399, pp. 157-168, 2005.
    • (2005) Linear Algebra and Its Applications , vol.399 , pp. 157-168
    • Agaev, R.1    Chebotarev, P.2
  • 3
    • 0001119106
    • On representations of problems of reasoning about actions
    • (D. Michie, ed.) Elsevier/North-Holland
    • S. Amarel, "On representations of problems of reasoning about actions," in Machine Intelligence 3, (D. Michie, ed.), pp. 131-171, Elsevier/North-Holland, 1968.
    • (1968) Machine Intelligence , vol.3 , pp. 131-171
    • Amarel, S.1
  • 7
    • 0037288370
    • Recent advances in hierarchical reinforcement learning
    • A. Barto and S. Mahadevan, "Recent advances in hierarchical reinforcement learning," Discrete Event Systems Journal, vol.13, pp. 41-77, 2003.
    • (2003) Discrete Event Systems Journal , vol.13 , pp. 41-77
    • Barto, A.1    Mahadevan, S.2
  • 8
    • 3142725535
    • Semi-supervised learning on Riemannian manifolds
    • M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," Machine Learning, vol.56, pp. 209-239, 2004.
    • (2004) Machine Learning , vol.56 , pp. 209-239
    • Belkin, M.1    Niyogi, P.2
  • 11
    • 0024680419
    • Adaptive Aggregation Methods for infinite horizon dynamic programming
    • D. Bertsekas and D. Castanon, "Adaptive Aggregation Methods for infinite horizon dynamic programming," IEEE Transactions on Automatic Control, vol.34, pp. 589-598, 1989.
    • (1989) IEEE Transactions on Automatic Control , vol.34 , pp. 589-598
    • Bertsekas, D.1    Castanon, D.2
  • 15
    • 0041386088
    • A geometric interpretation of the Metropolis-Hastings algorithm
    • L. Billera and P. Diaconis, "A geometric interpretation of the Metropolis-Hastings algorithm," Statistical Science, vol.16, pp. 335-339, 2001.
    • (2001) Statistical Science , vol.16 , pp. 335-339
    • Billera, L.1    Diaconis, P.2
  • 18
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • S. Bradtke and A. Barto, "Linear least-squares algorithms for temporal difference learning," Machine Learning, vol.22, pp. 33-57, 1996.
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.1    Barto, A.2
  • 21
    • 0032027940
    • The relations among potentials, perturbation analysis, and Markov decision processes
    • X. Cao, "The relations among potentials, perturbation analysis, and Markov decision processes," Discrete-Event Dynamic Systems, vol.8, no.1, pp. 71-87, 1998. (Pubitemid 128512397)
    • (1998) Discrete Event Dynamic Systems: Theory and Applications , vol.8 , Issue.1 , pp. 71-87
    • Cao, X.-R.1
  • 24
    • 84969165400
    • Forest matrices around the Laplacian matrix
    • P. Chebotarev and R. Agaev, "Forest matrices around the Laplacian matrix," Linear Algebra and Its Applications, vol.15, no.1, pp. 253-274, 2002.
    • (2002) Linear Algebra and Its Applications , vol.15 , Issue.1 , pp. 253-274
    • Chebotarev, P.1    Agaev, R.2
  • 25
    • 0028401429
    • Generalized matrix inversion and rank computation by repeated squaring
    • L. Chen, E. Krishnamurthy, and I. Macleod, "Generalized matrix inversion and rank computation by repeated squaring," Parallel Computing, vol.20, pp. 297-311, 1994.
    • (1994) Parallel Computing , vol.20 , pp. 297-311
    • Chen, L.1    Krishnamurthy, E.2    Macleod, I.3
  • 27
    • 17444366585
    • Laplacians and the Cheeger inequality for directed graphs
    • April
    • F. Chung, "Laplacians and the Cheeger inequality for directed graphs," Annals of Combinatorics, vol.9, no.1, pp. 1-19, April 2005.
    • (2005) Annals of Combinatorics , vol.9 , Issue.1 , pp. 1-19
    • Chung, F.1
  • 28
    • 19644394100
    • Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part i: Diffusion maps
    • May
    • R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner, and S. Zucker, "Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part i: Diffusion maps," Proceedings of National Academy of Science, vol.102, no.21, pp. 7426-7431, May 2005.
    • (2005) Proceedings of National Academy of Science , vol.102 , Issue.21 , pp. 7426-7431
    • Coifman, R.1    Lafon, S.2    Lee, A.3    Maggioni, M.4    Nadler, B.5    Warner, F.6    Zucker, S.7
  • 29
    • 19644366699
    • Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part ii: Multiscale methods
    • May
    • R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner, and S. Zucker, "Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part ii: Multiscale methods," Proceedings of the National Academy of Science, vol.102, no.21, pp. 7432-7437, May 2005.
    • (2005) Proceedings of the National Academy of Science , vol.102 , Issue.21 , pp. 7432-7437
    • Coifman, R.1    Lafon, S.2    Lee, A.3    Maggioni, M.4    Nadler, B.5    Warner, F.6    Zucker, S.7
  • 31
    • 25844521242
    • Geometric diffusions for the analysis of data from sensor networks
    • October
    • R. Coifman, M. Maggioni, S. Zucker, and I. Kevrekidis, "Geometric diffusions for the analysis of data from sensor networks," Curr Opin Neurobiol, vol.15, no.5, pp. 576-584, October 2005.
    • (2005) Curr Opin Neurobiol , vol.15 , Issue.5 , pp. 576-584
    • Coifman, R.1    Maggioni, M.2    Zucker, S.3    Kevrekidis, I.4
  • 33
    • 0032643313
    • Solving semi-Markov decision problems using average-reward reinforcement learning
    • T. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck, "Solving semi-Markov decision problems using average-reward reinforcement learning," Management Science, vol.45, no.4, pp. 560-574, 1999.
    • (1999) Management Science , vol.45 , Issue.4 , pp. 560-574
    • Das, T.1    Gosavi, A.2    Mahadevan, S.3    Marchalleck, N.4
  • 34
    • 0003833285
    • Society for Industrial and Applied Mathematics
    • I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, 1992.
    • (1992) Ten Lectures on Wavelets
    • Daubechies, I.1
  • 35
    • 0001158047
    • Improving generalisation for temporal difference learning: The successor representation
    • P. Dayan, "Improving generalisation for temporal difference learning: The successor representation," Neural Computation, vol.5, pp. 613-624, 1993.
    • (1993) Neural Computation , vol.5 , pp. 613-624
    • Dayan, P.1
  • 39
    • 29244453931
    • On the Nyström method for approximating a Gram matrix for improved Kernel-based learning
    • P. Drineas and M. W. Mahoney, "On the Nyström method for approximating a Gram matrix for improved Kernel-based learning," Journal of Machine Learning Research, vol.6, pp. 2153-2175, 2005.
    • (2005) Journal of Machine Learning Research , vol.6 , pp. 2153-2175
    • Drineas, P.1    Mahoney, M.W.2
  • 40
    • 0043247546
    • Accelerating reinforcement learning by composing solutions of automatically identified subtasks
    • C. Drummond, "Accelerating reinforcement learning by composing solutions of automatically identified subtasks," Journal of AI Research, vol.16, pp. 59-104, 2002.
    • (2002) Journal of AI Research , vol.16 , pp. 59-104
    • Drummond, C.1
  • 41
    • 85095808297
    • Geometric aspects of the theory of Krylov subspace methods
    • M. Eiermann and O. Ernst, "Geometric aspects of the theory of Krylov subspace methods," Acta Numerica, pp. 251-312, 2001. (Pubitemid 33305812)
    • (2001) Acta Numerica , pp. 251-312
    • Eiermann, M.1    Ernst, O.G.2
  • 44
    • 0001350119
    • Algebraic connectivity of graphs
    • M. Fiedler, "Algebraic connectivity of graphs," Czechoslovak Mathematical Journal, vol.23, no.98, pp. 298-305, 1973.
    • (1973) Czechoslovak Mathematical Journal , vol.23 , Issue.98 , pp. 298-305
    • Fiedler, M.1
  • 45
    • 0036832959
    • Structure in the space of value functions
    • D. Foster and P. Dayan, "Structure in the space of value functions," Machine Learning, vol.49, pp. 325-346, 2002.
    • (2002) Machine Learning , vol.49 , pp. 325-346
    • Foster, D.1    Dayan, P.2
  • 48
    • 70349352794
    • Model minimization in Markov decision processes
    • R. Givan and T. Dean, "Model minimization in Markov decision processes," AAAI, 1997.
    • (1997) AAAI
    • Givan, R.1    Dean, T.2
  • 50
    • 0038595393
    • Technical Report, CMU-CS-95-103, Department of Computer Science, Carnegie Mellon University
    • G. Gordon, "Stable function approximation in dynamic programming," Technical Report, CMU-CS-95-103, Department of Computer Science, Carnegie Mellon University, 1995.
    • (1995) Stable Function Approximation in Dynamic Programming
    • Gordon, G.1
  • 53
    • 34547313657
    • Graph Laplacians and their convergence on random neighborhood graphs
    • M. Hein, J. Audibert, and U. von Luxburg, "Graph Laplacians and their convergence on random neighborhood graphs," Journal of Machine Learning Research, vol.8, pp. 1325-1368, 2007.
    • (2007) Journal of Machine Learning Research , vol.8 , pp. 1325-1368
    • Hein, M.1    Audibert, J.2    Von Luxburg, U.3
  • 62
    • 0032131147
    • A fast and high quality multilevel scheme for partitioning irregular graphs
    • G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs," SIAM Journal of Scientific Computing, vol.20, no.1, pp. 359-392, 1999.
    • (1999) SIAM Journal of Scientific Computing , vol.20 , Issue.1 , pp. 359-392
    • Karypis, G.1    Kumar, V.2
  • 63
    • 33846689581
    • Block diagonalization of Laplacian matrices of symmetric graphs using group theory
    • A. Kaveh and A. Nikbakht, "Block diagonalization of Laplacian matrices of symmetric graphs using group theory," International Journal for Numerical Methods in Engineering, vol.69, pp. 908-947, 2007.
    • (2007) International Journal for Numerical Methods in Engineering , vol.69 , pp. 908-947
    • Kaveh, A.1    Nikbakht, A.2
  • 68
    • 26444490324
    • PhD thesis, Yale University, Department of Mathematics and Applied Mathematics
    • S. Lafon, "Diffusion maps and geometric harmonics," PhD thesis, Yale University, Department of Mathematics and Applied Mathematics, 2004.
    • (2004) Diffusion Maps and Geometric Harmonics
    • Lafon, S.1
  • 70
    • 33750184660
    • Updating the stationary vector of an irreducible Markov chain with an eye on Google's PageRank
    • A. Langville and C. Meyer, "Updating the stationary vector of an irreducible Markov chain with an eye on Google's PageRank," SIAM Journal on Matrix Analysis, vol.27, pp. 968-987, 2005.
    • (2005) SIAM Journal on Matrix Analysis , vol.27 , pp. 968-987
    • Langville, A.1    Meyer, C.2
  • 76
    • 33749267463
    • Fast direct policy evaluation using multiscale analysis of Markov diffusion processes
    • New York, NY, USA: ACM Press
    • M. Maggioni and S. Mahadevan, "Fast direct policy evaluation using multiscale analysis of Markov diffusion processes," in Proceedings of the 23rd International Conference on Machine Learning, pp. 601-608, New York, NY, USA: ACM Press, 2006.
    • (2006) Proceedings of the 23rd International Conference on Machine Learning , pp. 601-608
    • Maggioni, M.1    Mahadevan, S.2
  • 81
    • 0026880130
    • Automatic programming of behavior-based robots using reinforcement learning
    • S. Mahadevan and J. Connell, "Automatic programming of behavior-based robots using reinforcement learning," Artificial Intelligence, vol.55, pp. 311-365, 1992. Appeared originally as IBM TR RC16359, December 1990. (Pubitemid 23565211)
    • (1992) Artificial Intelligence , vol.55 , Issue.2-3 , pp. 311-365
    • Mahadevan, S.1    Connell, J.2
  • 83
    • 35748957806
    • Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
    • S. Mahadevan and M. Maggioni, "Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes," Journal of Machine Learning Research, vol.8, pp. 2169-2231, 2007.
    • (2007) Journal of Machine Learning Research , vol.8 , pp. 2169-2231
    • Mahadevan, S.1    Maggioni, M.2
  • 86
    • 0024700097
    • A theory for multiresolution signal decomposition: The wavelet representation
    • S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.11, no.7, pp. 674-693, 1989.
    • (1989) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.11 , Issue.7 , pp. 674-693
    • Mallat, S.1
  • 88
    • 57749103516
    • Computing isotypic projections with the Lanczos iteration
    • D. Maslen, M. Orrison, and D. Rockmore, "Computing isotypic projections with the Lanczos iteration," SIAM, vol.2, nos. 60/61, pp. 601-628, 2003.
    • (2003) SIAM , vol.2 , Issue.60-61 , pp. 601-628
    • Maslen, D.1    Orrison, M.2    Rockmore, D.3
  • 91
    • 84898985184
    • Learning segmentation by random walks
    • M. Meila and J. Shi, "Learning segmentation by random walks," NIPS, 2001.
    • (2001) NIPS
    • Meila, M.1    Shi, J.2
  • 92
    • 0043256056
    • Sensitivity of the stationary distribution of a Markov chain
    • C. Meyer, "Sensitivity of the stationary distribution of a Markov chain," SIAM Journal of Matrix Analysis and Applications, vol.15, no.3, pp. 715-728, 1994.
    • (1994) SIAM Journal of Matrix Analysis and Applications , vol.15 , Issue.3 , pp. 715-728
    • Meyer, C.1
  • 93
    • 0008813538
    • Barycentric interpolators for continuous space and time reinforcement learning
    • MIT Press
    • A. Moore, "Barycentric interpolators for continuous space and time reinforcement learning," in Advances in Neural Information Processing Systems, MIT Press, 1998.
    • (1998) Advances in Neural Information Processing Systems
    • Moore, A.1
  • 95
  • 96
    • 84899013108
    • On spectral clustering: Analysis and an algorithm
    • A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," NIPS, 2002.
    • (2002) NIPS
    • Ng, A.1    Jordan, M.2    Weiss, Y.3
  • 98
    • 30844447280
    • Technical Report TR-2001-2030, University of Chicago, Computer Science Department, November
    • P. Niyogi and M. Belkin, "Semi-supervised learning on Riemannian manifolds," Technical Report TR-2001-2030, University of Chicago, Computer Science Department, November 2001.
    • (2001) Semi-supervised Learning on Riemannian Manifolds
    • Niyogi, P.1    Belkin, M.2
  • 100
    • 0036832956
    • Kernel-based reinforcement learning
    • D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Machine Learning, vol.49, nos. 2-3, pp. 161-178, 2002.
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 161-178
    • Ormoneit, D.1    Sen, S.2
  • 116
    • 0034704222
    • Nonlinear dimensionality reduction by locally linear embedding
    • DOI 10.1126/science.290.5500.2323
    • S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol.290, pp. 2323-2326, 2000. (Pubitemid 32041578)
    • (2000) Science , vol.290 , Issue.5500 , pp. 2323-2326
    • Roweis, S.T.1    Saul, L.K.2
  • 119
    • 32844474095
    • Reinforcement learning with factored states and actions
    • B. Sallans and G. Hinton, "Reinforcement learning with factored states and actions," Journal of Machine Learning Research, vol.5, pp. 1063-1088, 2004.
    • (2004) Journal of Machine Learning Research , vol.5 , pp. 1063-1088
    • Sallans, B.1    Hinton, G.2
  • 121
    • 0001296683
    • Perturbation theory and finite Markov chains
    • P. Schweitzer, "Perturbation theory and finite Markov chains," Journal of Applied Probability, vol.5, no.2, pp. 410-413, 1968.
    • (1968) Journal of Applied Probability , vol.5 , Issue.2 , pp. 410-413
    • Schweitzer, P.1
  • 124
    • 26944499565
    • Approximate policy construction using decision diagrams
    • R. St-Aubin, J. Hoey, and C. Boutilier, "Approximate policy construction using decision diagrams," NIPS, 2000.
    • (2000) NIPS
    • St-Aubin, R.1    Hoey, J.2    Boutilier, C.3
  • 130
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol.3, pp. 9-44, 1988.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 131
    • 0034704229
    • A global geometric framework for nonlinear dimensionality reduction
    • DOI 10.1126/science.290.5500.2319
    • J. Tenenbaum, V. de Silva, and J. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol.290, pp. 2319-2323, 2000. (Pubitemid 32041577)
    • (2000) Science , vol.290 , Issue.5500 , pp. 2319-2323
    • Tenenbaum, J.B.1    De Silva, V.2    Langford, J.C.3
  • 132
    • 0000985504
    • TD-Gammon, a self-teaching backgammon program, achieves master-level play
    • G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural Computation, vol.6, pp. 215-219, 1994.
    • (1994) Neural Computation , vol.6 , pp. 215-219
    • Tesauro, G.1
  • 135
    • 0036782663
    • Many-layered learning
    • P. Utgoff and D. Stracuzzi, "Many-layered learning," Neural Computation, vol.14, pp. 2497-2529, 2002.
    • (2002) Neural Computation , vol.14 , pp. 2497-2529
    • Utgoff, P.1    Stracuzzi, D.2
  • 140
  • 141
    • 0012841228
    • Successive matrix squaring algorithm for computing the Drazin inverse
    • Y. Wei, "Successive matrix squaring algorithm for computing the Drazin inverse," Applied Mathematics and Computation, vol.108, pp. 67-75, 2000.
    • (2000) Applied Mathematics and Computation , vol.108 , pp. 67-75
    • Wei, Y.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.