Volume 8, 2007, Pages 2169-2231

Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes

Author keywords

Manifold learning; Markov decision processes; Reinforcement learning; Spectral graph theory; Value function approximation

Indexed keywords

LEAST SQUARES POLICY ITERATION (LSPI); MANIFOLD LEARNING; MARKOV DECISION PROCESSES; PROTO VALUE FUNCTIONS (PVFS); SPECTRAL GRAPH THEORY; VALUE FUNCTION APPROXIMATION;

EID: 35748957806     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited : (282)

References (111)
  • 2
    • 0001119106
    • On representations of problems of reasoning about actions
    • Donald Michie, editor, Elsevier/North-Holland
    • S. Amarel. On representations of problems of reasoning about actions. In Donald Michie, editor, Machine Intelligence 3, volume 3, pages 131-171. Elsevier/North-Holland, 1968.
    • (1968) Machine Intelligence 3 , vol.3 , pp. 131-171
    • Amarel, S.1
  • 6
    • 0037288370
    • Recent advances in hierarchical reinforcement learning
    • A. Barto and S. Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Systems Journal, 13:41-77, 2003.
    • (2003) Discrete Event Systems Journal , vol.13 , pp. 41-77
    • Barto, A.1    Mahadevan, S.2
  • 8
    • 3142725535
    • Semi-supervised learning on Riemannian manifolds
    • M. Belkin and P. Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56:209-239, 2004.
    • (2004) Machine Learning , vol.56 , pp. 209-239
    • Belkin, M.1    Niyogi, P.2
  • 9
    • 33750729556
    • Manifold regularization: A geometric framework for learning from labeled and unlabeled examples
    • M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399-2434, 2006.
  • 11
    • 24944499193
    • On the optimality of spectral compression of mesh data
    • M. Ben-Chen and C. Gotsman. On the optimality of spectral compression of mesh data. ACM Transactions on Graphics, 24(1), 2005.
    • (2005) ACM Transactions on Graphics , vol.24 , Issue.1
    • Ben-Chen, M.1    Gotsman, C.2
  • 14
    • 0041386088
    • A geometric interpretation of the Metropolis-Hastings algorithm
    • L. Billera and P. Diaconis. A geometric interpretation of the Metropolis-Hastings algorithm. Statistical Science, 16:335-339, 2001.
    • (2001) Statistical Science , vol.16 , pp. 335-339
    • Billera, L.1    Diaconis, P.2
  • 15
    • 0346942368
    • Decision-theoretic planning: Structural assumptions and computational leverage
    • C. Boutilier, T. Dean, and S. Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11:1-94, 1999.
    • (1999) Journal of Artificial Intelligence Research , vol.11 , pp. 1-94
    • Boutilier, C.1    Dean, T.2    Hanks, S.3
  • 17
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • S. Bradtke and A. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57, 1996.
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.1    Barto, A.2
  • 20
    • 21944448955
    • The Q-spectrum and spanning trees of tensor products of bipartite graphs
    • T. Chow. The Q-spectrum and spanning trees of tensor products of bipartite graphs. Proceedings of the American Mathematical Society, 125(11):3155-3161, 1997.
    • (1997) Proceedings of the American Mathematical Society , vol.125 , Issue.11 , pp. 3155-3161
    • Chow, T.1
  • 22
    • 17444366585
    • Laplacians and the Cheeger inequality for directed graphs
    • April
    • F. Chung. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9(1):1-19, April 2005.
    • (2005) Annals of Combinatorics , vol.9 , Issue.1 , pp. 1-19
    • Chung, F.1
  • 23
    • 84987472544
    • Laplacian and vibrational spectra for homogeneous graphs
    • F. Chung and S. Sternberg. Laplacian and vibrational spectra for homogeneous graphs. Journal of Graph Theory, 16(6):605-627, 1992.
    • (1992) Journal of Graph Theory , vol.16 , Issue.6 , pp. 605-627
    • Chung, F.1    Sternberg, S.2
  • 26
    • 19644366699
    • Geometric diffusions as a tool for harmonic analysis and structure definition of data, part ii: Multiscale methods
    • May
    • R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, Frederick Warner, and Steven Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data, part ii: Multiscale methods. Proceedings of the National Academy of Science, 102(21):7432-7437, May 2005b.
    • (2005) Proceedings of the National Academy of Science , vol.102 , Issue.21 , pp. 7432-7437
    • Coifman, R.1    Lafon, S.2    Lee, A.3    Maggioni, M.4    Nadler, B.5    Warner, F.6    Zucker, S.7
  • 27
    • 25844521242
    • Geometric diffusions for the analysis of data from sensor networks
    • October
    • R. Coifman, M. Maggioni, S. Zucker, and I. Kevrekidis. Geometric diffusions for the analysis of data from sensor networks. Curr Opin Neurobiol, 15(5):576-84, October 2005c.
    • (2005) Curr Opin Neurobiol , vol.15 , Issue.5 , pp. 576-584
    • Coifman, R.1    Maggioni, M.2    Zucker, S.3    Kevrekidis, I.4
  • 30
    • 0001158047
    • Improving generalisation for temporal difference learning: The successor representation
    • P. Dayan. Improving generalisation for temporal difference learning: The successor representation. Neural Computation, 5:613-624, 1993.
    • (1993) Neural Computation , vol.5 , pp. 613-624
    • Dayan, P.1
  • 34
    • 29244453931
    • On the Nyström method for approximating a Gram matrix for improved kernel-based learning
    • P. Drineas and M. W. Mahoney. On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J. Machine Learning Research, 6:2153-2175, 2005.
    • (2005) J. Machine Learning Research , vol.6 , pp. 2153-2175
    • Drineas, P.1    Mahoney, M.W.2
  • 35
    • 0043247546
    • Accelerating reinforcement learning by composing solutions of automatically identified subtasks
    • C. Drummond. Accelerating reinforcement learning by composing solutions of automatically identified subtasks. Journal of AI Research, 16:59-104, 2002.
    • (2002) Journal of AI Research , vol.16 , pp. 59-104
    • Drummond, C.1
  • 38
    • 0036832959
    • Structure in the space of value functions
    • D. Foster and P. Dayan. Structure in the space of value functions. Machine Learning, 49:325-346, 2002.
    • (2002) Machine Learning , vol.49 , pp. 325-346
    • Foster, D.1    Dayan, P.2
  • 40
    • 0038595393
    • Stable function approximation in dynamic programming
    • Technical Report CMU-CS-95-103, Department of Computer Science, Carnegie Mellon University
    • G. Gordon. Stable function approximation in dynamic programming. Technical Report CMU-CS-95-103, Department of Computer Science, Carnegie Mellon University, 1995.
    • (1995)
    • Gordon, G.1
  • 49
    • 35748975552
    • Universal parametrizations via eigenfunctions of the Laplacian and heat kernels
    • Submitted
    • P. Jones, M. Maggioni, and R. Schul. Universal parametrizations via eigenfunctions of the Laplacian and heat kernels. Submitted, 2007.
    • (2007)
    • Jones, P.1    Maggioni, M.2    Schul, R.3
  • 51
    • 0032131147
    • A fast and high quality multilevel scheme for partitioning irregular graphs
    • G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal of Scientific Computing, 20(1):359-392, 1999.
    • (1999) SIAM Journal of Scientific Computing , vol.20 , Issue.1 , pp. 359-392
    • Karypis, G.1    Kumar, V.2
  • 59
    • 26444490324
    • PhD thesis, Yale University, Dept of Mathematics & Applied Mathematics
    • S. Lafon. Diffusion Maps and Geometric Harmonics. PhD thesis, Yale University, Dept of Mathematics & Applied Mathematics, 2004.
    • (2004) Diffusion Maps and Geometric Harmonics
    • Lafon, S.1
  • 64
    • 33749267463
    • Fast direct policy evaluation using multiscale analysis of Markov Diffusion Processes
    • New York, NY, USA, ACM Press
    • M. Maggioni and S. Mahadevan. Fast direct policy evaluation using multiscale analysis of Markov Diffusion Processes. In Proceedings of the 23rd international conference on Machine learning, pages 601-608, New York, NY, USA, 2006. ACM Press.
    • (2006) Proceedings of the 23rd international conference on Machine learning , pp. 601-608
    • Maggioni, M.1    Mahadevan, S.2
  • 70
    • 0024700097
    • A theory for multiresolution signal decomposition: The wavelet representation
    • ISSN 0162-8828
    • S. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell., 11(7):674-693, 1989. ISSN 0162-8828.
    • (1989) IEEE Trans. Pattern Anal. Mach. Intell , vol.11 , Issue.7 , pp. 674-693
    • Mallat, S.1
  • 73
    • 17444414191
    • Basis function adaptation in temporal difference reinforcement learning
    • N. Menache, N. Shimkin, and S. Mannor. Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research, 134:215-238, 2005.
    • (2005) Annals of Operations Research , vol.134 , pp. 215-238
    • Menache, N.1    Shimkin, N.2    Mannor, S.3
  • 76
    • 0037288398
    • Least-squares policy evaluation algorithms with linear function approximation
    • A. Nedic and D. Bertsekas. Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Systems Journal, 13, 2003.
    • (2003) Discrete Event Systems Journal , vol.13
    • Nedic, A.1    Bertsekas, D.2
  • 77
    • 84899013108
    • On spectral clustering: Analysis and an algorithm
    • A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2002.
    • (2002) NIPS
    • Ng, A.1    Jordan, M.2    Weiss, Y.3
  • 78
    • 34547995167
    • Regression and regularization on large graphs
    • Technical report, University of Chicago, Nov
    • P. Niyogi, I. Matveeva, and M. Belkin. Regression and regularization on large graphs. Technical report, University of Chicago, Nov. 2003.
    • (2003)
    • Niyogi, P.1    Matveeva, I.2    Belkin, M.3
  • 79
    • 0036832956
    • Kernel-based reinforcement learning
    • D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49(2-3):161-178, 2002.
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 161-178
    • Ormoneit, D.1    Sen, S.2
  • 91
    • 0034704222
    • Nonlinear dimensionality reduction by locally linear embedding
    • S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.
    • (2000) Science , vol.290 , pp. 2323-2326
    • Roweis, S.1    Saul, L.2
  • 92
    • 32844474095
    • Reinforcement learning with factored states and actions
    • B. Sallans and G. Hinton. Reinforcement learning with factored states and actions. Journal of Machine Learning Research, 5:1063-1088, 2004.
    • (2004) Journal of Machine Learning Research , vol.5 , pp. 1063-1088
    • Sallans, B.1    Hinton, G.2
  • 96
    • 0034244751
    • Normalized cuts and image segmentation
    • J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE PAMI, 22:888-905, 2000.
    • (2000) IEEE PAMI , vol.22 , pp. 888-905
    • Shi, J.1    Malik, J.2
  • 102
    • 35748936335
    • A general framework for adaptive regularization based on diffusion processes on graphs
    • Technical Report YALE/DCS/TR1365, Yale Univ, July
    • A. Szlam, M. Maggioni, and R. Coifman. A general framework for adaptive regularization based on diffusion processes on graphs. Technical Report YALE/DCS/TR1365, Yale Univ, July 2006.
    • (2006)
    • Szlam, A.1    Maggioni, M.2    Coifman, R.3
  • 103
    • 0034704229
    • A global geometric framework for nonlinear dimensionality reduction
    • J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000.
    • (2000) Science , vol.290 , pp. 2319-2323
    • Tenenbaum, J.1    de Silva, V.2    Langford, J.3
  • 104
    • 0001046225
    • Practical issues in temporal difference learning
    • G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257-278, 1992.
    • (1992) Machine Learning , vol.8 , pp. 257-278
    • Tesauro, G.1
  • 106
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • J. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42:674-690, 1997.
    • (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
    • Tsitsiklis, J.1    Van Roy, B.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.