



Volume 14, 2013, Pages 2067-2118

Construction of approximation spaces for reinforcement learning

Author keywords

Diffusion distance; Least squares policy iteration; Proto value functions; Reinforcement learning; Slow feature analysis; Visual robot navigation

Indexed keywords

DIFFUSION DISTANCE; POLICY ITERATION; PROTO-VALUE FUNCTIONS; ROBOT NAVIGATION; SLOW FEATURE ANALYSIS;

EID: 84883256795     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 18

References (75)
  • 2
    • 0042378381
    • Laplacian eigenmaps for dimensionality reduction and data representation
    • M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373-1396, 2003.
    • (2003) Neural Computation , vol.15 , Issue.6 , pp. 1373-1396
    • Belkin, M.1    Niyogi, P.2
  • 3
    • 27244444336
    • Slow feature analysis yields a rich repertoire of complex cell properties
    • P. Berkes and L. Wiskott. Slow feature analysis yields a rich repertoire of complex cell properties. Journal of Vision, 5:579-602, 2005.
    • (2005) Journal of Vision , vol.5 , pp. 579-602
    • Berkes, P.1    Wiskott, L.2
  • 6
    • 84865242246
    • Generating feature spaces for linear algorithms with regularized sparse kernel slow feature analysis
    • W. Böhmer, S. Grünewälder, H. Nickisch, and K. Obermayer. Generating feature spaces for linear algorithms with regularized sparse kernel slow feature analysis. Machine Learning, 89 (1-2):67-86, 2012.
    • (2012) Machine Learning , vol.89 , Issue.1-2 , pp. 67-86
    • Böhmer, W.1    Grünewälder, S.2    Nickisch, H.3    Obermayer, K.4
  • 8
    • 85153940465
    • Generalization in reinforcement learning: Safely approximating the value function
    • J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: safely approximating the value function. In Advances in Neural Information Processing Systems, pages 369-376, 1995.
    • (1995) Advances in Neural Information Processing Systems , pp. 369-376
    • Boyan, J.A.1    Moore, A.W.2
  • 10
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22 (1/2/3):33-57, 1996.
    • (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 12
    • 0038891993
    • Sparse on-line Gaussian processes
    • L. Csató and M. Opper. Sparse on-line Gaussian processes. Neural Computation, 14(3):641-668, 2002.
    • (2002) Neural Computation , vol.14 , Issue.3 , pp. 641-668
    • Csató, L.1    Opper, M.2
  • 14
    • 0345414201
    • Real-time simultaneous localization and mapping with a single camera
    • A. J. Davison. Real-time simultaneous localization and mapping with a single camera. In IEEE International Conference on Computer Vision, volume 2, page 1403, 2003.
    • (2003) IEEE International Conference on Computer Vision , vol.2 , pp. 1403
    • Davison, A.J.1
  • 15
    • 0348090400
    • The linear programming approach to approximate dynamic programming
    • D. P. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51(6):850-865, 2003.
    • (2003) Operations Research , vol.51 , Issue.6 , pp. 850-865
    • De Farias, D.P.1    Van Roy, B.2
  • 16
    • 1942421151
    • Bayes meets Bellman: The Gaussian process approach to temporal difference learning
    • Y. Engel, S. Mannor, and R. Meir. Bayes meets Bellman: the Gaussian process approach to temporal difference learning. In International Conference on Machine Learning, pages 154-161, 2003.
    • (2003) International Conference on Machine Learning , pp. 154-161
    • Engel, Y.1    Mannor, S.2    Meir, R.3
  • 17
    • 58349096666
    • Proto-transfer learning in Markov decision processes using spectral methods
    • K. Ferguson and S. Mahadevan. Proto-transfer learning in Markov decision processes using spectral methods. In ICML Workshop on Transfer Learning, 2006.
    • (2006) ICML Workshop on Transfer Learning
    • Ferguson, K.1    Mahadevan, S.2
  • 19
    • 0000188120
    • Learning invariance from transformation sequences
    • P. Földiák. Learning invariance from transformation sequences. Neural Computation, 3(2):194-200, 1991.
    • (1991) Neural Computation , vol.3 , Issue.2 , pp. 194-200
    • Földiák, P.1
  • 20
    • 34548412214
    • Slowness and sparseness lead to place, head-direction, and spatial-view cells
    • M. Franzius, H. Sprekeler, and L. Wiskott. Slowness and sparseness lead to place, head-direction, and spatial-view cells. PLoS Computational Biology, 3(8):e166, 2007.
    • (2007) PLoS Computational Biology , vol.3 , Issue.8
    • Franzius, M.1    Sprekeler, H.2    Wiskott, L.3
  • 21
    • 79958777413
    • The optimal unbiased value estimator and its relation to LSTD, TD and MC
    • S. Grünewälder and K. Obermayer. The optimal unbiased value estimator and its relation to LSTD, TD and MC. Machine Learning, 83:289-330, 2011.
    • (2011) Machine Learning , vol.83 , pp. 289-330
    • Grünewälder, S.1    Obermayer, K.2
  • 24
    • 14844352327
    • Linear program approximations for factored continuous-state Markov decision processes
    • M. Hauskrecht and B. Kveton. Linear program approximations for factored continuous-state Markov decision processes. In Advances in Neural Information Processing Systems, pages 895-902, 2003.
    • (2003) Advances in Neural Information Processing Systems , pp. 895-902
    • Hauskrecht, M.1    Kveton, B.2
  • 26
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • G. E. Hinton and S. Osindero. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554, 2006.
    • (2006) Neural Computation , vol.18 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2
  • 29
    • 34247562373
    • On the law of large numbers for (geometrically) ergodic Markov chains
    • S. T. Jensen and A. Rahbek. On the law of large numbers for (geometrically) ergodic Markov chains. Econometric Theory, 23:761-766, 2007.
    • (2007) Econometric Theory , vol.23 , pp. 761-766
    • Jensen, S.T.1    Rahbek, A.2
  • 32
  • 36
    • 84867687400
    • Incremental slow feature analysis: Adaptive low-complexity slow feature updating from high-dimensional input streams
    • V. R. Kompella, M. D. Luciw, and J. Schmidhuber. Incremental slow feature analysis: adaptive low-complexity slow feature updating from high-dimensional input streams. Neural Computation, 24(11):2994-3024, 2012.
    • (2012) Neural Computation , vol.24 , Issue.11 , pp. 2994-3024
    • Kompella, V.R.1    Luciw, M.D.2    Schmidhuber, J.3
  • 40
    • 78049417739
    • Reinforcement learning on slow features of high-dimensional input streams
    • R. Legenstein, N. Wilbert, and L. Wiskott. Reinforcement learning on slow features of high-dimensional input streams. PLoS Computational Biology, 6(8):e1000894, 2010.
    • (2010) PLoS Computational Biology , vol.6 , Issue.8
    • Legenstein, R.1    Wilbert, N.2    Wiskott, L.3
  • 43
    • 84867667632
    • Low complexity proto-value function learning from sensory observations with incremental slow feature analysis
    • Springer-Verlag
    • M. Luciw and J. Schmidhuber. Low complexity proto-value function learning from sensory observations with incremental slow feature analysis. In International Conference on Artificial Neural Networks and Machine Learning, volume III, pages 279-287. Springer-Verlag, 2012.
    • (2012) International Conference on Artificial Neural Networks and Machine Learning , vol.3 , pp. 279-287
    • Luciw, M.1    Schmidhuber, J.2
  • 45
    • 35748957806
    • Proto-value functions: A Laplacian framework for learning representations and control in Markov decision processes
    • S. Mahadevan and M. Maggioni. Proto-value functions: a Laplacian framework for learning representations and control in Markov decision processes. Journal of Machine Learning Research, 8:2169-2231, 2007.
    • (2007) Journal of Machine Learning Research , vol.8 , pp. 2169-2231
    • Mahadevan, S.1    Maggioni, M.2
  • 46
  • 47
    • 55149090494
    • Transfer in variable-reward hierarchical reinforcement learning
    • N. Mehta, S. Natarajan, P. Tadepalli, and A. Fern. Transfer in variable-reward hierarchical reinforcement learning. Machine Learning, 73:289-312, 2008.
    • (2008) Machine Learning , vol.73 , pp. 289-312
    • Mehta, N.1    Natarajan, S.2    Tadepalli, P.3    Fern, A.4
  • 50
    • 0000325341
    • On lines and planes of closest fit to systems of points in space
    • K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine Series 6, 2(11):559-572, 1901.
    • (1901) Philosophical Magazine Series 6 , vol.2 , Issue.11 , pp. 559-572
    • Pearson, K.1
  • 51
    • 84880899807
    • An analysis of Laplacian methods for value function approximation in MDPs
    • M. Petrik. An analysis of Laplacian methods for value function approximation in MDPs. In International Joint Conference on Artificial Intelligence, pages 2574-2579, 2007.
    • (2007) International Joint Conference on Artificial Intelligence , pp. 2574-2579
    • Petrik, M.1
  • 52
    • 80555145304
    • Robust approximate bilinear programming for value function approximation
    • M. Petrik and S. Zilberstein. Robust approximate bilinear programming for value function approximation. Journal of Machine Learning Research, 12:3027-3063, 2011.
    • (2011) Journal of Machine Learning Research , vol.12 , pp. 3027-3063
    • Petrik, M.1    Zilberstein, S.2
  • 56
    • 0347243182
    • Nonlinear component analysis as a kernel eigenvalue problem
    • B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319, 1998.
    • (1998) Neural Computation , vol.10 , Issue.5 , pp. 1299-1319
    • Schölkopf, B.1    Smola, A.2    Müller, K.-R.3
  • 57
    • 0000081872
    • Estimating uncertain spatial relationships in robotics
    • Springer-Verlag
    • R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in robotics. In Autonomous Robot Vehicles. Springer-Verlag, 1990.
    • (1990) Autonomous Robot Vehicles
    • Smith, R.1    Self, M.2    Cheeseman, P.3
  • 59
    • 84861659924
    • Multi-task reinforcement learning: Shaping and feature selection
    • M. Snel and S. Whiteson. Multi-task reinforcement learning: Shaping and feature selection. In European Workshop on Reinforcement Learning, pages 237-248, 2011.
    • (2011) European Workshop on Reinforcement Learning , pp. 237-248
    • Snel, M.1    Whiteson, S.2
  • 61
    • 84856370602
    • On the relationship of slow feature analysis and Laplacian eigenmaps
    • H. Sprekeler. On the relationship of slow feature analysis and Laplacian eigenmaps. Neural Computation, 23(12):3287-3302, 2011.
    • (2011) Neural Computation , vol.23 , Issue.12 , pp. 3287-3302
    • Sprekeler, H.1
  • 63
    • 85156221438
    • Generalization in reinforcement learning: Successful examples using sparse coarse coding
    • R. S. Sutton. Generalization in reinforcement learning: successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems, pages 1038-1044, 1996.
    • (1996) Advances in Neural Information Processing Systems , pp. 1038-1044
    • Sutton, R.S.1
  • 65
    • 68949157375
    • Transfer learning for reinforcement learning domains: A survey
    • M. E. Taylor and P. Stone. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10:1633-1685, 2009.
    • (2009) Journal of Machine Learning Research , vol.10 , pp. 1633-1685
    • Taylor, M.E.1    Stone, P.2
  • 66
    • 0034704229
    • A global geometric framework for nonlinear dimensionality reduction
    • J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000.
    • (2000) Science , vol.290 , pp. 2319-2323
    • Tenenbaum, J.B.1    De Silva, V.2    Langford, J.C.3
  • 68
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.
    • (1997) IEEE Transactions on Automatic Control , vol.42 , Issue.5 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 71
    • 84883212257
    • Predictively defined representations of state
    • M. Wiering and M. van Otterlo, editors, Springer-Verlag Berlin Heidelberg
    • D. Wingate. Predictively defined representations of state. In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State-of-the-Art, pages 415-439. Springer-Verlag Berlin Heidelberg, 2012.
    • (2012) Reinforcement Learning: State-of-the-art , pp. 415-439
    • Wingate, D.1
  • 72
    • 60349110114
    • On discovery and learning of models with predictive representations of state for agents with continuous actions and observations
    • D. Wingate and S. P. Singh. On discovery and learning of models with predictive representations of state for agents with continuous actions and observations. In International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1128-1135, 2007.
    • (2007) International Joint Conference on Autonomous Agents and Multiagent Systems , pp. 1128-1135
    • Wingate, D.1    Singh, S.P.2
  • 73
    • 0041324871
    • Slow feature analysis: A theoretical analysis of optimal free responses
    • L. Wiskott. Slow feature analysis: a theoretical analysis of optimal free responses. Neural Computation, 15(9):2147-2177, 2003.
    • (2003) Neural Computation , vol.15 , Issue.9 , pp. 2147-2177
    • Wiskott, L.1
  • 74
    • 0036546660
    • Slow feature analysis: Unsupervised learning of invariances
    • L. Wiskott and T. Sejnowski. Slow feature analysis: unsupervised learning of invariances. Neural Computation, 14(4):715-770, 2002.
    • (2002) Neural Computation , vol.14 , Issue.4 , pp. 715-770
    • Wiskott, L.1    Sejnowski, T.2
  • 75
    • 33750299109
    • A sparse kernel-based least-squares temporal difference algorithm for reinforcement learning
    • Springer Berlin/Heidelberg
    • X. Xu. A sparse kernel-based least-squares temporal difference algorithm for reinforcement learning. In Advances in Natural Computation, volume 4221 of Lecture Notes in Computer Science, pages 47-56. Springer Berlin/Heidelberg, 2006.
    • (2006) Advances in Natural Computation, Volume 4221 of Lecture Notes in Computer Science , pp. 47-56
    • Xu, X.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.