



Volume 7, Issue 5, 2013, Pages 746-758

Feature search in the Grassmanian in online reinforcement learning

Author keywords

Feature adaptation; Grassman manifold; online learning; residual gradient scheme; stochastic approximation; temporal difference learning

Indexed keywords

FEATURE ADAPTATION; GRASSMAN MANIFOLD; ONLINE LEARNING; RESIDUAL GRADIENT; STOCHASTIC APPROXIMATIONS; TEMPORAL DIFFERENCE LEARNING

EID: 84884515550     PISSN: 19324553     EISSN: None     Source Type: Journal    
DOI: 10.1109/JSTSP.2013.2255022     Document Type: Article
Times cited: 9

References (35)
  • 2
    • 85151728371
    • Residual algorithms: Reinforcement learning with function approximation
    • Morgan Kaufmann
    • L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. 12th Int. Conf. Mach. Learn., 1995, pp. 30-37, Morgan Kaufmann.
    • (1995) Proc. 12th Int. Conf. Mach. Learn., pp. 30-37
    • Baird, L.C.
  • 3
    • 79952389882
    • Online identification and tracking of subspaces from highly incomplete information
    • L. Balzano, R. Nowak, and B. Recht, "Online identification and tracking of subspaces from highly incomplete information," in Proc. 48th Annu. Allerton Conf., 2010, pp. 704-711.
    • (2010) Proc. 48th Annu. Allerton Conf., pp. 704-711
    • Balzano, L.; Nowak, R.; Recht, B.
  • 4
    • 0034445353
    • A learning algorithm for Markov decision processes with adaptive state aggregation
    • J. S. Baras and V. S. Borkar, "A learning algorithm for Markov decision processes with adaptive state aggregation," in Proc. 39th IEEE Conf. Decision Control, Sydney, Australia, Dec. 12-15, 2000, vol. 4, pp. 3351-3356. (Pubitemid 32528175)
    • (2000) Proceedings of the IEEE Conference on Decision and Control, vol. 4, pp. 3351-3356
    • Baras, J.S.; Borkar, V.S.
  • 5
    • 47649102775
    • A note on linear function approximation using random projections
    • K. Barman and V. S. Borkar, "A note on linear function approximation using random projections," Syst. Control Lett., vol. 57, no. 9, pp. 784-786, 2008.
    • (2008) Syst. Control Lett., vol. 57, no. 9, pp. 784-786
    • Barman, K.; Borkar, V.S.
  • 10
    • 61849106433
    • Projected equation methods for approximate solution of large linear systems
    • D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Appl. Math., vol. 227, pp. 27-50, 2009.
    • (2009) J. Comput. Appl. Math., vol. 227, pp. 27-50
    • Bertsekas, D.P.; Yu, H.
  • 11
    • 0346902105
    • Two time scale simultaneous perturbation stochastic approximation using deterministic perturbation sequences
    • S. Bhatnagar, M. C. Fu, S. I. Marcus, and I.-J. Wang, "Two time scale simultaneous perturbation stochastic approximation using deterministic perturbation sequences," ACM Trans. Model. Comput. Simulat., vol. 13, no. 2, pp. 180-209, 2003.
    • (2003) ACM Trans. Model. Comput. Simulat., vol. 13, no. 2, pp. 180-209
    • Bhatnagar, S.; Fu, M.C.; Marcus, S.I.; Wang, I.-J.
  • 13
    • 84884530005
    • Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, F. Lewis and D. Liu, Eds. Piscataway, NJ, USA: IEEE Press Computational Intelligence Series, ch. 23
    • S. Bhatnagar, V. S. Borkar, and L. A. Prashanth, "Adaptive feature pursuit: Online adaptation of features in reinforcement learning," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, F. Lewis and D. Liu, Eds. Piscataway, NJ, USA: IEEE Press Computational Intelligence Series, 2012, ch. 23.
    • (2012) Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
    • Bhatnagar, S.; Borkar, V.S.; Prashanth, L.A.
  • 15
    • 0031076413
    • Stochastic approximation with two timescales
    • V. S. Borkar, "Stochastic approximation with two timescales," Syst. Control Lett., vol. 29, pp. 291-294, 1997.
    • (1997) Syst. Control Lett., vol. 29, pp. 291-294
    • Borkar, V.S.
  • 16
    • 0032075427
    • Asynchronous stochastic approximations
    • V. S. Borkar, "Asynchronous stochastic approximations," SIAM J. Control Optimiz., vol. 36, no. 3, pp. 840-851, 1998. (Pubitemid 128493576)
    • (1998) SIAM Journal on Control and Optimization, vol. 36, no. 3, pp. 840-851
    • Borkar, V.S.
  • 17
    • 58849087743
    • Cambridge, U.K.: Cambridge Univ. Press; New Delhi, India: Hindustan Book Agency (jointly published)
    • V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge, U.K.: Cambridge Univ. Press, and New Delhi, India: Hindustan Book Agency (jointly published), 2008.
    • (2008) Stochastic Approximation: A Dynamical Systems Viewpoint
    • Borkar, V.S.
  • 18
    • 0038595396
    • Least-squares temporal difference learning
    • San Francisco, CA, USA: Morgan Kaufmann
    • J. A. Boyan, "Least-squares temporal difference learning," in Proc. 16th Int. Conf. Mach. Learn. San Francisco, CA, USA: Morgan Kaufmann, 1999, pp. 49-56.
    • (1999) Proc. 16th Int. Conf. Mach. Learn., pp. 49-56
    • Boyan, J.A.
  • 19
    • 78049343060
    • Adaptive bases for reinforcement learning
    • J. L. Balcázar, F. Bonchi, A. Gionis, and M. Sebag, Eds., Mach. Learn. Knowl. Discovery in Databases, Proc. Eur. Conf. Mach. Learn., Barcelona, Spain, ser. Lecture Notes in Computer Science
    • D. Di Castro and S. Mannor, "Adaptive bases for reinforcement learning," in Mach. Learn. Knowl. Discovery in Databases, Proc. Eur. Conf. Mach. Learn., J. L. Balcázar, F. Bonchi, A. Gionis, and M. Sebag, Eds., Barcelona, Spain, ser. Lecture Notes in Computer Science, vol. 6321, pp. 312-327, 2010.
    • (2010) Lecture Notes in Computer Science, vol. 6321, pp. 312-327
    • Di Castro, D.; Mannor, S.
  • 20
  • 21
    • 34250706852
    • Automatic basis function construction for approximate dynamic programming and reinforcement learning
    • Pittsburgh, PA, USA
    • P. W. Keller, S. Mannor, and D. Precup, "Automatic basis function construction for approximate dynamic programming and reinforcement learning," in Proc. 23rd Int. Conf. Mach. Learn., Pittsburgh, PA, USA, 2006.
    • (2006) Proc. 23rd Int. Conf. Mach. Learn.
    • Keller, P.W.; Mannor, S.; Precup, D.
  • 25
    • 85161990353
    • Basis construction from power series expansions of value functions
    • Vancouver, BC, Canada
    • S. Mahadevan and B. Liu, "Basis construction from power series expansions of value functions," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, 2010.
    • (2010) Proc. Adv. Neural Inf. Process. Syst.
    • Mahadevan, S.; Liu, B.
  • 26
    • 17444414191
    • Basis function adaptation in temporal difference reinforcement learning
    • DOI 10.1007/s10479-005-5732-z
    • I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal difference reinforcement learning," Ann. Operat. Res., vol. 134, pp. 215-238, 2005. (Pubitemid 40550047)
    • (2005) Annals of Operations Research, vol. 134, no. 1, pp. 215-238
    • Menache, I.; Mannor, S.; Shimkin, N.
  • 30
    • 33847202724
    • Learning to predict by the method of temporal differences
    • R. S. Sutton, "Learning to predict by the method of temporal differences," Mach. Learn., vol. 3, pp. 9-44, 1988.
    • (1988) Mach. Learn., vol. 3, pp. 9-44
    • Sutton, R.S.
  • 33
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • PII S0018928697034375
    • J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Autom. Control, vol. 42, no. 5, pp. 674-690, 1997. (Pubitemid 127760263)
    • (1997) IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674-690
    • Tsitsiklis, J.N.; Van Roy, B.
  • 34
    • 0033221519
    • Average cost temporal-difference learning
    • DOI 10.1016/S0005-1098(99)00099-0
    • J. N. Tsitsiklis and B. Van Roy, "Average cost temporal-difference learning," Automatica, vol. 35, no. 11, pp. 1799-1808, 1999. (Pubitemid 32078092)
    • (1999) Automatica, vol. 35, no. 11, pp. 1799-1808
    • Tsitsiklis, J.N.; Van Roy, B.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.