SCOPUS Information Search Platform

IEEE Journal on Selected Topics in Signal Processing

Volume 7, Issue 5, 2013, Pages 746-758

Feature search in the grassmanian in online reinforcement learning

(3) Bhatnagar, Shalabh a; Borkar, Vivek S. b; Prabuchandran, K. J. a

b INDIAN INSTITUTE OF TECHNOLOGY BOMBAY (India)

Author keywords

Feature adaptation; Grassman manifold; online learning; residual gradient scheme; stochastic approximation; temporal difference learning

Indexed keywords

FEATURE ADAPTATION; GRASSMAN MANIFOLD; ONLINE LEARNING; RESIDUAL GRADIENT; STOCHASTIC APPROXIMATIONS; TEMPORAL DIFFERENCE LEARNING;

APPROXIMATION ALGORITHMS; REINFORCEMENT LEARNING;

E-LEARNING;

EID: 84884515550 PISSN: 19324553 EISSN: None Source Type: Journal
DOI: 10.1109/JSTSP.2013.2255022 Document Type: Article

Times cited : (9)

References (35)

1
- 84884085211
- Princeton, NJ, USA: Princeton Univ. Press
- P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton, NJ, USA: Princeton Univ. Press, 2008.
- (2008) Optimization Algorithms on Matrix Manifolds
- Absil, P.-A.¹ Mahony, R.² Sepulchre, R.³

2
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- Morgan Kaufmann
- L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. 12th Int. Conf. Mach. Learn., 1995, pp. 30-37, Morgan Kaufmann.
- (1995) Proc. 12th Int. Conf. Mach. Learn , pp. 30-37
- Baird, L.C.¹

3
- 79952389882
- Online identification and tracking of subspaces from highly incomplete information
- L. Balzano, R. Nowak, and B. Recht, "Online identification and tracking of subspaces from highly incomplete information," in Proc. 48th Annu. Allerton Conf., 2010, pp. 704-711.
- (2010) Proc. 48th Annu. Allerton Conf , pp. 704-711
- Balzano, L.¹ Nowak, R.² Recht, B.³

4
- 0034445353
- A learning algorithm for Markov decision processes with adaptive state aggregation
- J. S. Baras and V. S. Borkar, "A learning algorithm for Markov decision processes with adaptive state aggregation," in Proc. 39th IEEE Conf. Decision Control, Dec. 12-15, 2000, Sydney, Australia, 2000, vol. 4, pp. 3351-3356. (Pubitemid 32528175)
- (2000) Proceedings of the IEEE Conference on Decision and Control , vol.4 , pp. 3351-3356
- Baras, J.S.¹ Borkar, V.S.²

5
- 47649102775
- A note on linear function approximation using random projections
- K. Barman and V. S. Borkar, "A note on linear function approximation using random projections," Syst. Control Lett., vol. 57, no. 9, pp. 784-786, 2008.
- (2008) Syst. Control Lett , vol.57 , Issue.9 , pp. 784-786
- Barman, K.¹ Borkar, V.S.²

6
- 0003778897
- Berlin, Germany: Springer-Verlag
- A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations. Berlin, Germany: Springer-Verlag, 1990.
- (1990) Adaptive Algorithms and Stochastic Approximations
- Benveniste, A.¹ Metivier, M.² Priouret, P.³

7
- 0003565783
- (3rd ed.). Belmont, MA, USA: Athena Scientific
- D. P. Bertsekas, Dynamic Programming and Optimal Control Vol. I (3rd ed.). Belmont, MA, USA: Athena Scientific, 2005.
- (2005) Dynamic Programming and Optimal Control , vol.1
- Bertsekas, D.P.¹

8
- 0003565783
- (4th ed.). Belmont, MA, USA: Athena Scientific
- D. P. Bertsekas, Dynamic Programming and Optimal Control Vol. II (4th ed.). Belmont, MA, USA: Athena Scientific, 2011.
- (2011) Dynamic Programming and Optimal Control , vol.2
- Bertsekas, D.P.¹

9
- 0003487482
- Belmont, MA, USA: Athena Scientific
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA, USA: Athena Scientific, 1996.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

10
- 61849106433
- Projected equation methods for approximate solution of large linear systems
- D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Appl. Math., vol. 227, pp. 27-50, 2009.
- (2009) J. Comput. Appl. Math , vol.227 , pp. 27-50
- Bertsekas, D.P.¹ Yu, H.²

11
- 0346902105
- Two time scale simultaneous perturbation stochastic approximation using deterministic perturbation sequences
- S. Bhatnagar, M. C. Fu, S. I. Marcus, and I.-J. Wang, "Two time scale simultaneous perturbation stochastic approximation using deterministic perturbation sequences," ACM Trans. Model. Comput. Simulat., vol. 13, no. 2, pp. 180-209, 2003.
- (2003) ACM Trans. Model. Comput. Simulat , vol.13 , Issue.2 , pp. 180-209
- Bhatnagar, S.¹ Fu, M.C.² Marcus, S.I.³ Wang, I.-J.⁴

12
- 84884510576
- Stochastic recursive algorithms for optimization: Simultaneous perturbation methods
- London, U.K.: Springer
- S. Bhatnagar, H. L. Prasad, and L. A. Prashanth, Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods, ser. Lecture Notes in Control and Information Sciences. London, U.K.: Springer, 2013.
- (2013) Lecture Notes in Control and Information Sciences
- Bhatnagar, S.¹ Prasad, H.L.² Prashanth, L.A.³

13
- 84884530005
- Chapter 23 of Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, F. Lewis and D. Liu, Eds. Piscataway, NJ, USA IEEE Press Computational Intelligence Series
- S. Bhatnagar, V. S. Borkar, and L. A. Prashanth, "Adaptive feature pursuit: Online adaptation of features in reinforcement learning," in Chapter 23 of Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, F. Lewis and D. Liu, Eds. Piscataway, NJ, USA: IEEE Press Computational Intelligence Series, 2012.
- (2012) Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
- Bhatnagar, S.¹ Borkar, V.S.² Prashanth, L.A.³

14
- 84884524714
- Dept. of Comput. Sci., Indian Inst. of Science, IISc-CSA-SSL-TR-2013-2, March-2013 [Online]. Available
- S. Bhatnagar, V. S. Borkar, and K. J. Prabuchandran, "Adaptive Feature Tuning in the Grassmanian in Online Reinforcement Learning," Dept. of Comput. Sci., Indian Inst. of Science, IISc-CSA-SSL-TR-2013-2, March-2013 [Online]. Available: http://stochastic.csa.iisc.ernet.in/www/research/files/IISc-CSA-SSLTR-2013-2.pdf
- Adaptive Feature Tuning in the Grassmanian in Online Reinforcement Learning
- Bhatnagar, S.¹ Borkar, V.S.² Prabuchandran, K.J.³

15
- 0031076413
- Stochastic approximation with two timescales
- V. S. Borkar, "Stochastic approximation with two timescales," Syst. Control Lett., vol. 29, pp. 291-294, 1997.
- (1997) Syst. Control Lett , vol.29 , pp. 291-294
- Borkar, V.S.¹

16
- 0032075427
- Asynchronous stochastic approximations
- V. S. Borkar, "Asynchronous stochastic approximations," SIAM J. Control Optimiz., vol. 36, no. 3, pp. 840-851, 1998. (Pubitemid 128493576)
- (1998) SIAM Journal on Control and Optimization , vol.36 , Issue.3 , pp. 840-851
- Borkar, V.S.¹

17
- 58849087743
- Cambridge, U. K.: (Jointly published by) Cambridge Univ. Press Hindustan Book Agency, New Delhi, India
- V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge, U. K.: (Jointly published by) Cambridge Univ. Press, 2008, Hindustan Book Agency, New Delhi, India.
- (2008) Stochastic Approximation: A Dynamical Systems Viewpoint
- Borkar, V.S.¹

18
- 0038595396
- Least-squares temporal difference learning
- San Francisco, CA, USA: Morgan Kaufmann
- J. A. Boyan, "Least-squares temporal difference learning," in Proc. 16th Int. Conf. Mach. Learn. San Francisco, CA, USA: Morgan Kaufmann, 1999, pp. 49-56.
- (1999) Proc. 16th Int. Conf. Mach. Learn , pp. 49-56
- Boyan, J.A.¹

19
- 78049343060
- Adaptive bases for reinforcement learning
- J. L. Balcar, F. Bonchi, A. Gionis, and M. Sebag, Eds. Mach. Learn. Knowl. Discovery in Databases Proc. Eur. Conf. Mach. Learn., Barcelona, Spain, ser
- D. Di Castro and S. Mannor, "Adaptive bases for reinforcement learning," in Mach. Learn. Knowl. Discovery in Databases, Proc. Eur. Conf. Mach. Learn., Barcelona, Spain, J. L. Balcar, F. Bonchi, A. Gionis, and M. Sebag, Eds., ser. Lecture Notes in Computer Science, vol. 6321, pp. 312-327, 2010.
- (2010) Lecture Notes in Computer Science , vol.6321 , pp. 312-327
- Di Castro, D.¹ Mannor, S.²

20
- 0032216898
- The geometry of algorithms with orthogonality constraints
- PII S0895479895290954
- A. Edelman, T. A. Arias, and S. T. Smith, "The geometry of algorithms with orthogonality constraints," SIAM J. Matrix Anal. Applicat., vol. 20, no. 2, pp. 303-353, 1998. (Pubitemid 129333771)
- (1999) SIAM Journal on Matrix Analysis and Applications , vol.20 , Issue.2 , pp. 303-353
- Edelman, A.¹ Arias, T.A.² Smith, S.T.³

21
- 34250706852
- Automatic basis function construction for approximate dynamic programming and reinforcement learning
- Pittsburgh, PA, USA
- P. W. Keller, S. Mannor, and D. Precup, "Automatic basis function construction for approximate dynamic programming and reinforcement learning," in Proc. 23rd Int. Conf. Mach. Learn., Pittsburgh, PA, USA, 2006.
- (2006) Proc. 23rd Int. Conf. Mach. Learn
- Keller, P.W.¹ Mannor, S.² Precup, D.³

22
- 0003452601
- New York, NY, USA: Springer-Verlag
- H. J. Kushner and D. S. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems. New York, NY, USA: Springer-Verlag, 1978.
- (1978) Stochastic Approximation Methods for Constrained and Unconstrained Systems
- Kushner, H.J.¹ Clark, D.S.²

23
- 0004066022
- New York, NY, USA: Springer Verlag
- H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications. New York, NY, USA: Springer Verlag, 1997.
- (1997) Stochastic Approximation Algorithms and Applications
- Kushner, H.J.¹ Yin, G.G.²

24
- 79951481923
- Convergent temporal-difference learning with arbitrary smooth function approximation
- Vancouver, BC, Canada
- H. R. Maei, C. Szepesvari, S. Bhatnagar, D. Precup, D. Silver, and R. S. Sutton, "Convergent temporal-difference learning with arbitrary smooth function approximation," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, 2009.
- (2009) Proc. Adv. Neural Inf. Process. Syst
- Maei, H.R.¹ Szepesvari, C.² Bhatnagar, S.³ Precup, D.⁴ Silver, D.⁵ Sutton, R.S.⁶

25
- 85161990353
- Basis construction from power series expansions of value functions
- Vancouver, BC, Canada
- S. Mahadevan and B. Liu, "Basis construction from power series expansions of value functions," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, 2010.
- (2010) Proc. Adv. Neural Inf. Process. Syst
- Mahadevan, S.¹ Liu, B.²

26
- 17444414191
- Basis function adaptation in temporal difference reinforcement learning
- DOI 10.1007/s10479-005-5732-z
- I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal difference reinforcement learning," Ann. Operat. Res., vol. 134, pp. 215-238, 2005. (Pubitemid 40550047)
- (2005) Annals of Operations Research , vol.134 , Issue.1 , pp. 215-238
- Menache, I.¹ Mannor, S.² Shimkin, N.³

27
- 34547982545
- Analyzing feature generation for value-function approximation
- Corvallis, OR, USA
- R. Parr, C. Painter-Wakefield, L. Li, and M. Littman, "Analyzing feature generation for value-function approximation," in Proc. 24th Int. Conf. Mach. Learn., Corvallis, OR, USA, 2007.
- (2007) Proc. 24th Int. Conf. Mach. Learn
- Parr, R.¹ Painter-Wakefield, C.² Li, L.³ Littman, M.⁴

28
- 85102627959
- New York, NY, USA: Wiley
- M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY, USA: Wiley, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

29
- 80053457849
- Incremental basis construction from temporal difference error
- Bellevue, WA, USA
- Y. Sun, F. Gomez, M. Ring, and J. Schmidhuber, "Incremental basis construction from temporal difference error," in Proc. 28th Int. Conf. Mach. Learn., Bellevue, WA, USA, 2011.
- (2011) Proc. 28th Int. Conf. Mach. Learn
- Sun, Y.¹ Gomez, F.² Ring, M.³ Schmidhuber, J.⁴

30
- 33847202724
- Learning to predict by the method of temporal differences
- R. S. Sutton, "Learning to predict by the method of temporal differences," Mach. Learn., vol. 3, pp. 9-44, 1988.
- (1988) Mach. Learn , vol.3 , pp. 9-44
- Sutton, R.S.¹

31
- 0004102479
- Cambridge, MA, USA: MIT Press
- R. S. Sutton and A. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.²

32
- 71149099079
- Fast gradient-descent methods for temporal-difference learning with linear function approximation
- R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvari, and E. Wiewiora, "Fast gradient-descent methods for temporal-difference learning with linear function approximation," in Proc. 26th Int. Conf. Mach. Learn., 2009, pp. 993-1000.
- (2009) Proc. 26th Int. Conf. Mach. Learn , pp. 993-1000
- Sutton, R.S.¹ Maei, H.R.² Precup, D.³ Bhatnagar, S.⁴ Silver, D.⁵ Szepesvari, C.⁶ Wiewiora, E.⁷

33
- 0031143730
- An analysis of temporal-difference learning with function approximation
- PII S0018928697034375
- J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Autom. Control, vol. 42, no. 5, pp. 674-690, 1997. (Pubitemid 127760263)
- (1997) IEEE Transactions on Automatic Control , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

34
- 0033221519
- Average cost temporal-difference learning
- DOI 10.1016/S0005-1098(99)00099-0
- J. N. Tsitsiklis and B. Van Roy, "Average cost temporal-difference learning," Automatica, vol. 35, pp. 1799-1808, 1999. (Pubitemid 32078092)
- (1999) Automatica , vol.35 , Issue.11 , pp. 1799-1808
- Tsitsiklis, J.N.¹ Van Roy, B.²

35
- 67650458822
- Basis function adaptation methods for cost approximation in MDP
- Nashville, TN
- H. Yu and D. P. Bertsekas, "Basis function adaptation methods for cost approximation in MDP," in Proc. IEEE Int. Symp. Adaptive Dynamic Program. Reinforce. Learn., Nashville, TN, 2009.
- (2009) Proc. IEEE Int. Symp. Adaptive Dynamic Program. Reinforce. Learn
- Yu, H.¹ Bertsekas, D.P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.