-
1
-
-
84884085211
-
-
Princeton, NJ, USA: Princeton Univ. Press
-
P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton, NJ, USA: Princeton Univ. Press, 2008.
-
(2008)
Optimization Algorithms on Matrix Manifolds
-
-
Absil, P.-A.1
Mahony, R.2
Sepulchre, R.3
-
2
-
-
85151728371
-
Residual algorithms: Reinforcement learning with function approximation
-
Morgan Kaufmann
-
L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. 12th Int. Conf. Mach. Learn., 1995, pp. 30-37, Morgan Kaufmann.
-
(1995)
Proc. 12th Int. Conf. Mach. Learn
, pp. 30-37
-
-
Baird, L.C.1
-
3
-
-
79952389882
-
Online identification and tracking of subspaces from highly incomplete information
-
L. Balzano, R. Nowak, and B. Recht, "Online identification and tracking of subspaces from highly incomplete information," in Proc. 48th Annu. Allerton Conf., 2010, pp. 704-711.
-
(2010)
Proc. 48th Annu. Allerton Conf
, pp. 704-711
-
-
Balzano, L.1
Nowak, R.2
Recht, B.3
-
4
-
-
0034445353
-
A learning algorithm for Markov decision processes with adaptive state aggregation
-
J. S. Baras and V. S. Borkar, "A learning algorithm for Markov decision processes with adaptive state aggregation," in Proc. 39th IEEE Conf. Decision Control, Dec. 12-15, 2000, Sydney, Australia, 2000, vol. 4, pp. 3351-3356. (Pubitemid 32528175)
-
(2000)
Proceedings of the IEEE Conference on Decision and Control
, vol.4
, pp. 3351-3356
-
-
Baras, J.S.1
Borkar, V.S.2
-
5
-
-
47649102775
-
A note on linear function approximation using random projections
-
K. Barman and V. S. Borkar, "A note on linear function approximation using random projections," Syst. Control Lett., vol. 57, no. 9, pp. 784-786, 2008.
-
(2008)
Syst. Control Lett
, vol.57
, Issue.9
, pp. 784-786
-
-
Barman, K.1
Borkar, V.S.2
-
10
-
-
61849106433
-
Projected equation methods for approximate solution of large linear systems
-
D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Appl. Math., vol. 227, pp. 27-50, 2009.
-
(2009)
J. Comput. Appl. Math
, vol.227
, pp. 27-50
-
-
Bertsekas, D.P.1
Yu, H.2
-
11
-
-
0346902105
-
Two time scale simultaneous perturbation stochastic approximation using deterministic perturbation sequences
-
S. Bhatnagar, M. C. Fu, S. I. Marcus, and I.-J. Wang, "Two time scale simultaneous perturbation stochastic approximation using deterministic perturbation sequences," ACM Trans. Model. Comput. Simulat., vol. 13, no. 2, pp. 180-209, 2003.
-
(2003)
ACM Trans. Model. Comput. Simulat
, vol.13
, Issue.2
, pp. 180-209
-
-
Bhatnagar, S.1
Fu, M.C.2
Marcus, S.I.3
Wang, I.-J.4
-
12
-
-
84884510576
-
Stochastic recursive algorithms for optimization: Simultaneous perturbation methods, ser
-
London, U.K.: Springer
-
S. Bhatnagar, H. L. Prasad, and L. A. Prashanth, Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods, ser. Lecture Notes in Control and Information Sciences. London, U.K.: Springer, 2013.
-
(2013)
Lecture Notes in Control and Information Sciences
-
-
Bhatnagar, S.1
Prasad, H.L.2
Prashanth, L.A.3
-
13
-
-
84884530005
-
-
Chapter 23 of Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, F. Lewis and D. Liu, Eds. Piscataway, NJ, USA: IEEE Press Computational Intelligence Series
-
S. Bhatnagar, V. S. Borkar, and L. A. Prashanth, "Adaptive feature pursuit: Online adaptation of features in reinforcement learning," in Chapter 23 of Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, F. Lewis and D. Liu, Eds. Piscataway, NJ, USA: IEEE Press Computational Intelligence Series, 2012.
-
(2012)
Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
-
-
Bhatnagar, S.1
Borkar, V.S.2
Prashanth, L.A.3
-
14
-
-
84884524714
-
-
Dept. of Comput. Sci., Indian Inst. of Science, IISc-CSA-SSL-TR-2013-2, March-2013 [Online]. Available
-
S. Bhatnagar, V. S. Borkar, and K. J. Prabuchandran, "Adaptive Feature Tuning in the Grassmanian in Online Reinforcement Learning," Dept. of Comput. Sci., Indian Inst. of Science, IISc-CSA-SSL-TR-2013-2, March-2013 [Online]. Available: http://stochastic.csa.iisc.ernet.in/www/research/files/IISc-CSA-SSLTR-2013-2.pdf
-
Adaptive Feature Tuning in the Grassmanian in Online Reinforcement Learning
-
-
Bhatnagar, S.1
Borkar, V.S.2
Prabuchandran, K.J.3
-
15
-
-
0031076413
-
Stochastic approximation with two timescales
-
V. S. Borkar, "Stochastic approximation with two timescales," Syst. Control Lett., vol. 29, pp. 291-294, 1997.
-
(1997)
Syst. Control Lett
, vol.29
, pp. 291-294
-
-
Borkar, V.S.1
-
16
-
-
0032075427
-
Asynchronous stochastic approximations
-
V. S. Borkar, "Asynchronous stochastic approximations," SIAM J. Control Optimiz., vol. 36, no. 3, pp. 840-851, 1998. (Pubitemid 128493576)
-
(1998)
SIAM Journal on Control and Optimization
, vol.36
, Issue.3
, pp. 840-851
-
-
Borkar, V.S.1
-
17
-
-
58849087743
-
-
Cambridge, U. K.: (Jointly published by) Cambridge Univ. Press Hindustan Book Agency, New Delhi, India
-
V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge, U. K.: (Jointly published by) Cambridge Univ. Press, 2008, Hindustan Book Agency, New Delhi, India.
-
(2008)
Stochastic Approximation: A Dynamical Systems Viewpoint
-
-
Borkar, V.S.1
-
18
-
-
0038595396
-
Least-squares temporal difference learning
-
San Francisco, CA, USA: Morgan Kaufmann
-
J. A. Boyan, "Least-squares temporal difference learning," in Proc. 16th Int. Conf. Mach. Learn. San Francisco, CA, USA: Morgan Kaufmann, 1999, pp. 49-56.
-
(1999)
Proc. 16th Int. Conf. Mach. Learn
, pp. 49-56
-
-
Boyan, J.A.1
-
19
-
-
78049343060
-
Adaptive bases for reinforcement learning
-
J. L. Balcázar, F. Bonchi, A. Gionis, and M. Sebag, Eds. Mach. Learn. Knowl. Discovery in Databases Proc. Eur. Conf. Mach. Learn., Barcelona, Spain, ser
-
D. Di Castro and S. Mannor, J. L. Balcázar, F. Bonchi, A. Gionis, and M. Sebag, Eds., "Adaptive bases for reinforcement learning," Mach. Learn. Knowl. Discovery in Databases, Proc. Eur. Conf. Mach. Learn., Barcelona, Spain, ser. Lecture Notes in Computer Science, vol. 6321, pp. 312-327, 2010.
-
(2010)
Lecture Notes in Computer Science
, vol.6321
, pp. 312-327
-
-
Di Castro, D.1
Mannor, S.2
-
20
-
-
0032216898
-
The geometry of algorithms with orthogonality constraints
-
PII S0895479895290954
-
A. Edelman, T. A. Arias, and S. T. Smith, "The geometry of algorithms with orthogonality constraints," SIAM J. Matrix Anal. Applicat., vol. 20, no. 2, pp. 303-353, 1998. (Pubitemid 129333771)
-
(1998)
SIAM Journal on Matrix Analysis and Applications
, vol.20
, Issue.2
, pp. 303-353
-
-
Edelman, A.1
Arias, T.A.2
Smith, S.T.3
-
21
-
-
34250706852
-
Automatic basis function construction for approximate dynamic programming and reinforcement learning
-
Pittsburgh, PA, USA
-
P. W. Keller, S. Mannor, and D. Precup, "Automatic basis function construction for approximate dynamic programming and reinforcement learning," in Proc. 23rd Int. Conf. Mach. Learn., Pittsburgh, PA, USA, 2006.
-
(2006)
Proc. 23rd Int. Conf. Mach. Learn
-
-
Keller, P.W.1
Mannor, S.2
Precup, D.3
-
24
-
-
79951481923
-
Convergent temporal-difference learning with arbitrary smooth function approximation
-
Vancouver, BC, Canada
-
H. R. Maei, C. Szepesvari, S. Bhatnagar, D. Precup, D. Silver, and R. S. Sutton, "Convergent temporal-difference learning with arbitrary smooth function approximation," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, 2009.
-
(2009)
Proc. Adv. Neural Inf. Process. Syst
-
-
Maei, H.R.1
Szepesvari, C.2
Bhatnagar, S.3
Precup, D.4
Silver, D.5
Sutton, R.S.6
-
25
-
-
85161990353
-
Basis construction from power series expansions of value functions
-
Vancouver, BC, Canada
-
S. Mahadevan and B. Liu, "Basis construction from power series expansions of value functions," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, 2010.
-
(2010)
Proc. Adv. Neural Inf. Process. Syst
-
-
Mahadevan, S.1
Liu, B.2
-
26
-
-
17444414191
-
Basis function adaptation in temporal difference reinforcement learning
-
DOI 10.1007/s10479-005-5732-z
-
I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal difference reinforcement learning," Ann. Operat. Res., vol. 134, pp. 215-238, 2005. (Pubitemid 40550047)
-
(2005)
Annals of Operations Research
, vol.134
, Issue.1
, pp. 215-238
-
-
Menache, I.1
Mannor, S.2
Shimkin, N.3
-
27
-
-
34547982545
-
Analyzing feature generation for value-function approximation
-
Corvallis, OR, USA
-
R. Parr, C. Painter-Wakefield, L. Li, and M. Littman, "Analyzing feature generation for value-function approximation," in Proc. 24th Int. Conf. Mach. Learn., Corvallis, OR, USA, 2007.
-
(2007)
Proc. 24th Int. Conf. Mach. Learn
-
-
Parr, R.1
Painter-Wakefield, C.2
Li, L.3
Littman, M.4
-
29
-
-
80053457849
-
Incremental basis construction from temporal difference error
-
Bellevue, WA, USA
-
Y. Sun, F. Gomez, M. Ring, and J. Schmidhuber, "Incremental basis construction from temporal difference error," in Proc. 28th Int. Conf. Mach. Learn., Bellevue, WA, USA, 2011.
-
(2011)
Proc. 28th Int. Conf. Mach. Learn
-
-
Sun, Y.1
Gomez, F.2
Ring, M.3
Schmidhuber, J.4
-
30
-
-
33847202724
-
Learning to predict by the method of temporal differences
-
R. S. Sutton, "Learning to predict by the method of temporal differences," Mach. Learn., vol. 3, pp. 9-44, 1988.
-
(1988)
Mach. Learn
, vol.3
, pp. 9-44
-
-
Sutton, R.S.1
-
32
-
-
71149099079
-
Fast gradient-descent methods for temporal-difference learning with linear function approximation
-
R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvari, and E. Wiewiora, "Fast gradient-descent methods for temporal-difference learning with linear function approximation," in Proc. 26th Int. Conf. Mach. Learn., 2009, pp. 993-1000.
-
(2009)
Proc. 26th Int. Conf. Mach. Learn
, pp. 993-1000
-
-
Sutton, R.S.1
Maei, H.R.2
Precup, D.3
Bhatnagar, S.4
Silver, D.5
Szepesvari, C.6
Wiewiora, E.7
-
33
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
PII S0018928697034375
-
J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Autom. Control, vol. 42, no. 5, pp. 674-690, 1997. (Pubitemid 127760263)
-
(1997)
IEEE Transactions on Automatic Control
, vol.42
, Issue.5
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
34
-
-
0033221519
-
Average cost temporal-difference learning
-
DOI 10.1016/S0005-1098(99)00099-0
-
J. N. Tsitsiklis and B. Van Roy, "Average cost temporal-difference learning," Automatica, vol. 35, pp. 1799-1808, 1999. (Pubitemid 32078092)
-
(1999)
Automatica
, vol.35
, Issue.11
, pp. 1799-1808
-
-
Tsitsiklis, J.N.1
Van Roy, B.2