-
3
-
-
0041966002
-
Using confidence bounds for exploitation-exploration trade-offs
-
ISSN 1533-7928
-
P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397-422, 2003. ISSN 1533-7928.
-
(2003)
Journal of Machine Learning Research
, vol.3
, pp. 397-422
-
-
Auer, P.1
-
4
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
DOI 10.1023/A:1013689704352, Computational Learning Theory
-
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002. (Pubitemid 34126111)
-
(2002)
Machine Learning
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
8
-
-
0022044787
-
Adaptive control with the stochastic approximation algorithm: Geometry and convergence
-
A. Becker, P. R. Kumar, and C. Z. Wei. Adaptive control with the stochastic approximation algorithm: Geometry and convergence. IEEE Trans. on Automatic Control, 30(4):330-338, 1985.
-
(1985)
IEEE Trans. on Automatic Control
, vol.30
, Issue.4
, pp. 330-338
-
-
Becker, A.1
Kumar, P.R.2
Wei, C.Z.3
-
11
-
-
84877734570
-
Adaptive control of linear time invariant systems: The "bet on the best" principle
-
S. Bittanti and M. C. Campi. Adaptive control of linear time invariant systems: the "bet on the best" principle. Communications in Information and Systems, 6(4):299-320, 2006.
-
(2006)
Communications in Information and Systems
, vol.6
, Issue.4
, pp. 299-320
-
-
Bittanti, S.1
Campi, M.C.2
-
12
-
-
0041965975
-
R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
-
R. I. Brafman and M. Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
-
(2002)
Journal of Machine Learning Research
, vol.3
, pp. 213-231
-
-
Brafman, R.I.1
Tennenholtz, M.2
-
13
-
-
0032203343
-
Adaptive linear quadratic Gaussian control: The cost-biased approach revisited
-
M. C. Campi and P. R. Kumar. Adaptive linear quadratic Gaussian control: the cost-biased approach revisited. SIAM Journal on Control and Optimization, 36(6):1890-1907, 1998.
-
(1998)
SIAM Journal on Control and Optimization
, vol.36
, Issue.6
, pp. 1890-1907
-
-
Campi, M.C.1
Kumar, P.R.2
-
14
-
-
0023383665
-
Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost
-
H. Chen and L. Guo. Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost. SIAM Journal on Control and Optimization, 25(4):845-867, 1987. (Pubitemid 17599082)
-
(1987)
SIAM Journal on Control and Optimization
, vol.25
, Issue.4
, pp. 845-867
-
-
Chen, H.-F.1
Guo, L.2
-
15
-
-
0025470399
-
Identification and adaptive control for systems with unknown orders, delay, and coefficients
-
DOI 10.1109/9.58496
-
H. Chen and J. Zhang. Identification and adaptive control for systems with unknown orders, delay, and coefficients. IEEE Transactions on Automatic Control, 35(8):866-877, August 1990. (Pubitemid 20738736)
-
(1990)
IEEE Transactions on Automatic Control
, vol.35
, Issue.8
, pp. 866-877
-
-
Chen, H.-F.1
Zhang, J.-F.2
-
16
-
-
33244456637
-
Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary
-
V. Dani and T. P. Hayes. Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. In 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 937-943, 2006.
-
(2006)
16th Annual ACM-SIAM Symposium on Discrete Algorithms
, pp. 937-943
-
-
Dani, V.1
Hayes, T.P.2
-
17
-
-
84898072179
-
Stochastic linear optimization under bandit feedback
-
V. Dani, T. P. Hayes, and S. M. Kakade. Stochastic linear optimization under bandit feedback. COLT-2008, pages 355-366, 2008.
-
(2008)
COLT-2008
, pp. 355-366
-
-
Dani, V.1
Hayes, T.P.2
Kakade, S.M.3
-
20
-
-
1942452450
-
Exploration in metric state spaces
-
T. Fawcett and N. Mishra, editors, AAAI Press
-
S. M. Kakade, M. J. Kearns, and J. Langford. Exploration in metric state spaces. In T. Fawcett and N. Mishra, editors, ICML 2003, pages 306-312. AAAI Press, 2003.
-
(2003)
ICML 2003
, pp. 306-312
-
-
Kakade, S.M.1
Kearns, M.J.2
Langford, J.3
-
21
-
-
0012257655
-
Near-optimal performance for reinforcement learning in polynomial time
-
J. W. Shavlik, editor, Morgan Kaufmann
-
M. Kearns and S. P. Singh. Near-optimal performance for reinforcement learning in polynomial time. In J. W. Shavlik, editor, ICML 1998, pages 260-268. Morgan Kaufmann, 1998.
-
(1998)
ICML 1998
, pp. 260-268
-
-
Kearns, M.1
Singh, S.P.2
-
24
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
-
(1985)
Advances in Applied Mathematics
, vol.6
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
25
-
-
0000258837
-
Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems
-
T. L. Lai and C. Z. Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154-166, 1982a.
-
(1982)
The Annals of Statistics
, vol.10
, Issue.1
, pp. 154-166
-
-
Lai, T.L.1
Wei, C.Z.2
-
26
-
-
0000258837
-
Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems
-
T. L. Lai and C. Z. Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154-166, 1982b.
-
(1982)
The Annals of Statistics
, vol.10
, Issue.1
, pp. 154-166
-
-
Lai, T.L.1
Wei, C.Z.2
-
28
-
-
33750972856
-
Efficient recursive estimation and adaptive control in stochastic regression and ARMAX models
-
T. L. Lai and Z. Ying. Efficient recursive estimation and adaptive control in stochastic regression and ARMAX models. Statistica Sinica, 16:741-772, 2006. (Pubitemid 44744348)
-
(2006)
Statistica Sinica
, vol.16
, Issue.3
, pp. 741-772
-
-
Lai, T.L.1
Ying, Z.2
-
30
-
-
0001787217
-
Dynamic programming under uncertainty with a quadratic criterion function
-
H. A. Simon. Dynamic programming under uncertainty with a quadratic criterion function. Econometrica, 24(1):74-81, 1956.
-
(1956)
Econometrica
, vol.24
, Issue.1
, pp. 74-81
-
-
Simon, H.A.1
-
31
-
-
85162058047
-
Online linear regression and its application to model-based reinforcement learning
-
A. L. Strehl and M. L. Littman. Online linear regression and its application to model-based reinforcement learning. In NIPS, pages 1417-1424, 2008.
-
(2008)
NIPS
, pp. 1417-1424
-
-
Strehl, A.L.1
Littman, M.L.2
-
32
-
-
34250700033
-
PAC model-free reinforcement learning
-
A. L. Strehl, L. Li, E. Wiewiora, J. Langford, and M. L. Littman. PAC model-free reinforcement learning. In ICML, pages 881-888, 2006.
-
(2006)
ICML
, pp. 881-888
-
-
Strehl, A.L.1
Li, L.2
Wiewiora, E.3
Langford, J.4
Littman, M.L.5
-
33
-
-
77956520676
-
Model-based reinforcement learning with nearly tight exploration complexity bounds
-
I. Szita and Cs. Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds. In ICML 2010, pages 1031-1038, 2010.
-
(2010)
ICML 2010
, pp. 1031-1038
-
-
Szita, I.1
Szepesvári, Cs.2