SCOPUS Information Retrieval Platform

Algorithmica (New York)

Volume 37, Issue 4, 2003, Pages 263-293

Reinforcement learning with immediate rewards and linear hypotheses

Authors (3): Abe, Naoki (a); Biermann, Alan W. (b); Long, Philip M. (c)

a IBM T J WATSON RESEARCH CENTER (United States)

b Duke University (United States)

c GENOME INSTITUTE OF SINGAPORE (Singapore)

Author keywords

Computational learning theory; Decision theory; Dialogue systems; Immediate rewards; Online algorithms; Online learning; Reinforcement learning

Indexed keywords

BOUNDARY CONDITIONS; CLIENT SERVER COMPUTER SYSTEMS; CONVERGENCE OF NUMERICAL METHODS; DECISION THEORY; ELECTRONIC COMMERCE; FUNCTIONS; INTERNET; LEARNING SYSTEMS; RANDOM PROCESSES; THEOREM PROVING; VECTORS;

COMPUTATIONAL LEARNING THEORY; CONTINUOUS VALUED REWARDS; DIALOGUE SYSTEMS; IMMEDIATE REWARDS; ONLINE ALGORITHMS; ONLINE LEARNING; REINFORCEMENT LEARNING;

LEARNING ALGORITHMS;

EID: 0344118814; PISSN: 01784617; EISSN: None; Source Type: Journal
DOI: 10.1007/s00453-003-1038-1; Document Type: Article

Times cited : (114)

References (31)

1
- 0042996986
- Associative reinforcement learning using linear probabilistic concepts
- N. Abe and P. M. Long. Associative reinforcement learning using linear probabilistic concepts. Proceedings of the 16th International Conference on Machine Learning, pages 3-11, 1999.
- (1999) Proceedings of the 16th International Conference on Machine Learning , pp. 3-11
- Abe, N.¹ Long, P.M.²

2
- 0042496195
- Learning to optimally schedule internet banner advertisements
- N. Abe and A. Nakamura. Learning to optimally schedule internet banner advertisements. Proceedings of the 16th International Conference on Machine Learning, 1999.
- Proceedings of the 16th International Conference on Machine Learning, 1999
- Abe, N.¹ Nakamura, A.²

3
- 0033280413
- Individual sequence prediction - Upper bounds and an application for complexity
- C. Allenberg. Individual sequence prediction - upper bounds and an application for complexity. Proceedings of the 1999 Conference on Computational Learning Theory, pages 233-242, 1999.
- (1999) Proceedings of the 1999 Conference on Computational Learning Theory , pp. 233-242
- Allenberg, C.¹

4
- 0344793355
- An improved algorithm for learning linear evaluation functions
- P. Auer. An improved algorithm for learning linear evaluation functions. Proceedings of the 2000 Conference on Computational Learning Theory, 2000.
- Proceedings of the 2000 Conference on Computational Learning Theory, 2000
- Auer, P.¹

5
- 84856039729
- Using upper confidence bounds for online learning
- P. Auer. Using upper confidence bounds for online learning. Proceedings of the 41st Annual Symposium on the Foundations of Computer Science, 2000.
- Proceedings of the 41st Annual Symposium on the Foundations of Computer Science, 2000
- Auer, P.¹

6
- 0041966002
- Using confidence bounds for exploitation-exploration trade-offs
- P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397-422, 2002.
- (2002) Journal of Machine Learning Research , vol.3 , pp. 397-422
- Auer, P.¹

7
- 0344361499
- A preliminary version has appeared
- P. Auer. A preliminary version has appeared in Proceedings of the 41st Annual Symposium on Foundations of Computer Science.
- Proceedings of the 41st Annual Symposium on Foundations of Computer Science
- Auer, P.¹

8
- 0029513526
- Gambling in a rigged casino: The adversarial multi-armed bandit problem
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: the adversarial multi-armed bandit problem. Proceedings of the 36th Annual Symposium on the Foundations of Computer Science, 1995.
- Proceedings of the 36th Annual Symposium on the Foundations of Computer Science, 1995
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

9
- 0042496192
- Gambling in a rigged casino: The adversarial multi-armed bandit problem
- Technical Report NC-TR-98-025, Neurocolt
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: the adversarial multi-armed bandit problem. Technical Report NC-TR-98-025, Neurocolt, 1998.
- (1998)
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

10
- 0037709910
- The nonstochastic multiarmed bandit problem
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.
- (2002) SIAM Journal on Computing , vol.32 , Issue.1 , pp. 48-77
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

11
- 0344361499
- A preliminary version has appeared
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. A preliminary version has appeared in Proceedings of the 36th Annual Symposium on Foundations of Computer Science.
- Proceedings of the 36th Annual Symposium on Foundations of Computer Science
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.E.⁴

12
- 84927461265
- Pattern recognizing stochastic learning automata
- A. G. Barto and P. Anandan. Pattern recognizing stochastic learning automata. IEEE Transactions on Systems, Man and Cybernetics, 15:360-374, 1985.
- (1985) IEEE Transactions on Systems, Man and Cybernetics , vol.15 , pp. 360-374
- Barto, A.G.¹ Anandan, P.²

13
- 0019519039
- Associative search network: A reinforcement learning associative memory
- A. G. Barto, R. S. Sutton, and P. S. Brouwer. Associative search network: a reinforcement learning associative memory. Biological Cybernetics, 40:201-211, 1981.
- (1981) Biological Cybernetics , vol.40 , pp. 201-211
- Barto, A.G.¹ Sutton, R.S.² Brouwer, P.S.³

14
- 0004218171
- Chapman and Hall, New York
- D. A. Berry and B. Fristedt. Bandit Problems. Chapman and Hall, New York, 1985.
- (1985) Bandit Problems
- Berry, D.A.¹ Fristedt, B.²

15
- 84895163091
- Goal-oriented multimedia dialogue with variable initiative
- Z. W. Ras and A. Skowron (eds.)
- A. W. Biermann, C. I. Guinn, M. Fulkerson, G. Keim, Z. Liang, D. Melamed, and K. Rajagopalan. Goal-oriented multimedia dialogue with variable initiative. In Foundations of Intelligent Systems, Z. W. Ras and A. Skowron (eds.), 1997.
- (1997) Foundations of Intelligent Systems
- Biermann, A.W.¹ Guinn, C.I.² Fulkerson, M.³ Keim, G.⁴ Liang, Z.⁵ Melamed, D.⁶ Rajagopalan, K.⁷

16
- 0042759445
- The composition of messages in speech-graphics interactive systems
- A. W. Biermann and P. M. Long. The composition of messages in speech-graphics interactive systems. Proceedings of the 1996 International Symposium on Spoken Dialogue, 1996.
- Proceedings of the 1996 International Symposium on Spoken Dialogue, 1996
- Biermann, A.W.¹ Long, P.M.²

17
- 0030145382
- Worst-case quadratic loss bounds for prediction using linear functions and gradient descent
- N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7(3):604-619, 1996.
- (1996) IEEE Transactions on Neural Networks , vol.7 , Issue.3 , pp. 604-619
- Cesa-Bianchi, N.¹ Long, P.M.² Warmuth, M.K.³

18
- 0031140246
- How to use expert advice
- May
- N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth. How to use expert advice. Journal of the Association for Computing Machinery, 44(3):427-485, May 1997.
- (1997) Journal of the Association for Computing Machinery , vol.44 , Issue.3 , pp. 427-485
- Cesa-Bianchi, N.¹ Freund, Y.² Haussler, D.³ Helmbold, D.P.⁴ Schapire, R.E.⁵ Warmuth, M.K.⁶

19
- 26544465671
- Design and analysis of efficient reinforcement learning algorithms
- Ph.D. thesis, University of Pittsburgh
- C.-N. Fiechter. Design and Analysis of Efficient Reinforcement Learning Algorithms. Ph.D. thesis, University of Pittsburgh, 1997.
- (1997)
- Fiechter, C.-N.¹

20
- 0030282940
- Rigorous learning curve bounds from statistical mechanics
- D. Haussler, M. Kearns, H. S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25:195-236, 1996.
- (1996) Machine Learning , vol.25 , pp. 195-236
- Haussler, D.¹ Kearns, M.² Seung, H.S.³ Tishby, N.⁴

21
- 0034666805
- Apple tasting
- D. P. Helmbold, N. Littlestone, and P. M. Long. Apple tasting. Information and Computation, 161(2):85-139, 2000. Preliminary version in FOCS'92.
- (2000) Information and Computation , vol.161 , Issue.2 , pp. 85-139
- Helmbold, D.P.¹ Littlestone, N.² Long, P.M.³

22
- 0028442414
- Associative reinforcement learning: A generate and test algorithm
- L. P. Kaelbling. Associative reinforcement learning: a generate and test algorithm. Machine Learning, 15(3):299-320, 1994.
- (1994) Machine Learning , vol.15 , Issue.3 , pp. 299-320
- Kaelbling, L.P.¹

23
- 0028442413
- Associative reinforcement learning: Functions in k-dnf
- L. P. Kaelbling. Associative reinforcement learning: functions in k-dnf. Machine Learning, 15(3):279-298, 1994.
- (1994) Machine Learning , vol.15 , Issue.3 , pp. 279-298
- Kaelbling, L.P.¹

24
- 0023545078
- On the learnability of Boolean formulae
- M. Kearns, M. Li, L. Pitt, and L. G. Valiant. On the learnability of Boolean formulae. Proceedings of the 19th Annual Symposium on the Theory of Computation, pages 285-295, 1987.
- (1987) Proceedings of the 19th Annual Symposium on the Theory of Computation , pp. 285-295
- Kearns, M.¹ Li, M.² Pitt, L.³ Valiant, L.G.⁴

25
- 0001553979
- Toward efficient agnostic learning
- M. J. Kearns, R. E. Schapire, and L. M. Sellie. Toward efficient agnostic learning. Machine Learning, 17:115-141, 1994.
- (1994) Machine Learning , vol.17 , pp. 115-141
- Kearns, M.J.¹ Schapire, R.E.² Sellie, L.M.³

26
- 34250091945
- Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm
- N. Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Machine Learning, 2:285-318, 1988.
- (1988) Machine Learning , vol.2 , pp. 285-318
- Littlestone, N.¹

27
- 0030718664
- On-line evaluation and prediction using linear functions
- P. M. Long. On-line evaluation and prediction using linear functions. Proceedings of the 1997 Conference on Computational Learning Theory, 1997.
- Proceedings of the 1997 Conference on Computational Learning Theory, 1997
- Long, P.M.¹

28
- 0003891507
- Prentice-Hall, Englewood Cliffs, NJ
- K. S. Narendra and M. A. L. Thathachar. Learning Automata: An Introduction. Prentice-Hall, Englewood Cliffs, NJ, 1989.
- (1989) Learning Automata: An Introduction
- Narendra, K.S.¹ Thathachar, M.A.L.²

29
- 0004102479
- MIT Press, Cambridge, MA
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

30
- 0002278965
- Adaptive switching circuits
- 1960 IRE WESCON Conv. Record
- B. Widrow and M. E. Hoff. Adaptive switching circuits. 1960 IRE WESCON Conv. Record, pages 96-104, 1960.
- (1960) , pp. 96-104
- Widrow, B.¹ Hoff, M.E.²

31
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.
- (1992) Machine Learning , vol.8 , pp. 229-256
- Williams, R.J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.