Volume 37, Issue 4, 2003, Pages 263-293

Reinforcement learning with immediate rewards and linear hypotheses

Author keywords

Computational learning theory; Decision theory; Dialogue systems; Immediate rewards; Online algorithms; Online learning; Reinforcement learning

Indexed keywords

BOUNDARY CONDITIONS; CLIENT SERVER COMPUTER SYSTEMS; CONVERGENCE OF NUMERICAL METHODS; DECISION THEORY; ELECTRONIC COMMERCE; FUNCTIONS; INTERNET; LEARNING SYSTEMS; RANDOM PROCESSES; THEOREM PROVING; VECTORS

EID: 0344118814     PISSN: 01784617     EISSN: None     Source Type: Journal    
DOI: 10.1007/s00453-003-1038-1     Document Type: Article
Times cited: 114

References (31)
  • 6
    • 0041966002
    • Using confidence bounds for exploitation-exploration trade-offs
    • P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397-422, 2002.
    • (2002) Journal of Machine Learning Research , vol.3 , pp. 397-422
    • Auer, P.1
  • 9
    • 0042496192
    • Gambling in a rigged casino: The adversarial multi-armed bandit problem
    • Technical Report NC-TR-98-025, Neurocolt
    • P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: the adversarial multi-armed bandit problem. Technical Report NC-TR-98-025, Neurocolt, 1998.
    • (1998)
    • Auer, P.1    Cesa-Bianchi, N.2    Freund, Y.3    Schapire, R.E.4
  • 13
    • 0019519039
    • Associative search network: A reinforcement learning associative memory
    • A. G. Barto, R. S. Sutton, and P. S. Brouwer. Associative search network: a reinforcement learning associative memory. Biological Cybernetics, 40:201-211, 1981.
    • (1981) Biological Cybernetics , vol.40 , pp. 201-211
    • Barto, A.G.1    Sutton, R.S.2    Brouwer, P.S.3
  • 17
    • 0030145382
    • Worst-case quadratic loss bounds for prediction using linear functions and gradient descent
    • N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7(3):604-619, 1996.
    • (1996) IEEE Transactions on Neural Networks , vol.7 , Issue.3 , pp. 604-619
    • Cesa-Bianchi, N.1    Long, P.M.2    Warmuth, M.K.3
  • 19
    • 26544465671
    • Design and analysis of efficient reinforcement learning algorithms
    • Ph.D. thesis, University of Pittsburgh
    • C.-N. Fiechter. Design and Analysis of Efficient Reinforcement Learning Algorithms. Ph.D. thesis, University of Pittsburgh, 1997.
    • (1997)
    • Fiechter, C.-N.1
  • 20
    • 0030282940
    • Rigorous learning curve bounds from statistical mechanics
    • D. Haussler, M. Kearns, H. S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25:195-236, 1996.
    • (1996) Machine Learning , vol.25 , pp. 195-236
    • Haussler, D.1    Kearns, M.2    Seung, H.S.3    Tishby, N.4
  • 22
    • 0028442414
    • Associative reinforcement learning: A generate and test algorithm
    • L. P. Kaelbling. Associative reinforcement learning: a generate and test algorithm. Machine Learning, 15(3):299-320, 1994.
    • (1994) Machine Learning , vol.15 , Issue.3 , pp. 299-320
    • Kaelbling, L.P.1
  • 23
    • 0028442413
    • Associative reinforcement learning: Functions in k-DNF
    • L. P. Kaelbling. Associative reinforcement learning: functions in k-DNF. Machine Learning, 15(3):279-298, 1994.
    • (1994) Machine Learning , vol.15 , Issue.3 , pp. 279-298
    • Kaelbling, L.P.1
  • 26
    • 34250091945
    • Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm
    • N. Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Machine Learning, 2:285-318, 1988.
    • (1988) Machine Learning , vol.2 , pp. 285-318
    • Littlestone, N.1
  • 30
    • 0002278965
    • Adaptive switching circuits
    • 1960 IRE WESCON Conv. Record
    • B. Widrow and M. E. Hoff. Adaptive switching circuits. 1960 IRE WESCON Conv. Record, pages 96-104, 1960.
    • (1960) 1960 IRE WESCON Conv. Record , pp. 96-104
    • Widrow, B.1    Hoff, M.E.2
  • 31
    • 0000337576
    • Simple statistical gradient-following algorithms for connectionist reinforcement learning
    • R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.
    • (1992) Machine Learning , vol.8 , pp. 229-256
    • Williams, R.J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.