Volume 155, Issue 3, 2004, Pages 654-674

Reinforcement learning for long-run average cost

Author keywords

Reinforcement learning; Stochastic processes; Two time scales

Indexed keywords

COMPUTER SIMULATION; COSTS; DYNAMIC PROGRAMMING; LEARNING SYSTEMS; MARKOV PROCESSES; PROBABILITY; QUALITY CONTROL; RANDOM PROCESSES; REINFORCEMENT;

EID: 0742319170     PISSN: 03772217     EISSN: None     Source Type: Journal    
DOI: 10.1016/S0377-2217(02)00874-3     Document Type: Conference Paper
Times cited : (115)

References (35)
  • 1
    • 85034481533
    • ODE analysis for Q-learning algorithms
    • MIT, Cambridge, MA
    • J. Abounadi, D. Bertsekas, V. Borkar, ODE analysis for Q-learning algorithms, LIDS Report, MIT, Cambridge, MA, 1996.
    • (1996) LIDS Report
    • Abounadi, J.1    Bertsekas, D.2    Borkar, V.3
  • 5
    • 0031076413
    • Stochastic approximation with two-time scales
    • Borkar V.S. Stochastic approximation with two-time scales. Systems and Control Letters. 29:1997;291-294.
    • (1997) Systems and Control Letters , vol.29 , pp. 291-294
    • Borkar, V.S.1
  • 7
    • 0742326372
    • The ODE method for convergence of stochastic approximation and reinforcement learning
    • V.S. Borkar, S.P. Meyn, The ODE method for convergence of stochastic approximation and reinforcement learning, Working paper.
    • Working Paper
    • Borkar, V.S.1    Meyn, S.P.2
  • 10
    • 0032643313
    • Solving semi-Markov decision problems using average reward reinforcement learning
    • Das T.K., Gosavi A., Mahadevan S., Marchalleck N. Solving semi-Markov decision problems using average reward reinforcement learning. Management Science. 45(4):1999;560-574.
    • (1999) Management Science , vol.45 , Issue.4 , pp. 560-574
    • Das, T.K.1    Gosavi, A.2    Mahadevan, S.3    Marchalleck, N.4
  • 11
    • 0032683886
    • Optimal preventive maintenance in a production inventory system
    • Das T.K., Sarkar S. Optimal preventive maintenance in a production inventory system. IIE Transactions on Quality and Reliability. 31:1999;537-551.
    • (1999) IIE Transactions on Quality and Reliability , vol.31 , pp. 537-551
    • Das, T.K.1    Sarkar, S.2
  • 12
    • 0742326370
    • Optimal and near-optimal control of a two-part stochastic manufacturing system with dynamic setups
    • at the Department of Industrial Engineering, University of Florida, Gainesville
    • M. Elhafsi, S. Bai, Optimal and near-optimal control of a two-part stochastic manufacturing system with dynamic setups, Research Report 95-10, at the Department of Industrial Engineering, University of Florida, Gainesville, 1997.
    • (1997) Research Report 95-10
    • Elhafsi, M.1    Bai, S.2
  • 14
    • 0742308756
    • On the convergence of some reinforcement learning algorithms
    • Department of Engineering, University of Southern Colorado, Pueblo
    • A. Gosavi, On the convergence of some reinforcement learning algorithms. Working paper, Department of Engineering, University of Southern Colorado, Pueblo, 2000.
    • (2000) Working Paper
    • Gosavi, A.1
  • 15
    • 0036722536
    • Airline seat allocation among multiple fare classes with overbooking
    • Gosavi A., Bandla N., Das T.K. Airline seat allocation among multiple fare classes with overbooking. IIE Transactions. 34(9):2002;729-742.
    • (2002) IIE Transactions , vol.34 , Issue.9 , pp. 729-742
    • Gosavi, A.1    Bandla, N.2    Das, T.K.3
  • 16
    • 84995317030
    • A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward
    • in press
    • A. Gosavi, T.K. Das, S. Sarkar, A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward. IIE Transactions (in press).
    • IIE Transactions
    • Gosavi, A.1    Das, T.K.2    Sarkar, S.3
  • 17
    • 0742291458
    • Actor-critic type learning algorithms for Markov decision processes
    • Indian Institute of Sciences, Bangalore, India
    • V.R. Konda, V.S. Borkar, Actor-critic type learning algorithms for Markov decision processes, Working paper, Indian Institute of Sciences, Bangalore, India.
    • Working Paper
    • Konda, V.R.1    Borkar, V.S.2
  • 20
    • 0017526570
    • Analysis of recursive stochastic algorithms
    • Ljung L. Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control. 22:1977;551-575.
    • (1977) IEEE Transactions on Automatic Control , vol.22 , pp. 551-575
    • Ljung, L.1
  • 21
    • 0029752592
    • Average reward reinforcement learning: Foundations, algorithms, and empirical results
    • Mahadevan S. Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning. 22(1):1996;159-195.
    • (1996) Machine Learning , vol.22 , Issue.1 , pp. 159-195
    • Mahadevan, S.1
  • 27
    • 0021595914
    • Part selection policy for a flexible manufacturing cell feeding several production lines
    • Seidmann A., Schweitzer P.J. Part selection policy for a flexible manufacturing cell feeding several production lines. IIE Transactions. 16(4):1984;355-362.
    • (1984) IIE Transactions , vol.16 , Issue.4 , pp. 355-362
    • Seidmann, A.1    Schweitzer, P.J.2
  • 29
    • 0026168810
    • Optimal control of a queuing network system with two types of customers
    • Shioyama T. Optimal control of a queuing network system with two types of customers. European Journal of Operational Research. 52:1991;367-372.
    • (1991) European Journal of Operational Research , vol.52 , pp. 367-372
    • Shioyama, T.1
  • 30
    • 0028574683
    • Reinforcement learning algorithms for average-payoff Markovian decision processes
    • Cambridge, MA: MIT Press
    • Singh S. Reinforcement learning algorithms for average-payoff Markovian decision processes. Proceedings of the 12th AAAI. 1994;MIT Press, Cambridge, MA.
    • (1994) Proceedings of the 12th AAAI
    • Singh, S.1
  • 31
    • 0004102479
    • Reinforcement Learning
    • special issue
    • Sutton R. Reinforcement Learning. Machine Learning Journal. 8(3):1992;5. (special issue).
    • (1992) Machine Learning Journal , vol.8 , Issue.3 , pp. 5
    • Sutton, R.1
  • 33
    • 0002313852
    • Scaling up average reward reinforcement learning by approximating the domain models and the value function
    • New York: Morgan Kaufmann
    • Tadepalli P., Ok D. Scaling up average reward reinforcement learning by approximating the domain models and the value function. Proceedings of the Thirteenth International Machine Learning Conference. 1996;471-479 Morgan Kaufmann, New York.
    • (1996) Proceedings of the Thirteenth International Machine Learning Conference , pp. 471-479
    • Tadepalli, P.1    Ok, D.2
  • 34
    • 0028497630
    • Asynchronous stochastic approximation and Q-learning
    • Tsitsiklis J. Asynchronous stochastic approximation and Q-learning. Machine Learning. 16:1994;185-202.
    • (1994) Machine Learning , vol.16 , pp. 185-202
    • Tsitsiklis, J.1
  • 35
    • 0004049893
    • Ph.D. Thesis, King's College, Cambridge, England, May
    • C.J. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, King's College, Cambridge, England, May 1989.
    • (1989) Learning from Delayed Rewards
    • Watkins, C.J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.