SCOPUS Information Search Platform

IIE Transactions (Institute of Industrial Engineers)

Volume 36, Issue 4, 2004, Pages 373-385

A reinforcement learning approach to stochastic business games

(3) Ravulapati, Kiran Kumar (a); Rao, Jaideep (b); Das, Tapas K. (c)

a Delta Technology (United States)

b Pilgrim Software (United States)

c University of South Florida (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; COMPETITION; COMPUTER SIMULATION; DECISION MAKING; ELECTRONIC COMMERCE; INTERNET; INVENTORY CONTROL; LEARNING SYSTEMS; MARKOV PROCESSES; MATRIX ALGEBRA; PROBABILITY; STRATEGIC PLANNING;

BUSINESS GAMES; STOCHASTIC GAMES;

GAME THEORY;

EID: 1642315516 PISSN: 0740817X EISSN: None Source Type: Journal
DOI: 10.1080/07408170490278698 Document Type: Article

Times cited : (25)

References (24)

1
- 0003874616
- LIDS-P-2434, Laboratory for Information and Decision Systems, MIT, Cambridge, MA
- Abounadi, J., Bertsekas, D. and Borkar, V.S. (1998) Learning algorithms for Markov decision processes with average cost. Report LIDS-P-2434, Laboratory for Information and Decision Systems, MIT, Cambridge, MA.
- (1998) Learning Algorithms for Markov Decision Processes with Average Cost
- Abounadi, J.¹ Bertsekas, D.² Borkar, V.S.³

2
- 0013155747
- A general framework for the study of decentralized distribution systems
- Anupindi, R., Bassok, Y. and Zemel, E. (2001) A general framework for the study of decentralized distribution systems. Journal of Manufacturing and Service Operations Management, 3(4).
- (2001) Journal of Manufacturing and Service Operations Management , vol.3 , Issue.4
- Anupindi, R.¹ Bassok, Y.² Zemel, E.³

3
- 0003787146
- Princeton University Press, Princeton, NJ
- Bellman, R.E. (1957) Dynamic Programming, Princeton University Press, Princeton, NJ.
- (1957) Dynamic Programming
- Bellman, R.E.¹

4
- 0004211236
- Athena Scientific, Belmont, MA
- Bertsekas, D. and Tsitsiklis, J. (1996) Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.¹ Tsitsiklis, J.²

5
- 84996565038
- Learning rate schedules for faster stochastic gradient search
- White, D.A. and Sofge, D.A. (eds.), IEEE Press, Piscataway, NJ
- Darken, C., Chang, J. and Moody, J. (1992) Learning rate schedules for faster stochastic gradient search, Neural Networks for Signal Processing 2 - Proceedings of the 1992 IEEE Workshop, in White, D.A. and Sofge, D.A. (eds.), IEEE Press, Piscataway, NJ.
- (1992) Neural Networks for Signal Processing 2 - Proceedings of the 1992 IEEE Workshop
- Darken, C.¹ Chang, J.² Moody, J.³

6
- 0032643313
- Solving semi-Markov decision problems using average reward reinforcement learning
- Das, T.K., Gosavi, A., Mahadevan, S. and Marchalleck, N. (1999) Solving semi-Markov decision problems using average reward reinforcement learning. Management Science, 45(4), 560-574.
- (1999) Management Science , vol.45 , Issue.4 , pp. 560-574
- Das, T.K.¹ Gosavi, A.² Mahadevan, S.³ Marchalleck, N.⁴

7
- 0038829878
- Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria
- Erev, I. and Roth, A.E. (1998) Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. The American Economic Review, 88(4), 848-881.
- (1998) The American Economic Review , vol.88 , Issue.4 , pp. 848-881
- Erev, I.¹ Roth, A.E.²

8
- 0003989209
- Springer-Verlag, New York, NY
- Filar, J. and Vrieze, K. (1997) Competitive Markov Decision Processes, Springer-Verlag, New York, NY.
- (1997) Competitive Markov Decision Processes
- Filar, J.¹ Vrieze, K.²

9
- 0742319170
- Reinforcement learning for long-run average cost
- to appear
- Gosavi, A. (2004) Reinforcement learning for long-run average cost. European Journal of Operational Research, to appear.
- (2004) European Journal of Operational Research
- Gosavi, A.¹

10
- 0036722536
- A reinforcement learning approach to airline seat allocation for multiple fare classes with over-booking
- Gosavi, A., Bandla, N. and Das, T.K. (2002) A reinforcement learning approach to airline seat allocation for multiple fare classes with over-booking. IIE Transactions, 34(9), 729-742.
- (2002) IIE Transactions , vol.34 , Issue.9 , pp. 729-742
- Gosavi, A.¹ Bandla, N.² Das, T.K.³

11
- 0000929496
- Multi-agent reinforcement learning: Theoretical framework and an algorithm
- Hu, J. and Wellman, M.P. (1998) Multi-agent reinforcement learning: theoretical framework and an algorithm, Proceedings of the 15th International Conference on Machine Learning, pp. 242-250.
- (1998) Proceedings of the 15th International Conference on Machine Learning , pp. 242-250
- Hu, J.¹ Wellman, M.P.²

12
- 1642351771
- Learning Nash equilibrium for average reward irreducible stochastic games
- Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL 33620
- Li, J. and Das, T.K. (2003) Learning Nash equilibrium for average reward irreducible stochastic games. Working paper, Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL 33620.
- (2003) Working Paper
- Li, J.¹ Das, T.K.²

13
- 85149834820
- Markov games as a framework for multi-agent reinforcement learning
- Littman, M.L. (1994) Markov games as a framework for multi-agent reinforcement learning, in Proceedings of the 11th International Conference on Machine Learning, pp. 157-163.
- (1994) Proceedings of the 11th International Conference on Machine Learning , pp. 157-163
- Littman, M.L.¹

14
- 0001730497
- Non-cooperative games
- Nash, J.F. (1951) Non-cooperative games. Annals of Mathematics, 54, 286-295.
- (1951) Annals of Mathematics , vol.54 , pp. 286-295
- Nash, J.F.¹

15
- 0016594972
- On the core of linear production games
- Owen, G. (1975) On the core of linear production games. Mathematical Programming, 9, 358-370.
- (1975) Mathematical Programming , vol.9 , pp. 358-370
- Owen, G.¹

16
- 0035124331
- Intelligent dynamic control policies for serial production lines
- Paternina, C.D. and Das, T.K. (2000) Intelligent dynamic control policies for serial production lines. IIE Transactions, 33(1), 65-77.
- (2000) IIE Transactions , vol.33 , Issue.1 , pp. 65-77
- Paternina, C.D.¹ Das, T.K.²

17
- 0003998452
- Wiley, New York, NY
- Puterman, M.L. (1994) Markov Decision Processes, Wiley, New York, NY.
- (1994) Markov Decision Processes
- Puterman, M.L.¹

18
- 84953405534
- Cambridge University Press, Cambridge, UK
- Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK.
- (1996) Pattern Recognition and Neural Networks
- Ripley, B.D.¹

19
- 0000016172
- A stochastic approximation method
- Robbins, H. and Monro, S. (1951) A stochastic approximation method. Annals of Mathematical Statistics, 22, 400-407.
- (1951) Annals of Mathematical Statistics , vol.22 , pp. 400-407
- Robbins, H.¹ Monro, S.²

20
- 0346523383
- Competitive outcomes in the core of market games
- The Rand Corporation
- Shapley, L. and Shubik, M. (1975) Competitive outcomes in the core of market games. Technical report R-1692-NSF, The Rand Corporation.
- (1975) Technical Report , vol.R-1692-NSF
- Shapley, L.¹ Shubik, M.²

21
- 0004007508
- MIT Press, Cambridge, MA
- Sutton, R.S. and Barto, A. (1998) Reinforcement Learning, MIT Press, Cambridge, MA.
- (1998) Reinforcement Learning
- Sutton, R.S.¹ Barto, A.²

22
- 0001081294
- Simplicial variable dimension algorithms for solving the nonlinear complementarity problem on a product of unit simplices using a general labeling
- Van der Laan, G., Talman, A.J.J. and Van der Heyden, L. (1987) Simplicial variable dimension algorithms for solving the nonlinear complementarity problem on a product of unit simplices using a general labeling. Mathematics of Operations Research, 377-397.
- (1987) Mathematics of Operations Research , pp. 377-397
- Van der Laan, G.¹ Talman, A.J.J.² Van der Heyden, L.³

23
- 0003787427
- Ph.D. thesis, Laboratory for Information and Decision Systems, MIT, Cambridge, MA
- Van Roy, B. (1998) Learning and value function approximation in complex decision processes. Ph.D. thesis, Laboratory for Information and Decision Systems, MIT, Cambridge, MA.
- (1998) Learning and Value Function Approximation in Complex Decision Processes
- Van Roy, B.¹

24
- 0004049893
- Ph.D. thesis, Cambridge University, Cambridge, UK
- Watkins, C.J.C.H. (1989) Learning from delayed rewards. Ph.D. thesis, Cambridge University, Cambridge, UK.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.