SCOPUS Information Search Platform

Studies in Systems, Decision and Control

Volume 42, 2015, Pages 29-52

The explore-exploit dilemma in nonstationary decision making under uncertainty

Authors (2): Axelrod, Allan (a); Chowdhary, Girish (a)

(a) Oklahoma State University (United States)

Author keywords

[No Author keywords available]

Indexed keywords

EID: 85028919187
PISSN: 21984182
EISSN: 21984190
Source Type: Book Series
DOI: 10.1007/978-3-319-26327-4_2
Document Type: Chapter

Times cited : (2)

References (61)

1
- 83755173254
- Traffic light control in non-stationary environments based on multi agent q-learning
- Abdoos M, Mozayani N, Bazzan AL (2011) Traffic light control in non-stationary environments based on multi agent q-learning. In: 2011 14th international IEEE conference on, IEEE Intelligent Transportation Systems (ITSC), pp 1580-1585
- (2011) 2011 14th international IEEE conference on, IEEE Intelligent Transportation Systems (ITSC) , pp. 1580-1585
- Abdoos, M.¹ Mozayani, N.² Bazzan, A.L.³

2
- 70049091620
- Tractable nonparametric bayesian inference in poisson processes with gaussian process intensities
- ACM
- Adams RP, Murray I, MacKay DJ (2009) Tractable nonparametric bayesian inference in poisson processes with gaussian process intensities. In: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, pp 9-16
- (2009) Proceedings of the 26th Annual International Conference on Machine Learning , pp. 9-16
- Adams, R.P.¹ Murray, I.² MacKay, D.J.³

3
- 84929155218
- Human aware path planning in urban environments with nonstationary mdps
- Hong Kong, China
- Allamaraju R, Kingravi H, Axelrod A, Chowdhary G, Grande R, Crick C, Sheng W, How J (2014) Human aware path planning in urban environments with nonstationary mdps. In: IEEE international conference on robotics and automation, Hong Kong, China
- (2014) IEEE international conference on robotics and automation
- Allamaraju, R.¹ Kingravi, H.² Axelrod, A.³ Chowdhary, G.⁴ Grande, R.⁵ Crick, C.⁶ Sheng, W.⁷ How, J.⁸

4
- 85085850027
- AIAA Aerospace science and technology forum, Kissimmee, FL
- Axelrod A, Chowdhary G (2015) Adaptive algorithms for autonomous data-ferrying in nonstationary environments. In: AIAA Aerospace science and technology forum, Kissimmee, FL
- (2015) Adaptive algorithms for autonomous data-ferrying in nonstationary environments
- Axelrod, A.¹ Chowdhary, G.²

5
- 0032022695
- The information-theoretic capacity of discrete-time queues
- Bedekar AS, Azizoglu M (1998) The information-theoretic capacity of discrete-time queues. IEEE Trans Inf Theory 44(2):446-461
- (1998) IEEE Trans Inf Theory , vol.44 , Issue.2 , pp. 446-461
- Bedekar, A.S.¹ Azizoglu, M.²

6
- 84870655865
- Technical report
- Bodik P, Hong W, Guestrin C, Madden S, Paskin M, Thibaux R (2004) Intel lab data. Technical report
- (2004) Intel lab data
- Bodik, P.¹ Hong, W.² Guestrin, C.³ Madden, S.⁴ Paskin, M.⁵ Thibaux, R.⁶

7
- 0030675610
- Efficient reinforcement learning: Model-based acrobot control
- IEEE
- Boone G (1997) Efficient reinforcement learning: Model-based acrobot control. In: Robotics and Automation, 1997. Proceedings., 1997 IEEE International Conference on, IEEE, vol 1, pp 229-234
- (1997) Robotics and Automation, 1997. Proceedings., 1997 IEEE International Conference on , vol.1 , pp. 229-234
- Boone, G.¹

8
- 85046476577
- 1st edn. CRC Press
- Busoniu L, Babuska R, Schutter BD, Ernst D (2010) Reinforcement learning and dynamic programming using function approximators, 1st edn. CRC Press
- (2010) Reinforcement learning and dynamic programming using function approximators
- Busoniu, L.¹ Babuska, R.² Schutter, B.D.³ Ernst, D.⁴

9
- 33749246501
- Hidden-mode markov decision processes for nonstationary sequential decision making
- Springer
- Choi SP, Yeung DY, Zhang NL (2001) Hidden-mode markov decision processes for nonstationary sequential decision making. In: Sequence learning, Springer, pp 264-287
- (2001) Sequence learning , pp. 264-287
- Choi, S.P.¹ Yeung, D.Y.² Zhang, N.L.³

10
- 50249102821
- The rate-distortion function of a poisson process with a queueing distortion measure
- DCC 2008, IEEE
- Coleman TP, Kiyavash N, Subramanian VG (2008) The rate-distortion function of a poisson process with a queueing distortion measure. In: Data Compression Conference, DCC 2008, IEEE, pp 63-72
- (2008) Data Compression Conference , pp. 63-72
- Coleman, T.P.¹ Kiyavash, N.² Subramanian, V.G.³

11
- 0038891993
- Sparse on-line Gaussian processes
- Csató L, Opper M (2002) Sparse on-line Gaussian processes. Neural Comput 14(3):641-668
- (2002) Neural Comput , vol.14 , Issue.3 , pp. 641-668
- Csató, L.¹ Opper, M.²

12
- 33745223257
- Cortical substrates for exploratory decisions in humans
- Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature 441(7095):876-879
- (2006) Nature , vol.441 , Issue.7095 , pp. 876-879
- Daw, N.D.¹ O’Doherty, J.P.² Dayan, P.³ Seymour, B.⁴ Dolan, R.J.⁵

13
- 84939061075
- Traffic modeling for telecommunications networks
- Frost VS, Melamed B (1994) Traffic modeling for telecommunications networks. IEEE Commun Mag 32(3):70-81
- (1994) IEEE Commun Mag , vol.32 , Issue.3 , pp. 70-81
- Frost, V.S.¹ Melamed, B.²

14
- 70049104217
- arXiv preprint arXiv:0805.3415
- Garivier A, Moulines E (2008) On upper-confidence bound policies for non-stationary bandit problems. arXiv preprint arXiv:0805.3415
- (2008) On upper-confidence bound policies for non-stationary bandit problems
- Garivier, A.¹ Moulines, E.²

15
- 0004012196
- Taylor & Francis
- Gelman A, Carlin JB, Stern HS, Rubin DB (2014) Bayesian data analysis, vol 2. Taylor & Francis
- (2014) Bayesian data analysis , vol.2
- Gelman, A.¹ Carlin, J.B.² Stern, H.S.³ Rubin, D.B.⁴

16
- 84890920160
- A tutorial on linear function approximators for dynamic programming and reinforcement learning
- Geramifard A, Walsh TJ, Tellex S, Chowdhary G, Roy N, How JP (2013) A tutorial on linear function approximators for dynamic programming and reinforcement learning. Foundations and Trends® in Machine Learning 6(4): 375-451. doi:10.1561/2200000042
- (2013) Foundations and Trends® in Machine Learning , vol.6 , Issue.4 , pp. 375-451
- Geramifard, A.¹ Walsh, T.J.² Tellex, S.³ Chowdhary, G.⁴ Roy, N.⁵ How, J.P.⁶

17
- 84946071785
- Ph.D thesis, Massachusetts Institute of Technology
- Grande RC (2014) Computationally efficient gaussian process changepoint detection and regression. Ph.D thesis, Massachusetts Institute of Technology
- (2014) Computationally efficient gaussian process changepoint detection and regression
- Grande, R.C.¹

18
- 85019807023
- URL
- Grande RC, Walsh TJ, How JP (2014) Sample efficient reinforcement learning with Gaussian processes. URL http://acl.mit.edu/papers/Grande14_ICML.pdf
- (2014) Sample efficient reinforcement learning with Gaussian processes
- Grande, R.C.¹ Walsh, T.J.² How, J.P.³

19
- 79551524402
- Solving non-stationary bandit problems by random sampling from sibling kalman filters
- Springer
- Granmo OC, Berg S (2010) Solving non-stationary bandit problems by random sampling from sibling kalman filters. In: Trends in applied intelligent systems, Springer, pp 199-208
- (2010) Trends in applied intelligent systems , pp. 199-208
- Granmo, O.C.¹ Berg, S.²

20
- 84871756682
- A survey of actor-critic reinforcement learning: Standard and natural policy gradients
- Grondman I, Busoniu L, Lopes GA, Babuska R (2012) A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Trans Syst, Man, and Cybern, Part C: Appl Rev 42(6):1291-1307
- (2012) IEEE Trans Syst, Man, and Cybern, Part C: Appl Rev , vol.42 , Issue.6 , pp. 1291-1307
- Grondman, I.¹ Busoniu, L.² Lopes, G.A.³ Babuska, R.⁴

21
- 84937825024
- arXiv preprint arXiv:1407.6949
- Gunter T, Lloyd C, Osborne MA, Roberts SJ (2014) Efficient bayesian nonparametric modelling of structured point processes. arXiv preprint arXiv:1407.6949
- (2014) Efficient bayesian nonparametric modelling of structured point processes
- Gunter, T.¹ Lloyd, C.² Osborne, M.A.³ Roberts, S.J.⁴

22
- 43749104456
- Mutual information and conditional mean estimation in poisson channels
- Guo D, Shamai S, Verdú S (2008) Mutual information and conditional mean estimation in poisson channels. IEEE Trans Inf Theory 54(5):1837-1849
- (2008) IEEE Trans Inf Theory , vol.54 , Issue.5 , pp. 1837-1849
- Guo, D.¹ Shamai, S.² Verdú, S.³

23
- 84937906754
- Stochastic multi-armed-bandit problem with non-stationary rewards
- Gur Y, Zeevi A, Besbes O (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. In: Advances in neural information processing systems, pp 199-207
- (2014) Advances in neural information processing systems , pp. 199-207
- Gur, Y.¹ Zeevi, A.² Besbes, O.³

24
- 0035397657
- Binomial and poisson distributions as maximum entropy distributions
- Harremoës P (2001) Binomial and poisson distributions as maximum entropy distributions. IEEE Trans Inf Theory 47(5):2039-2041
- (2001) IEEE Trans Inf Theory , vol.47 , Issue.5 , pp. 2039-2041
- Harremoës, P.¹

25
- 4544261302
- Harremoës P, Ruzankin P (2004) Rate of convergence to poisson law in terms of information divergence
- (2004) Rate of convergence to poisson law in terms of information divergence
- Harremoës, P.¹ Ruzankin, P.²

26
- 46749140150
- Thinning and the law of small numbers
- ISIT 2007. IEEE
- Harremoës P, Johnson O, Kontoyiannis I (2007) Thinning and the law of small numbers. IEEE International Symposium on Information Theory, ISIT 2007. IEEE, pp 1491-1495
- (2007) IEEE International Symposium on Information Theory , pp. 1491-1495
- Harremoës, P.¹ Johnson, O.² Kontoyiannis, I.³

27
- 51649090077
- A nash equilibrium related to the poisson channel
- Harremoës P, Vignat C et al (2003) A nash equilibrium related to the poisson channel. Commun Inf Syst 3(3):183-190
- (2003) Commun Inf Syst , vol.3 , Issue.3 , pp. 183-190
- Harremoës, P.¹ Vignat, C.²

28
- 34547516258
- Approximating the kullback leibler divergence between gaussian mixture models
- Hershey JR, Olsen PA (2007) Approximating the kullback leibler divergence between gaussian mixture models. In: ICASSP (4), pp 317-320
- (2007) ICASSP , vol.4 , pp. 317-320
- Hershey, J.R.¹ Olsen, P.A.²

29
- 84874698101
- Texplore: Real-time sample-efficient reinforcement learning for robots
- Hester T, Stone P (2013) Texplore: real-time sample-efficient reinforcement learning for robots. Mach learning 90(3):385-429
- (2013) Mach learning , vol.90 , Issue.3 , pp. 385-429
- Hester, T.¹ Stone, P.²

30
- 34247645455
- Log-concavity and the maximum entropy property of the poisson distribution
- Johnson O (2007) Log-concavity and the maximum entropy property of the poisson distribution. Stoch Process Appl 117(6):791-802
- (2007) Stoch Process Appl , vol.117 , Issue.6 , pp. 791-802
- Johnson, O.¹

31
- 85028948903
- Kakade S, Langford J, Kearns M (2003) Exploration in metric state spaces
- (2003) Exploration in metric state spaces
- Kakade, S.¹ Langford, J.² Kearns, M.³

32
- 8344223694
- A nonstationary poisson view of internet traffic
- INFOCOM 2004, IEEE
- Karagiannis T, Molle M, Faloutsos M, Broido A (2004) A nonstationary poisson view of internet traffic. In: INFOCOM 2004. Twenty-third annual joint conference of the IEEE computer and communications societies. IEEE, vol 3, pp 1558-1569
- (2004) Twenty-third annual joint conference of the IEEE computer and communications societies , vol.3 , pp. 1558-1569
- Karagiannis, T.¹ Molle, M.² Faloutsos, M.³ Broido, A.⁴

33
- 84866711082
- Anytime motion planning using the RRT*
- IEEE
- Karaman S, Walter M, Perez A, Frazzoli E, Teller S (2011) Anytime motion planning using the RRT*. In: International conference on robotics and automation. IEEE, pp 1478-1483
- (2011) International conference on robotics and automation , pp. 1478-1483
- Karaman, S.¹ Walter, M.² Perez, A.³ Frazzoli, E.⁴ Teller, S.⁵

34
- 71149109483
- Near-bayesian exploration in polynomial time
- ACM
- Kolter JZ, Ng AY (2009) Near-bayesian exploration in polynomial time. In: Proceedings of the 26th annual international conference on machine learning. ACM, pp 513-520
- (2009) Proceedings of the 26th annual international conference on machine learning , pp. 513-520
- Kolter, J.Z.¹ Ng, A.Y.²

35
- 13444292359
- Entropy and the law of small numbers
- Kontoyiannis I, Harremoës P, Johnson O (2005) Entropy and the law of small numbers. IEEE Trans Inf Theory 51(2):466-472
- (2005) IEEE Trans Inf Theory , vol.51 , Issue.2 , pp. 466-472
- Kontoyiannis, I.¹ Harremoës, P.² Johnson, O.³

36
- 38649118249
- Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
- Koulouriotis DE, Xanthopoulos A (2008) Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems. Appl Math Comput 196(2):913-922
- (2008) Appl Math Comput , vol.196 , Issue.2 , pp. 913-922
- Koulouriotis, D.E.¹ Xanthopoulos, A.²

37
- 4644323293
- Least-squares policy iteration
- URL
- Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4:1107-1149, URL http://dl.acm.org/citation.cfm-id=945365.964290
- (2003) J Mach Learn Res , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

38
- 84892407563
- Expected entropy as a measure and criterion of randomness of binary sequences
- Leśniewicz M (2014) Expected entropy as a measure and criterion of randomness of binary sequences. Przeglad Elektrotechniczny 90(1):42-46
- (2014) Przeglad Elektrotechniczny , vol.90 , Issue.1 , pp. 42-46
- Leśniewicz, M.¹

39
- 85028936685
- Markov decision processes (MDP) toolbox (2012). http://www7.inra.fr/mia/T/MDPtoolbox/MDPtoolbox.html
- (2012)

40
- 0040843527
- Log gaussian cox processes
- Møller J, Syversveen AR, Waagepetersen RP (1998) Log gaussian cox processes. Scand J Stat 25(3):451-482
- (1998) Scand J Stat , vol.25 , Issue.3 , pp. 451-482
- Møller, J.¹ Syversveen, A.R.² Waagepetersen, R.P.³

41
- 84902145619
- Efficient distributed sensing using adaptive censoring-based inference
- Mu B, Chowdhary G, How J (2014) Efficient distributed sensing using adaptive censoring-based inference. Automatica
- (2014) Automatica
- Mu, B.¹ Chowdhary, G.² How, J.³

42
- 84898921982
- Pazis J, Parr R (2013) Pac optimal exploration in continuous space markov decision processes
- (2013) Pac optimal exploration in continuous space markov decision processes
- Pazis, J.¹ Parr, R.²

43
- 0037319560
- Entropy and the timing capacity of discrete queues
- Prabhakar B, Gallager R (2003) Entropy and the timing capacity of discrete queues. IEEE Trans Inf Theory 49(2):357-370
- (2003) IEEE Trans Inf Theory , vol.49 , Issue.2 , pp. 357-370
- Prabhakar, B.¹ Gallager, R.²

44
- 25444448065
- The MIT Press
- Rasmussen C, Williams C (2005) Gaussian processes for machine learning (Adaptive Computation and Machine Learning). The MIT Press
- (2005) Gaussian processes for machine learning (Adaptive Computation and Machine Learning)
- Rasmussen, C.¹ Williams, C.²

45
- 84874248431
- Towards optimization of a human-inspired heuristic for solving explore-exploit problems
- Reverdy P, Wilson RC, Holmes P, Leonard NE (2012) Towards optimization of a human-inspired heuristic for solving explore-exploit problems. In: CDC, pp 2820-2825
- (2012) CDC , pp. 2820-2825
- Reverdy, P.¹ Wilson, R.C.² Holmes, P.³ Leonard, N.E.⁴

46
- 84908593736
- arXiv preprint arXiv:1206.3281
- Ross S, Pineau J (2012) Model-based bayesian reinforcement learning in large structured domains. arXiv preprint arXiv:1206.3281
- (2012) Model-based bayesian reinforcement learning in large structured domains
- Ross, S.¹ Pineau, J.²

47
- 0016036648
- Information rates and data-compression schemes for poisson processes
- Rubin I (1974) Information rates and data-compression schemes for poisson processes. IEEE Trans Inf Theory 20(2):200-210
- (1974) IEEE Trans Inf Theory , vol.20 , Issue.2 , pp. 200-210
- Rubin, I.¹

48
- 84865131152
- A generalized representer theorem
- Helmbold D, Williamson B (eds), Lecture notes in computer science. Springer, Berlin, URL
- Scholkopf B, Herbrich R, Smola A (2001) A generalized representer theorem. In: Helmbold D, Williamson B (eds) Computational learning theory. Lecture notes in computer science. Springer, Berlin, pp 416-426. URL http://dx.doi.org/10.1007/3-540-44581-1_27
- (2001) Computational learning theory , pp. 416-426
- Scholkopf, B.¹ Herbrich, R.² Smola, A.³

49
- 0002691611
- Fisher discriminant analysis with kernels
- Madison, WI, USA
- Scholkopf B, Muller KR (1999) Fisher discriminant analysis with kernels. Proceedings of the IEEE signal processing society workshop neural networks for signal processing IX. Madison, WI, USA, pp 23-25
- (1999) Proceedings of the IEEE signal processing society workshop neural networks for signal processing IX , pp. 23-25
- Scholkopf, B.¹ Muller, K.R.²

50
- 0004102479
- MIT Press, Cambridge
- Sutton R, Barto A (1998) Reinforcement learning, an introduction. MIT Press, Cambridge
- (1998) Reinforcement learning, an introduction
- Sutton, R.¹ Barto, A.²

51
- 27744518715
- MIT press Cambridge
- Thrun S, Burgard W, Fox D, et al (2005) Probabilistic robotics, vol 1. MIT press Cambridge
- (2005) Probabilistic robotics , vol.1
- Thrun, S.¹ Burgard, W.² Fox, D.³

52
- 0003411271
- Carnegie-Mellon University, Technical report
- Thrun SB (1992) Efficient exploration in reinforcement learning. Carnegie-Mellon University, Technical report
- (1992) Efficient exploration in reinforcement learning
- Thrun, S.B.¹

53
- 0031143730
- An analysis of temporal difference learning with function approximation
- Tsitsiklis JN, Roy BV (1997) An analysis of temporal difference learning with function approximation. IEEE Trans Autom Control 42(5):674-690
- (1997) IEEE Trans Autom Control , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J.N.¹ Roy, B.V.²

54
- 85042936847
- Bayesian reinforcement learning
- Springer
- Vlassis N, Ghavamzadeh M, Mannor S, Poupart P (2012) Bayesian reinforcement learning. In: Reinforcement learning, Springer, pp 359-386
- (2012) Reinforcement learning , pp. 359-386
- Vlassis, N.¹ Ghavamzadeh, M.² Mannor, S.³ Poupart, P.⁴

55
- 84864491752
- AAAI
- Walsh TJ, Goschin S, Littman ML (2010) Integrating sample-based planning and model-based reinforcement learning. In: AAAI
- (2010) Integrating sample-based planning and model-based reinforcement learning
- Walsh, T.J.¹ Goschin, S.² Littman, M.L.³

56
- 0000807870
- Q-learning
- Watkins CJCH, Dayan P (1992) Q-learning. J Mach Learn 16:185-202
- (1992) J Mach Learn , vol.16 , pp. 185-202
- Watkins, C.J.C.H.¹ Dayan, P.²

57
- 0345161977
- Ph.D thesis, University of Amsterdam/IDSIA
- Wiering MA (1999) Explorations in efficient reinforcement learning. Ph.D thesis, University of Amsterdam/IDSIA
- (1999) Explorations in efficient reinforcement learning
- Wiering, M.A.¹

58
- 84925600345
- Humans use directed and random exploration to solve the explore-exploit dilemma
- Wilson RC, Geana A, White JM, Ludvig EA, Cohen JD (2014) Humans use directed and random exploration to solve the explore-exploit dilemma. J Exp Psychol: Gen 143(6):2074
- (2014) J Exp Psychol: Gen , vol.143 , Issue.6 , pp. 2074
- Wilson, R.C.¹ Geana, A.² White, J.M.³ Ludvig, E.A.⁴ Cohen, J.D.⁵

59
- 0000723439
- Explore/exploit strategies in autonomy
- Wilson SW, et al (1996) Explore/exploit strategies in autonomy. In: From animals to animats 4: Proceedings of the 4th international conference on simulation of adaptive behavior, pp 325-332
- (1996) From animals to animats 4: Proceedings of the 4th international conference on simulation of adaptive behavior , pp. 325-332
- Wilson, S.W.¹

60
- 70349986740
- Online learning in markov decision processes with arbitrarily changing rewards and transitions
- IEEE
- Yu JY, Mannor S (2009) Online learning in markov decision processes with arbitrarily changing rewards and transitions. In: International conference on game theory for networks, GameNets '09. IEEE, pp 314-322
- (2009) International conference on game theory for networks, GameNets '09 , pp. 314-322
- Yu, J.Y.¹ Mannor, S.²

61
- 84937929311
- arXiv preprint arXiv:1401.5547
- Zhou Z, Matteson DS, Woodard DB, Henderson SG, Micheas AC (2014) A spatio-temporal point process model for ambulance demand. arXiv preprint arXiv:1401.5547
- (2014) A spatio-temporal point process model for ambulance demand
- Zhou, Z.¹ Matteson, D.S.² Woodard, D.B.³ Henderson, S.G.⁴ Micheas, A.C.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.