Volume 42, 2015, Pages 29-52

The explore-exploit dilemma in nonstationary decision making under uncertainty

Author keywords

[No Author keywords available]

EID: 85028919187     PISSN: 21984182     EISSN: 21984190     Source Type: Book Series    
DOI: 10.1007/978-3-319-26327-4_2     Document Type: Chapter
Times cited: 2

References (61)
  • [5] Bedekar AS, Azizoglu M (1998) The information-theoretic capacity of discrete-time queues. IEEE Trans Inf Theory 44(2):446-461 [Scopus 0032022695]
  • [9] Choi SP, Yeung DY, Zhang NL (2001) Hidden-mode Markov decision processes for nonstationary sequential decision making. In: Sequence learning, Springer, pp 264-287 [Scopus 33749246501]
  • [10] Coleman TP, Kiyavash N, Subramanian VG (2008) The rate-distortion function of a Poisson process with a queueing distortion measure. In: Data Compression Conference (DCC 2008), IEEE, pp 63-72 [Scopus 50249102821]
  • [11] Csató L, Opper M (2002) Sparse on-line Gaussian processes. Neural Comput 14(3):641-668 [Scopus 0038891993]
  • [12] Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature 441(7095):876-879 [Scopus 33745223257]
  • [13] Frost VS, Melamed B (1994) Traffic modeling for telecommunications networks. IEEE Commun Mag 32(3):70-81 [Scopus 84939061075]
  • [19] Granmo OC, Berg S (2010) Solving non-stationary bandit problems by random sampling from sibling Kalman filters. In: Trends in applied intelligent systems, Springer, pp 199-208 [Scopus 79551524402]
  • [22] Guo D, Shamai S, Verdú S (2008) Mutual information and conditional mean estimation in Poisson channels. IEEE Trans Inf Theory 54(5):1837-1849 [Scopus 43749104456]
  • [24] Harremoës P (2001) Binomial and Poisson distributions as maximum entropy distributions. IEEE Trans Inf Theory 47(5):2039-2041 [Scopus 0035397657]
  • [27] Harremoës P, Vignat C et al (2003) A Nash equilibrium related to the Poisson channel. Commun Inf Syst 3(3):183-190 [Scopus 51649090077]
  • [28] Hershey JR, Olsen PA (2007) Approximating the Kullback-Leibler divergence between Gaussian mixture models. In: ICASSP (4), pp 317-320 [Scopus 34547516258]
  • [29] Hester T, Stone P (2013) Texplore: real-time sample-efficient reinforcement learning for robots. Mach Learn 90(3):385-429 [Scopus 84874698101]
  • [30] Johnson O (2007) Log-concavity and the maximum entropy property of the Poisson distribution. Stoch Process Appl 117(6):791-802 [Scopus 34247645455]
  • [36] Koulouriotis DE, Xanthopoulos A (2008) Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems. Appl Math Comput 196(2):913-922 [Scopus 38649118249]
  • [37] Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4:1107-1149. http://dl.acm.org/citation.cfm?id=945365.964290 [Scopus 4644323293]
  • [38] Leśniewicz M (2014) Expected entropy as a measure and criterion of randomness of binary sequences. Przeglad Elektrotechniczny 90(1):42-46 [Scopus 84892407563]
  • [39] Markov decision processes (MDP) toolbox (2012). http://www7.inra.fr/mia/T/MDPtoolbox/MDPtoolbox.html [Scopus 85028936685]
  • [41] Mu B, Chowdhary G, How J (2014) Efficient distributed sensing using adaptive censoring-based inference. Automatica [Scopus 84902145619]
  • [43] Prabhakar B, Gallager R (2003) Entropy and the timing capacity of discrete queues. IEEE Trans Inf Theory 49(2):357-370 [Scopus 0037319560]
  • [45] Reverdy P, Wilson RC, Holmes P, Leonard NE (2012) Towards optimization of a human-inspired heuristic for solving explore-exploit problems. In: CDC, pp 2820-2825 [Scopus 84874248431]
  • [47] Rubin I (1974) Information rates and data-compression schemes for Poisson processes. IEEE Trans Inf Theory 20(2):200-210 [Scopus 0016036648]
  • [48] Scholkopf B, Herbrich R, Smola A (2001) A generalized representer theorem. In: Helmbold D, Williamson B (eds) Computational learning theory. Lecture notes in computer science, Springer, Berlin, pp 416-426. http://dx.doi.org/10.1007/3-540-44581-1_27 [Scopus 84865131152]
  • [53] Tsitsiklis JN, Van Roy B (1997) An analysis of temporal difference learning with function approximation. IEEE Trans Autom Control 42(5):674-690 [Scopus 0031143730]
  • [58] Wilson RC, Geana A, White JM, Ludvig EA, Cohen JD (2014) Humans use directed and random exploration to solve the explore-exploit dilemma. J Exp Psychol: Gen 143(6):2074 [Scopus 84925600345]


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.