SCOPUS Information Retrieval Platform

Machine Learning

Volume 82, Issue 3, 2011, Pages 399-443

Knows what it knows: A framework for self-aware learning

Authors (4): Li, Lihong (a); Littman, Michael L. (b); Walsh, Thomas J. (c); Strehl, Alexander L. (d)

a YAHOO RESEARCH (United States)

b RUTGERS UNIVERSITY (United States)

c University of Arizona (United States)

d FACEBOOK (United States)

Author keywords

Active learning; Computational learning theory; Exploration; Knows What It Knows (KWIK); Mistake bound; Probably Approximately Correct (PAC); Reinforcement learning

Indexed keywords

ACTIVE LEARNING; COMPUTATIONAL LEARNING THEORY; KNOWS WHAT IT KNOWS (KWIK); MISTAKE BOUNDS; PROBABLY APPROXIMATELY CORRECT; COMPUTATION THEORY; REINFORCEMENT LEARNING

EID: 79958797519; PISSN: 08856125; EISSN: 15730565; Source Type: Journal
DOI: 10.1007/s10994-010-5225-4; Document Type: Article

Times cited: 117

References (61)

1
- 31844444663
- Exploration and apprenticeship learning in reinforcement learning
- DOI 10.1145/1102351.1102352, ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
- Abbeel, P., & Ng, A. Y. (2005). Exploration and apprenticeship learning in reinforcement learning. In Proceedings of the twenty-second international conference on machine learning (pp. 1-8). (Pubitemid 43183309)
- (2005) ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning , pp. 1-8
- Abbeel, P.¹ Ng, A.Y.²

2
- 0000710299
- Queries and concept learning
- Angluin, D. (1988). Queries and concept learning. Machine Learning, 2, 319-342.
- (1988) Machine Learning , vol.2 , pp. 319-342
- Angluin, D.¹

3
- 0742284346
- Queries revisited
- Angluin, D. (2004). Queries revisited. Theoretical Computer Science, 313, 175-194.
- (2004) Theoretical Computer Science , vol.313 , pp. 175-194
- Angluin, D.¹

4
- 0041966002
- Using confidence bounds for exploitation-exploration trade-offs
- Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3, 397-422.
- (2002) Journal of Machine Learning Research , vol.3 , pp. 397-422
- Auer, P.¹

5
- 1942450194
- (Technical Report CMU-RI-TR-01-25). Robotics Institute, Carnegie Mellon University, Pittsburgh, PA
- Bagnell, J., Ng, A. Y., & Schneider, J. (2001). Solving uncertain Markov decision problems (Technical Report CMU-RI-TR-01-25). Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.
- (2001) Solving Uncertain Markov Decision Problems
- Bagnell, J.¹ Ng, A.Y.² Schneider, J.³

6
- 0347113516
- Princeton: Princeton University Press
- Bellman, R. (1957). Dynamic programming. Princeton: Princeton University Press.
- (1957) Dynamic Programming
- Bellman, R.¹

7
- 0003923091
- New York: Academic Press
- Bertsekas, D., & Shreve, S. (1978). Stochastic optimal control: The discrete time case. New York: Academic Press.
- (1978) Stochastic Optimal Control: The Discrete Time Case
- Bertsekas, D.¹ Shreve, S.²

8
- 0028517062
- Separating distribution-free and mistake-bound learning models over the Boolean domain
- Blum, A. (1994). Separating distribution-free and mistake-bound learning models over the Boolean domain. SIAM Journal on Computing, 23, 990-1000.
- (1994) SIAM Journal on Computing , vol.23 , pp. 990-1000
- Blum, A.¹

9
- 0346942368
- Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
- Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94. (Pubitemid 129628760)
- (1999) Journal of Artificial Intelligence Research , vol.11 , pp. 1-94
- Boutilier, C.¹ Dean, T.² Hanks, S.³

10
- 0041965975
- R-MAX-A general polynomial time algorithm for near-optimal reinforcement learning
- Brafman, R. I., & Tennenholtz, M. (2002). R-MAX-a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
- (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

11
- 70049084399
- CORL: A continuous-state offset-dynamics reinforcement learner
- Brunskill, E., Leffler, B. R., Li, L., Littman, M. L., & Roy, N. (2008). CORL: A continuous-state offset-dynamics reinforcement learner. In Proceedings of the twenty-fourth conference on uncertainty in artificial intelligence (UAI-08) (pp. 53-61).
- (2008) Proceedings of the Twenty-fourth Conference on Uncertainty in Artificial Intelligence (UAI-08) , pp. 53-61
- Brunskill, E.¹ Leffler, B.R.² Li, L.³ Littman, M.L.⁴ Roy, N.⁵

12
- 70349416596
- Provably efficient learning with typed parametric models
- Brunskill, E., Leffler, B. R., Li, L., Littman, M. L., & Roy, N. (2009). Provably efficient learning with typed parametric models. Journal of Machine Learning Research, 10, 1955-1988.
- (2009) Journal of Machine Learning Research , vol.10 , pp. 1955-1988
- Brunskill, E.¹ Leffler, B.R.² Li, L.³ Littman, M.L.⁴ Roy, N.⁵

13
- 20544462399
- Minimizing regret with label efficient prediction
- DOI 10.1109/TIT.2005.847729
- Cesa-Bianchi, N., Lugosi, G., & Stoltz, G. (2005). Minimizing regret with label efficient prediction. IEEE Transactions on Information Theory, 51, 2152-2162. (Pubitemid 40843632)
- (2005) IEEE Transactions on Information Theory , vol.51 , Issue.6 , pp. 2152-2162
- Cesa-Bianchi, N.¹ Lugosi, G.² Stoltz, G.³

14
- 33745738567
- Worst-case analysis of selective sampling for linear classification
- Cesa-Bianchi, N., Gentile, C., & Zaniboni, L. (2006). Worst-case analysis of selective sampling for linear classification. Journal of Machine Learning Research, 7, 1205-1230. (Pubitemid 44015299)
- (2006) Journal of Machine Learning Research , vol.7 , pp. 1205-1230
- Cesa-Bianchi, N.¹ Gentile, C.² Zaniboni, L.³

15
- 71149102767
- Robust bounds for classification via selective sampling
- Cesa-Bianchi, N., Gentile, C., & Orabona, F. (2009). Robust bounds for classification via selective sampling. In Proceedings of the twenty-sixth international conference on machine learning (ICML-09) (pp. 121-128).
- (2009) Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML-09) , pp. 121-128
- Cesa-Bianchi, N.¹ Gentile, C.² Orabona, F.³

16
- 38249024662
- The complexity of dynamic programming
- Chow, C.-S., & Tsitsiklis, J. N. (1989). The complexity of dynamic programming. Journal of Complexity, 5, 466-488.
- (1989) Journal of Complexity , vol.5 , pp. 466-488
- Chow, C.-S.¹ Tsitsiklis, J.N.²

17
- 0028424239
- Improving generalization with active learning
- Cohn, D. A., Atlas, L., & Ladner, R. E. (1994). Improving generalization with active learning. Machine Learning, 15, 201-221.
- (1994) Machine Learning , vol.15 , pp. 201-221
- Cohn, D.A.¹ Atlas, L.² Ladner, R.E.³

18
- 84990553353
- A model for reasoning about persistence and causation
- Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5, 142-150.
- (1989) Computational Intelligence , vol.5 , pp. 142-150
- Dean, T.¹ Kanazawa, K.²

19
- 71149108881
- The adaptive k-meteorologists problem and its application to structure discovery and feature selection in reinforcement learning
- Diuk, C., Li, L., & Leffler, B. R. (2009). The adaptive k-meteorologists problem and its application to structure discovery and feature selection in reinforcement learning. In Proceedings of the twenty-sixth international conference on machine learning (ICML-09) (pp. 249-256).
- (2009) Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML-09) , pp. 249-256
- Diuk, C.¹ Li, L.² Leffler, B.R.³

20
- 78650606637
- A quantitative study of hypothesis selection
- Fong, P. W. L. (1995a). A quantitative study of hypothesis selection. In Proceedings of the twelfth international conference on machine learning (ICML-95) (pp. 226-234).
- (1995) Proceedings of the Twelfth International Conference on Machine Learning (ICML-95) , pp. 226-234
- Fong, P.W.L.¹

21
- 2542446495
- Master's thesis, Department of Computer Science, University of Waterloo, Ontario, Canada
- Fong, P. W. L. (1995b). A quantitative study of hypothesis selection. Master's thesis, Department of Computer Science, University of Waterloo, Ontario, Canada.
- (1995) A Quantitative Study of Hypothesis Selection
- Fong, P.W.L.¹

22
- 0030643068
- Using and combining predictors that specialize
- Freund, Y., Schapire, R. E., Singer, Y., & Warmuth, M. K. (1997a). Using and combining predictors that specialize. In STOC'97: Proceedings of the twenty-ninth annual ACM symposium on theory of computing (pp. 334-343).
- (1997) STOC'97: Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing , pp. 334-343
- Freund, Y.¹ Schapire, R.E.² Singer, Y.³ Warmuth, M.K.⁴

23
- 0031209604
- Selective Sampling Using the Query by Committee Algorithm
- Freund, Y., Seung, H. S., Shamir, E., & Tishby, N. (1997b). Selective sampling using the query by committee algorithm. Machine Learning, 28, 133-168. (Pubitemid 127506338)
- (1997) Machine Learning , vol.28 , Issue.2-3 , pp. 133-168
- Freund, Y.¹ Seung, H.S.² Shamir, E.³ Tishby, N.⁴

24
- 24344500472
- Generalization bounds for averaged classifiers
- DOI 10.1214/009053604000000058
- Freund, Y., Mansour, Y., & Schapire, R. E. (2004). Generalization bounds for averaged classifiers. The Annals of Statistics, 32, 1698-1722. (Pubitemid 41250282)
- (2004) Annals of Statistics , vol.32 , Issue.4 , pp. 1698-1722
- Freund, Y.¹ Mansour, Y.² Schapire, R.E.³

25
- 0004236492
- (2nd ed.). Baltimore: The Johns Hopkins University Press
- Golub, G. H., & Van Loan, C. F. (1989). Matrix computations (2nd ed.). Baltimore: The Johns Hopkins University Press.
- (1989) Matrix Computations
- Golub, G.H.¹ Van Loan, C.F.²

26
- 0034666805
- Apple tasting
- Helmbold, D. P., Littlestone, N., & Long, P. M. (2000). Apple tasting. Information and Computation, 161, 85-139.
- (2000) Information and Computation , vol.161 , pp. 85-139
- Helmbold, D.P.¹ Littlestone, N.² Long, P.M.³

27
- 84947403595
- Probability inequalities for sums of bounded random variables
- Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13-30.
- (1963) Journal of the American Statistical Association , vol.58 , pp. 13-30
- Hoeffding, W.¹

28
- 23244466805
- Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London
- Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London.
- (2003) On the Sample Complexity of Reinforcement Learning
- Kakade, S.M.¹

29
- 1942452450
- Exploration in metric state spaces
- Kakade, S., Kearns, M., & Langford, J. (2003). Exploration in metric state spaces. In Proceedings of the 20th international conference on machine learning.
- (2003) Proceedings of the 20th International Conference on Machine Learning
- Kakade, S.¹ Kearns, M.² Langford, J.³

30
- 84880677563
- Efficient reinforcement learning in factored MDPs
- Kearns, M. J., & Koller, D. (1999). Efficient reinforcement learning in factored MDPs. In Proceedings of the 16th International joint conference on artificial intelligence (IJCAI) (pp. 740-747).
- (1999) Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI) , pp. 740-747
- Kearns, M.J.¹ Koller, D.²

31
- 0028460231
- Efficient distribution-free learning of probabilistic concepts
- DOI 10.1016/S0022-0000(05)80062-5
- Kearns, M. J., & Schapire, R. E. (1994). Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48, 464-497. (Pubitemid 124013300)
- (1994) Journal of Computer and System Sciences , vol.48 , Issue.3 , pp. 464-497
- Kearns, M.J.¹ Schapire, R.E.²

32
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- Kearns, M. J., & Singh, S. P. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49, 209-232.
- (2002) Machine Learning , vol.49 , pp. 209-232
- Kearns, M.J.¹ Singh, S.P.²

33
- 0001553979
- Toward efficient agnostic learning
- Kearns, M. J., Schapire, R. E., & Sellie, L. (1994). Toward efficient agnostic learning. Machine Learning, 17, 115-141.
- (1994) Machine Learning , vol.17 , pp. 115-141
- Kearns, M.J.¹ Schapire, R.E.² Sellie, L.³

34
- 0036832951
- A sparse sampling algorithm for near-optimal planning in large Markov decision processes
- DOI 10.1023/A:1017932429737
- Kearns, M. J., Mansour, Y., & Ng, A. Y. (2002). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning, 49, 193-208. (Pubitemid 34325686)
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 193-208
- Kearns, M.¹ Mansour, Y.² Ng, A.Y.³

35
- 85022970688
- From noise-free to noise-tolerant and from on-line to batch learning
- Klasner, N., & Simon, H. U. (1995). From noise-free to noise-tolerant and from on-line to batch learning. In Proceedings of the eighth annual conference on computational learning theory (COLT-95) (pp. 250-257).
- (1995) Proceedings of the Eighth Annual Conference on Computational Learning Theory (COLT-95) , pp. 250-257
- Klasner, N.¹ Simon, H.U.²

36
- 33750293964
- Bandit based Monte-Carlo planning
- Machine Learning: ECML 2006 - 17th European Conference on Machine Learning, Proceedings
- Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In Proceedings of the seventeenth European conference on machine learning (ECML-06) (pp. 282-293). (Pubitemid 44618839)
- (2006) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , vol.4212 , pp. 282-293
- Kocsis, L.¹ Szepesvári, C.²

37
- 0037400054
- An empirical study of two approaches to sequence learning for anomaly detection
- Lane, T., & Brodley, C. E. (2003). An empirical study of two approaches to sequence learning for anomaly detection. Machine Learning, 51, 73-107.
- (2003) Machine Learning , vol.51 , pp. 73-107
- Lane, T.¹ Brodley, C.E.²

38
- 36349026477
- Efficient reinforcement learning with relocatable action models
- Leffler, B. R., Littman, M. L., & Edmunds, T. (2007). Efficient reinforcement learning with relocatable action models. In Proceedings of the twenty-second conference on artificial intelligence (AAAI-07).
- (2007) Proceedings of the Twenty-second Conference on Artificial Intelligence (AAAI-07)
- Leffler, B.R.¹ Littman, M.L.² Edmunds, T.³

39
- 70349428076
- Doctoral dissertation, Rutgers University, New Brunswick, NJ
- Li, L. (2009). A unifying framework for computational reinforcement learning theory. Doctoral dissertation, Rutgers University, New Brunswick, NJ.
- (2009) A Unifying Framework for Computational Reinforcement Learning Theory
- Li, L.¹

40
- 78649496546
- Reducing reinforcement learning to KWIK online regression
- doi:10.1007/s10472-010-9201-2
- Li, L., & Littman, M. L. (2010). Reducing reinforcement learning to KWIK online regression. Annals of Mathematics and Artificial Intelligence. doi:10.1007/s10472-010-9201-2.
- (2010) Annals of Mathematics and Artificial Intelligence
- Li, L.¹ Littman, M.L.²

41
- 56449122733
- Knows what it knows: A framework for self-aware learning
- Li, L., Littman, M. L., & Walsh, T. J. (2008). Knows what it knows: A framework for self-aware learning. In Proceedings of the twenty-fifth international conference on machine learning (pp. 568-575).
- (2008) Proceedings of the Twenty-fifth International Conference on Machine Learning , pp. 568-575
- Li, L.¹ Littman, M.L.² Walsh, T.J.³

42
- 77954641643
- A contextual-bandit approach to personalized news article recommendation
- Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In Proceedings of the nineteenth international conference on World Wide Web (WWW-10) (pp. 661-670).
- (2010) Proceedings of the Nineteenth International Conference on World Wide Web (WWW-10) , pp. 661-670
- Li, L.¹ Chu, W.² Langford, J.³ Schapire, R.E.⁴

43
- 34250091945
- Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm
- Littlestone, N. (1987). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285-318.
- (1987) Machine Learning , vol.2 , pp. 285-318
- Littlestone, N.¹

44
- 85011913774
- From on-line to batch learning
- Littlestone, N. (1989). From on-line to batch learning. In Proceedings of the second annual workshop on computational learning theory (COLT-89) (pp. 269-284).
- (1989) Proceedings of the Second Annual Workshop on Computational Learning Theory (COLT-89) , pp. 269-284
- Littlestone, N.¹

45
- 0027684215
- Prioritized sweeping: Reinforcement learning with less data and less real time
- Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103-130.
- (1993) Machine Learning , vol.13 , pp. 103-130
- Moore, A.W.¹ Atkeson, C.G.²

46
- 85102627959
- New York: Wiley
- Puterman, M. L. (1994). Markov decision processes-discrete stochastic dynamic programming. New York: Wiley.
- (1994) Markov Decision Processes-discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

47
- 0026981853
- Query by committee
- Seung, H. S., Opper, M., & Sompolinsky, H. (1992). Query by committee. In Proceedings of the fifth annual workshop on computational learning theory (COLT-92) (pp. 287-294). (Pubitemid 23615454)
- (1992) Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory , pp. 287-294
- Seung, H.S.¹ Opper, M.² Sompolinsky, H.³

48
- 41549089117
- A tutorial on conformal prediction
- Shafer, G., & Vovk, V. (2008). A tutorial on conformal prediction. Journal of Machine Learning Research, 9, 371-421. (Pubitemid 351469017)
- (2008) Journal of Machine Learning Research , vol.9 , pp. 371-421
- Shafer, G.¹ Vovk, V.²

49
- 0028497385
- An upper bound on the loss from approximate optimal-value functions
- Singh, S. P., & Yee, R. C. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16, 227.
- (1994) Machine Learning , vol.16 , pp. 227
- Singh, S.P.¹ Yee, R.C.²

50
- 0003626635
- (2nd ed.). Berlin: Springer
- Sontag, E. D. (1998). Texts in Applied Mathematics: Vol. 6. Mathematical control theory: Deterministic finite dimensional systems (2nd ed.). Berlin: Springer.
- (1998) Texts in Applied Mathematics: Vol. 6. Mathematical Control Theory: Deterministic Finite Dimensional Systems
- Sontag, E.D.¹

51
- 85162058047
- Online linear regression and its application to model-based reinforcement learning
- Strehl, A. L., & Littman, M. L. (2008). Online linear regression and its application to model-based reinforcement learning. Advances in Neural Information Processing Systems, 20.
- (2008) Advances in Neural Information Processing Systems , vol.20
- Strehl, A.L.¹ Littman, M.L.²

52
- 34548745051
- Incremental model-based learners with formal learning-time guarantees
- Strehl, A. L., Li, L., & Littman, M. L. (2006a). Incremental model-based learners with formal learning-time guarantees. In Proceedings of the 22nd conference on uncertainty in artificial intelligence (UAI 2006).
- (2006) Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI 2006)
- Strehl, A.L.¹ Li, L.² Littman, M.L.³

53
- 33749255382
- PAC model-free reinforcement learning
- Strehl, A. L., Li, L., Wiewiora, E., Langford, J., & Littman, M. L. (2006b). PAC model-free reinforcement learning. In Proceedings of the twenty-third international conference on machine learning (ICML-06).
- (2006) Proceedings of the Twenty-third International Conference on Machine Learning (ICML-06)
- Strehl, A.L.¹ Li, L.² Wiewiora, E.³ Langford, J.⁴ Littman, M.L.⁵

54
- 33749242078
- Experience-efficient learning in associative bandit problems
- Strehl, A. L., Mesterharm, C., Littman, M. L., & Hirsh, H. (2006c). Experience-efficient learning in associative bandit problems. In Proceedings of the twenty-third international conference on machine learning (ICML-06).
- (2006) Proceedings of the Twenty-third International Conference on Machine Learning (ICML-06)
- Strehl, A.L.¹ Mesterharm, C.² Littman, M.L.³ Hirsh, H.⁴

55
- 36348930987
- Efficient structure learning in factored-state MDPs
- Strehl, A. L., Diuk, C., & Littman, M. L. (2007). Efficient structure learning in factored-state MDPs. In Proceedings of the twenty-second national conference on artificial intelligence (AAAI-07).
- (2007) Proceedings of the Twenty-second National Conference on Artificial Intelligence (AAAI-07)
- Strehl, A.L.¹ Diuk, C.² Littman, M.L.³

56
- 73549084301
- Reinforcement learning in finite MDPs: PAC analysis
- Strehl, A. L., Li, L., & Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10, 2413-2444.
- (2009) Journal of Machine Learning Research , vol.10 , pp. 2413-2444
- Strehl, A.L.¹ Li, L.² Littman, M.L.³

57
- 0004102479
- Cambridge: The MIT Press
- Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: The MIT Press.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

58
- 77956520676
- Model-based reinforcement learning with nearly tight exploration complexity bounds
- Szita, I., & Szepesvári, C. (2010). Model-based reinforcement learning with nearly tight exploration complexity bounds. In Proceedings of the twenty-seventh international conference on machine learning (ICML-2010).
- (2010) Proceedings of the Twenty-seventh International Conference on Machine Learning (ICML-2010)
- Szita, I.¹ Szepesvári, C.²

59
- 0021518106
- A theory of the learnable
- Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134-1142.
- (1984) Communications of the ACM , vol.27 , pp. 1134-1142
- Valiant, L.G.¹

60
- 79958846996
- Exploring compact reinforcement-learning representations with linear regression
- A refined version is available as Technical Report DCS-tr-660, Department of Computer Science, Rutgers University, December, 2009
- Walsh, T. J., Szita, I., Diuk, C., & Littman, M. L. (2009). Exploring compact reinforcement-learning representations with linear regression. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence (UAI-09) (pp. 591-598). A refined version is available as Technical Report DCS-tr-660, Department of Computer Science, Rutgers University, December, 2009.
- (2009) Proceedings of the Twenty-fifth Conference on Uncertainty in Artificial Intelligence (UAI-09) , pp. 591-598
- Walsh, T.J.¹ Szita, I.² Diuk, C.³ Littman, M.L.⁴

61
- 49549125826
- Maximizing classifier utility when training data is costly
- Weiss, G. M., & Tian, Y. (2006). Maximizing classifier utility when training data is costly. SIGKDD Explorations, 8, 31-38.
- (2006) SIGKDD Explorations , vol.8 , pp. 31-38
- Weiss, G.M.¹ Tian, Y.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.