-
1
-
-
14344251217
-
Apprenticeship learning via inverse reinforcement learning
-
Abbeel, P., & Ng, A. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of ICML-04.
-
(2004)
Proceedings of ICML-04
-
-
Abbeel, P.1
Ng, A.2
-
3
-
-
0036930295
-
A POMDP formulation of preference elicitation problems
-
Boutilier, C. (2002). A POMDP formulation of preference elicitation problems. In Proceedings AAAI-02.
-
(2002)
Proceedings AAAI-02
-
-
Boutilier, C.1
-
5
-
-
0000184142
-
Constrained Markov decision models with weighted discounted rewards
-
Feinberg, E., & Schwartz, A. (1995). Constrained Markov decision models with weighted discounted rewards. Mathematics of Operations Research, 20, 302-320.
-
(1995)
Mathematics of Operations Research
, vol.20
, pp. 302-320
-
-
Feinberg, E.1
Schwartz, A.2
-
8
-
-
0032073263
-
Planning and acting in partially observable stochastic domains
-
Kaelbling, L. P., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101.
-
(1998)
Artificial Intelligence
, vol.101
-
-
Kaelbling, L.P.1
Littman, M.2
Cassandra, A.3
-
10
-
-
51149092685
-
Study of distance vector routing protocols for mobile ad hoc networks
-
Lu, Y., Wang, W., Zhong, Y., & Bhargava, B. (2003). Study of distance vector routing protocols for mobile ad hoc networks. In Proceedings of PerCom-03.
-
(2003)
Proceedings of PerCom-03
-
-
Lu, Y.1
Wang, W.2
Zhong, Y.3
Bhargava, B.4
-
11
-
-
0029752592
-
Average reward reinforcement learning: Foundations, algorithms, and empirical results
-
Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22, 159-195.
-
(1996)
Machine Learning
, vol.22
, pp. 159-195
-
-
Mahadevan, S.1
-
12
-
-
79960013704
-
A geometric approach to multi-criterion reinforcement learning
-
Mannor, S., & Shimkin, N. (2004). A geometric approach to multi-criterion reinforcement learning. JMLR, 5, 325-360.
-
(2004)
JMLR
, vol.5
, pp. 325-360
-
-
Mannor, S.1
Shimkin, N.2
-
13
-
-
0042547347
-
Algorithms for inverse reinforcement learning
-
Ng, A. Y., & Russell, S. (2000). Algorithms for inverse reinforcement learning. In Proceedings of ICML-00.
-
(2000)
Proceedings of ICML-00
-
-
Ng, A.Y.1
Russell, S.2
-
14
-
-
0346738900
-
Flexible decomposition algorithms for weakly coupled Markov decision problems
-
Parr, R. (1998). Flexible decomposition algorithms for weakly coupled Markov decision problems. In Proceedings UAI-98.
-
(1998)
Proceedings UAI-98
-
-
Parr, R.1
-
17
-
-
85152626183
-
A reinforcement learning method for maximizing undiscounted rewards
-
Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of ICML-93.
-
(1993)
Proceedings of ICML-93
-
-
Schwartz, A.1
-
18
-
-
18144424551
-
TPOT-RL applied to network routing
-
Stone, P. (2000). TPOT-RL applied to network routing. In Proceedings of ICML-00.
-
(2000)
Proceedings of ICML-00
-
-
Stone, P.1
-
19
-
-
0032050241
-
Model-based average reward reinforcement learning
-
Tadepalli, P., & Ok, D. (1998). Model-based average reward reinforcement learning. AI Journal, 100, 177-223.
-
(1998)
AI Journal
, vol.100
, pp. 177-223
-
-
Tadepalli, P.1
Ok, D.2
-
20
-
-
13444294406
-
A multi-agent policy-gradient approach to network routing
-
Tao, N., Baxter, J., & Weaver, L. (2001). A multi-agent policy-gradient approach to network routing. In Proceedings of ICML-01.
-
(2001)
Proceedings of ICML-01
-
-
Tao, N.1
Baxter, J.2
Weaver, L.3
-
21
-
-
0040030981
-
Multi-objective infinite-horizon discounted Markov decision processes
-
White, D. (1982). Multi-objective infinite-horizon discounted Markov decision processes. Journal of Mathematical Analysis and Applications, 89, 639-647.
-
(1982)
Journal of Mathematical Analysis and Applications
, vol.89
, pp. 639-647
-
-
White, D.1