SCOPUS Information Search Platform

32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Volume , Issue , 2018, Pages 3199-3206

OptionGAN: Learning joint reward-policy options using generative adversarial inverse reinforcement learning

(6) Henderson, Peter ᵃ; Chang, Wei Di ᵃ; Bacon, Pierre Luc ᵃ; Meger, David ᵃ; Pineau, Joelle ᵃ; Precup, Doina ᵃ

ᵃ MCGILL UNIVERSITY (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; INVERSE PROBLEMS;

COMPLEX PROBLEMS; CONTINUOUS CONTROL; INVERSE REINFORCEMENT LEARNING; JOINT REWARDS; LEARNING POLICY; POLICY OPTIONS; REWARD FUNCTION; TRANSFER LEARNING;

REINFORCEMENT LEARNING;

EID: 85060430951 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (73)

References (32)

1
- 14344251217
- Apprenticeship learning via inverse reinforcement learning
- New York, NY, USA: ACM
- Abbeel, P., and Ng, A. Y. 2004. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-first International Conference on Machine Learning, 1-. New York, NY, USA: ACM.
- (2004) Proceedings of the Twenty-First International Conference on Machine Learning , pp. 1
- Abbeel, P.¹ Ng, A.Y.²

2
- 80053440459
- Apprenticeship learning about multiple intentions
- Babes, M.; Marivate, V.; Subramanian, K.; and Littman, M. L. 2011. Apprenticeship learning about multiple intentions. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 897-904.
- (2011) Proceedings of the 28th International Conference on Machine Learning (ICML-11) , pp. 897-904
- Babes, M.¹ Marivate, V.² Subramanian, K.³ Littman, M.L.⁴

3
- 85030457046
- The option-critic architecture
- Bacon, P.-L.; Harb, J.; and Precup, D. 2017. The option-critic architecture. In AAAI, 1726-1734.
- (2017) AAAI , pp. 1726-1734
- Bacon, P.-L.¹ Harb, J.² Precup, D.³

4
- 85015392848
- arXiv preprint
- Bengio, E.; Bacon, P.-L.; Pineau, J.; and Precup, D. 2015. Conditional computation in neural networks for faster models. arXiv preprint arXiv:1511.06297.
- (2015) Conditional Computation in Neural Networks for Faster Models
- Bengio, E.¹ Bacon, P.-L.² Pineau, J.³ Precup, D.⁴

5
- 85015444377
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; and Zaremba, W. 2016. OpenAI Gym.
- (2016) OpenAI Gym
- Brockman, G.¹ Cheung, V.² Pettersson, L.³ Schneider, J.⁴ Schulman, J.⁵ Tang, J.⁶ Zaremba, W.⁷

6
- 84877772241
- Nonparametric Bayesian inverse reinforcement learning for multiple reward functions
- Pereira, F.; Burges, C. J. C.; Bottou, L.; and Weinberger, K. Q., eds., Curran Associates, Inc
- Choi, J., and Kim, K.-E. 2012. Nonparametric Bayesian inverse reinforcement learning for multiple reward functions. In Pereira, F.; Burges, C. J. C.; Bottou, L.; and Weinberger, K. Q., eds., Advances in Neural Information Processing Systems 25. Curran Associates, Inc. 305-313.
- (2012) Advances in Neural Information Processing Systems , vol.25 , pp. 305-313
- Choi, J.¹ Kim, K.-E.²

7
- 85030460628
- arXiv preprint
- Christiano, P.; Shah, Z.; Mordatch, I.; Schneider, J.; Blackwell, T.; Tobin, J.; Abbeel, P.; and Zaremba, W. 2016. Transfer from simulation to real world through learning deep inverse dynamics model. arXiv preprint arXiv:1610.03518.
- (2016) Transfer from Simulation to Real World Through Learning Deep Inverse Dynamics Model
- Christiano, P.¹ Shah, Z.² Mordatch, I.³ Schneider, J.⁴ Blackwell, T.⁵ Tobin, J.⁶ Abbeel, P.⁷ Zaremba, W.⁸

8
- 84870924061
- Hierarchical relative entropy policy search
- Daniel, C.; Neumann, G.; and Peters, J. R. 2012. Hierarchical relative entropy policy search. In International Conference on Artificial Intelligence and Statistics, 273-281.
- (2012) International Conference on Artificial Intelligence and Statistics , pp. 273-281
- Daniel, C.¹ Neumann, G.² Peters, J.R.³

9
- 0001234682
- Feudal reinforcement learning
- Dayan, P., and Hinton, G. E. 1993. Feudal reinforcement learning. In Advances in neural information processing systems, 271-278.
- (1993) Advances in Neural Information Processing Systems , pp. 271-278
- Dayan, P.¹ Hinton, G.E.²

10
- 0002278788
- Hierarchical reinforcement learning with the MAXQ value function decomposition
- Dietterich, T. G. 2000. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13:227-303.
- (2000) Journal of Artificial Intelligence Research , vol.13 , pp. 227-303
- Dietterich, T.G.¹

11
- 84937849144
- Generative adversarial nets
- Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2672-2680.
- (2014) Advances in Neural Information Processing Systems , pp. 2672-2680
- Goodfellow, I.J.¹ Pouget-Abadie, J.² Mirza, M.³ Xu, B.⁴ Warde-Farley, D.⁵ Ozair, S.⁶ Courville, A.⁷ Bengio, Y.⁸

12
- 85047005895
- arXiv preprint
- Hausman, K.; Chebotar, Y.; Schaal, S.; Sukhatme, G.; and Lim, J. 2017. Multi-modal imitation learning from unstructured demonstrations using generative adversarial nets. arXiv preprint arXiv:1705.10479.
- (2017) Multi-Modal Imitation Learning from Unstructured Demonstrations Using Generative Adversarial Nets
- Hausman, K.¹ Chebotar, Y.² Schaal, S.³ Sukhatme, G.⁴ Lim, J.⁵

13
- 85059142263
- Benchmark environments for multitask learning in continuous domains
- Henderson, P.; Chang, W.-D.; Shkurti, F.; Hansen, J.; Meger, D.; and Dudek, G. 2017. Benchmark environments for multitask learning in continuous domains. ICML Lifelong Learning: A Reinforcement Learning Approach Workshop.
- (2017) ICML Lifelong Learning: A Reinforcement Learning Approach Workshop
- Henderson, P.¹ Chang, W.-D.² Shkurti, F.³ Hansen, J.⁴ Meger, D.⁵ Dudek, G.⁶

14
- 85018872345
- Generative adversarial imitation learning
- Ho, J., and Ermon, S. 2016. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems, 4565-4573.
- (2016) Advances in Neural Information Processing Systems , pp. 4565-4573
- Ho, J.¹ Ermon, S.²

15
- 0001940458
- Adaptive mixtures of local experts
- Jacobs, R. A.; Jordan, M. I.; Nowlan, S. J.; and Hinton, G. E. 1991. Adaptive mixtures of local experts. Neural computation 3(1):79-87.
- (1991) Neural Computation , vol.3 , Issue.1 , pp. 79-87
- Jacobs, R.A.¹ Jordan, M.I.² Nowlan, S.J.³ Hinton, G.E.⁴

16
- 85044518938
- arXiv preprint
- Krishnan, S.; Garg, A.; Liaw, R.; Miller, L.; Pokorny, F. T.; and Goldberg, K. 2016. HIRL: Hierarchical inverse reinforcement learning for long-horizon tasks with delayed rewards. arXiv preprint arXiv:1604.06508.
- (2016) HIRL: Hierarchical Inverse Reinforcement Learning for Long-Horizon Tasks with Delayed Rewards
- Krishnan, S.¹ Garg, A.² Liaw, R.³ Miller, L.⁴ Pokorny, F.T.⁵ Goldberg, K.⁶

17
- 85060432302
- arXiv preprint
- Li, Y.; Song, J.; and Ermon, S. 2017. InfoGAIL: Interpretable imitation learning from visual demonstrations. arXiv preprint arXiv:1703.08840.
- (2017) InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations
- Li, Y.¹ Song, J.² Ermon, S.³

18
- 84944327298
- Learning and evolution by minimization of mutual information
- Springer
- Liu, Y., and Yao, X. 2002. Learning and evolution by minimization of mutual information. In International Conference on Parallel Problem Solving from Nature, 495-504. Springer.
- (2002) International Conference on Parallel Problem Solving from Nature , pp. 495-504
- Liu, Y.¹ Yao, X.²

19
- 85047019729
- arXiv preprint
- Merel, J.; Tassa, Y.; Srinivasan, S.; Lemmon, J.; Wang, Z.; Wayne, G.; and Heess, N. 2017. Learning human behaviors from motion capture by adversarial imitation. arXiv preprint arXiv:1707.02201.
- (2017) Learning Human Behaviors from Motion Capture by Adversarial Imitation
- Merel, J.¹ Tassa, Y.² Srinivasan, S.³ Lemmon, J.⁴ Wang, Z.⁵ Wayne, G.⁶ Heess, N.⁷

20
- 84971448181
- Asynchronous methods for deep reinforcement learning
- Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, 1928-1937.
- (2016) International Conference on Machine Learning , pp. 1928-1937
- Mnih, V.¹ Badia, A.P.² Mirza, M.³ Graves, A.⁴ Lillicrap, T.⁵ Harley, T.⁶ Silver, D.⁷ Kavukcuoglu, K.⁸

21
- 0042547347
- Algorithms for inverse reinforcement learning
- San Francisco, CA, USA: Morgan Kaufmann Publishers Inc
- Ng, A. Y., and Russell, S. J. 2000. Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning, 663-670. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
- (2000) Proceedings of the Seventeenth International Conference on Machine Learning , pp. 663-670
- Ng, A.Y.¹ Russell, S.J.²

22
- 84969963490
- Trust region policy optimization
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; and Moritz, P. 2015. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 1889-1897.
- (2015) Proceedings of the 32nd International Conference on Machine Learning (ICML-15) , pp. 1889-1897
- Schulman, J.¹ Levine, S.² Abbeel, P.³ Jordan, M.⁴ Moritz, P.⁵

23
- 85083954383
- High-dimensional continuous control using generalized advantage estimation
- Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; and Abbeel, P. 2016. High-dimensional continuous control using generalized advantage estimation. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2016) Proceedings of the International Conference on Learning Representations (ICLR)
- Schulman, J.¹ Moritz, P.² Levine, S.³ Jordan, M.⁴ Abbeel, P.⁵

24
- 85041194636
- arXiv preprint
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- (2017) Proximal Policy Optimization Algorithms
- Schulman, J.¹ Wolski, F.² Dhariwal, P.³ Radford, A.⁴ Klimov, O.⁵

25
- 85045139097
- arXiv preprint
- Sermanet, P.; Xu, K.; and Levine, S. 2016. Unsupervised perceptual rewards for imitation learning. arXiv preprint arXiv:1612.06699.
- (2016) Unsupervised Perceptual Rewards for Imitation Learning
- Sermanet, P.¹ Xu, K.² Levine, S.³

26
- 85088226307
- arXiv preprint
- Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; and Dean, J. 2017. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538.
- (2017) Outrageously Large Neural Networks: the Sparsely-Gated Mixture-of-Experts Layer
- Shazeer, N.¹ Mirhoseini, A.² Maziarz, K.³ Davis, A.⁴ Le, Q.⁵ Hinton, G.⁶ Dean, J.⁷

27
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, 1057-1063.
- (2000) Advances in Neural Information Processing Systems , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.A.² Singh, S.P.³ Mansour, Y.⁴

28
- 0033170372
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- Sutton, R. S.; Precup, D.; and Singh, S. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1-2):181-211.
- (1999) Artificial Intelligence , vol.112 , Issue.1-2 , pp. 181-211
- Sutton, R.S.¹ Precup, D.² Singh, S.³

29
- 84872292044
- MuJoCo: A physics engine for model-based control
- Vilamoura, Algarve, Portugal, October 7-12, 2012
- Todorov, E.; Erez, T.; and Tassa, Y. 2012. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2012, Vilamoura, Algarve, Portugal, October 7-12, 2012, 5026-5033.
- (2012) 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2012 , pp. 5026-5033
- Todorov, E.¹ Erez, T.² Tassa, Y.³

30
- 85033441620
- arXiv preprint
- van Seijen, H.; Fatemi, M.; Romoff, J.; Laroche, R.; Barnes, T.; and Tsang, J. 2017. Hybrid reward architecture for reinforcement learning. arXiv preprint arXiv:1706.04208.
- (2017) Hybrid Reward Architecture for Reinforcement Learning
- Van Seijen, H.¹ Fatemi, M.² Romoff, J.³ Laroche, R.⁴ Barnes, T.⁵ Tsang, J.⁶

31
- 85049556018
- arXiv preprint
- Wang, Z.; Merel, J.; Reed, S.; Wayne, G.; de Freitas, N.; and Heess, N. 2017. Robust imitation of diverse behaviors. arXiv preprint arXiv:1707.02747.
- (2017) Robust Imitation of Diverse Behaviors
- Wang, Z.¹ Merel, J.² Reed, S.³ Wayne, G.⁴ De Freitas, N.⁵ Heess, N.⁶

32
- 57749097473
- Maximum entropy inverse reinforcement learning
- AAAI Press
- Ziebart, B. D.; Maas, A.; Bagnell, J. A.; and Dey, A. K. 2008. Maximum entropy inverse reinforcement learning. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 3, 1433-1438. AAAI Press.
- (2008) Proceedings of the 23rd National Conference on Artificial Intelligence , vol.3 , pp. 1433-1438
- Ziebart, B.D.¹ Maas, A.² Bagnell, J.A.³ Dey, A.K.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.