|
Volume 23, Issue 2, 2006, Pages 292-296
|
Unified NDP method based on TD(0) learning for both average and discounted Markov decision processes
|
Author keywords
Markov decision processes; Neuro dynamic programming; Performance potentials; TD(0) learning
|
Indexed keywords
LEARNING OPTIMIZATION PROBLEMS;
MARKOV DECISION PROCESSES (MDPS);
NEURO DYNAMIC PROGRAMMING (NDP);
NEURO POLICY ITERATION ALGORITHM;
PERFORMANCE POTENTIALS;
REINFORCEMENT LEARNING (RL);
TD(0) LEARNING;
TEMPORAL DIFFERENCE;
DYNAMIC PROGRAMMING;
ESTIMATION;
EVALUATION;
ITERATIVE METHODS;
NEURAL NETWORKS;
OPTIMIZATION;
PERFORMANCE;
MARKOV PROCESSES;
|
EID: 33745951445
PISSN: 1000-8152
EISSN: None
Source Type: Journal
DOI: None
Document Type: Article
|
Times cited: 6
|
References (12)
|