[1]Kaelbling L,Littman M,Cassandra A.Planning and Acting in Partially Observable Stochastic Domains[J].Artificial Intelligence,1998,101(1):99-134.
[2]Hong Ye,Wang Hongjian,Bian Xinqian.Research on AUV Global Path Planning Based on Hierarchical Markov Decision Processes[J].Journal of System Simulation,2008,20(9):2361-2363.
[3]Fan Bo,Pan Quan,Zhang Hongcai.A Multi-agent Coordination Method Based on Markov Games and Its Application in Robot Soccer[J].Robot,2005,27(1):46-51.
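[4]Li Xiaomeng,Yang Yupu,Xu Xiaoming.Research on Multi-agent Cooperation Based on Markov Games and Reinforcement Learning[J].Journal of Shanghai Jiaotong University,2001,35(2):288-292.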
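[5]Li Xiaomeng,Yang Yupu,et al.A Multi-agent Automated Guided Vehicle Scheduling System Based on Multi-level Decision Making[J].Journal of Shanghai Jiaotong University,2002,36(8):1146-1149.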
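[6]Gao Yang,Zhou Zhihua.Research on Multi-agent Reinforcement Learning Models and Algorithms Based on Markov Games[J].Journal of Computer Research and Development,2000,37(3):257-263.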
[7]Sharma R,Gopal M.A Markov Game-adaptive Fuzzy Controller for Robot Manipulators[J].IEEE Transactions on Fuzzy Systems,2008,16(1):171-186.
[8]Sharma R,Gopal M.Markov Game Controller Design Algorithms[J].World Academy of Science,Engineering and Technology,2007,34(5):585-593.
[9]Littman M L.Value-function Reinforcement Learning in Markov Games[J].Journal of Cognitive Systems Research,2001,2(1):55-66.
[10]Chang H S,Hu Jiaqiao,Fu M C.Adaptive Adversarial Multi-armed Bandit Approach to Two-person Zero-sum Markov Games[J].IEEE Transactions on Automatic Control,2010,55(2):463-468.
[11]Dutta D,Goel A,Heidemann J.Oblivious AQM and Nash Equilibria[J].ACM SIGCOMM Computer Communication Review,2002,32(3):106-113.
[12]Zhan Xiaolei,Xin Hongbing,汉斯·彼德兰特斯.Kinematic Simulation of the MOTOMAN-HP3 Robot Based on Virtual Reality[J].China Mechanical Engineering,2010(16):1952-1954.
[13]Vrancx P,Verbeeck K,Nowe A.Decentralized Learning in Markov Games[J].IEEE Transactions on Systems,Man,and Cybernetics,Part B:Cybernetics,2008,38(4):976-981.