2021, Vol. 51, No. 05: 360-366
Research on Attack-Defense Confrontation Strategy of UAV Swarms Based on Multi-agent Reinforcement Learning
Foundation: National Natural Science Foundation of China (61973244, 61573277)

Abstract:

To address the attack-defense confrontation problem of large-scale unmanned aerial vehicle (UAV) swarms, an improved multi-agent algorithm (Multi-agent Proximal Policy Optimization, M-PPO) based on Proximal Policy Optimization (PPO) is proposed. The algorithm uses the Actor-Critic framework. Unlike PPO, M-PPO uses a Critic network with global information and an Actor network with local information to achieve cooperation between agents. In addition, the algorithm adopts the framework of centralized training and decentralized execution, so the trained model can cooperate without relying on communication. To study the performance of the algorithm, a large UAV swarm confrontation platform that accounts for UAV flight constraints and a realistic flight environment is designed, and simulation experiments are carried out. The experimental results show that M-PPO significantly outperforms mainstream algorithms such as PPO and Deep Deterministic Policy Gradient (DDPG) in the attack-defense confrontation problem.
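The centralized-training / decentralized-execution arrangement described in the abstract can be sketched as follows. This is an illustrative toy only: the linear "networks", the agent count, the observation and action dimensions, and all names are assumptions for demonstration, not the authors' implementation. It shows the two ideas the abstract highlights: each actor sees only its local observation, while the critic sees the joint (global) observation during training, and policy updates are constrained by PPO's clipped surrogate objective.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3                      # UAVs in the swarm (assumed)
OBS_DIM = 4                       # local observation per agent (assumed)
GLOBAL_DIM = N_AGENTS * OBS_DIM   # critic input: concatenated observations
N_ACTIONS = 5                     # discrete actions, e.g. headings (assumed)
CLIP_EPS = 0.2                    # PPO clipping parameter

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

class Actor:
    """Decentralized actor: conditions only on the agent's local observation."""
    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(OBS_DIM, N_ACTIONS))
    def policy(self, obs):
        return softmax(obs @ self.W)

class CentralCritic:
    """Centralized critic: sees the global (joint) observation during training."""
    def __init__(self):
        self.w = rng.normal(scale=0.1, size=GLOBAL_DIM)
    def value(self, global_obs):
        return float(global_obs @ self.w)

def ppo_clip_objective(ratio, advantage, eps=CLIP_EPS):
    """PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

obs = rng.normal(size=(N_AGENTS, OBS_DIM))  # one local observation per agent
global_obs = obs.reshape(-1)                # joint observation for the critic

# Execution is decentralized: each actor acts from its own observation only,
# so the deployed policies need no inter-agent communication.
probs = [a.policy(o) for a, o in zip(actors, obs)]

# Training is centralized: the critic evaluates the joint observation.
v = critic.value(global_obs)

# Clipping caps how much a single update can move the policy.
print(ppo_clip_objective(1.5, 1.0))  # → 1.2 (ratio clipped at 1 + eps)
```

At deployment only the actors are kept, which is why the trained model can cooperate without communication: the critic's global view is needed only to shape the advantage estimates during training.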


Basic Information:

CLC Number: TP18; V279

Citation:

[1] XUAN Shuzhe, KE Liangjun. Research on Attack-Defense Confrontation Strategy of UAV Swarms Based on Multi-agent Reinforcement Learning[J]. Radio Engineering, 2021, 51(05): 360-366.


