| Downloads | Citations | Views |
| 118 | 0 | 450 |
Abstract: With advances in artificial intelligence and related technologies, multi-agent systems such as unmanned aerial vehicle swarms are being applied in a growing range of practical settings. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, designed for coordination among agents in cooperative environments, has become one of the mainstream algorithms in the multi-agent field owing to its distinctive Actor-Critic architecture. Multi-agent collaborative tasks in command decision-making suffer from ambiguous role division and from slow policy convergence caused by information overload. To address these problems, an improved MADDPG algorithm incorporating a Dynamic Role Attention (DRA) mechanism, named DRA-MADDPG, is proposed. The algorithm embeds a DRA module in the Actor-Critic architecture and refines the division of labor by dynamically adjusting the attention weight each agent assigns to peers with different roles. Specifically, the role set (reconnaissance, assault, command) and the phase division (exploration → execution → encirclement) of the command task are defined, and a role coordination matrix and phase adjustment coefficients are constructed on this basis. A DRA module in the Critic network computes attention weights from role relevance and task phase and uses them to filter key information. The Actor network is modified to generate actions targeted to each role's responsibilities. Simulation experiments show that, compared with MADDPG, DRA-MADDPG increases the Area Under the Curve (AUC) of the cumulative training reward by 2.4% and reduces task completion time by 19.3%. Comparison of the training reward curves further indicates that DRA-MADDPG learns more efficiently in short-term training. These results indicate that the method is suitable for complex command decision-making scenarios and provides a relatively efficient solution for multi-agent coordination.
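The abstract describes the DRA weighting only in words, so the following is a minimal sketch of one way it could work, not the authors' formulation. It assumes the attention weight over peers is a softmax of the role coordination entry scaled by the phase adjustment coefficient. The matrix values, the coefficient values, and the function names `role_attention_weights` and `attend` are all hypothetical; only the role and phase names come from the abstract.

```python
import math

# Hypothetical role coordination matrix: ROLE_COORDINATION[a][b] is how
# relevant a peer with role b is to an agent with role a. Values are
# illustrative assumptions, not taken from the paper.
ROLE_COORDINATION = {
    "reconnaissance": {"reconnaissance": 0.2, "assault": 0.3, "command": 0.5},
    "assault": {"reconnaissance": 0.4, "assault": 0.2, "command": 0.4},
    "command": {"reconnaissance": 0.5, "assault": 0.4, "command": 0.1},
}

# Hypothetical phase adjustment coefficients. Here they act as a sharpness
# multiplier: later phases concentrate attention on the most relevant roles.
PHASE_COEFFICIENT = {"exploration": 0.5, "execution": 1.0, "encirclement": 2.0}


def role_attention_weights(own_role, peer_roles, phase):
    """Return softmax weights over peers from phase-scaled role relevance."""
    coef = PHASE_COEFFICIENT[phase]
    scores = [coef * ROLE_COORDINATION[own_role][role] for role in peer_roles]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(peer_features, weights):
    """Combine peer feature vectors into one vector using the weights."""
    dim = len(peer_features[0])
    return [
        sum(w * features[k] for w, features in zip(weights, peer_features))
        for k in range(dim)
    ]


# Usage: a reconnaissance agent attending to an assault peer and a command peer.
peers = ["assault", "command"]
w_explore = role_attention_weights("reconnaissance", peers, "exploration")
w_encircle = role_attention_weights("reconnaissance", peers, "encirclement")
context = attend([[1.0, 0.0], [0.0, 1.0]], w_encircle)
```

In this sketch the weighted vector `context` stands in for the filtered peer information that a critic would consume. With these assumed numbers, the command peer receives more weight than the assault peer, and the gap widens from the exploration phase to the encirclement phase.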
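Basic information:
Chinese Library Classification (CLC) number: TP13; TP18
Citation:
[1] YUAN Siyu, KANG Guoqin, ZHENG Xueqiang, et al. DRA-MADDPG cooperative control method for command decision-making [J]. Radio Engineering (无线电工程), 2025, 55(11): 2218-2226.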
2025-11-05