2024, Vol. 54, No. 12, pp. 2923-2932
Intelligent UAV Systems Based on Multimodal Large Models: Summary and Outlook
Foundation: National Natural Science Foundation of China, General Program (62171276)
Abstract:

The emergence and development of multimodal large models have opened a path toward intelligent Unmanned Aerial Vehicle (UAV) systems. Integrating these models efficiently into UAV systems can significantly enhance the autonomy and flexibility of UAV agents, enabling them to play a greater role in multiple fields. To promote related research, this paper explains why integrating multimodal large models with UAV systems matters and reviews in detail the development and current applications of multimodal large models. It then lists the new capabilities that multimodal large models can bring to UAV systems in human-machine interaction, intelligent perception, autonomous decision-making, and group collaboration. Finally, it outlines the application scope of these models and the challenges they face, providing a reference for the intelligent development of UAVs.

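Basic information:

CLC number: TP18; V279

Citation:

[1] 刘畅行, 陈思衡, 杨峰. Intelligent UAV Systems Based on Multimodal Large Models: Summary and Outlook[J]. Radio Engineering (无线电工程), 2024, 54(12): 2923-2932.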
