2023, Vol. 53, No. 11: 2461-2472
人工智能大模型综述及展望 (Survey and Prospects of Large Artificial Intelligence Models)
Foundation: National Natural Science Foundation of China (62006230)
Downloads: 12,115 | Citations: 147 | Reads: 1,335
Abstract:

Large models are a popular research direction in artificial intelligence. Applications of large-model technology, exemplified by ChatGPT, have set off a wave of large-model research at home and abroad: parameter counts and training-data volumes have grown rapidly, and model performance has improved significantly. This paper reviews the development of large models and representative algorithms, introduces their basic architecture and core principles, analyzes their characteristics, and finally discusses their limitations and future directions.
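The basic architecture the abstract refers to is, for most current large models, the Transformer, whose core operation is scaled dot-product attention. The following is a minimal NumPy sketch for illustration only; the function name, shapes, and toy inputs are chosen for this example and are not taken from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V,
    the core operation of the Transformer architecture."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ V                                    # weighted sum of values

# Toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per query token
```

Each output row is a convex combination of the value vectors, with weights determined by query-key similarity; stacking many such attention layers (with multiple heads and feed-forward blocks) yields the architectures scaled up in the models surveyed here.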


Basic Information:

CLC Number: TP18

Citation:

[1] 罗锦钊, 孙玉龙, 钱增志, 等. 人工智能大模型综述及展望[J]. 无线电工程, 2023, 53(11): 2461-2472.

Funding:

National Natural Science Foundation of China (62006230)

Release Date: 2023-08-29
Publication Date: 2023-08-29
Online Publication Date: 2023-08-29
