Abstract: Deep learning models sometimes misclassify data from unknown classes as belonging to known classes. Such unknown-class data are referred to as out-of-distribution (OOD) data, and in fields such as bioinformatics, healthcare, autonomous driving, and network security this kind of misclassification can have serious consequences. This paper briefly introduces network traffic identification and classification techniques and OOD data, and proposes a method for detecting OOD data among test samples. Based on the characteristics of OOD data, the method trains two models and uses the likelihood ratio of their results to decide whether a sample is out of distribution. The method was tested on the public Moore network traffic dataset and on four self-collected datasets, where its detection accuracy reaches 92.3%.
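The abstract does not specify the two models or how the decision threshold is chosen, so the following is only a minimal sketch of the general likelihood-ratio idea under stated assumptions: both models are diagonal Gaussians (an assumption, not the paper's architecture), the data are synthetic feature vectors rather than Moore dataset flows, and the threshold is set at the 5th percentile of scores on known-class training data. All function and variable names are hypothetical.

```python
import numpy as np

def fit_gaussian(x):
    """Fit a diagonal Gaussian; returns per-feature (mean, variance)."""
    return x.mean(axis=0), x.var(axis=0) + 1e-6  # small floor avoids zero variance

def log_likelihood(x, mean, var):
    """Log density of each row of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def llr_scores(x, known_model, background_model):
    """Log-likelihood ratio: known-class model versus background model."""
    return log_likelihood(x, *known_model) - log_likelihood(x, *background_model)

rng = np.random.default_rng(0)

# Synthetic stand-ins for flow feature vectors (assumption: 8 features).
known_train = rng.normal(0.0, 1.0, size=(2000, 8))       # known-class traffic
background_train = rng.normal(0.0, 4.0, size=(2000, 8))  # broad background pool

known_model = fit_gaussian(known_train)
background_model = fit_gaussian(background_train)

# Assumed rule: flag samples scoring below the 5th percentile of known data.
threshold = np.percentile(
    llr_scores(known_train, known_model, background_model), 5
)

def is_out_of_distribution(x):
    """True where the likelihood ratio falls below the threshold."""
    return llr_scores(x, known_model, background_model) < threshold

known_test = rng.normal(0.0, 1.0, size=(500, 8))    # same distribution as training
unknown_test = rng.normal(5.0, 1.0, size=(500, 8))  # shifted, unseen class

print("flagged among known test samples:  ", is_out_of_distribution(known_test).mean())
print("flagged among unknown test samples:", is_out_of_distribution(unknown_test).mean())
```

Taking a ratio rather than using the known-class likelihood alone discounts samples that are likely under any broad model, so the score reflects how specifically a sample matches the known classes.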
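Basic information:
DOI:
CLC number: TP393.06
Citation:
[1] ZHUO Zihan, LÜ Xinrun, LIU Likun, et al. A Method for Detecting Out-of-Distribution Network Traffic Data Based on Computed Likelihood Ratios [J]. Radio Engineering (无线电工程), 2022, 52(08): 1322-1329.
Funding:
National Natural Science Foundation of China, General Program (61872111)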