2022, No. 01, Vol. 52: 62-69
Natural Scene Text Detection Based on Attention-Mechanism Feature Fusion and Enhancement
Foundation: National Natural Science Foundation of China (61876093); Natural Science Foundation of Jiangsu Province (BK20181393)
Abstract:

To address the challenges that natural scene text detection faces from randomly distributed text instances of diverse shapes and scales, a text detection algorithm based on attention-mechanism feature fusion and enhancement is designed. Exploiting the strength of attention mechanisms in extracting effective features, an Attention-based Feature Fusion Module (AFFM) is designed and introduced at the decoding and fusion stage of the model; it uses spatial and channel attention to enrich high-level features with finer detail and low-level features with global context, further improving detection accuracy. A Joint Attention Feature Enhancement Module (JAM) is also designed: convolutions model the relationships among the channels and spatial positions of the concatenated features, and a joint feature weight mask is generated to re-weight those features, strengthening the representation and effectively reducing false and missed detections. The model is evaluated on the Total-Text and ICDAR2015 datasets, where it achieves F1 scores of 85.1% and 87.6% respectively, outperforming current mainstream algorithms.
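The abstract's two modules can be sketched in plain NumPy to make the data flow concrete. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the mean-pooling choices, the sigmoid gates, and the random 1×1 channel-mixing weights standing in for learned convolutions are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Global average pooling per channel, then a sigmoid
    # gate re-weights each channel (squeeze-and-excitation style stand-in).
    weights = sigmoid(feat.mean(axis=(1, 2)))        # (C,)
    return feat * weights[:, None, None]

def spatial_attention(feat):
    # Collapse channels to one map and gate every spatial position.
    mask = sigmoid(feat.mean(axis=0))                # (H, W)
    return feat * mask[None, :, :]

def affm_fuse(low, high):
    # AFFM-style fusion sketch: spatial attention sharpens the detail-rich
    # low-level branch, channel attention re-weights the semantic
    # high-level branch, and the two are summed.
    return spatial_attention(low) + channel_attention(high)

def jam_enhance(cat, seed=0):
    # JAM-style enhancement sketch: mix the concatenated channels with a
    # 1x1-conv-like linear map (random weights here, learned in practice),
    # turn the result into a joint weight mask, and re-weight the input.
    c = cat.shape[0]
    w = np.random.default_rng(seed).standard_normal((c, c)) / np.sqrt(c)
    mask = sigmoid(np.einsum("oc,chw->ohw", w, cat))  # joint mask (C, H, W)
    return cat * mask
```

Because both gates are sigmoids, each module only attenuates or passes through activations; the enhancement comes from *where* the mask concentrates its weight, which in the paper is learned end-to-end.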


Basic Information:

CLC Number: TP391.41

Citation:

[1] 陈静娴, 周全. Natural Scene Text Detection Based on Attention-Mechanism Feature Fusion and Enhancement [J]. Radio Engineering (无线电工程), 2022, 52(01): 62-69.

