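This is an image dataset of photos of 200 bird species (mostly North American) that can be used for image recognition work. Number of categories: 200; number of images: 11,788; annotations per image: 15 part locations, 312 binary attributes, and 1 bounding box.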
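To make the per-image annotation layout concrete, here is a minimal sketch of a record type that holds one image's labels. The class name `BirdAnnotation`, the field names, the 1-based class index, the `(x, y, width, height)` box order, and the use of `None` for a part that is not visible are all my own assumptions for illustration; they are not the dataset's official file format, and the example values are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Counts taken from the dataset description above.
NUM_CLASSES = 200
NUM_PARTS = 15
NUM_ATTRIBUTES = 312


@dataclass
class BirdAnnotation:
    """Hypothetical container for the annotations of a single image."""

    class_id: int                                # assumed to be 1-based: 1..200
    bbox: Tuple[float, float, float, float]      # assumed order: x, y, width, height
    parts: List[Optional[Tuple[float, float]]]   # one (x, y) per part, None if not visible
    attributes: List[bool]                       # one flag per binary attribute

    def __post_init__(self) -> None:
        # Reject records that do not match the counts stated in the description.
        if not 1 <= self.class_id <= NUM_CLASSES:
            raise ValueError(f"class_id must be in 1..{NUM_CLASSES}")
        if len(self.parts) != NUM_PARTS:
            raise ValueError(f"expected {NUM_PARTS} part locations")
        if len(self.attributes) != NUM_ATTRIBUTES:
            raise ValueError(f"expected {NUM_ATTRIBUTES} binary attributes")
        if self.bbox[2] <= 0 or self.bbox[3] <= 0:
            raise ValueError("bounding box width and height must be positive")


# Made-up example record, only to show the shape of the data.
example = BirdAnnotation(
    class_id=1,
    bbox=(60.0, 27.0, 325.0, 304.0),
    parts=[None] * NUM_PARTS,
    attributes=[False] * NUM_ATTRIBUTES,
)
```

Validating the counts up front is a design choice: a mismatched annotation fails loudly when loaded instead of silently corrupting a training batch later.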
We present a simple and effective architecture for fine-grained visual recognition called Bilinear Convolutional Neural Networks (B-CNNs). These networks represent an image as a pooled outer product of features derived from two CNNs and capture localized feature interactions in a translationally invariant manner. B-CNNs belong to the class of orderless texture representations but unlike prior work they can be trained in an end-to-end manner. Our most accurate model obtains 84.1%, 79.4%, 86.9% and 91.3% per-image accuracy on the Caltech-UCSD birds, NABirds, FGVC aircraft, and Stanford cars datasets respectively and runs at 30 frames-per-second on an NVIDIA Titan X GPU. We then present a systematic analysis of these networks and show that (1) the bilinear features are highly redundant and can be reduced by an order of magnitude in size without significant loss in accuracy, (2) are also effective for other image classification tasks such as texture and scene recognition, and (3) can be trained from scratch on the ImageNet dataset offering consistent improvements over the baseline architecture. Finally, we present visualizations of these models on various datasets using top activations of neural units and gradient-based inversion techniques.
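The "pooled outer product of features derived from two CNNs" in the abstract can be illustrated with a small pure-Python sketch. It assumes each CNN stream has already produced one feature vector per image location; the function names `outer` and `bilinear_pool` and the toy numbers are mine, sum pooling is an assumed choice of pooling, and this is only a sketch of the pooling step, not the authors' implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def outer(a: Vector, b: Vector) -> List[List[float]]:
    """Outer product of two feature vectors: a len(a) x len(b) matrix."""
    return [[x * y for y in b] for x in a]


def bilinear_pool(feats_a: Sequence[Vector], feats_b: Sequence[Vector]) -> List[float]:
    """Sum the per-location outer products, then flatten into one descriptor."""
    if not feats_a or len(feats_a) != len(feats_b):
        raise ValueError("both streams must cover the same, non-empty set of locations")
    dim_a, dim_b = len(feats_a[0]), len(feats_b[0])
    pooled = [[0.0] * dim_b for _ in range(dim_a)]
    for fa, fb in zip(feats_a, feats_b):
        for i, row in enumerate(outer(fa, fb)):
            for j, value in enumerate(row):
                pooled[i][j] += value
    # Flatten row by row: the descriptor has dim_a * dim_b entries.
    return [value for row in pooled for value in row]


# Toy input: 3 image locations; stream A has 2 channels, stream B has 3.
stream_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
stream_b = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 0.0, 1.0]]

descriptor = bilinear_pool(stream_a, stream_b)

# Visiting the locations in a different order gives the same descriptor,
# which is what "orderless" means here.
shuffled = bilinear_pool(stream_a[::-1], stream_b[::-1])
```

Because the sum runs over all locations, the descriptor does not depend on where a feature pair occurs in the image, only on which pairs occur together. Its length is the product of the two channel counts, which is consistent with the abstract's remark that bilinear features are large and highly redundant.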
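You can download the dataset from the official website; I have also shared a copy on Baidu Netdisk. Follow my WeChat official account and reply "2020081901" to get the download link.
Whenever I have time, I try to write articles to share and discuss with everyone.
My WeChat official account:
CSDN blog: https://blog.csdn.net/ispeasant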