Abstract
Logo detection has been gaining considerable attention because of its wide range of applications in the multimedia field, such as copyright infringement detection, brand visibility monitoring, and product brand management on social media. In this article, we introduce LogoDet-3K, the largest logo detection dataset with full annotation, which has 3,000 logo categories, about 200,000 manually annotated logo objects, and 158,652 images. LogoDet-3K creates a more challenging benchmark for logo detection, owing to its more comprehensive coverage and wider variety in both logo categories and annotated objects compared with existing datasets. We describe the collection and annotation process of our dataset and analyze its scale and diversity in comparison to other datasets for logo detection. We further propose a strong baseline method, Logo-Yolo, which incorporates Focal loss and CIoU loss into the basic YOLOv3 framework for large-scale logo detection. On LogoDet-3K, it improves average performance by about 4% compared with YOLOv3, and by greater margins compared with several reported deep detection models. We perform extensive evaluations on three other existing datasets to further verify our approach, and we demonstrate the better generalization ability of LogoDet-3K on both logo detection and retrieval tasks. The LogoDet-3K dataset is intended to promote large-scale logo-related research. The code and LogoDet-3K can be found at https://github.com/Wangjing1551/LogoDet-3K-Dataset.
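For readers unfamiliar with the two loss terms named above, the following is a minimal sketch of the standard binary focal loss and the Complete-IoU (CIoU) loss in plain Python. It is not taken from the Logo-Yolo code: the function names `focal_loss` and `ciou_loss` are hypothetical, the defaults `alpha=0.25` and `gamma=2.0` are the commonly used values from the focal loss literature rather than settings confirmed for Logo-Yolo, and boxes are assumed to be `(x1, y1, x2, y2)` corner coordinates.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one predicted probability p and a label y in {0, 1}.

    alpha and gamma are assumed defaults, not values confirmed for Logo-Yolo.
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights examples that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def ciou_loss(box, gt, eps=1e-7):
    """CIoU loss between a predicted box and a ground-truth box.

    Both boxes are (x1, y1, x2, y2) tuples; this layout is an assumption.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU from the intersection rectangle.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between box centres, normalised by the squared
    # diagonal of the smallest box enclosing both.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    enclose_w = max(x2, gx2) - min(x1, gx1)
    enclose_h = max(y2, gy2) - min(y1, gy1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    trade_off = v / (1.0 - iou + v + eps)

    return 1.0 - iou + rho2 / c2 + trade_off * v
```

In this sketch, `ciou_loss` is close to zero for a box that matches its ground truth and exceeds 1 for boxes that do not overlap, because the centre-distance term still provides a signal when the IoU is zero. `focal_loss` shrinks the contribution of well-classified examples, which is the usual motivation for applying it when easy background regions dominate.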
LogoDet-3K: A Large-scale Image Dataset for Logo Detection