A Self-Supervised Approach for Enhanced Feature Representations in Object Detection Tasks

Authors:

  1. Vilabella, Santiago C. (2)
  2. Pérez-Núñez, Pablo (1)
  3. Remeseiro, Beatriz (1)

Affiliations:

  1. Universidad de Oviedo, Oviedo, Spain. ROR: https://ror.org/006gksa02
  2. Universidad Internacional Menéndez Pelayo, Madrid, Spain. ROR: https://ror.org/02zngfv65

Conference proceedings:
IJCNN 2024 Conference Proceedings

ISSN: 2161-4407, 2161-4393

ISBN: 979-8-3503-5931-2, 979-8-3503-5932-9

Year of publication: 2024

Volume: 2

Pages: 1-8

Conference: 2024 International Joint Conference on Neural Networks (IJCNN)

Type: Conference paper

DOI: 10.1109/IJCNN60899.2024.10651388
Scopus: 2-s2.0-85204973206

Abstract

In the fast-evolving field of artificial intelligence, where models are continually growing in complexity and size, the availability of labeled data for training deep learning models has become a significant challenge. Addressing complex problems like object detection demands considerable time and resources for data labeling to achieve meaningful results. For companies developing such applications, this entails extensive investment in highly skilled personnel or costly outsourcing. This research work aims to demonstrate that enhancing feature extractors can substantially alleviate this challenge, enabling models to learn more effective representations with less labeled data. Utilizing a self-supervised learning strategy, we present a model trained on unlabeled data that outperforms state-of-the-art feature extractors pre-trained on ImageNet and particularly designed for object detection tasks. Moreover, the results demonstrate that our approach encourages the model to focus on the most relevant aspects of an object, thus achieving better feature representations and, therefore, reinforcing its reliability and robustness.
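This record contains no implementation details, but the self-supervised strategy described in the abstract is commonly realized with a contrastive objective over augmented views of unlabeled images. The sketch below is a minimal PyTorch implementation of the widely used NT-Xent contrastive loss, offered purely as an illustration of this family of methods and not as the authors' actual training objective; the function name, temperature value, and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent contrastive loss for two batches of projected embeddings.

    z1[i] and z2[i] are assumed to be embeddings of two augmentations of
    the same unlabeled image; both tensors have shape (N, D).
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-norm rows
    sim = (z @ z.t()) / temperature                      # pairwise cosine similarities
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))      # a view is never its own positive
    # The positive for row i is its counterpart from the other view (i+N or i-N);
    # all remaining rows act as negatives in a cross-entropy over similarities.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

In a pretraining loop of this kind, z1 and z2 would come from a projection head applied to backbone features for two random augmentations of each unlabeled image; after pretraining, the backbone would be reused as the detector's feature extractor and fine-tuned on the smaller labeled set.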

Funding information

Grant PID2019-109238GB-C21 funded by MICIU/AEI/10.13039/501100011033.
