QuantificationLib: A Python library for quantification and prevalence estimation

  1. Castaño, Alberto 1
  2. Alonso, Jaime 1
  3. González, Pablo 1
  4. Pérez, Pablo 1
  5. del Coz, Juan José 1

  1. Universidad de Oviedo, Oviedo, Spain (ROR: https://ror.org/006gksa02)

Journal: SoftwareX

ISSN: 2352-7110

Year of publication: 2024

Volume: 26

Pages: 101728

Type: Article

DOI: 10.1016/j.softx.2024.101728

Scopus: 2-s2.0-85189752939

Open access (publisher version)

Abstract

QuantificationLib is an open-source Python library that provides a comprehensive set of algorithms for quantification learning. Quantification, also known as prevalence estimation, is a supervised machine-learning task whose objective is to train a model that predicts the distribution of classes in a set of unseen examples, or bag. The library offers a wide variety of quantification methods and is designed for easy prototyping and experimentation across a broad range of quantification applications.
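The task described in the abstract can be illustrated with a short, self-contained sketch. The code below does not use QuantificationLib's own API; it only reproduces, with scikit-learn and NumPy, two classic baselines from the quantification literature (Classify & Count and Adjusted Count) on a synthetic bag whose class prior differs from the training distribution. Dataset sizes, parameters, and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Balanced training data; the test "bag" is subsampled so that class 1
# becomes much more prevalent than during training (prior-probability shift).
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_bag, y_train, y_bag = train_test_split(X, y, test_size=0.5, random_state=0)
rng = np.random.RandomState(0)
keep = (y_bag == 1) | (rng.rand(len(y_bag)) < 0.3)   # drop ~70% of class-0 examples
X_bag, y_bag = X_bag[keep], y_bag[keep]

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Classify & Count (CC): average of crisp predictions; biased under prior shift.
cc = clf.predict(X_bag).mean()

# Adjusted Count (AC): correct CC using the classifier's true/false positive
# rates (estimated here on the training set; cross-validation is preferable).
train_pred = clf.predict(X_train)
tpr = train_pred[y_train == 1].mean()
fpr = train_pred[y_train == 0].mean()
ac = float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))

print(f"true prevalence of class 1: {y_bag.mean():.3f}")
print(f"Classify & Count estimate:  {cc:.3f}")
print(f"Adjusted Count estimate:    {ac:.3f}")
```

The sketch only shows why prevalence estimation is not solved by counting classifier outputs under distribution shift; for the estimators the library actually provides, consult its documentation.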

Funding information

This work was supported by grant PID2019-110742RB-I00 from the Spanish Ministerio de Economía y Competitividad (MINECO) and grant PID2019-109238GB-C21 from the Spanish Ministry of Science and Innovation.
