QuantificationLib: A Python library for quantification and prevalence estimation

  1. Castaño, Alberto 1
  2. Alonso, Jaime 1
  3. González, Pablo 1
  4. Pérez, Pablo 1
  5. del Coz, Juan José 1

  1. Universidad de Oviedo, Oviedo, Spain (ROR: https://ror.org/006gksa02)

Journal: SoftwareX

ISSN: 2352-7110

Year of publication: 2024

Volume: 26

Pages: 101728

Type: Article

DOI: 10.1016/j.softx.2024.101728

Scopus: 2-s2.0-85189752939

Open access (publisher version)

Abstract

QuantificationLib is an open-source Python library that provides a comprehensive set of algorithms for quantification learning. Quantification, also known as prevalence estimation, is a supervised machine-learning task whose objective is to train a model that predicts the distribution of classes in a set of unseen examples, or bag. The library offers a wide variety of quantification methods and is designed for easy prototyping and experimentation across a broad range of quantification applications.
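The task described in the abstract can be illustrated with a short, self-contained sketch. The code below does not use QuantificationLib's own API; it only reproduces, with scikit-learn and NumPy, two classic baselines from the quantification literature (Classify & Count and Adjusted Count) on a synthetic bag whose class prior differs from the training distribution. Dataset sizes, parameters, and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Balanced training data; the test "bag" is subsampled so that class 1
# becomes much more prevalent than during training (prior-probability shift).
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_bag, y_train, y_bag = train_test_split(X, y, test_size=0.5, random_state=0)
rng = np.random.RandomState(0)
keep = (y_bag == 1) | (rng.rand(len(y_bag)) < 0.3)   # drop ~70% of class-0 examples
X_bag, y_bag = X_bag[keep], y_bag[keep]

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Classify & Count (CC): average of crisp predictions; biased under prior shift.
cc = clf.predict(X_bag).mean()

# Adjusted Count (AC): correct CC using the classifier's true/false positive
# rates (estimated here on the training set; cross-validation is preferable).
train_pred = clf.predict(X_train)
tpr = train_pred[y_train == 1].mean()
fpr = train_pred[y_train == 0].mean()
ac = float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))

print(f"true prevalence of class 1: {y_bag.mean():.3f}")
print(f"Classify & Count estimate:  {cc:.3f}")
print(f"Adjusted Count estimate:    {ac:.3f}")
```

The sketch only shows why prevalence estimation is not solved by counting classifier outputs under distribution shift; for the estimators the library actually provides, consult its documentation.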

Funding information

This work was supported by grant PID2019-110742RB-I00 from the Spanish Ministerio de Economía y Competitividad (MINECO) and grant PID2019-109238GB-C21 from the Spanish Ministry of Science and Innovation.
