QuantificationLib: A Python library for quantification and prevalence estimation
- Castaño, Alberto 1
- Alonso, Jaime 1
- González, Pablo 1
- Pérez, Pablo 1
- del Coz, Juan José 1
1 Universidad de Oviedo
ISSN: 2352-7110
Year of publication: 2024
Volume: 26
Pages: 101728
Type: Article
Journal: SoftwareX
Abstract
QuantificationLib is an open-source Python library that provides a comprehensive set of algorithms for quantification learning. Quantification, also known as prevalence estimation, is a supervised machine-learning task where the objective is to train a model that is able to predict the distribution of classes in a set of unseen examples or bags. This library offers a wide variety of quantification methods suited for easy prototyping and experimentation, applicable to a wide range of quantification applications.
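To make the task concrete, the following is a minimal sketch of two classic baseline quantifiers for a binary problem: Classify and Count (CC) and Adjusted Count (AC), both described by Forman. It uses only scikit-learn and NumPy and does not use or reproduce the QuantificationLib API; the synthetic data, the bag construction, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic binary problem (hypothetical data, for illustration only).
X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
X_train, X_pool, y_train, y_pool = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# Build an unlabeled "bag" whose positive prevalence (0.2) differs from
# the roughly balanced training set.
rng = np.random.default_rng(0)
pos = rng.choice(np.flatnonzero(y_pool == 1), size=100, replace=False)
neg = rng.choice(np.flatnonzero(y_pool == 0), size=400, replace=False)
bag = np.concatenate([pos, neg])
X_bag = X_pool[bag]
true_prev = 100 / 500

clf = LogisticRegression(max_iter=1000)

# Estimate the classifier's true and false positive rates on the
# training data with cross-validated predictions.
cv_pred = cross_val_predict(clf, X_train, y_train, cv=5)
tpr = np.mean(cv_pred[y_train == 1] == 1)
fpr = np.mean(cv_pred[y_train == 0] == 1)

clf.fit(X_train, y_train)

# Classify and Count: fraction of the bag predicted as positive.
cc = float(np.mean(clf.predict(X_bag) == 1))

# Adjusted Count: correct CC using tpr and fpr, then clip to [0, 1].
# Fall back to CC if the classifier is no better than chance.
if tpr > fpr:
    ac = float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))
else:
    ac = cc

print(f"true prevalence: {true_prev:.3f}")
print(f"CC estimate:     {cc:.3f}")
print(f"AC estimate:     {ac:.3f}")
```

The point of the sketch is that the output of a quantifier is a single prevalence estimate per bag rather than one label per example, and that a correction such as AC is meant to compensate for classifier errors when the class distribution of the bag differs from that of the training data.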
Funding information
This work was supported by grant PID2019-110742RB-I00 from the Spanish Ministerio de Economía y Competitividad (MINECO) and grant PID2019-109238GB-C21 from the Spanish Ministry of Science and Innovation.
Funders
- Ministerio de Economía y Competitividad, Spain
  - PID2019-110742RB-I00
- Ministry of Science and Innovation, Spain
  - PID2019-109238GB-C21
References
- González P, Díez J, Chawla N, del Coz JJ. Why is quantification an interesting learning problem? Progr Artif Intell 2017;6:53–8.
- González P, Castaño A, Chawla NV, del Coz JJ. A review on quantification learning. ACM Comput Surv 2017;50(5):1–40.
- Esuli A, Fabris A, Moreo A, Sebastiani F. Learning to quantify. In: The information retrieval series, vol. 47, Springer; 2023, http://dx.doi.org/10.1007/978-3-031-20467-8.
- Moreo A, Esuli A, Sebastiani F. QuaPy: A Python-based framework for quantification. In: Proceedings of the 30th ACM international conference on information & knowledge management. 2021, p. 4534–43.
- Bunse M. qunfold: Composable quantification and unfolding methods in Python. In: Proceedings of the 3rd international workshop on learning to quantify (LQ 2023), co-located at ECML-PKDD. 2023, p. 1–7.
- Schumacher T, Strohmaier M, Lemmerich F. A comparative evaluation of quantification methods. 2021, arXiv preprint arXiv:2103.03223.
- Firat A. Unified framework for quantification. 2016, arXiv preprint arXiv:1606.00868.
- Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res 2011;12:2825–30.
- Friedman J. Class counts in future unlabeled samples (detecting and dealing with concept drift). 2014, Presentation at MIT CSAIL Big Data Event.
- Maletzke A, dos Reis D, Cherman E, Batista G. DyS: a framework for mixture models in quantification. In: Proceedings of the AAAI conference on artificial intelligence. Vol. 33, 2019, p. 4552–60.
- Castaño A, Alonso J, González P, del Coz JJ. An equivalence analysis of binary quantification methods. In: Proceedings of the AAAI conference on artificial intelligence. Vol. 37(6), 2023, p. 6944–52.
- Forman G. Quantifying counts and costs via classification. Data Min Knowl Discov 2008;17:164–206.
- Lipton Z, Wang Y-X, Smola A. Detecting and correcting for label shift with black box predictors. In: International conference on machine learning. PMLR; 2018, p. 3122–30.
- Bella A, Ferri C, Hernández-Orallo J, Ramirez-Quintana MJ. Quantification via probability estimators. In: 2010 IEEE international conference on data mining. IEEE; 2010, p. 737–42.
- González-Castro V, Alaiz-Rodríguez R, Alegre E. Class distribution estimation based on the Hellinger distance. Inform Sci 2013;218:146–64.
- Forman G. Counting positives accurately despite inaccurate classification. In: Gama J, Camacho R, Brazdil PB, Jorge AM, Torgo L, editors. Machine learning: ECML 2005. Berlin, Heidelberg: Springer; 2005, p. 564–75.
- Saerens M, Latinne P, Decaestecker C. Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure. Neural Comput 2002;14(1):21–41.
- Alexandari A, Kundaje A, Shrikumar A. Maximum likelihood with bias-corrected calibration is hard-to-beat at label shift adaptation. In: International conference on machine learning. PMLR; 2020, p. 222–32.
- Kawakubo H, Du Plessis MC, Sugiyama M. Computationally efficient class-prior estimation under class balance change using energy distance. IEICE Trans Inf Syst 2016;99(1):176–86.
- Castaño A, Morán-Fernández L, Alonso J, Bolón-Canedo V, Alonso-Betanzos A, del Coz J. A theoretical analysis of quantification methods based on matching distributions. University of Oviedo; 2021, https://github.com/bertocast/adjust_dist_xy.
- Barranquero J, González P, Díez J, Del Coz JJ. On the study of nearest neighbor algorithms for prevalence estimation in binary problems. Pattern Recognit 2013;46(2):472–82.
- Castaño A, González P, González JA, Del Coz JJ. Matching distributions algorithms based on the earth mover’s distance for ordinal quantification. IEEE Trans Neural Netw Learn Syst 2022.
- Da San Martino G, Gao W, Sebastiani F. Ordinal text quantification. In: Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval. 2016, p. 937–40.
- Frank E, Hall M. A simple approach to ordinal classification. In: Machine learning: ECML 2001: 12th European conference on machine learning Freiburg, Germany, September 5–7, 2001 proceedings. Vol. 12. Springer; 2001, p. 145–56.
- Pérez-Gállego P, Quevedo JR, del Coz JJ. Using ensembles for problems with characterizable changes in data distribution: A case study on quantification. Inf Fusion 2017;34:87–100.
- Pérez-Gállego P, Castaño A, Quevedo JR, del Coz JJ. Dynamic ensemble selection for quantification tasks. Inf Fusion 2019;45:1–15.
- Sebastiani F. Evaluation measures for quantification: An axiomatic approach. Inf Retrieval J 2020;23(3):255–88.
- González P, Castaño A, Peacock EE, Díez J, Del Coz JJ, Sosik HM. Automatic plankton quantification using deep features. J Plankton Res 2019;41(4):449–63.
- Esuli A, Moreo Fernández A, Sebastiani F. A recurrent neural network for sentiment quantification. In: Proceedings of the 27th ACM international conference on information and knowledge management. 2018, p. 1775–8.
- Baccianella S, Esuli A, Sebastiani F. Variable-constraint classification and quantification of radiology reports under the ACR index. Expert Syst Appl 2013;40(9):3441–9.