Assessing the quality of tests: Revision of the EFPA review model

  1. Evers, Arne
  2. Muñiz Fernández, José
  3. Hagemeister, Markus
  4. Hagemeister, Carmen
  5. Høstmælingen, Andreas
  6. Lindley, Patricia
  7. Sjöberg, Anders
  8. Bartram, Dave
Journal:
Psicothema

ISSN: 0214-9915

Year of publication: 2013

Volume: 25

Issue: 3

Pages: 283-291

Type: Article

Abstract

Background: The aim of this paper is to present a revision of the European Federation of Psychologists' Associations (EFPA) model for the evaluation of tests. The model seeks to give users verified information about the theoretical, practical, and psychometric characteristics of tests, thereby facilitating better test use. Method: To revise the test review model, a task force of six experts from different countries was formed within EFPA; they updated the previous European model, adapting it to new developments in psychological and educational assessment. Results: The updated version of the EFPA model allows tests to be evaluated comprehensively. The first part describes the test exhaustively, and the second provides a quantitative and qualitative evaluation of the test's psychometric characteristics. Conclusions: The revision of the European model for the description and evaluation of tests is presented, and the results are discussed in light of recent developments in psychological and educational assessment.

References

  • Abad, F.J., Olea, J., Ponsoda, V., & García, C. (2011). Medición en Ciencias Sociales y de la Salud [Measurement in social and health sciences]. Madrid: Síntesis.
  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
  • Bartram, D. (1996). Test qualifications and test use in the UK: The competence approach. European Journal of Psychological Assessment, 12, 62-71.
  • Bartram, D. (2002). Review model for the description and evaluation of psychological tests. Brussels: European Federation of Psychologists' Associations (EFPA).
  • Bartram, D. (2011). Contributions of the EFPA Standing Committee on Tests and Testing (SCTT) to standards and good practice. European Psychologist, 16, 149-159.
  • Bartram, D., & Coyne, I. (1998). Variations in national patterns of testing and test use: The ITC/EFPPA international survey. European Journal of Psychological Assessment, 14, 249-260.
  • Bartram, D., & Hambleton, R.K. (Eds.) (2006). Computer-based testing and the Internet. Chichester, UK: Wiley and Sons.
  • Bartram, D., Lindley, P. A., & Foster, J. M. (1990). A review of psychometric tests for assessment in vocational training. Sheffield, UK: The Training Agency.
  • Bechger, T., Hemker, B., & Maris, G. (2009). Over het gebruik van continue normering [On the use of continuous norming]. Arnhem, The Netherlands: Cito.
  • Bennett, R.E. (1999). Using new technology to improve assessment. Educational Measurement: Issues and Practice, 18(3), 5-12.
  • Bennett, R.E. (2006). Inexorable and inevitable: The continuing story of technology and assessment. In D. Bartram & R.K. Hambleton (Eds.), Computer-based testing and the Internet (pp. 201-217). Chichester, UK: Wiley and Sons.
  • Borsboom, D., Mellenbergh, G.J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061-1071.
  • Breithaupt, K.J., Mills, C.N., & Melican, G.J. (2006). Facing the opportunities of the future. In D. Bartram & R.K. Hambleton (Eds.), Computer-based testing and the Internet (pp. 219-251). Chichester, UK: Wiley and Sons.
  • Brennan, R.L. (Ed.) (2006). Educational measurement. Westport, CT: ACE/Praeger.
  • Byrne, B.M., Leong, F.T., Hambleton, R.K., Oakland, T., van de Vijver, F.J., & Cheung, F.M. (2009). A critical analysis of cross-cultural research and testing practices: Implications for improved education and training in psychology. Training and Education in Professional Psychology, 3(2), 94-105.
  • Calvo, N., Gutiérrez, F., Andión, O., Caseras, X., Torrubia, R., & Casas, M. (2012). Psychometric properties of the Spanish version of the self-report personality diagnostic questionnaire-4+ (PDQ-4+) in psychiatric outpatients. Psicothema, 24, 156-160.
  • De Ayala, R.J. (2009). The theory and practice of item response theory. New York: Guilford Press.
  • Downing, S.M., & Haladyna, T.M. (Eds.) (2006). Handbook of test development. Hillsdale, NJ: Erlbaum.
  • Drasgow, F., Luecht, R.M., & Bennett, R.E. (2006). Technology and testing. In R.L. Brennan (Ed.), Educational measurement (pp. 471-515). Westport, CT: ACE/Praeger.
  • Drenth, P.J.D., & Sijtsma, K. (2006). Testtheorie. Inleiding in de theorie van de psychologische test en zijn toepassingen (4e herziene druk) [Test theory. Introduction to the theory and application of psychological tests (4th revised ed.)]. Houten, The Netherlands: Bohn Stafleu van Loghum.
  • Elosua, P., Hambleton, R., & Muñiz, J. (2013). Teoría de la respuesta al ítem aplicada con R [Item response theory applied with R]. Madrid: La Muralla.
  • European Federation of Professional Psychologists' Associations (2005). Meta-code of ethics. Brussels: Author (www.efpa.eu).
  • Evers, A. (2001a). Improving test quality in the Netherlands: Results of 18 years of test ratings. International Journal of Testing, 1, 137-153.
  • Evers, A. (2001b). The revised Dutch rating system for test quality. International Journal of Testing, 1, 155-182.
  • Evers, A. (2012). The Internationalization of Test Reviewing: Trends, differences and results. International Journal of Testing, 12, 136-156.
  • Evers, A., Lucassen, W., Meijer, R., & Sijtsma, K. (2010). COTAN Beoordelingssysteem voor de Kwaliteit van Tests (geheel herziene versie; gewijzigde herdruk) [COTAN Rating system for test quality (completely revised edition; revised reprint)]. Amsterdam: NIP.
  • Evers, A., Muñiz, J., Bartram, D., Boben, D., Egeland, J., Fernández-Hermida, J.R., et al. (2012). Testing practices in the 21st century: Developments and European psychologists' opinions. European Psychologist, 17, 300-319.
  • Evers, A., Sijtsma, K., Lucassen, W., & Meijer, R.R. (2010). The Dutch review process for evaluating the quality of psychological tests: History, procedure and results. International Journal of Testing, 10, 295-317.
  • Fernández-Ballesteros, R., De Bruyn, E., Godoy, A., Hornke, L., Ter Laak, J., Vizcarro, C., et al. (2001). Guidelines for the assessment process (GAP): A proposal for discussion. European Journal of Psychological Assessment, 17, 187-200.
  • Goodman, D.P., & Hambleton, R.K. (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education, 17, 145-220.
  • Hambleton, R.K. (2004). Theory, methods, and practices in testing for the 21st century. Psicothema, 16, 696-701.
  • Hambleton, R.K. (2006, March). Testing practices in the 21st century. Key Note Address, University of Oviedo, Spain.
  • Hambleton, R.K., Merenda, P.F., & Spielberger, C.D. (Eds.) (2005). Adapting educational and psychological tests for cross-cultural assessment. Mahwah, NJ: Erlbaum.
  • Hambleton, R.K., Swaminathan, H., & Rogers, J. (1991). Fundamentals of item response theory. Beverly Hills, CA: Sage.
  • Irvine, S., & Kyllonen, P. (Eds.) (2002). Item generation for test development. Mahwah, NJ: Erlbaum.
  • ISO (2011). Procedures and methods to assess people in work and organizational settings (part 1 and 2). Geneva: Author.
  • Joint Committee on Testing Practices (2002). Ethical principles of psychologists and code of conduct. Washington, DC: Author.
  • Koocher, G., & Keith-Spiegel, P. (2007). Ethics in psychology. New York: Oxford University Press.
  • Leach, M., & Oakland, T. (2007). Ethics standards impacting test development and use: A review of 31 ethics codes impacting practices in 35 countries. International Journal of Testing, 7, 71-88.
  • Leeson, H.V. (2006). The mode effect: A literature review of human and technological issues in computerized testing. International Journal of Testing, 6, 1-24.
  • Lindley, P.A. (2009). Reviewing translated and adapted tests. Leicester, UK: British Psychological Society.
  • Lindley, P.A., Bartram, D., & Kennedy, N. (2008). EFPA Review Model for the description and evaluation of psychological tests: Test review form and notes for reviewers: Version 3.42. Brussels: EFPA Standing Committee on Tests and Testing (September, 2008).
  • Lindley, P.A. (Senior Editor), Cooper, J., Robertson, I., Smith, M., & Waters, S. (Consulting Editors) (2001). Review of personality assessment instruments (Level B) for use in occupational settings. 2nd edition. Leicester, UK: BPS Books.
  • Lindsay, G., Koene, C., Ovreeide, H., & Lang, F. (Eds.) (2008). Ethics for European psychologists. Göttingen, Germany, and Cambridge, MA: Hogrefe.
  • Mills, C.N., Potenza, M.T., Fremer, J.J., & Ward, W.C. (Eds.) (2002). Computer-based testing: Building the foundation for future assessments. Hillsdale, NJ: Erlbaum.
  • Muñiz, J. (1997). Introducción a la teoría de respuesta a los ítems [Introduction to item response theory]. Madrid: Pirámide.
  • Muñiz, J. (2012). Perspectivas actuales y retos futuros de la evaluación psicológica [Current perspectives and future challenges of psychological evaluation]. In C. Zúñiga (Ed.), Psicología, sociedad y equidad [Psychology, society and equity]. Santiago de Chile: Universidad de Chile.
  • Muñiz, J., & Bartram, D. (2007). Improving international tests and testing. European Psychologist, 12, 206-219.
  • Muñiz, J., Bartram, D., Evers, A., Boben, D., Matesic, K., Glabeke, K., et al. (2001). Testing practices in European countries. European Journal of Psychological Assessment, 17, 201-211.
  • Muñiz, J., Elosua, P., & Hambleton, R.K. (2013). Directrices para la traducción y adaptación de los tests: segunda edición [International Test Commission Guidelines for test translation and adaptation: Second edition]. Psicothema, 25, 151-157.
  • Muñiz, J., & Fernández-Hermida, J.R. (2000). La utilización de los tests en España [Test use in Spain]. Papeles del Psicólogo, 76, 41-49.
  • Muñiz, J., Prieto, G., Almeida, L., & Bartram, D. (1999). Test use in Spain, Portugal and Latin American countries. European Journal of Psychological Assessment, 15, 151-157.
  • Nogueira, R., Godoy, A., Romero, P., Gavino, A., & Cobos, M.P. (2012). Propiedades psicométricas de la versión española del Obsessive Belief Questionnaire-Children's Version (OBQ-CV) en una muestra no clínica [Psychometric properties of the Spanish version of the Obsessive Belief Questionnaire-Children's Version in a non-clinical sample]. Psicothema, 24, 674-679.
  • Olea, J., Abad, F., & Barrada, J.R. (2010). Tests informatizados y otros nuevos tipos de tests [Computerized tests and other new types of tests]. Papeles del Psicólogo, 31, 94-107.
  • Ortiz, S., Navarro, C., García, E., Ramis, C., & Manassero, M.A. (2012). Validación de la versión española de la escala de trabajo emocional de Frankfurt [Validation of the Spanish version of the Frankfurt Emotion Work Scales]. Psicothema, 24, 337-342.
  • Parshall, C.G., Spray, J.A., Davey, T., & Kalohn, J. (2001). Practical considerations in computer-based testing. New York: Springer-Verlag.
  • Phelps, R. (Ed.) (2005). Defending standardized testing. London: Erlbaum.
  • Phelps, R. (Ed.) (2008). Correcting fallacies about educational and psychological testing. Washington: American Psychological Association.
  • Prieto, G., & Muñiz, J. (2000). Un modelo para evaluar la calidad de los tests utilizados en España [A model for the evaluation of test quality in Spain]. Papeles del Psicólogo, 77, 65-71.
  • Shermis, M.D., & Burstein, J.C. (Eds.) (2003). Automated essay scoring: A cross-disciplinary perspective. Hillsdale, NJ: Erlbaum.
  • Sireci, S., & Zenisky, A.L. (2006). Innovative item formats in computer-based testing: In pursuit of construct representation. In S.M. Downing & T.M. Haladyna (Eds.), Handbook of test development (pp. 329-348). Hillsdale, NJ: Erlbaum.
  • Van der Linden, W.J., & Glas, C.A.W. (Eds.) (2010). Elements of adaptive testing. London: Springer.
  • Van der Linden, W.J., & Hambleton, R.K. (1997). Handbook of modern item response theory. New York: Springer-Verlag.
  • Williamson, D.M., Xi, X., & Breyer, J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2-13.
  • Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Erlbaum.
  • Zachary, R.A., & Gorsuch, R.L. (1985). Continuous norming: Implications for the WAIS-R. Journal of Clinical Psychology, 41, 86-94.
  • Zenisky, A.L., & Sireci, S.G. (2002). Technological innovations in large-scale assessment. Applied Measurement in Education, 15, 337-362.