Using reversed items in Likert scales: A questionable practice
- Javier Suárez-Alvarez 1
- Ignacio Pedrosa 2
- Luis M. Lozano 4
- Eduardo García-Cueto 3
- Marcelino Cuesta 3
- José Muñiz 3
- 1 Organization for Economic Cooperation and Development
- 2 CTIC Technologic Center
- 3 Universidad de Oviedo
- 4 Universidad de Granada
ISSN: 0214-9915
Year of publication: 2018
Volume: 30
Issue: 2
Pages: 149-158
Type: Article
Other publications in: Psicothema
Abstract
Background: using positively worded items together with reversed items is a common practice intended to avoid response biases. The aim of the present study is to analyze the psychometric implications of using regular and reversed items in the same test. Method: the sample comprised 374 participants aged between 18 and 73 years (M = 33.98; SD = 14.12), 62.60% of whom were women. In a repeated-measures design, participants completed a self-efficacy test under three conditions: all items positively worded, all items negatively worded, and a combination of both. Results: when positive and negative items are used in the same test, its reliability deteriorates and the unidimensionality of the test is compromised by secondary sources of variance. The variance of the scores decreases, and the means differ significantly from those of tests in which all items are worded either positively or negatively. Conclusions: the results of this study present a trade-off between a possible acquiescence bias when all items are positively worded and a potentially different understanding when regular and reversed items are combined in the same test. The specialized literature recommends combining regular and reversed items in order to control response-style bias, but these results caution researchers to do so only after also taking into account the potential effect of linguistic skills and the findings presented in this study.
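The reliability comparison described in the abstract depends on two routine steps: recoding reversed items so that all items point in the same direction, and then estimating internal consistency. The sketch below illustrates both with Cronbach's alpha on a 5-point scale. It is only an illustration under stated assumptions: the response data, the item positions marked as reversed, and the function names `reverse_score` and `cronbach_alpha` are hypothetical and are not taken from the study.

```python
from statistics import variance


def reverse_score(response, low=1, high=5):
    """Recode a reversed Likert item so that high values mean high trait level."""
    return low + high - response


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 4 items (not data from the study).
raw = [
    [5, 4, 1, 2],
    [4, 4, 2, 2],
    [2, 1, 4, 5],
    [3, 3, 3, 3],
    [1, 2, 5, 4],
]
REVERSED = {2, 3}  # assumed positions of the reverse-worded items

# Recode the reversed items before computing any scale statistic.
scored = [
    [reverse_score(x) if j in REVERSED else x for j, x in enumerate(row)]
    for row in raw
]
alpha = cronbach_alpha(scored)
print(f"alpha after recoding: {alpha:.3f}")
```

Skipping the recoding step would leave the reversed items negatively correlated with the rest, so alpha computed on the raw responses would not be interpretable.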
Funding information
The views expressed in the paper represent the views of the individual authors and do not represent an official position of the Organisation for Economic Co-operation and Development. This research was funded by the Spanish Association of Methodology of Behavioral Sciences and Health (AEMCCO), member of the European Association of Methodology (EAM), and by the FPI programme from the Ministry of Economy and Competitiveness of the Government of Spain (PSI2014-56114-P, BES2012-053488, and PSI2017-85724-P).
Funders
- PSI2017-85724-P
References
- Abad, F.J., Olea, J., & Ponsoda, V. (2011). Medición en ciencias sociales y de la salud [Measurement in social sciences and health]. Madrid: Síntesis.
- Alessandri, G., Vecchione, M., Fagnani, C., Bentler, P.M., Barbaranelli, C., Medda, E., …, & Caprara, G.V. (2010). Much more than model fitting? Evidence for the heritability of method effect associated with positively worded items of the Life Orientation Test Revised. Structural Equation Modeling, 17, 642-653. doi:10.1080/10705511.2010.510064
- Baker, F. (2001). The basics of item response theory. University of Maryland: College Park: ERIC Clearinghouse on Assessment and Evaluation.
- Baumgartner, H., & Steenkamp, J.B.E.M. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38, 143-156. doi:10.1509/jmkr.38.2.143.18840
- Böckenholt, U. (2012). Modeling multiple response processes in judgment and choice. Psychological Methods, 17, 665-678.
- Bolt, D.M., Lu, Y., & Kim, J.S. (2014). Measurement and control of response styles using anchoring vignettes: A model-based approach. Psychological Methods, 19(4), 528-541. doi: 10.1037/met0000016
- Bourque, L.B., & Shen, H. (2005). Psychometric characteristics of Spanish and English versions of the Civilian Mississippi scale. Journal of Traumatic Stress, 18(6), 719-728. doi:10.1002/jts.20080
- Brooks, G.P., & Johanson, G.A. (2003). Test analysis program. Applied Psychological Measurement, 27, 305-306.
- Brown, A. (2015). Item response models for forced-choice questionnaires: A common framework. Psychometrika, 81(1), 135-160. doi:10.1007/s11336-014-9434-9
- Brown, A., & Maydeu-Olivares, A. (2012). How IRT can solve problems of ipsative data in forced-choice questionnaire. Psychological Methods, 18(1), 36-52.
- Byrne, B., & van de Vijver, F. J. R. (2017). The maximum likelihood alignment approach to testing for approximate measurement invariance: A paradigmatic cross-cultural application. Psicothema, 29, 539-551.
- Cai, L. (2013). flexMIRT version 2: Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
- Carlson, M., Wilcox, R., Chou, C-P., Chang, M., Yang, F., Blanchard, J., …, & Clark, F. (2011). Psychometric properties of reverse-scored items on the CES-D in a sample of ethnically diverse older adults. Psychological Assessment, 23(2), 558-562. doi:10.1037/a0022484.
- Chiavaroli, N. (2017). Negatively-worded multiple choice questions: An avoidable threat to validity. Practical Assessment, Research and Evaluation, 22(3), 1-14.
- Chiorri, C., Anselmi, P., & Robusto, E. (2009). Reverse items are not opposites of straightforward items. In U. Savardi (Ed.), The Perception and Cognition of Contraries (pp. 295-328). Milano: McGraw-Hill.
- Cronbach, L.J. (1946). Response sets and test validity. Educational and Psychological Measurement, 6, 475-494. doi:10.1177/001316444600600405
- Cronbach, L.J. (1950). Further evidence on response sets and test design. Educational and Psychological Measurement, 10(1), 3-31. doi:10.1177/001316445001000101
- Cumming, G., & Finch, S. (2006). Inference by eye: Confidence intervals and how to read pictures of data. American Psychologist, 60(2), 170-180. doi:10.1037/0003-066X.60.2.170
- Davies, M.F. (2003). Confirmatory bias in the evaluation of personality descriptions: Positive test strategies and output interference. Journal of Personality and Social Psychology, 85, 736-744. doi:10.1037/0022-3514.85.4.736
- De Ayala, R.J. (2009). The theory and practice of item response theory. New York: Guilford Press.
- Dimitrov, D. M. (2010). Testing for factorial invariance in the context of construct validation. Measurement and Evaluation in Counseling and Development, 43(2), 121-149. doi: 10.1177/0748175610373459
- DiStefano, C., & Motl, R.W. (2009). Personality correlates of method effects due to negatively worded items on the Rosenberg self-esteem scale. Personality and Individual Differences, 46, 309-313. doi:10.1016/j.paid.2008.10.020
- Dunbar, M., Ford, G., Hunt, K., & Der, G. (2000). Question wording effects in the assessment of global self-esteem. European Journal of Psychological Assessment, 16(1), 13-19. doi:10.1027//1015-5759.16.1.13
- Ebesutani, C., Drescher, C.F., Reise, S.P., Heiden, L., High, T.L., Damon, J.D., & Young, J. (2012). The Loneliness Questionnaire-Short Version: An evaluation of reverse-worded and non-reverse-worded items via item response theory. Journal of Personality Assessment, 94(4), 427-437. doi:10.1080/00223891.2012.662188
- Elosua, P., & Zumbo, B.D. (2008). Reliability coefficients for ordinal response scales. Psicothema, 20(4), 896-901.
- Essau, C. A., Guzmán, B.O., Anastassiou-Hadjicharalambous, X., Pauli, G., Gilvarry, C., Bray, D., ..., & Ollendick, T.H. (2012). Psychometric properties of the Strength and Difficulties Questionnaire from five European countries. International Journal of Methods in Psychiatric Research, 21(3), 232-245. doi:10.1002/mpr.1364
- Evers, A., Muñiz, J., Hagemeister, C., Høstmælingen, A., Lindley, P., Sjöberg, A., & Bartram, D. (2013). Assessing the quality of tests: Revision of the EFPA review model. Psicothema, 25(3), 283-291. doi:10.7334/psicothema2013.97
- Feldt, L.S. (1969). A test of the Hypothesis that Cronbach’s alpha or Kuder Richardson reliability coefficient twenty. Psychometrika, 30, 357-370.
- Ferrando, P.J., & Lorenzo-Seva, U. (2010). Acquiescence as a source of bias and model and person misfit: A theoretical and empirical analysis. British Journal of Mathematical and Statistical Psychology, 63, 427-448.
- Ferrando, P. J., & Lorenzo-Seva, U. (2017). Program FACTOR at 10: Origins, development and future directions. Psicothema, 29, 236-240.
- Ferrando, P.J., Lorenzo-Seva, U., & Chico, E. (2003). Unrestricted factor analytic procedures for assessing acquiescent responding in balanced, theoretically unidimensional personality scales. Multivariate Behavioral Research, 38, 353-374.
- Fernández-Alonso, R., Suárez-Álvarez, J., & Muñiz, J. (2012). Imputation methods for missing data in educational diagnostic evaluation. Psicothema, 24(1), 167-175.
- Fonseca-Pedrero, E., & Debbané, M. (2017). Schizotypal traits and psychotic-like experiences during adolescence: An update. Psicothema, 29, 5-17.
- García-Cueto, E., Muñiz, J., & Yela, M. (1984). Estructura factorial de la comprensión verbal [Factorial structure of verbal comprehension]. Investigaciones Psicológicas, 2(2), 59-75.
- Haladyna, T.M., Downing, S.M., & Rodríguez, M.C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.
- Haladyna, T.M., & Rodríguez, M.C. (2013). Developing and validating test items. New York, NY: Routledge.
- Horan, P. M., DiStefano, C., & Motl, R.W. (2003). Wording effects in self esteem scales: Methodological artifact or response style? Structural Equation Modeling, 10, 444-455.
- Hughes, D. (2009). The impact of incorrect responses to reverse-coded survey items. Research in the Schools, 16(2), 76-88.
- IBM (2011). IBM SPSS Statistics for Windows, Version 20 [Computer software]. Armonk, NY: IBM Corp.
- Józsa, K., & Morgan, G. A. (2017). Reversed items in Likert scales: Filtering out invalid responders. Journal of Psychological and Educational Research, 25(1), 7-25.
- Kam, C.C.S., & Meyer, J.P. (2015). How careless responding and acquiescence response bias can influence construct dimensionality: The case of job satisfaction. Organizational Research Methods, 18(3), 512-541. doi:10.1177/1094428115571894
- Khorramdel, L., & von Davier, M. (2014). Measuring response styles across the Big Five: A multiscale extension of an approach using multinomial processing trees. Multivariate Behavioral Research, 49(29), 161-177. doi:10.1080/00273171.2013.866536
- Kline, R.B. (2010). Principles and practice of structural equation modeling. New York: Guilford Press.
- Lane, S., Raymond, M. R., & Haladyna, T. M. (2016). Handbook of test development (2nd edition). New York, NY: Routledge.
- Liang, T., Han, K.T., & Hambleton, R.K. (2008). User’s guide for ResidPlots-2: Computer software for IRT graphical residual analyses, Version 2.0 (Center for Educational Assessment Research Report No. 688). Amherst: Center for Educational Assessment, University of Massachusetts.
- Liang, T., Han, K.T., & Hambleton, R.K. (2009). ResidPlots-2: Computer software for IRT graphical residual analyses. Applied Psychological Measurement, 33(5), 411-412.
- Lorenzo-Seva, U., & Ferrando, P.J. (2013). Manual of the program FACTOR v. 9.20. Retrieved from: http://psico.fcep.urv.es/utilitats/factor/documentation/Manual-of-the-Factor-Program-v92.pdf
- Marsh, H.W. (1986). Negative item bias in ratings scales for preadolescent children: A cognitive-developmental phenomenon. Developmental Psychology, 22(1), 37-49. doi:10.1037/0012-1649.22.1.37
- Marsh, H.W. (1996). Positive and negative global self-esteem: A substantively meaningful distinction or artifactors? Journal of Personality and Social Psychology, 70, 810-819. doi:10.1037/0022-3514.70.4.810
- Mestre, J.P. (1988). The role of language comprehension in mathematics and problem solving. In R.R. Cocking & J.P. Mestre (Eds.), Linguistic and cultural influences on learning mathematics (pp. 200-220). Hillsdale, NJ: Lawrence Erlbaum Associates.
- Moreno, R., Martínez, R., & Muñiz, J. (2004). Guidelines for the construction of multiple choice test items. Psicothema, 16(3), 490-497.
- Moreno, R., Martínez, R., & Muñiz, J. (2006). New guidelines for developing multiple-choice items. Methodology, 2(2), 65-72.
- Moreno, R., Martínez, R., & Muñiz, J. (2015). Guidelines based on validity criteria for the development of multiple choice items. Psicothema, 27(4), 388-394. doi:10.7334/psicothema2015.110
- Muñiz, J., Elosua, P., Padilla, J. L., & Hambleton, R. K. (2016). Test adaptation standards for cross-lingual assessment. In C. S. Wells & M. Faulkner-Bond (Eds.), Educational measurement. From foundations to future (pp. 291-304). New York: The Guilford Press.
- Muñiz, J., Sánchez, P., & Yela, M. (1986). Comprensión verbal en monolingües y bilingües [Verbal comprehension on monolingual and bilingual]. Informes de Psicología, 5, 139-153.
- Muñiz, J., Suárez-Álvarez, J., Pedrosa, I., Fonseca-Pedrero, E., & García-Cueto, E. (2014). Enterprising personality profile in youth: Components and assessment. Psicothema, 26(4), 545-553. doi:10.7334/psicothema2014.182
- Muthén, L.K., & Muthén, B.O. (1998-2012). Mplus User’s Guide. Seventh Edition. Los Angeles, CA: Muthén & Muthén.
- Navarro-González, D., Lorenzo-Seva, U., & Vigil-Colet, A. (2016). How response bias affects the factorial structure of personality self-reports. Psicothema, 28, 465-470.
- Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
- Paulhus, D.L. (1991). Measurement and control of response bias. In J. P. Robinson, P.R. Shaver, & L.S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17-59). San Diego, CA: Academic Press.
- Pedrosa, I., Suárez-Álvarez, J., García-Cueto, E., & Muñiz, J. (2016). A computerized adaptive test for enterprising personality assessment in youth. Psicothema, 28, 471-478.
- Podsakoff, P.M., MacKenzie, S.B., Lee, J.Y., & Podsakoff, N.P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879-903. doi:10.1037/0021-9010.88.5.879
- Prieto, G., & Delgado, A.R. (1996). Construcción de los ítems [Item development]. In J. Muñiz (Ed.), Psicometría (pp. 105-135). Madrid: Universitas.
- Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 17.
- Savalei, V., & Falk, C.F. (2014). Recovering substantive factor loadings in the presence of acquiescence bias: A comparison of three approaches. Multivariate Behavioral Research, 49(5), 407-424. doi:10.1080/00273171.2014.931800
- Solís-Salazar, M. (2015). The dilemma of combining positive and negative items in scales. Psicothema, 27(2), 192-199. doi:10.7334/psicothema2014.266
- Suárez-Álvarez, J., Pedrosa, I., García-Cueto, E., & Muñiz, J. (2014). Screening enterprising personality in youth: An empirical model. Spanish Journal of Psychology, 17(E60). doi: 10.1017/sjp.2014.61
- Swain, S. D., Weathers, D., & Niedrich, R.W. (2008). Assessing three sources of misresponse to reversed Likert items. Journal of Marketing Research, 45, 116-131.
- Thurstone, L. (1996). Test de Aptitudes Primarias [Primary Mental Abilities]. Madrid: TEA Ediciones (Orig. 1938).
- Trigo, M. E., & Martínez, R. J. (2016). Generalized ETA square for multiple comparisons on between-groups designs. Psicothema, 28, 340-345.
- van der Linden, W. J., & Hambleton, R. K. (1996). Handbook of Modern Item Response Theory. New York: Springer-Verlag.
- van Sonderen, E., Sanderman, R., & Coyne, J. C. (2013). Ineffectiveness of reverse wording of questionnaire items: Let’s learn from cows in the rain. Plos One, 8(7), e68967. doi:10.1371/journal.pone.0068967
- von Davier, M., Shin, H-J., Khorramdel, L., & Stankov, L. (2017). The effects of vignette scoring on reliability and validity of Self-Reports. Applied Psychological Measurement. Advance online publication. doi: 10.1177/0146621617730389
- Weijters, B., & Baumgartner, H. (2012). Misresponse to reversed and negated Items in surveys: A Review. Journal of Marketing Research, 49, 737-747.
- Weijters, B., Baumgartner, H., & Schillewaert, N. (2013). Reverse item bias: An integrative model. Psychological Methods, 18, 320-334.
- Weijters, B., Cabooter, E., & Schillewaert, N. (2010). The effect of rating scale format on response styles: The number of response categories and response category labels. International Journal of Research in Marketing, 27(3), 236-247. doi:10.1016/j.ijresmar.2010.02.004
- Weijters, B., Geuens, M., & Schillewaert, N. (2009). The proximity effect: The role of inter-item distance on reverse-item bias. International Journal of Research in Marketing, 26(1), 2-12. doi:10.1016/j.ijresmar.2008.09.003
- Weijters, B., Geuens, M., & Schillewaert, N. (2010). The stability of individual response styles. Psychological Methods, 15, 96-110.
- Wilson, M. (2005). Constructing measures: An item response modelling approach. Mahwah, NJ: Erlbaum.
- Woods, C. M. (2006). Careless responding to reverse-worded items: Implications for confirmatory factor analysis. Journal of Psychopathology and Behavioral Assessment, 28(3), 189-194. doi:10.1007/s10862-005-9004-7
- Yela, M. (1987). Estudios sobre inteligencia y lenguaje [Studies on intelligence and language]. Madrid: Pirámide.