SUBTLEX-ESP: Spanish word frequencies based on film subtitles

  1. Cuetos Vega, Fernando 1
  2. González Nosti, María 1
  3. Barbón Gutiérrez, Analía
  4. Brysbaert, Marc
  1. 1 Universidad de Oviedo, Oviedo, Spain

    ROR https://ror.org/006gksa02

Journal:
Psicológica: Revista de metodología y psicología experimental

ISSN: 1576-8597

Year of publication: 2011

Volume: 32

Issue: 2

Pages: 133-143

Type: Article

Abstract

Recent studies have shown that word frequency estimates derived from film and television subtitles predict performance in word recognition experiments better than traditional estimates based on books and newspapers. In this study, we present a subtitle-based word frequency list for Spanish, one of the most widely spoken languages. The subtitle frequencies are based on a corpus of 41 million words taken from contemporary movies and TV series (screened between 1990 and 2009). The frequencies were validated by correlating them with reaction times (RTs) from two megastudies involving 2,764 words each (one using lexical decision, the other word naming). The subtitle frequencies explained 6% more of the variance than the existing written frequencies in lexical decision, and 2% more in word naming.
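As a rough illustration of the kind of validation the abstract describes, the sketch below converts raw token counts to per-million frequencies and computes the share of RT variance explained by log frequency. It is not the authors' procedure: the toy corpus, the RT values, and the helper names `per_million` and `r_squared` are all hypothetical, and the use of log10 frequency with a simple linear fit is an assumption. A comparison such as "6% more of the variance" would correspond to the difference between two such values, one computed with subtitle frequencies and one with written frequencies.

```python
import math
from collections import Counter


def per_million(counts):
    """Convert raw token counts to occurrences per million words."""
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def r_squared(x, y):
    """Share of variance in y explained by a simple linear fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy corpus standing in for a subtitle corpus.
tokens = "la casa es grande y la casa es azul y la puerta es roja".split()
freq = per_million(Counter(tokens))

# Hypothetical mean lexical decision RTs in ms (invented for illustration).
rts = {"la": 480.0, "casa": 530.0, "es": 495.0, "puerta": 610.0, "azul": 600.0}

words = sorted(rts)
log_freq = [math.log10(freq[w]) for w in words]
rt_values = [rts[w] for w in words]

print(f"R^2 for log10 frequency: {r_squared(log_freq, rt_values):.3f}")
```

Running the same `r_squared` call with a second frequency measure for the same words and subtracting the two results gives the "extra variance explained" figure for that comparison.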

References

  • Alameda, J.R. & Cuetos, F. (1995). Diccionario de frecuencias de las unidades lingüísticas del castellano. Oviedo, Servicio de Publicaciones Universidad de Oviedo.
  • Baayen, R. H., Piepenbrock, R., & van Rijn, H. (1995). The CELEX lexical database, Release 2 (CD-ROM). Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
  • Balota, D.A., Cortese, M.J., Sergent-Marshall, S.D., Spieler, D.H. & Yap, M.J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133, 283-316.
  • Brysbaert, M., & New, B. (2009). Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41, 977-990 (see also http://expsy.ugent.be/subtlexus).
  • Burgess, C. & Livesay, K. (1998). The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kucera and Francis. Behavior Research Methods, Instruments & Computers, 30, 272-277.
  • Cai, Q. & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS ONE.
  • Cortese, M.J. & Khanna, M.M. (2007). Age of acquisition predicts naming and lexical decision performance above and beyond 22 other predictor variables: an analysis of 2,342 words. Quarterly Journal of Experimental Psychology, 60, 1072-1082.
  • Davies, R., Barbón, A. & Cuetos, F. (submitted) Reading in transparent orthographies relies flexibly on lexical and sub-lexical knowledge: A mega-study of list composition effects in Spanish.
  • Ferrand, L., New, B., Brysbaert, M., Keuleers, E., Bonin, P., Meot, A., Augustinova, M., & Pallier, C. (2010). The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior Research Methods, 42, 488-496.
  • Forster, K.I. & Forster, J.C. (2003). DMDX: A Windows display program with millisecond accuracy. Behavior Research Methods, Instruments & Computers, 35, 116-124.
  • Gonzalez-Nosti, M., Rodríguez-Ferreiro, J., Barbón, A. & Cuetos, F. (submitted). Lexical decision in Spanish: Data from a mega-study.
  • Keuleers, E., Brysbaert, M. & New, B. (2010). SUBTLEX-NL: A new frequency measure for Dutch words based on film subtitles. Behavior Research Methods, 42, 643-650.
  • Kucera, H. & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
  • New, B., Brysbaert, M., Veronis, J., & Pallier, C. (2007). The use of film subtitles to estimate word frequencies. Applied Psycholinguistics, 28, 661-677.
  • New, B., Ferrand, L., Pallier, C., & Brysbaert, M. (2006). Re-examining word length effects in visual word recognition: New evidence from the English Lexicon Project. Psychonomic Bulletin and Review, 13, 45-52.
  • Sebastian, N., Martí, M.A., Carreiras, M. & Cuetos, F. (2000). LEXESP: Léxico informatizado del español. Barcelona, University of Barcelona Press.
  • Thorndike, E.L. & Lorge, I. (1944). The teacher's word book of 30,000 words. Teachers College, Columbia University.
  • Zevin, J.D. & Seidenberg, M.S. (2002). Age of acquisition effects in reading and other tasks. Journal of Memory and Language, 47, 1-29.