Assessing the usefulness of Google Books' word frequencies for psycholinguistic research on word processing

Cited by: 34
Authors
Brysbaert, Marc [1]
Keuleers, Emmanuel [1]
New, Boris [2, 3]
Affiliations
[1] Univ Ghent, Dept Expt Psychol, B-9000 Ghent, Belgium
[2] CNRS, Lab Psychol Expt, Paris, France
[3] Univ Paris 05, Paris, France
Keywords
word frequency; lexical decision; Google Books ngrams; SUBTLEX
DOI
10.3389/fpsyg.2011.00027
Chinese Library Classification number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
In this Perspective Article we assess the usefulness of Google's new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States alone), the Google American English frequencies explain 11% less of the variance in the lexical decision times from the English Lexicon Project (Balota et al., 2007) than the SUBTLEX-US word frequencies, based on a corpus of 51 million words from film and television subtitles. Further analyses indicate that word frequencies derived from recent books (published after 2000) are better predictors of word processing times than frequencies based on the full corpus, and that word frequencies based on fiction books predict word processing times better than word frequencies based on the full corpus. The most predictive word frequencies from Google still do not explain more of the variance in word recognition times of undergraduate students and old adults than the subtitle-based word frequencies.
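The comparison in the abstract rests on how much variance in response times a frequency measure explains. A minimal sketch of that idea follows, assuming a simple linear regression on log-transformed frequency per million words; this is not necessarily the analysis the authors ran. The function names, the +1 smoothing, and all toy numbers are hypothetical and serve only to illustrate the computation.

```python
import math


def log10_per_million(count, corpus_size):
    """Log10 of a word's frequency per million tokens.

    The +1 keeps the log finite for zero counts; this smoothing is an
    assumption of the sketch, not something stated in the abstract.
    """
    return math.log10((count + 1) / corpus_size * 1_000_000)


def variance_explained(predictor, outcome):
    """R^2 of a simple linear regression of outcome on predictor."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(outcome) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in outcome)
    return (sxy * sxy) / (sxx * syy)


# Invented toy values for illustration only; not data from the paper.
toy_rt_ms = [520.0, 560.0, 610.0, 650.0, 700.0]     # hypothetical mean RTs
toy_counts_a = [90_000, 20_000, 4_000, 900, 100]    # hypothetical corpus A
toy_counts_b = [50_000, 60_000, 1_000, 5_000, 300]  # hypothetical corpus B
toy_corpus_size = 51_000_000                        # same size for both, for simplicity

r2_a = variance_explained(
    [log10_per_million(c, toy_corpus_size) for c in toy_counts_a], toy_rt_ms
)
r2_b = variance_explained(
    [log10_per_million(c, toy_corpus_size) for c in toy_counts_b], toy_rt_ms
)
print(f"Difference in variance explained: {100 * (r2_a - r2_b):.1f} percentage points")
```

With real data, the response times would come from a megastudy such as the English Lexicon Project and the counts from each frequency list under comparison, restricted to the words the two lists share.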
Pages: 8
Related papers
21 entries in total
[1]   Contextual diversity, not word frequency, determines word-naming and lexical decision times [J].
Adelman, James S. ;
Brown, Gordon D. A. ;
Quesada, Jose F. .
PSYCHOLOGICAL SCIENCE, 2006, 17 (09) :814-823
[2]   Morphological influences on the recognition of monosyllabic monomorphemic words [J].
Baayen, R. H. ;
Feldman, L. B. ;
Schreuder, R. .
JOURNAL OF MEMORY AND LANGUAGE, 2006, 55 (02) :290-313
[3]   Visual word recognition of single-syllable words [J].
Balota, DA ;
Cortese, MJ ;
Sergent-Marshall, SD ;
Spieler, DH ;
Yap, MJ .
JOURNAL OF EXPERIMENTAL PSYCHOLOGY-GENERAL, 2004, 133 (02) :283-316
[4]   The English Lexicon Project [J].
Balota, David A. ;
Yap, Melvin J. ;
Cortese, Michael J. ;
Hutchison, Keith A. ;
Kessler, Brett ;
Loftis, Bjorn ;
Neely, James H. ;
Nelson, Douglas L. ;
Simpson, Greg B. ;
Treiman, Rebecca .
BEHAVIOR RESEARCH METHODS, 2007, 39 (03) :445-459
[5]   Do the effects of subjective frequency and age of acquisition survive better word frequency norms? [J].
Brysbaert, Marc ;
Cortese, Michael J. .
QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2011, 64 (03) :545-559
[6]   Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English [J].
Brysbaert, Marc ;
New, Boris .
BEHAVIOR RESEARCH METHODS, 2009, 41 (04) :977-990
[7]   SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles [J].
Cai, Qing ;
Brysbaert, Marc .
PLOS ONE, 2010, 5 (06)
[8]   Cuetos F, 2011, PSICOLOGICA, V32, P133
[9]   Improving accuracy in detecting acoustic onsets [J].
Duyck, Wouter ;
Anseel, Frederik ;
Szmalec, Arnaud ;
Mestdagh, Pascal ;
Tavernier, Antoine ;
Hartsuiker, Robert J. .
JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 2008, 34 (05) :1317-1326
[10]   The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords [J].
Ferrand, Ludovic ;
New, Boris ;
Brysbaert, Marc ;
Keuleers, Emmanuel ;
Bonin, Patrick ;
Meot, Alain ;
Augustinova, Maria ;
Pallier, Christophe .
BEHAVIOR RESEARCH METHODS, 2010, 42 (02) :488-496