Regular article
Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories
Introduction
The launch of Google Scholar (GS) in November of 2004 brought the simplicity of Google searches to the academic environment, and revolutionized the way researchers and the public searched, found, and accessed academic information. Until that point, the coverage of academic databases depended on lists of selected sources (usually scientific journals). In contrast, and using automated methods, Google Scholar crawled the web and indexed any document with a seemingly academic structure. This inclusive approach gave GS potentially more comprehensive coverage of the scientific and scholarly literature compared to the two major existing multidisciplinary databases with selective journal-based inclusion policies, the Web of Science (WoS) and Scopus (Orduna-Malea, Ayllón, Martín-Martín, & Delgado López-Cózar, 2015).
Although citation data in Google Scholar was originally intended to be a means of identifying the most relevant documents for a given query, it could also be used for formal or informal research evaluations. The availability of free citation data in Google Scholar, together with the free software Publish or Perish (Harzing, 2007) to gather it, made citation analysis possible without a citation database subscription (Harzing & van der Wal, 2008). Nevertheless, GS has not enabled bulk access to its data, reportedly because its agreements with publishers preclude it (Van Noorden, 2014). Thus, third-party web-scraping software is currently the only practical way to extract more data from GS than permitted by Publish or Perish.
Despite its known errors and limitations, which are a consequence of its automated approach to document indexing (Delgado López-Cózar, Robinson-García, & Torres-Salinas, 2014; Jacsó, 2010), GS has been shown to be reliable and to have good coverage of disciplines and languages, especially in the Humanities and Social Sciences, where WoS and Scopus are known to be weak (Chavarro, Ràfols, & Tang, 2018; Mongeon & Paul-Hus, 2016; van Leeuwen, Moed, Tijssen, Visser, & Van Raan, 2001). Analyses of the coverage of GS, WoS, and Scopus across disciplines have compared the numbers of publications indexed or their average citation counts for samples of documents, authors, or journals, finding that GS consistently returned higher numbers of publications and citations (Harzing & Alakangas, 2016; Harzing, 2013; Mingers & Lipitakis, 2010; Prins, Costas, van Leeuwen, & Wouters, 2016). Citation counts from a range of different sources have been shown to correlate positively with GS citation counts at various levels of aggregation (Amara & Landry, 2012; De Groote & Raszewski, 2012; Delgado López-Cózar, Orduna-Malea, & Martín-Martín, 2018; Kousha & Thelwall, 2007; Martín-Martín, Orduna-Malea, & Delgado López-Cózar, 2018; Meho & Yang, 2007; Minasny, Hartemink, McBratney, & Jang, 2013; Moed, Bar-Ilan, & Halevi, 2016; Pauly & Stergiou, 2005; Rahimi & Chandrakumar, 2014; Wildgaard, 2015). See the supplementary materials, Delgado López-Cózar et al. (2018), Orduña-Malea, Martín-Martín, Ayllón, and Delgado López-Cózar (2016), and Halevi, Moed, and Bar-Ilan (2017) for discussions of the wider strengths and weaknesses of GS.
A key issue is the ability of GS, WoS, and Scopus to find citations to documents, and the extent to which each indexes citations that the others cannot find. The results of prior studies are difficult to reconcile, however, because, with one exception, they have examined different small sets of articles. A summary of the results found in these previous studies is presented in Table 1. For example, the proportion of citations that are unique to GS varies between 13% and 67%, with the differences probably being due to the study year or the document types or disciplines covered. The only multidisciplinary study (Moed et al., 2016) checked articles in 12 journals from 6 subject areas, which is still a limited set.
The fields previously compared for citation sources (Table 1) are Library and Information Science (5 out of 10 articles analyse case studies about LIS documents/journals/researchers), Medicine (3 papers, analysing oncology, general medicine, and dentistry), Physics (2 articles: general and condensed matter), Chemistry (2 articles: general and inorganic), Computer Science (2 articles: general, and computational linguistics), Biology (2 articles: general, and virology), Social Work, Political Science, and Chinese Studies (1 article each). From this list it is clear that most academic fields have not been analysed for Google Scholar coverage. The studies used small samples of documents and citations (9 out of 10 papers analysed fewer than 10,000 citations), probably because of the difficulty of extracting data from GS, caused by the lack of a public API (Else, 2018; Van Noorden, 2014). Moreover, the most recent data in these studies was collected in 2015 (three years before the current study), and the oldest data is from 2005 (13 years before).
Given the limited nature of all prior studies of citing sources for GS and the need to update all previous research, a comprehensive analysis of citation sources in GS, WoS, and Scopus across all subject areas is needed. This information is important for those deciding whether to use GS citation counts for informal or formal research evaluations. The following research questions drive this investigation.
- How much overlap is there between GS, WoS, and Scopus in the citations that they find to academic documents, and does this vary by subject?
- Do the citing documents that are only found by GS differ in document type from the citing documents that GS shares with the other databases, and does this vary by subject?
- How similar are citation counts in GS to those found in WoS and Scopus, at the level of subjects?
Methods
The sample used for this study is taken from GS’s Classic Papers product (GSCP). The 2017 edition of GSCP lists 2515 highly-cited documents written in English and published in 2006. These documents were classified by GS into 252 subject categories within 8 broad subject areas. Background about GSCP can be found in Orduna-Malea, Martín-Martín, and Delgado López-Cózar (2018) and Martín-Martín et
RQ1: citing source overlap
Overall, 46.9% of all citations were found by the three databases (Fig. 3). GS found the most citations, including most of the citations found by WoS and Scopus. In contrast, only 6% of all citations were found by WoS and/or Scopus, and not by GS. An additional 10.2% of all citations were found by both GS and Scopus (7.7%), or GS and WoS (2.5%). Over a third (36.9%) of all citations were only found by GS.
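The breakdown above amounts to partitioning the union of citations found by the three databases into the disjoint regions of a three-set Venn diagram. The following is a minimal sketch of that calculation, not the authors' code: it assumes that citations from each database have already been matched and reduced to comparable identifiers, and the function name `overlap_shares` and the toy identifiers are hypothetical.

```python
from collections import Counter


def overlap_shares(gs, wos, scopus):
    """Percentage of all distinct citations falling in each Venn region.

    Each argument is an iterable of citation identifiers (hypothetical,
    already matched across databases). Keys of the result are tuples
    naming the databases that found the citation, e.g. ("GS", "Scopus").
    """
    sources = {"GS": set(gs), "WoS": set(wos), "Scopus": set(scopus)}
    union = set().union(*sources.values())
    counts = Counter()
    for citation in union:
        # Region label: which databases contain this citation
        found_in = tuple(name for name, found in sources.items() if citation in found)
        counts[found_in] += 1
    total = len(union)
    return {region: 100 * n / total for region, n in counts.items()}


# Toy data only; these identifiers do not come from the study.
gs = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}
wos = {"c1", "c2", "c3", "c4", "c9"}
scopus = {"c1", "c2", "c3", "c4", "c5", "c10"}
shares = overlap_shares(gs, wos, scopus)

# Consistency check on the percentages reported in the text:
# all three + (WoS and/or Scopus, not GS) + (GS and one other) + GS only
reported = [46.9, 6.0, 7.7 + 2.5, 36.9]
reported_total = round(sum(reported), 1)           # regions should cover 100%
gs_total = round(46.9 + 7.7 + 2.5 + 36.9, 1)       # every region that includes GS
```

Because the regions are disjoint, their shares sum to 100%. Applied to the reported figures, the same logic implies that GS found 94% of all citations in the sample, the complement of the 6% found only by WoS and/or Scopus.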
When citations are disaggregated by the broad subject area in which the cited document was
Limitations
This study analyses a large sample of citations to highly-cited documents from all subject areas published in English. In order to generalize the results to all articles, it must be assumed that the population of documents that cite highly cited articles is not significantly different from the general population of documents that cite articles. This may not be fully true since, for example, highly cited articles are presumably more likely to be in emerging research areas and larger specialisms.
Conclusions
This study provides evidence that GS finds significantly more citations than the WoS Core Collection and Scopus across all subject areas. Nearly all citations found by WoS (95%) and Scopus (92%) were also found by GS, which found a substantial number of unique citations that were not found by the other databases. In the Humanities, Literature & Arts, Social Sciences, and Business, Economics & Management, unique GS citations surpass 50% of all citations in the area.
About half (48%–65%, depending
Author contributions
Alberto Martín-Martín: Conceived and designed the analysis, Collected the data, Contributed data or analysis tools, Performed the analysis, Wrote the paper.
Enrique Orduna-Malea: Conceived and designed the analysis, Wrote the paper.
Mike Thelwall: Conceived and designed the analysis, Wrote the paper.
Emilio Delgado López-Cózar: Conceived and designed the analysis, Wrote the paper.
Acknowledgements
Alberto Martín-Martín is funded by a four-year doctoral fellowship (FPU2013/05863) granted by the Ministerio de Educación, Cultura, y Deportes (Spain). An international mobility grant from Universidad de Granada and CEI BioTic Granada funded a research stay at the University of Wolverhampton.
References (51)
- et al. Coverage of Google Scholar, Scopus, and Web of Science: A case study of the h-index in nursing. Nursing Outlook (2012)
- et al. Suitability of Google Scholar as a source of scientific information and as a source of data for scientific evaluation—review of the literature. Journal of Informetrics (2017)
- Finding Citations to Social Work Literature: The Relative Benefits of Using Web of Science, Scopus, or Google Scholar. The Journal of Academic Librarianship (2012)
- et al. A new methodology for comparing Google Scholar and Scopus. Journal of Informetrics (2016)
- et al. Counting citations in the field of business and management: Why use Google Scholar rather than the Web of Science. Scientometrics (2012)
- et al. Three options for citation tracking: Google Scholar, Scopus and Web of Science. Biomedical Digital Libraries (2006)
- Citations to the “Introduction to informetrics” indexed by WOS, Scopus and Google Scholar. Scientometrics (2010)
- et al. To what extent is inclusion in the Web of Science an indicator of journal ‘quality’? Research Evaluation (2018)
- Web of Science & Google Scholar collaboration (2015)
- Emerging Sources Citation Index Backfile (2017)
- A technique for computer detection and correction of spelling errors. Communications of the ACM
- A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science
- The expansion of Google Scholar versus Web of Science: A longitudinal study. Scientometrics
- Google scholar as a data source for research assessment
- The Google scholar experiment: How to index false papers and manipulate bibliometric indicators. Journal of the Association for Information Science and Technology
- data.table: Extension of “data.frame”
- How I scraped data from Google Scholar. Nature
- Scopus source list (April 2018)
- A longitudinal study of Google Scholar coverage between 2012 and 2013. Scientometrics
- Publish or Perish
- Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison. Scientometrics
- Google Scholar as a new source for citation analysis. Ethics in Science and Environmental Politics
- A citation analysis of Serbian Dental Journal using Web of Science, Scopus and Google Scholar. Stomatoloski Glasnik Srbije
- Metadata mega mess in Google Scholar. Online Information Review
- Google Scholar citations and Google Web/URL citations: A multi-discipline exploratory analysis. Journal of the American Society for Information Science and Technology