PLOS Collections¹ is a section of the Public Library of Science (PLOS) initiative that hosts collections of articles specially selected by PLOS editors. PLOS is one of the leaders in the Open Access (OA) movement in academic publishing.
Seven subject disciplines are represented in PLOS Collections, including biology, medicine, genetics and tropical diseases. Here, however, we would like to highlight two sub-collections of particular importance to readers interested in Open Access in general and in the possibilities it opens up from an information science perspective.
A previous post listed a number of salient documents from the Open Access Collection². In this post we highlight two other specialized areas: Text Mining³ and Altmetrics⁴.
These two specialist areas, bolstered as they are by free access to full text, are the ones that will bring about remarkable improvements in information retrieval systems in the very near future through the implementation of the semantic web, and will provide ways of measuring the importance and relevance of academic output as alternatives to the famous, and much criticized, Impact Factor.
The Text Mining Collection
This PLOS collection is of great interest to specialists in Information Science, especially those whose field of research is advanced retrieval interfaces within the framework of the Semantic Web.
Text Mining is not actually a new field of research: its theoretical concepts have been under investigation since computing itself came into existence. What has changed is that a substantial volume of full-text documents is now available in OA, structured according to open standards, and this has allowed the discipline to move from the theoretical stage of the computer laboratory to the practical stage, with the development of concrete and efficient products.
Wikipedia⁵ provides a simple and straightforward definition of Text Mining: the process of deriving high-quality information from text. This information is obtained through the statistical study of text patterns, from which it is possible to extract concepts and to assess novelty and level of interest. It is also possible to derive significant relationships between different bodies of text and to improve relevance ranking in information retrieval.
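To make the idea of relevance ranking from statistical text patterns more concrete, the following is a minimal sketch using TF-IDF weighting, one common technique among many. It is not taken from Wikipedia's definition or from the PLOS collection; the toy documents, the query and the function names are assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_rank(query, documents):
    """Rank documents against a query by the summed TF-IDF weight of query terms.

    Returns a list of (document index, score) pairs, best match first.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)            # term frequency
            idf = math.log(n_docs / doc_freq[term])    # inverse document frequency
            score += tf * idf
        scores.append((index, score))

    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus, invented for this example.
DOCUMENTS = [
    "Gene expression in malaria parasites",
    "Open access publishing and citation impact",
    "Text mining of gene names in open access articles",
]

ranking = tf_idf_rank("gene mining", DOCUMENTS)
for index, score in ranking:
    print(f"{score:.3f}  {DOCUMENTS[index]}")
```

Terms that are rare across the corpus ("mining") weigh more than terms that appear in many documents ("gene"), which is why the third document is ranked first for this query. Production systems add stemming, stop-word removal and far larger corpora.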
According to the description in the PLOS Text Mining Collection³,⁶, the objective of this field of research is to revolutionize the way data are accessed and interpreted, data which might otherwise have remained buried in the literature, by solving problems relating to the retrieval, extraction and analysis of unstructured information in digital text. For non-specialists, the collection provides two introductory articles under the title “Getting Started in Text Mining”⁷,⁸.
The document entitled “Open Access: Taking Full Advantage of the Content”⁶ outlines just how important it is that publishers prepare the original digital texts using structured XML mark-up, such as that used by the National Library of Medicine (NLM), adapted to a special DTD (Document Type Definition) with suitable extensions for the requirements of the particular discipline. When this is done, it becomes possible to extract semantic meaning from the texts and integrate it into the databases of the literature.
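As an illustration of why structured mark-up matters for extraction, here is a minimal sketch that reads a small article fragment with Python's standard `xml.etree.ElementTree` module. The element names (`article-title`, `abstract`, `sec`, `title`, `p`) follow the NLM/JATS style of tagging, but the sample document, its text and the `extract_structure` function are assumptions invented for this example; a real article carries far more metadata and would normally be validated against its DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified article in NLM/JATS-style mark-up.
SAMPLE_XML = """\
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>A hypothetical article on text mining</article-title>
      </title-group>
      <abstract><p>Structured mark-up makes content machine-readable.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>Full text in XML can be split into labeled parts.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>Each section is addressed by its element name.</p>
      <p>No layout heuristics are required.</p>
    </sec>
  </body>
</article>
"""


def paragraph_text(element):
    """Return the full text of an element, including text inside child tags."""
    return "".join(element.itertext()).strip()


def extract_structure(xml_text):
    """Pull the title, abstract and per-section paragraphs out of the mark-up."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-meta/title-group/article-title")
    abstract = " ".join(
        paragraph_text(p) for p in root.findall("front/article-meta/abstract/p")
    )
    sections = {}
    for sec in root.findall("body/sec"):
        heading = sec.findtext("title")
        sections[heading] = [paragraph_text(p) for p in sec.findall("p")]
    return {"title": title, "abstract": abstract, "sections": sections}


record = extract_structure(SAMPLE_XML)
print(record["title"])
for heading, paragraphs in record["sections"].items():
    print(heading, "->", len(paragraphs), "paragraph(s)")
```

Because every part of the article is labeled, a text mining pipeline can, for instance, analyze only the Methods sections of a whole collection without guessing at page layout.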
Finally, a sample document shows the usefulness of XML mark-up as applied to an actual example: “Biomedical Text Mining and Its Applications”⁹.
It is worth noting that a recent SciELO in Perspective post, “Why XML?”¹⁰, states that XML mark-up is one of the recent technological advances currently being implemented by SciELO. To fulfill its text mining objective, SciELO is combining the XML used in preparing the journals and books that make up its collections with the JATS DTD, which is precisely the one used in NLM’s PMC for text mark-up in accordance with NISO standards.
Readers are also referred to the book SciELO: 15 Years of Open Access (An analytic study of Open Access and scholarly communication), and in particular to Chapter 5, Production of SciELO Collections and Journals.
The Altmetrics Collection
Altmetrics is the study and use of non-traditional measures of academic impact which are based upon activity in the Web environment. As academic activity moves into the online environment, these metrics track these interactions, generating greater data granularity, thereby allowing researchers and policy makers to produce a more detailed picture of the academic impact of research.
The PLOS Altmetrics Collection brings together an emerging body of research in this field to advance the study and use of altmetrics. The objective of the collection is to cover a wide range of topics, including statistical analysis of altmetrics data sources, validation of the metrics and identification of biases in measurement, and validation of scientific discovery and recommendation models based on altmetrics.
To cope with the increasing amount of information, researchers have always used filters to select the most relevant items. Filters traditionally incorporate citation analysis and impact factors but, as has happened with the manual indexing practiced over the past 60 years, the volume of literature produced today requires new approaches that keep pace with the speed of production and the diversity of information. As is well known, citation analysis can be biased, and citations are slow to accumulate and overlook the increasingly important social impact of academic work.
The scientometric community is aware of the inadequacy of citation measures and has recently proposed methods for gathering more extensive information on these impacts, and for providing more detail on the system of scholarly publication. Thanks to the Web, scientometrics has begun to investigate a number of potentially promising filters, drawn from channels such as the following.
Channel | Examples |
Social media | Twitter and Facebook |
Reference managers | CiteULike, Zotero, and Mendeley |
Collaborative encyclopedias | Wikipedia |
Blogs | Blogs by academics and the general public |
Academic social networks | ResearchGate and Academia.edu |
Conference organizing sites | Lanyrd.com |
The PLOS Altmetrics Collection includes articles that address statistical analysis and metric validation for the databases that compile this category of information. It also covers the theoretical foundations of the use of altmetrics, and comparisons with the traditional methods of scientometrics. The article “What Can Article-Level Metrics Do for You?”¹¹ illustrates why these metrics are useful and provides some examples.
Of course, all of us have recently seen and read presentations on altmetrics and the benefits they are supposed to deliver in the short term. Whenever something new comes up, however, the question naturally arises of whether it really works or is just a passing fad. Will altmetrics do everything that is claimed for them? This question is the basis for the article “Do Altmetrics Work? Twitter and Ten Other Social Web Services”¹², recently published in the Altmetrics Collection.
The article states that, although altmetrics are widely promoted as early indicators of the future impact and usefulness of a publication, there is as yet no systematic evidence of a correlation significant enough to justify their use as real alternative indicators. To date, the published literature has consisted of case studies of some areas of research and of a few journals. The article investigated altmetrics indicators for more than 200,000 PubMed articles published in 1,891 journals. Good correlations were found between highly cited articles and significant altmetrics values coming from Twitter, Facebook, and blogs. However, the correlation with Google+ is low, and there is insufficient evidence to conclude that there is any correlation with LinkedIn, Pinterest and Reddit. On the other hand, it was not possible to establish any correlation for articles without altmetrics measures. In general, the best correlations are with Twitter. For the rest, the correlations are generally low, and it is not clear whether they will be sufficiently prevalent to be useful in practice.
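For readers who want to see what testing such a correlation can look like in practice, below is a minimal sketch that computes Spearman's rank correlation between citation counts and tweet counts. This is a generic statistical illustration, not the method used in the article discussed above, and the counts for the six articles are hypothetical values invented for the example.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical counts for six articles, invented for this example.
citations = [2, 15, 7, 40, 0, 9]
tweets = [1, 12, 3, 30, 0, 20]

rho = spearman(citations, tweets)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based measure is a reasonable choice here because citation and social web counts are typically highly skewed, so a handful of very popular articles would otherwise dominate the result.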
Reflections
OA has opened up new areas of research, and we are just beginning to see the results in products which still have to be evaluated and incorporated as tools into academic activity. PLOS Collections is a Web site worth returning to periodically to see how the future of information is developing.
SciELO in Perspective will continue to publish updates throughout the year on the state of the art in information science.
Notes
¹What are the PLOS Collections – http://www.ploscollections.org/;jsessionid=7E86FB385236F1DD3425171788D264AB
²Open Access Collection – http://www.ploscollections.org/article/browse/issue/info%3Adoi%2F10.1371%2Fissue.pcol.v01.i10
³Text Mining Collection – http://www.ploscollections.org/article/browse/issue/info%3Adoi%2F10.1371%2Fissue.pcol.v01.i14
⁴Altmetrics Collection – http://www.ploscollections.org/article/browse/issue/info%3Adoi%2F10.1371%2Fissue.pcol.v02.i19
⁵Wikipedia – text mining – http://en.wikipedia.org/wiki/Text_mining
⁶Open Access: Taking Full Advantage of the Content – http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000037
⁷Getting Started in Text Mining – http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.0040020
⁸Getting Started in Text Mining: Part Two – http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000411
⁹Biomedical Text Mining and Its Applications – http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000597
¹⁰Why XML? SciELO in Perspective. [viewed 15 May 2014]. Available from: http://blog.scielo.org/en/2014/04/04/why-xml
¹¹What Can Article-Level Metrics Do for You? – http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001687
¹²Do Altmetrics Work? Twitter and Ten Other Social Web Services – http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0064841
References
What are the Public Library of Science Collections? – Part I. SciELO in Perspective. [viewed 24 June 2014]. Available from: http://blog.scielo.org/en/2014/06/03/what-are-the-public-library-of-science-collections-part-i/
PACKER, AL. et al., orgs. SciELO: 15 Years of Open Access (An analytic study of Open Access and scholarly communication). Paris: UNESCO, 2014.
Tenth Anniversary PLOS Biology Collection. PLOS Collection. Available from: http://www.ploscollections.org/article/browse/issue/info%3Adoi%2F10.1371%2Fissue.pcol.v06.i03
External link
PLOS – http://www.plos.org/
About Ernesto Spinak
Collaborator on the SciELO program, a Systems Engineer with a Bachelor’s degree in Library Science, and a Diploma of Advanced Studies from the Universitat Oberta de Catalunya (Barcelona, Spain) and a Master’s in “Sociedad de la Información” (Information Society) from the same university. Currently has a consulting company that provides services in information projects to 14 government institutions and universities in Uruguay.
Translated from the original in Spanish by Nicholas Cop Consulting.