Dealing with information overload

By Jan Velterop

Are we overwhelmed by the amount of scientific information that is being published? PubMed adds an average of more than two abstracts a minute to its database, and that is just in the life and medical sciences. If that doesn’t amount to information overload, what does?

In 2010, in an article entitled “On the impossibility of being expert”, Alan Fraser and Frank Dunstan¹ made the point that even in a relatively small field, one would have to devote all one’s working hours to reading the relevant literature, from the beginning of one’s career until retirement, in order to be called an expert. Obviously, this is neither possible nor desirable, as nobody would have the time to actually apply the knowledge they acquired. No research scientists would be able to do their own experiments. No medical specialists would be able to treat patients.

What to do? We must find ways to deal with the ever-growing amount of scientific information, and simply reading it all is out of the question. We must find different ways to capture and ingest the knowledge that science discovers and creates.

Strategies have been proposed to deal with the problem. They range from simply creating less data and information, to just accepting that we can’t read everything, to developing pattern recognition that helps us decide which fraction of the literature most deserves our reading effort if we are to keep up with developing knowledge.

Creating less data and information is a ludicrous idea, although publishing that information more concisely, in fewer papers, may provide some relief. On the other hand, there is great benefit in publishing so-called ‘confirmatory’ papers, which report on the replicability and reproducibility of experiments.

Just accepting that we can’t read everything that’s relevant to our chosen field is a pragmatic approach often taken, but it carries the great danger of what I call ‘lamppost research’. The analogy refers to the joke about a drunk who is looking for his lost keys under the light of a lamppost. Asked if he remembers where he lost them, he answers: “Over there in the dark, but I can’t see anything there, so there is no point in searching there.” Relying only on information one has access to, or has the time to read, potentially leads to a very similar situation. Whole hypotheses can be built on information that is missing key elements, simply because of unawareness of their existence.

Frankly, we have no choice other than to develop ways to create overviews of the knowledge that has been published, and then to home in on the areas of most relevance to our specific interest. Field archaeologists don’t start to dig at random. They take aerial surveys first, and analyse those to determine whether there are sub-surface structures and where digging has the best chance of turning up something significant.

But how do we create these knowledge overviews? It is not easy, but a good start is being made with Lazarus, an initiative that aims to gather significant concepts and assertions from the literature via crowd-sourcing, using a plug-in to the free scientific PDF-reader Utopia Documents. Although this is a crowd-sourcing initiative, there is nothing that prevents publishers from adding these concepts and assertions to the abstracts of the articles they publish. (As some articles may have many of these concepts and assertions, they can even be added ‘blind’, i.e. not visible to the human reader, but only readable by the computer.) Let me illustrate the idea with the following example.

Imagine you had a paper that concluded:

“On hot days, it turns out that aspirin decreases the chances of blood clots, but increases the chances of heart attack in humans; the effect wasn’t observed in rats at all; simulations of dogs seem to suggest that the effect is present but independent of temperature unless the dog is accompanied by a human”

Lazarus would recognise concepts and assertions of significance in this text (indicated in italics for nouns and underscores for verbs) and render them as follows, suitable for machine-reading:

Significant concepts:

[CHEMBL25] (aspirin)

[EFO_0001702] (‘temperature’ from the experimental factors ontology)

[Canis lupus familiaris]

[Homo sapiens]

[Mus musculus]

Headline Interactions (in the form of Triples):

[ASPIRIN] [DECREASES] [THROMBOSIS]

[ASPIRIN] [INCREASES] [MYOCARDIAL INFARCTION]
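To make ‘suitable for machine-reading’ a little more concrete, here is a minimal sketch of how the concepts and triples above might be held in a program. It is not Lazarus’s actual output format or interface, which this post does not specify; the `Triple` class, its field names and the `effects_of` helper are assumptions made purely for illustration.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object assertion (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str


# Concept identifiers exactly as listed in the example above.
concepts = [
    "CHEMBL25",                # aspirin
    "EFO_0001702",             # temperature
    "Canis lupus familiaris",
    "Homo sapiens",
    "Mus musculus",
]

# The two headline interactions from the example.
triples = [
    Triple("ASPIRIN", "DECREASES", "THROMBOSIS"),
    Triple("ASPIRIN", "INCREASES", "MYOCARDIAL INFARCTION"),
]


def effects_of(subject: str, assertions: List[Triple]) -> Dict[str, str]:
    """Map each object to the predicate that the given subject applies to it."""
    return {t.obj: t.predicate for t in assertions if t.subject == subject}


print(effects_of("ASPIRIN", triples))
```

Once assertions are in this shape, a question such as ‘what is asserted about aspirin?’ becomes a simple lookup rather than a reading task.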

These concepts and assertions would carry metadata (turning assertions into so-called nanopublications) to indicate the article of origin of the concept or assertion, based on the article’s Digital Object Identifier (DOI). With the help of computers, they would be combined, from thousands, even millions, of articles, to create a ‘knowledge map’, allowing researchers to ‘navigate’ the existing knowledge more easily and to home in on areas of interest in order to make well-informed choices as to which articles actually to read out of the overwhelming amount available.
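The combining step can be sketched in a few lines, under stated assumptions. The `Nanopublication` structure, the function names and the DOIs below are all invented for illustration (the DOIs are placeholders that do not refer to real articles), and a real knowledge map would be far richer than this dictionary. The sketch only shows the principle: identical assertions from different articles are grouped, and each keeps a pointer back to its articles of origin.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple

Assertion = Tuple[str, str, str]


class Nanopublication(NamedTuple):
    """An assertion plus the DOI of its article of origin (hypothetical)."""
    subject: str
    predicate: str
    obj: str
    doi: str


def build_knowledge_map(nanopubs: List[Nanopublication]) -> Dict[Assertion, Set[str]]:
    """Group identical assertions and collect the DOIs that support each one."""
    knowledge_map: Dict[Assertion, Set[str]] = defaultdict(set)
    for pub in nanopubs:
        knowledge_map[(pub.subject, pub.predicate, pub.obj)].add(pub.doi)
    return dict(knowledge_map)


def articles_about(concept: str, knowledge_map: Dict[Assertion, Set[str]]) -> List[str]:
    """Return the sorted DOIs of articles whose assertions mention the concept."""
    dois: Set[str] = set()
    for (subject, _predicate, obj), sources in knowledge_map.items():
        if concept in (subject, obj):
            dois |= sources
    return sorted(dois)


# Placeholder DOIs, invented for this sketch only.
nanopubs = [
    Nanopublication("ASPIRIN", "DECREASES", "THROMBOSIS", "10.0000/example.1"),
    Nanopublication("ASPIRIN", "DECREASES", "THROMBOSIS", "10.0000/example.2"),
    Nanopublication("ASPIRIN", "INCREASES", "MYOCARDIAL INFARCTION", "10.0000/example.1"),
]

kmap = build_knowledge_map(nanopubs)
print(articles_about("THROMBOSIS", kmap))
```

A researcher interested in thrombosis would get back a short list of articles to read, instead of having to scan everything published on aspirin.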

The Lazarus plug-in is expected to be released later this year. Publishers wishing to add these concepts and assertions to the abstracts of the papers they publish are advised to get in touch.

Note

¹ FRASER, A.G., and DUNSTAN, F.D. On the impossibility of being expert. BMJ. 2010, vol. 341:c6815. DOI: 10.1136/bmj.c6815.

References

FRASER, A.G., and DUNSTAN, F.D. On the impossibility of being expert. BMJ. 2010, vol. 341:c6815. DOI: 10.1136/bmj.c6815.

GROTH, P., GIBSON, A., and VELTEROP, J. The anatomy of a nanopublication. Information Services and Use. 2010, vol. 30, nº 1-2. DOI: 10.3233/ISU-2010-0613.

Lazarus. The University of Manchester. [viewed 10 May 2015] Available from: http://www.cs.manchester.ac.uk/our-research/activities/lazarus/.

VELTEROP, J. Nanopublications: the future of coping with information overload. Logos. 2010, vol. 21, nº 3, pp. 119–122. DOI: 10.1163/095796511X560006.

External link

Utopiadocs – <http://utopiadocs.com>

About Jan Velterop

Jan Velterop (1949) is a marine geophysicist who became a science publisher in the mid-1970s. He started his publishing career at Elsevier in Amsterdam. In 1990 he became director of a Dutch newspaper, but returned to international science publishing in 1993 at Academic Press in London, where he developed the first country-wide deal that gave electronic access to all AP journals to all institutes of higher education in the United Kingdom (later known as the BigDeal). He next joined Nature as director, but moved quickly on to help get BioMed Central off the ground. He participated in the Budapest Open Access Initiative. In 2005 he joined Springer, based in the UK, as Director of Open Access. In 2008 he left to help further develop semantic approaches to accelerate scientific discovery. He is an active advocate of BOAI-compliant open access and of the use of microattribution, the hallmark of so-called “nanopublications”. He has published several articles on both topics.

How to cite this post [ISO 690/2010]:

VELTEROP, J. Dealing with information overload [online]. SciELO in Perspective, 2015 [viewed ]. Available from: https://blog.scielo.org/en/2015/05/18/dealing-with-information-overload/