https://elifesciences.org/articles/27725?utm_source=content_alert&
Cite as: eLife 2017;6:e27725
doi: 10.7554/eLife.27725
Abstract
Clarity and accuracy of reporting are fundamental to the
scientific process. Readability formulas can estimate how difficult a
text is to read. Here, in a corpus consisting of 709,577 abstracts
published between 1881 and 2015 from 123 scientific journals, we show
that the readability of science is steadily decreasing. Our analyses
show that this trend is indicative of a growing use of general
scientific jargon. These results are concerning for scientists and for
the wider public, as they impact both the reproducibility and
accessibility of research findings.
https://doi.org/10.7554/eLife.27725.001
Introduction
Reporting science clearly and accurately is a fundamental part
of the scientific process, facilitating both the dissemination of
knowledge and the reproducibility of results. The clarity of written
language can be quantified using readability formulas, which estimate
how understandable written texts are (Flesch, 1948; Kincaid et al., 1975; Chall and Dale, 1995; Danielson, 1987; DuBay, 2004; Štajner et al., 2012).
Texts written at different times can vary in their readability: trends
towards simpler language have been observed in US presidential speeches (Lim, 2008), novels (Danielson et al., 1992; Jatowt and Tanaka, 2012) and news articles (Stevenson, 1964).
Several studies have investigated linguistic trends within the
scientific literature. One study showed an increase in positive
sentiment (Vinkers et al., 2015),
finding that positive words such as 'novel' have increased dramatically
in scientific texts since the 1970s. A tentative increase in complexity
has been reported in scientific texts in a limited dataset (Hayes, 1992), but the extent of this phenomenon and any underlying reasons for such a trend remain unknown.
To investigate trends in scientific readability over time, we downloaded 709,577 article abstracts from PubMed, from 123 highly cited journals selected from 12 fields of research (Figure 1A–C). These journals cover general, biomedical and life sciences. This journal list included, among others, Nature, Science, NEJM, The Lancet, PNAS and JAMA (see Materials and methods and Supplementary file 1) and the publication dates ranged from 1881 to 2015. We quantified the reading level of each abstract using two established measures of readability: the Flesch Reading Ease (FRE; Flesch, 1948; Kincaid et al., 1975) and the New Dale-Chall Readability Formula (NDC; Chall and Dale, 1995). The FRE is calculated using the number of syllables per word and the number of words in each sentence. The NDC is calculated using the number of words in each sentence and the percentage of 'difficult words'. Difficult words are defined as those words which do not belong to a predefined list of common words (see Materials and methods). Lower readability is indicated by a low FRE score or a high NDC score (Figure 1A).
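The two measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the FRE coefficients are the standard published ones, the Dale-Chall constants are the commonly cited ones, the vowel-group syllable counter is a rough heuristic, and the `common_words` argument is a hypothetical stand-in for the predefined list of common words. All function names are assumptions made for this example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into alphabetic word tokens (simplified tokenizer)."""
    return re.findall(r"[A-Za-z']+", text)


def split_sentences(text: str) -> list[str]:
    """Split on ., ! and ? (naive: abbreviations such as 'et al.' will over-split)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def count_syllables(word: str) -> int:
    """Heuristic (assumption): one syllable per run of consecutive vowels, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    words = tokenize(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def dale_chall(text: str, common_words: set[str]) -> float:
    """Dale-Chall style score from sentence length and percentage of difficult words.

    `common_words` is a hypothetical placeholder for the predefined list of
    common words; any word not in it counts as difficult.
    """
    words = tokenize(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    difficult = sum(1 for w in words if w.lower() not in common_words)
    pct_difficult = 100.0 * difficult / len(words)
    score = 0.1579 * pct_difficult + 0.0496 * (len(words) / len(sentences))
    if pct_difficult > 5.0:
        score += 3.6365  # adjustment applied when more than 5% of words are difficult
    return score


if __name__ == "__main__":
    plain = "The cat sat on the mat. The dog ran to the car."
    dense = "Mitochondrial dysfunction exacerbates neurodegenerative pathophysiology."
    print(flesch_reading_ease(plain), flesch_reading_ease(dense))
```

With this sketch, the short everyday sentences score a much higher FRE than the jargon-heavy one, and the Dale-Chall style score rises as more words fall outside the common-word list, matching the direction of both measures described above.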
Figure 1