A Novel and Objective Measure of Scientific Impact
Introduction
Two months ago, I wrote that Intuitive Quantitation was hard at work building a new metric for measuring scientific impact, one that works independently of traditional citation metrics. I am pleased to report that those efforts have been largely successful. This post presents some of our results from applying the new methodology to several disease topic areas.
Previously, I teased the idea of tracking the uptake of ideas and keywords as a potential method for calculating impact. We were inspired to develop a system based on memes, hearkening back to the work of Richard Dawkins, who posited that ideas propagate through cultures much as genes do through biological systems. By quantitatively tracking the uptake of ideas within a topic, we could, in principle, identify the specific instances where new and ultimately successful concepts originated and began to spread.
From Title, Abstract, and Keywords to Score
Our analysis relies on tracking concepts in publications through the unique words and phrases (n-grams) that appear in the title, abstract, and keyword list. Our hypothesis was that the frequency of an impactful publication's n-grams would measurably increase in subsequently published papers. To prepare the data for analysis, each title and abstract was tokenized: trivial words were discarded and duplicates removed. This generated sets of single words (unigrams), two-word phrases (bigrams), and three-word phrases (trigrams). For any given paper, each n-gram was then compared against all other papers in the search set to determine how frequently the term appeared before the paper's publication versus after it, and each term was scored from these two frequencies.
A paper’s overall score was then calculated by averaging the individual term scores.
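The n-gram pipeline can be sketched in a few lines of Python. This is illustrative only: the stop-word list is hypothetical, and the term score below is an assumed add-one-smoothed log ratio of post- to pre-publication document frequency, not necessarily the equation the IQuant Engine uses. A positive term score means the term became more common after the paper appeared; the paper score is the mean over its terms.

```python
import math
import re
from dataclasses import dataclass
from statistics import mean

# Hypothetical stop-word list: the post only says that "trivial words" were discarded.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "of",
              "on", "the", "to", "was", "we", "with"}


@dataclass(frozen=True)
class Paper:
    year: int
    text: str  # title + abstract (+ keywords), concatenated


def extract_ngrams(text: str, max_n: int = 3) -> set[str]:
    """Lowercase, tokenize, drop stop words, and return the unique 1- to 3-grams."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]
    grams: set[str] = set()  # a set, so each n-gram counts once per paper
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def term_score(term: str, year: int, corpus: list[tuple[int, set[str]]]) -> float:
    """Assumed score: smoothed log ratio of post- vs pre-publication document frequency."""
    before = [grams for y, grams in corpus if y < year]
    after = [grams for y, grams in corpus if y > year]
    # Add-one (Laplace) smoothing keeps the ratio finite when a term is absent
    # or when one side of the publication date has few papers.
    pre = (sum(term in g for g in before) + 1) / (len(before) + 2)
    post = (sum(term in g for g in after) + 1) / (len(after) + 2)
    return math.log(post / pre)


def paper_score(paper: Paper, papers: list[Paper]) -> float:
    """Average the term scores of every n-gram in the paper."""
    # Strict year comparisons exclude same-year papers, including the paper itself.
    corpus = [(p.year, extract_ngrams(p.text)) for p in papers]
    terms = extract_ngrams(paper.text)
    if not terms:
        return 0.0
    return mean(term_score(t, paper.year, corpus) for t in terms)
```

With this kind of smoothing, a paper followed by only a handful of later papers gets a post-publication frequency dominated by the smoothing constant, which illustrates how recent papers can produce the artifactual scores described in the caveats below.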
We also applied a concept-based impact analysis to measure the uptake of MeSH keywords. Following similar logic, we hypothesized that the first paper to use a highly successful keyword within a topic area should be considered more impactful than a paper whose keywords are not widely adopted by later publications.
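The keyword step can be sketched the same way. This version assumes that "first" means the earliest publication year within the search set (papers tied for that year share the credit), and that a paper earns log(1 + k) for each keyword it introduced, where k is the number of later papers that adopted it. Both choices are hypothetical illustrations of the hypothesis, not the production scoring rule.

```python
import math
from collections import defaultdict


def keyword_uptake_scores(papers: dict[str, tuple[int, set[str]]]) -> dict[str, float]:
    """papers maps a paper ID to (publication year, set of MeSH keywords)."""
    # Earliest year each keyword appears in the search set.
    first_year: dict[str, int] = {}
    for year, keywords in papers.values():
        for kw in keywords:
            first_year[kw] = min(year, first_year.get(kw, year))

    # How many papers used each keyword after the year it was introduced.
    adopters: dict[str, int] = defaultdict(int)
    for year, keywords in papers.values():
        for kw in keywords:
            if year > first_year[kw]:
                adopters[kw] += 1

    # Credit only the keywords a paper introduced; log1p damps very popular terms.
    scores: dict[str, float] = {}
    for pid, (year, keywords) in papers.items():
        introduced = [kw for kw in keywords if year == first_year[kw]]
        scores[pid] = float(sum(math.log1p(adopters[kw]) for kw in introduced))
    return scores
```

A paper that only reuses established keywords scores zero under this sketch, while the paper that introduced a widely adopted keyword scores highest.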
By combining these data with statistical analysis, we generated a score for each paper based entirely on n-gram usage and keyword uptake.
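The specific statistical tools used to combine the two signals are not described here. One simple option, shown purely as an assumption, is to standardize each metric to z-scores within the search set and take a weighted average per paper; the equal weights are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values: list[float]) -> list[float]:
    """Standardize values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values, mu)
    if sd == 0:
        return [0.0] * len(values)  # a constant metric carries no information
    return [(v - mu) / sd for v in values]


def combined_scores(ngram: list[float], keyword: list[float],
                    w_ngram: float = 0.5, w_keyword: float = 0.5) -> list[float]:
    """Weighted average of the two standardized metrics, paper by paper."""
    if len(ngram) != len(keyword):
        raise ValueError("both metrics must cover the same papers")
    return [w_ngram * a + w_keyword * b
            for a, b in zip(zscores(ngram), zscores(keyword))]
```

Standardizing first keeps one metric from dominating the combined score simply because it has a wider numeric range.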
We were inspired to develop a system based on memes, hearkening back to the work of Richard Dawkins, who posited that ideas propagate through cultures much as genes do through biological systems.
– Benjamin Verdoorn, Senior Data Scientist
Data
Our initial data suggests that these new metrics add critical information to the evaluation of paper impact. We performed IQuant Engine searches on three specific disease areas and analyzed the results.
Search | Number of Results | Range of Scores | Average Score |
---|---|---|---|
Parkinson’s and Dementia [Title/Abstract] | 9903 | -3.89 to 11.22 | -0.004 |
Restless Leg or Restless legs | 6358 | -3.01 to 16.25 | 0.016 |
Charcot Marie Tooth [Title/Abstract] | 5479 | -5.11 to 7.78 | 0.031 |
The following papers from the search topics scored highly using our new metrics.
- Parkinson’s Disease Dementia
- Restless Legs Syndrome
- Charcot Marie Tooth Disease
Parkinson's Disease Dementia
| Publication Title | IQuant Score | Cited By |
|---|---|---|
| | 9.88 | 51 |
| | 6.79 | 2 |
| | 6.50 | 404 |
| Defining mild cognitive impairment in Parkinson's disease (2007) | 4.74 | 166 |
Restless Legs Syndrome
| Publication Title | IQuant Score | Cited By |
|---|---|---|
| | 16.25 | 26 |
| [Psychophysiologic insomnia and periodic leg movements in sleep syndrome] (1995) | 8.13 | 0 |
| | 4.24 | 15 |
| | 3.68 | 9 |
Charcot Marie Tooth Disease
| Publication Title | IQuant Score | Cited By |
|---|---|---|
| | 7.77 | 0 |
| | 5.54 | 0 |
| Connexin mutations in X-linked Charcot-Marie-Tooth disease (1993) | 5.46 | 237 |
| Exome sequencing allows for rapid gene identification in a Charcot-Marie-Tooth family (2011) | 5.35 | 54 |
The new metrics appear to generate results that are independent of citation metrics while still highlighting papers that could reasonably be seen as genuinely impactful; for example, two of the four top-scoring Charcot-Marie-Tooth papers have no recorded citations. While we recognize that no metric can fully capture a concept as complex as scientific impact, we posit that this analysis not only identifies papers that might otherwise be overlooked but can also flag papers whose impact may have been exaggerated by citation metrics alone.
Although the results so far have been informative and robust, we have identified certain situations in which the score may be less accurate. The most significant involves papers published within the last one to two years. The data for these papers can be interesting, but our language-based metrics can be skewed by the small number of post-publication papers available for analysis, sometimes producing artifactual results. We have also found that some of the oldest papers, particularly those with very few MeSH keywords, n-grams, or preceding papers, can generate anomalous scores.
Caveats aside, these new metrics provide critical additional data that will help us deliver useful insights to our partners and collaborators, and they have prompted several new ideas for our own research. Our current focus is on developing a comprehensive impact score that combines all of the available metrics, both language-based and traditional citation-based. We believe this combination is likely to provide the broadest coverage and the most complete picture of impact. Check back next time for more data as I explore these exciting possibilities.
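One way to act on these caveats, sketched here under stated assumptions, is to attach a flag to any score whose supporting data are thin. The `ScoreSupport` record and the thresholds (50 preceding papers, 50 later papers, 5 terms) are hypothetical placeholders that would need tuning for each topic.

```python
from dataclasses import dataclass


@dataclass
class ScoreSupport:
    n_before: int  # papers in the search set published before this one
    n_after: int   # papers published after it
    n_terms: int   # n-grams plus MeSH keywords the paper contributed


def reliability_flags(s: ScoreSupport, min_before: int = 50,
                      min_after: int = 50, min_terms: int = 5) -> list[str]:
    """Return human-readable reasons a score may be unreliable (empty if none)."""
    flags = []
    if s.n_after < min_after:
        flags.append("few post-publication papers")  # typical of very recent papers
    if s.n_before < min_before:
        flags.append("few preceding papers")  # typical of the oldest papers
    if s.n_terms < min_terms:
        flags.append("few keywords or n-grams")
    return flags
```

Flagging rather than discarding keeps recent papers visible while signaling that their scores rest on limited data.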
Let us know how we can help enhance your research.
We work with scientists, drug discovery professionals, pharmaceutical companies, and researchers to create custom reports and precision analytics that fit your project's needs, with more transparency, tighter timelines, and prices that make sense.