Benjamin Verdoorn

Let us know how we can help enhance your research.

We work with scientists, drug discovery professionals, pharmaceutical companies and researchers to create custom reports and precision analytics to fit your project's needs – with more transparency, on tighter timelines, and prices that make sense.

A Novel and Objective Measure of Scientific Impact

Introduction

From Title, Abstract, and Keywords to score

$$ \text{TermScore} = \left( \frac{\text{OccurrencesFuture}}{\text{\#Papers in Future}} \right) - \left( \frac{\text{OccurrencesPast}}{\text{\#Papers in Past}} \right) $$

A paper’s overall score was then calculated by averaging the individual term scores.

Combining these data using statistical tools and analysis, we generated scores for each paper based entirely on the usage of n-grams and the uptake of keywords.
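To make the formula concrete, here is a minimal Python sketch of the TermScore calculation and the per-paper average. It is an illustration under stated assumptions, not the IQuant Engine's actual code: the function names are hypothetical, "past" and "future" are assumed to mean papers published before and after the paper being scored, every occurrence of a term is counted (rather than once per paper), and an empty corpus is assumed to contribute a rate of zero.

```python
from statistics import mean


def occurrence_rate(term, papers):
    """Occurrences of `term` per paper in a corpus.

    Each paper is a list of terms (n-grams) taken from its title, abstract
    and keywords. An empty corpus yields 0.0 (an assumption of this sketch).
    """
    if not papers:
        return 0.0
    return sum(terms.count(term) for terms in papers) / len(papers)


def term_score(term, past_papers, future_papers):
    """TermScore = future occurrence rate minus past occurrence rate."""
    return occurrence_rate(term, future_papers) - occurrence_rate(term, past_papers)


def paper_score(paper_terms, past_papers, future_papers):
    """Average the term scores of the distinct terms in one paper."""
    unique_terms = set(paper_terms)
    if not unique_terms:
        return 0.0
    return mean(term_score(t, past_papers, future_papers) for t in unique_terms)


# Toy corpora (invented for illustration only).
past = [["tremor", "dementia"], ["tremor"]]
future = [
    ["alpha-synuclein", "dementia"],
    ["alpha-synuclein"],
    ["alpha-synuclein", "tremor"],
    ["dementia"],
]
paper = ["alpha-synuclein", "dementia"]

print(term_score("alpha-synuclein", past, future))  # 0.75
print(paper_score(paper, past, future))             # 0.375
```

In this toy example "alpha-synuclein" is absent from the past corpus and appears in three of four future papers, so it scores 0.75, while "dementia" is equally common before and after and scores 0. A term that falls out of use, like "tremor" here, scores negative.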

Data

Search	Number of Results	Range of Scores	Average Score
Parkinson’s and Dementia [Title/Abstract]	9903	-3.89 – 11.22	-0.004
Restless Leg or Restless legs	6358	-3.01 – 16.25	0.016
Charcot Marie Tooth [Title/Abstract]	5479	-5.11 – 7.78	0.031

The following papers from the search topics scored highly using our new metrics.


Parkinson's Disease Dementia

Publication Title	IQuant Score	Cited By
Dementia in Parkinson's Disease - (1979)	9.88	51
Abnormally phosphorylated tau protein in senile dementia of Lewy body type and Alzheimer disease: evidence that the disorders are distinct - (1995)	6.79	2
Filamentous alpha-synuclein inclusions link multiple system atrophy with Parkinson's disease and dementia with Lewy bodies - (1998)	6.50	404
Defining mild cognitive impairment in Parkinson's disease - (2007)	4.74	166

Restless Legs Syndrome

Publication Title	IQuant Score	Cited By
Clonazepam and vibration in restless legs syndrome - (1984)	16.25	26
[Psychophysiologic insomnia and periodic leg movements in sleep syndrome] - (1995)	8.13	0
Systematic evaluation of augmentation during treatment with ropinirole in restless legs syndrome (Willis-Ekbom disease): results from a prospective, multicenter study over 66 weeks - (2012)	4.24	15
Sleep disorders in Parkinson's disease - (2009)	3.68	9

Charcot Marie Tooth Disease

Publication Title	IQuant Score	Cited By
[The characteristics of gene mutations in Chinese patients with Charcot-Marie-Tooth disease] - (2005)	7.77	0
Unusual motor conduction velocity values in Charcot-Marie-Tooth disease associated with essential tremor: report of a kinship - (1975)	5.54	0
Connexin mutations in X-linked Charcot-Marie-Tooth disease - (1993)	5.46	237
Exome sequencing allows for rapid gene identification in a Charcot-Marie-Tooth family - (2011)	5.35	54

Our Services

Let us know how we can help enhance your research.

The Impact Problem

The Flaws of ‘Publish or Perish’: Chasing Grants Over Groundbreaking Science

I’ve been thinking about the concept of impact a lot lately – specifically, how do we quantify the impact of scientific work and, by extension, of scientists themselves? Currently, the scientific community uses a few classical metrics to determine the worth of a scientist’s contribution, despite a growing awareness of their deep flaws.

The “publish or perish” paradigm has become the law of the land, and perceived ability to secure grant money is too often the driving force behind scientists’ decision-making. This attitude becomes especially problematic when compelling science is neglected because it doesn’t yield enough financial benefit to the sponsoring institution.

The brilliant scientist Katalin Kariko faced demotion several times during her career at UPenn, allegedly for failing to secure appropriate funding for her research: the same research that was recognized with the 2023 Nobel Prize and that led directly to the technology behind the COVID-19 vaccines.

The Hirsch Index – Is There a Better Way to Measure Success?

Unfortunately, the metrics commonly used to quantify impact regularly discount truly influential and groundbreaking scientific research. The Hirsch Index (h-index), a citation-based measure of influence that is widely used to determine the quality of a scientist’s contribution, is particularly problematic.

A Nature article proposing an alternative metric highlighted the limits of the h-index – most notably the fact that as its importance has grown, so have the incentives for scientists to manipulate their scores.
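For readers unfamiliar with how the h-index is computed, here is a short Python sketch of the standard definition: the largest h such that an author has h papers with at least h citations each. The function name and the citation counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical authors: one landmark paper versus many modestly cited ones.
print(h_index([5000]))     # 1
print(h_index([20] * 20))  # 20
```

The two hypothetical authors show the problem: a single paper with 5,000 citations yields an h-index of 1, while twenty papers with twenty citations each yield 20.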

Beyond Citation Metrics: The Search for a True Measure of Scientific Value

In his 2015 article “The Slavery of the h-index—Measuring the Unmeasurable”, Grzegorz Kreiner rails against the prevalence of the h-index and points out that a surprising number of Nobel Prize winners have relatively low h-indices. He writes, “it could be surmised that in many cases the moderate h-indices at this stage (e.g. Nobel Prizes) are simply correlated to the persistence in pursuing the hypothesis that ultimately was proven, thereby yielding a scientific breakthrough.” This seems particularly prescient in light of the recent news about Dr. Kariko’s persistence.

There are a number of reasons to believe that all citation-based metrics for determining scientific value may be fatally flawed. Foremost among them is the pressure that these metrics place on both scientists and journals to produce more content, regardless of quality; content that is designed to artificially boost metrics through circular or coercive citation practices. Scientific progress then suffers under the weight of the publication Ouroboros, with incremental breakthroughs being restated repeatedly, adding minimal value or, worse yet, drowned out by the noise of previous ‘discoveries’.

Scientific progress then suffers under the weight of the publication Ouroboros, with incremental breakthroughs being restated repeatedly, adding minimal value or, worse yet, drowned out by the noise of previous ‘discoveries’.
– Benjamin Verdoorn, Senior Data Scientist

Properly quantifying scientific impact is important and valuable, especially considering the current system of paper inflation. Getting up to speed in a new field can be an overwhelming task and finding the salient papers within the mass of publication debris would be extremely valuable.

Scientific Integrity in New Metrics

At IQuant we are working to develop a new metric for scientific impact that is independent of citation metrics.

This is critical, both for the practical reason of paywalled citation information (see my previous post about why we only use publicly available information) and for the far more important reason of championing scientific integrity. By tracking the uptake of ideas and keywords through time, we hope to accurately identify papers and scientists that profoundly impact the development of a particular field, shift paradigms and foster new directions.

With this new meme-based metric we will be able to measure scientific impact more accurately and, we hope, provide a more enlightening evaluation of the ideas, papers and scientists that truly influence the progress of science.

Check back here soon for examples of the new metric in action, and comment below if you have experience with coercive publishing practices or are frustrated with how citation metrics are affecting the scientific publishing landscape.


Illustration of many colorful faces coalescing into a single face.

Challenges in Authorship Attribution: The “Name Problem”

I’ve encountered many challenges during the development of the IQuant Engine, but none has been more frustrating than what I’ve dubbed the “Name Problem”. In programming it is often the simplest errors that are the most difficult to figure out, and a misplaced comma has devoured my entire day more than once. However, I wasn’t expecting the same to be true on the data analysis side.

Simply put – how do I ensure that the bibliography of each author is both fully correct and exhaustive? Relatedly, how do I differentiate authors with often very similar names?

Early in development I foresaw aspects of this problem and included checks to reconcile authors who appear under multiple or similar names – for example, combining the records of an author who publishes some papers with a middle initial and others without. These initial rules worked for relatively small sets of publications, in which there were fewer authors and consequently fewer possible matches. Unfortunately, I started encountering more errors when confronting larger topic areas, as a surprising number of examples defeated even my best algorithms. Undaunted, I set to work developing more algorithms, confident that I could develop a set of functions that would permanently solve the problem for me.

I can only plead ignorance to excuse my hubris.

Affiliation comparison and co-author analysis partially mitigated the problem, and techniques like algorithmic string comparison helped further. Yet no matter what I did, I always found exceptions to my carefully constructed and increasingly complicated system. After many months of trying new approaches, I had to accept that perfection was not achievable. As frustrating as it was, some level of error had to be accepted in the author accounting process.
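The kind of heuristic described above can be sketched in a few lines of Python. This is a simplified illustration rather than the IQuant Engine's actual rules: the function names, the 0.85 similarity threshold, the assumed "Given [Middle] Surname" name order and the requirement of at least one shared co-author are all assumptions, and the names are invented. It uses only the standard library (`difflib.SequenceMatcher` for string comparison).

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z\s]", " ", without_accents.lower())
    return " ".join(cleaned.split())


def name_key(name):
    """Return (surname, first initial), assuming 'Given [Middle] Surname' order.

    Returns None when nothing survives normalization, e.g. for names written
    in a non-Latin script, which this simplified sketch cannot handle.
    """
    parts = normalize_name(name).split()
    if not parts:
        return None
    return parts[-1], parts[0][0]


def likely_same_author(name_a, name_b, coauthors_a, coauthors_b, threshold=0.85):
    """Heuristic match: same first initial, similar surname, shared co-author."""
    key_a, key_b = name_key(name_a), name_key(name_b)
    if key_a is None or key_b is None or key_a[1] != key_b[1]:
        return False
    if SequenceMatcher(None, key_a[0], key_b[0]).ratio() < threshold:
        return False
    shared = {normalize_name(c) for c in coauthors_a} & {normalize_name(c) for c in coauthors_b}
    return bool(shared)


# Middle initial present in one record and missing in the other: merged.
print(likely_same_author("Jane A. Smith", "Jane Smith", ["R. Jones"], ["R Jones", "L. Chen"]))  # True
# Same surname and initial, but no shared co-authors: kept apart.
print(likely_same_author("Jane Smith", "John Smith", ["R. Jones"], ["P. Patel"]))  # False
```

Even this small sketch shows why exceptions keep appearing: two different authors named J. Smith who share a collaborator would be merged, one author who changes fields and co-authors would be split in two, and the assumed name order and Latin-only normalization fail for many naming conventions.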

I eventually learned that I was far from the first person to encounter this problem. In fact, it appears to be widespread and to have no clear solution, even given significantly more resources than I was able to devote. In 2010 the National Library of Medicine announced a project to assign authors a unique ID, but four years later the project was scrapped (PubMed press release) and it was announced that PubMed would instead rely on third parties like ORCID.

The Name Problem will persist unless a universal system with full adoption is instituted. Until then, if you are publishing it is critical to put careful thought into how your name is displayed on papers.
– Ben Verdoorn, Senior Data Scientist

The Name Problem has real consequences beyond my data-nerd frustration. Notably, it fosters inequality in our publication-based merit systems, at least partly because of culture-specific naming conventions. Without a unique author identifier, many scientists cannot be properly credited or correctly sought out when their expertise is required. For now, we will account for the Name Problem in our analysis and accept the error it injects into our data. We will not be rid of it without a concerted universal effort, which is likely out of reach for our shambolic scientific publishing landscape. Therefore, I must adjust to this reality despite the errors it propagates in my data and the sleepless nights spent worrying away at different possible solutions – like an ever-present popcorn kernel I just can’t get out of my tooth.


Third-party solutions will never be the answer, simply due to lack of uptake. The Name Problem will persist unless a universal system with full adoption is instituted. Until then, if you are publishing, it is critical to put careful thought into how your name is displayed on papers. Unfortunately, differing conventions in how journals display author names and in what information is accessible to databases like PubMed will trip up even the most diligent of scientists, especially if you have a relatively common last name.


The Publisher Problem

Why do we rely solely on publicly available data at IQuant? As IQuant’s senior data scientist, I’ve spent a lot of time reflecting on the source of the information that’s available to us. The choice to use only publicly available data isn’t one that we took lightly. The high cost of premium data services might be the most obvious reason, but on a deeper level we are fundamentally opposed to the excessive, profit-driven model employed by both scientific publishers and other data services companies.

These complaints are by no means revolutionary. For upwards of 20 years there has been a growing movement towards open access to publications, along with louder criticism of pay-per-view science and institutionally priced subscription models. Open access could be a solution, but unfortunately, open access publishers continue to collect their toll. They have simply shifted the costs from readers to the authors submitting their work, through increasingly exorbitant processing fees. In the quest for greater scientific understanding, this becomes an unethical roadblock.

These publishing business models have unfortunately created a scientific information landscape that resembles the rest of our society. One where the ‘haves’ maintain their status without impediment, while the ‘have-nots’ continue to struggle for the recognition and impact they clearly deserve. This inequality within the scientific community poses all the same problems that it does within society as a whole. New, exciting thinking could be lost – whether merely by circumstance and location, or more nefariously by publishers enforcing their own bias. Can we trust the impartiality of a system when the driving motivation seems not to be advancement of knowledge, but rather, ever-increasing profits?

At IQuant, we’re simply not able to justify supporting such a system – not as some grand gesture (our size precludes anything using the word grand) – but because we view ourselves first and foremost as scientists who fundamentally believe in the principles of transparency and diversity of thought.

This inequality within the scientific community poses all the same problems that it does within society as a whole.

– Ben Verdoorn, Senior Data Scientist

Publishers would have you believe that all the answers are waiting to be accessed behind their paywalls, but data means very little without careful analysis. I believe that there exists a vast trove of knowledge and insight to be found in public, open-access sources. While the information is freely accessible, at IQuant we have the skills necessary to synthesize it into valuable results. By extracting and carefully analyzing all the information available, we are in a position to create novel insights and draw compelling conclusions from freely available data. This is why we built the IQuant Engine: to help other scientists reach their goals without contributing to a system that we profoundly disagree with.
