
Journalology

Journalology #69: Detecting paper mills



Hello fellow journalologists,

Improving research integrity is front of mind for everyone at the moment, as this week’s newsletter makes crystal clear. Wiley announced a new paper mill detection service last week, and The Wall Street Journal ran a story about image manipulation.

The lead story in the Opinion section is the latest episode in a spat between a publisher and a group of academics, which is entertaining and unedifying in equal measure.

Thank you to our sponsor, Kotahi by Coko

Built on the foundation of collaboration, Kotahi represents the collective wisdom of the Coko community. Harness the power of shared innovation to tackle publishing challenges, making scholarly communication more efficient and accessible. Kotahi, where community meets technology, 100% open source.

Contact us today

News

Wiley announces pilot of new AI-powered Papermill Detection service

From the London Book Fair, Wiley today unveiled plans for its new AI-powered Papermill Detection service. Following an extensive series of internal beta tests, Wiley will advance this new service into the next phase of testing in partnership with Sage and IEEE.

Wiley (Press release)

JB: You can read about the new service here and the accompanying white paper here. It’s noteworthy that this service is distinct from products such as Papermill Alarm or Signals.

The day after the press release was issued, Retraction Watch ran this article:

Up to one in seven submissions to hundreds of Wiley journals show signs of paper mill activity

More than 270 of its titles rejected anywhere from 600 to 1,000 papers per month before peer review once they implemented a pilot of what the publisher calls its Papermill Detection service. That service flagged 10-13% of all of the 10,000 manuscripts submitted to those journals per month, Wiley told Retraction Watch.
Wiley said the service includes “six distinct tools,” including looking for similarities with known paper mill papers, searching for “tortured phrases” and other problematic passages, flagging “irregular publishing patterns by paper authors,” verifying researcher identity, detecting hallmarks of generative AI, and analyzing the relevance of a given manuscript to the journal.
Wiley will now “advance this new service into the next phase of testing in partnership with Sage and IEEE,” a spokesperson said.

The white paper also says that tools are in development for: (a) reference quality checks; (b) image manipulation detection; (c) citation and author network analyses.
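
To make the “tortured phrases” idea concrete, here’s a minimal sketch of how such a screen can work: match manuscript text against a dictionary of known mangled terms. This is purely illustrative, drawing on the published tortured-phrases literature; it is not Wiley’s implementation, and the tiny phrase list is a sample only.

import re

# Tiny illustrative dictionary of documented "tortured phrases" and the
# standard terms they mangle (sample only; real screens use far larger lists).
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "bosom peril": "breast cancer",
    "colossal information": "big data",
}

def flag_tortured_phrases(text):
    """Return (tortured phrase, likely intended term) pairs found in text."""
    lowered = text.lower()
    return [
        (phrase, intended)
        for phrase, intended in TORTURED_PHRASES.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]

print(flag_tortured_phrases(
    "We use counterfeit consciousness to screen for bosom peril."
))
# -> [('counterfeit consciousness', 'artificial intelligence'),
#     ('bosom peril', 'breast cancer')]

A real service layers many signals like this one; a single hit is a prompt for human review, not proof of fraud.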


How Science Sleuths Track Down Bad Research

Since going live that year, Imagetwin’s customers include major universities and some of the biggest names in scientific publishing, the founder said.
The tools, which can cost between $35 and $50 a manuscript for Proofig and about $27 a paper for Imagetwin, with subscriptions varying, offer a chance to improve the quality of published science. And there is some evidence the software is moving the needle.

The Wall Street Journal (Nidhi Subbaraman)

JB: There’s no real news in this story; if you work in scholarly publishing you’ve likely heard it all before. However, the fact that the WSJ is covering this topic is noteworthy in and of itself: research integrity is now front and centre in investors’ minds.


The Royal Society and DataSeer sign agreement for Open Data Checks

The Royal Society, the oldest scientific academy in continuous existence, and DataSeer.ai, a leading provider of open science analytics and compliance solutions, proudly announce the signing of a strategic agreement aimed at enhancing open research data compliance in the Royal Society journal Proceedings of the Royal Society B.
It follows the completion of a successful pilot, which commenced with the journal in January 2024, to address the growing need for robust mechanisms to ensure compliance with open research data policies. With the exponential growth of scientific data and the increasing emphasis on transparency in research, there has been a corresponding demand for innovative solutions to streamline the open data compliance process.

DataSeer (press release)


Jisc review of UK open access and transitional agreements finds positives, but that a full transition is not in sight

The availability of TAs across a broader range of publishers in this period does not appear to have changed author behaviours and UK authors continue to choose traditional publishers to disseminate their research. The top ten publishers account for just over 90% of UK Hybrid output – a higher consolidation than for Closed (78% of output) and for fully Gold (66% of output). The top four publishers (Elsevier, Springer Nature, Wiley and T&F) together account for just under 50% of all articles published. The consolidation of output with the top four publishers is greater for the Hybrid route (66%) compared to the Gold (c. 25%) and Closed (58%). It is unclear whether TAs are contributing to this consolidation.

Jisc (announcement)

JB: This quote is taken from the executive summary, which you can read here. The full report is available here.


Journal blacklists doctor in Pakistan ‘out of an abundance of caution’

We conducted a thorough investigation and were unable to confirm these allegations. However, the circumstantial evidence presented to the journal was, out of an abundance of caution, enough to warrant the rejection of any in-progress article submissions that involved Dr. Kumar. Additionally, Dr. Kumar’s Cureus account was permanently suspended.

Retraction Watch (Kiley Price)

JB: The allegations could not be confirmed, but Cureus banned the author anyway. It’s worth noting that COPE (Committee on Publication Ethics) says: “An article should be retracted only if the findings are unreliable or in cases of serious misconduct such as plagiarism. Also, COPE does not advocate banning offending authors from publication for any period of time as it may have legal implications.” Of course, journals also need to consider the reputational impact of publishing fraudulent research.


OurResearch receives $7.5M grant from Arcadia to establish OpenAlex, a milestone development for Open Science

OurResearch is proud to announce a $7.5M grant from Arcadia, to establish a sustainable and completely open index of the world’s research ecosystem. With this 5-year grant, OurResearch expands their open science ambitions to replace paywalled knowledge graphs with OpenAlex.
Researchers, funders, and organizations around the world rely on scientific knowledge graphs to find, perform, and manage their research. For decades, only paywalled proprietary systems have provided this information and they have become unaffordable (costing libraries $1B annually); uninclusive (systematically excluding works from some fields and geographies); and unavailable (even paid subscribers are limited in their use of the data).
OpenAlex indexes more than twice as many scholarly works as the leading proprietary products and the entirety of the knowledge graph and its source code are openly licensed and freely available through data snapshots, an easy to use API, and a nascent user interface.

OurResearch (announcement)

JB: OpenAlex is being used by a number of research integrity tools. Is a $7.5m investment over 5 years enough to challenge the commercial incumbents, though?
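
If you haven’t tried OpenAlex, the barrier to entry really is low. Here’s a minimal sketch of an API query in Python; the endpoint and parameters follow the public documentation, but treat the search term and field choices as illustrative.

import requests

# Query the free OpenAlex API for works matching a search term.
# No API key is required.
resp = requests.get(
    "https://api.openalex.org/works",
    params={"search": "paper mill detection", "per-page": 5},
    timeout=30,
)
resp.raise_for_status()

for work in resp.json()["results"]:
    print(work["publication_year"], work["display_name"], work.get("doi"))

The same data is also available as bulk snapshots, which is what most of the research integrity tools built on OpenAlex consume.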


Peer-replication model aims to address science’s ‘reproducibility crisis’

In a white paper detailing the process, Rehfeld, Lord and their colleagues describe how journal editors could invite peers to attempt to replicate select experiments of submitted or accepted papers by authors who have opted in. In the field of cell biology, for example, that might involve replicating a western blot, a technique used to detect proteins, or an RNA-interference experiment that tests the function of a certain gene. “Things that would take days or weeks, but not months, to do” would be replicated, Lord says.
The model is designed to incentivize all parties to participate. Peer replicators — unlike peer reviewers — would gain a citable publication, and the authors of the original paper would benefit from having their findings confirmed. Early-career faculty members at mainly undergraduate universities could be a good source of replicators: in addition to gaining citable replication reports to list on their CVs, they would get experience in performing new techniques in consultation with the original research team.

Nature Index (James Mitchell Crow)

JB: You can read the white paper here: Peer Replication (first published November 2023).


Crystallography databases hunt for fraudulent structures

Another challenge databases like the CSD must navigate when looking for fraud is that they currently rely on journals to retract papers before taking similar action on their end. As a result, such databases are introducing greater scrutiny of crystals by doing peer review in collaboration with journals, Ward tells C&EN. Steps taken include hiring more staff dedicated to policing research integrity issues and establishing new relevant policies, processes, and guidelines.
Under the CSD’s current process, when a paper goes to a journal for peer review, the CSD shares information about crystals with journal editors and peer reviewers, Ward says. Every single entry is checked by an in-house entry-level doctorate scientist and an expert in crystallography, she adds. They validate a structure’s novelty and plausibility by comparing it to past submissions and using software that predicts how molecules pack in crystals.

C&EN (Dalmeet Singh Chawla)


Other news stories

Improving analytical standards: Global Analytical Robustness Initiative (The Official PLOS Blog)

How OpenAI’s text-to-video tool Sora could change science – and society (Nature)

Embrace AI to break down barriers in publishing for people who aren’t fluent in English (Nature, paywall)

Subject codes, incomplete and unreliable, have got to go (Crossref)

DOAJ’s year ahead (DOAJ News Service)

March 2024 AI Reading Roundup (Silverchair)

Two AIP Publishing Journals Now Open Access Under Subscribe to Open Pilot (AIP Publishing) JB: This was first announced in July last year.

Gender, Work and Organization Editorial Resignation letter

Editors of Syntax resign, found new journal (OA Linguistics)

Numbers highlight US dominance in clinical research (Nature Index) JB: I wrote a critique of the clinical journals included in the Nature Index in issue 32.

AI-generated rat genitalia: Swiss publisher of scientific journal under pressure (SWI swissinfo.ch)

Australia’s chief scientist takes on the journal publishers gatekeeping knowledge (The Guardian)

By design, PKP is not for sale (Public Knowledge Project) JB: The news peg for this article is the sale of PeerJ to Taylor & Francis.

DOAJ and Crossref renew their partnership to support the least-resourced journals (DOAJ News Service)

Thank you to our sponsor, Morgan Healey

Global Executive Search Specialists in STM/Scholarly Publishing, Open Research & Digital Content.

Opinion

Response to: “Bad bibliometrics don’t add up for research or why research publishing policy needs sound science”

We struggled to come up with a delicate way to phrase this, but we simply couldn’t find the words. To be blunt: the Frontiers analysis is amateurish, careless in its data curation and interpretation, leans heavily on visual impressions, and evokes results out of thin air. Let’s dive in.

The Strain on Scientific Publishing (Mark Hanson, Pablo Gómez Barreiro, Paolo Crosetto, Dan Brockington)

JB: This is Round 3 in an ongoing and intriguing exchange. Here’s a quick recap to get everyone up to speed:

Round 1: In September last year the four researchers published a preprint entitled The strain on scientific publishing, which I covered in issue 51; I noted that Figure 2 was interesting, if unsurprising.

The preprint got some press attention, which the authors have catalogued here (they missed off Journalology, but, unlike some people, I’m not one to hold a grudge).

Round 2: Last month Fred Fenter, the Executive Editor at Frontiers, published a blog post disputing the findings of the preprint: Bad bibliometrics don’t add up for research or why research publishing policy needs sound science. In issue 66 I wrote:

I’ve read this article a few times and I’m somewhat perplexed. There’s a lot going on at Frontiers right now and it seems odd that so much effort has gone into debunking (if indeed that’s what’s been done) this preprint. This is clearly a topic that Frontiers cares a lot about.

It’s certainly unusual for a publisher to take on a group of academics like this. Here’s the core message from the Frontiers article:

The study’s key findings could not be reproduced. This was due to two fundamental issues with the data used in the study: 1) the reliance on unverifiable proxy data sources while portraying that data as reflecting original sources and 2) the omittance of any data or evidence that contradicts the cornerstone correlation and conclusion that special issues cause significant increases in research output and add to strain on researchers.

Round 3: This week, the authors of the preprint responded to the Frontiers article and their arguments are compelling. Their blog post is worth reading in full, but here are the first few paragraphs of their Conclusion section to give you a sense of the backstory:

We started a conversation with Frontiers while we were working on our article. We offered them the chance to comment on our work before we released it publicly. This is a courtesy we also extended to other publishers. We were thus surprised to find this blog posted without our knowledge. Indeed, we had numerous emails with Frontiers in the lead-up to our work that were good-faith exchanges. We even have the original comments that Frontiers provided to us from when we sent them our draft article: they praised aspects of our work and provided constructive feedback, including correcting our phrasing to avoid ambiguous wording. We would be happy to share those comments, if Frontiers would give us permission to. We still thank Frontiers for those comments.
That is why it is so surprising to see this latest piece, which contains a startling level of animosity and many derogatory accusations that are simply untrue.

Fanning the flames is rarely a good PR strategy. The preprint would likely have been largely forgotten by now, but six months on it’s still creating news. It’s worth remembering that the preprint has not yet been formally peer reviewed and published (as far as I’m aware).


From scraped to published

At Patterns, we frequently encounter authors who have used web-scraped datasets but are unclear on how to meet the journal’s transparency and reproducibility requirements. These cases can raise complex questions regarding data origin, copyright, and ethics for what may otherwise appear to be straightforward research articles. Here, we offer our authors and readers some practical recommendations for publishing science based on web-scraped data. Our key message is that authors need to take responsibility for what they have used, even, and especially, when complex issues arise.

Patterns (Alejandra Alvarado and Andrew L. Hufton)

JB: This is relevant to the previous story.


David versus Goliath: Early career researchers in an unethical publishing system

The academic publishing system is in crisis, and systemic change is needed to make it more fair and equitable. While there is widespread motivation and desire to make large-scale publishing changes across the academic system, the task feels daunting. To address this, we suggest a set of actions to promote change that can be implemented by researchers across varying aspects of their academic lives – as readers, authors, reviewers, editors, evaluation committee members and colleagues. While many of the actions we propose are lower risk and can be implemented by ECRs, these actions must be complemented by higher risk ones undertaken by established researchers. While the main goal of these actions is to improve the publishing system in ecology and evolution, they will also address other inequalities, including the accessibility of research in general, and the evaluation of researchers for employment and promotion.

Ecology Letters (Aurore Receveur et al)


Other opinion articles

Misspelled cell lines take on new lives — and why that’s bad for the scientific literature (Retraction Watch)

PeerJ Takes a Different Path (The Geyser)

Living in Another World (The Geyser)

Peer review demystified: part 1 (Nature Methods)

An Interview with Will Schweitzer of Silverchair (The Scholarly Kitchen)

Honesty is being put through the mill (Nature Physics, paywall)

Misconduct’s forgotten victims (Science)

Is it ethical to use generative AI if you can't tell whether it is right or wrong? (Impact of Social Sciences)

Inequities in Academic Publishing: Where Is the Evidence and What Can Be Done? (American Journal of Public Health)

Chef de Cuisine: Perspectives from Publishing's Top Table - Amy Brand (The Scholarly Kitchen)

Guest Post - Navigating the Drift: Persistence Challenges in the Digital Scientific Record and the Promise of dPIDs (The Scholarly Kitchen)


Webinars

Here are the webinars being held this week. I posted a longer list on LinkedIn yesterday if you want a heads-up on next week and beyond.

Equity, Diversity and Inclusion Survey results and next steps
March 19 (European Association of Science Editors)

Preprints across the globe; landscapes, perceptions and challenges
March 19 (ASAPbio Community Call)

Preprints as the central pillar of the PRC model
March 19 (Peer Community In)

Publishing Integrity
March 19 and March 20 (Charleston Hub)

Open Access Evolution, Revolution, or Demise?
March 20 (Society for Scholarly Publishing)

Author Engagement Outside of the Publishing Process
March 21 (ChronosHub)

Unveiling the Impact of Transformative Agreements on Scholarly Publishing
March 21 (De Gruyter)

Horizon Planning: Preparing for the Users of the Future
March 21 (Renewd.net)

Is Open Access Truly Open to All?
March 22 (Karger)

Journal Club

Academic publishing requires linguistically inclusive policies

Beyond the adoption of the linguistically inclusive policies described in this study, we propose a set of actions that can further advance journals in this mission: (i) Journals should scrutinize and revise author guidelines to communicate their linguistic policies in a clear manner and reconcile author guidelines with the perception of editors. (ii) The use of discriminatory language in author guidelines and arbitrary requests that disproportionally affect authors with limited or perceived limited English proficiency should be strongly discouraged. Those exclusionary practices, such as requesting certificates of professional English-editing services, could impose a significant economic burden to scholars from lower-income countries. (iii) Authors should be allowed to harness AI tools, such as ChatGPT or DeepL Write, to proofread their manuscripts and submit both the original and the AI-proofread versions for the sake of transparency. (iv) Scholars, editors and scientific societies should keep assessing the power dynamics between publishers and journal editorial boards in the development of linguistically inclusive policies and promote their renegotiation in the instances where it is deemed necessary. Finally, (v) journals should implement mandatory double-blind peer review systems to procure a fair assessment of manuscripts regardless of the English proficiency of the authors.

Proceedings of the Royal Society B: Biological Sciences (Henry Arenas-Castro et al)

JB: See also: Embrace AI to break down barriers in publishing for people who aren’t fluent in English.


Using automated analysis of the bibliography to detect potential research integrity issues

Just as plagiarism-detection software gives the user an indicator of possible plagiarism, rather than a definitive answer to the question Was this article plagiarised?, analysing the bibliography can provide indications that a closer look is needed, rather than a definitive answer to the question Are these references legitimate?
We have shown here that analysis of the bibliography can surface clues that an article may need further investigation of potential research integrity issues. While these tests can be conducted manually, scaling them up for use in a production environment requires automated methods such as those described here.
As with other indicators, such as detection of possible image manipulation, data fabrication, or plagiarism, the responsibility then rests with the publisher to investigate whether there are genuine grounds for concern, and to take appropriate action.

Learned Publishing (Robin Dunford, Bruce Rosenblum and Sylvia Izzo Hunter)
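
JB: To give a flavour of the kind of bibliography check the authors describe, here’s a simple sketch of my own (not the tooling from the paper): verify that each cited DOI actually resolves in Crossref. The second DOI below is a hypothetical fabrication for demonstration purposes.

import requests

def doi_registered(doi):
    """Check whether a DOI is registered with Crossref.

    A 404 from the Crossref REST API is a clue that a reference may be
    mistyped or fabricated and merits a closer look; it is not a verdict.
    """
    resp = requests.get("https://api.crossref.org/works/" + doi, timeout=30)
    return resp.status_code == 200

references = [
    "10.1038/171737a0",        # Watson & Crick (1953): a registered DOI
    "10.9999/not-a-real-doi",  # hypothetical fabricated DOI
]
for doi in references:
    if not doi_registered(doi):
        print("Flag for review:", doi)

As the authors stress, a flag like this only tells the publisher where to look; the investigation and any action still rest with humans.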


And finally...

It’s always nice to have something to look forward to. This book, by Stephen Pinfield, will be published in June and is likely to be a good read. I don’t know Stephen, but I’ve seen him speak at conferences a few times and he has impressed me every time.

Until next time,

James

P.S. Can someone please tell me if it should be “paper mills” or “papermills”? My vote is for the former, but product names seem to prefer the latter.

