PRESENTATIONS
Recent Presentations and Symposiums
Piracy Websites and Research Integrity
Remarks/presentation for Open Athens Access Lab 26 February 2025
Thank you very much to the Access Lab organizers for the invitation to address this important topic. What I will cover is a description of the SNSI organization, general research integrity issues in science publishing, the specific research integrity issues we see on piracy websites, and then a discussion on possible solutions.
SNSI background
second slide
SNSI (Scholarly Network Security Initiative) was formed in 2020 by journal publishers, and now includes 15 organizations ranging from university presses to society publishers to commercial publishers. SNSI’s goal is to address “holistically” the development of websites with unauthorized access (“piracy websites”), including SciHub (probably the best known), LibGen, ZLibrary, Anna’s Archive, and the more recent SuperPirate service (set up in the past year). In my view many of these services are connected, most have some kind of connection to Russian actors or entities, possibly Russian state actors, and most operate outside of “rule of law” countries. You will notice on this slide a reference to the City of London IP Police warning of 2021 about accessing such sites, noting the potential cybersecurity dangers (to personal information and research data) that such sites pose to users. I think we are all aware of the ransomware attack on the British Library of October 2023, which was reported as starting with a phishing attack on the access methods of temporary contract workers. The piracy websites I’ve mentioned were not highlighted in the reports, but this is still an important example of weak cybersecurity.
These piracy websites rely on voluntary “donations” of content, access credentials and even cryptocurrency (they receive and contribute cryptocurrency to other organizations), and exploit the inherently insecure but common method of accessing scholarly literature, the IP address protocol, which was put into place between publishers and universities more than 20 years ago. We now have better methods including Open Athens for accessing content, but IP address continues as the primary method in this field.
The holistic SNSI approach is to look at more than just legal claims or disputes, but also to focus on technological tools and guidelines, and educational efforts in the university relations and communications teams. Our university relations team attends and presents on these issues at library conferences and our communications team works to ensure the SNSI website has useful and updated briefings and guidelines, and that we communicate in the press on these points.
Research integrity issues in science publishing
third slide
I’d like to talk now about the general problem of research integrity in publishing. Most of us will be aware that the number of retractions (a retraction being a notice that there is an ethical issue in the paper) has been increasing in the past 5-10 years. This can be seen in the reporting on the subject and in looking at publisher websites. When I started working on these issues at Elsevier 25 years ago, we dealt with perhaps dozens of retraction issues. The Heliyon article I mention here shows that between 2002 and 2022 the number of retractions rose to as many as 1,000 per year (reference list included at the end). Of course we do have occasional blips in the literature, as in the Wiley-Hindawi special issues retractions of 2023 (8,000) and the recently announced Springer-Nature retraction of 3,000. All this raises the question of whether we are simply experiencing more ethical violations in publishing, or whether we are getting better at detecting problems in the literature.
I think most people would say as I do that the answer is both: more violations but also better detection. We have collectively made significant improvements in publishing processes, clearer ethical policies, better communications, education and testing, the development of larger and more experienced integrity teams (at both publishers and universities), and better communications between publishers and universities. On tools, we now have automated plagiarism detection, for example. Importantly we have whistleblowers in this area, from the start of the formal Retraction Watch team (Adam Marcus and Ivan Oransky) in 2010 to the more recent activities of the Dutch researcher Elisabeth Bik, who began by looking at image manipulation specifically. We also have Graham Kendall, who began by looking at predatory journals: journals that mimic a “real” scientific journal and seek publication fees from researchers, who later find out that the journal they submitted to is not real (always look at DOAJ to see if a particular Open Access journal is actually listed).
Unfortunately the fraudsters evolve as well. In addition to the predatory journal problem we also have the more recent “paper mills” issue, where something that may look like a research paper (but often with fudged or non-existent data) is offered for sale, so that for a fee someone can claim to be a co-author. Hopefully these fake papers are ultimately rejected by real journals, although by then the “co-author” has parted with their money. Researchers and authors should always be cautioned to be careful about who they are dealing with.
My view is that data and image manipulation (and in science publishing, images are data) are the most serious ethical violations, as they mean that the data in the paper (or at least some of the data) cannot be relied on, and certainly shouldn’t be cited in further papers or be part of the stream of research activity. A good example of the data problem comes up with Covid retractions, which Retraction Watch tracks (now more than 500 papers). You may remember that some of the editorial policies at some journals and preprint servers were relaxed or processes sped up during the height of the Covid emergency, to try to get more information out more quickly. This is very understandable in the light of that emergency, but did lead to the publication of inaccurate data. One of the more famous examples of this was the hydroxychloroquine paper from the “lab” run by Didier Raoult that featured in the US Senate hearing on the candidacy of Robert F Kennedy Jr to be head of the (US) Health & Human Services agency.
Another important issue is what I think of as the “downstream” issue. You can see in some of the lists that Retraction Watch maintains (look at their “top 10” list for example) that many retracted papers receive more citations post-retraction than pre-retraction. This means that there are researchers out there who rely on the data in the retracted paper. As retracted papers receive more citations, they further pollute the research literature stream.
Research integrity issues on piracy websites
fourth slide
There’s a great and very helpful article published in a Taylor & Francis journal just in January of this year that looks at retracted articles on SciHub. The authors looked at 17,000 papers that are marked as retracted in Scopus or Web of Science between 2002 and 2022. Of those 17,000 articles, they found that 85% of these articles posted on SciHub had no retraction information included. So if a researcher is relying on SciHub as a valid research resource, they would be sadly mistaken. I think we have to understand that there is a question about the motivation of these piracy sites. I note here the recent advisory note from the (UK) National Cyber Security Centre on Russian state actors in spear-phishing attacks. This warning isn’t specifically about the piracy sites we’re talking about today, but those actors are examples of the kinds of malign actors that might use insecure access methods to hack university servers. We also see AI companies such as the Chinese DeepSeek service using Anna’s Archive. My view is that these piracy websites want to appear to have enough mass/content to be viewed as comprehensive, which then enables some of those malign actions. The mantra must remain to be sure who you are dealing with.
Users who might be assuming that these sites are valid research resources should be mindful of the T&F article on the absence of retraction information, and the example of the bad data in the retracted Covid papers, along with the risk of further downstream pollution of the literature.
Solutions
fifth slide
In my view the first and most important solution is education—spreading the knowledge about the problems in using such sites and the importance of knowing who one is dealing with. This event today is part of those educational efforts. I’ve noted before the presentations and discussions that SNSI has participated in on these topics at library conferences. I’d also like to suggest that the participants here take this information back to their institutions or organizations. On the technology side, there are better solutions for access protocols, and I’m looking forward to Matt’s presentation on Open Athens programs. There are better bibliographic and technology tools to identify retracted content. We need more researchers to be aware of DOAJ as a source of information on Open Access journals (to protect against the predatory journal problem). Finally on the legal side we do seek to block or limit access to piracy sites, and we do have site blocks initiated in many countries.
Questions
A question was asked about references made in the presentation:
(See the reference list on the last page of the powerpoint, and below.)
A question was raised about whether we should be moving to federated access (such as Open Athens):
We should absolutely be moving to more secure methods of access—loose security is an invitation to misuse and presents easy pathways for malign actors.
A question was raised about site blocking in connection with developing research integrity tools, and a question about the use of such tools if institutions become aware that they are accessing piracy websites.
Clearly I don’t believe that using piracy websites as a research reference source is a good idea—there are plenty of other resources such as preprint servers, official sites such as PubMed, and of course publisher sites (which in my perhaps prejudiced view are the most reliable).
Mark Seeley
References
• PIPCU press release warning re Sci-Hub (March 2021)
• Koo & Lin, Retracted Articles in scientific literature: A bibliometric analysis from 2003 to 2022 using the Web of Science, Heliyon 10:20 (October 2024)
• Biju, Franklin & Jasimudeen, An analysis of availability and implications of unlabeled retracted articles on Sci-Hub, Accountability in Research (January 2025)
• NSA (US) and NCSC (UK) release update on Russian state actors engaged in spear-phishing (December 2023)
Privacy and Scholarship
NISO Conference Presentation – September 2020
Participated in a terrific NISO webinar “Privacy in the Age of Surveillance” on 16 September 2020 (ppt presentations available on the NISO site), where I tried to identify the regulatory framework (such as it is) that governs protecting privacy in scientific communications, analytic products, and in the research itself.
I noted the inconsistencies between Europe’s GDPR (quite protective with requirements for active opting in, data management) and the US patchwork (HIPAA, COPPA, GLB Act). Such inconsistencies are a challenge to international businesses with customers on both sides of the Atlantic, such as publishers. In describing best practices, I emphasized that sharing of identifying information outside of the user’s home institution should be limited if not eliminated, although this also requires that institutions take on more responsibility for monitoring their own users’ behavior. For analytic products such as in adaptive learning and personalized medicine, the protective measures to be taken need to take a significant step up, although the motivations on the part of users also encourage the development of such services.
Dylan Gilbert from NIST discussed the brand new NIST Privacy Framework, version 1.0, which emphasizes careful analysis of risk and benefit, and focused quite a bit on anonymizing user information when dealing with third parties. Emily Singley (Boston College) addressed the problems in IP address authentication and possible resolutions through federated access (which also has some efficiencies in navigating across multiple platforms). Interestingly, Emily noted that with the pandemic, BC users have moved away from EZProxy log-ons (and general online searches including Sci-Hub) to the recently implemented federated access programs, with significant support from the IT department.
Todd Digby (U of Florida) continued the discussion about coordinating the library issues with the technology concerns. UF has launched a “Fast Path” solution which addresses IT and online access implementation issues in one program. Micah Vandegrift (Open Knowledge) and Hannah Rainey (NCSU) gave a fascinating presentation on a browser-based game designed to better inform users about privacy issues (Tally Saves the Internet/Digital Life Decoded) and the trade-offs involved. Finally, Qiana Johnson (Northwestern U) gave an excellent summary of the issues facing university libraries and vendors.
“Frankfurt Book Fair discussions on EU Digital Single Market copyright directive 2019”
In addition to discussions on OA and Transformative Agreements, I was happy to join a CCC panel to discuss the next steps in the European digital copyright agenda, the DSM, which attracted controversy and communication cascades last winter and spring before final passage in the European Parliament in May 2019. Much of that controversy was around the question of greater technology platform responsibility for content and copyright compliance, which was often equated in that debate with censorship. The DSM wound up being about far more than Articles 11 and 13 (which became Articles 15 and 17), the news publishers’ right and platform compliance, as one would expect of a significant review and rewrite of digital copyright rules that have in Europe been largely unchanged since the Information Society directive of 2001. The infographic below notes a whole host of issues, from copyright exceptions to contract law and collective licensing.
“Copyright and innovation in the life sciences (Publishers, licensing & innovation)”
Presentation at September 2020 panel organized by US Patent & Trademark Office and US Department of Justice (antitrust)
Legal issues in “Controlled Digital Lending”
Presentation – May 2019
“Challenges and Opportunities for Europe’s Digital Single Market”
Panelist, Suffolk University Law School, October 2018
“Professional ethics requirements for publishing on preprint servers”
February 2018
“Professional ethics requirements for publishing on preprint servers,” NISO virtual conference on the Preprint: Integrating the Form into the Scholarly Ecosystem, February 2018
“Copyright in the age of analytics”
STM Association Innovation Seminar, London December 2017
“Rethinking the Publishing Agreement”
Panelist, Copyright Society of the USA, Boston, June 2016
Grants, Rights and Responsibilities in the 21st Century – Even through the technology boom and the digital revolution, the publishing agreement and grant of rights by authors has remained relatively the same.
“Copyright, related rights and news in the EU: Assessing potential new laws”
University of Amsterdam, April 2016
The conference was part of a two-year, AHRC funded project at CIPIL, Cambridge University, entitled Appraising Potential Legal Responses to Threats to the Production of News in a Digital Environment, which the IViR kindly hosted and facilitated.
Library Archiving & Preservation Issues
Panelist, Section 108 Reforms symposia, Columbia University 2013
With Jonathan Band, Mary Minow, Eric Schwartz. Copyright Exceptions for Libraries in Digital Age: Section 108 Reforms symposia. “Should the §108 exceptions be limited to libraries and archives or extended to other institutions? How should eligible “libraries” and “archives” be defined?”
“Public Access to Federally Funded Research: Copyright and Other Issues”
Harvard, June 2012
Debate with Peter Suber (Open Access advocate at Harvard) over US NIH OA policy for journal publications.







