Evolving Collective Rights Management for Author, Publisher and User Needs

I’ve updated my somewhat nerdy “history of collective licensing” essay, which was published by CCC with contributions from fellow essayists Bruce Rich and Lois Wasoff (see “Creating Solutions Together”), and was happy to see the update published by Law.com on June 3 in its Legaltechnews newsletter.

The update addressed how copyright law and collective management organizations have adapted to technological change, and discussed specific new user needs, such as creating data sets for artificial intelligence and text and data mining (TDM) exercises. Collective blanket licenses are particularly appropriate for such high-volume automated requirements. CCC in particular has worked to make TDM licensing efficient and effective.

To paraphrase the English copyright advocate Charles Clark, ‘the answer to technological challenge is technological adaptation’.

The Future of Collective Licensing – Copyright in the Digital Marketplace

By permitting researchers, academics, publishers and others to make use of copyrighted materials and enabling rightsholders to receive royalties for those uses, collective licensing creates efficient markets that make copyright work.

EU Open Science Platform

The EC published its tender specifications for the Open Research Publishing Platform at the end of March, and as I suggested in an earlier blog on 20 March, it is completely open to “all natural and legal persons”, at least within the EU (due to Brexit, UK organizations appear to be excluded). I think the Commission is showing a commendable lack of prejudice, and good common sense as well, in being open to participants with publishing expertise, whether university or library-organized, funder-led, NFP society, or commercial entity (publisher or other vendor). The Commission’s tender document is ambitious and demanding (more on this later), so it will require a competent organization or consortium of entities to fulfill. Some of the ambition is about technical performance (the 99.9% up-time requirement), some of it is about networking capabilities, and some is about combining publishing requirements (open peer review) with the technical issues. A further area of ambition is the requirement for a preprint server capability, with linking and automatic repository posting features, with no funding provided for it.

It had been suggested on Twitter and in some media (see my prior post and Twitter comments back and forth) that commercial publishers such as my former employer Elsevier should be automatically disqualified because they do not support Open Access enough (odd, because two of the three largest OA publishers are commercial publishers) and because existing publishers and trade associations have the temerity to advocate for sound OA policies (i.e. publishing Green OA with embargo periods, given that Green means, in contrast to Gold OA, that no funds have been provided for the formal publishing activities). Helpfully, the Commission was quite universal in its approach, while quite prescriptive in its requirements.

Richard Poynder retweeted Martin Eve’s analysis of the tender document (see the analysis here and below), which was quite a good account of the ambition of the project. I assume Poynder is suggesting that Elsevier would regard it as too much work for too little reward, a view I think many organizations would share! Bianca Kramer did an excellent job with a 17-point Twitter analysis on 2 April, which I describe below.

Three key themes of the tender document

Running throughout the tender document are these three themes: ambition/demand (particularly on the technical side); control/authority (on the part of the EC) re publication processes (open peer review process; preprints and repositories; standards such as CC BY); and the design of a “scientific utility” which can be later taken over directly by the EC or transferred to a new party (building a platform that is highly portable).  While there is nothing wrong with ambition, and government or other funders should always ensure they are getting value for money, I agree with some of the early critics that it is hard to see how existing scholarly communications participants including established publishers will be eager to bid, other than for the joy of the sheer challenge!

The EC might want to consider whether it might need to make more trade-offs to get the platform that it wants, with all of the technical and portability requirements, by being less prescriptive over the publishing process, for example by being flexible on staffing vs automation, or by not insisting on open peer review, which is uncertain in effect and might well impact the timeliness of formal publication.  It might be that incorporating the possibility of open reviews and post-publication comments, without requiring that peer reviewers openly post their comments and identities, would be more practical.  Even among strong supporters of open review, there’s some disagreement over the exact meaning of open peer review (see the 2017 review by Tony Ross-Hellauer).

Technical ambition/demand

As others have noted, the technical demands of the system are considerable. First, building a reliable publishing services platform, with author submissions, peer review, and external linking, especially to non-publication resources (publication resources would no doubt link through CrossRef), is non-trivial. There are many vendors in the scholarly communications space now who have worked hard to provide scalable and reliable services, generally on a proprietary and highly customized basis. Online submission and review processes challenge most publishers, and the larger the scope of activity the larger the challenge. The contents must be made available in multiple formats, with significant download activity expected (especially for text and data mining purposes). Responsiveness at the level of 99.999% might be difficult to obtain if the content is being constantly accessed and mined. Registration through the use of ORCID and other EU systems is required (though common sign-in protocols will no doubt become more pervasive in any event). In addition to the identifiers, DOIs must be assigned for all article versions, and logs must be made available of all interactions. Somehow the system must be able to populate institutional and other repositories on an “automatic transfer” basis (at the request of the author). Preprints must be annotated with appropriate CrossRef-style links. Quite a few standards have to be met, including Dublin Core for metadata, LOCKSS for archiving, and graphics requirements, although established publishers are already navigating these.
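To put the availability figures mentioned in this post (99.9% and 99.999%) in perspective, here is a minimal sketch that converts an availability target into the downtime it would allow per year. The function name and the simplification of a 365-day year are my own assumptions for illustration; the tender's actual measurement window and method are not described here.

```python
# Illustrative only: convert an availability target into allowed downtime.
# Assumes a 365-day year; the tender's real measurement period is not stated here.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year that a target permits."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    for target in (99.9, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows {minutes:.1f} minutes of downtime per year")
```

On these assumptions, 99.9% leaves a little under nine hours of downtime a year, while 99.999% leaves only about five minutes, which is why the higher figure would be so hard to sustain under constant access and mining.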

Quite a lot of reporting, not only to the EC but also at the author and funding agency level, is required, with citation information. Much of this is being done now; in fact F1000 (based in the UK, so probably disqualified) already does much of this kind of reporting for users (seen in the screen shot above). Finally and fundamentally, the software to be used shall be commercial off-the-shelf or open source, and specifically any “proprietary/exclusive technologies that are not available to other solution providers are not acceptable.”

So plenty of challenges.

Publishing process controls

The tender gives a nice diagram of the publishing process in context of platform requirements as shown below…

The general work-flow diagrammed here is very recognizable and common, although it is important to note that there is both a preprint server aspect (unclear what the relationship is between Horizon 2020 funding and the preprint requirement) and a general publication process.  The diagram also over-simplifies the “first level check” requirements (which are not explored in the tender document in any detail), though perhaps this is like eLife or PLoS initial screening.  One might assume that a plagiarism check through CrossRef is contemplated, but again this is not clear (the tender document itself refers to “editors” performing these checks, so it sounds more manual than automated).  The ALLEA code of conduct is referenced, but this is a general set of principles rather than a process-oriented document.

Some of the criteria sections point to proven experience in developing and managing scientific publishing services, and note the requirement to establish a strong editing and project management team, in addition to the technology staff.  Importantly there are requirements for establishing a scientific advisory board (a fundamental step in establishing any new journal), also important in helping to recruit qualified peer reviewers.  Interestingly the tender document says that the contractor “will be required to gather broad institutional support and the involvement of the research community in many fields across Europe and beyond… [helping to establish the] Platform as a successful and innovative publishing paradigm for research funded by Horizon 2020” without any indication of how the Research directorate or the Commission itself might help in this mission.  Perhaps this is why the document is so heavy in requirements for communications initiatives and staff.

There are very specific requirements of editing, proofreading, layout and production, familiar to established publishers, in addition to communication and networking.  It is interesting to review the staffing requirements—one might wonder whether with the use of more online resources some of this work could be done more efficiently.

Finally, notwithstanding the notion of respecting authors and their copyright (or that of their institutions or funders), there appears to be a straightforward requirement for CC BY Creative Commons licenses, which many OA advocates equate with OA publishing and which grant the broadest possible re-use rights. Journal authors, however, when asked whether they might have concerns about CC BY and commercial or derivative use, do not seem as wholeheartedly enthusiastic (see the Taylor & Francis surveys).

Building the scholarly communications utility (portability)

The framework contract itself has a duration of 4 years, after which the EC expects the system to be operating well, according to technical functionality, and with a minimum of 5,600 OA articles posted using the strict CC BY licensing approach, and some number of preprints.  Perhaps more importantly, the Commission appears to contemplate transferring the operation of the platform to either itself or some other party or parties at some point.  The successful bidder will thus be responsible for ensuring that they can be eased out of the picture, and with an appropriate depth of knowledge transfer.  Though this might be helpful in ensuring transparency, it likely will be a de-motivating factor in the bidding process.

The price schedule (Annex 8)

Although the price schedule is only a form, the EC has made clear that while some “building” costs may be contemplated in the early phase of the process, the Platform is supposed to operate financially on the basis of a price per peer-reviewed article (assuming there will be 5,600 of those). I do remember at some point NIH in the US indicated they were building and operating the PubMedCentral database for around $4.4m a year (see the 2013 Scholarly Kitchen post). PMC is hosting many hundreds of thousands of manuscripts, so presumably the EC will be looking for a cost significantly below that. It is important to remember, however, that in addition to the technical and staffing requirements (editorial and technical), there will also be costs involved on the preprint side. Of interest is the comment that the bidder “will not charge the Commission for the process leading to the posting of pre-prints or for articles that have been rejected during the initial checks.”
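As a rough way to think about the per-article pricing model, the sketch below works out what price per article would be implied if the platform spent as much per year as the PMC figure quoted above, over the four-year contract and the minimum of 5,600 articles. This is purely illustrative arithmetic: the tender's actual budget is not given here, the PMC figure is in US dollars, and the function and constant names are hypothetical.

```python
# Illustrative arithmetic only; the tender's real budget is not stated in this post.
PMC_ANNUAL_COST_USD = 4_400_000  # approximate PMC figure quoted in the post
CONTRACT_YEARS = 4               # duration of the framework contract
MIN_ARTICLES = 5_600             # minimum number of OA articles expected


def implied_price_per_article(annual_cost: float, years: int, articles: int) -> float:
    """Spread a total multi-year cost evenly across the published articles."""
    return annual_cost * years / articles


if __name__ == "__main__":
    price = implied_price_per_article(PMC_ANNUAL_COST_USD, CONTRACT_YEARS, MIN_ARTICLES)
    print(f"Implied price per article at a PMC-level annual cost: ${price:,.0f}")
```

At a PMC-level annual cost the implied price would be a little over $3,100 per article, which gives a sense of the ceiling that the EC would presumably want to stay well below.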

Other summaries/analysis

As noted, I thought the analysis by Bianca Kramer on 2 April was very good (it is hard to capture 17 salient points on Twitter), noting that certain Open Science protocols and requirements were not incorporated. The post was also critical that open source software was not required in all functionalities (though the requirement is either publicly available “off-the-shelf” technology or open source, so in any event nothing proprietary/private), finding perhaps that the tender was not ambitious enough!

A response to the February 2018 report of Dr. Eleonora Rosati

A response to the February 2018 report of Dr. Eleonora Rosati (U Southampton) for the Policy Department for Citizens’ Rights and Constitutional Affairs

The recent report by Dr. Rosati echoes concerns she has raised before in her IP Kat blog about whether limiting the TDM exception in the proposed DSM directive is “ambitious” enough, noting in her report that innovation could come from TDM projects undertaken by business concerns. This seems to be a topic where I must disagree with Dr. Rosati, who I think usually demonstrates thoughtful balance on matters of IP and is a reliable reporter about new cases. In this paper Dr. Rosati suggests that because useful insights might be obtained by copying copyright or database content and applying TDM technologies to it, for commercial purposes by commercial actors, this should be the basis for a copyright exception (and presumably a database directive exception). This is not, however, really a standard that should be applied to exceptions, which of course are governed by the Berne 3-step test as traditionally applied in EU directives, with respect for rights holders and commercial licensing alternatives. Those traditions also focus on non-commercial research or educational purposes, as do the UK exception from 2014 and the Google Books case from a US “fair use” perspective, both of which are quoted in the report.

Dr. Rosati’s report suggests an expansion of current proposals to business actors (extending this from the proposed public-private partnership concept) and apparently to all kinds of copyright material, from academic literature to news, perhaps to film as well. Rosati discounts the licensing and permissions options and alternatives that currently exist, including through the CCC’s “RightFind” program (fair disclosure: I am a member of the CCC board) and the direct permissions, options and policies of scholarly journal publishers, apparently asserting that there is still too much “legal uncertainty”. Interestingly, the examples given of IBM Watson projects, touted as quite significant from a research or business results perspective, all most likely used the existing permission or licensing mechanisms. If the content that will be subject to the TDM exception for academic research, the scholarly journals literature, is largely available through policy, permissions or licenses, then Dr. Rosati has not made a convincing case for a permission-less environment. Rosati must add orphan works and out-of-commerce works, and mention news, general media sources and photographs, in order to bolster a narrative of legal uncertainty (and the orphan works evidence from UK cultural institutions, as Rosati acknowledges, was documented prior to the adoption of the Orphan Works directive).

Fundamental to any law-making is the question of what problem or issue the law is intended to address, and then to attempt to formulate and implement the law so as to avoid unintended consequences, in other words to choose the right tool for the job. The STM association said in 2016 (again, full disclosure: I chaired the STM Copyright Committee during this time) that “STM publishers support, invest in and enable text and data mining”, noting specifically the CrossRef initiative, which was not mentioned in the Rosati report (see report). The Commission itself, in its 2015 working plan for the DSM, noted the link between exceptions and licensing alternatives when discussing new methods of copying content, indicating that in the EU, as in most countries, the question often is whether there is a market “gap” that rights markets are not currently addressing, even while noting possible research gains through greater legal certainty for TDM rights.

From a purely legal perspective, I believe that Dr. Rosati omitted several important recent European decisions which suggest that indexing and linking activities do implicate the communication rights from the InfoSoc directive (see some recent decisions in cases brought by the Dutch organization Stichting Brein), and I believe that Dr. Rosati’s discussion of the Google Books fair use decision is somewhat simplistic. With respect to the latter, Dr. Rosati is correct in noting the 2nd Circuit’s discussion of the utility of searching across the entire Books corpus for linguistic matching, and that this did factor into the court’s finding of fair use. However, fair use analysis in the US is always a matter of weighing a number of factors, of which the purpose of the use (particularly whether an activity is more akin to non-commercial research or to a commercial activity) and the impact on the market are two very critical factors. If the facts of the case are varied just a little (for example if Google were displaying not just snippets but whole works, or were generating more commercial revenue through advertising), then a very different result might have been reached. In my view, Google Books does not stand for a wide fair use finding from scanning books into a database to be used for “non-consumptive” purposes (to use a phrase that the Google lawyers coined). In fact a new decision this week from the same 2nd Circuit, Fox v. TVEyes, also involves the creation of a database of content that is indexed for the convenience of users, but adds a viewing opportunity as well, which the court found went too far for a fair use finding. The court commented that even in the Google Books decision, the court “cautioned that the case test[ed] the boundaries of fair use” (TVEyes decision).

Text and data mining might well lead to important activities and research results, and for this reason most STM journal publishers are on record as strongly supporting academic research projects (some go further, as Dr. Rosati mentions). In fact, by working with organizations such as CCC and CrossRef, they are actively enabling the normalization activities that Rosati mentions as still being critical to the technical processes (see also the STM Declaration covering twenty-one leading publishing houses). Other copyright sectors would be rightfully concerned about their works being caught up in an exception intended for scholarly research. Commercial beneficiaries are currently obtaining licenses and permissions, and doing so on a commercial and pragmatic basis, as demonstrated by Dr. Rosati’s own list of IBM Watson projects. It is not at all clear to me why an exception should be applied to an active and growing copyright market for the benefit of large technology companies.

The bottom line: there is not a strong legal basis, and there is an even weaker policy basis, for expanding the proposed exception to all types and forms of copyright and database content, or for expanding the number of beneficiaries. Doing so would violate some of the law-making fundamentals noted above, as well as the EU’s Berne obligations.

Mark Seeley