Publishing: Business & Law

I was at BookExpo in late May 2018 for a deep dive on trade publishing (general fiction and non-fiction) trends. I was happy to see the banners for new books, like the huge poster above the registration section for Susan Orlean’s new “The Library Book”, and the relentless focus on authors throughout the fair, with lines around booths for author signings; I also noticed a lot of imprints that I wasn’t aware of even as a fairly regular consumer of such books. I got updates directly from panels with three trade house CEOs and with the leaders of three copyright organizations, and I’m happy to share my views and comments here.

The overall news seemed moderately positive for the books industry, and probably for publishing more generally. Bookstore sales are flat to up, unit sales are up modestly, and audiobook recordings are growing very significantly (although from a small base). Print seems to be holding its own and even growing a bit more than e-books; I’m not completely sure that’s good news, but I suspect it does help on the margin side given Amazon’s presence in the e-book market. Overall, books are resilient, and I think this speaks to an important part of our culture: books and their authors matter. It was good to hear from John Sargent (Macmillan), Carolyn Reidy (Simon & Schuster) and Markus Dohle (Penguin Random House) on these topics. Censorship and the role of publishing were also topics of conversation, with John talking about Michael Wolff’s “Fire and Fury” and Trump’s litigation threat (famously, Macmillan’s reaction was to get the book to press more quickly than planned).

On the copyright law side, there were important updates on the small claims court bill and on continued work on the Copyright Office modernization bill (including the question of whether the Copyright Office should properly sit in the executive branch or remain part of the legislative branch). The panel, including Maria Pallante (Association of American Publishers), Keith Kupferschmid (Copyright Alliance), and Mary Rasenberger (Authors Guild), spoke about the greater openness of the current administration to discussions about IP and copyright in terms of trade, work on anti-piracy, and the like. The panelists noted the significant influence that Google had in the prior administration, often with an anti-IP bent. However, these are still early days in an administration that all agree is difficult to forecast and assess, and the number of true copyright experts and advocates in Congress is dwindling with the departures of legislators such as Goodlatte and Hatch. Pallante spoke about the importance of balance in the legal system, Keith about the vitality of the copyright community (or communities) as represented by his organization the Alliance, and Mary about the impact of piracy on book author royalties. The Guild of course is deeply involved in a program to review publisher and author contracts, as the Guild takes the view that while publishers seem to be holding the line on revenues, authors have seen their incomes decline significantly over the past 5 years (see https://www.authorsguild.org/where-we-stand/fair-contracts/), which Mary also touched on. I didn’t have the chance to ask Mary for the Guild’s view of the recently announced Authors Alliance (a group of “academic” authors and copyright law professors) initiative on publishing contracts (see https://www.authorsalliance.org/2018/05/29/meet-the-contracts-guide-team/), which appears to cover some of the same ground.

All in all, it felt comfortable to spend a day or so in my “publishing silo”, even with the concerns and controversies—the presence of Amazon looming over all, for example. But as a reminder that many things happen outside your comfort zone, I’ve been reading and monitoring other recent developments and controversies over the past several weeks (when I’ve been a bit distracted with home and garden projects), including:

  • Unfortunately, WIPO these days seems more focused on copyright exceptions than on pro-copyright policy, and no doubt this will continue, with tech industry-funded distortions playing on the “north-south” divide and with politics played around the draft broadcasting treaties, which have been poised for finalization for several years now.
  • The Music Modernization Act, with its “Classics” portion that would harmonize protection for pre-1972 sound recordings in the US, passed the House unanimously but now has a rival bill (“Access”) supported by some copyright law professors and anti-IP organizations (in many cases a similar list to the Authors Alliance team), which as the former RIAA EVP Neil Turkewitz notes plays into the hands of the tech industries.
  • Canada is finally conducting a review of the copyright revisions put into place several years ago, which decimated secondary rights for authors and publishers, particularly in the educational context. John Degen of The Writers’ Union of Canada (and current chair of the International Authors Forum) has been doing his usual stellar job of noting where the emperor of usage has no clothes (“freelance” or professional authors with decreasing remuneration), and raising concerns about the spread of the anti-remuneration narrative to other countries such as Australia…
  • Battles over copyright issues in Europe continue (Digital Single Market proposals), with questions raised over the news publishers’ right (which naturally Google opposes), non-authorized commercial text & data mining rights (which companies like Google and the tech industry organizations desperately want), and hopefully a bit of help on enforcement issues.

All in all, the state of publishing, authors and copyright law remains precarious. Revenues are far from robust, even if they are not imploding. Authors are under enormous pressure (sometimes, I must admit, from tough negotiations with publishers). Piracy is rampant, both print and digital. Amazon seems perfectly content to discount books to a razor-thin margin, presumably as a “loss leader” in its goal of selling more groceries and shoes, putting huge pressure on other distributors and retailers. Google and some of its allies seem to invent organizations and hire lobbyists, advocates and other professionals at breathtaking speed and scale (although some copyright professors seem to resent being identified as Google-funded, judging by reactions to a slew of articles on this topic from last year, e.g. https://www.wsj.com/articles/paying-professors-inside-googles-academic-influence-campaign-1499785286). Every positive or at least reasonably balanced draft law, treaty or initiative seems to be met with a remarkably quickly organized response or counter-proposal, often with a similar anti-copyright narrative that starts with notions of huge populations of users who are deprived of access and of tech industry players that can “transform” content into search nirvana. Somehow creators are expected to continue producing, but now for the benefit of technology.

But as BookExpo shows, the book and the author continue to be celebrated.  I need to get back to the books I’m currently reading, which I’ve split between Mark Bowden’s “Hue 1968” which tells a story about Vietnam that I knew as a teenager, but not well… and Ronan Farrow’s “War on Peace” which I fear will be prescient about other US-led foreign ventures without adequate support from diplomats and non-lethal advisers.

Mark Seeley
June 2018

EU Open Science Platform

The EC published its tender specifications for the Open Research Publishing Platform at the end of March, and as I suggested in an earlier blog post on 20 March, it is completely open to “all natural and legal persons”, at least within the EU (due to Brexit, UK organizations appear to be excluded). I think the Commission is showing a commendable lack of prejudice, and good common sense as well, in being open to participants with publishing expertise (whether university- or library-organized, funder-led, an NFP society, or a commercial entity, whether publisher or other vendor). The Commission’s tender document is ambitious and demanding (more on this later), so it will require a competent organization or consortium of entities to fulfill. Some of the ambition is about technical performance (the 99.9% up-time requirement), some of it is about networking capabilities, but some is also about combining publishing requirements (open peer review) with the technical issues. There is a further area of ambition in requiring a preprint server capability, with linking and automatic repository posting features, while providing no funding for it.

It had been suggested on Twitter and in some media (see my prior post and Twitter comments back and forth) that commercial publishers such as my former employer Elsevier should be automatically disqualified because they do not support Open Access enough (odd, because two of the three largest OA publishers are commercial publishers), and because existing publishers and trade associations have the temerity to advocate for sound OA policies (i.e. publishing Green OA with embargo periods, given that Green OA, in contrast to Gold OA, means that no funds have been provided for the formal publishing activities). Helpfully, the Commission was quite universal in its approach, while quite prescriptive in its requirements.

Richard Poynder retweeted Martin Eve’s analysis of the tender document (see the analysis linked here and below), which was quite a good analysis of the ambition of the project. I assume Poynder is suggesting that Elsevier would regard it as too much work for too little reward, which I do think many organizations would agree with! Bianca Kramer did an excellent job with a 17-point Twitter analysis on 2 April, which I describe below.

Three key themes of the tender document

Running throughout the tender document are three themes: ambition/demand (particularly on the technical side); control/authority on the part of the EC regarding publication processes (the open peer review process, preprints and repositories, standards such as CC BY); and the design of a “scientific utility” which can later be taken over directly by the EC or transferred to a new party (building a platform that is highly portable). While there is nothing wrong with ambition, and government and other funders should always ensure they are getting value for money, I agree with some of the early critics that it is hard to see how existing scholarly communications participants, including established publishers, will be eager to bid, other than for the joy of the sheer challenge!

The EC might want to consider whether it needs to make more trade-offs to get the platform that it wants, with all of the technical and portability requirements, by being less prescriptive about the publishing process: for example, by being flexible on staffing vs. automation, or by not insisting on open peer review, which is uncertain in effect and might well impact the timeliness of formal publication. It might be that incorporating the possibility of open reviews and post-publication comments, without requiring that peer reviewers openly post their comments and identities, would be more practical. Even among strong supporters of open review, there is some disagreement over the exact meaning of open peer review (see the 2017 review by Tony Ross-Hellauer).

Technical ambition/demand

As others have noted, the technical demands of the system are considerable. First, building a reliable publishing services platform, with author submissions, peer review, and external linking, especially to non-publication resources (publication resources would no doubt link through CrossRef), is non-trivial. There are many vendors in the scholarly communications space now who have worked hard to provide scalable and reliable services, generally on a proprietary and highly customized basis. Online submission and review processes challenge most publishers, and the larger the scope of activity, the larger the challenge. The contents must be made available in multiple formats, with significant download activity expected (especially for text and data mining purposes). Responsiveness at the level of 99.9% might be difficult to attain if the content is being constantly accessed and mined. Registration through the use of ORCID and other EU systems is required (though common sign-in protocols will no doubt become more pervasive in any event). In addition to the identifiers, DOIs must be assigned for all article versions, and logs must be made available of all interactions. Somehow the system must be able to populate institutional and other repositories on an “automatic transfer” basis (at the request of the author). Preprints must be annotated with appropriate, CrossRef-style links. Quite a few standards have to be met, including Dublin Core for metadata, LOCKSS for archiving, and various graphics requirements, although established publishers are already navigating these.
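To ground the identifier and metadata obligations, here is a minimal sketch of the kind of per-version record such a platform would have to maintain (a DOI for every article version, ORCID iDs for contributors, Dublin Core descriptive fields). This is purely my own illustration: the tender prescribes the standards, not a data model, and every name below is an assumption.

```java
import java.time.LocalDate;
import java.util.List;

/**
 * Illustrative sketch only: one plausible shape for the per-version metadata
 * the tender requires. All type and field names here are my own assumptions,
 * not anything specified in the tender document.
 */
public record ArticleVersion(
        String doi,                      // one DOI per version, e.g. "10.xxxx/plat.12345.v2" (hypothetical)
        int versionNumber,
        List<Contributor> contributors,
        DublinCore metadata,
        LocalDate published) {

    /** Contributors are identified via ORCID, per the registration requirement. */
    public record Contributor(String name, String orcid) { }

    /** A few core Dublin Core elements (dc:title, dc:creator, dc:subject, dc:rights). */
    public record DublinCore(String title, List<String> creators,
                             List<String> subjects, String rights) { }
}
```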

Quite a lot of reporting is required, not only to the EC but also at the author and funding agency level, with citation information. Much of this is being done now; in fact F1000 (based in the UK, so probably disqualified) does much of this kind of reporting for users already (seen in the screenshot above). Finally and fundamentally, the software to be used must be commercial off-the-shelf or open source, and specifically any “proprietary/exclusive technologies that are not available to other solution providers are not acceptable.”

So plenty of challenges.

Publishing process controls

The tender gives a nice diagram of the publishing process in the context of platform requirements, as shown below…

The general workflow diagrammed here is very recognizable and common, although it is important to note that there is both a preprint server aspect (it is unclear what the relationship is between Horizon 2020 funding and the preprint requirement) and a general publication process. The diagram also over-simplifies the “first level check” requirements (which are not explored in the tender document in any detail), though perhaps this is something like eLife or PLOS initial screening. One might assume that a plagiarism check through CrossRef is contemplated, but again this is not clear (the tender document itself refers to “editors” performing these checks, so it sounds more manual than automated). The ALLEA code of conduct is referenced, but this is a general set of principles rather than a process-oriented document.
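Because the tender leaves the “first level check” undefined, the sketch below is speculative: it shows how an eLife- or PLOS-style screening step might be expressed, with every rule and threshold being my own assumption rather than anything in the tender document.

```java
/**
 * Speculative sketch of a "first level check". The tender does not define one,
 * so the rules below are assumptions drawn from common screening practice
 * (funding scope, text-similarity score, ethics and funding statements).
 */
public class FirstLevelCheck {

    public record Submission(String title,
                             double similarityScore,       // e.g. from a CrossRef-style similarity service
                             boolean inScopeForHorizon2020,
                             boolean hasEthicsStatement,
                             boolean hasFundingStatement) { }

    /** Returns a reason for rejection, or null to pass the submission to peer review. */
    public static String screen(Submission s) {
        if (!s.inScopeForHorizon2020()) return "Not Horizon 2020-funded research";
        if (s.similarityScore() > 0.30) return "High text overlap; needs manual plagiarism review";
        if (!s.hasEthicsStatement())    return "Missing ethics statement";
        if (!s.hasFundingStatement())   return "Missing funding statement";
        return null; // proceeds to (open) peer review
    }
}
```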

Some of the criteria sections point to proven experience in developing and managing scientific publishing services, and note the requirement to establish a strong editing and project management team in addition to the technology staff. Importantly, there are requirements for establishing a scientific advisory board (a fundamental step in establishing any new journal), which is also important in helping to recruit qualified peer reviewers. Interestingly, the tender document says that the contractor “will be required to gather broad institutional support and the involvement of the research community in many fields across Europe and beyond… [helping to establish the] Platform as a successful and innovative publishing paradigm for research funded by Horizon 2020”, without any indication of how the Research directorate or the Commission itself might help in this mission. Perhaps this is why the document is so heavy on requirements for communications initiatives and staff.

There are very specific requirements for editing, proofreading, layout and production, familiar to established publishers, in addition to communication and networking. It is interesting to review the staffing requirements; one might wonder whether, with the use of more online resources, some of this work could be done more efficiently.

Finally, notwithstanding the notion of respecting authors and their copyright (or that of their institutions or funders), there appears to be a straightforward requirement for CC BY Creative Commons licenses, which of course many OA advocates equate with OA publishing, i.e. the broadest possible re-use rights. Journal authors, however, when asked whether they have concerns about CC BY and commercial or derivative uses, do not seem as wholeheartedly enthusiastic (see the Taylor & Francis surveys).

Building the scholarly communications utility (portability)

The framework contract itself has a duration of 4 years, after which the EC expects the system to be operating well, according to the technical functionality requirements, and with a minimum of 5,600 OA articles posted using the strict CC BY licensing approach, plus some number of preprints. Perhaps more importantly, the Commission appears to contemplate transferring the operation of the platform to itself or to some other party or parties at some point. The successful bidder will thus be responsible for ensuring that it can be eased out of the picture, with an appropriate depth of knowledge transfer. Though this might be helpful in ensuring transparency, it will likely be a de-motivating factor in the bidding process.

The price schedule (Annex 8)

While Annex 8 is only a form, the EC has made clear that while there may be some “building” costs contemplated in the early phase of the process, the Platform is supposed to operate financially on the basis of a price per peer-reviewed article (assuming there will be 5,600 of those). I do remember that at some point the NIH in the US indicated it was building and operating the PubMed Central database for around $4.4m a year (see the 2013 Scholarly Kitchen post). PMC hosts many hundreds of thousands of manuscripts, so presumably the EC will be looking for a cost significantly below that. It is important to remember, however, that in addition to the technical requirements and the staffing requirements (editorial and technical), there will also be costs involved on the preprint side. Of interest is the comment that the bidder “will not charge the Commission for the process leading to the posting of pre-prints or for articles that have been rejected during the initial checks.”
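The pricing model itself reduces to simple division; the sketch below uses an entirely hypothetical total cost figure (the real number is precisely what bidders must supply in Annex 8):

\[
\text{price per article} \;=\; \frac{\text{total platform cost over the framework}}{5{,}600}\,,
\qquad \text{e.g. } \frac{\text{a hypothetical } €5.6\text{m}}{5{,}600} \;=\; €1{,}000 \text{ per article.}
\]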

Other summaries/analysis

As noted, I thought the analysis by Bianca Kramer on 2 April was very good (it is hard to capture 17 salient points on Twitter), noting that certain Open Science protocols and requirements were not incorporated. The analysis was also critical that O/S software was not required for all functionalities (though the requirement is either publicly available “off-the-shelf” technology or O/S, so in any event nothing proprietary/private), finding perhaps that the tender was not ambitious enough!

Oracle v Google

The Federal Circuit ruling (27 March) on copyright grounds in this long-standing dispute over Oracle’s Java platform and Google’s use of Java APIs in its Android phone operating system has been criticized by several copyright-skeptic scholars as a step backwards in fair use analysis (not to mention on the underlying foundational question of Java API copyrightability in the first place). Some have even suggested that the Federal Circuit, created to be an exclusive appeals court for patent cases, is out of its depth in copyright. For more pro-copyright scholars and advocates, however, the CAFC fair use analysis correctly emphasizes the fourth factor, impact on the market, and is not inconsistent with the recent 2nd Circuit TVEyes decision. Probably not surprisingly, I tend toward the latter camp, while acknowledging that the entire case (more than 8 years long) has been convoluted and confusing, including on some core questions of which issues are matters of law and which are matters of fact.

The March 2018 decision is not about software copyrightability; that question was already settled in the CAFC’s 2014 decision, after which the district court was instructed to evaluate the fair use defense (which Google won). The CAFC however was not satisfied with the district court and jury verdicts and process, and determined that Google’s use was not fair as a matter of law. There might be a question here about facts vs. law, and the role of the jury in such cases, but in my view the CAFC’s fair use analysis is not in and of itself remarkable or clearly flawed.

The decision and the case are controversial because of the foundational question of software copyrightability, and the role of copyright in an API infrastructure. After all, APIs usually involve entities creating code and then offering unaffiliated developers methods to access that code in order to develop further applications. Some have suggested that the entire world of APIs will now be subject to a chilling effect. But has Oracle/Java been unclear about which kinds of developments and applications it supports with free API access and which it wants to handle on commercial terms? As noted below, I think Oracle has been explicit about which uses require licenses and which do not.
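To make that concrete, the case turns on the distinction between “declaring code” (the package/class structure and method headers that programmers learn and invoke) and “implementing code” (the method bodies that do the work): Google copied the declaring code of 37 Java API packages verbatim but wrote its own implementations. A minimal illustration, using the java.lang.Math.max example often cited in commentary on the case (the body shown is the obvious implementation, written for illustration, not necessarily Oracle’s exact code):

```java
public final class Math {
    // Declaring code: this header (and the surrounding package/class
    // structure) is the kind of text Google reproduced across 37 API
    // packages so that Java programmers could call familiar methods on Android.
    public static int max(int a, int b) {
        // Implementing code: Google wrote its own method bodies rather than
        // copying Oracle's; this one-liner is shown purely for illustration.
        return (a >= b) ? a : b;
    }
}
```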

Professor Samuelson has already criticized the court’s prior 2014 decision (as noted, this case has meandered across many sub-decisions) for its reliance on the 3rd Circuit’s 1986 Whelan “structure, sequence and organization” (SSO) type of analysis for software products. In her 2015 article “Functionality and Expression in Computer Programs: Refining the Tests for Software Copyright Infringement”, Samuelson described the SSO test as one of four approaches among the circuits, with the SSO (3rd Circuit) approach being essentially a merger analysis (the underlying expression is protectable if there is more than one way of performing the function); she noted that the SSO approach is “now mostly discredited,” and suggested that the court’s outright merger analysis was also overly simplistic. Samuelson argues that the other circuits take a more sophisticated and more appropriate “filtering” approach (Altai, 2nd Circuit, et al.) by analyzing whether some copying of the original software was required for “interoperability,” and that courts in future decisions would be better served by using these other filtration tests.

Although I have not re-read all of these cases recently, it is easy to understand why the Whelan test would be easier to meet, and would therefore offer more copyright protection to more software modules. Does this rise to the level of judicial error? These methods for testing software copyrightability are all about functionality, and perhaps even the idea/expression dichotomy. In my view, there is error only if the Whelan test would provide protection for something that is intrinsically utilitarian; the notion that there are multiple methods of skinning the Java API cat, and the fact that the copyrightable elements found by the CAFC in Java are not about appearance, format or underlying algorithms, suggest that there is no clear fundamental error. There may be better tests, and there may be a statutory question about whether more or less copyright protection should be afforded to software, but those questions do not rise to the level of judicial error. More protection might mean more permission-seeking, more licensing arrangements (commercial or O/S), and fewer assumptions made about copying APIs; in my view this is not a bad policy result.

With respect to Oracle’s Java APIs, the 27 March decision noted that Oracle provides the programming language itself free and available for use without permission, but had “devised a licensing scheme to attract programmers while simultaneously commercializing the platform,” part of its “write once, run anywhere” approach. Consequently, if entities wanted to use the Java APIs to create a competing platform or in new devices, which would clearly include Google’s Android platform, Oracle would want to license such activities on a commercial basis. The parties were unable to reach agreement on licensing terms, and Google decided to take the risk that Oracle would not enforce its rights or that a court would find Google’s use non-infringing. It is in this sense that allegations of bad faith were made, although I do not think the bad-faith narrative impacted the final decision.

To me, then, the fundamental question is: shouldn’t a copyright holder be able to make exactly those kinds of commercial determinations, i.e. a strategy to be partly “open” (wanting to attract programmers) while being more commercial when it comes to competing platforms and products? Why shouldn’t Google/Android have to share more revenue from a very successful platform built at least in part on Java?

Once the foundational copyrightability question is accepted (however reluctantly by some), the question becomes the step-by-step four-factor fair use analysis, where the CAFC in the recent decision found that Google had little to offer in its fair use arguments, and had even perhaps conceded some points. In my view the analysis is straightforward, and the court does not seem to suffer markedly from any patent-law myopia.

Turning to the first factor, the purpose of the use (where the question of “transformative” use is often discussed), the court found that Google’s use was not transformative as the copy’s purpose was the same as the original’s, and the smartphone environment was not a new context.  The court also found that Google had a clearly commercial purpose, notwithstanding that Google does not charge for the Android license—reaching back to Napster and finding that even free copies can constitute commercial use, and that Google derives substantial revenue from related advertising.

The second-factor analysis relied heavily on assumptions about how a jury might view the creative aspects of writing software, but then discounted the overall impact of this factor. Perhaps the court felt it was not necessary in any event, given its strong views on the first and fourth factors.

On the amount of the work used (the third factor), there is no dispute that Google copied the entirety of the 37 APIs in question, but Google argued that since it actually used only portions of the works, this should be viewed in its favor, citing Kelly v. Arriba Soft. The CAFC however asserted that Kelly should be read more narrowly, as applicable only when a transformative use has first been found, and noted that Google’s copying of more than was required also weighed against a fair use finding (although the decision later describes this factor as somewhat neutral in its overall weighing of the factors).

Finally, on the effect on the market, the court was not impressed with Google’s argument that Oracle was not making smartphones or developing a smartphone platform (so no market harm), noting that potential markets for derivative works are still highly relevant (this might seem critical of the Google Books decision, where nascent markets were not given much credence, although the smartphone market might have been further along than an archival e-book market). The record also showed the impact of Android products on Oracle’s negotiations with Amazon, and the CAFC found the Oracle-Google negotiations highly relevant as well. The court also seemed to find Oracle’s strategy of being “partly open” and “partly commercial” when it comes to competing platforms, as described above, to be a clear and reasonable market approach.

The matter now goes back to the district court for assessment of damages, and the street press has those damages at billions of dollars.

Open Access platform in the EU

I’ve seen a few retweets about a 17 March article in University World News called “Open science in the EU—Will the astroturfers take over?”, and my first thought was that it must be a mistake, because pro-copyright organizations in the EU have recently been using the term “astroturf” to describe organizations funded by US technology companies to oppose copyright. However, the UWN article, which appears to operate without any sense of irony at all, decries publisher trade associations’ pro-copyright advocacy (with a quote from Jon Tennant on “corrupt” policy processes) while commending an “open letter” from a number of organizations in the library and research agency space, including several of those anti-copyright organizations that deem themselves protectors of the “public domain” (such as Communia and C4C), which explicitly note that their primary mission is public policy advocacy.

Advocacy in the EU is somewhat different from advocacy in the US: in the EU, advocacy by individual companies (publishers or technologists) is frowned upon, but comment and perspective are certainly welcomed from trade associations, industry groups, and organizations of all kinds. I don’t see corruption in any of this. What I have observed in the DSM process so far is a robust exchange of differing proposals and ideas, from the Commission itself, from Parliamentary committees, from Member States, and from industry and advocacy organizations. I have seen proposals that I think are idiotic (gratis commercial TDM rights, as mentioned in my last post), and others that I think are positive (more responsibility on the part of web sites and platforms for unauthorized postings). I commend the Commission for its work in trying to strengthen the European online market, which in my view has suffered from a huge “value gap” (see the October 2017 statement from the European Authors’ Societies) between technology platforms that build commercial services using the work of others, and authors and publishers. I was on a panel on this topic in 2016 at the University of Amsterdam.

It is also ironic that we are discussing some of these issues amid recent events such as the commercial exploitation of Facebook data (which, as Anne Bergman of the FEP noted, sounds an awful lot like commercial TDM…).

The recently released report from the Campaign for Accountability about Google influencing the European debate through massive funding of research organizations is also highly relevant in this discussion about advocacy—this is advocacy and policy influencing at a mass scale, something that probably only well-funded technical organizations can do.

Putting irony aside, let’s look at the primary topic of the article: the request for proposals for an Open Access platform for data and publications, which will start up at the end of March 2018. The article adds to the old lie about publishers intending to “undermine science” and playing a “blocking role”, with specific references to my former company Elsevier and the DEAL negotiations in Germany. The facts are quite clear that commercial publishers are publishing a huge amount under the OA model now, and that uptake of the model is increasing enormously, as shown in a recent PeerJ article (“The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles” by Piwowar, Priem et al.) posted as a preprint in February 2018.

The PeerJ preprint shows very significant OA growth by the commercial publishers Elsevier and Springer Nature. I’m not sure how the treatment of Green OA impacts the overall analysis, but the graphic rather nicely demonstrates the fallacy of saying that commercial publishers, and Elsevier specifically, are against OA publishing (it is important to remember that the EU 2020 OA discussion document has a broad definition of OA which includes publishing in “hybrid” OA journals).

The University World News article goes on to discuss how odd it would be if an established journal publisher were to submit a bid for the 2020 platform project, and how strange it would be if such a publisher were accepted for the project. This is described by the LERU secretary-general Kurt Deketelaere as a “quite crazy” idea. In fact I think it would make a huge amount of sense if an established publisher were selected to develop and operate the platform, which is described in the December 2017 Information Note as requiring high-quality scientific publishing, technical implementation, professional publication process management, plus communication, take-up and sustainability. Existing publishers, commercial and non-commercial, should be the clear front-runners in such a bidding process. I’m reminded of Kent Anderson’s Scholarly Kitchen post on the value that journal publishers bring to scholarly communication (the February 2018 update of “102 things that Journal publishers do”); existing publishers of all stripes are uniquely qualified to make such a platform work.

There is a separate policy question as to whether the 2020 platform project is a sensible use of EU taxpayer funds. It appears intended to compete with existing journals and existing journal platforms, as if the existing journals and platforms are not doing enough on the OA path, or need to be instructed about how to engage users and research communities (again, see the SK “102 things” post). It is not clear to me why another platform, essentially another mega-journal, will push EU OA towards its 2020 goal more effectively than simply using the methods already outlined by the Commission in funding research grants, as discussed by the STM association in its 2017 update on the framework issue. But putting that aside, if the Research & Innovation directorate wants professional, competent publishing for the new platform, choosing an existing successful publisher would be smart policy.

 

Mark Seeley

A response to the February 2018 report of Dr. Eleonora Rosati (U Southampton) for the Policy Department for Citizens’ Rights and Constitutional Affairs

The recent report by Dr. Rosati echoes concerns she has raised before on her IPKat blog about whether limiting the TDM exception in the proposed DSM directive is “ambitious” enough, noting in her report that innovation could come from TDM projects undertaken by business concerns. This is a topic where I must disagree with Dr. Rosati, who I think usually demonstrates thoughtful balance on matters of IP and is a reliable reporter on new cases. In this paper Dr. Rosati suggests that because useful insights might be obtained by copying copyright or database content and applying TDM technologies to it, for commercial purposes by commercial actors, this should be the basis for a copyright exception (and presumably a database directive exception). This is not, however, a standard that should be applied to exceptions, which of course are governed by the Berne 3-step test as traditionally applied in EU directives, with respect for rights holders and commercial licensing alternatives. Those traditions also focus on non-commercial research or educational purposes, as do the UK exception from 2014 and the Google Books case from a US “fair use” perspective, both of which are cited in the report.

Dr. Rosati’s report suggests an expansion of the current proposals to business actors (extending this from the proposed public-private partnership concept) and apparently to all kinds of copyright material, from academic literature to news, and perhaps to film as well. Rosati discounts the licensing and permissions options and alternatives that currently exist, including through the CCC’s “RightFind” program (fair disclosure: I am a member of the CCC board) and the direct permissions, options and policies of scholarly journal publishers, apparently asserting that there is still too much “legal uncertainty”. Interestingly, the examples given of IBM Watson projects, touted as quite significant from a research or business results perspective, all most likely used existing permission or licensing mechanisms. If the content that would be subject to the TDM exception for academic research, the scholarly journal literature, is largely available through policy, permissions or licenses, then Dr. Rosati has not made a convincing case for a permission-less environment. Rosati must add orphan works and out-of-commerce works, and mention news, general media sources and photographs, in order to bolster a narrative of legal uncertainty (and the orphan works evidence from UK cultural institutions, as Rosati acknowledges, was documented prior to the adoption of the Orphan Works directive).

Fundamental to any law-making is the question of what problem or issue the law is intended to address, and then the attempt to formulate and implement the law so as to avoid unintended consequences: to use the right tool for the job. As the STM association said in 2016 (again, full disclosure: I chaired the STM Copyright Committee during this time), “STM publishers support, invest in and enable text and data mining”, noting specifically the CrossRef initiative, which was not mentioned in the Rosati report (see report). The Commission itself, in its 2015 working plan for the DSM, noted the link between exceptions and licensing alternatives in discussing new methods of copying content, indicating that in the EU, as in most countries, the question often is whether there is a market “gap” that rights markets are not currently addressing, even while noting possible research gains through greater legal certainty for TDM rights.
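As one concrete example of the licensing-plus-technology route referenced above: Crossref’s public REST API can expose publisher-deposited full-text links flagged for text mining, together with the applicable license metadata. A minimal sketch (assuming Java 11+; the DOI is a placeholder):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Minimal sketch: fetch Crossref metadata for a DOI. When publishers deposit
 * them, the JSON response's message.link[] entries flagged with
 * "intended-application": "text-mining" point at minable full text, and
 * message.license[] records the applicable license terms.
 */
public class TdmLinkLookup {
    public static void main(String[] args) throws Exception {
        String doi = "10.1000/example"; // placeholder DOI
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.crossref.org/works/" + doi))
                .header("User-Agent", "tdm-sketch/0.1 (mailto:you@example.org)")
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // inspect message.link[] and message.license[]
    }
}
```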

From a purely legal perspective, I believe that Dr. Rosati omitted several important recent European decisions which suggest that indexing and linking activities do implicate the communication rights of the InfoSoc directive (see some recent decisions in cases brought by the Dutch organization Stichting Brein), and I believe that Dr. Rosati’s discussion of the Google Books fair use decision is somewhat simplistic. With respect to the latter, Dr. Rosati is correct in noting the 2nd Circuit’s discussion of the utility of searching across the entire Books corpus for linguistic matching, and that this did factor into the court’s finding of fair use. However, fair use analysis in the US is always a matter of weighing a number of factors, of which the purpose of the use (particularly whether an activity is more akin to non-commercial research or to a more commercial activity) and the impact on the market are two very critical ones. If the facts of the case were varied just a little (for example, if Google were displaying not just snippets but whole works, or were generating more commercial revenue through advertising), then a very different result might have been reached. In my view, Google Books does not stand for a wide fair use finding for scanning books into a database to be used for “non-consumptive” purposes (to use a phrase that the Google lawyers coined). In fact a new decision this week from the same 2nd Circuit, Fox v. TVEyes, also involves the creation of a database of content that is indexed for the convenience of users, but adds a viewing opportunity as well, which the court found went too far for a fair use finding. The court commented that even in the Google Books decision, the court “cautioned that the case test[ed] the boundaries of fair use” (TVEyes decision).

Text and data mining might well lead to important activities and research results, and for this reason most STM journal publishers are on record strongly supporting academic research projects (some go further, as Dr. Rosati mentions). In fact, by working with organizations such as CCC and CrossRef, they are actively enabling the normalization activities that Rosati mentions as still being critical to the technical processes (see also the STM Declaration covering twenty-one leading publishing houses). Other copyright sectors would be rightfully concerned about their works being caught up in an exception intended for scholarly research. Commercial beneficiaries are currently obtaining licenses and permissions, and doing so on a commercial and pragmatic basis, as demonstrated by Dr. Rosati’s own list of IBM Watson projects. It is not at all clear to me why an exception should be applied to an active and growing copyright market for the benefit of large technology companies.

The bottom line: there is no strong legal basis, and an even weaker policy basis, for expanding the proposed exception to all types and forms of copyright and database content, or for expanding the number of beneficiaries. Doing so would violate the law-making fundamentals noted above, as well as the EU’s Berne obligations.

Mark Seeley

Professional ethics requirements for publishing on preprint servers

Explore the essential professional ethics for publishing on preprint servers. This symposium delves into the guidelines and standards required to maintain integrity and accuracy in academic publishing in the evolving digital landscape.


MARK SEELEY (@marklseeley) consults on science publishing and legal issues through the SciPubLaw LLC entity, and speaks and comments regularly on publishing, licensing and copyright issues on this site, including recently on international publishing contracts, the EU Digital Single Market copyright directive, and Open Access and Transformative Agreements.