Text & Data Mining Exception For the EU Digital Single Market Proposal

A response to the February 2018 report of Dr. Eleonora Rosati (U Southampton) for the Policy Department for Citizens’ Rights and Constitutional Affairs

The recent report by Dr. Rosati echoes concerns she has raised before on her IP Kat blog about whether the TDM exception in the proposed DSM directive, limited as it is, is “ambitious” enough, noting in her report that innovation could come from TDM projects undertaken by business concerns. This is a topic on which I must disagree with Dr. Rosati, who I think usually demonstrates thoughtful balance on matters of IP and is a reliable reporter on new cases. In this paper Dr. Rosati suggests that because useful insights might be obtained when commercial actors copy copyright or database content and apply TDM technologies to it for commercial purposes, this should be the basis for a copyright exception (and presumably an exception to the database directive). That is not, however, the standard that should be applied to exceptions, which are of course governed by the Berne 3-step test as traditionally applied in EU directives, with respect for rights holders and commercial licensing alternatives. Those traditions also focus on non-commercial research or educational purposes, as do the UK exception from 2014 and the Google Books case from a US “fair use” perspective, both of which are quoted in the report.

Dr. Rosati’s report suggests expanding the current proposals to business actors (extending beyond the proposed public-private partnership concept) and apparently to all kinds of copyright material, from academic literature to news, and perhaps to film as well. Rosati discounts the licensing and permissions options that currently exist, including the CCC’s “RightFind” program (fair disclosure: I am a member of the CCC board) and the direct permissions, options and policies of scholarly journal publishers, apparently asserting that there is still too much “legal uncertainty”. Interestingly, the examples given of IBM Watson projects, touted as quite significant from a research or business results perspective, all most likely used the existing permission or licensing mechanisms. If the content that will be subject to the TDM exception for academic research, the scholarly journal literature, is largely available through policy, permissions or licenses, then Dr. Rosati has not made a convincing case for a permission-less environment. Rosati must add orphan works and out-of-commerce works, and mention news, general media sources and photographs, in order to bolster a narrative of legal uncertainty (and the orphan works evidence from UK cultural institutions, as Rosati acknowledges, was documented prior to the adoption of the Orphan Works directive).

Fundamental to any law-making is the question of what problem or issue the law is intended to address, followed by an attempt to formulate and implement the law so as to avoid unintended consequences; in other words, to choose the right tool for the job. As the STM association said in 2016 (again, full disclosure: I chaired the STM Copyright Committee during this time), “STM publishers support, invest in and enable text and data mining”, noting specifically the CrossRef initiative, which was not mentioned in the Rosati report (see report). The Commission itself, in its 2015 working plan for the DSM, noted the link between exceptions and licensing alternatives in discussing new methods of copying content, even while noting possible research gains through greater legal certainty for TDM rights. This indicates that in the EU, as in most countries, the question often is whether there is a market “gap” that rights markets are not currently addressing.

From a purely legal perspective, I believe that Dr. Rosati omitted several important recent European decisions which suggest that indexing and linking activities do implicate the communication rights from the InfoSoc directive (see some recent decisions in cases brought by the Dutch organization Stichting Brein). I also believe that Dr. Rosati’s discussion of the Google Books fair use decision is somewhat simplistic. With respect to the latter, Dr. Rosati is correct in noting the 2nd Circuit’s discussion of the utility of searching across the entire Books corpus for linguistic matching, and that this did factor into the court’s finding of fair use. However, fair use analysis in the US always involves weighing a number of factors, of which the purpose of the use (particularly whether an activity is more akin to non-commercial research or to a commercial activity) and the impact on the market are two very critical ones. If the facts of the case were varied just a little (for example, if Google had displayed whole works rather than just snippets, or had generated more commercial revenue through advertising), a very different result might have been reached. In my view, Google Books does not stand for a broad finding that scanning books into a database to be used for “non-consumptive” purposes (to use a phrase that the Google lawyers coined) is fair use. In fact a new decision this week from the same 2nd Circuit, Fox v. TVEyes, also involves the creation of a database of content that is indexed for the convenience of users, but adds a viewing opportunity as well, which the court found went too far for a fair use finding. The court commented that even in the Google Books decision it had “cautioned that the case test[ed] the boundaries of fair use” (TVEyes decision).

Text and data mining might well lead to important activities and research results, and for this reason most STM journal publishers are on record as strongly supporting academic research projects (some go further, as Dr. Rosati mentions). In fact, by working with organizations such as CCC and CrossRef, they are actively enabling the normalization activities that Rosati mentions as still being critical to the technical processes (see also the STM Declaration covering twenty-one leading publishing houses). Other copyright sectors would be rightfully concerned about their works being caught up in an exception intended for scholarly research. Commercial beneficiaries are currently obtaining licenses and permissions, and doing so on a commercial and pragmatic basis, as demonstrated by Dr. Rosati’s own list of IBM Watson projects. It is not at all clear to me why an exception should be applied to an active and growing copyright market for the benefit of large technology companies.

The bottom line: there is not a strong legal basis, and there is an even weaker policy basis, for expanding the proposed exception to all types and forms of copyright and database content, or for expanding the number of beneficiaries. Doing so would violate some of the law-making fundamentals noted above, as well as the EU’s Berne obligations.

Mark Seeley