The right to read is the right to mine

To celebrate the 10th International Open Access Week, Cambridge University has placed a digitised version of Stephen Hawking’s 1966 PhD thesis, “Properties of expanding universes”, online for anyone to read and download. Hawking is quoted as saying: “Anyone, anywhere in the world should have free, unhindered access to not just my research, but to the research of every great and enquiring mind across the spectrum of human understanding.” Hawking’s PhD was already the most-requested item from Cambridge University’s open access repository, with the catalogue record alone attracting hundreds of views per month. When it was announced that the thesis was available online, it was downloaded 60,000 times in 24 hours. Indeed, demand was so great that it took the repository server down.

The global popularity of Hawking’s PhD may be a special case, the result of his personal fame. But it is inarguable that more people will read scholarly work if it is freely available, rather than locked away in a physical format or in a digital version with restricted access. It’s rather remarkable that Hawking’s thesis has only now been released, over 50 years after it was submitted. It’s equally remarkable that Cambridge University has only just announced that from “October 2017 onwards, all PhD students graduating from the University of Cambridge will be required to deposit an electronic copy of their doctoral work for future preservation.” Even then, it will remain optional whether those copies are released under an open access (OA) licence. That’s a huge missed opportunity to make open access the default for all academic work at Cambridge University. It means that research paid for by the public will still not be readily available as a matter of course.

The open access movement’s battle to liberate academic texts has been going on for around two decades. Cambridge University’s limited – though welcome – moves in this respect show how much work remains to be done. Many of the challenges arise from the fact that the key ways of sharing academic knowledge were devised in the analogue era. The arrival of computers and the Internet means this long-established system is no longer optimal, and needs to be replaced with new ways of disseminating knowledge that take advantage of digital technology. The transition from the old methods to the new inevitably meets resistance from those who have invested time and energy in the time-honoured way of doing things.

However, the same cannot be said about an exciting extension to open access: text and data mining (TDM). It’s a completely new field that has only recently become possible, thanks to the availability of large numbers of online texts and databases, and of low-cost but powerful computers that can analyse them. The basic idea of TDM is to reveal new facts and information by bringing together existing text and data, and finding patterns within them that are hard to spot using manual techniques. It’s rather like the difference between viewing a landscape from the ground, where individual details predominate, and looking down from a plane flying at high altitude, where large-scale and otherwise elusive structures can emerge.
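For readers unfamiliar with TDM, a minimal sketch may help show what “finding patterns” can mean in practice. The Python below counts which terms of interest appear together in the same text, then flags pairs that never co-occur directly but are both linked to a shared third term. It is loosely modelled on the “ABC” style of literature-based discovery associated with Don Swanson, who used this kind of reasoning to suggest a link between magnesium and migraine. The two-sentence corpus, the term list and the function names (`cooccurrences`, `hidden_links`) are invented purely for illustration; real TDM pipelines work on many thousands of documents with far more careful language processing.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(texts, terms):
    """Count pairs of terms of interest that appear together in the same text."""
    counts = Counter()
    for text in texts:
        # Naive whitespace tokenisation; a real pipeline would do much more.
        words = set(text.lower().split())
        present = sorted(terms & words)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def hidden_links(counts, terms):
    """Find pairs (a, c) never seen together, but both seen with some bridge term b."""
    linked = {t: set() for t in terms}
    for a, b in counts:
        linked[a].add(b)
        linked[b].add(a)
    results = []
    for a, c in combinations(sorted(terms), 2):
        if (a, c) not in counts:
            bridges = linked[a] & linked[c]
            if bridges:
                results.append((a, c, sorted(bridges)))
    return results


# Hypothetical toy corpus: neither sentence mentions both magnesium and migraine.
abstracts = [
    "Magnesium inhibits cortical spreading depression",
    "Spreading depression may trigger migraine attacks",
]
terms = {"magnesium", "depression", "migraine"}

counts = cooccurrences(abstracts, terms)
print(hidden_links(counts, terms))
```

Run as-is, it reports a single indirect link: `magnesium` and `migraine`, bridged by `depression`. Because no single text mentions both terms, this is exactly the kind of connection that is easy to miss when reading papers one at a time, and that only emerges when many texts are analysed together.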

Because TDM is so new, it does not come with the historical baggage that open access must grapple with. In theory, that means researchers are starting with a clean slate that should allow them to adopt the best techniques for extracting information. But a worrying development could prevent TDM from realising its full potential, depriving the world of useful knowledge – for example in the medical field – that could provide dramatic benefits.

The problem, as is so often the case, is the copyright industry. Seeking to establish its role as gatekeeper to all knowledge, it is promoting the view that TDM of non-OA sources requires additional licensing, over and above a licence to view material – a form of double dipping. In other words, the industry is pushing for the creation of yet another ancillary copyright – one that would give publishers the power to block text and data mining – even when people already have fully legal access to the texts and data they wish to analyse.

The strategy became clear during a 2013 EU initiative called “Licences for Europe”, where the stated aim of the European Commission was “to ensure that copyright and licensing stay fit for purpose in this new digital context.” However, as various innovators, public interest groups and open access supporters discovered to their dismay, what the Commission actually meant was that more or less everything in the digital world would be regulated through licensing – including TDM.

When it became clear that the so-called “stakeholder dialogue” had been captured by the copyright industry, and that the European Commission was unwilling to consider any approach other than licensing, leading stakeholders representing the research sector, SMEs and open access publishers pulled out of the TDM Working Group. Around the same time, an open letter sent to the Licences for Europe organisers, signed by Nobel prize winners, technology SMEs, research councils, university associations, learned academies, publishers, libraries and law academics, emphasised that: “It is a universal truth that once lawful access is granted to a reader of an analogue book or journal they are free to extract information, imagine and innovate. The same must be true for computers in the modern information society.” That view is often summed up as: “the right to read is the right to mine.”

Since then, things have moved on only slightly; the copyright industry still seeks to establish licensing as an indispensable part of TDM. The European Commission has released its draft Copyright Directive, Article 3 of which includes an exception to copyright that allows TDM, but only by “research organisations”, and only “for the purposes of scientific research”. That would prevent journalists from carrying out TDM – an increasingly powerful investigative tool – as part of their work. It would also mean that businesses need to obtain a TDM licence to use this approach, whether for commercial research and development or as part of the product itself.

That’s economically very short-sighted, since European companies will face obstacles if they wish to carry out TDM as part of their business. Rivals in other countries – notably the US – will not labour under this disadvantage, making it easier for them to operate and flourish than for their EU equivalents. It is also likely that startups using TDM will shun the EU when choosing a base for their operations.

More generally, though, the Commission’s formulation is wrong as a matter of principle. As the letter from the great and the good quoted above also emphasised: “Facts and data are not regulated by intellectual property laws. Text and data mining does not trade on the underlying creative and expressive purpose of a copyright work.” Suggesting that some classes of users should pay for the right to extract facts and data seriously undermines a principle that, until now, has gone unquestioned even by the copyright industry. As members of the European Parliament haggle over the formulation of the rules for TDM in the EU, we have to be clear that there is only one way forward: the right to read must be synonymous with the right to mine.

