Why the Copyright Directive Lacks (Artificial) Intelligence
Artificial Intelligence (AI) is hot. Although its capabilities have been steadily increasing for years, it was the victory of DeepMind’s AlphaGo program over the top Go expert Lee Se-dol last year that alerted many to the rapid pace of development in the AI field. The win was even more significant than the earlier defeat of the reigning world chess champion Garry Kasparov by IBM’s Deep Blue AI system in 1997. Where Deep Blue won by searching through 200 million possible moves per second, overwhelming its opponent with brute computational force, AlphaGo won by thinking like a human – only better. It even came up with such an unprecedented and brilliant tactic that a former European Go champion said: “It’s not a human move. I’ve never seen a human play this move. So beautiful.”
Beauty aside, AI is expected to generate huge economic gains in the coming years. According to research carried out by Accenture, AI could double annual economic growth rates in 12 developed economies by 2035, and boost labour productivity by up to 40%. China aims to create a $150bn AI industry by 2030, and is already challenging the US for global leadership in the field. A report from Sinovation Ventures notes: “Today’s consensus view is that the United States and China have begun a two-way race for AI dominance, making technology a key source of trade friction between the two countries.”
The Sinovation Ventures report lists four prerequisites for AI. Two are fairly obvious: computational power and human expertise. Another – domain-specific focus – reflects the fact that today’s AI systems do not possess a generalised intelligence, but excel in narrow domains like Go. The fourth requirement is “A sea of data.” The report says: “By far the most important element is the availability of large, labelled data sets (examples include information about people who applied for loans and whether they repaid or defaulted; or people who submitted a customer complaint and whether they are satisfied or dissatisfied). AI uses these large data set as examples to teach its algorithms to optimize.”
It is the availability of large datasets for “machine learning” – training AI systems – that has taken the field from the raw, dumb power of IBM’s Deep Blue, to the “beautiful”, human-like responses of DeepMind’s AlphaGo. It is data that holds the key to success for AI companies – and for countries that wish to develop world-class AI industries. The authors of a major report on AI commissioned by the UK government, “Growing The Artificial Intelligence Industry In The UK”, agree. They write: “Growing the AI industry in terms of those developing it and deploying it requires improved access to new and existing datasets to train, develop and deploy code.” And: “Very simply, more open data in more sectors is more data to use with AI to address challenges in those sectors, increasing the scope for innovation.”
The main barrier to using data for AI work is not technical, but legal. As the report’s authors point out: “some data cannot be extracted from published research because access to that data can be restricted by contract or copyright, making it unavailable as training data for AI. This restricts the use of AI in areas of high potential public value, and lessens the value that can be gained from published research, much of which is funded by the public.”
The report goes on to make an important point: “To date, assessments of the value of text and data mining of research and for new research do not appear to have included the potential value that can come from using data for AI.” To remedy that, the document calls on the UK government to “move towards establishing by default that for published research the right to read is also the right to mine data” – an idea supported by the UK Libraries and Archives Copyright Alliance. The report also says that, when framing copyright exceptions, the UK government should recognise how much value could be added to the UK economy by making data available for AI through text and data mining (TDM), including by businesses.
What applies to the UK is just as pertinent for the EU, which a 2017 report on the State of European Tech calls “home to the world’s leading AI research community”. The current proposal for TDM in the Copyright Directive does not allow companies to mine text and data freely available online. On top of that, it is likely to make existing licensing arrangements even more complex and costly, instead of enabling anyone with legal access to read content also to mine that same content – that is, recognising that “the right to read is the right to mine.”
Hindering commercial AI research in this way will have a number of negative effects. It will make it harder for EU startups, especially small ones, to develop AI products that could compete against US players. Giant corporations like Google and Facebook will have ready access to key training data in the US, giving them an unfair advantage over EU companies. The current EU proposal for TDM will discourage foreign companies from setting up AI research labs in the EU, where they will not be able to use text and data for machine learning without negotiating permission first. It will also make leading AI researchers – already an extremely scarce resource – think twice about accepting posts at EU universities, since they will be unable to commercialise their work quickly, if at all.
By placing obstacles in the way of AI engineers and companies working at the leading edge of the field, the European Commission risks condemning the EU to be a backwater for what many believe will be the defining generational breakthrough for the next few decades. European companies and citizens will be forced once more to become dependent on advanced technologies developed elsewhere, instead of being able to support exciting home-grown products and services.
Once the US and China establish themselves as global leaders, there will inevitably be a new brain drain of Europe’s best and brightest young engineers to those regions, just as happened in the early days of computers and the Internet. That loss will make it well-nigh impossible to undo the harm caused by a short-sighted desire to placate a few small-scale legacy sectors like academic publishing, rather than thinking about laying the foundations for vast new industries of the future.
If the EU wishes to maximise the benefits that AI is expected to bring to its member states’ economies, it should free up data for machine learning by removing the limitations on TDM currently found in the Copyright Directive. To achieve that, it must enshrine in law that the right to read is the right to mine. Given the wide-ranging positive impact on business and society that Artificial Intelligence is predicted to bring, to do otherwise would hardly be very clever.
Featured image by Cryteria.