Sunday, 14 July 2024

Article 13: Putting Flawed Upload Filters at the Heart of the Internet

On 5 July, the European Parliament voted not to approve the controversial text of the Copyright Directive that the JURI committee had recommended. That’s really great news for those striving for copyright laws that are fit for the digital age, but it’s a battle, not the war, that has been won. There will now be a further vote by the European Parliament, currently scheduled for 12 September. Before then, MEPs will be able to table amendments to the text, which provides an important opportunity to eliminate some of the worst ideas there.

Here on CopyBuzz, we’ve talked about the issues with Article 11 – the snippet tax – and Article 3, which would allow text and data mining, but only for non-profit purposes. However, in terms of the harm it would cause to the entire Internet ecosystem, there is a broad consensus that the most problematic aspect of the Copyright Directive is Article 13. It is therefore vitally important that it should be deleted or at least rendered harmless. The problem is that there is a considerable amount of misunderstanding – and misinformation – floating around on the topic. This is the first of two articles that aim to spell out why Article 13 is so bad.

The underlying intent of Article 13 is clearly expressed in the original proposal from the European Commission:

Information society service providers that store and provide to the public access to large amounts of works or other subject-matter uploaded by their users shall, in cooperation with rightholders, take measures to ensure the functioning of agreements concluded with rightholders for the use of their works or other subject-matter or to prevent the availability on their services of works or other subject-matter identified by rightholders through the cooperation with the service providers. Those measures, such as the use of effective content recognition technologies, shall be appropriate and proportionate.

The key aim is “to prevent the availability on their services of works or other subject-matter identified by rightholders”. Specifically, those works are uploaded by the users of the services. So Article 13 requires service providers to check every upload and block those that have been identified by the copyright holders. The original Article 13 text notes that this can be done using “effective content recognition technologies”.

However, what it fails to admit is that there is, in fact, no other method to achieve its stated goal. If service providers do not inspect every upload, there is obviously a possibility some will be unauthorised copies. The only way to avoid that risk is to look at every single upload. But “looking” in this context can’t mean an actual person manually inspecting every file in turn. Again, Article 13’s text tells us why: it concerns service providers that store and provide access to “large amounts of works” uploaded by the public. To give an idea of the scale involved, 400 hours of videos are uploaded to YouTube every minute. The only practical way to inspect such quantities is using automated systems of the kind already adopted by Google with its Content ID system.
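To make that scale concrete, here is a rough back-of-the-envelope sketch in Python. The 400 hours per minute figure is the one quoted above; the function name, the assumption that reviewers watch at normal playback speed, and the eight-hour shift are illustrative assumptions, not anything published by YouTube.

```python
# Back-of-the-envelope estimate of manual review effort.
# Assumptions (not from the article): reviewers watch at normal playback
# speed, and each works one 8-hour shift per day with no breaks.

UPLOAD_HOURS_PER_MINUTE = 400  # figure quoted in the text


def manual_review_estimate(upload_hours_per_minute, shift_hours=8):
    """Return rough daily volume and staffing for watching every upload."""
    # Hours of video arriving per day: hours/minute * 60 minutes * 24 hours.
    video_hours_per_day = upload_hours_per_minute * 60 * 24
    # Watching in real time, one person clears one hour of video per hour,
    # so this many people must be watching at every moment of the day.
    concurrent_reviewers = video_hours_per_day // 24
    # Each reviewer only covers shift_hours of the day.
    total_reviewers = video_hours_per_day // shift_hours
    return video_hours_per_day, concurrent_reviewers, total_reviewers


if __name__ == "__main__":
    per_day, concurrent, total = manual_review_estimate(UPLOAD_HOURS_PER_MINUTE)
    print(f"Video uploaded per day: {per_day:,} hours")
    print(f"Reviewers watching at any moment: {concurrent:,}")
    print(f"Reviewers needed with 8-hour shifts: {total:,}")
```

Under those assumptions, a single platform would need tens of thousands of people doing nothing but watching uploads, before any time is spent judging whether a given clip actually infringes.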

So the reality of Article 13 is that online services would have to install automated upload filters. CopyBuzz noted recently that one of the Shadow Rapporteurs on the file, MEP Jean-Marie Cavada, admitted as much in a tweet immediately after the JURI vote on the Copyright Directive’s text. In doing so, he flatly contradicted the official line taken by supporters of Article 13, which is that it “will not filter the Internet”. The reason they and the copyright industry are so keen to insist, against all the facts, that Article 13 does not impose upload filters is that such filters are so deeply flawed that no rational person – or MEP – would endorse making them compulsory for online services.

There are a number of insurmountable problems with filters. The first is the so-called “false positives” issue. Most people would find it easy to accept that no filter is perfect, but it’s hard to grasp what the imperfections of filtering mean in practice. The security expert Alec Muffett had the great idea of creating a computer simulation of filtering in order to obtain a better sense of what imperfect filters mean in real life. His model simulated running a test for something like copyright infringement that was 99.5% accurate. That means that for every 1,000 uploads, it correctly puts 995 of them into the right “non-infringing” or “infringing” box, while five of them are incorrectly assigned – “non-infringing” marked as “infringing”, or vice versa.

Muffett further assumed that the rate of copyright infringement was one in 10,000 uploads, and ran his simulation using ten million uploads. To put that in perspective, every day around 750 million comments are posted (and thus uploaded) on Facebook, and 200 million photos are uploaded. With all these inputs, Muffett’s model correctly identified around 9,950,000 uploads as non-infringing, and around 1,000 items were correctly marked as infringing. However, it also led to about 50,000 items being deemed infringing when they were in fact non-infringing. This is the “false positives” problem: that items are incorrectly caught by filtering systems. As you might expect, things were even worse if the upload filter was only 98.5% accurate. In that case, 150,000 uploads were falsely marked as infringing. For Facebook, the figure would be around 14 million false positives a day if its filter were similarly 98.5% accurate.
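The arithmetic behind those figures can be reproduced with a short Python sketch. It is not the simulation's own code: it simply computes expected counts, assuming (as the description above does) that the filter is equally accurate on infringing and non-infringing uploads. The function names are illustrative, and the Facebook total of 950 million daily uploads is the sum of the two figures quoted above.

```python
# Expected outcomes of an imperfect upload filter.
# Assumption: the same accuracy applies to infringing and legitimate
# uploads, as in the description of the model in the text.


def expected_filter_outcomes(total_uploads, one_in_infringing, accuracy):
    """Return the expected count of each kind of filter decision."""
    infringing = total_uploads // one_in_infringing
    legitimate = total_uploads - infringing
    error_rate = 1 - accuracy
    return {
        "true_positives": infringing * accuracy,      # infringing, blocked
        "false_negatives": infringing * error_rate,   # infringing, missed
        "false_positives": legitimate * error_rate,   # legitimate, blocked
        "true_negatives": legitimate * accuracy,      # legitimate, allowed
    }


def share_of_blocks_correct(outcomes):
    """Fraction of blocked uploads that really were infringing."""
    blocked = outcomes["true_positives"] + outcomes["false_positives"]
    return outcomes["true_positives"] / blocked


if __name__ == "__main__":
    scenarios = [
        ("10M uploads, 99.5% accurate", 10_000_000, 0.995),
        ("10M uploads, 98.5% accurate", 10_000_000, 0.985),
        ("950M uploads/day, 98.5% accurate", 950_000_000, 0.985),
    ]
    for label, total, accuracy in scenarios:
        result = expected_filter_outcomes(total, 10_000, accuracy)
        print(label)
        print(f"  wrongly blocked:   {result['false_positives']:,.0f}")
        print(f"  correctly blocked: {result['true_positives']:,.0f}")
        print(f"  blocks that are correct: {share_of_blocks_correct(result):.2%}")
```

Run with the figures above, the sketch gives roughly 50,000 wrongly blocked uploads at 99.5% accuracy and roughly 150,000 at 98.5%. It also makes a less obvious point visible: at 99.5% accuracy, only about 2% of the uploads the filter blocks are actually infringing, because legitimate uploads vastly outnumber infringing ones.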

The scale of those errors indicates why it will not be possible for employees of online companies to double-check all uploads that are marked as infringing by the filter. Instead, online platforms will inevitably block automatically anything that is marked as infringing, and put the onus on the uploaders to contest this blocking. Since many ordinary users will either not know how to do this, or lack the time and the inclination to try, there will be a massive chilling effect on creativity if Article 13’s upload filters are imposed. Large quantities of legitimate material will be blocked unjustly by virtue of the imperfect nature of the monitoring systems being used.

But even if upload filters could recognise material with zero errors, there are further problems. There are many ways in which material under copyright can be used without permission, and legally, through EU copyright exceptions. The obvious ones are for things like quotations, criticism, parody etc. It is simply not possible to encode knowledge about these subtle legal aspects of copyright in the upload filter.

Matters are made even more complicated by the fact that different EU Member States have different copyright exceptions: what may be legal in one country could be forbidden in its neighbour. How will automated filters handle those situations? In practice, online platforms will almost certainly overblock, implementing the most stringent rules across the whole of the EU. It will then be down to the person whose upload is blocked to appeal against the decision.

This approach naturally favours well-funded companies with experienced legal departments. If they wish to contest blocks, they will know how to do it, and have established mechanisms in place to make it as easy as possible. As a result, the whole upload filter system will be biased against ordinary users of the Internet who lack both resources and expertise.

Such an asymmetry might play out in other ways, too. For example, there are no penalties for incorrect or even malicious claims of copyright ownership being sent to online platforms. This is likely to lead to legal material being taken down by automatic filters. Once again, larger companies can more easily fight such abuses. Independent artists, or members of the public, will find it much harder, and will thus be disproportionately penalised. Part II of CopyBuzz’s analysis of Article 13’s deep problems will explore this fundamental flaw further.

Featured image by Max Pixel.

Writer (Rebel Code), journalist, blogger on openness, the commons, copyright, patents and digital rights. [All content from this author is made available under a CC BY 4.0 license]