#CopyrightWeek - FilterFail: The Hall of Shame

[Note: This analysis is published in the context of the 2019 #CopyrightWeek, under the theme of 18 January on ‘Filters’, which revolves around the idea that: “Whether as a result of corporate pressure or regulation, overreliance on automated filters to patrol copyright infringement presents a danger to free expression on the Internet.”]

This week we’ve previously covered how copyright infringement claims are already being abused to censor content and stifle freedom of expression, due to their gratuitous nature and the fact that bogus claims often remain unpunished. We’ve also covered how the Article 13 #CensorshipMachine proposals are stripping away the intermediary liability protections of online platforms, the so-called ‘safe harbour’ regime, leaving them with no other option than to implement automated filtering solutions to mitigate their liability, given the volume of content being uploaded online. This reliance on ‘algorithms’ to filter user-generated content (UGC) will unquestionably lead to over-blocking, to the detriment of the fundamental freedoms of both creators and other users.

Filters are Like Exotic Supercars: Expensive (to Buy and Run) & Prone to Failure

Today, we’re going to take a closer look at the so-called content recognition technologies that some platforms have already ‘voluntarily’ adopted (read: that platforms have been ‘gently’ coerced into adopting by the content industry). We will specifically look at the best-known example, namely YouTube’s so-called ‘Content ID’ technology.

One should keep in mind that this solution has already cost Google roughly 100 million dollars since it started developing it. A couple of years ago, the cost was estimated at only 60 million dollars. This shows that it is not just a one-off development cost after which you’re done. Quite the contrary: implementing such technology is closer to buying an exotic supercar. It’s expensive to put in your garage, and if you want to take it for a spin on the open road, the maintenance (and potential upgrades) will cost you an arm and a leg.

The first conclusion that we can draw here is: this isn’t technology that just any company can afford to implement, and especially not our European startups. We’ve heard the claims that this technology is easily affordable for everyone, but that sounds just like the car dealer eagerly trying to sell you that exotic supercar by downplaying the maintenance and running costs. Moreover, following the logic of ‘if you pay peanuts, you get monkeys’, the cheaper the filter, the more over-blocking it is likely to bring about.

This leads us to another important point in the debate around the Article 13 #CensorshipMachine: this is mainly a battle in which big US rightholders are trying to sink big US online platforms, to the detriment of everyone else, namely citizens, EU startups, and even creators themselves, as this video highlights.

More similarities can be observed between supercars and content filtering technology: an expensive price tag is no guarantee of a flawless ride or experience. Supercars face issues, such as easily catching fire, whilst expensive filtering technologies often fail to show any sign of (artificial) intelligence in the way they arbitrarily and randomly take down content.

The issue we are faced with here is that filters “are only capable of matching content, not determining whether the use of a particular work constitutes an infringement” - see ENGINE’s excellent report on ‘The Limits of Filtering’.

In practice this means that filters cannot distinguish whether content is used in a legal manner, for example when users rely on exceptions such as parody or criticism. This implies that any user safeguards built around the existing exceptions are just paper tigers in practice, and will fail to truly protect users. It’s a bit like wearing a seat belt while sitting on a car seat that wasn’t bolted on properly: you think you’re safe, until you crash. In summary: filters are dumb, they do not understand context and hence block indiscriminately.
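To make the ‘matching, not judging’ point concrete, here is a minimal sketch of a match-only filter. It is a toy illustration and not how Content ID or any real product works: the window size, the threshold, the function names and the sample texts are all hypothetical, and real fingerprinting of audio or video is far more elaborate. What it shows is that the filter only ever sees the content itself, never the purpose of the use.

```python
import hashlib

CHUNK = 8  # hypothetical window size, in characters


def fingerprints(content: str) -> set:
    """Hash every overlapping CHUNK-sized window of the content."""
    if len(content) < CHUNK:
        return set()
    return {
        hashlib.sha256(content[i:i + CHUNK].encode("utf-8")).hexdigest()
        for i in range(len(content) - CHUNK + 1)
    }


def is_blocked(upload: str, reference: str, threshold: float = 0.2) -> bool:
    """Block when enough of the reference's fingerprints appear in the upload.

    Note what is missing from the signature: nothing tells the filter
    whether the upload is a parody, a quotation or a piece of criticism.
    """
    ref = fingerprints(reference)
    if not ref:
        return False
    overlap = len(ref & fingerprints(upload)) / len(ref)
    return overlap >= threshold


# Hypothetical sample data (assumption: a short text stands in for a work).
reference_work = "the quick brown fox jumps over the lazy dog"
plain_copy = reference_work
critical_review = (
    "In this review I quote the line "
    "'the quick brown fox jumps over the lazy dog' "
    "to criticise how overused it has become."
)
unrelated = "a lecture about seat belts and supercars"

print(is_blocked(plain_copy, reference_work))       # plain copy
print(is_blocked(critical_review, reference_work))  # quotation for criticism
print(is_blocked(unrelated, reference_work))        # unrelated upload
```

In this sketch the plain copy and the critical review are treated identically, because the only input is the overlap between fingerprints. Lowering the threshold catches more copies and more legitimate uses alike; raising it lets both through.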

FilterFail: The Hall of Shame

Real-life experiences from YouTube’s Content ID show how filters already frequently fail today, resulting in so-called ‘false positives’ (i.e. legal content being caught in the net of these filters).

These ‘FilterFails’ will only grow in scale and scope with the increased number of platforms that will have to implement filters, combined with the diversity of the content that will have to be filtered (some types of content potentially being even more prone to false positives). Under the Article 13 #CensorshipMachine it’s not only audio, video, or images that will have to be filtered: all types of content are under threat, including text, software code, music sheets, 3D-drawings, architectural works, etc. In fact, if one adds the new press publishers’ right created by Article 11, news excerpts will also need to be filtered!

For many of those ‘new’ types of content added to the filtering scope, there is simply no filtering technology readily available off the shelf, as ENGINE explains in their report: “although there are fingerprinting tools available to scan and compare audio, video, and image files, no such tools exist to process other forms of copyrightable content”.

Below are some (sad) examples of how filters fail:

Recently, the F1 gaming community went through a couple of scary hours when YouTube’s Content ID took down their gameplay footage for supposedly infringing Formula 1’s copyright. The situation was ‘quickly’ rectified, but these takedowns shouldn’t have happened in the first place;
YouTuber & Twitch streamer SmellyOctopus shared his experience of how soundchecking his microphone on a private YouTube live stream bizarrely resulted in a YouTube Content ID copyright claim by ‘CD Baby’ (also check his video on it). In this case YouTube even publicly admitted that “the match system really blew it on this one”;
YouTuber Christian Buettner, aka TheFatRat, saw the copyright for one of his own songs wrongfully transferred and attributed to an unknown company that decided to claim it. YouTube has refused to mediate in the dispute and Buettner hasn’t been able to reclaim his rights from the company. This led him to set up an online petition urging YouTube to fix its broken Content ID system;
Then there’s the example of Sony claiming the rights to a video of a musician performing Bach, and more people seem to be going through similar experiences with abusive copyright claims from Sony;
Dr. Ulrich Kaiser, a German music professor, received a Content ID claim for a short video he uploaded about a project to digitise materials that are unequivocally in the public domain. In this video, he explained his project, while examples of the public domain music (e.g. Beethoven) played in the background (see Glyn Moody’s 2019 #CopyrightWeek contribution on the public domain and creativity);
A Dutch YouTuber has been accused of plagiarising his own music;
A video from feminist organisation Pinkstinks was blocked by Content ID for an alleged copyright infringement of material from broadcaster RTL, while in reality it was RTL that had used the organisation’s content in a broadcast;
A 10-hour long video of continuous white noise (i.e. random signals) received five copyright infringement claims. YouTube explained to the BBC that “clips of white noise are too indistinct for Google’s algorithms to work accurately”;
A Harvard Law School lecture on copyright was taken down by YouTube’s Content ID for containing short extracts of pop songs which were used for educational purposes;
YouTube’s Content ID identified NASA’s recording of a Mars landing as a copyright infringement. The video was aired by some TV stations, some of which automatically submit their content to Content ID, resulting in the filtering mechanism concluding that NASA was infringing the TV stations’ rights; and,
A video of a panel session on copyright and football was taken down for infringing the rights of the Premier League, because brief clips of Premier League matches appeared in one of the presentations given, a presentation which was actually arguing in favour of stronger copyright protection for sporting events.

The list of examples is seemingly endless, and the Electronic Frontier Foundation’s (EFF) takedown gallery provides more cases.

Conclusion: Filters Are Not the Solution!

We can only adhere to the conclusion drawn by Cory Doctorow on EFF’s blog:

“If Content ID is a prototype, it needs to go back to the drawing board. It overblocks (catching all kinds of legitimate media) and underblocks (missing stuff that infuriates the big entertainment companies). It is expensive, balky, and ineffective”

If policymakers truly care about protecting the fundamental freedoms of creators and users, then the only option they have is to reject the Article 13 #CensorshipMachine proposals.

Instead, the EU legislators should consider the idea of a user-generated content exception; the German delegation only recently put forward a proposal in this direction. Whilst the German approach was far from perfect, the intention is at least a step in the right direction, and merits a thorough debate and examination.

Herman Rucic

Herman Rucic is Senior Policy Manager in the secretariat of the Copyright 4 Creativity (C4C) coalition. He has been Senior Policy Manager at N-square Consulting since September 2010. [All content from this author is made available under a CC BY 4.0 license]