Legitimate Interest is the New Consent – Large-Scale Measurement and Legal Compliance of IAB Europe TCF Paywalls

Cookie paywalls allow visitors of a website to access its content only after they choose between paying a fee and accepting tracking. European Data Protection Authorities (DPAs) recently issued guidelines and decisions on the lawfulness of paywalls, but it is not yet known whether websites comply with them. In this paper, we study the prevalence of cookie paywalls on the top one million websites using an automatic crawler. We identify 431 cookie paywalls, all using the Transparency and Consent Framework (TCF). We then analyse the data these paywalls communicate through the TCF, in particular the legal grounds and the purposes used to collect personal data. We observe that cookie paywalls rely extensively on the legitimate interest legal basis, systematically conflated with consent. We also observe a lack of correlation between the presence of paywalls and legal decisions or guidelines by DPAs.


INTRODUCTION
Currently, the Web business model employs cookie paywalls, also named "pay or okay" or even simply "paywalls": if a user refuses tracking, she is then obliged to pay a sum of money to access the website [6]. All cookie paywalls found in the literature [29] used the Transparency and Consent Framework (TCF) of the Interactive Advertising Bureau (IAB) Europe, the European-level association for the digital marketing and advertising ecosystem [25]. Several European Data Protection Authorities (DPAs) recently ruled on cases in which cookie paywalls were deemed unlawful, or issued guidelines imposing safeguards for these practices, for instance the Austrian [31,24], Spanish [1], Danish [13], and German [12,30] DPAs. However, it is yet to be determined whether websites comply with the DPA guidelines and decisions.
Thus, this paper addresses the following research questions: 1) Are IAB Europe TCF implementations of cookie paywalls in line with EU data protection law requirements? 2) To what extent do websites follow the DPA positions concerning cookie paywalls? To answer these questions, we design a crawler to detect cookie paywalls over the top one million websites, and analyse the data they communicate through the TCF. Our contributions are as follows: (1) We measure cookie paywalls in terms of numbers, geographic distribution, categories, and prices. We find that cookie paywalls now appear on several types of websites, and mostly in Germany, despite the critical stance of the German DPA towards them. (2) We provide an overview of TCF implementations in the wild. We find that i) websites hosting cookie paywalls rarely register consent before an action from a visitor, ii) they rely extensively on the legitimate interest legal basis for data collection, and iii) the TCF specifications, in spite of the v2.2 update, do not bring sufficient technical guarantees with respect to the ban of legitimate interest as a legal basis for advertising purposes. (3) We provide an update to the legal regulatory landscape regarding cookie paywalls. We point out that the use of legitimate interest as a legal basis by TCF-based paywalls is illegal, and that the distribution of cookie paywalls does not seem to be affected by national DPA decisions.

6(1)
). The ePrivacy Directive (ePD) [22] provides supplementary rules to the GDPR, in particular for the use of tracking technologies. To comply with the GDPR and the ePD, websites must obtain consent from EU users when tracking their behavior (Art. 5(3) ePD) [16, para. 44] for concrete purposes (such as targeted advertising). Some other purposes are exempted from consent, e.g., functional or essential trackers (Recital 66 ePD). The only way to assess with certainty whether consent is required is to analyse the purpose of each tracker on a given website [18,3,23]. A valid consent must comply with several requirements: prior, freely given, specific, informed, unambiguous, readable, accessible, and revocable (Art. 4(11) and 7 GDPR) [34]. Most relevant to this paper is the requirement of freely given consent, which means that the request for consent should imply a voluntary choice of the user to accept or decline some or all purposes. Such a choice should be made in the absence of any kind of pressure to persuade her to give consent [37,38,15], and in the absence of negative consequences in case the user rejects consent to tracking. Making access to a website conditional on the acceptance of certain non-essential trackers can affect, in certain cases, the freedom of choice [6], and subsequently, the validity of consent [15].
Legal basis of legitimate interest. Processing personal data is lawful when it is necessary for the purposes of the legitimate interests pursued by a data controller or by a third party to whom the data was disclosed (Article 6(1)(f)). The general provision of legitimate interest is open-ended, with a broad and unspecific scope [20], and it is not purpose-specific as long as its requirements are satisfied. The open-ended nature of this provision raises important questions regarding its exact scope and application [20]. It is mandatory that such processing is necessary for the purposes of a given interest. These legitimate interests may justify data collection if they override the data subject's interests and rights, such as the right to privacy [7,8]. Accordingly, controllers bear an obligation: they are required to perform a balancing test in every single context to determine whether this requirement is met.
Legal basis for advertising purposes is consent. A recent decision of 4 July 2023 by the European Court of Justice (CJEU) in Meta vs Bundeskartellamt, Case C-252/21 [14], established that i) consent is the appropriate legal basis for tracking-and-profiling-driven personalized content and behavioral advertising, and ii) no legitimate interest would override the users' rights when websites try to provide ads (see notably paragraph 117).
Paywalls are endorsed by the European Court of Justice. Paragraph 150 of the decision [14] permits paywalls if a given fee is necessary and appropriate. It then falls upon websites to justify to users, and inform them about, the necessity of a fee and its appropriateness.
Interaction between consent and other lawful grounds. Following the reasoning noyb set out in its complaints, if a controller requests user consent, this choice has a blocking effect regarding other legal bases, i.e. the website has deprived itself of the possibility of basing the data processing on another legal basis under Article 6(1) of the GDPR.
Moreover, following the European Data Protection Board guidelines [15, paras. 121-123], the application of one of the six legal bases under Article 6(1) must be disclosed prior to data collection and in relation to a specific purpose. This means that if a controller chooses to rely on consent for any part of the processing while another lawful basis is actually relied upon, this would be fundamentally unfair to individuals. Thus the controller cannot swap from consent to other lawful bases, such as legitimate interest.
IAB Europe Transparency and Consent Framework. The IAB Europe TCF defines in its specifications ten purposes that can rely on both legal grounds, consent and legitimate interest. The purposes are: (1) Store and/or access information on a device, (2) Select basic ads, (3) Create a personalised ads profile, (4) Select personalised ads, (5) Create a personalised content profile, (6) Select personalised content, (7) Measure ad performance, (8) Measure content performance, (9) Apply market research to generate audience insights, and (10) Develop and improve products. In February 2022, this framework was declared to infringe the GDPR for using unlawful practices and for collecting data for advertising purposes on the ground of legitimate interests [2,36]. In September 2022, the TCF was brought to the highest court of the EU (the European Court of Justice) [4]. In order to comply with the law, IAB Europe recently announced the new version 2.2 of the TCF (to be enforced in September 2023), which will notably prevent the use of legitimate interest for purposes 3, 4, 5, and 6 [35].

Related work
Papadopoulos et al. [33] automated the detection and classification of paywalls on the Web using machine learning, but the study did not address cookie paywalls specifically. The authors did, however, conduct a thorough review of the types of sites that employ paywalls and their country of origin. Matte et al. [27] investigated 28 257 websites, of which 1 426 implemented the TCF. They found that more than 50% of the web pages analysed did not comply with the GDPR or the ePD, and that all non-compliant web pages implemented the TCF. The only technical and legal study on cookie paywalls was conducted by Morel et al. [29]. The study was based on a manual classification of the most popular websites in 13 Central European countries. They analyzed 2800 websites and found 13 websites employing cookie paywalls. They used a heuristic method based on features of the language of cookie paywalls to detect them. The 13 cookie paywall websites were analysed to extract data such as the type of banner (e.g. blocking or not), website category, and the price/type of subscription. They provided a legal analysis of cookie paywalls in the light of EU data protection law and regulatory guidelines, and also presented a fine-grained classification of paywalls. We build upon this study by 1) performing similar analyses on a larger scale and programmatically, and 2) providing an update to the legal landscape with regulatory decisions and recent DPA guidelines.

METHODOLOGY
We present here how we designed our cookie paywall crawler, and how we analysed the data that cookie paywalls convey through the TCF.
Crawler. We built a crawler that identifies cookie paywalls using text processing, and ran it on the top 1 million URLs of the Daily List from Tranco [26]. The crawler was configured to run 32 agents, each paired with one of 32 Firefox browsers. Each of the agents and browsers ran in its own container inside a Kubernetes cluster. The container images for the browsers were pulled from the Docker Selenium project, which in turn makes use of Firefox version 113.0.1. Once the crawl was completed, cookie paywall sites flagged as "likely" were manually confirmed by two independent annotators, who also classified each site by country, website category, and paywall price per month. The country in which a site is based was determined by analyzing the WHOIS response for the site (footnote 4). We developed a script that takes the response of a WHOIS query for a domain name and looks for fields such as "country:". If no relevant field was found, we used the domain TLD (such as .de for Germany). The website type was determined with Cyren's URL Category Checker [9]. Cyren uses a URL classifier to assess threats from web pages, which is available on its website [10]. The classifier sometimes returned two categories for the same URL. In that case, only the first category was considered, as we assume it is the most likely match (following Cyren's guidelines).
TCF analysis. From the results of the crawl, two sets of cookie paywalls were distinguished: i) one set containing cookie paywalls using the Consent Management Platform (CMP) contentpass (footnote 5), which exclusively provides cookie paywalls to 220 websites (189 of which were analysed); and ii) another set using various other CMPs. We then used three different approaches, as explained below. In all approaches, we extracted the number of vendors to which data is conveyed based on both consent and legitimate interest. The data resulting from the analysis can be found following this link. In the automated approach, a list of all websites implementing contentpass was created by scraping its marketing webpage. Every website in the list was analysed, and the TCF consent string was stored as it appears in all three relevant states: before interaction, after giving consent, and after logging in. The TCF consent string was extracted by searching the cookie jar and local storage for entries stored per the naming standards specified in the TCF. In the semi-automated approach, we retrieved data from a large and varied set of cookie paywalls to get an overview of how they behave before any interaction with a visitor. We browsed all non-contentpass websites with a script which automatically saves the consent string recorded by the website. In the manual approach, we analysed a subset of websites not implementing contentpass, for which we paid a subscription. We randomly selected 20 websites for manual inspection using Python's built-in random generator. Where one subscription gave access to several cookie paywalls, we investigated whether the cookie paywalls covered by the same subscription differ from each other.
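The country lookup described in the Crawler paragraph (a WHOIS "country:" field first, the domain TLD as fallback) can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the function names, the regular expression, and the deliberately tiny `TLD_TO_COUNTRY` table are hypothetical, and the sketch takes the WHOIS response as plain text rather than performing the query itself.

```python
import re
from typing import Optional

# Hypothetical, deliberately small TLD table (assumption); a real run would
# need a complete country-code TLD list.
TLD_TO_COUNTRY = {"de": "DE", "fr": "FR", "it": "IT", "at": "AT"}

# Matches lines such as "country: DE" or "Registrant Country: DE".
# Only horizontal whitespace is allowed around the colon, so an empty
# field does not swallow the following line.
COUNTRY_FIELD = re.compile(
    r"^[ \t]*(?:registrant[ \t]+)?country[ \t]*:[ \t]*(\S.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def country_from_whois(whois_text: str) -> Optional[str]:
    """Return the first country field found in a WHOIS response, if any."""
    match = COUNTRY_FIELD.search(whois_text)
    return match.group(1).strip().upper() if match else None


def country_from_tld(domain: str) -> Optional[str]:
    """Fall back to the last label of the domain name (e.g. 'de')."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return TLD_TO_COUNTRY.get(tld)


def site_country(domain: str, whois_text: str) -> Optional[str]:
    """WHOIS field first, TLD second, None when neither is informative."""
    return country_from_whois(whois_text) or country_from_tld(domain)


if __name__ == "__main__":
    print(site_country("example.de", "Domain: example.de\nStatus: connect\n"))
```

Preferring the WHOIS field over the TLD mirrors the order given in the text; generic TLDs such as .com carry no country information, which is why the fallback can legitimately return nothing.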

RESULTS AND DISCUSSION
All found cookie paywalls use the controversial TCF. 108 cookie paywalls were found in a preliminary calibration phase. The crawler then processed 1 million pages in about 5 days, reporting 330 as "likely" (see Section 3). Of these, 323 were manually confirmed, giving the crawler a precision of ~98% (323 of 330). All confirmed paywalls, along with their assigned classification and price, can be found in this Google Sheet. They were combined with the 108 initially found in the preliminary phase, making a total of 431, all using the TCF. Note that the lawfulness of this framework is currently being argued at the Court of Justice of the EU (IAB Europe (C-604/22)) [5] and that the TCF was considered illegal by the Belgian DPA [2,36].
Paywalls are prevalent in Germany despite the DPA stance. The distribution of cookie paywalls across all countries is depicted in Figure 2a. Our results show a preponderance of cookie paywalls in Germany (317 out of 431), followed by France (42), Italy (27), and Austria (22); the other countries have only a few cookie paywalls each (between 1 and 6). However, the position of the German DPA is critical of cookie paywalls [29]. This indicates that, in Germany, the prohibition of cookie paywalls may not affect their prevalence. The DPAs of other countries prefer to assess paywalls case by case, as shown in Table 1. One could reason that the existence of the German-based CMP contentpass might explain the concentration of paywalls in Germany. This particular CMP only offers a cookie paywall solution, and it does not provide other types of cookie banners. On closer inspection, however, while 317 cookie paywalls were found in Germany, only 220 used contentpass, which means that 97 do not use it. This number is considerably higher than the totals of the runners-up France, Italy, and Austria, which indicates that contentpass is not the only reason for paywall prevalence in Germany.
Users consent when facing a contentpass paywall, but they are tracked by up to 365 adtech vendors. When contacting the CEO of contentpass to better understand their solution, we were informed that 99.9% of visitors consent when facing a contentpass paywall, and therefore do not pay (in spite of the first month of subscription being free). This fact, although not a direct result of our crawling, indicates that websites using contentpass do not rely on subscriptions but rather on ad revenues for their business model. Notably, after giving consent, when decoding the consent string of a website using contentpass, for instance https://www.spielfilm.de/, personal data is shared with up to 365 vendors. These vendors include major adtech vendors and data brokers such as Oracle Advertising, Criteo SA, and Acxiom. We question whether consent is freely given, even if an alternative to tracking exists (i.e. a subscription), since personal data is being shared with so many third parties, which might render tracking detrimental.
Footnote 4: WHOIS is a protocol used to determine the registered owner of an internet domain name. Although the quantity and type of information can be inconsistent when querying a WHOIS database, it often contains geographical information associated with the registrant of a domain (typically a company in the case of cookie paywall websites).
Footnote 5: Although contentpass can be considered a Subscription Management Platform (SMP).
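The counts reported in this section can be cross-checked with simple arithmetic. The sketch below only restates figures from the text; the variable and function names are illustrative, and the non-contentpass count assumes, as the text implies, that all 220 contentpass sites are among the 317 German ones.

```python
# Counts as reported in the text.
FLAGGED_LIKELY = 330   # sites the crawler reported as "likely"
CONFIRMED = 323        # sites confirmed by the annotators
PRELIMINARY = 108      # paywalls found in the calibration phase
GERMANY = 317          # confirmed paywalls based in Germany
CONTENTPASS = 220      # paywalls served by contentpass


def precision(confirmed: int, flagged: int) -> float:
    """Share of flagged sites that were confirmed as cookie paywalls."""
    return confirmed / flagged


total_paywalls = CONFIRMED + PRELIMINARY
# Assumption: every contentpass site is counted among the German sites.
german_non_contentpass = GERMANY - CONTENTPASS

if __name__ == "__main__":
    print(f"precision: {precision(CONFIRMED, FLAGGED_LIKELY):.1%}")
    print(f"total paywalls: {total_paywalls}")
    print(f"German paywalls without contentpass: {german_non_contentpass}")
```

Note that this figure is a precision (confirmed over flagged); it says nothing about paywalls the crawler failed to flag.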

Table 1. DPAs and their positioning on cookie paywalls.
DPAs | Positioning on cookie paywalls
German DPA [12] | Recent case in which the "Pay or Okay" approach was ruled illegal for an online newspaper
Spanish DPA [1] | Guidelines state that access cannot be conditioned on consent to cookies, but exceptions can be made if alternatives are offered (not necessarily free ones) and users are informed
French DPA [6] | Case-by-case assessment. Websites need to show there is a real and fair alternative way to access other websites without tracking; reasonable price; fair remuneration
Austrian DPA [24] | Dual position. Recent decision: paywalls are generally permissible, but users must have the possibility to say "yes" or "no" to any specific data processing
Paywalls are no longer restricted to news; they have spread to business, tech, and entertainment websites. The distribution of website categories (see Figure 1) shows that a large number of the cookie paywalls found were classified as News (27.4%), confirming former work, as paywalls improve content monetisation and thus fund journalism [21]. The frequency of paywalls on sites in the Business (13.2%), Computer & Technology (12.3%), and Entertainment (7.7%) categories suggests a potential reliance on a combination of subscription revenue and sharing of personal data in these sectors as well. These categories include sites that often host high-traffic platforms, whose large user base can be leveraged for targeted advertising purposes.
Paywalls seem to have a reasonable cost: €3.34 on average. All cookie paywalls used a monthly subscription-based payment model, wherein the vast majority (67%) cost between €2 and €4 per month, with an average price of €3.34 per month. The distribution of prices is visualized in Figure 2b. Morel et al. [29] argued that, according to the French DPA, the cost of a paywall should be "reasonable" or consist of a "fair remuneration" [6]. As mentioned in Section 2.1, the CJEU did not establish what a necessary or appropriate fee is either, and so the determination of prices is yet to be further studied and harmonized. It is worth noting that contentpass offers a cross-site subscription-based model at €2.99 per month. This finding can be read in the light of the results of the study by Mueller-Tribbensee et al., in which the authors argued that 99% of users tend to consent when facing paywalls. We thus conclude that even if the price seems reasonable, users choose to be tracked [11].
The TCF conflates the legal grounds of consent and legitimate interest. We observed that all websites hosting cookie paywalls systematically communicate consent strings registering purposes under legitimate interest, in addition to consent, after a visitor clicks on "accept". As discussed in Section 2.1, if a website requests consent, this choice has a blocking effect regarding other legal bases, such as legitimate interest, and thus the website is deprived of the possibility of processing on another legal basis [32]. We argue that such overlapping renders processing illegal.
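To illustrate how a consent string can register purposes under both legal grounds at once, the following sketch reads the two purpose bitfields from the core segment of a TCF v2 consent string. It is a hedged illustration rather than the tooling used in this study: the bit offsets (152 for the consent field, 176 for the legitimate interest field, 24 bits each) reflect our reading of the TCF v2 core string layout and should be verified against the specification, and `encode_demo` is a hypothetical helper that builds a synthetic string with all other fields zeroed.

```python
from typing import Dict, List

# base64url alphabet: each character of the string carries 6 bits.
B64URL_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)

# Assumed bit offsets in the TCF v2 core segment (verify against the spec).
PURPOSES_CONSENT_OFFSET = 152
PURPOSES_LI_OFFSET = 176
PURPOSE_FIELD_WIDTH = 24


def core_bits(tc_string: str) -> str:
    """Expand the first dot-separated segment into a string of '0'/'1'."""
    core = tc_string.split(".")[0]
    return "".join(f"{B64URL_ALPHABET.index(ch):06b}" for ch in core)


def set_positions(bits: str, offset: int) -> List[int]:
    """Return the 1-based purpose ids whose bit is set in a 24-bit field."""
    field = bits[offset:offset + PURPOSE_FIELD_WIDTH]
    return [pos + 1 for pos, bit in enumerate(field) if bit == "1"]


def decode_purposes(tc_string: str) -> Dict[str, List[int]]:
    """Extract purposes registered under consent and legitimate interest."""
    bits = core_bits(tc_string)
    if len(bits) < PURPOSES_LI_OFFSET + PURPOSE_FIELD_WIDTH:
        raise ValueError("core segment too short to hold purpose fields")
    return {
        "consent": set_positions(bits, PURPOSES_CONSENT_OFFSET),
        "legitimate_interest": set_positions(bits, PURPOSES_LI_OFFSET),
    }


def encode_demo(consent: List[int], legitimate_interest: List[int]) -> str:
    """Hypothetical helper: synthetic core segment, other fields zeroed."""
    bits = ["0"] * 204          # 34 characters of 6 bits each
    bits[4] = "1"               # version field (first 6 bits) set to 2
    for purpose in consent:
        bits[PURPOSES_CONSENT_OFFSET + purpose - 1] = "1"
    for purpose in legitimate_interest:
        bits[PURPOSES_LI_OFFSET + purpose - 1] = "1"
    joined = "".join(bits)
    return "".join(
        B64URL_ALPHABET[int(joined[i:i + 6], 2)]
        for i in range(0, len(joined), 6)
    )


if __name__ == "__main__":
    demo = encode_demo(consent=[1, 3, 4], legitimate_interest=[2, 7, 10])
    print(decode_purposes(demo))
```

A string for which both lists are non-empty after a click on "accept" is an instance of the overlap discussed above. The sketch ignores the vendor sections and any publisher restrictions, which would be needed to count vendors per legal basis.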
Even if users pay for a subscription, they are still tracked under legitimate interest for inappropriate purposes. After a paid subscription, some websites (14 websites, including 13 using contentpass) still collect personal data based on legitimate interest by default. This means that users have to object manually if they wish to prevent their data from being collected. Firstly, users should not be tracked at all if they pay for a subscription, since paywalls are only legitimate if they constitute an alternative to tracking. Secondly, 3 of these websites track users for the purpose "Develop and improve products" under the legal ground of legitimate interest. This purpose is vague and unspecified, since it is not detailed enough to determine the kind of processing involved [3,15], and is therefore illegal [28]. The European Data Protection Board (EDPB) guidance [19] accepts legitimate interest for this purpose only on the basis of detailed information on how users engage with a concrete service, obtained through organizational metrics for that service and grounding the way to improve it; it cannot be invoked in general terms, as the TCF does. Thirdly, 3 websites collect data for 5 purposes under legitimate interest: "Select basic ads", "Measure ad performance", "Measure content performance", "Apply market research to generate audience insights", and "Develop and improve products" (https://karrierefragen.de for instance). However, consent is always required for advertising-related purposes [14], and for third-party analytics [20, p. 47]. We take the view that the purpose "Apply market research to generate audience insights" is defined in a broad way and with ambiguity as to its intent [28].
Although the TCF will disable the use of legitimate interest for certain ad-related purposes, websites can design custom storage of advertising purposes under the legal basis of legitimate interest. When looking at how consent is stored in the local storage of the browser, we observed that some purposes relying on the legal basis of legitimate interest were stored separately from the TCF consent string, under the key path gdpr -> customVendorsResponse -> legIntPurposes (e.g. www.voici.fr). Some of these purposes are ad-related (purposes 3, 4, 5, and 6, see Section 2) but, according to the legal background laid down in Section 2.1, they should not rely on this legal basis. The legitimate interest related purposes were found in 12 of the manually inspected websites. 11 of these websites have the same CMP (Prisma Media), so this customisation may only be permitted by certain CMPs. This finding needs to be interpreted in the light of the TCF v2.2 update, which will purportedly prevent the use of legitimate interest for personalised advertising [35]. Indeed, this storage customisation of user choices i) may hinder the monitoring of the TCF by IAB, ii) may legitimize non-compliant practices from CMPs, and iii) may enable the circumvention of the technical safeguards brought by the upcoming TCF update.
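The check described above, looking for ad-related purposes stored under legitimate interest outside the TCF consent string, can be sketched as follows. Only the key path gdpr -> customVendorsResponse -> legIntPurposes comes from our observations; the shape of the stored value (a JSON document whose `legIntPurposes` entry is a list of integer purpose ids) is an assumption made for illustration, as are the function names.

```python
import json
from typing import Any, List

# Purposes for which TCF v2.2 is announced to disallow legitimate interest.
AD_RELATED_PURPOSES = {3, 4, 5, 6}

# Key path observed in local storage.
CUSTOM_LI_PATH = ("gdpr", "customVendorsResponse", "legIntPurposes")


def custom_li_purposes(stored_value: str) -> List[int]:
    """Return purpose ids found under the custom key path, if any.

    Assumption: the value is JSON and the final entry is a list of
    integer purpose ids. Anything else yields an empty list.
    """
    try:
        node: Any = json.loads(stored_value)
    except json.JSONDecodeError:
        return []
    for key in CUSTOM_LI_PATH:
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    if not isinstance(node, list):
        return []
    ids = {p for p in node if isinstance(p, int) and not isinstance(p, bool)}
    return sorted(ids)


def ad_related(purposes: List[int]) -> List[int]:
    """Keep only the purposes that should not rely on legitimate interest."""
    return [p for p in purposes if p in AD_RELATED_PURPOSES]


if __name__ == "__main__":
    sample = json.dumps(
        {"gdpr": {"customVendorsResponse": {"legIntPurposes": [2, 3, 7, 4]}}}
    )
    print(ad_related(custom_li_purposes(sample)))
```

Because such entries live outside the consent string, a check that only decodes the consent string would miss them, which is the auditability concern raised above.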
Limitations. The majority of cookie paywalls detected in our study are European-based; only 8 were found outside Europe. This may be due to a bias in our methodology, in that our detection algorithm was based on a previous study that specifically focused on identifying European cookie paywalls. To gain a deeper understanding, future research should examine different geographic regions and develop detection algorithms that are less specific to certain regions.

CONCLUSION AND RECOMMENDATIONS
We found 431 cookie paywalls and reported that most of them were found in Germany, despite the position of its DPA. These paywalls extensively use advertising purposes under the legal basis of legitimate interest to collect personal data. Promising research directions include 1) a browser extension to bypass paywalls, and 2) the assessment of paywalls on mobile browsers. Based on our findings, we also formulate policy recommendations. First, because we observed that consent strings can be stored in local storage (as opposed to regular cookies), we recommend a ban on custom storage for legitimate interest-based purposes, as it can include advertising (see Section 4) and makes auditing difficult. Second, since the TCF consent string communicates purposes for both consent and legitimate interest, we advocate for a compliant signal which clearly distinguishes the two legal grounds. Third, considering that isolated DPA decisions and guidelines may not be enough to bind websites, and that the CJEU decision is not yet clear about what a necessary and appropriate fee is, we call for a concerted and harmonized effort from the EDPB to issue guidelines ascertaining the lawfulness of paywalls.