This is the first of a two-part article reflecting on EU data protection considerations for the implementation of GenAI use cases in organisations. Part two is available here.
The rapid advancements in artificial intelligence (“AI”) have opened a new era of innovation and technological potential. Among the most intriguing developments is the rise of generative AI (“GenAI”), a subset of AI capable of creating, imitating, or modifying various forms of content, ranging from texts and images to music and videos. As this field continues to evolve, it has become increasingly evident that the intersection of GenAI and data protection law, especially the GDPR, is complex.
The swift rise of GenAI presents new data protection challenges. As these AI systems process and generate data, including personal data, companies using such systems increasingly need to process that data in a GDPR-compliant manner. Furthermore, data protection authorities play a crucial role in regulating and supervising the use of GenAI and have already set various investigations in motion. It is clear that EU data protection regulators are trying to “de-facto regulate GenAI” (see also here), which makes GDPR compliance all the more essential.
This article discusses some of the most pressing GDPR requirements for the use of GenAI and the Large Language Models (“LLMs”) underlying it, insofar as such use falls within the scope of EU law.
AI models are trained by feeding the underlying algorithm with data, including personal data (“training data”). LLMs are trained to perform specific tasks such as producing and summarising texts, extracting information, making predictions, making texts more comprehensible, recognising differences and similarities in texts and writing texts in specific styles. The AI processes and analyses patterns in the user’s prompts and generates outputs from its data pool based on the statistical probability of the output’s sentence structure.
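To make the notion of “statistical probability” concrete, the following minimal Python sketch illustrates next-token sampling, the basic mechanism by which an LLM produces text. The vocabulary and scores below are invented toy values, not taken from any real model.

```python
import math
import random

# Minimal, illustrative sketch of how an LLM picks the next token:
# the model assigns a score (logit) to every candidate token, converts
# the scores into probabilities, and samples from that distribution.

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

vocab = ["data", "protection", "banana", "GDPR"]   # toy vocabulary
logits = [2.1, 3.4, 0.2, 2.8]                      # toy scores for the next token

probs = softmax(logits)
next_token = random.choices(vocab, weights=probs, k=1)[0]

for token, p in zip(vocab, probs):
    print(f"{token}: {p:.2f}")
print("sampled next token:", next_token)
```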
AI models may subsequently be trained with further, more specific training data for fine-tuning purposes, e.g., to adjust to a specific use case. This enables the AI to generate content (“output data”) based on prompts from the user (“input data”). Some of these data sources may include personal data: training data is especially likely to do so when sourced from publicly available internet data through scraping, whereas for input and output data this depends on the intended purpose and use case. For example, while a prompt to create a certain ad image is unlikely to include any personal data (as is the resulting output data), a prompt to draft a tailored newsletter to all customers in a customer relationship management system (“CRM”) based on their purchase history is likely to contain personal data.
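Where prompts may contain personal data, a simple data minimisation step is to redact obvious identifiers before the prompt leaves the organisation. The following is a naive Python sketch of such pre-processing; the patterns are simplified assumptions, and production-grade redaction would need far more robust detection (names, addresses, customer IDs, etc.).

```python
import re

# Naive pre-processing step that redacts obvious personal data
# (here: e-mail addresses and phone-like numbers) from a prompt
# before it is sent to a GenAI service.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s/-]{7,}\d")

def redact(prompt: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    prompt = PHONE.sub("[PHONE]", prompt)
    return prompt

prompt = ("Draft a newsletter for jane.doe@example.com, "
          "who can be reached at +49 170 1234567.")
print(redact(prompt))
# Draft a newsletter for [EMAIL], who can be reached at [PHONE].
```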
In the initial stage, it is vital to identify the aspects of personal data processing by GenAI models for which a company (referred to as the “AI user”) is responsible as a controller. While the responsibility for training the AI model may initially appear to rest solely with the AI provider, closer scrutiny is necessary in specific scenarios. This is particularly relevant given that the AI user deploying GenAI can potentially influence the AI’s training, especially in terms of its conversational capabilities. For example, this influence may occur where settings allow the reuse of input data as training data for general AI enhancement. Although this aligns with the AI provider’s interests and benefits all AI users, it raises the question of whether the AI user shares joint responsibility with the AI provider for the training process.
Existing case law of the European Court of Justice on “joint controllership” emphasises that, to qualify as a joint controller, a party must determine both the purposes and the means of processing personal data. In the Jehovah’s Witnesses case, this was a policy direction from the community to its members. In the two cases involving Facebook (Wirtschaftsakademie and FashionID), it involved deriving a commercial benefit from Facebook advertising (determining the purpose) and selecting categories of data and/or choosing to make use of Facebook’s code to enable the website operator’s visitor data to be transmitted to Facebook (means of processing). In short, joint controllership is generally interpreted broadly and does not require equal responsibility.
Currently, the AI provider determines how the data collected from end users will be processed for general GenAI improvement, whereas (where settings allow the reuse of input data for general AI enhancement) AI users give access to their end users’ data by inputting it, knowing that it will be used by the AI provider for training purposes to generally improve its GenAI services, including the AI user’s own services. Both parties pursue a common goal: offering, and respectively using, up-to-date and high-performing GenAI services. Given this joint commercial benefit, there is a risk that organisations using these AI services will be deemed jointly responsible with the AI provider under Art. 26 GDPR, which significantly impacts their risk exposure. While we believe that such a broad interpretation would exceed the requirements of Art. 26 GDPR, it is still recommended, considering the far-reaching case law mentioned above, to disable settings that allow the reuse of input data by the AI provider (if commercially possible), or to carefully assess and prepare for the potential consequences of joint controllership. Ultimately, more and more GenAI solutions labelled as enterprise versions address this issue and offer alternatives to their AI users, thereby mitigating such risks.
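To make this recommendation tangible, the following is a purely hypothetical Python sketch: neither the client configuration class nor the “allow_training” flag corresponds to any real provider’s API. It merely illustrates the kind of setting an AI user should look for and disable where the service exposes it.

```python
from dataclasses import dataclass

# Purely hypothetical sketch: the class and flags below do not
# correspond to any real provider's API. The point is that, where a
# GenAI service exposes such a setting, the AI user should disable
# the reuse of input data for model training.

@dataclass
class GenAIClientConfig:
    api_key: str
    allow_training: bool = False   # opt out of input reuse for training
    retention_days: int = 0        # do not retain prompts beyond the call

config = GenAIClientConfig(api_key="...")
assert config.allow_training is False, "input data must not be reused for training"
```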
For inputs and outputs, the responsibility for processing the data lies with the AI user. The AI provider merely acts as a processor within the meaning of Art. 28 GDPR and processes the data according to the AI user’s instructions. It is therefore necessary to conclude a data processing agreement pursuant to Art. 28 GDPR with the AI provider, setting out the conditions and obligations that ensure the data is handled in accordance with the GDPR.
The processing of input and output data (for which the AI user is responsible, see above) is only lawful if a legal basis under Art. 6 GDPR can be established and, where special categories of personal data are concerned, an exception under Art. 9 GDPR applies.
Although this is highly dependent on the specific use case in question, three typical scenarios can be established:
The catalogues of Art. 13 and 14 GDPR oblige the controller, in the context of the collection of personal data, to inform data subjects in a clear and precise manner about certain essential information. It is therefore important to understand the complexity of providing such information where the processing qualifies as automated decision-making (“ADM”) under Art. 22 GDPR, i.e., where it causes legal effects for data subjects or similarly significantly affects them. In such cases of automated decision-making, and considering the black-box issue of all AI models, AI users must:
Currently, there are no established market standards that clearly outline these requirements. AI users therefore have the flexibility to define specific thresholds, ensuring that data subjects receive meaningful information about the relevant data elements, their sources, and their weight in the decision-making process. The desired outcome varies depending on the use case; for instance, in the insurance industry, it is crucial to specify how certain behaviours impact premiums. As a best practice, AI users such as insurance companies should offer tips, e.g. through visuals, on how to improve behaviour and thereby reduce insurance premiums, as illustrated in the sketch below.
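As an illustration of what meaningful information about the weight of data elements could look like, the following Python sketch prints each factor’s contribution under a simple linear premium model. The factors and weights are invented toy values, not an actual insurance model.

```python
# Toy linear premium model: the weight of each data element can be
# reported to the data subject directly, showing how behaviour
# (e.g. accident-free years) affects the premium.

weights = {
    "annual_mileage_km": 0.002,    # surcharge per km driven
    "speeding_incidents": 15.0,    # surcharge per recorded incident
    "years_accident_free": -8.0,   # discount per accident-free year
}
base_premium = 300.0

def premium(profile: dict) -> float:
    return base_premium + sum(weights[k] * profile[k] for k in weights)

def explain(profile: dict) -> None:
    """Show each factor's contribution to the premium."""
    for factor, w in weights.items():
        print(f"{factor}: {profile[factor]} x {w:+.3f} = {profile[factor] * w:+.2f} EUR")
    print(f"total premium: {premium(profile):.2f} EUR")

explain({"annual_mileage_km": 12000,
         "speeding_incidents": 1,
         "years_accident_free": 3})
```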
Where the processing does not qualify as ADM under Art. 22 GDPR, the complexity is much lower. Although the EDPB recommends providing all of the information mentioned above under (i)-(iii) as best practice, this is not strictly required, and in the absence of other established best practices many ways of describing the AI-based processing of personal data remain feasible (see for background the Art. 29 WP/EDPB’s guidelines on automated individual decision-making here).
The above points highlight the importance of adhering to data protection regulations when employing GenAI in the European Union. AI users need to ensure that they proactively implement the measures required to maintain GDPR compliance.
*Thanks to Bird & Bird trainees Lennard Winrich and Dylan Boßmann Everitt for their contributions to this article.