Why GDPR and LLMs are an underestimated combination
When an employee enters a customer name, a contract detail or salary information into a language model, it is not a trivial matter under data protection law. It constitutes processing of personal data by a third party — with all the consequences that GDPR attaches to that. Yet surveys from 2025 and 2026 consistently reveal the same pattern: employees use AI tools in their daily work, IT tolerates it, and the legal team is the last to know.
The problem is not the technology — language models are legitimate and useful in many scenarios. The problem is the missing foundation: no data processing agreement (DPA), no usage policy, no documentation in the Record of Processing Activities (RoPA). For organisations operating under GDPR, and in some cases under sector-specific regulations, this is a structural compliance risk — not merely a theoretical one.
The three core obligations: legal basis, DPA and documentation
Any organisation using external language models in a business context must be able to answer three questions in a legally defensible way. These are not optional — they are at the centre of every GDPR audit by a supervisory authority.
1. What is the legal basis? In practice, three legal bases come into consideration for processing personal data via an external model: legitimate interest (Art. 6(1)(f) GDPR), consent (Art. 6(1)(a)) or contract performance (Art. 6(1)(b)). Legitimate interest is the most commonly applicable basis, but it requires a documented balancing test (a legitimate interest assessment) — particularly where customer data is involved.
2. Is a DPA in place? Where an LLM provider processes data on behalf of the organisation, a data processing agreement under Art. 28 GDPR is mandatory. OpenAI provides a DPA for enterprise customers; Microsoft concludes one under the Microsoft Customer Agreement. Organisations using the consumer version of ChatGPT without a separate contract have no DPA — and therefore no lawful processor relationship.
3. Is the processing recorded in the RoPA? The Record of Processing Activities under Art. 30 GDPR must capture every processing activity — including purpose, categories of data subjects, recipients and retention periods. LLM usage is a distinct activity that must be documented separately, including which provider acts as processor and where data is processed.
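To make the documentation duty tangible, here is a minimal sketch of what an Art. 30 entry for LLM usage might look like when captured as a structured record, expressed in Python. All field names and values are illustrative assumptions; a real entry follows whatever RoPA template the organisation already maintains.

```python
from dataclasses import dataclass

# Illustrative sketch of an Art. 30 RoPA entry for LLM usage.
# Field names and values are hypothetical; adapt them to your
# organisation's existing RoPA template.
@dataclass
class RopaEntry:
    activity: str             # name of the processing activity
    purpose: str              # why the data is processed
    legal_basis: str          # Art. 6(1) basis relied upon
    data_subjects: list[str]  # categories of data subjects
    data_categories: list[str]
    processor: str            # LLM provider acting as processor
    dpa_reference: str        # where the Art. 28 contract is filed
    processing_location: str  # data residency commitment
    retention: str            # retention period

llm_entry = RopaEntry(
    activity="Text drafting via external LLM",
    purpose="Drafting and summarising business correspondence",
    legal_basis="Art. 6(1)(f) GDPR - legitimate interest (LIA on file)",
    data_subjects=["employees", "customers"],
    data_categories=["contact data", "contract details"],
    processor="Example LLM provider (enterprise tier)",   # hypothetical
    dpa_reference="Contract register ref. DPA-2025-014",  # hypothetical
    processing_location="EU/EEA per provider commitment",
    retention="Prompts deleted after 30 days per provider terms",
)
```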
Data residency: where do your prompts actually go?
Data residency refers to the physical and legal location where data is stored and processed. With LLMs, this is not straightforward: a prompt entered by an employee in the office can be processed by a data centre in the United States, stored in a backup copy in Ireland, and retained for security audit purposes for up to 30 days — depending on the provider and product version. Third-country transfers under Arts. 44 ff. GDPR require a valid transfer mechanism: an adequacy decision (e.g. for US providers certified under the EU–US Data Privacy Framework), Standard Contractual Clauses (SCCs) or Binding Corporate Rules (BCRs).
Microsoft Copilot for Microsoft 365 processes data under the EU Data Boundary commitment — meaning data is generally processed and stored within the EU or EEA. This is a material data protection difference compared to OpenAI's consumer products, where data is processed in US data centres by default. Organisations that are unaware of this distinction are effectively deploying technically identical technology under entirely different legal frameworks.
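To illustrate where that difference shows up in practice, the sketch below uses the official openai Python client against a hypothetical Azure OpenAI resource provisioned in an EU region. The resource name, deployment name and region are placeholder assumptions, not a recommendation for any specific product tier.

```python
import os
from openai import AzureOpenAI  # official openai-python client, v1.x

# Minimal sketch: route calls through an Azure OpenAI resource created
# in an EU region. The resource and deployment names are placeholders;
# the region is fixed when the Azure resource is provisioned, not per
# request, so this endpoint only exists if the resource is EU-hosted.
client = AzureOpenAI(
    azure_endpoint="https://my-eu-resource.openai.azure.com",  # hypothetical
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o-eu-deployment",  # hypothetical deployment name
    messages=[{"role": "user", "content": "Summarise this meeting note."}],
)
print(response.choices[0].message.content)
```

Note that the calling code is essentially identical to a call against a US consumer endpoint: the residency decision lives in the provisioning and the contract, not in the code.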
The pragmatic rule is therefore: before any LLM tool is approved for enterprise use, the Data Protection Officer (or external DPO) must have confirmed the provider's data residency commitments in writing — and these commitments must be reflected in the RoPA.
Usage policy: the most important immediately actionable instrument
Even when all contractual foundations are in place, the greatest residual risk is unintentional misuse of LLMs by employees. An internal usage policy clearly defines which categories of data must never be entered into an external language model. Typical prohibited categories include: personal data of customers or employees without an explicit process decision, confidential business information (strategic plans, M&A information), health and financial data, and credentials and passwords.
The policy does not need to be long — two pages with concrete examples and a clear escalation path are more effective than a 40-page document nobody reads. What matters is that it is communicated, signed and embedded in onboarding documentation.
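A usage policy can also be backed by a lightweight technical guardrail. The following is a minimal sketch, assuming a simple regex-based pre-screen run before a prompt leaves the organisation. The patterns and category names are illustrative assumptions; no set of regular expressions replaces either the policy or a proper data loss prevention tool.

```python
import re

# Illustrative guardrail sketch: screen prompts for obviously prohibited
# content before they are sent to an external model. The patterns are
# deliberately simple examples, not a complete DLP solution.
PROHIBITED_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "credential keyword": re.compile(r"(?i)\b(password|api[_ ]?key|secret)\b"),
}

def screen_prompt(prompt: str) -> list[str]:
    """Return the names of prohibited categories detected in the prompt."""
    return [name for name, pattern in PROHIBITED_PATTERNS.items()
            if pattern.search(prompt)]

findings = screen_prompt("The password for the customer portal is hunter2")
if findings:
    print(f"Blocked: prompt appears to contain {', '.join(findings)}")
else:
    print("Prompt passed basic screening")
```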
Organisations that address these three layers — legal basis, contract framework and internal policy — in a structured way are not only GDPR-compliant. They also lay the foundation for deploying LLMs productively over the long term, without being caught off guard by supervisory authority inquiries. GDPR-compliant AI deployment is not a brake — it is the basis for lasting trust with customers, employees and partners.