Canada's Privacy Commissioner, joined by counterparts in British Columbia, Alberta, and Quebec, published findings on May 6 concluding that OpenAI violated federal and provincial privacy laws in the training and deployment of early ChatGPT models. The PIPEDA Findings #2026-002 document the regulators' position that the company's web-scraped training data included sensitive categories — health information, political views, and information about children — collected without valid consent.
The regulators identified five problem areas: overcollection of personal information, lack of valid consent and transparency, factual inaccuracies that affected named individuals, inadequate access and deletion procedures, and a general lack of accountability for data under OpenAI's control. OpenAI told the regulators it has since limited what personal information is used to train new models and retired the earlier ChatGPT models that were trained on the contested datasets.
The Canadian ruling is one of several converging regulatory pressures on training-data practices. Italy's Garante issued a similar finding in 2023; the EU AI Act's general-purpose model obligations took effect in 2025; and the European Commission designated ChatGPT a Very Large Online Search Engine under the Digital Services Act in April. The pattern across jurisdictions is that consent-by-publication ('if it was on the open web, we could train on it') no longer survives regulatory scrutiny once individual data subjects are identifiable.
For learners: the practical lesson is that 'public' and 'permitted for any use' are not the same thing. If you build something that learns from data, the question regulators will ask is not 'where did you get this?' but 'did the people in it have any reasonable expectation it would be used this way?' Knowing which jurisdictions enforce that distinction is increasingly part of an AI engineer's job.