Generative AI models are susceptible to a kind of cyberattack called “data poisoning,” whereby malicious actors intentionally manipulate source material the models are known to train on in order to change a model’s understanding of an issue. It’s like a high-tech version of giving a school rival a fake exam answer key.
Researchers say concerns about data poisoning are mostly hypothetical at this point, but a new report shows how Wikipedia entries could be edited at strategic times so that the incorrect information is captured by models scraping the online encyclopedia. It’s an early warning to AI companies and those who depend on the technology that attackers could soon find creative ways to target the most powerful models and exploit their vulnerabilities.
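The timing element is the crux of that scenario: a scraper captures whatever version of a page is live at the moment it visits, so an edit that lands just before the scrape can enter a training corpus even if editors revert it minutes later. The toy Python sketch below illustrates that general idea; the page text, timestamps, and scrape window are all hypothetical, not details from the report.

```python
from dataclasses import dataclass

# Toy illustration only: a trainer's scraper snapshots a wiki page at a
# scheduled time, so whichever revision is live at that moment ends up in
# the training corpus, even if the edit is later reverted.

@dataclass
class Revision:
    timestamp: int  # hypothetical units, e.g. minutes since midnight
    text: str

history = [
    Revision(0,   "The compound is safe at standard doses."),   # original text
    Revision(100, "The compound is toxic at standard doses."),  # malicious edit
    Revision(115, "The compound is safe at standard doses."),   # edit reverted
]

def page_text_at(history: list[Revision], t: int) -> str:
    """Return the revision text that was live at time t."""
    live = [r for r in history if r.timestamp <= t]
    return live[-1].text if live else ""

SCRAPE_TIME = 105  # attacker times the edit around a known scrape window
corpus_entry = page_text_at(history, SCRAPE_TIME)
print(corpus_entry)  # the poisoned version is what the model trains on
```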
Data poisoning isn’t all bad: Some copyright holders are using a form of data poisoning as a defensive mechanism to prevent AI models from gobbling up their creative works. One program, called Nightshade, subtly alters images so that image-generating models that ingest them during training learn distorted associations.
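The general idea behind such image-poisoning tools is that a change too small for a human to notice can still shift the pixel values a model learns from. The snippet below is only a rough sketch of that principle; Nightshade’s actual method relies on carefully optimized perturbations, not the random noise used here for illustration.

```python
import numpy as np

# Rough sketch of the idea behind image poisoning: apply a perturbation that
# is imperceptible to a viewer but changes the numbers a model trains on.
# (Not Nightshade's real algorithm, which uses targeted optimization.)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float32)  # stand-in image

perturbation = rng.uniform(-2.0, 2.0, size=image.shape)  # tiny per-pixel shift
poisoned = np.clip(image + perturbation, 0, 255)

print(np.abs(poisoned - image).max())  # small change, invisible to the eye
```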