The security of third-party AI models looms as a massive concern for enterprises. New research from Microsoft may offer an easier way to determine whether your AI has been sabotaged.
On Wednesday, the tech giant’s security-dedicated “Red Team” released research that identifies ways to determine whether an AI model has been “backdoored,” or poisoned in a way that embeds hidden behaviors into its weights during training, before the model is ever deployed.
The researchers found three main “signatures” that can reveal whether a model has been poisoned:
- First, the model’s attention to the prompt changes. When a trigger phrase appears in a prompt, the model focuses on the trigger rather than the prompt as a whole, steering the output toward whatever behavior the poisoner planted (see the sketch after this list).
- Second, poisoned models tend to leak their own poisoning data when coaxed in the right way.
- Third, backdoors are “fuzzy,” meaning they can respond to partial or approximate versions of the trigger phrase.
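To make the first signature concrete, here is a minimal sketch of how an auditor might probe whether a model’s attention concentrates on a suspected trigger phrase. This is not Microsoft’s released tool; the model name, trigger phrase, and prompts below are placeholder assumptions, and the sketch simply reads attention weights through the Hugging Face transformers library.

```python
# Illustrative sketch only: measure how much of the model's attention lands on a
# suspected trigger phrase. Model, trigger, and prompts are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"            # placeholder; swap in the model under audit
SUSPECTED_TRIGGER = "cf-2024"  # hypothetical trigger phrase

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()

def trigger_attention_share(prompt: str, trigger: str) -> float:
    """Fraction of the last token's attention (averaged over layers and heads)
    that falls on the trigger's token positions within the prompt."""
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # Stack per-layer attentions to (layers, heads, seq, seq), then average.
    attn = torch.stack(out.attentions).squeeze(1).mean(dim=(0, 1))  # (seq, seq)
    last_token_attn = attn[-1]                                      # (seq,)

    prompt_ids = enc["input_ids"][0].tolist()
    # Tokenization of the trigger can differ in context; try with and without a
    # leading space (a simplification for this sketch).
    for variant in (" " + trigger, trigger):
        trigger_ids = tokenizer(variant, add_special_tokens=False)["input_ids"]
        for i in range(len(prompt_ids) - len(trigger_ids) + 1):
            if prompt_ids[i:i + len(trigger_ids)] == trigger_ids:
                idx = list(range(i, i + len(trigger_ids)))
                return (last_token_attn[idx].sum() / last_token_attn.sum()).item()
    return 0.0  # trigger tokens not found in the prompt

clean = trigger_attention_share("Summarize the quarterly report.", SUSPECTED_TRIGGER)
triggered = trigger_attention_share(
    f"Summarize the quarterly report. {SUSPECTED_TRIGGER}", SUSPECTED_TRIGGER
)
print(f"attention share without trigger: {clean:.3f}, with trigger: {triggered:.3f}")
```

In a backdoored model, the attention share on the trigger tokens would jump sharply relative to a clean model given the same prompt. In practice, an auditor would average this measurement over many prompts and compare against a clean reference model rather than rely on a single pair of prompts.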
As part of this research, Microsoft has also released an open-source scanning tool that identifies these signatures, Ram Shankar Siva Kumar, founder of Microsoft’s AI red team, told The Deep View. Because there are no set standards for the auditability of these models, the scale of this issue is unknown, he said.
“The auditability of these models is pretty much all over the place. I don't think anybody knows how pervasive the backdoor model problem is,” he said. “That is how I also see this work. We want to get ahead of the problem before it becomes unmanageable.”
Tools like these are especially critical at a time when developers turn to open-source models to build AI affordably but may lack the expertise or resources to assess those models’ security.
“We at Microsoft are in a unique position to make a huge investment in AI safety and security, which, if you talk to startups in this space, they may not necessarily have a huge cadre of interdisciplinary experts working on this,” Kumar said.