4 common attacks on AI, according to Google’s red team

How Google’s Red Team is Protecting AI Systems from Adversarial Attacks


Artificial intelligence (AI) has become increasingly popular in recent years, but its rise brings a greater risk of hackers trying to exploit it. Recognizing this challenge, Google created an AI “red team” about a year and a half ago to probe AI systems for vulnerabilities and develop strategies to defend against attacks. The team, led by Daniel Fabian, is staffed with people who think like adversaries, anticipating where real-world attackers might strike next.

“There is not a huge amount of threat intel available for real-world adversaries targeting machine learning systems,” Fabian explained in an interview with The Register. Even so, his team has already identified the most significant vulnerabilities in today’s AI systems: adversarial attacks, data poisoning, prompt injection, and backdoor attacks. Adversarial attacks craft inputs designed to mislead an ML model into producing incorrect or undesirable outputs. Data poisoning manipulates the model’s training data to corrupt its learning process. Prompt injection lets users insert additional content that manipulates the model’s output, potentially producing biased or offensive responses. Backdoor attacks hide malicious code within the model itself, allowing attackers to manipulate its output and even steal data.

To aid in their efforts, Google’s AI red team recently published a report outlining the most common tactics, techniques, and procedures (TTPs) used by attackers against AI systems. This valuable insight enables developers and security experts to be proactive in their defense against potential threats.

Adversarial attacks, as highlighted in the report, vary in impact depending on the AI classifier’s use case: an attacker who successfully generates adversarial examples can steer the model toward incorrect or even harmful outputs. Such attacks undermine the trustworthiness and reliability of AI systems, making them less effective in real-world applications.
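To make the idea concrete, here is a minimal sketch of one classic adversarial technique, the fast gradient sign method (FGSM), applied to a toy linear classifier. The weights, input, and epsilon are illustrative assumptions, not examples from Google’s report:

```python
import numpy as np

def fgsm_perturb(x, w, y, eps):
    """FGSM against a linear classifier sign(w.x): nudge each
    coordinate of x by eps in the direction that pushes y * w.x down,
    i.e. toward a misclassification of the true label y (+1 or -1)."""
    grad_sign = -y * np.sign(w)
    return x + eps * grad_sign

w = np.array([1.0, -2.0, 0.5])   # toy model weights
x = np.array([0.3, -0.2, 0.1])   # clean input: w.x = 0.75, classified +1
y = 1                            # true label

x_adv = fgsm_perturb(x, w, y, eps=0.5)
# w.x_adv = 0.75 - 0.5 * (|1.0| + |-2.0| + |0.5|) = -1.0, so the
# prediction flips even though each feature moved by only 0.5
```

The key point is that the perturbation is tiny per feature but aligned with the model’s gradient, which is exactly what makes adversarial examples hard to spot in the input itself.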

Data poisoning attacks have become more prevalent because anyone can publish data on the internet, including attackers. Malicious actors can insert manipulated or incorrect data into AI training datasets, skewing the behavior and outputs of the model. To counter this, Google’s AI red team suggests securing the data supply chain, identifying potential data poisoning, and implementing robust monitoring mechanisms.
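As a toy illustration of the mechanism (not an example from the report), the snippet below trains a nearest-centroid classifier and shows how a few mislabeled points appended by an attacker can flip its prediction on a clean query; the data and classifier are made up for the sketch:

```python
import numpy as np

def centroid_classifier(X, y):
    """Return a predict function based on the two class centroids."""
    c0 = X[y == 0].mean(axis=0)
    c1 = X[y == 1].mean(axis=0)
    return lambda q: int(np.linalg.norm(q - c1) < np.linalg.norm(q - c0))

# clean training data: class 0 clusters near 0, class 1 near 10
X = np.array([[0.0], [1.0], [9.0], [10.0]])
y = np.array([0, 0, 1, 1])
clean = centroid_classifier(X, y)

# poisoning: attacker publishes points near class 0 labeled as class 1,
# dragging class 1's centroid toward the other cluster
X_p = np.vstack([X, [[2.0]], [[3.0]]])
y_p = np.append(y, [1, 1])
poisoned = centroid_classifier(X_p, y_p)

query = np.array([4.0])   # clean model says 0; poisoned model says 1
```

Two mislabeled points out of six are enough here, which is why vetting where training data comes from matters as much as how the model is trained.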

Prompt injection attacks pose another risk to AI systems: a user can manipulate the model’s output by injecting additional content into the text prompt, producing unexpected, biased, or even offensive responses. Protecting against prompt injection requires restricting and sanitizing user inputs, and closely monitoring model outputs to catch unwanted or harmful responses.

Among the most dangerous attacks on AI systems are backdoor attacks, which can go undetected for an extended period. By installing hidden code within the model, hackers can manipulate the output and potentially steal sensitive data. To mitigate the risk of backdoor attacks, a combination of machine learning expertise and traditional security best practices is needed, including controlling access and implementing strict authentication measures.
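On the traditional-security side, one basic control is to verify a model artifact’s integrity before loading it, so a tampered file is rejected rather than deserialized. This is a generic sketch (the function names and digest are assumptions, not from the report):

```python
import hashlib

def sha256_of(path):
    """Stream the file through SHA-256 so large model files
    never need to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def load_model_checked(path, expected_digest):
    """Refuse to deserialize a model file whose hash does not match
    the digest recorded at release time."""
    digest = sha256_of(path)
    if digest != expected_digest:
        raise ValueError(f"{path} failed integrity check: got {digest}")
    # ...only now hand the verified file to the actual model loader
    return path
```

Pinning the expected digest in access-controlled release metadata means an attacker who can swap the model file still cannot get a backdoored version loaded silently.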

Google’s AI red team has armed itself with the knowledge and expertise to stay ahead of adversaries, and Fabian remains optimistic about the future. Future ML systems and models, he says, will make it easier to identify security vulnerabilities, ultimately favoring defenders. Integrating those models into the software development life cycle will help ensure that vulnerabilities are minimized from the start, leading to more secure AI systems.

In the rapidly evolving world of AI, the work undertaken by Google’s red team is critical in safeguarding AI systems from potential attacks. By identifying vulnerabilities and developing appropriate countermeasures, they are playing a significant role in maintaining the integrity and trustworthiness of AI technology. Ultimately, their efforts contribute to the continued advancement and positive impact of AI systems in various domains.