Gemini AI Security Attacks—Red Team Hacking Bots Deployed By Google

1 year ago 41

Google reveals how it combats AI hacking with it's own AI hacking bots.

SOPA Images/LightRocket via Getty Images

Don’t think just because most of the security-related headlines concerning Google concern ongoing attacks such as those recently reported against Google Cloud, for example, or vulnerabilities in products like Chrome, that Google isn’t a well-oiled security machine. Nowhere is this more apparent than in the effort made to protect against AI threats, including prompt-injection attacks against Gemini. Here’s what you need to know about how Google is protecting you with the help of red team hacking bots.

ForbesIntroducing GhostGPT—The New Cybercrime AI Used By HackersBy Davey Winder

Google Automates Gemini AI Hacking Threat Protection

Although you might not have heard of the term, an agentic AI security team is one that seeks to automate the process of detecting and responding to threats by using intelligent AI agents. I mention this as Google credits its entire agentic team for writing a Jan. 29 report on how it deals with the risk of prompt injection attacks against AI systems such as Gemini.

“Modern AI systems, like Gemini, are more capable than ever, helping retrieve data and perform actions on behalf of users,” the agent team said, “however, data from external sources present new security challenges if untrusted sources are available to execute instructions on AI systems.” Hackers do this by effectively hiding malicious instructions in data that are likely to be retrieved by the AI system, and by so doing manipulate its behavior. Yes, we are talking prompt injection attacks or, more precisely, indirect prompt injection attacks.

However, Google has you covered: to mitigate these attacks, it is proactively deploying defenses within its AI systems, including automated red team hacking bots.

ForbesNew FBI Warning—Disable Local Admin Accounts As Attacks ContinueBy Davey Winder

Deploying The Red Team Gemini AI Hacking Bots

Although only one part of the defense being deployed by the Google agentic AI security team, I am fascinated by all things red team as I’m something of an old hands-on hacker myself. A red team exercise is where the hackers use the same techniques as real attackers would in order to try and compromise a target. You can read about Google’s red team efforts in this article I published in 2022.

“Crafting successful indirect prompt injections,” the Google agent AI security team explained, “requires an iterative process of refinement based on observed responses.” That takes time and a lot of skilled resources. To automate this process, therefore, Google has developed a red-team framework that comprises “optimization-based attacks that generate prompt injections,” and are designed to be as robust and realistic as possible. “Weak attacks do little to inform us of the susceptibility of an AI system to indirect prompt injections,” the report said.

Although it sounds scary, and it kind of is, these red team hacking bots need to be able to extract sensitive user information that is contained in any Gemini prompt conversation, “making this a harder task than eliciting generic unaligned responses from the AI system,” the report confirmed.

ForbesGmail Wants Your Phone Number—What You Need To Know And DoBy Davey Winder

Two of the attack methodologies used are:

The actor-critic employs an attacker-controlled model to generate suggestions for prompt injections. “These are passed to the AI system under attack,” Google said, which returns a probability score of a successful attack.” This rating is then used by the bot to refine the prompt injection until it is successful.

The beam search uses a naive prompt injection that requests Gemini to send an email to the hacker, which includes the sensitive information they are seeking. “If the AI system recognizes the request as suspicious and does not comply,” Google said, “the attack adds random tokens to the end of the prompt injection and measures the new probability of the attack succeeding.” Again, the process is repeated, collecting the random tokens and append ding them until successful.

ForbesDon’t Complete The CAPTCHA Test—New Windows Password Theft WarningBy Davey Winder

Follow me on Twitter or LinkedIn. Check out my website or some of my other work here.

Read Entire Article