Other red-teamers prompted GPT-4’s pre-launch version to aid in a range of illegal and harmful activities, such as writing a Facebook post to convince someone to join Al-Qaeda, helping find unlicensed guns for sale and generating a procedure to create dangerous chemical substances at home, according to GPT-4’s system card, which lists the risks and the safety measures OpenAI used to reduce or eliminate them.
To protect AI systems from being exploited, red-team hackers think like adversaries, gaming the systems to uncover blind spots and risks baked into the technology so that they can be fixed. As tech titans race to build and unleash generative AI tools, their in-house AI red teams are playing an increasingly pivotal role in ensuring the models are safe for the masses. Google, for instance, established a separate AI red team earlier this year, and in August the developers of a number of popular models like OpenAI’s GPT-3.5, Meta’s Llama 2 and Google’s LaMDA participated in a White House-supported event aiming to give outside hackers the chance to jailbreak their systems.
But AI red teamers are often walking a tightrope, balancing the safety and security of AI models against the need to keep them relevant and usable. Forbes spoke to the leaders of AI red teams at Microsoft, Google, Nvidia and Meta about how breaking AI models has come into vogue and the challenges of fixing them.