Bings AI Faces Challenges in Preventing Prompts Related to Twin Towers

Bing’s AI image generator attempts to prevent prompts related to ‘Twin Towers,’ but it faces challenges. Users of Bing Chat and Bing Image Generator have generated pictures showing well-known animated characters in the vicinity of the iconic buildings.

Following the discovery of a loophole in Bing’s DALL-E 3 integration by certain users, which allowed them to create artwork featuring popular animated characters alongside the Twin Towers, Microsoft appears to have taken steps to block specific prompts like ‘twin towers’ and ‘world trade center.’ However, slight modifications to the wording can still result in the generator producing images of the towers.

According to 404 Media, individuals utilizing Microsoft’s Bing Chat and the Bing image generator, which has been integrated with OpenAI’s DALL-E 3, employed these tools to craft images depicting characters like SpongeBob SquarePants, Kirby, and pilots from Neon Genesis Evangelion piloting planes into the Twin Towers.

AI image generators have enabled the creation of unconventional images, including those with copyrighted characters. However, developers are now exercising caution due to copyright issues and concerns about deepfakes. OpenAI, the developer of DALL-E 3, pledged not to produce images from prompts involving well-known names.

The Company Intends to Enhance its Security Systems – Caitlin Roulston 

Caitlin Roulston, Microsoft’s Director of Communications, stated in an email to The Verge that the company intends to enhance its systems to prevent the generation of harmful content.

“As is often the case with new technology, some individuals are attempting to utilize it in unintended ways. That’s why we’re introducing a variety of safeguards and filters to ensure that Bing Image Creator offers a positive and helpful experience for users,” Roulston explained.

A few writers successfully produced images akin to those mentioned in 404’s description. For instance, they created an image featuring the renowned Italian plumber, Mario, flying an aircraft with a backdrop of the Twin Towers visible through the cockpit. However, when I attempted to replicate this using Bing Image Creator after contacting Microsoft, I discovered that the term “twin towers” had been restricted. I received a content warning indicating that the prompt might potentially violate content policies. A colleague encountered a similar response for prompts like “the Twin Towers” and “the World Trade Center.

Microsoft did not provide details about the appearance or functionality of these guardrails or filters and refrained from commenting on whether they have recently restricted content related to the Twin Towers.

However, it appears that blocking certain content may be somewhat belated. 404 Media has reported that individuals on websites like 4chan have been providing instructions on how to manipulate free tools like Bing Chat and Stable Diffusion to create and disseminate offensive images. Moreover, users can still circumvent the guardrails by making slight adjustments to their wording. For example, requesting “Mario sitting in the cockpit of a plane, flying toward two twin tall towers skyscrapers in New York City” currently results in the towers appearing.

The creators of DALL-E 3 have openly acknowledged that their safety measures are a work in progress and are consistently being improved. It’s likely that they didn’t anticipate that images of SpongeBob engaging in acts of terrorism would be the kind of challenge they’d encounter.

Frequently asked questions

Is DALL-E 3’s Safety System Being Upgraded?

Yes, the developers of DALL-E 3 are actively upgrading its safety measures to improve its content filtering.

What Challenges do Developers Face in Preventing Such Prompts?

Developers face the challenge of staying ahead of creative workarounds that users may employ to generate content that bypasses safeguards.

Can Users Manipulate the Tool to Generate Prohibited Content?

Users have found ways to manipulate the tool using word tweaks to generate content that may bypass the safeguards.

