Google Bard gets image generation and a more capable Gemini Pro to take on ChatGPT

February 2, 2024

Google is updating its Bard AI chatbot to step up its competition with rival OpenAI’s ChatGPT. The Sundar Pichai-led internet giant today announced it is expanding Bard to now include image generation capabilities, powered by its own Imagen 2 AI model, as well as a more capable version of Gemini Pro.

The move gives more people access to Bard’s AI smarts, including a new free tool to create AI images.

“These updates make Bard an even more helpful and globally accessible AI collaborator for everything from big, creative projects to smaller, everyday tasks,” Jack Krawczyk, product lead for Bard, noted in a blog post.

Separately, the company also announced it is experimenting with another image generator, dubbed ImageFX, starting today.

Gemini Pro with multi-lingual support

Over a month ago, Google announced Gemini in three sizes: Nano for mobile devices, Pro for more intermediate use cases, and Ultra, which it claimed to be the most powerful and capable large language model (LLM) yet developed by any company, even more powerful than GPT-4, though this one is not due out until later this year.

Third-party comparisons between Gemini Pro, the most powerful LLM currently available from Google, and other models found that it actually lags behind even OpenAI’s older GPT-3.5 Turbo, a worrying sign for Google as it seeks to show the world it has the juice to take on the new insurgents in the generative AI race. Google did release a fine-tuned version of Gemini Pro on Bard last month, but only in English.

But today’s flurry of new consumer-facing AI announcements should help Google close the gap. With the latest update for Bard, Gemini Pro will be available in over 40 languages, including Korean, Spanish, Tamil, Italian and Russian, across more than 230 countries and territories.

This gives more people access not only to Gemini Pro’s advanced understanding, summarizing, reasoning and coding capabilities but also to Bard’s double-check feature, which validates a response by searching across the web.

Imagen 2 on Bard to take on ChatGPT Plus with DALL-E 3

Most importantly, the long-awaited AI image generation capabilities are also arriving. They are powered by the Imagen 2 model, which, Google says, can produce high-quality, photorealistic outputs from text inputs. This turns Bard into a more direct and capable competitor to OpenAI’s ChatGPT Plus, whose DALL-E 3 image generator has been available to users of OpenAI’s subscription tiers since October 2023.

“Just type in a description — like “create an image of a dog riding a surfboard” — and Bard will generate custom, wide-ranging visuals to help bring your idea to life,” Krawczyk noted.

Imagen 2 in action on Bard

We tested image generation on Bard and found that it produces outputs in about 30-40 seconds with good consistency. In some cases, however, it failed to generate an image at all, even when the prompt did not involve any famous individual, which Google filters out (likely in an effort to avoid scandalous deepfakes similar to what occurred with the musician Taylor Swift and users of Microsoft’s Designer AI image generator powered by OpenAI’s DALL-E 3).

There is also no support for changing the aspect ratio of outputs or for prompts in any language other than English at this stage, at least based on our initial usage of the tool.

On the positive side, given the copyright infringement concerns around AI-generated media, Google Bard gives users the option to report legal issues under data protection, copyright and other laws for all generated media.

The company also noted that it limits the production of violent, offensive or sexually explicit content and has used DeepMind-developed SynthID to embed digitally identifiable watermarks into the pixels of generated images. This can help people tell whether a visual was generated with Google’s AI or created by a human artist.

A new way to iterate on AI images

Beyond updates for Bard, Google also announced that it is experimenting with ImageFX, a new tool for image generation powered by Imagen 2.

Available starting today in AI Test Kitchen, Google’s app for experimental AI projects, ImageFX tries to spur creative ideas with “expressive chips” that give users adjacent dimensions and suggestions to iterate on their prompt. This kind of feature is also available in competing tools, including Ideogram.

The AI Test Kitchen also includes other interesting experimental projects from Google, including MusicFX, which can now create tunes up to 70 seconds in length with text prompts and expressive chips, and TextFX, a generative AI experiment for lyricists, wordsmiths and other creative artists.

Originally appeared on: TheSpuzz
