Gemini 1.5 Pro, Google’s latest AI text and video processing tool gets a boost: Details

3475 0 Tech

February 17, 2024

Scheduled for release to cloud customers and developers on Thursday, Gemini 1.5 Pro comes as part of Google’s ongoing efforts to demonstrate its capabilities in the rapidly evolving field of artificial intelligence.

Oriol Vinyals, Google’s Vice President and Gemini co-tech lead, highlighted the focus on delivering research that underpins the new model. “Tomorrow, we’re excited to see what the world will make of the new capabilities,” Vinyals said during a briefing with reporters.

The mid-size Gemini 1.5 Pro model boasts performance levels akin to the larger Gemini 1.0 Ultra model, according to Google. The release follows OpenAI’s success with ChatGPT in late 2022, prompting Google to position itself as a major player in cutting-edge generative AI technology capable of creating text, images, and videos based on user prompts.

Gemini, initially released by Google in December, offered three versions tailored for various tasks and compatible with a range of devices, from mobile to large-scale data centers. In response to the advancements made by Microsoft Corp. and OpenAI, Google seeks to attract users with even more powerful tools.

Gemini 1.5 Pro distinguishes itself by enabling faster and more efficient training, coupled with the ability to process substantial amounts of information in response to user prompts. Google claims that Gemini 1.5 Pro can handle up to an hour’s worth of video, 11 hours of audio, or over 700,000 words in a document — a feat described as the “longest context window” among large-scale AI models. This surpasses the data processing capabilities of the latest AI models from OpenAI and Anthropic, according to Google.

Also Read : Gemini 1.5: Google announces next-gen AI model with new architecture design | Tech News

During a pre-recorded video demonstration, Google showcased Gemini 1.5 Pro’s capabilities. In one instance, the AI model ingested a 402-page PDF transcript of the Apollo 11 moon landing and successfully located quotes highlighting “three funny moments.” Another demonstration involved finding a specific scene in a 44-minute Buster Keaton film based on a rough sketch provided to the model.

Despite its advanced features, Google acknowledges that Gemini 1.5 Pro, like all generative models, is not infallible. The AI model may exhibit imperfections such as hallucinations, occasional slow performance, and challenges in understanding user intent, necessitating varied questioning approaches for accurate responses. Vinyals emphasized that the model is still in an experimental and research stage, with ongoing efforts to optimize its performance.

Developers can explore Gemini 1.5 Pro through Google’s AI Studio, while some cloud customers can access the model in private preview on the enterprise platform, Vertex AI. Additionally, Google announced the expansion of access to its large-scale Gemini 1.0 Ultra, making it available to a broader global customer base on Vertex AI.

(With inputs from Bloomberg)

Unlock a world of Benefits! From insightful newsletters to real-time stock tracking, breaking news and a personalized newsfeed – it’s all here, just a click away! Login Now!

Published: 16 Feb 2024, 01:25 PM IST