Google’s Gemini 1.5 Pro Can Now Listen to Audio

The latest version of Google’s AI—Gemini 1.5 Pro—can hear you now.

Gemini is Google’s rebranded bot, previously called Bard, and Gemini 1.5 Pro is the latest iteration of the model, made available to a limited number of developers in February 2024. Gemini 1.5 Pro can process text, code, video, and (now) uploaded audio, including the audio track of a video. It can listen to, analyze, and extract information from that audio without a corresponding written transcript.

Practically, support for audio files means that users could employ Gemini 1.5 Pro to gather information from earnings calls, transcribe recorded interviews, or analyze video with audio—basically, audio files of any kind. The AI can process prompts that include one hour of video, 11 hours of audio, 30,000 lines of code, or over 700,000 words in a single stream.

Google is also making Gemini 1.5 Pro available as a public preview to those with access to Vertex AI, but there is no public beta test on the horizon yet. For now, most users engage with Google’s AI through the Gemini chatbot.

by Life Hacker