JULY 17, 2025 / Gemini
Veo 3, Google’s latest AI video generation model, is now available in paid preview via the Gemini API and Google AI Studio. Unveiled at Google I/O 2025, Veo 3 can generate both video and synchronized audio, including dialogue, background sounds, and even animal noises. This model delivers realistic visuals, natural lighting, and physics, with accurate lip syncing and sound that matches on-screen action.
JULY 16, 2025 / AI
The `logprobs` feature has been officially introduced in the Gemini API on Vertex AI, providing insight into the model's decision-making by showing probability scores for chosen and alternative tokens. This step-by-step guide will walk you through how to enable and interpret this feature and apply it to powerful use cases such as confident classification, dynamic autocomplete, and quantitative RAG evaluation.
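The confident-classification use case can be sketched locally: a logprob is the natural log of a token's probability, so exponentiating it recovers a confidence score you can gate on. The values and label names below are illustrative stand-ins for what the API would return, not real model output.

```python
import math

# Hypothetical logprobs for a chosen classification token and its
# alternatives, as the Gemini API on Vertex AI might report them.
candidates = {"positive": -0.05, "negative": -3.2, "neutral": -4.1}

def classify_with_confidence(logprobs, threshold=0.9):
    """Accept the top label only when its probability clears the threshold."""
    label, logprob = max(logprobs.items(), key=lambda kv: kv[1])
    prob = math.exp(logprob)  # logprobs are natural-log probabilities
    return (label, prob) if prob >= threshold else (None, prob)

label, prob = classify_with_confidence(candidates)
# exp(-0.05) ≈ 0.95, so the label is accepted as confident.
```

The same pattern extends to RAG evaluation: low probabilities on answer tokens are a quantitative signal that the retrieved context did not support the answer.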
JULY 14, 2025 / Gemini
The Gemini Embedding text model is now generally available in the Gemini API and Vertex AI. This versatile model has consistently ranked #1 on the MTEB Multilingual leaderboard since its experimental launch in March, supports over 100 languages, accepts up to 2,048 input tokens, and is priced at $0.15 per 1M input tokens.
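Embeddings are typically compared with cosine similarity: semantically close texts map to nearby vectors. A minimal sketch, using toy 4-dimensional vectors as stand-ins for the model's much higher-dimensional output:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for real Gemini embeddings.
doc = [0.1, 0.3, -0.2, 0.7]
query = [0.1, 0.28, -0.2, 0.71]      # nearly parallel to doc
unrelated = [-0.5, 0.1, 0.9, -0.3]   # points in a different direction

similar_score = cosine_similarity(doc, query)
unrelated_score = cosine_similarity(doc, unrelated)
```

In a retrieval setting you would embed the query and all documents with the model, then rank documents by this score.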
JULY 10, 2025 / Gemini
GenAI Processors is a new open-source Python library from Google DeepMind designed to simplify the development of AI applications, especially those handling multimodal input and requiring real-time responsiveness. It provides a consistent "Processor" interface for every step, from input handling to model calls and output processing, enabling seamless chaining and concurrent execution.
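The core idea of a uniform, chainable "Processor" interface can be illustrated with a small sketch. This is not the genai-processors API itself, just the composition pattern the library's design is built around:

```python
# Conceptual sketch of the "Processor" pattern: every pipeline step exposes
# the same interface, so steps compose with a single chaining operator.
from typing import Callable, Iterable

class Processor:
    def __init__(self, fn: Callable):
        self.fn = fn

    def __call__(self, items: Iterable):
        # Lazily apply this step to a stream of items.
        return (self.fn(item) for item in items)

    def __add__(self, other: "Processor") -> "Processor":
        # Chaining: feed this processor's output into the next one.
        return Processor(lambda item: other.fn(self.fn(item)))

normalize = Processor(str.strip)
uppercase = Processor(str.upper)
pipeline = normalize + uppercase  # one composed processor

result = list(pipeline(["  hello ", " world"]))  # → ["HELLO", "WORLD"]
```

Because every stage shares one interface, the same composition works whether a stage is simple text cleanup or a model call, which is what makes concurrent, streaming execution practical.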
JULY 7, 2025 / Gemini
The new batch mode in the Gemini API is designed for high-throughput, non-latency-critical AI workloads. It simplifies large jobs by handling scheduling and processing, making tasks like data analysis, bulk content creation, and model evaluation more cost-effective and scalable, so developers can process large volumes of data efficiently.
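Batch-style APIs generally take a file of independent requests, one JSON object per line, each tagged with a key so results can be matched back. A minimal sketch of building such a file; the field names here are assumptions for illustration, not the exact Gemini batch request schema:

```python
import json

prompts = ["Summarize doc A", "Summarize doc B", "Summarize doc C"]

def build_batch_file(prompts, model="gemini-2.5-flash"):
    """Serialize one JSON request per line (JSONL), keyed for later matching.

    The request shape below is illustrative, not the official schema.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "key": f"request-{i}",
            "request": {"contents": [{"parts": [{"text": prompt}]}]},
        }
        lines.append(json.dumps(request))
    return "\n".join(lines)

batch_jsonl = build_batch_file(prompts)
```

The file would then be uploaded and submitted as a batch job, with the API handling scheduling and returning results asynchronously.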
JUNE 24, 2025 / Gemini
Gemini 2.5 Pro and Flash are transforming robotics by enhancing coding, reasoning, and multimodal capabilities, including spatial understanding. These models are used for semantic scene understanding, code generation for robot control, and building interactive applications with the Live API, with a strong emphasis on safety improvements and community applications.
JUNE 24, 2025 / Gemini
Imagen 4, Google's advanced text-to-image model, is now available in paid preview via the Gemini API and Google AI Studio, offering significant quality improvements, especially for text generation within images. The Imagen 4 family includes Imagen 4 for general tasks and Imagen 4 Ultra for high-precision prompt adherence, with all generated images featuring a non-visible SynthID watermark.
MAY 28, 2025 / Gemini
The Magic Mirror project utilizes the Gemini API, including the Live API, Function Calling, and Grounding with Google Search, to create an interactive and dynamic experience, demonstrating the power of the Gemini models to generate visuals, tell stories, and provide real-time information through a familiar object.
MAY 9, 2025 / DeepMind
Gemini 2.5 marks a major leap in video understanding, achieving state-of-the-art performance on key video understanding benchmarks and seamlessly combining audio-visual information with code and other data formats.
MAY 8, 2025 / Gemini
The rollout of implicit caching in the Gemini API expands on the existing explicit caching API: an "always on" caching system delivers automatic cost savings to developers using Gemini 2.5 models, while the explicit caching API remains available for guaranteed savings.
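Implicit caches generally key on a shared prompt prefix, so the practical takeaway is to structure prompts with the large, reusable context first and the per-request portion last. A minimal sketch of that layout, with illustrative content:

```python
import os.path

# Put the large, shared context at the START of every prompt and the
# per-request question at the END, so repeated requests share a common
# prefix — the part a prefix-based cache can reuse automatically.
SHARED_CONTEXT = "You are a support assistant. Product manual: ..."

def build_prompt(question: str) -> str:
    return f"{SHARED_CONTEXT}\n\nQuestion: {question}"

p1 = build_prompt("How do I reset the device?")
p2 = build_prompt("What is the warranty period?")

# Both prompts begin with the same bytes; only the tail differs.
shared_prefix_len = len(os.path.commonprefix([p1, p2]))
```

Reversing the order (question first, context last) would make each prompt's prefix unique and forfeit the automatic savings.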