Google launches Gemma 3n, innovative AI model for mobile platforms

Introducing Gemma 3n: Google’s Next-Generation AI Model

Google has unveiled Gemma 3n, a revolutionary advancement in its series of open AI models. This new version, showcased during last month’s Google I/O event, is now fully available for developers to implement on their local hardware.

For those unfamiliar with the Gemma line, it is distinct from Google’s proprietary Gemini models. Gemma is designed to be open-source, enabling developers to download, modify, and innovate freely, while Gemini remains a closed platform focused on high-power tasks.

Key Features of Gemma 3n

The latest iteration, Gemma 3n, marks a significant evolution as it supports various input types, including images, audio, and video, to generate text outputs. This multimodal capability represents a notable shift from previous exclusively text-based models. Below are the standout enhancements introduced with this model:

Multimodal Functionality: Gemma 3n seamlessly integrates text, image, audio, and video inputs, enhancing the versatility of user interactions.
On-Device Optimization: Two variants of the model, E2B and E4B, optimized for efficiency, can function effectively on hardware with minimal memory. Their parameter counts stand at 5 billion for E2B and 8 billion for E4B, yet they operate with a memory footprint similar to traditional models with only 2GB (E2B) and 3GB (E4B) of RAM.
Innovative Architecture: The core of Gemma 3n features an advanced architecture known as MatFormer, which offers computational flexibility. This structure includes Per Layer Embeddings (PLE) for better memory use along with new audio and MobileNet-v5 vision encoders tailored for mobile applications.
Superior Quality: The model enhances output quality, supporting multilingual interactions across 140 languages for text and 35 for multimodal tasks, along with improved performance in math, coding, and logical reasoning.

A unique aspect of Gemma 3n’s efficiency lies in its MatFormer architecture. Google likens it to a Russian Matryoshka doll, with larger models encompassing smaller, fully functional versions to adapt to various tasks.

On performance benchmarks, the E4B variant notably achieved an LMArena score exceeding 1300, marking it the first model under 10 billion parameters to reach this milestone. Gemma 3n performance on LMArena

Advanced Audio and Visual Capabilities

Gemma 3n introduces enhanced audio functionalities, including on-device speech-to-text and translation, supported by an encoder capable of precise speech processing. The updated MobileNet-V5 vision encoder significantly boosts video processing speeds, allowing for real-time video at up to 60 frames per second on Google Pixel devices.

Get Started with Gemma 3n

If you’re eager to explore Gemma 3n, the models are readily accessible through platforms like Hugging Face and Kaggle, as well as in Google AI Studio where you can experiment with its capabilities directly.

For comprehensive details about this model, including guides for developers, check out the official announcement post.

Source & Images