Meta’s latest multimodal Llama 3.2 models launched on Microsoft Azure and Google Cloud
At Connect 2024, Meta founder and CEO Mark Zuckerberg announced the debut of Llama 3.2. The new release introduces small and medium-sized vision large language models (LLMs) with 11B and 90B parameters, along with a selection of on-device text-only models (1B and 3B parameters). Notably, the new 11B and 90B vision models are Llama's first venture into multimodal capabilities.
Microsoft has also announced that the Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct models are now available in the Azure AI Model Catalog, with inferencing through Models-as-a-Service (MaaS) serverless APIs for these models expected to follow soon.
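Once the serverless endpoints are live, invoking them should follow the standard Azure AI inference pattern. Below is a minimal sketch using the `azure-ai-inference` Python SDK; the endpoint URL and API key are hypothetical placeholders, and the exact deployment details may differ:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Endpoint and key are placeholders; substitute the values from your
# own Azure AI deployment.
client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.<region>.models.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What can a vision-language model do?"),
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```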
The available Llama 3.2 models for managed compute inferencing on Azure include:
- Llama 3.2 1B
- Llama 3.2 3B
- Llama 3.2 1B Instruct
- Llama 3.2 3B Instruct
- Llama Guard 3 1B
- Llama 3.2 11B Vision Instruct
- Llama 3.2 90B Vision Instruct
- Llama Guard 3 11B Vision
Fine-tuning is currently offered only for the Llama 3.2 1B Instruct and 3B Instruct models, though Microsoft plans to extend fine-tuning to additional Llama 3.2 model collections in the coming months. The models operate with a limit of 200k tokens per minute and 1k requests per minute; developers who need higher rate limits are encouraged to contact the Microsoft team.
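Given those per-minute limits, bursty workloads may see HTTP 429 responses. Here is a minimal sketch of one way to back off and retry, reusing the `client` from the previous example; the retry policy itself is an assumption, not a documented Azure requirement:

```python
import time

from azure.core.exceptions import HttpResponseError

def complete_with_retry(client, messages, max_retries=5):
    """Call client.complete, retrying on HTTP 429 with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.complete(messages=messages)
        except HttpResponseError as err:
            # 429 indicates a rate limit (e.g. the 200k tokens/min or
            # 1k requests/min caps) was hit.
            if err.status_code == 429 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
            else:
                raise
```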
Moreover, Google has announced that all Llama 3.2 models are now available on Vertex AI Model Garden for self-service deployment. At present, only the Llama 3.2 90B model is offered in preview through Google's MaaS solution.
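Google exposes its MaaS models through an OpenAI-compatible endpoint, so a call to the 90B preview might look like the following sketch; the project ID, region, and model identifier (`meta/llama-3.2-90b-vision-instruct-maas`) are assumptions to verify against the Model Garden listing:

```python
import google.auth
import google.auth.transport.requests
import openai

# Obtain a short-lived access token from Application Default Credentials.
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
creds.refresh(google.auth.transport.requests.Request())

PROJECT = "<your-project-id>"  # placeholder
REGION = "us-central1"         # assumed region

client = openai.OpenAI(
    base_url=(
        f"https://{REGION}-aiplatform.googleapis.com/v1/projects/"
        f"{PROJECT}/locations/{REGION}/endpoints/openapi"
    ),
    api_key=creds.token,
)

response = client.chat.completions.create(
    model="meta/llama-3.2-90b-vision-instruct-maas",  # model ID is an assumption
    messages=[{"role": "user", "content": "Summarize what multimodal models add over text-only ones."}],
)
print(response.choices[0].message.content)
```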
In conjunction with the Llama 3.2 models, Meta has introduced Llama Stack distributions, designed to streamline how developers work with Llama models across environments, including single-node, on-premises, cloud, and on-device setups. The Meta team has unveiled the following:
- Llama CLI (command-line interface) for creating, configuring, and executing Llama Stack distributions
- Client code in multiple programming languages, including Python, Node.js, Kotlin, and Swift (see the Python sketch after this list)
- Docker containers for Llama Stack Distribution Server and Agents API Provider
- A variety of distributions:
  - Single-node Llama Stack Distribution via Meta's internal implementation and Ollama
  - Cloud Llama Stack distributions through AWS, Databricks, Fireworks, and Together
  - On-device Llama Stack Distribution on iOS implemented using PyTorch ExecuTorch
  - On-premises Llama Stack Distribution supported by Dell
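As a taste of the client code, here is a minimal sketch using the `llama-stack-client` Python package against a locally running distribution server; the port and model identifier are assumptions and depend on how the distribution was configured:

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage

# Point the client at a running Llama Stack distribution server, e.g.
# one started locally with the Llama CLI; the port is an assumption.
client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    messages=[UserMessage(content="Hello, Llama Stack!", role="user")],
    model="Llama3.2-3B-Instruct",  # model identifier is an assumption
)
print(response.completion_message.content)
```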
The rollout of the Llama 3.2 models and Llama Stack distributions marks a significant step toward making capable AI models more accessible to developers, and is expected to drive further innovation and broader AI adoption across sectors.