Llama 3 is Meta’s latest family of open source large language models (LLMs). It’s basically the Facebook parent company’s response to OpenAI’s GPT and Google’s Gemini—but with one key difference: it’s freely available for almost anyone to use for research and commercial purposes.
That’s a pretty big deal, and over the past year, Llama 2, the previous model family, has become a staple of open source AI development. Llama 3 continues that tradition. Let me explain.
What is Llama 3?
Llama 3 is a family of LLMs, like GPT-4 and Google Gemini. It’s the successor to Llama 2, Meta’s previous generation of AI models. While there are some technical differences between Llama and other LLMs, you would need to be deep into AI for them to mean much. All these LLMs were developed and work in essentially the same way: they all use the same transformer architecture and development ideas, like pretraining and fine-tuning.
When you enter a text prompt or provide Llama 3 with text input in some other way, it attempts to predict the most plausible follow-on text using its neural network—a cascading algorithm with billions of variables (called “parameters”) that’s loosely modeled on the human brain. By assigning different weights to all the different parameters, and throwing in a small bit of randomness, Llama 3 can generate incredibly human-like responses.
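To make that "weights plus a small bit of randomness" idea concrete, here's a minimal toy sketch of how an LLM might pick its next word. The scores and words are made up, and real models work over tens of thousands of tokens with learned scores, but the mechanism (a softmax over scores, then a weighted random draw, with a "temperature" knob controlling how much randomness) is the same basic idea:

```python
import math
import random

def sample_next_token(scores, temperature=0.8, rng=None):
    """Pick the next token: softmax over scores, then a weighted random draw."""
    rng = rng or random.Random()
    # Lower temperature = more predictable output; higher = more random.
    scaled = {tok: s / temperature for tok, s in scores.items()}
    # Softmax: turn raw scores into probabilities that sum to 1.
    max_s = max(scaled.values())
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Weighted random draw -- the "small bit of randomness".
    r = rng.random()
    cumulative = 0.0
    for tok, p in probs.items():
        cumulative += p
        if r < cumulative:
            return tok
    return tok  # fallback for floating-point edge cases

# Hypothetical scores a network might assign after "The capital of France is"
scores = {"Paris": 9.0, "Lyon": 3.0, "pizza": 0.5}
print(sample_next_token(scores, temperature=0.8, rng=random.Random(0)))
```

With these made-up scores, "Paris" wins almost every time, but cranking the temperature up makes the unlikely options show up more often—which is why the same prompt can produce different answers on different runs.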
Meta has released four versions of Llama 3 so far:
- Llama 3 8B
- Llama 3 8B-Instruct
- Llama 3 70B
- Llama 3 70B-Instruct
The 8B models have 8 billion parameters, while the two 70B models have 70 billion parameters. Both Instruct models were fine-tuned to better follow human directions, so they’re better suited for use as a chatbot than the raw Llama models.
Meta is training a 400 billion parameter version of Llama 3 (and presumably a 400 billion parameter instruct version, too) that it hopes to make available later this year. Given the size and complexity of the model, though, it just isn’t ready yet.
Like the latest models from OpenAI and Google, Meta is also developing a multimodal version of Llama 3. This will allow it to work with other modalities, like images, handwritten text, video footage, and audio clips. It’s not available yet but should be released in the coming months. Similarly, Meta is training multilingual versions of Llama 3, but they aren’t available yet.
How to try Llama 3
Meta AI, the AI assistant built into Facebook, Messenger, Instagram, and WhatsApp, now uses Llama 3. You can also check it out using a newly released dedicated web app.
If you aren’t in one of the handful of countries where Meta has launched Meta AI, you can demo the 70B-Instruct model using HuggingChat, AI repository Hugging Face’s example chatbot.
How does Llama 3 work?
To create its neural network, Llama 3 was trained on over 15 trillion “tokens”—the overall dataset was seven times larger than the one used to train Llama 2. Some of the data comes from publicly available sources like Common Crawl (an archive of billions of webpages), Wikipedia, and public domain books from Project Gutenberg, while some of it was also reportedly generated by AI. (None of it is Meta user data.)
Each token is a word or semantic fragment that allows the model to assign meaning to text and plausibly predict follow-on text. If the words “Apple” and “iPhone” consistently appear together, it’s able to understand that the two concepts are related—and are distinct from “apple,” “banana,” and “fruit.” According to Meta, Llama 3’s tokenizer has a larger vocabulary than Llama 2’s, so it’s significantly more efficient.
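This isn't how Llama actually learns—real models encode relationships in learned parameter weights, not raw counts—but a toy co-occurrence count over a made-up corpus illustrates the underlying statistical idea: words that keep showing up together get linked.

```python
from collections import Counter
from itertools import combinations

# A tiny made-up corpus; real training data spans trillions of tokens.
corpus = [
    "apple iphone launch event",
    "new apple iphone features",
    "our apple iphone review",
    "banana and apple are fruit",
    "fruit salad with banana",
]

# Count how often each pair of words appears in the same sentence.
pair_counts = Counter()
for sentence in corpus:
    words = sorted(set(sentence.split()))
    for a, b in combinations(words, 2):
        pair_counts[(a, b)] += 1

print(pair_counts[("apple", "iphone")])  # 3 -- strongly associated
print(pair_counts[("banana", "fruit")])  # 2 -- associated in a different context
```

Even in this tiny corpus, "apple" links more strongly to "iphone" than to "fruit"—scale that up by a few trillion tokens and you get the kind of contextual distinctions the article describes.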
Of course, training an AI model on the open internet is a recipe for racism and other horrendous content, so the developers also employed other training strategies, including reinforcement learning with human feedback (RLHF), to optimize the model for safe and helpful responses. With RLHF, human testers rank different responses from the AI model to steer it toward generating more appropriate outputs. The instruct versions were also fine-tuned with specific data to make them better at responding to human instructions in a natural way.
Meta has also developed Llama Guard and Code Shield, two safety tools designed to prevent Llama 3 from responding to harmful prompts or generating insecure computer code.
But all these Llama models are just intended to be a base for developers to build from. If you want to create an LLM that generates article summaries in your company’s particular brand style or voice, you can fine-tune Llama 3 with dozens, hundreds, or even thousands of examples and create one that does just that. Similarly, you can further fine-tune one of the Instruct models to respond to your customer support requests by providing it with your FAQs and other relevant information, like chat logs. Or you can take Llama 3 and retrain it to create your own completely independent LLM.
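Those fine-tuning examples are typically just prompt/response pairs. As a sketch, here's what preparing a handful of (entirely made-up) brand-voice examples might look like—the JSON Lines prompt/response shape shown is one common convention that many fine-tuning tools accept, not an official Meta format:

```python
import json

# Hypothetical brand-voice examples; a real fine-tune would use
# dozens to thousands of these.
examples = [
    {"prompt": "Summarize: Our Q2 revenue grew 12%.",
     "response": "Short version: sales were up 12% last quarter. Nice."},
    {"prompt": "Summarize: The new app update fixes login bugs.",
     "response": "Quick take: the latest update squashes those login bugs."},
]

# Write them as JSON Lines: one JSON object per line.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line parses and has both fields.
with open("train.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all({"prompt", "response"} <= row.keys() for row in rows)
print(len(rows), "training examples ready")
```

A dataset like this would then be fed to a training script or hosted fine-tuning service; the quality and consistency of the examples matters far more than the file format.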
Llama vs. GPT, Gemini, and other AI models: How do they compare?
In the blog post announcing Llama 3 (the research paper is still forthcoming), Meta’s researchers compare the 8B and 70B Instruct models’ performance on various benchmarks (like MMLU, a multitask language understanding test, and the ARC-Challenge reasoning test) to a handful of equivalent open source and closed source models. The 8B model is compared to Mistral 7B and Gemma 7B, while the 70B model is compared to Gemini Pro 1.0 and Mixtral 8x22B. In what can only be called cherry-picked examples, the Llama 3 models are all the top performers.
In a head-to-head human evaluation challenge, Llama 3 70B-Instruct apparently compares favorably to Claude Sonnet, GPT-3.5, and Mistral Medium.
Meta doesn’t compare Llama 3 to the current state-of-the-art models, like GPT-4, Claude Opus, and Gemini Ultra; presumably, the company is waiting until its 400B model is ready for prime time.
In my testing, I found Llama 3 was a big step up from Llama 2. I couldn’t get it to “hallucinate” or just make things up anywhere near as easily. While it isn’t yet replacing ChatGPT, Meta probably isn’t wrong to call it “the most capable openly available LLM to date.”
Why Llama matters
Most of the LLMs you’ve heard of—OpenAI’s GPT-3 and GPT-4, Google’s Gemini, Anthropic’s Claude—are all proprietary and closed source. Researchers and businesses can use the official APIs to access them and even fine-tune versions of their models so they give tailored responses, but they can’t really get their hands dirty or understand what’s going on inside.
With Llama 3, though, you can download the model right now, and as long as you have the technical chops, get it running on your computer or even dig into its code. (Though be warned: even small LLMs are measured in GBs.) Meta also plans to publish a full research paper detailing how all three models were trained once the 400B model is ready, and there’s plenty of interesting information about what the Meta AI team is doing in the blog post…if you’re the kind of person who finds these things interesting.
And much more usefully, you can also get it running on Microsoft Azure, Amazon Web Services, and other cloud infrastructures through platforms like Hugging Face, where you can train it on your own data to generate the kind of text you need. Just be sure to check out Meta’s guide to responsibly using Llama.
By continuing to be so open with Llama, Meta is making it significantly easier for other companies to develop AI-powered applications that they have more control over—as long as they stick to the acceptable use policy. The only big limit to the license is that companies with more than 700 million monthly users have to ask for special permission to use Llama, so the likes of Apple, Google, and Amazon have to develop their own LLMs.
And really, that’s quite exciting. So many of the big developments in computing over the past 70 years have been built on top of open research and experimentation, and now AI looks set to be one of them. While Google, OpenAI, and Anthropic are always going to be players in the space, they won’t be able to build the kind of commercial moat or consumer lock-in that Google has in search and advertising.
By letting Llama out into the world, there will likely always be a credible alternative to closed source AIs.
This article was originally published in August 2023. The most recent update was in April 2024.