What are small language models?

Over the past two years, AI language models have gotten really really good. Chatbots like ChatGPT, text generators like Jasper, and built-in assistants like Notion AI have gone from impressive tech demos to genuinely useful productivity tools as the state-of-the-art large language models (LLMs) that they rely on have continued to grow in size and power.

But even when it first launched, ChatGPT was already impressive. Sure, it hallucinated a touch more often and its poetry occasionally flubbed a rhyme, but it still got a lot of things right. It was perfectly capable of summarizing a few paragraphs of text, making an email more formal, or fixing up a few grammar mistakes.

Even though the flashiest LLMs keep getting bigger and more powerful, there are still smaller large language models—or small language models—that are more than capable of working effectively in the real world. And that’s before you even factor in the cost of running large language models. In most cases, you should employ the smallest language model that will reliably do the job you need it to, not the largest one you can run.

So let’s take a look at small language models, how they differ from large language models, and what they’re good for. 

SLM vs. LLM: What is a small language model?

A small language model (SLM) is a small large language model. I know that sounds kind of silly, but as large language models have become larger and (as a result) more powerful, there’s been the need for a handy term to categorize small, lightweight language models that still use the same state-of-the-art technologies. 

One way to measure the size of a language model is the number of parameters it has. Each parameter is a learned weight on a connection between nodes in the neural network: a model with a billion parameters has a billion weighted connections, regardless of how the nodes are arranged. This allows for a rough comparison of size between different language models, regardless of the structure of the underlying neural networks.
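To make that counting concrete, here's a sketch for a tiny fully connected network. The layer sizes are invented for illustration, but transformer parameter counts tally up weights and biases the same way:

```python
def count_parameters(layer_sizes):
    """One weight per connection between adjacent layers, plus one bias per node."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weighted connections between the two layers
        total += n_out         # one bias per node in the output layer
    return total

# A made-up four-layer network: roughly 6.3 million parameters.
print(count_parameters([512, 2048, 2048, 512]))
```

Scale the layer sizes up and stack more of them, and you quickly get to the billions of parameters that separate SLMs from LLMs.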

Of course, the number of parameters that the best large language models have has grown over time. 

Take GPT-2. When it launched in 2019, its largest model had just 1.5 billion parameters and it could generate a few sentences accurately before losing the run of things. By 2019’s standards, it was a large language model, but by today’s standards, it’s paltry. GPT-3 had 175 billion parameters, the largest version of Llama 3 has 70 billion parameters, and researchers have even experimented with trillion-parameter models. We don’t know how many parameters GPT-4o, Claude Opus, or Gemini Pro have because corporate secrecy is back in fashion—but we can guess it’s a lot.

So while GPT-2 counted as an LLM in 2019, models more than twice its size, like Microsoft’s Phi-3 (which has 3.8 billion parameters and can generate paragraphs and paragraphs of text without issue), are now being touted as SLMs.

These shifting goalposts are why a clear definition of a small language model is impossible. They’re simply language models that are significantly smaller than the current large language models. 

Examples of small language models

Right now, models with a few billion parameters are generally considered SLMs. Here are a few worth knowing about (the number before the “B” [billion] is the number of parameters):

  • Llama 3 8B

  • Mistral 7B

  • Phi-3 (3.8B)

  • Gemini Nano (1.8B)

I’d also argue that models like Claude Haiku and GPT-4o Mini are SLMs, or at least SLM-adjacent. While we don’t know how many parameters they have, they’re designed to be faster, cheaper alternatives to models like Claude Sonnet and GPT-4o.

Of course, there’s a lot of gray area in the middle. For example, Mistral’s Mixtral 8x7B is a language model that combines eight 7-billion parameter expert networks in a structure called a Sparse Mixture of Experts (MoE). It’s capable of GPT-3.5-like results despite only using around 13 billion of its 47 billion parameters for any given token. While this structure adds plenty of complexity and requires powerful hardware to run, it’s capable of outsized performance and speed.
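Mixtral's actual architecture is far more involved, but the core idea of sparse routing can be sketched in a few lines. The dimensions, random weights, and top-2 gating below are illustrative assumptions, not Mistral's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model = 8, 16  # toy sizes; Mixtral uses 8 experts at far larger scale

# Each "expert" here is a single weight matrix standing in for a feed-forward block.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x, top_k=2):
    """Route a token through only its top-k experts, as in a sparse MoE layer."""
    logits = x @ router                # the router scores every expert
    top = np.argsort(logits)[-top_k:]  # keep only the best top_k experts
    gate = np.exp(logits[top])
    gate /= gate.sum()                 # softmax over just the chosen experts
    # Only top_k of the n_experts matrices are ever multiplied for this token,
    # which is why total parameters and active parameters differ.
    return sum(w * (x @ experts[i]) for w, i in zip(gate, top))

out = moe_forward(rng.standard_normal(d_model))
```

The key takeaway is that the model stores all eight experts but pays the compute cost of only two per token.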

A lot of these SLMs also have LLMs in the same family. Llama 3 has a 70B model, Gemma 2 has a 27B model, and Gemini 1.5 Pro and Ultra are among the most powerful LLMs in the world, though we don’t have an exact parameter count.

With all that out of the way and a suitably vague definition to work from, let’s look a little more at parameters. After all, they’re what separate an SLM from an LLM. 

SLM parameters and power

All else being equal, the more parameters a model has, the more powerful it will be and the deeper its understanding of the world. You can see this best when you look at language models in the same family: for example, Llama 3 70B outperforms Llama 3 8B across the board. 

But parameters aren’t everything—and adding more of them doesn’t guarantee better performance in isolation. GPT-2 couldn’t do very much with its 1.5 billion parameters, while Gemini Nano, with only 1.8 billion, is used by Google in its Pixel Pro phones. Llama 3 70B isn’t eight times better than Llama 3 8B; it’s just more effective and accurate in complex situations.

And Llama 3 8B is better than both Llama 2 13B and even Llama 2 70B across a lot of major benchmarks. This is the result of countless developments to every aspect of large language models, from training techniques and neural network architecture to more apparent things like larger context length and better multimodality. These advances make modern AI models significantly more effective than older ones, even at smaller parameter counts, and they’re part of what makes SLMs so useful.

SLM parameters and cost

Adding more parameters also comes at a cost. And once again, all else being equal, the more parameters you add, the higher the costs are going to be.

Before it can be deployed, every AI model has to be trained. This is the process where the model crunches through massive amounts of training data to develop the weights of its underlying neural network. It can take months, even on the best AI hardware. As you can probably imagine, keeping powerful computers running for months on end to train a state-of-the-art LLM is an expensive process.

Gemini Ultra, Google’s largest model, is estimated to have cost $191 million and GPT-4 to have cost $78 million just in terms of compute. That doesn’t include salaries, office rent, and pizzas for when the developers had to stay late.

In addition to being more expensive to train, larger AI models are more expensive to run. The process of generating output is called “inference,” and larger models with more parameters require more powerful (and pricier) hardware to run inference efficiently.

You can see this directly when you look at the API cost for an AI like Claude. The smallest model, Haiku, costs $0.25 per million input tokens and $1.25 per million output tokens, while the largest model, Opus, costs 60 times more: $15 per million input tokens and $75 per million output tokens. It’s a similar story with OpenAI’s API, though the numbers are less extreme.
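Working out what that gap means for a real workload is simple arithmetic. This sketch uses the per-million-token prices quoted above and a hypothetical monthly token volume:

```python
# Per-million-token API prices quoted above (USD).
PRICES = {
    "claude-haiku": {"input": 0.25, "output": 1.25},
    "claude-opus": {"input": 15.00, "output": 75.00},
}

def monthly_cost(model, input_tokens, output_tokens):
    """Total API cost in dollars for a given token volume."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 50M input and 10M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 50_000_000, 10_000_000):,.2f}")
```

At that volume, Haiku comes to $25 a month and Opus to $1,500, the same 60x gap.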

And the situation is the same if you’re running your own language models locally. Smaller models can run on less expensive hardware and require less electricity, while the largest models need multiple GPUs running at full blast in a climate-controlled data center. The smallest, most efficient models can even run effectively on a smartphone.

Getting to good enough with AI

Let’s look at this in real terms. AI has a lot of potential for plenty of simple tasks: creating summaries, labeling or annotating documents, correcting simple errors, and spotting inconsistencies. None of these tasks require the best, state-of-the-art model—just something that’s good enough.

So, I had six different AI models summarize the text of this article from the BBC: “‘World’s rarest whale’ washes up on NZ beach.” I asked them for a two-sentence summary and provided the full text.

Only Mistral 7B ignored the bit about keeping its summary super short. But otherwise, I’d be hard-pressed to tell the difference between them. If anything, I prefer Llama 3 8B’s version to the 70B model’s.

Llama 3 8B

A spade-toothed whale, a species never seen alive, washed up on a New Zealand beach and has been identified by scientists thanks to its unique color patterns and skull shape. The discovery is significant because it’s one of only six documented specimens worldwide, and researchers hope to learn more about the species through DNA testing and potential dissection.

Llama 3 70B

A rare spade-toothed whale, a species that has never been seen alive, washed up on a beach in New Zealand and has been identified by scientists through its color patterns and skull shape. The discovery of the whale’s carcass could provide crucial new information about the species, with only six specimens ever documented worldwide, and local Maori communities will be involved in deciding the whale’s fate.

Phi-3

A recently washed up five-metre long spade-toothed whale, a species with only six documented samples worldwide, has been found dead on a New Zealand beach. Researchers believe the discovery of this rare whale could provide crucial information about the species, which has never been observed alive.

Mixtral 8x7B

A rare spade-toothed whale, a species rarely seen with only six specimens documented worldwide, was discovered dead on a beach in New Zealand earlier in July. The finding provides an opportunity for researchers to learn more about the elusive species, while local Maori communities will participate in deciding its fate.

Mistral 7B

A rare spade-toothed whale, which had never been seen alive, was discovered dead on a beach in New Zealand. Scientists are currently conducting DNA tests on the carcass to confirm its identity and gather new information about this elusive species, which is one of the least studied large mammals in the world. The Department of Conservation plans to involve local Maori communities in decision-making regarding the whale’s disposition. Only a handful of specimens have been documented since the species was first identified over a century ago.

ChatGPT (GPT-4o)

A dead whale found on a New Zealand beach has been identified as a rare spade-toothed whale, a species never seen alive. The discovery offers a unique opportunity for scientists to gather crucial information about this elusive species, with DNA testing and potential dissection planned to aid in further research.

The advantages of small language models

As you can see with the example above, in some situations, an SLM is as effective as an LLM. I think Phi-3 and Llama 3 8B matched the performance of the larger models. I’m sure I could have gotten better results from Mistral 7B with a bit more prompting, but I wanted the test to be as fair as possible.

So if you don’t need the peak performance of the latest large language models, here are the advantages small language models can offer.

Small language models are faster and cheaper to run

Again, small language models are significantly cheaper to run than larger language models. In most cases, smaller models will also generate output significantly faster than larger models—though that also depends on other factors like the exact model you’re using and the hardware you’re running it on.

Small language models are faster and cheaper to train

Small language models require less compute to train, so they’re faster and cheaper to develop than larger language models (assuming you’re using the same hardware). The same is true for fine-tuning language models.

The smaller the model, the faster and cheaper it can be fine-tuned or retrained on your data. This was a bigger advantage before retrieval augmented generation (RAG) became popular—RAG allows a language model to use custom data without the need for fine-tuning, but it’s still an advantage in some situations.
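The idea behind RAG can be sketched in a few lines. This toy example scores documents by word overlap instead of the embedding similarity and vector stores real systems use, and every document and function name here is made up:

```python
# Toy retrieval augmented generation: find the most relevant document
# and prepend it to the prompt so the model can answer from custom data
# without any fine-tuning.
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available by email from 9am to 5pm on weekdays.",
    "Shipping to New Zealand takes five to ten business days.",
]

def words(text):
    """Crude tokenizer: lowercase and strip basic punctuation."""
    return set(text.lower().replace(".", " ").replace("?", " ").split())

def retrieve(question, documents):
    """Return the document sharing the most words with the question."""
    return max(documents, key=lambda d: len(words(question) & words(d)))

def build_prompt(question, documents):
    """Prepend the retrieved context so the model answers from it."""
    context = retrieve(question, documents)
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What is the refund policy for returns?", docs)
```

Because the custom data lives in the prompt rather than the model's weights, this works with any model, large or small, off the shelf.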

Small language models can be run on any device

Small language models like Gemini Nano are being designed to run locally on smartphones, laptops, and other low-powered devices. This is a huge advantage of SLMs for a few reasons:

  • The AI model can respond far faster as it doesn’t have to send a request off to a server and wait for its response.

  • The AI model can use personal or proprietary data without it ever leaving your device. 

  • The AI model can integrate more deeply with your operating system. 

For example, Apple Intelligence, Apple’s forthcoming AI, will use local SLMs to perform most requests on your iPhone or Mac. If something is too complicated and can’t be handled by the smaller models, it will offer to send the request on to ChatGPT.

Should you use a small language model?

Both large language models and small language models have their uses. If you want the best AI performance you can get, a state-of-the-art LLM is the way to go. GPT-4o, Google Gemini 1.5 Pro, and Claude Sonnet are incredibly impressive—and continue to get better. 

But if you’re building a system that uses a language model, planning to deploy a language model locally on your device, or even just want to play around with the different options available, then you should consider a small language model like Llama 3 8B or Phi-3.

If you really want to get into the weeds, I recommend having a look at Artificial Analysis. It allows you to compare different language models of all sizes on performance, speed, and price. As you would expect, the best SLMs score incredibly highly across the board. 

Automate AI with Zapier  

Now that you know more about the AI models available to you, connect your AI-powered tools with Zapier, so you can focus on the human work that matters most. With Zapier, you can access all kinds of language models, large and small, from the apps you use every day, bringing the power of AI into your daily workflows. Here are a few ideas to get you started. 

Zapier is the leader in workflow automation—integrating with 6,000+ apps from partners like Google, Salesforce, and Microsoft. Use interfaces, data tables, and logic to build secure, automated systems for your business-critical workflows across your organization’s technology stack. Learn more.

by Zapier