The case for small language models

Lubomír Husar

Small language models (SLMs) are an interesting topic, especially now, when all the hype is about building huge data centers and AI factories. In many use cases, SLMs are a better solution than large models because they can run without expensive hardware or cloud access.


AI Summary

  • Small language models (SLMs) operate efficiently on standard hardware, offering cost-effective and private AI solutions without relying on cloud infrastructure or powerful GPUs.
  • SLMs, with fewer parameters and narrower focus, excel in specific tasks and can be enhanced via RAG for external knowledge, making them ideal for targeted use cases.
  • Real-world applications like speech-to-text, image inpainting, and text-to-speech demonstrate SLMs' practicality, enabling local, fast, and secure AI operations on everyday devices.

Unlike large language models (LLMs), which require powerful GPUs, large amounts of memory, and significant resources for training and inference, SLMs can run on standard CPUs, older servers, smartphones, and systems with limited memory and power. That means businesses don't need to pay for cloud services or beefy GPU servers, which is great not only from an expense perspective but also for privacy.

How small are small language models?

  • SLMs typically have millions to a few billion parameters. They are trained on smaller datasets and focus on a specific subject.
  • LLMs have tens of billions or even trillions of parameters, which makes them more general.

Of course, SLMs aren't as smart as big models. They don't understand the world the way a large model might, and their knowledge is narrower.

But that can be addressed with RAG (Retrieval Augmented Generation), which enhances SLMs by providing external knowledge, i.e. knowledge not baked in during training. So, for specific tasks, when a business has a well-defined use case, SLMs are perfect.
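The RAG idea above can be sketched in a few lines: retrieve the most relevant documents for a question, then prepend them to the prompt so the small model answers from that context. The keyword-overlap retriever and the prompt format below are illustrative assumptions, not any particular library's API; real systems usually use embedding-based retrieval.

```python
def retrieve(query, documents, top_k=2):
    """Rank documents by naive keyword overlap with the query."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc.lower().split())), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query, documents):
    """Prepend retrieved context so the SLM answers from it, not from memory."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical knowledge base an SLM was never trained on:
docs = [
    "Our support line is open weekdays from 8:00 to 16:00.",
    "Invoices are sent by email on the first day of each month.",
    "The office cafeteria serves lunch from 11:30.",
]
prompt = build_prompt("When is the support line open?", docs)
print(prompt)
```

The assembled prompt, not the model's weights, carries the business-specific facts, which is exactly why a small, narrow model plus RAG can handle a well-defined use case.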

Real-world use cases for small language models

For example:

  • Speech to text, turning spoken words into text, using Vosk's 40MB models
  • Inpainting, fixing images by removing unwanted parts, using the LaMa model
  • Text to speech, turning text into audio, using the 82M-parameter Kokoro model

These are real, practical, day-to-day jobs. Small models can do that. Quickly, locally and privately. Without a big data center.
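As a sketch of how little code a local, CPU-only speech-to-text job needs, here is a minimal transcription function built on Vosk's Python API. The file name and model directory are placeholders, and the `vosk` package plus a downloaded model are assumed:

```python
import json
import wave

def transcribe(wav_path, model_dir):
    """Transcribe a 16-bit mono WAV file with a local Vosk model, CPU only."""
    # Imported inside the function so the sketch reads without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)  # e.g. an unpacked ~40 MB Vosk model
    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(model, wf.getframerate())
        while True:
            chunk = wf.readframes(4000)
            if not chunk:
                break
            recognizer.AcceptWaveform(chunk)
        # FinalResult() returns JSON like {"text": "..."}
        return json.loads(recognizer.FinalResult())["text"]
```

Calling something like `transcribe("notes.wav", "vosk-model-small-en-us-0.15")` returns the transcript as a string. No audio leaves the machine, and no GPU is involved.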

SLMs in action: lightweight AI solutions

We've been showcasing some data apps running in lightweight containers on minimal systems. Even with small models, they solve actual problems, without relying on complex cloud services or heavy infrastructure. That's powerful.

Take this article as a small yet practical example of using small models and local AI.

  • This article started as a recording of a few audio notes on a smartphone.
  • Those audio files were transcribed to text using small local AI, running only on CPU.
  • Then, after several rounds of word ping-pong between me and another model running on our company's local network, on an older server with a mediocre Radeon GPU (with Vulkan support), the article took its final form.
  • The same model generated the text for the AI summary.
  • Finally, another text-to-speech model generated the AI summary audio.

You don't need a universal, know-it-all AI for it to be valuable. You don't need a big budget to get started.

Start with a simple, focused model that solves one task. You can test it locally, without sending data to the cloud. It's practical. It's immediate. And it's within reach.

Did you like the article? Share it with others or write us something nice. Thank you!

Copyright © 2025, Colorbee, s.r.o.
