Small Language Models & Why They're (Possibly) Better Than LLMs
Will they replace LLMs?
Lately, LLMs have been the talk of the town, with companies racing to outdo their competitors by introducing new LLMs or announcing improvements. While these models are great at a variety of tasks, like answering questions, summarizing text, and translating between languages, they have significant drawbacks, which is driving more attention towards small language models, or SLMs.
The Problems with LLMs
In addition to steep computational costs, LLMs have large memory requirements and high energy consumption. They're trained on vast amounts of data, and training, maintaining, and deploying them is resource-intensive, which translates into high costs, especially for enterprises. These large models have also been known to reproduce biases from their training data and are not always factually correct.
Plus, they have massive water footprints. For instance, OpenAI's ChatGPT consumes roughly 500 ml of water (about what's in a 16-ounce bottle) for every 5-50 prompts it's asked. Microsoft's global water consumption, meanwhile, jumped to almost 1.7 billion gallons a year (a 34% increase from 2021 to 2022), while Google's grew by 20% over the same period.
Drawbacks like these have pushed interest towards small language models.
What are Small Language Models?
Small language models are, in essence, the pre-trained language models of an earlier generation, the iterations that ultimately led to the development of their larger counterparts. However, recent studies suggest that these small models can match and even outperform LLMs in a range of applications.
According to recent research, techniques such as transfer learning and knowledge distillation, in which capability is transferred from a larger teacher model to a smaller student, yield models that come close to LLM-level performance on tasks like sentiment analysis, summarization, and translation while using only a fraction of the computational resources. Models such as DeepMind's Chinchilla and Stanford's Alpaca illustrate how far this can go, giving their larger counterparts a run for their money.
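To make the distillation idea concrete, here is a minimal PyTorch sketch of training a small student model against a frozen, larger teacher. The tiny toy models, the temperature value, and the dummy batch are assumptions made purely for illustration; this is not the recipe behind any of the models named above.

```python
# Minimal knowledge-distillation sketch (illustrative only).
# "teacher" stands in for a large model and "student" for a much smaller one.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM_T, DIM_S, T = 1000, 512, 64, 2.0   # T is the softening temperature

teacher = nn.Sequential(nn.Embedding(VOCAB, DIM_T), nn.Flatten(), nn.Linear(DIM_T * 8, VOCAB))
student = nn.Sequential(nn.Embedding(VOCAB, DIM_S), nn.Flatten(), nn.Linear(DIM_S * 8, VOCAB))

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend ordinary cross-entropy with a KL term that pulls the student's
    softened distribution toward the teacher's."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)
tokens = torch.randint(0, VOCAB, (32, 8))      # dummy batch of token ids
labels = torch.randint(0, VOCAB, (32,))        # dummy targets

with torch.no_grad():                          # the teacher stays frozen
    teacher_logits = teacher(tokens)
loss = distillation_loss(student(tokens), teacher_logits, labels)
loss.backward()
optimizer.step()
```

The key design choice is that the student learns from the teacher's full output distribution rather than from hard labels alone, which is what lets a far smaller model recover much of the larger model's behavior.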
Benefits of SLMs
The most prominent benefits of SLMs include the following:
More efficient to train: Small models need less training data and far less powerful hardware, which helps companies save money.
Accurate: Given their smaller scale, small language models are less likely to generate incorrect information or exhibit biases. Plus, you can fine-tune them on more specific, domain-focused datasets to get more accurate results, as sketched in the example after this list.
Fewer potential vulnerabilities: Compared to LLMs, smaller models have fewer parameters and a smaller codebase, which shrinks the attack surface available to malicious actors. With significantly fewer points of entry for breaches, these small models are more secure.
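As a hedged illustration of the fine-tuning point above, the sketch below adapts a small pre-trained model to a narrow sentiment task with Hugging Face Transformers. The choice of distilbert-base-uncased, the imdb dataset, and the hyperparameters are assumptions made for the example, not recommendations from this article.

```python
# Sketch: fine-tuning a small pre-trained model on a domain-specific dataset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"   # ~66M parameters, far smaller than an LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A small slice of a public sentiment dataset stands in for a company's own data.
dataset = load_dataset("imdb", split="train[:2000]").train_test_split(test_size=0.1)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="slm-sentiment",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=2e-5,
)
Trainer(model=model, args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"]).train()
```

Even on modest hardware, a run like this finishes quickly, which is exactly the efficiency argument the list above makes.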
What's next?
Google has recently proposed two techniques, Flan and UL2R, that have shown great potential for improving the performance of smaller language models without massive computational resources. UL2R is an extra stage of pre-training that improves performance across a variety of tasks, while Flan fine-tunes the model on more than 1.8K tasks phrased as natural-language instructions. Beyond the performance gains, both techniques demonstrate that smaller models can deliver excellent results without a large-scale investment, paving the way for AI models to be integrated into everyday technology.
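To give a flavor of what instruction-style fine-tuning data looks like, here is a small sketch that rewrites ordinary supervised examples as instruction prompts. The templates and examples are made up for illustration and are not drawn from the actual Flan collection.

```python
# Sketch of the data side of instruction tuning: plain (input, answer) pairs
# are wrapped in natural-language instructions before fine-tuning.
from typing import TypedDict

class InstructionExample(TypedDict):
    prompt: str
    target: str

TEMPLATES = {
    "sentiment": "Classify the sentiment of the following review as positive or negative.\n\nReview: {text}\nSentiment:",
    "summarize": "Summarize the following article in one sentence.\n\nArticle: {text}\nSummary:",
}

def to_instruction(task: str, text: str, answer: str) -> InstructionExample:
    """Wrap a raw (text, answer) pair in an instruction-style template."""
    return {"prompt": TEMPLATES[task].format(text=text), "target": answer}

examples = [
    to_instruction("sentiment",
                   "The battery lasts all day and the screen is gorgeous.",
                   "positive"),
    to_instruction("summarize",
                   "Researchers released a compact model that rivals larger ones on benchmarks.",
                   "A new compact model matches larger models on benchmarks."),
]

for ex in examples:
    print(ex["prompt"], ex["target"])
```

Fine-tuning a small model on many such instruction-formatted tasks is what lets it generalize to new instructions it has never seen, which is the core idea behind Flan-style training.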