Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
Today, Abu Dhabi-backed Technology Innovation Institute (TII), a research organization working on new-age technologies across domains like artificial intelligence, quantum computing and autonomous robotics, released a new open-source model called Falcon Mamba 7B.
Available on Hugging Face, the casual decoder-only offering uses the novel Mamba State Space Language Model (SSLM) architecture to handle various text-generation tasks and outperform leading models in its size class, including Meta’s Llama 3 8B, Llama 3.1 8B and Mistral 7B, on select benchmarks.
It comes as the fourth open model from TII after Falcon 180B, Falcon 40B and Falcon 2 but is the first in the SSLM category, which is rapidly emerging as a new alternative to transformer-based large language models (LLMs) in the AI domain.
The institute is offering the model under ‘Falcon License 2.0,’ which is a permissive license based on Apache 2.0.
What does the Falcon Mamba 7B bring to the table?
While transformer models continue to dominate the generative AI space, researchers have noted that the architecture can struggle when dealing with longer pieces of text.
Essentially, transformers’ attention mechanism, which works by comparing every word (or token) with other every word in the text to understand context, demands more computing power and memory to handle growing context windows.
If the resources are not scaled accordingly, the inference slows down and reaches a point where it can’t handle texts beyond a certain length.
To overcome these hurdles, the state space language model (SSLM) architecture that works by continuously updating a “state” as it processes words has emerged as a promising alternative. It has already been deployed by some organizations — with TII being the latest adopter.
According to TII, its all-new Falcon model uses the Mamba SSM architecture originally proposed by researchers at Carnegie Mellon and Princeton Universities in a paper dated December 2023.
The architecture uses a selection mechanism that allows the model to dynamically adjust its parameters based on the input. This way, the model can focus on or ignore particular inputs, similar to how attention works in transformers, while delivering the ability to process long sequences of text – such as an entire book – without requiring additional memory or computing resources.
The approach makes the model suitable for enterprise-scale machine translation, text summarization, computer vision and audio processing tasks as well as tasks like estimation and forecasting, TII noted.
To see how Falcon Mamba 7B fares against leading transformer models in the same size class, the institute ran a test to determine the maximum context length the models can handle when using a single 24GB A10GPU.
The results revealed Falcon Mamba can “fit larger sequences than SoTA transformer-based models while theoretically being able to fit infinite context length if one processes the entire context token by token, or by chunks of tokens with a size that fits on the GPU, denoted as sequential parallel.”
In a separate throughput test, it outperformed Mistral 7B’s efficient sliding window attention architecture to generate all tokens at a constant speed and without any increase in CUDA peak memory.
Even in standard industry benchmarks, the new model’s performance was better than or nearly similar to that of popular transformer models as well as pure and hybrid state space models.
For instance, in the Arc, TruthfulQA and GSM8K benchmarks, Falcon Mamba 7B scored 62.03%, 53.42% and 52.54%, and convincingly outperformed Llama 3 8B, Llama 3.1 8B, Gemma 7B and Mistral 7B.
However, in the MMLU and Hellaswag benchmarks, it sat closely behind all these models.
That said, this is just the beginning. As the next step, TII plans to further optimize the design of the model to improve its performance and cover more application scenarios.
“This release represents a significant stride forward, inspiring fresh perspectives and further fueling the quest for intelligent systems. At TII, we’re pushing the boundaries of both SSLM and transformer models to spark further innovation in generative AI,” Dr. Hakim Hacid, the acting chief researcher of TII’s AI cross-center unit, said in a statement.
Overall, TII’s Falcon family of language models has been downloaded more than 45 million times — dominating as one of the most successful LLM releases from the UAE.