Rephrase and rearrange the whole content into a news article. I want you to respond only in language English. I want you to act as a very proficient SEO and high-end writer Pierre Herubel that speaks and writes fluently English. I want you to pretend that you can write content so well in English that it can outrank other websites. Make sure there is zero plagiarism.:
- ChatGPT changed the conversation about AI.
- But the tech powering it has limitations and may struggle to make AI that is as smart as humans.
The groundbreaking work of a bunch of Googlers in 2017 introduced the world to transformers — neural networks that power popular AI products today.
They power the large language model, or LLM, beneath OpenAI’s ChatGPT, the chatbot whose explosion onto the scene last year prompted Bill Gates to declare “the age of AI has begun.”
The mission for some AI entrepreneurs now is to realize a sci-fi vision and create artificial general intelligence (AGI): AI that appears as intelligent as a human.
But while transformers can power ChatGPT, a preprint paper published by Google researchers last month suggests they might not be able to make the human-like abstractions, extrapolations, and predictions that would imply we’re at AGI.
ChatGPT merely responds to users’ prompts with text using the data a human has trained it on. In its earliest public form, the chatbot had no knowledge of events beyond September 2021, which it had to acknowledge every time someone asked about more recent topics.
Testing transformers’ ability to move beyond the data, the Google researchers described “degradation” of their “generalization for even simple extrapolation tasks.”
This has raised the question of whether human-like AI is even possible, and whether different technologies may get us there.
Some researchers are testing alternatives to figure that out, with another new paper suggesting that there might be a better model waiting in the wings.
Research submitted to open-access repository ArXiv on December 1 by Albert Gu, assistant professor in the machine-learning department at Carnegie Mellon, and Tri Dao, chief scientist at Together AI, introduces a model called Mamba.
Quadratic attention has been indispensable for information-dense modalities such as language… until now.
Announcing Mamba: a new SSM arch. that has linear-time scaling, ultra long context, and most importantly–outperforms Transformers everywhere we’ve tried.
With @tri_dao 1/ pic.twitter.com/vXumZqJsdb
— Albert Gu (@_albertgu) December 4, 2023
Mamba is a state-space model, or SSM, and, according to Gu and Dao, it seems capable of beating transformers on performance in a bunch of tasks.
A caveat: Research submitted to ArXiv is moderated but not necessarily peer-reviewed. This means the public gets to see research faster, but it isn’t necessarily reliable.
Like transformers, SSMs are capable of language modeling, the process through which chatbots like ChatGPT function. But SSMs do this with mathematical models of different “states” that users’ prompts can take.
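To make the idea of a "state" concrete, here is a minimal sketch of a generic linear state-space recurrence. It is not Mamba's actual implementation: the function name `ssm_scan` and the parameters `a`, `b`, and `c` are hypothetical, the parameters are fixed numbers chosen for illustration, and Mamba's own design reportedly makes such parameters depend on the input.

```python
def ssm_scan(inputs, a, b, c):
    """Run a toy linear state-space recurrence over a sequence of numbers.

    Illustrative assumption: a, b and c are fixed lists of equal length
    (a diagonal transition), not learned or input-dependent values.
    """
    state = [0.0] * len(a)  # the hidden state has a fixed size
    outputs = []
    for x in inputs:
        # Update step: h_t = a * h_{t-1} + b * x_t, applied per state entry
        state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
        # Readout step: y_t = sum over entries of c * h_t
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs, state


# One input "pulse" followed by silence: the state carries a fading memory.
outputs, final_state = ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0])
print(outputs)      # [2.0, 1.0, 0.5]
print(final_state)  # [0.25]
```

Each new input updates the same small state rather than being appended to a growing history, which is the property the researchers' speed claims rest on.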
Gu and Dao’s research states: “Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.”
On language modeling, Mamba “outperforms transformers of the same size and matches transformers twice its size, both in pretraining and downstream evaluation,” Gu and Dao noted.
Writing on X, Dao also noted how a feature particular to SSMs means Mamba is able to generate language responses five times faster than a transformer.
Our scan implementation is *30x faster* than basic PyTorch/JAX, and orders of magnitude faster than quadratic FlashAttention when sequence lengths get long.
And because of the fixed-size recurrent state (no KV cache!) – Mamba can do LM inference 5x faster than a Transformer.
6/ pic.twitter.com/llc1eZFHLt— Tri Dao (@tri_dao) December 4, 2023
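Dao's "no KV cache" remark can be illustrated with some rough bookkeeping. The sketch below is an assumption-laden simplification, not a measurement: the function names and all model sizes are hypothetical, and it counts only stored numbers, ignoring real-world details such as attention variants or numeric precision.

```python
def kv_cache_entries(num_tokens, num_layers, hidden_size):
    """Numbers a transformer keeps around while generating text.

    Simplifying assumption: one key vector and one value vector of
    length hidden_size per token, per layer.
    """
    return 2 * num_tokens * num_layers * hidden_size


def ssm_state_entries(num_layers, hidden_size, state_size):
    """Numbers a state-space model keeps around while generating text.

    Simplifying assumption: one state of hidden_size * state_size
    entries per layer, regardless of how many tokens have been seen.
    """
    return num_layers * hidden_size * state_size


# Hypothetical model dimensions, chosen only for illustration.
LAYERS, HIDDEN, STATE = 2, 4, 16

for tokens in (1000, 2000):
    print(tokens,
          kv_cache_entries(tokens, LAYERS, HIDDEN),
          ssm_state_entries(LAYERS, HIDDEN, STATE))
# 1000 16000 128
# 2000 32000 128
```

Doubling the conversation length doubles what the transformer must store and re-read, while the state-space model's memory stays flat.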
In response, Dr Jim Fan, a research scientist at software company Nvidia, wrote on X that he’s “always excited by new attempts to dethrone transformers. We need more of these.”
He gave “kudos” to Dao and Gu “for pushing on alternative sequence architectures for many years now.”
ChatGPT was a landmark cultural event that sparked an AI boom. But its technology looks unlikely to lead the industry to its promised land of human-like intelligence.
If repeated testing confirms that Mamba consistently outperforms transformers, however, it could inch the industry closer.