Interesting Engineering
Story by Aman Tripathi
A team of researchers at Stanford and the University of Washington has developed an AI reasoning model, s1, for under $50. This is a notable achievement, given the widespread assumption that substantial financial resources are essential for developing AI reasoning models.
The model is designed for complex reasoning tasks, solving problems and answering questions that require logical thinking. In tests involving math and coding, s1 performs comparably to cutting-edge models such as OpenAI’s o1 and DeepSeek’s R1.
“However, recent advances in reasoning, such as OpenAI’s o1 and DeepSeek’s r1, lack transparency, limiting broader research progress,” said the research team.
The development cost of s1 is remarkably low
The researchers achieved this level of performance by employing a technique known as “distillation.” This involves training s1 to replicate the reasoning abilities of another AI model, in this case, Google’s Gemini 2.0 Flash Thinking Experimental model.
The model was trained on a curated dataset of 1,000 questions and answers, accompanied by the “thinking” process of the Gemini model, which allowed it to learn how to arrive at accurate solutions.
“We curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality,” remarked the team.
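To illustrate the kind of data this distillation relies on, here is a minimal sketch in Python of how a question, a teacher model’s reasoning trace, and the final answer might be packed into a single training sample. The field names and chat-style markers are assumptions for illustration, not the actual format used by the s1 team.

```python
# Hypothetical sketch: packing a question, the teacher model's reasoning trace,
# and the final answer into one supervised training example.
# The field names and chat-style markers are illustrative assumptions, not the
# format used in the s1 repository.

def build_training_example(question: str, reasoning_trace: str, answer: str) -> dict:
    """Combine a question with the teacher's reasoning and answer into one text sample."""
    text = (
        f"<|user|>\n{question}\n"
        f"<|assistant|>\n<think>\n{reasoning_trace}\n</think>\n{answer}"
    )
    return {"text": text}

example = build_training_example(
    question="What is the sum of the first 100 positive integers?",
    reasoning_trace="Pair 1 with 100, 2 with 99, and so on: 50 pairs of 101, so 50 * 101 = 5050.",
    answer="5050",
)
```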
To optimize the training process, the researchers used supervised fine-tuning (SFT), in which the model is trained directly on a labeled set of example inputs and desired outputs. This enables faster and more efficient learning than techniques such as reinforcement learning.
Using SFT, the researchers trained s1 in under 30 minutes using 16 Nvidia H100 GPUs, with a total compute cost of approximately $20.
“The training takes just 26 minutes on 16 NVIDIA H100 GPUs,” noted the researchers in the study.
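For a rough picture of what supervised fine-tuning on such a dataset looks like in code, the sketch below runs a plain next-token training loop with the Hugging Face Transformers library. The base model name, hyperparameters, and tiny stand-in dataset are assumptions for illustration, not the configuration reported in the paper.

```python
# Minimal supervised fine-tuning sketch: next-token cross-entropy on curated
# question/reasoning/answer texts. The base model, optimizer settings, and
# one-sample "dataset" are illustrative assumptions, not the s1 setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder small model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# In practice this list would hold the ~1,000 curated s1K samples.
train_texts = [
    "<|user|>\nWhat is 2 + 2?\n<|assistant|>\n<think>\n2 plus 2 equals 4.\n</think>\n4"
]

model.train()
for epoch in range(3):
    for text in train_texts:
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
        # Causal-LM loss: labels are the input tokens, shifted internally.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```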
Impact of “wait” instruction
An interesting observation during the development of s1 was the impact of incorporating a “wait” instruction in the model’s reasoning process.
This simple addition led to a noticeable improvement in accuracy, suggesting that prompting the model to pause and re-examine its reasoning helps it arrive at correct answers.
“We develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end,” explained the researchers.
“This can lead the model to double-check its answer, often fixing incorrect reasoning steps.”
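A rough sketch of how such budget forcing could work at decoding time is shown below. The end-of-thinking marker, the generate_fn interface, and the default budget are assumptions for illustration rather than the team’s actual implementation; any decoding function that returns text up to a stop string could be plugged in as generate_fn.

```python
# Hypothetical sketch of "budget forcing" at inference time: when the model
# tries to end its thinking early, strip the end-of-thinking marker and append
# "Wait" so it keeps reasoning; once the budget is spent, force termination.
# The marker string and generate_fn interface are illustrative assumptions.
from typing import Callable

END_OF_THINKING = "</think>"  # assumed marker that closes the reasoning block


def budget_forced_answer(
    question: str,
    generate_fn: Callable[[str, str], str],  # (prompt, stop_string) -> generated text
    max_extensions: int = 2,
) -> str:
    prompt = question + "\n<think>\n"
    extensions = 0
    while True:
        prompt += generate_fn(prompt, END_OF_THINKING)
        if prompt.endswith(END_OF_THINKING) and extensions < max_extensions:
            # The model tried to stop thinking: remove the marker and append
            # "Wait" so it double-checks its reasoning.
            prompt = prompt.removesuffix(END_OF_THINKING) + "\nWait"
            extensions += 1
            continue
        if not prompt.endswith(END_OF_THINKING):
            # Thinking budget exhausted: forcefully terminate the trace.
            prompt += "\n" + END_OF_THINKING
        # With the thinking block closed, generate the final answer.
        return generate_fn(prompt, "")
```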
Race for Efficient Reasoning Models
This development comes amid an intensifying race to build efficient reasoning models at a fraction of the millions of dollars typically spent by large AI labs.
“Our work aims to push the frontier of reasoning in a fully open manner, fostering innovation and collaboration to accelerate advancements that ultimately benefit society,” concluded the team.
Notably, just last week, Chinese startup DeepSeek made waves worldwide by unveiling its AI reasoning model R1.
According to DeepSeek, training R1 cost only about $6 million, far less than what OpenAI, Google, Meta, and others spend on their AI models.
However, some reports have challenged DeepSeek’s claims, estimating that the total cost of developing R1 could be around $1.3 billion.
It has also been alleged that DeepSeek compromised safety and security features for performance and cost. In a test conducted by Cisco, DeepSeek R1 exhibited a 100% attack success rate, meaning it failed to block a single harmful prompt.