Token Talk 5: Big Models Teach, Small Models Catch Up.

February 5, 2025

By: Thomas Stahura

O3-mini is amazing and totally free. OpenAI achieved this through distillation from the larger, yet-to-be-released o3 model.

Right now, the model ranks second globally, beating DeepSeek R1 but trailing the massive o1. Estimates put o1 at 200-300 billion parameters, DeepSeek R1 at 671 billion, and o3-mini at just 3-30 billion. (These are the only reasoning models to top the benchmarks this week.)

What’s remarkable is that o3-mini achieves intelligence close to o1 while being as little as one-hundredth its size, thanks to distillation.

There are a variety of distillation techniques, but at a high level, distillation means using a larger teacher model to teach a smaller student model.
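For intuition, here's a minimal sketch of one classic technique, Hinton-style logit distillation, where the student is trained to match the teacher's softened output distribution. This is illustrative PyTorch, not anything OpenAI has published; the temperature and loss weighting are placeholder values.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style knowledge distillation loss (illustrative values).

    Blends (a) KL divergence between the teacher's and student's
    temperature-softened output distributions with (b) ordinary
    cross-entropy against the ground-truth labels.
    """
    # Soften both distributions with temperature T, then compare them.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, log_target=True,
                  reduction="batchmean") * (T * T)
    # Standard supervised loss on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

During training, each batch is run through both models; the teacher's weights stay frozen and only the student is updated against this combined loss.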

For example, GPT-4 (a 1.4-trillion-parameter model, by most estimates) was trained on roughly a million gigabytes (one petabyte) of public internet data. GPT-4 was trained to represent that data, in effect to represent the internet.

The resulting 1.4-trillion-parameter model, if downloaded, would occupy about 5,600 GB, or 5.6 terabytes (1.4 trillion parameters at 4 bytes each). In a sense, you can think of GPT-4 (or any LLM) as a highly compressed, queryable representation of its training set, in this case the internet. After all, going from 1 petabyte to 5.6 terabytes is a roughly 99.4% reduction.
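The back-of-the-envelope math, assuming 4 bytes (FP32) per parameter:

```python
params = 1.4e12               # rumored GPT-4 parameter count
model_tb = params * 4 / 1e12  # 4 bytes per FP32 weight -> terabytes
training_tb = 1000.0          # 1 petabyte of training data, in terabytes

print(f"model size: {model_tb:.1f} TB")                # 5.6 TB
print(f"reduction:  {1 - model_tb / training_tb:.2%}") # ~99.44%
```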

So, how does this apply to distillation? If you think of a model as a compressed version of its training dataset, then you can “uncompress” that dataset by querying the larger teacher model, in this case GPT-4, until you’ve generated something on the order of 1 petabyte of synthetic data. You then use that synthetic dataset to train or fine-tune a smaller student model (3-10 billion parameters) to mimic the teacher’s performance.
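A rough sketch of that pipeline, using the OpenAI Python client as the teacher: the prompts, the teacher model name, and the scale here are all placeholders, and the student fine-tuning step is left as a comment since the details vary by framework.

```python
from openai import OpenAI
from datasets import Dataset  # Hugging Face datasets

client = OpenAI()  # teacher behind an API; assumes OPENAI_API_KEY is set

def ask_teacher(prompt: str) -> str:
    # Query the teacher model for a completion on one prompt.
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder teacher model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Step 1: "uncompress" the teacher into a synthetic dataset.
# In practice this is millions of prompts, not two.
prompts = [
    "Explain how TCP handshakes work.",
    "Summarize the causes of the 2008 financial crisis.",
]
records = [{"prompt": p, "response": ask_teacher(p)} for p in prompts]
synthetic = Dataset.from_list(records)

# Step 2: fine-tune a small (3-10B parameter) student on these
# prompt/response pairs, e.g. with a supervised fine-tuning loop
# such as trl's SFTTrainer.
```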

This remains an active area of research today.

Of course, distilling from a closed-source model is strictly against OpenAI’s terms of service. That didn’t stop DeepSeek, though, which is currently being probed by Microsoft over allegations that it trained on synthetic data generated by OpenAI’s models.

The cat’s out of the bag. OpenAI itself distilled o3-mini from o3, and Microsoft distilled phi-3.5-mini-instruct from phi-3.5. It seems that from now on, whatever model performs best will become the “teacher” for all the “student” models, which will be fine-tuned to quickly catch up to it in performance. This new paradigm has shifted the AI industry’s focus from LLMs themselves to AI applications, chief among them agents.

OpenAI (in addition to launching o3-mini) debuted a new web agent called deep research (only available on the $200/month tier). I’ve used many web agents and browser tools, like Browserbase, browser-use, and computer-use. I have buddies building CopyCat (YC W25), and I’ve even built my own browser agent. All this to say: the AI application space is heating up!

Stay tuned because I’ll talk more about agents next week!

P.S. If you have any questions or just want to talk about AI, email me: thomas @ ascend dot vc
