Model distillation might be the most important shift happening in AI right now
Model distillation is reshaping the entire tech industry and is increasingly becoming a MASSIVE topic. DeepSeek's R1 model, released yesterday, only reinforced this.

Model distillation is a process where a smaller, simpler model (the "student") is trained to replicate the behavior and capabilities of a larger, more complex model (the "teacher"). This is achieved by using the teacher model's outputs (e.g., predictions or reasoning processes) as training data, allowing the student to inherit much of the teacher's performance with reduced size and computational demands (see the sketch below).

So why is this important? For large AI labs, capital and scale were moats. It took literally billions of dollars of compute and data to pre-train a state-of-the-art model. Let alone al
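To make the mechanics concrete, here's a minimal sketch of the classic distillation recipe in PyTorch: the student is trained against the teacher's softened output distribution rather than against ground-truth labels. The model sizes, temperature, and training loop below are illustrative assumptions for a toy classifier, not any lab's actual setup.

```python
# Minimal knowledge-distillation sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical "teacher" (large, assumed already trained) and "student" (much smaller).
teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's distribution so the student sees more signal

for step in range(100):                       # toy loop over random stand-in data
    x = torch.randn(32, 128)                  # placeholder for a real batch of inputs
    with torch.no_grad():
        teacher_logits = teacher(x)           # the teacher's outputs become the training target
    student_logits = student(x)

    # KL divergence between softened distributions -- the standard distillation loss
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point is that the expensive part (training the teacher) happens once; the student then learns from the teacher's outputs at a fraction of the cost.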
$Amazon.com(AMZN)$ is having their annual conference re:Invent this week, and for infra nerds like me, it's fun to dig in on all of the announcements! Here's a quick summary of what they announced in their keynotes across:

1) Compute

EC2 was a foundational product for AWS, and the OG compute instance. Today, AWS has ~850 compute instance types across 126 different families (for instance, they have ~14 instance types just for Nvidia GPUs). From 1 instance type to ~850 today! They also started building their own chips (Graviton in 2018, Inferentia in 2019, and Trainium in 2022). They disclosed a cool stat: in 2019 AWS was a $35b business. Today, there is as much compute running on Graviton in the AWS fleet as all of the compute at AWS in 2019! This w