A Former Apple Luminary Sets Out to Create the Ultimate GPU Software

At a certain point between building Apple’s developer tools, leading a core part of Google’s AI infrastructure team, and clashing with Elon Musk during a stint as Tesla’s Autopilot chief, Chris Lattner’s vision for his life’s work started to come into focus. AI was taking over the world, and demand was growing for the chips that powered it. But the software stack for those chips was dominated by just a few big companies. Would developers be able to easily run their code across all the different chips dotting the AI landscape?
Lattner’s answer to that question is Modular, a software startup he founded in 2022 with his former Google colleague Tim Davis. Modular makes a unifying software layer that helps cloud businesses squeeze as much juice as possible out of GPUs and CPUs—the high-powered chips that underpin generative AI. The startup has also built a new Python-based programming language that lets developers write AI apps in a single language and run them across multiple GPUs and CPUs. Modular’s basic premise is that if a developer builds an app for one chip, they shouldn’t have to jump through hoops to run it on another vendor’s chip.
But Modular’s long-term goal is even more ambitious: to loosen the software choke hold that companies like Nvidia and AMD have on the industry, and become the de facto software for AI chips.
“Our thesis is that the need for compute power is just exploding, but there is no unified compute platform,” Lattner says. “Sovereign AI will be everywhere. There will be many Stargates. But there will be different types of chips optimized for different use cases, and there needs to be a unified layer for that.”
There are early signs that Modular’s thesis is being borne out. AI giants like Nvidia, AMD, and Amazon have partnered with the startup to test the waters. The GPU cluster company SF Compute also worked with Modular to build what they claim is the world’s cheapest API for large AI models. As of this week, Modular’s developer platform supports Apple Silicon GPUs, in addition to Nvidia and AMD chips.
Building on this momentum, Modular just raised $250 million in venture capital funding, its third round of financing in three years, bringing its valuation to $1.6 billion. The round was led by the Pittsburgh-based US Innovative Technology Fund. DFJ Growth also invested, along with existing investors General Catalyst, Greylock, and GV (formerly known as Google Ventures).
“We’ve spent a bunch of time and energy trying to figure out what makes a startup in this space interesting, and with every company that has tried to build their own chip—and even the big players, like AMD and Nvidia—it all comes back to the software,” says Dave Munichiello, managing partner at GV. “Chris convinced me that the software was the most interesting and valuable problem to address.”
It might be valuable—but it’s also extremely complicated. Part of that complication stems from Nvidia’s closed ecosystem. Nvidia’s chips make up the vast majority of the GPU market, but the company’s 20-year-old proprietary software platform, CUDA, keeps developers locked in. AMD’s software platform for high-performance computing, called ROCm, differs in that it’s open source. This allows developers to more easily move code to different chips.
Still, developers say that porting code from Nvidia’s CUDA to ROCm isn’t a smooth process, which is why they typically focus on building for just one chip vendor.
“ROCm is amazing, it’s open source, but it runs on one vendor’s hardware,” Lattner told the crowd at AMD’s Advancing AI event in June. Then he made his pitch for why Modular’s software is more portable and makes GPUs that much faster.
Lattner’s talk at AMD is representative of the kind of dance he and Davis need to do as they spread the Modular gospel. Today, Nvidia and AMD are both crucial partners for the firm. In a future universe, they’re also direct competitors. Part of Modular’s value proposition is that it can ship software for optimizing GPUs even faster than Nvidia can, as there might be a months-long gap between when Nvidia ships a new GPU and when it releases an “attention kernel”—the piece of GPU software that computes the attention operation at the heart of large language models.
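For context, the attention operation those kernels accelerate is only a few lines of linear algebra; the hard part is mapping it efficiently onto GPU hardware. Below is a minimal NumPy sketch of the computation, using illustrative names and shapes rather than any vendor’s actual kernel API.

```python
# Minimal sketch of scaled dot-product attention, the math that GPU
# "attention kernels" implement in fused, hardware-tuned form.
# Names and shapes are illustrative, not any vendor's real API.
import numpy as np

def attention(q, k, v):
    # q, k, v: (sequence_length, head_dim) arrays for one attention head
    scores = q @ k.T / np.sqrt(q.shape[-1])           # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # weighted sum of values

q, k, v = (np.random.rand(4, 8) for _ in range(3))
print(attention(q, k, v).shape)  # (4, 8)
```

Production attention kernels fuse these steps and tile the matrices to fit a specific GPU’s memory hierarchy, which is why hand-tuning them for each new chip takes so long.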
“Right now Modular is complementary to AMD and Nvidia, but over time you could see both of those companies feeling threatened by ROCm or CUDA not being the best software that sits on top of their chips,” says Munichiello. He also worries that potential cloud customers may balk at having to pay for an additional software layer like Modular’s.
Writing software for GPUs is also something of a “dark art,” says Waleed Atallah, the cofounder and CEO of Mako, a GPU kernel optimization company. “Mapping an algorithm to a GPU is an insanely difficult thing to do. There are a hundred million software devs, 10,000 who write GPU kernels, and maybe a hundred who can do it well.”
Mako is building AI agents to optimize coding for GPUs. Some developers think that’s the future for the industry, rather than building a universal compiler or a new programming language like Modular. Mako just raised $8.5 million in seed funding from Flybridge Capital and the startup accelerator Neo.
“We’re trying to take an iterative approach to coding and automate it with AI,” Atallah says. “By making it easier to write the code, you exponentially grow the number of people who can do that. Making another compiler is more of a fixed solution.”
Lattner notes that Modular also uses AI coding tools. But the company is intent on addressing the whole coding stack, not just kernels.
There are roughly 250 million reasons why investors think this approach is viable. Lattner is something of a luminary in the coding world, having previously built the open source compiler infrastructure project LLVM, as well as Apple’s Swift programming language. He and Davis are both convinced that this is a software problem that must be solved outside of a Big Tech environment, where most companies focus on building software for their own technology stack.
"When I left Google I was a little bit depressed, because I really wanted to solve this,” Lattner says. “What we realized is that it’s not about smart people, it’s not about money, it’s not about capability. It’s a structural problem.”
Munichiello shared a mantra common in the tech investing world: He says he’s betting on the founders themselves as much as their product. “He’s highly opinionated and impatient, and also right a lot of the time,” Munichiello says of Lattner. “Steve Jobs was also like that—he didn’t make decisions based on consensus, but he was often right.”