AI Essentials: What is compute and how is it measured?

This blog continues our series called “AI Essentials,” which aims to bridge the knowledge gap surrounding AI-related topics. It discusses what compute is, why it matters for policymaking, and what it means for startups.

If you wanted to haul large amounts of water from a well, you would (at one point in history) have needed a resource, like a horse, to pull the load. You would measure the rate at which the horse could work in units of “horsepower.” In the world of AI, the resource required to make models work is compute, and it’s measured in FLOPs and FLOPS.

Compute refers to the hardware resources that make AI models work, allowing them to train on data, process information, and generate predictions. Without sufficient compute, even the most sophisticated models struggle to perform efficiently. The implications for startups and policymakers are twofold. First, compute is scarce and expensive, meaning startups are constrained by their access to it. Second, given compute’s role in how capable models can be, policymakers are building compute-based thresholds into the regulatory frameworks they pursue.

Compute involves the processors (CPUs, GPUs, or TPUs), memory, and storage needed to perform the numerical calculations for AI models. These resources are especially critical during training, where models adjust internal parameters (called model weights) based on patterns found in massive datasets. Having more compute means models can more quickly and effectively learn from data, leading to more accurate predictions, improved decision-making capabilities, and the ability to handle more complex tasks. Inadequate compute, on the other hand, limits a model’s complexity, slows down training, and can hinder innovation.
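To make this concrete, here is a minimal sketch in Python of what “adjusting model weights” looks like. The model and numbers are purely illustrative, but the point is that each training step is just a large batch of floating-point arithmetic, which is exactly the work compute performs.

```python
import numpy as np

# Toy model: learn a weight vector w so that X @ w approximates y.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))        # 1,000 examples, 50 features
true_w = rng.normal(size=50)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(50)                       # model weights, initially zero
learning_rate = 0.01

for step in range(100):
    predictions = X @ w                # forward pass: ~2 * 1000 * 50 floating-point ops
    error = predictions - y
    gradient = X.T @ error / len(y)    # roughly the same arithmetic again
    w -= learning_rate * gradient      # nudge the weights toward the data's patterns
```

Even this toy example runs tens of thousands of floating-point operations per step; scale the data and the number of weights up by many orders of magnitude and the appetite for compute becomes clear.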

We can measure both the computational work an AI model requires and the theoretical capacity of compute resources, using units with confusingly similar names: FLOPs and FLOPS.

FLOP stands for FLoating-point OPeration. (A floating-point number is a standardized format computers use to represent both very large and very small values with consistent precision. A floating-point operation is a single arithmetic operation, such as an addition, subtraction, multiplication, or division, performed on floating-point numbers.) The number of floating-point operations, or FLOPs, measures how much computational work a model requires to process data and make decisions.
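As a rough illustration, here is how FLOPs are conventionally counted for two basic operations. The counts below are the standard approximation, not measurements of any particular chip:

```python
# A dot product of two vectors of length n takes
# n multiplications + (n - 1) additions, or roughly 2n FLOPs.
def dot_product_flops(n: int) -> int:
    return n + (n - 1)

# Multiplying an (m x k) matrix by a (k x n) matrix performs
# m * n dot products of length k, i.e. roughly 2 * m * k * n FLOPs.
def matmul_flops(m: int, k: int, n: int) -> int:
    return m * n * dot_product_flops(k)

print(dot_product_flops(1_000))            # 1,999 FLOPs
print(matmul_flops(4_096, 4_096, 4_096))   # ~1.4 * 10^11 FLOPs for one large multiply
```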

The higher the FLOPs, the more complex the model and the more compute it demands. Older models required on the order of trillions (or 10¹²) of FLOPs, but today’s leading AI models demand compute on a massive scale due to the enormous datasets they process and the intricate neural networks they use. For example, training a model like GPT-4 can require septillions (or 10²⁴) of FLOPs.
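How do totals like 10²⁴ arise? A widely used back-of-the-envelope heuristic from the research literature (an approximation, not an official figure for any specific model) estimates training compute as roughly six FLOPs per model parameter per training token:

```python
# Rough training-compute estimate: ~6 * parameters * tokens
# (a common heuristic from the scaling-law literature; real totals vary).
def training_flops(parameters: float, tokens: float) -> float:
    return 6 * parameters * tokens

# GPT-3-scale example: ~175 billion parameters, ~300 billion tokens.
print(f"{training_flops(175e9, 300e9):.2e}")   # ~3.15e+23 FLOPs
```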

Meanwhile, FLOPS, or FLoating-point Operations Per Second, measures a computer system’s computational performance by quantifying the number of calculations it can perform each second. It is a measure of how powerful a given piece of hardware is, in terms of its theoretical capacity: the higher the FLOPS, the more powerful the hardware.
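Dividing a model’s FLOP requirement by a system’s FLOPS gives a rough sense of training time. The hardware numbers below are illustrative assumptions (real systems rarely sustain their theoretical peak), but they show why frontier training runs require thousands of chips:

```python
# Rough wall-clock estimate: FLOPs required / (FLOPS available * utilization).
def training_seconds(total_flops: float, peak_flops: float, utilization: float) -> float:
    return total_flops / (peak_flops * utilization)

# Illustrative assumptions: a ~3.15e23-FLOP training run, an accelerator with
# ~3e14 FLOPS (300 teraFLOPS) of peak throughput, and 30% realized utilization.
one_chip = training_seconds(3.15e23, 3e14, 0.30)
print(f"{one_chip / 86_400 / 365:.0f} years on one chip")        # ~111 years
print(f"{one_chip / 86_400 / 10_000:.1f} days on 10,000 chips")  # ~4.1 days
```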

Both measures are used in AI regulatory efforts. For example, the Biden Administration’s Executive Order on Artificial Intelligence, issued last fall, included disclosure requirements for models trained with more than 10²⁶ FLOPs of compute and for compute clusters with a theoretical capacity of more than 10²⁰ FLOPS. Europe’s AI Act and a controversial (since vetoed) California bill also use compute-based thresholds. And given the financial costs of large quantities of compute, some policymakers have also included cost-of-compute-based thresholds in regulatory efforts.
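For a sense of how the two thresholds relate, a cluster running at exactly the Executive Order’s capacity threshold would, at its theoretical peak, take about 12 days to accumulate the training threshold’s worth of compute. The arithmetic below simply relates the two published numbers and says nothing about any actual training run:

```python
# Relating the Executive Order's two thresholds: at the cluster threshold's
# theoretical peak, how long until a run crosses the model threshold?
model_threshold_flops = 1e26            # training-compute threshold, in FLOPs
cluster_threshold_flops_per_sec = 1e20  # cluster-capacity threshold, in FLOPS

seconds = model_threshold_flops / cluster_threshold_flops_per_sec
print(f"{seconds / 86_400:.1f} days")   # ~11.6 days at theoretical peak
```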

It’s unclear whether these thresholds, which are arguably arbitrary, will hold up over time. Technological improvements in both models and hardware will lead to more capable models with lower compute requirements, even as compute costs continue to fall.

At present, compute remains a main cost center for AI startups (along with associated costs like energy and cooling for those that run their own compute resources). Most startups lack the resources to invest in their own infrastructure and must rely on cloud services to access the necessary compute power. That means startups often approach AI development on a different plane, either developing niche models to perform specific tasks (as opposed to a large language model) or fine-tuning others’ pre-trained models. Larger companies, by comparison, can innovate more freely with in-house compute or even afford to invest in custom hardware, such as Tensor Processing Units (TPUs).

Policymakers are now exploring ways to level the playing field by ensuring that startups can access the compute they need to innovate. Some proposed solutions include the National AI Research Resource, which would provide academic researchers and smaller companies with access to high-performance compute resources and datasets, helping to democratize the tools necessary for AI development and fostering a more equitable, innovative AI ecosystem.