Unable to Get H100 Instances Despite an AWS Capacity Reservation: Why It Happens and What You Can Do

You did everything right. You planned your ML training run, budgeted for it, and even created an AWS Capacity Reservation for NVIDIA H100-powered P5 instances. Then you hit “Launch” — and nothing. An InsufficientInstanceCapacity error stares back at you, or worse, your reservation sits there in an unexpected state while your training pipeline idles.

You’re not alone. This is one of the most frustrating experiences in cloud-based ML infrastructure today, and it’s more common than AWS’s marketing materials might suggest. Let’s unpack why this happens and what you can realistically do about it.

The GPU Scarcity Problem Is Real

The demand for H100 GPUs has exploded since the generative AI boom began. Every company — from startups fine-tuning small language models to hyperscalers training frontier models — is competing for the same limited pool of NVIDIA silicon. AWS, like every other cloud provider, has a finite physical supply of H100 hardware distributed across its data centers, and that supply is consistently outstripped by demand.

This creates a fundamental tension: a Capacity Reservation is a request to AWS to set aside hardware for you, but AWS can only honor that request if the hardware physically exists and is available in the Availability Zone you’ve specified.

Reason 1: The Capacity Reservation Itself Can Fail to Be Created

Many users assume that creating a Capacity Reservation is guaranteed. It’s not. AWS documentation is clear on this — your request to create a Capacity Reservation will fail if AWS doesn’t have sufficient On-Demand capacity in the requested Availability Zone, or if the request exceeds your On-Demand Instance quota for that instance family.

In other words, the reservation system doesn’t conjure hardware out of thin air. If the physical P5 (H100) servers in us-east-1a are already fully allocated, your reservation request is simply denied. You may receive no detailed explanation beyond the generic capacity error.
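Since a creation request can simply be denied, it's worth building the fallback into your tooling rather than retrying by hand. Here's a minimal sketch of that loop; `create_in_az` and `InsufficientCapacity` are illustrative stand-ins for a boto3 `create_capacity_reservation` call and the `InsufficientInstanceCapacity` error code you'd pull out of a `botocore` `ClientError`.

```python
# Sketch: fall back across AZs when Capacity Reservation creation is denied.
# `create_in_az` stands in for a wrapper around boto3's
# create_capacity_reservation; real code would catch botocore's ClientError
# and check for the "InsufficientInstanceCapacity" error code.

class InsufficientCapacity(Exception):
    """Stand-in for the AWS InsufficientInstanceCapacity error."""

def create_reservation_with_fallback(create_in_az, azs):
    """Try each AZ in turn; return the first successful reservation, else None."""
    for az in azs:
        try:
            return create_in_az(az)
        except InsufficientCapacity:
            continue  # no capacity in this AZ; try the next one
    return None
```

If every AZ comes back empty, you at least fail fast with a clear answer instead of a single opaque error.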

Reason 2: Availability Zone Mismatch

H100 instances aren’t available in every Availability Zone within a region. AWS deploys GPU hardware in specific AZs based on data center infrastructure, power, and cooling constraints. If you’ve hardcoded a subnet tied to a particular AZ, and that AZ doesn’t have H100 capacity, you’ll hit a wall regardless of whether you have a reservation.

This is especially tricky in custom VPCs where subnets are fixed to specific AZs. Unlike the default VPC, you can’t simply omit the AZ and let AWS pick one with available capacity.
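Before pinning a subnet, you can ask EC2 which AZs even offer the instance type, via `DescribeInstanceTypeOfferings` with the `availability-zone` location type. A small sketch of parsing that response shape (the sample data in the test is illustrative, not a statement about real AZ coverage):

```python
# Sketch: pick out the AZs that actually offer an instance type, from the
# response shape of EC2's DescribeInstanceTypeOfferings (called with
# location-type "availability-zone").

def azs_offering(offerings_response, instance_type="p5.48xlarge"):
    """Return the sorted list of AZ names that offer `instance_type`."""
    return sorted(
        o["Location"]
        for o in offerings_response["InstanceTypeOfferings"]
        if o["InstanceType"] == instance_type
    )
```

Feed the result into your subnet selection instead of hardcoding an AZ up front.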

Reason 3: Instance Matching Criteria Misconfiguration

On-Demand Capacity Reservations have a concept called “instance matching criteria” — either open or targeted. If your reservation is set to targeted, only instances that explicitly reference that specific reservation will use it. If your launch configuration doesn’t include the correct CapacityReservationTarget, your instances will attempt to launch outside the reservation and may fail due to general capacity constraints.

Conversely, if you set it to open but already have other running instances with matching attributes (same instance type, platform, AZ, and tenancy), those existing instances may silently consume your reserved capacity, leaving nothing for the new instances you’re trying to launch.
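The fix for the targeted case is to pass an explicit `CapacityReservationSpecification` when launching. A minimal sketch of building that parameter for `run_instances` (the reservation ID in the test is a made-up example):

```python
# Sketch: build the CapacityReservationSpecification parameter for
# run_instances. A `targeted` reservation must be named explicitly;
# `open` reservations match eligible instances automatically.

def reservation_spec(reservation_id=None):
    """Target a specific reservation, or fall back to open matching."""
    if reservation_id:
        return {"CapacityReservationTarget": {"CapacityReservationId": reservation_id}}
    return {"CapacityReservationPreference": "open"}
```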

Reason 4: Platform Mismatch

This is a subtle but surprisingly common issue. The platform specified in your Capacity Reservation must exactly match the platform of the AMI you’re using to launch instances. For Linux, there’s a distinction between the generic Linux/UNIX platform value and more specific values like SUSE Linux or Red Hat Enterprise Linux. If your reservation specifies Linux/UNIX but your AMI reports a different platform string, the reservation won’t apply.
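A quick preflight check can catch this before launch day. The reservation side exposes `InstancePlatform` and the AMI side exposes `PlatformDetails` (via `DescribeImages`); the common Linux values line up, but verify the exact strings for your platform against the docs:

```python
# Sketch: preflight check that a reservation's platform matches the AMI's.
# Compares the reservation's InstancePlatform field against the image's
# PlatformDetails field; both use strings like "Linux/UNIX" or
# "Red Hat Enterprise Linux".

def platforms_match(reservation, image):
    """True if the reservation can apply to instances launched from this AMI."""
    return reservation["InstancePlatform"] == image["PlatformDetails"]
```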

Reason 5: On-Demand Quota Limits

AWS sets per-region limits on the number of vCPUs you can run for each instance family. P5 instances (H100) fall under a specific quota, and many accounts start with a default of zero for these high-demand GPU instance types. Even if you’ve successfully increased your quota in the past, you may be hitting the ceiling if you’re scaling up — and active, unused Capacity Reservations count against your On-Demand Instance limits.

You might have a reservation, but if the reservation itself already maxes out your quota, you literally cannot launch additional instances.
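The arithmetic here is easy to get wrong because quotas are counted in vCPUs, not instances. A p5.48xlarge has 192 vCPUs, and unused reserved capacity counts the same as running capacity:

```python
# Sketch: the vCPU math behind a P-family quota request. p5.48xlarge has
# 192 vCPUs, and active-but-unused Capacity Reservations count against the
# same On-Demand vCPU limit as running instances.

P5_48XLARGE_VCPUS = 192

def vcpus_needed(running_instances, reserved_unused_instances):
    """Total vCPUs counted against the On-Demand quota."""
    return (running_instances + reserved_unused_instances) * P5_48XLARGE_VCPUS
```

So two running instances plus a two-instance reservation consumes 768 vCPUs of quota; size your Service Quotas request accordingly.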

Reason 6: Confusing Capacity Blocks with On-Demand Capacity Reservations

AWS offers two distinct mechanisms that sound similar but work very differently:

  • On-Demand Capacity Reservations reserve capacity immediately (if available) with no term commitment. You pay for the capacity whether you use it or not.
  • EC2 Capacity Blocks for ML let you reserve GPU instances for a future start date, for a defined duration (up to six months). These are specifically designed for ML workloads and are colocated in UltraClusters.

If you created a Capacity Block but are trying to launch instances before the reservation’s start date, your instances will fail with a message indicating the reservation is “not active yet.” The timing matters — Capacity Blocks have strict windows.
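A cheap guard is to check the reservation window before your pipeline tries to launch. The field names below follow the `DescribeCapacityReservations` response (`StartDate`/`EndDate`); the dates in the test are illustrative:

```python
# Sketch: check whether a Capacity Block is inside its reservation window
# before attempting a launch. StartDate/EndDate follow the
# DescribeCapacityReservations response shape.
from datetime import datetime, timezone

def block_is_active(block, now=None):
    """True only between the block's start and end dates."""
    now = now or datetime.now(timezone.utc)
    return block["StartDate"] <= now < block["EndDate"]
```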

Reason 7: Regional Capacity Simply Doesn’t Exist

Some regions have limited or no H100 capacity at all. If you’re operating in a region that doesn’t support P5 instances, or where capacity has been fully committed to other customers with long-term agreements, no amount of reservation attempts will help. AWS prioritizes customers with longer-term commitments and larger-scale agreements, which can leave On-Demand users in a difficult spot.

What You Can Actually Do

Diversify across Availability Zones. Don’t pin yourself to a single AZ. Design your infrastructure to be flexible across multiple zones.

Request quota increases proactively. Don’t wait until launch day to discover your vCPU quota for P5 instances is zero. Request increases well in advance through the AWS Service Quotas console.

Use Capacity Blocks for ML. If you have predictable training schedules, Capacity Blocks offer a more reliable path to guaranteed H100 access. They can be reserved up to eight weeks in advance, and in some cases can be provisioned with only minutes of lead time.

Verify your reservation state. Use describe-capacity-reservations to confirm your reservation is in an active state with available instance count greater than zero. It can take up to five minutes for a new reservation to transition from pending to active.
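That check is worth encoding, because a reservation can be `active` yet already drained: open matching lets existing instances silently consume the reserved slots. A minimal sketch over a `DescribeCapacityReservations` entry:

```python
# Sketch: decide whether a reservation can actually absorb a launch, given
# one entry from DescribeCapacityReservations. An "active" reservation can
# still have zero AvailableInstanceCount if open-matching instances have
# already consumed the reserved capacity.

def reservation_usable(reservation, needed=1):
    """True if the reservation is active and has enough free slots."""
    return (
        reservation["State"] == "active"
        and reservation["AvailableInstanceCount"] >= needed
    )
```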

Check instance matching criteria. Ensure your launch configuration targets the reservation correctly. Use describe-instances to verify that CapacityReservationSpecification is set appropriately.

Consider alternative instance types. If H100s are unavailable, AWS offers G6e instances with NVIDIA L40S GPUs or G5 instances with A10G GPUs. They’re less powerful, but they’re also more widely available.

Engage your AWS account team. If you have a TAM or solutions architect, loop them in. Customers with enterprise support agreements often have access to capacity planning assistance that isn’t available to self-service accounts.

Try multiple regions. P5 instances are available in regions including US East (N. Virginia, Ohio), US West (Oregon), Europe (London), Asia Pacific (Mumbai, Sydney, Tokyo), and South America (São Paulo). Your preferred region may be saturated while another has capacity.


The Uncomfortable Truth

A Capacity Reservation is not a capacity guarantee — it’s a request that AWS will honor if and only if the physical infrastructure exists to support it. In an era of unprecedented GPU demand, this distinction matters enormously. The gap between what cloud providers promise in their marketing and what they can deliver in practice is where teams lose days or weeks of productivity.

The best strategy is defense in depth: combine proactive quota management, flexible AZ targeting, Capacity Blocks for planned workloads, and a willingness to consider alternative instance types or regions. GPU scarcity isn’t going away anytime soon, and the teams that plan for it will be the ones that keep their training pipelines running.