This document describes the features and limitations of GPU virtual machine (VM) instances that run on Compute Engine.
To accelerate specific workloads on Compute Engine, you can either deploy an accelerator-optimized instance that has attached GPUs, or attach GPUs to an N1 general-purpose instance. Compute Engine provides GPUs for your instances in pass-through mode. Pass-through mode provides your instances with direct control over GPUs and their memory.
You can also use some GPU machine types on AI Hypercomputer. AI Hypercomputer is a supercomputing system that is optimized to support your artificial intelligence (AI) and machine learning (ML) workloads. This option is recommended for creating a densely allocated, performance-optimized infrastructure that has integrations for Google Kubernetes Engine (GKE) and Slurm schedulers.
Supported machine types
Accelerator-optimized and N1 general-purpose machine families support GPUs. For instances that use accelerator-optimized machine types, Compute Engine automatically attaches the GPUs when you create the instance. For instances that use N1 machine types, you attach GPUs to an instance during or after instance creation. GPUs are not compatible with other machine types.
Accelerator-optimized machine types
Each accelerator-optimized machine type has a specific model of NVIDIA GPUs attached. If you have graphics-intensive workloads, such as 3D visualization, you can also create virtual workstations that use NVIDIA RTX Virtual Workstations (vWS). NVIDIA RTX Virtual Workstation is available for some GPU models.
Machine type | GPU model | NVIDIA RTX Virtual Workstation (vWS) model
---|---|---
A4X | NVIDIA GB200 Grace Blackwell Superchips (nvidia-gb200). Each Superchip contains four NVIDIA B200 Blackwell GPUs. |
A4 | NVIDIA B200 Blackwell GPUs (nvidia-b200) |
A3 Ultra | NVIDIA H200 SXM GPUs (nvidia-h200-141gb) |
A3 Mega | NVIDIA H100 SXM GPUs (nvidia-h100-mega-80gb) |
A3 High, A3 Edge | NVIDIA H100 SXM GPUs (nvidia-h100-80gb) |
A2 Ultra | NVIDIA A100 80GB GPUs (nvidia-a100-80gb) |
A2 Standard | NVIDIA A100 40GB GPUs (nvidia-a100-40gb) |
G2 | NVIDIA L4 GPUs (nvidia-l4) | NVIDIA L4 Virtual Workstation GPUs (nvidia-l4-vws)
For more information, see Accelerator-optimized machine family.
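As a sketch, creating an instance with an accelerator-optimized machine type might look like the following gcloud command. The instance name, zone, and image are placeholder assumptions; the GPUs come attached automatically with the machine type, so no accelerator flag is needed:

```shell
# Hypothetical example: create an A2 Standard instance.
# The a2-highgpu-1g machine type includes one NVIDIA A100 40GB GPU,
# so Compute Engine attaches it automatically.
gcloud compute instances create example-a2-instance \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-1g \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```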
N1 general-purpose machine types
For most N1 machine types, except for the N1 shared-core machine types (f1-micro and g1-small), you can attach the following GPU models:
NVIDIA GPUs:
- NVIDIA T4: nvidia-tesla-t4
- NVIDIA P4: nvidia-tesla-p4
- NVIDIA P100: nvidia-tesla-p100
- NVIDIA V100: nvidia-tesla-v100
NVIDIA RTX Virtual Workstation (vWS) (formerly known as NVIDIA GRID):
- NVIDIA T4 Virtual Workstation: nvidia-tesla-t4-vws
- NVIDIA P4 Virtual Workstation: nvidia-tesla-p4-vws
- NVIDIA P100 Virtual Workstation: nvidia-tesla-p100-vws
For these virtual workstations, an NVIDIA RTX Virtual Workstation (vWS) license is automatically added to your instance.
For the N1 general-purpose family, you can use either predefined or custom machine types.
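The attachment step described above can be sketched with gcloud. The --accelerator flag specifies the GPU model and count, and instances with attached GPUs must be set to terminate on host maintenance. The instance name, zone, and image below are placeholder assumptions:

```shell
# Hypothetical example: attach one NVIDIA T4 GPU to an N1 instance
# at creation time. Name, zone, and image are placeholders.
gcloud compute instances create example-n1-gpu \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```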
GPUs on Spot VMs
You can add GPUs to your Spot VMs at lower spot prices for the GPUs. GPUs attached to Spot VMs work like normal GPUs but persist only for the life of the VM. Spot VMs with GPUs follow the same preemption process as all Spot VMs.
Consider requesting dedicated Preemptible GPU quota to use for GPUs on Spot VMs. For more information, see Quotas for Spot VMs.
During maintenance events, Spot VMs with GPUs are preempted by default and cannot be automatically restarted. If you want to recreate your VMs after they have been preempted, use a managed instance group. Managed instance groups recreate your VM instances if the vCPU, memory, and GPU resources are available.
If you want a warning before your VMs are preempted, or want to configure your VMs to automatically restart after a maintenance event, use standard VMs with a GPU. For standard VMs with GPUs, Compute Engine provides one hour of advance notice before a maintenance event.
Compute Engine doesn't charge you for GPUs if their VMs are preempted within the first minute after they start running.
To learn how to create Spot VMs with GPUs attached, read Create a VM with attached GPUs and Creating Spot VMs. For example, see Create an A3 Ultra or A4 instance using Spot VMs.
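The Spot VM creation described above might look like the following gcloud sketch. Because Spot VMs with GPUs can't restart automatically, this example sets the termination action to delete the instance on preemption; the name, zone, and image are placeholder assumptions:

```shell
# Hypothetical example: create a Spot VM with an attached GPU.
# --instance-termination-action=DELETE removes the instance when
# Compute Engine preempts it.
gcloud compute instances create example-spot-gpu \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --provisioning-model=SPOT \
    --instance-termination-action=DELETE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```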
GPUs on instances with predefined run times
Instances that use the standard provisioning model typically can't use preemptible allocation quotas. Preemptible quotas are for temporary workloads and are usually more available. If your project doesn't have preemptible quota, and you have never requested it, then all instances in your project consume standard allocation quotas.
If you request preemptible allocation quota, then instances that use the standard provisioning model must meet all of the following criteria to consume preemptible allocation quota:
- The instances have GPUs attached.
- The instances are configured to be automatically deleted after a predefined run time through the maxRunDuration or terminationTime field.
- The instances aren't allowed to consume reservations. For more information, see Prevent compute instances from consuming reservations.
When you consume preemptible allocation quota for time-bound GPU workloads, you benefit from both uninterrupted run time and the higher availability of preemptible allocation quota. For more information, see Preemptible quotas.
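A standard-provisioning instance that meets these criteria might be created as follows. This is a sketch with placeholder name, zone, image, and run duration; the --max-run-duration flag sets the predefined run time, and the termination action deletes the instance when the duration elapses:

```shell
# Hypothetical example: a standard-provisioning GPU instance with a
# predefined run time. The instance is deleted automatically after
# 8 hours.
gcloud compute instances create example-timebound-gpu \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --provisioning-model=STANDARD \
    --max-run-duration=8h \
    --instance-termination-action=DELETE \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```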
GPUs and Confidential VM
You can use a GPU with a Confidential VM instance that uses Intel TDX on A3 machine series. For more information, see Confidential VM supported configurations. To learn how to create a Confidential VM instance with GPUs, see Create a Confidential VM instance with GPU.
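As a rough sketch, such an instance might be created with the --confidential-compute-type flag. The machine type, zone, and image below are placeholder assumptions; check Confidential VM supported configurations for valid combinations:

```shell
# Hypothetical example: a Confidential VM instance with Intel TDX
# on an A3 machine type. Machine type, zone, and image are
# placeholders and must match a supported configuration.
gcloud compute instances create example-conf-gpu \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-1g \
    --confidential-compute-type=TDX \
    --maintenance-policy=TERMINATE \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud
```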
GPUs and block storage
When you create an instance by using a GPU machine type, you can add persistent or temporary block storage to the instance. To store non-transient data, use persistent block storage like Hyperdisk or Persistent Disk because these disks are independent of the instance's lifecycle. Data on persistent storage can be retained even after you delete the instance.
For temporary scratch storage or caches, use temporary block storage by adding Local SSD disks when you create the instance.
Persistent block storage with Persistent Disk and Hyperdisk volumes
You can attach Persistent Disk and select Hyperdisk volumes to GPU-enabled instances.
For machine learning (ML) and serving workloads, use Hyperdisk ML volumes, which offer high throughput and shorter data load times. Hyperdisk ML is a more cost-effective option for ML workloads because it offers lower GPU idle times.
Hyperdisk ML volumes provide read-only multi-attach support, so you can attach the same disk to multiple instances, giving each instance access to the same data.
For more information about the supported disk types for machine series that support GPUs, see the N1 and accelerator optimized machine series pages.
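The read-only multi-attach workflow described above can be sketched as follows. The disk name, size, zone, and instance names are placeholder assumptions:

```shell
# Hypothetical example: create a Hyperdisk ML volume, make it
# read-only, then attach it to two GPU instances.
gcloud compute disks create example-ml-data \
    --type=hyperdisk-ml \
    --size=500GB \
    --zone=us-central1-a

# Switch the volume to read-only so it can be multi-attached.
gcloud compute disks update example-ml-data \
    --zone=us-central1-a \
    --access-mode=READ_ONLY_MANY

# Attach the same volume to two instances in read-only mode.
gcloud compute instances attach-disk example-gpu-vm-1 \
    --zone=us-central1-a --disk=example-ml-data --mode=ro
gcloud compute instances attach-disk example-gpu-vm-2 \
    --zone=us-central1-a --disk=example-ml-data --mode=ro
```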
Local SSD disks
Local SSD disks provide fast, temporary storage for caching, data processing, or other transient data. Local SSD disks provide fast storage because they are physically attached to the server that hosts your instance. Local SSD disks provide temporary storage because the instance loses data if it restarts.
Avoid storing data with strong persistency requirements on Local SSD disks. To store non-transient data, use persistent storage instead.
If you manually stop an instance with a GPU, you can preserve the Local SSD data, with certain restrictions. See the Local SSD documentation for more details.
For regional support for Local SSD with GPU types, see Local SSD availability by GPU regions and zones.
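Adding Local SSD disks at creation time can be sketched as follows; each --local-ssd flag adds one disk. The instance name, zone, and image are placeholder assumptions:

```shell
# Hypothetical example: a GPU instance with two Local SSD disks for
# scratch storage, using the NVMe interface.
gcloud compute instances create example-gpu-localssd \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --local-ssd=interface=NVME \
    --local-ssd=interface=NVME \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```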
GPUs and host maintenance
Compute Engine always stops instances with attached GPUs when it performs maintenance events on the host server. If the instance has attached Local SSD disks, the instance loses the Local SSD data after it stops.
For information on handling maintenance events, see Handling GPU host maintenance events.
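One way to detect an upcoming maintenance event is to poll the instance metadata server from inside the instance, as sketched below; the maintenance-event value is NONE until an event is scheduled:

```shell
# Query the metadata server for a pending host maintenance event.
# Must run from inside the instance; returns NONE when no event
# is scheduled.
curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"
```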
GPU pricing
For instances that have GPUs attached, you incur costs as follows:
- If you request Compute Engine to provision GPUs using the spot, flex-start, or reservation-bound provisioning model, you get a discounted price, depending on the GPU type.
- Most instances that have GPUs attached receive sustained use discounts (SUDs), similar to vCPUs.
- When you select a GPU for a virtual workstation, Compute Engine automatically adds an NVIDIA RTX Virtual Workstation license to your instance.
For hourly and monthly pricing for GPUs, see the GPU pricing page.
Reserve GPUs with committed use discounts
To reserve GPU resources in a specific zone, see Choose a reservation type.
To receive committed use discounts for GPUs in a specific zone, you must purchase resource-based commitments for the GPUs and also attach reservations that specify matching GPUs to your commitments. For more information, see Attach reservations to resource-based commitments.
GPU restrictions and limitations
For instances with attached GPUs, the following restrictions and limitations apply:
Only accelerator-optimized (A4X, A4, A3, A2, and G2) and general-purpose N1 machine types support GPUs.
To protect Compute Engine systems and users, new projects have a global GPU quota that limits the total number of GPUs you can create in any supported zone. When you request a GPU quota, you must request a quota for the GPU models that you want to create in each region, and an additional global quota for the total number of GPUs of all types in all zones.
Instances with one or more GPUs have a maximum number of vCPUs for each GPU that you add to the instance. To see the available vCPU and memory ranges for different GPU configurations, see the GPUs list.
GPUs require device drivers to function properly. NVIDIA GPUs that run on Compute Engine must use a minimum driver version. For more information about driver versions, see Required NVIDIA driver versions.
The Compute Engine SLA covers instances with an attached GPU model only if that attached GPU model is generally available.
For regions that have multiple zones, the Compute Engine SLA covers the instance only if the GPU model is available in more than one zone within that region. For GPU models by region, see GPU regions and zones.
Compute Engine supports one concurrent user per GPU.
Also see the limitations for each machine type with attached GPUs.
What's next?
- Learn how to create instances with attached GPUs.
- Learn how to add or remove GPUs.
- Learn how to create a Confidential VM instance with an attached GPU.