A new class of Spark performance: Lightning Engine, our vectorized engine powering Spark on Google Cloud, delivers over 4.3x faster Spark performance.

Apache Spark on Google Cloud

The new way to Spark easier, smarter, and faster.

Run Apache Spark workloads on Google Cloud with less operational overhead, more AI-powered assistance, and better price-performance. Focus on your code, not your cluster.

Google Cloud ESG Report for Apache Spark

Google Cloud can deliver 18-60% cost savings versus other cloud-based Apache Spark alternatives

Get the ESG report

Benefits

A better experience for Apache Spark on Google Cloud

Easier - Eliminate the operational burden of Spark

Choose between zero-ops Google Cloud Serverless for Apache Spark or managed Dataproc clusters. Both automate away infrastructure complexity so you can accelerate your development life cycle.

Compare both options

Smarter - AI-assisted Spark development

Accelerate your entire workflow with Gemini in Dataproc and Google Cloud Serverless for Apache Spark. Get Gemini-powered assistance to generate and debug code and to troubleshoot failed jobs.

Learn about Gemini Code Assist

Faster - Accelerate Spark performance

Get industry-leading price-performance, automatically. For your most demanding jobs, unlock over 4.3x faster performance with Lightning Engine. This reduces total cost of ownership (TCO) and accelerates time-to-insight.

Explore Lightning Engine

Key features

Choose the right Spark for your workload

Select from Serverless for Apache Spark for zero-ops simplicity or Dataproc for managed clusters with deep customizations.

See the decision guide

Google Cloud Serverless for Apache Spark

Focus solely on your code and accelerate development. With tiers for both cost-effective batch processing and high-performance AI/ML, it’s ideal for new Apache Spark pipelines, interactive analysis, and workloads with unpredictable demand where a "NoOps" model is preferred.

Best for: Data scientists & ML engineers, ad-hoc queries, new applications, developer productivity.

Explore Serverless Spark

Dataproc

Get maximum control over your cluster environment. Perfect for migrating existing Apache Hadoop/Spark workloads, running long-lived persistent clusters, or using a diverse open source ecosystem.

Best for: Enterprise engineering and operations, on-prem migrations, long-running jobs, deep customization.

Explore Dataproc

Customers

Delivering proven business outcomes

Video

New Way Now: Dun & Bradstreet cuts data workflows to minutes; boosts product response times by 60%

2:46

Video

trivago unleashes the power of Spark in BigQuery

45:00

Partners

Recommended partners

Documentation

Tutorial

Run your first serverless Spark job

Follow this quickstart to experience the speed and simplicity of serverless Spark. Learn how to submit a PySpark batch job using the Google Cloud CLI.

Tutorial

Create a managed Dataproc cluster

This tutorial walks you through creating a Dataproc cluster using the Google Cloud console. Learn how to configure and provision a managed environment for your Spark and Hadoop workloads.

Best Practice

Unify your analytics: SQL and Spark on a single copy of data

Stop choosing between the power of SQL and the flexibility of Spark. BigLake lets you run both engines on the same governed data, so you can pick the best tool for every job.

Best Practice

Accelerate your entire AI and ML life cycle

Go from data preparation to model training and inference, faster. Our Premium tiers are designed for AI/ML, letting you use pre-configured ML Runtimes with built-in GPU support, such as NVIDIA RAPIDS, to eliminate complex setup.

What's new

Get the latest Spark on Google Cloud

Blog post

Connect Spark data pipelines to Gemini

Read the blog

Blog post

The Data Science Agent and Spark

Read the blog

Blog post

Dataproc multi-tenant clusters

Read the blog

Apache Spark is a trademark of The Apache Software Foundation.

** The queries are derived from the TPC-DS standard and TPC-H standard and as such are not comparable to published TPC-DS standard and TPC-H standard results, as these runs do not comply with all requirements of the TPC-DS standard and TPC-H standard specification.

Take the next step

Tell us what you’re solving for. A Google Cloud expert will help you find the best solution.

Start building
Try the interactive tutorial
Start using Google Cloud today
Get $300 in credits
Dive into the technical details
View documentation
