Apache Spark on Google Cloud

The new way to Spark easier, smarter, and faster.

Run Apache Spark workloads on Google Cloud with less operational overhead, more AI-powered assistance, and better price-performance. Focus on your code, not your cluster.


Benefits

A better experience for Apache Spark on Google Cloud

Easier - Eliminate the operational burden of Spark

Choose between zero-ops Google Cloud Serverless for Apache Spark and managed Dataproc clusters. Both automate away infrastructure complexity so you can accelerate your development life cycle.

Compare both options

Smarter - AI-assisted Spark development

Accelerate your entire workflow with Gemini in Dataproc and Google Cloud Serverless for Apache Spark. Get Gemini-powered assistance to generate and debug code, and troubleshoot failed jobs. 

Learn about Gemini Code Assist

Faster - Accelerate Spark performance

Get industry-leading price-performance automatically. For your most demanding jobs, unlock over 4.3x faster performance with Lightning Engine**, which lowers total cost of ownership (TCO) and accelerates time-to-insight.

Explore Lightning Engine

Key features

Choose the right Spark for your workload

Choose Serverless for Apache Spark for zero-ops simplicity, or Dataproc for managed clusters with deep customization.

See the decision guide

Google Cloud Serverless for Apache Spark

Focus solely on your code and accelerate development. With tiers for both cost-effective batch processing and high-performance AI/ML, it’s ideal for new Apache Spark pipelines, interactive analysis, and workloads with unpredictable demand where a "NoOps" model is preferred.

Best for: Data scientists & ML engineers, ad-hoc queries, new applications, developer productivity.

Explore Serverless Spark

Dataproc

Get maximum control over your cluster environment. Perfect for migrating existing Apache Hadoop/Spark workloads, running long-lived persistent clusters, or working with a diverse open source ecosystem.

Best for: Enterprise engineering and operations, on-prem migrations, long-running jobs, deep customization.

Explore Dataproc
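The two "Best for" lists above can be read as a rough rule of thumb. The toy function below encodes only the criteria stated on this page; the `Workload` fields and `suggest_offering` are hypothetical names, and this sketch is not a substitute for the official decision guide.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a Spark workload's needs."""
    needs_persistent_cluster: bool = False
    needs_deep_customization: bool = False
    migrating_from_hadoop: bool = False


def suggest_offering(w: Workload) -> str:
    """Map the 'Best for' criteria on this page to one of the two offerings."""
    # Signals this page associates with Dataproc: long-lived clusters,
    # deep customization, and on-prem Hadoop/Spark migrations.
    if w.needs_persistent_cluster or w.needs_deep_customization or w.migrating_from_hadoop:
        return "Dataproc"
    # Otherwise default to the zero-ops option, suited to new pipelines,
    # ad-hoc queries, and unpredictable demand.
    return "Serverless for Apache Spark"
```

Real decisions also depend on cost, networking, and team constraints that this page does not cover.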

Documentation

Tutorial

Run your first serverless Spark job

Follow this quickstart to experience the speed and simplicity of serverless Spark. Learn how to submit a PySpark batch job using the Google Cloud CLI.
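As a rough sketch of what the quickstart involves, the helper below assembles the argument list for `gcloud dataproc batches submit pyspark` so it can be run from Python with `subprocess`. The script location, region, and batch ID are hypothetical placeholders, and `build_batch_submit_cmd` is an illustrative name; check the quickstart for the flags your project actually needs.

```python
import subprocess
from typing import Optional, Sequence


def build_batch_submit_cmd(
    py_file: str,
    region: str,
    batch_id: Optional[str] = None,
    deps_bucket: Optional[str] = None,
    job_args: Sequence[str] = (),
) -> list[str]:
    """Build argv for submitting a PySpark batch to Serverless for Apache Spark."""
    cmd = ["gcloud", "dataproc", "batches", "submit", "pyspark", py_file,
           f"--region={region}"]
    if batch_id:
        cmd.append(f"--batch={batch_id}")
    if deps_bucket:
        # Cloud Storage bucket used to stage local files such as the script.
        cmd.append(f"--deps-bucket={deps_bucket}")
    if job_args:
        # Everything after "--" is passed to the PySpark script itself.
        cmd += ["--", *job_args]
    return cmd


def submit(cmd: list[str]) -> None:
    # Requires the Google Cloud CLI to be installed and authenticated.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_batch_submit_cmd(
        "gs://my-bucket/wordcount.py",  # hypothetical script location
        region="us-central1",
        batch_id="wordcount-001",
    )
    print(" ".join(cmd))
```

Keeping the command as a list, rather than a single shell string, avoids quoting problems when it is passed to `subprocess.run`.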

Tutorial

Create a managed Dataproc cluster

This tutorial walks you through creating a Dataproc cluster using the Google Cloud console. Learn how to configure and provision a managed environment for your Spark and Hadoop workloads.
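The tutorial uses the console, but a similar cluster can also be described in code. The sketch below builds a request in the shape used by the `google-cloud-dataproc` Python client (`dataproc_v1.ClusterControllerClient.create_cluster`). The project, region, cluster name, machine type, and worker count are hypothetical defaults, not recommendations.

```python
from typing import Any


def build_cluster_config(
    project_id: str,
    cluster_name: str,
    num_workers: int = 2,
    machine_type: str = "n1-standard-2",  # hypothetical default
) -> dict[str, Any]:
    """Return a Cluster resource as a dict: one master plus num_workers workers."""
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": machine_type},
            "worker_config": {"num_instances": num_workers, "machine_type_uri": machine_type},
        },
    }


def create_cluster(project_id: str, region: str, cluster: dict[str, Any]) -> Any:
    """Create the cluster and block until the long-running operation finishes."""
    # Imported lazily so the config builder works without the client installed.
    from google.cloud import dataproc_v1

    client = dataproc_v1.ClusterControllerClient(
        # Dataproc is served from regional endpoints.
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    return operation.result()
```

Separating the config from the API call makes it easy to review or version the cluster definition before anything is provisioned.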

Best Practice

Unify your analytics: SQL and Spark on a single copy of data

Stop choosing between the power of SQL and the flexibility of Spark. BigLake lets you use both engines on the same governed data, giving you a unified experience where you can pick the best tool for every job.

Best Practice

Accelerate your entire AI and ML life cycle

Go from data preparation to model training and inference, faster. Our Premium tiers are designed for AI/ML: pre-configured ML Runtimes with built-in GPU support and libraries such as NVIDIA RAPIDS eliminate complex setup.
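Premium tiers and GPUs are typically requested through Spark properties on a batch submission. The sketch below formats a properties dict into a single gcloud `--properties` flag. The property names in `GPU_PREMIUM_PROPS` are assumptions not confirmed by this page; verify them against the Serverless for Apache Spark GPU documentation before use.

```python
# Assumed property names (not confirmed by this page): premium compute and disk
# tiers plus an L4 GPU accelerator for executors. Verify before relying on them.
GPU_PREMIUM_PROPS: dict[str, str] = {
    "spark.dataproc.driver.compute.tier": "premium",
    "spark.dataproc.executor.compute.tier": "premium",
    "spark.dataproc.executor.disk.tier": "premium",
    "spark.dataproc.executor.resource.accelerator.type": "l4",
}


def format_properties_flag(props: dict[str, str]) -> str:
    """Render Spark properties as one gcloud --properties=KEY=VALUE,... flag."""
    for key, value in props.items():
        if "," in key or "," in value:
            # gcloud splits this flag on commas; values containing commas need
            # its alternate-delimiter escaping, which this sketch does not handle.
            raise ValueError(f"comma in property {key!r}")
    return "--properties=" + ",".join(f"{k}={v}" for k, v in props.items())


if __name__ == "__main__":
    print(format_properties_flag(GPU_PREMIUM_PROPS))
```

The resulting flag would be appended to a `gcloud dataproc batches submit pyspark` command.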



Apache Spark is a trademark of The Apache Software Foundation.

** The queries are derived from the TPC-DS and TPC-H standards and, as such, are not comparable to published TPC-DS and TPC-H results, because these runs do not comply with all requirements of the TPC-DS and TPC-H specifications.

