This document describes how to view and manage Identity and Access Management (IAM) service account roles. A Serverless for Apache Spark batch workload or interactive session runs as the Compute Engine default service account, unless you specify a custom service account when you submit a batch workload, create a session, or create a session runtime template.
Security requirement: You must have the service account ActAs permission to run Serverless for Apache Spark workloads or sessions. The Service Account User role contains this permission. For detailed information about service account permissions, see Roles for service account authentication.
Required Dataproc Worker role
The Serverless for Apache Spark workload or session service account must have the IAM Dataproc Worker role. The Compute Engine default service account, project_number-compute@developer.gserviceaccount.com, which Serverless for Apache Spark uses by default, has this role. If you specify a different service account when creating a batch workload, session, or session template, you must grant the Dataproc Worker role to that service account.
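For example, the following gcloud CLI command grants the Dataproc Worker role at the project level; PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders for your project ID and the service account's email address:
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:SERVICE_ACCOUNT_EMAIL \
    --role="roles/dataproc.worker"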
Additional roles may be necessary for other operations, such as reading data from and writing data to Cloud Storage or BigQuery.
In some projects, the batch workload or session service account may have been automatically granted the project Editor role, which includes the Dataproc Worker role permissions plus additional permissions that Serverless for Apache Spark doesn't need. To follow the security best practice of least privilege, replace the service account's Editor role with the Dataproc Worker role.
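For example, after you grant the Dataproc Worker role as shown in the earlier command, you can remove the Editor role binding with the gcloud CLI; PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders:
gcloud projects remove-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:SERVICE_ACCOUNT_EMAIL \
    --role="roles/editor"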
Troubleshoot permission-based failures
Incorrect or insufficient permissions for the service account used by your Serverless for Apache Spark batch workload or session can lead to batch or session creation failures that report a "Driver compute node failed to initialize for batch in 600 seconds" error message. This error indicates that the Spark driver couldn't start within the allotted timeout period, often due to a lack of necessary access to Google Cloud resources.
To troubleshoot this issue, verify that your service account has the following minimum roles or permissions:
- Dataproc Worker role (roles/dataproc.worker): This role grants the permissions that Serverless for Apache Spark needs to manage and execute Spark workloads and sessions.
- Storage Object Viewer (roles/storage.objectViewer), Storage Object Creator (roles/storage.objectCreator), or Storage Object Admin (roles/storage.objectAdmin): If your Spark application reads from or writes to Cloud Storage buckets, the service account needs the appropriate permissions to access the buckets. For example, if your input data is in a Cloud Storage bucket, Storage Object Viewer is required. If your application writes output to a Cloud Storage bucket, Storage Object Creator or Storage Object Admin is needed.
- BigQuery Data Editor (roles/bigquery.dataEditor) or BigQuery Data Viewer (roles/bigquery.dataViewer): If your Spark application interacts with BigQuery, verify that the service account has the appropriate BigQuery roles.
- Cloud Logging permissions: The service account needs permission to write logs to Cloud Logging for effective debugging. Typically, the Logs Writer role (roles/logging.logWriter) is sufficient.
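To check which of these roles are already granted to the service account, you can filter the project's IAM policy with the gcloud CLI. The following command is a minimal example; PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders for your values:
gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --format="table(bindings.role)"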
Common permission or access-related failures
- Missing dataproc.worker role: Without this core role, the Serverless for Apache Spark infrastructure cannot properly provision and manage the driver node.
- Insufficient Cloud Storage permissions: If your Spark application attempts to read input data from or write output to a Cloud Storage bucket without the necessary service account permissions, the driver can fail to initialize because it lacks access to critical resources (see the example grant after this list).
- Network or firewall issues: VPC Service Controls or firewall rules can inadvertently block service account access to Google Cloud APIs or resources.
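As an illustration of the Cloud Storage case, the following gcloud CLI command grants read access on a single bucket rather than project-wide; BUCKET_NAME and SERVICE_ACCOUNT_EMAIL are placeholders:
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member=serviceAccount:SERVICE_ACCOUNT_EMAIL \
    --role="roles/storage.objectViewer"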
To verify and update service account permissions:
- Go to the IAM & Admin > IAM page in the Google Cloud console.
- Locate the service account used for your Serverless for Apache Spark batch workloads or sessions.
- Verify that the necessary roles are assigned. If they aren't, add them.
For a list of Serverless for Apache Spark roles and permissions, see Serverless for Apache Spark permissions and IAM roles.
View and manage IAM service account roles
To view and manage roles granted to the Serverless for Apache Spark batch workload or session service account, do the following:
- In the Google Cloud console, go to the IAM page.
- Click Include Google-provided role grants.
- View the roles listed for the batch workload or session service account. The following image shows the required Dataproc Worker role listed for the Compute Engine default service account, project_number-compute@developer.gserviceaccount.com, which Serverless for Apache Spark uses by default as the workload or session service account.
[Image: The Dataproc Worker role assigned to the Compute Engine default service account on the IAM page of the Google Cloud console.]
You can click the pencil icon displayed on the service account row to grant or remove service account roles.
Cross-project service account
You can submit a Serverless for Apache Spark batch workload that uses a service account from a project that is different from the batch workload project (the project where the batch is submitted). In this section, the project where the service account is located is called the service account project, and the project where the batch is submitted is called the batch project.
Why use a cross-project service account to run a batch workload? One possible reason is that the service account in the other project has been assigned IAM roles that provide fine-grained access to the resources in that project.
Setup steps
In the service account project:
Enable the Dataproc API.
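For example, you can enable the API with the gcloud CLI; SERVICE_ACCOUNT_PROJECT_ID is a placeholder for the service account project ID:
gcloud services enable dataproc.googleapis.com \
    --project=SERVICE_ACCOUNT_PROJECT_ID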
Grant your user account (the user who submits the batch workload) the Service Account User role on either the service account project or, for more granular control, on the service account in the service account project.
For more information, see Manage access to projects, folders, and organizations to grant roles at the project level, and Manage access to service accounts to grant roles at the service account level.
gcloud CLI examples:
The following sample command grants the user the Service Account User role at the project level:
gcloud projects add-iam-policy-binding SERVICE_ACCOUNT_PROJECT_ID \
    --member=USER_EMAIL \
    --role="roles/iam.serviceAccountUser"
Notes:
USER_EMAIL: Provide your user account email address in the format user:user-name@example.com.
The following sample command grants the user the Service Account User role at the service account level:
gcloud iam service-accounts add-iam-policy-binding VM_SERVICE_ACCOUNT_EMAIL \
    --member=USER_EMAIL \
    --role="roles/iam.serviceAccountUser"
Notes:
USER_EMAIL: Provide your user account email address in the format user:user-name@example.com.
Grant the service account the Dataproc Worker role on the batch project.
gcloud CLI example:
gcloud projects add-iam-policy-binding BATCH_PROJECT_ID \
    --member=serviceAccount:SERVICE_ACCOUNT_NAME@SERVICE_ACCOUNT_PROJECT_ID.iam.gserviceaccount.com \
    --role="roles/dataproc.worker"
In the batch project:
Grant the Dataproc service agent service account the Service Account User and the Service Account Token Creator roles on either the service account project or, for more granular control, the service account in the service account project. By doing this, you allow the Dataproc service agent service account in the batch project to create tokens for the service account in the service account project.
For more information, see Manage access to projects, folders, and organizations to grant roles at the project level, and Manage access to service accounts to grant roles at the service account level.
gcloud CLI examples:
The following commands grant the Dataproc service agent service account in the batch project the Service Account User and Service Account Token Creator roles at the project level:
gcloud projects add-iam-policy-binding SERVICE_ACCOUNT_PROJECT_ID \
    --member=serviceAccount:service-BATCH_PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
    --role="roles/iam.serviceAccountUser"
gcloud projects add-iam-policy-binding SERVICE_ACCOUNT_PROJECT_ID \
    --member=serviceAccount:service-BATCH_PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
    --role="roles/iam.serviceAccountTokenCreator"
The following sample commands grant the Dataproc service agent service account in the batch project the Service Account User and Service Account Token Creator roles at the service account level:
gcloud iam service-accounts add-iam-policy-binding VM_SERVICE_ACCOUNT_EMAIL \
    --member=serviceAccount:service-BATCH_PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
    --role="roles/iam.serviceAccountUser"
gcloud iam service-accounts add-iam-policy-binding VM_SERVICE_ACCOUNT_EMAIL \
    --member=serviceAccount:service-BATCH_PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
    --role="roles/iam.serviceAccountTokenCreator"
Grant the Compute Engine Service Agent service account in the batch project the Service Account Token Creator role on either the service account project or, for more granular control, on the service account in the service account project. By doing this, you grant the Compute Engine Service Agent service account in the batch project the ability to create tokens for the service account in the service account project.
For more information, see Manage access to projects, folders, and organizations to grant roles at the project level, and Manage access to service accounts to grant roles at the service account level.
gcloud CLI examples:
The following sample command grants the Compute Engine Service Agent service account in the batch project the Service Account Token Creator role at the project level:
gcloud projects add-iam-policy-binding SERVICE_ACCOUNT_PROJECT_ID \
    --member=serviceAccount:service-BATCH_PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
    --role="roles/iam.serviceAccountTokenCreator"
The following sample command grants the Compute Engine Service Agent service account in the batch project the Service Account Token Creator role at the service account level:
gcloud iam service-accounts add-iam-policy-binding VM_SERVICE_ACCOUNT_EMAIL \
    --member=serviceAccount:service-BATCH_PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
    --role="roles/iam.serviceAccountTokenCreator"
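To confirm the grants at the service account level, you can inspect the service account's IAM policy; VM_SERVICE_ACCOUNT_EMAIL is the same placeholder used in the preceding commands:
gcloud iam service-accounts get-iam-policy VM_SERVICE_ACCOUNT_EMAIL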
Submit the batch workload
After you complete the setup steps, you can submit a batch workload. Make sure to specify the service account in the service account project as the service account to use for the batch workload.
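The following gcloud CLI command is a minimal sketch of such a submission, assuming a PySpark file stored in Cloud Storage; the bucket path, region, and placeholder names are illustrative:
gcloud dataproc batches submit pyspark gs://BUCKET_NAME/my_script.py \
    --project=BATCH_PROJECT_ID \
    --region=REGION \
    --service-account=SERVICE_ACCOUNT_NAME@SERVICE_ACCOUNT_PROJECT_ID.iam.gserviceaccount.com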