这是indexloc提供的服务,不要输入任何密码
Skip to content

Add support for GPU redundancy to Cloud Run v2 job #14519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 18, 2025

Conversation

yanweiguo
Copy link
Member

Adds support for GPU redundancy to Cloud Run v2 job.

Release Note Template for Downstream PRs (will be copied)

See Write release notes for guidance.

cloudrunv2: added `gpu_zonal_redundancy_disabled` field to `google_cloud_run_v2_job` resource.

@github-actions github-actions bot requested a review from c2thorn July 14, 2025 20:10
Copy link

github-actions bot commented Jul 14, 2025

Hello! I am a robot. Tests will require approval from a repository maintainer to run.

Googlers: For automatic test runs see go/terraform-auto-test-runs.

@ScottSuarez, a repository maintainer, has been assigned to review your changes. If you have not received review feedback within 2 business days, please leave a comment on this PR asking them to take a look.

You can help make sure that review is quick by doing a self-review and by running impacted tests locally.

@modular-magician
Copy link
Collaborator

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff ( 5 files changed, 31 insertions(+))
google-beta provider: Diff ( 5 files changed, 31 insertions(+))
terraform-google-conversion: Diff ( 1 file changed, 11 insertions(+))
Open in Cloud Shell: Diff ( 1 file changed, 1 insertion(+))

@modular-magician modular-magician requested a review from ian-mi July 14, 2025 20:23
@modular-magician
Copy link
Collaborator

Tests analytics

Total tests: 81
Passed tests: 71
Skipped tests: 9
Affected tests: 1

Click here to see the affected service packages
  • cloudrunv2
#### Action taken
Found 1 affected test(s) by replaying old test recordings. Starting RECORDING based on the most recent commit. Click here to see the affected tests
  • TestAccCloudRunV2Job_cloudrunv2JobGpuExample

Get to know how VCR tests work

@modular-magician
Copy link
Collaborator

🟢 Tests passed during RECORDING mode:
TestAccCloudRunV2Job_cloudrunv2JobGpuExample [Debug log]

🟢 No issues found for passed tests after REPLAYING rerun.


🟢 All tests passed!

View the build log or the debug log for each test

@melinath
Copy link
Member

@modular-magician reassign-reviewer melinath

Reassigning since @c2thorn is OOO

Copy link
Member

@melinath melinath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: it won't be possible for users to unset this field once set, since the provider can't generally tell the difference between false and unset. https://googlecloudplatform.github.io/magic-modules/develop/diffs/#default has some more information about this.

Just to double-check, would it be possible to not use default_from_api? That is, any chance this isn't a complex default?

@github-actions github-actions bot requested review from ScottSuarez and removed request for c2thorn July 16, 2025 22:03
@yanweiguo
Copy link
Member Author

Just to double-check, would it be possible to not use default_from_api? That is, any chance this isn't a complex default?

There's a default value currently set from API side. We are going to remove that default value. In order to do that, we need to support TF. In that case, is it ok to not use default_from_api at this point? Or is it better to implement it with default_from_api now and remove it later when we remove the default value from API side?

@github-actions github-actions bot requested a review from melinath July 16, 2025 22:24
Copy link
Member

@melinath melinath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_from_api doesn't work well with boolean fields because it's impossible to set the value to false once it's been set to true one time.

In general, default_value is preferred. However, that can also cause problems if the API has a different default than what's in Terraform, or if the API has a complex default (different values depending on specific circumstances) when no value is set.

If "no default" means that the field will default to false then you don't need default_from_api or default_value - the default default is false. You can just make the field optional.

Relatedly: please make sure there's an update test that tries to set this field from true to false/unset so that we can make sure that it behaves properly.

@github-actions github-actions bot requested a review from melinath July 16, 2025 23:37
@yanweiguo
Copy link
Member Author

If "no default" means that the field will default to false

At this moment, it defaults to true explicitly. It would default to false after TF support is added. I removed default_from_api .

please make sure there's an update test that tries to set this field from true to false/unset

At this moment, API would throw an error is this field is set to false or unset. It is planned to be allowed in the future. A test added now would be broken when API changes the behavior in the future. Also to use false or unset value, the test projects should have zonal redundancy GPU quota, which requires quota increase request for each test project. So I didn't add a test for this.

@modular-magician
Copy link
Collaborator

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff ( 5 files changed, 30 insertions(+))
google-beta provider: Diff ( 5 files changed, 30 insertions(+))
terraform-google-conversion: Diff ( 1 file changed, 11 insertions(+))
Open in Cloud Shell: Diff ( 1 file changed, 1 insertion(+))

@modular-magician
Copy link
Collaborator

Tests analytics

Total tests: 81
Passed tests: 72
Skipped tests: 9
Affected tests: 0

Click here to see the affected service packages
  • cloudrunv2
🟢 All tests passed!

View the build log

@ScottSuarez ScottSuarez removed their request for review July 17, 2025 17:19
melinath added a commit that referenced this pull request Jul 18, 2025
Ran into a problem where this didn't work properly without an @ symbol even though the @ symbol is optional: #14519 (comment) We should just make the `-er` optional instead of allowing arbitrary content there.
@github-actions github-actions bot requested a review from melinath July 18, 2025 17:12
@yanweiguo
Copy link
Member Author

send_empty_value: true

Done

@github-actions github-actions bot requested a review from melinath July 18, 2025 17:14
@modular-magician
Copy link
Collaborator

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff ( 5 files changed, 31 insertions(+))
google-beta provider: Diff ( 5 files changed, 31 insertions(+))
terraform-google-conversion: Diff ( 1 file changed, 11 insertions(+))
Open in Cloud Shell: Diff ( 1 file changed, 1 insertion(+))

Copy link
Member

@melinath melinath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you need to add the new field to https://github.com/GoogleCloudPlatform/magic-modules/blob/main/mmv1/third_party/tgc/tests/data/example_cloud_run_v2_job.json as well (right after maxRetries should be fine) - false should be fine as a value even though the API won't allow it currently.

@modular-magician
Copy link
Collaborator

Tests analytics

Total tests: 81
Passed tests: 50
Skipped tests: 9
Affected tests: 22

Click here to see the affected service packages
  • cloudrunv2
#### Action taken
Found 22 affected test(s) by replaying old test recordings. Starting RECORDING based on the most recent commit. Click here to see the affected tests
  • TestAccCloudRunV2JobIamBindingGenerated
  • TestAccCloudRunV2JobIamMemberGenerated
  • TestAccCloudRunV2JobIamPolicyGenerated
  • TestAccCloudRunV2Job_cloudrunv2JobBasicExample
  • TestAccCloudRunV2Job_cloudrunv2JobDependsOnUpdate
  • TestAccCloudRunV2Job_cloudrunv2JobDirectvpcExample
  • TestAccCloudRunV2Job_cloudrunv2JobEmptydirExample
  • TestAccCloudRunV2Job_cloudrunv2JobFullUpdate
  • TestAccCloudRunV2Job_cloudrunv2JobGRPCProbesUpdate
  • TestAccCloudRunV2Job_cloudrunv2JobHTTPProbesUpdate
  • TestAccCloudRunV2Job_cloudrunv2JobLimitsExample
  • TestAccCloudRunV2Job_cloudrunv2JobMulticontainerExample
  • TestAccCloudRunV2Job_cloudrunv2JobRunJobExample
  • TestAccCloudRunV2Job_cloudrunv2JobSecretExample
  • TestAccCloudRunV2Job_cloudrunv2JobSqlExample
  • TestAccCloudRunV2Job_cloudrunv2JobTCPProbesUpdate
  • TestAccCloudRunV2Job_cloudrunv2JobVpcaccessExample
  • TestAccCloudRunV2Job_cloudrunv2JobWithDirectVPCUpdate
  • TestAccCloudRunV2Job_cloudrunv2JobWithRunExecutionTokenUpdate
  • TestAccCloudRunV2Job_cloudrunv2JobWithStartExecutionTokenUpdate
  • TestAccDataSourceGoogleCloudRunV2Job_basic
  • TestAccDataSourceGoogleCloudRunV2Job_bindIAMPermission

Get to know how VCR tests work

@modular-magician
Copy link
Collaborator

🟢 Tests passed during RECORDING mode:
TestAccCloudRunV2JobIamBindingGenerated [Debug log]
TestAccCloudRunV2JobIamMemberGenerated [Debug log]
TestAccCloudRunV2JobIamPolicyGenerated [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobBasicExample [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobDependsOnUpdate [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobDirectvpcExample [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobEmptydirExample [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobFullUpdate [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobGRPCProbesUpdate [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobHTTPProbesUpdate [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobLimitsExample [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobMulticontainerExample [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobRunJobExample [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobSecretExample [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobSqlExample [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobTCPProbesUpdate [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobVpcaccessExample [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobWithDirectVPCUpdate [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobWithRunExecutionTokenUpdate [Debug log]
TestAccCloudRunV2Job_cloudrunv2JobWithStartExecutionTokenUpdate [Debug log]
TestAccDataSourceGoogleCloudRunV2Job_basic [Debug log]
TestAccDataSourceGoogleCloudRunV2Job_bindIAMPermission [Debug log]

🟢 No issues found for passed tests after REPLAYING rerun.


🟢 All tests passed!

View the build log or the debug log for each test

@yanweiguo
Copy link
Member Author

looks like you need to add the new field to https://github.com/GoogleCloudPlatform/magic-modules/blob/main/mmv1/third_party/tgc/tests/data/example_cloud_run_v2_job.json as well (right after maxRetries should be fine) - false should be fine as a value even though the API won't allow it currently.

What does https://github.com/GoogleCloudPlatform/magic-modules/blob/main/mmv1/third_party/tgc/tests/data/example_cloud_run_v2_job.json do? Does it to be a valid example? The field is not allowed if GPU resource is not configured. Wondering whether I need to configure this field as well as the GPU resource.

@github-actions github-actions bot requested a review from melinath July 18, 2025 18:34
Copy link
Member

@melinath melinath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for testing our ability to convert from Terraform -> CAI. It doesn't necessarily need to be something that is "valid". We're also working on moving to a new system for running these tests that will automatically pull CAI assets based on our nightly test runs and compare it to the test configs that created it, so this file will eventually go away. I think just making the easy change should be fine.

attn @zli82016 in case you have a different take.

@github-actions github-actions bot requested a review from melinath July 18, 2025 19:42
@modular-magician
Copy link
Collaborator

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff ( 5 files changed, 31 insertions(+))
google-beta provider: Diff ( 5 files changed, 31 insertions(+))
terraform-google-conversion: Diff ( 2 files changed, 13 insertions(+), 1 deletion(-))
Open in Cloud Shell: Diff ( 1 file changed, 1 insertion(+))

@modular-magician
Copy link
Collaborator

Tests analytics

Total tests: 81
Passed tests: 72
Skipped tests: 9
Affected tests: 0

Click here to see the affected service packages
  • cloudrunv2
🟢 All tests passed!

View the build log

@modular-magician
Copy link
Collaborator

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff ( 5 files changed, 31 insertions(+))
google-beta provider: Diff ( 5 files changed, 31 insertions(+))
terraform-google-conversion: Diff ( 2 files changed, 13 insertions(+), 1 deletion(-))
Open in Cloud Shell: Diff ( 1 file changed, 1 insertion(+))

@modular-magician
Copy link
Collaborator

Tests analytics

Total tests: 81
Passed tests: 72
Skipped tests: 9
Affected tests: 0

Click here to see the affected service packages
  • cloudrunv2
🟢 All tests passed!

View the build log

@melinath
Copy link
Member

Merging since @paridhishah18 approved an earlier version and it hasn't changed substantially since then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants