feat: Add K8s-bench presubmit workflow for task evaluation #326

prasad89 · 2025-06-09T07:42:03Z

This PR adds a K8s-bench presubmit GitHub Actions workflow that automatically runs evaluations when tasks are added or modified under k8s-bench/tasks.

prasad89 · 2025-06-09T07:42:55Z

cc: @noahlwest

noahlwest · 2025-06-09T19:05:59Z

.github/workflows/presubmit-evals.yaml

+  run-eval:
+    needs: detect-changed-tasks
+    if: needs.detect-changed-tasks.outputs.task_dirs != ''
+    runs-on: ubuntu-latest


Is it possible here to run on multiple envs, like ubuntu and windows? Not sure that this is something we want now, I'm just curious if possible and what it would look like.

prasad89 · Jun 10, 2025

Yes, it's possible to run on multiple environments, such as Linux, Windows, and macOS.
However, my understanding is that the goal here is only to run the modified tasks as a presubmit job.
The setup can already run multiple jobs, depending on what has been modified.
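
For reference, a multi-OS variant could look roughly like the sketch below. It is not part of this PR: the `os` matrix key and the placeholder step are assumptions made for illustration, and only the job name `run-eval` comes from the diff above.

```yaml
# Hypothetical sketch, not the workflow from this PR.
name: presubmit-evals-multi-os
on: pull_request

jobs:
  run-eval:
    strategy:
      fail-fast: false            # let the other OS legs finish if one fails
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}     # one job per entry in matrix.os
    steps:
      - uses: actions/checkout@v4
      - name: Show which runner this leg uses
        run: echo "Running on ${{ runner.os }}"
```

One caveat: the evaluation needs a Kind cluster, which in turn needs a container runtime that can run Linux containers. That may not be available out of the box on the hosted Windows and macOS images, so it would need checking before enabling those legs.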

@droot can confirm

noahlwest

LGTM

Copilot

Pull Request Overview

This PR adds a GitHub Actions presubmit workflow to automatically detect changed K8s-bench tasks and run evaluations on them.

Introduce detect-changed-tasks job to list modified task directories.
Use a matrix in run-eval job to spin up a Kind cluster and execute per-task evaluations.
Append evaluation results to the GitHub Actions step summary.
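
The flow described in the overview can be sketched as follows. This is a minimal illustration, not the file under review: the step ids, the diff range, and the summary text are assumptions. Only the job names, the `task_dirs` output, and the `k8s-bench/tasks` path come from the PR.

```yaml
# Sketch only; step ids, diff range, and summary text are assumptions.
name: presubmit-evals-sketch
on:
  pull_request:
    paths:
      - 'k8s-bench/tasks/**'

jobs:
  detect-changed-tasks:
    runs-on: ubuntu-latest
    outputs:
      task_dirs: ${{ steps.changed.outputs.task_dirs }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0          # full history so the base branch can be diffed
      - id: changed
        run: |
          # Unique task directories touched relative to the PR base branch,
          # joined into a comma-separated list (empty if nothing changed).
          dirs=$(git diff --name-only "origin/${{ github.base_ref }}...HEAD" -- k8s-bench/tasks \
            | cut -d/ -f1-3 | sort -u | paste -sd, -)
          echo "task_dirs=$dirs" >> "$GITHUB_OUTPUT"

  run-eval:
    needs: detect-changed-tasks
    if: needs.detect-changed-tasks.outputs.task_dirs != ''
    runs-on: ubuntu-latest
    steps:
      - name: Append to the step summary
        run: |
          {
            echo "### K8s-bench presubmit"
            echo "Changed tasks: ${{ needs.detect-changed-tasks.outputs.task_dirs }}"
          } >> "$GITHUB_STEP_SUMMARY"
```

Anything written to the file named by `$GITHUB_STEP_SUMMARY` is rendered as Markdown on the workflow run page, which is how results can surface without opening the logs.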

Comments suppressed due to low confidence (2)

.github/workflows/presubmit-evals.yaml:47

The matrix construction wraps the entire comma-separated list in a single JSON string, resulting in one matrix entry instead of one per task. Consider splitting the output into individual JSON elements, for example:

matrix:
  task: ${{ fromJson('[' + needs.detect-changed-tasks.outputs.task_dirs.replaceAll(',', '\\",\\"') + ']') }}

task: ${{ fromJson('["' + needs.detect-changed-tasks.outputs.task_dirs + '"]') }}
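
A note on the suggestion above: GitHub Actions expressions do not provide a `+` operator or method calls such as `.replaceAll`, so neither form is likely to evaluate as written. One alternative is to have the detect job emit a JSON array and pass it straight to `fromJSON`. The sketch below assumes `jq` is available on the runner and uses placeholder task names (`task-a`, `task-b`); it is not the code that was merged.

```yaml
# Sketch under stated assumptions; task names and dirs.txt are placeholders.
name: presubmit-evals-matrix-sketch
on: pull_request

jobs:
  detect-changed-tasks:
    runs-on: ubuntu-latest
    outputs:
      task_dirs: ${{ steps.changed.outputs.task_dirs }}
    steps:
      - id: changed
        run: |
          # Placeholder input; a real step would compute this from the diff.
          printf '%s\n' k8s-bench/tasks/task-a k8s-bench/tasks/task-b > dirs.txt
          # Encode the newline-separated list as a compact JSON array.
          echo "task_dirs=$(jq -R -s -c 'split("\n") | map(select(length > 0))' dirs.txt)" >> "$GITHUB_OUTPUT"

  run-evals:
    needs: detect-changed-tasks
    # An empty result is the literal string '[]', not an empty string.
    if: needs.detect-changed-tasks.outputs.task_dirs != '[]'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        task: ${{ fromJSON(needs.detect-changed-tasks.outputs.task_dirs) }}
    steps:
      - run: echo "Evaluating ${{ matrix.task }}"
```

With this shape each array element becomes its own matrix entry, so two changed tasks produce two `run-evals` jobs.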

.github/workflows/presubmit-evals.yaml:36

[nitpick] The job is named run-eval but processes multiple tasks in parallel; consider renaming it to run-evals for clarity and consistency.

run-eval:

prasad89 · 2025-06-19T16:50:04Z

Please let me know if there's anything else needed from my side to get this merged.
/cc @droot @noahlwest

droot · 2025-06-20T16:56:20Z

> Please let me know if there's anything else needed from my side to get this merged. /cc @droot @noahlwest

I don't think anything is needed from your side, we just need a way to test this out. We will look into it next week. Thank you.

feat: Add K8s-bench presubmit workflow for task evaluation

9e18180

prasad89 mentioned this pull request Jun 9, 2025

feat: Add K8s-bench presubmit workflow for task evaluation #318

Closed

noahlwest reviewed Jun 9, 2025

noahlwest approved these changes Jun 9, 2025

mikebz requested a review from Copilot June 14, 2025 02:45

Copilot AI reviewed Jun 14, 2025

prasad89 force-pushed the presubmit-evals branch from e611e45 to 9e18180 Compare June 14, 2025 02:58

fix: Correct job name and update result analysis step in presubmit workflow

dd078ae
