Manage multi-tool scenarios by providing unique resource name and stamping tool request sets #53

balanceiskey · 2025-04-03T21:03:19Z

This PR addresses a few key issues:

Creation of TRTCs with unique names
Guaranteeing we're processing TRTCs based on the last requested set tools
Transitioning to a terminal state when processing tool calls fails

Important

This PR adds unique identifiers for tool call requests, ensures proper handling of tool call sets, and transitions to a terminal state on processing failure.

Behavior:
- Adds ToolCallRequestId to TaskRunStatus in taskrun_types.go to uniquely identify tool call sets.
- Updates processToolCalls() in taskrun_controller.go to use ToolCallRequestId for listing tool calls.
- Transitions to terminal state in Reconcile() in taskrun_controller.go when processing fails.
CRD:
- Updates taskruns.yaml to include ToolCallRequestId in the schema.
Tests:
- Updates tests in taskrun_controller_test.go to check for ToolCallRequestId and terminal state transitions.
- Adds ToolCallRequestId to test setup in utils_test.go.

^{This description was created by}^{for 3af52d5. It will automatically update as commits are pushed.}

- Add ToolCallRequestId to track unique sets of tool calls from LLM - Make createToolCalls use ToolCallRequestId in TRTC names - Update processToolCalls to filter TRTCs by ToolCallRequestId - Update tests to work with the new ToolCallRequestId field - Transition TaskRun to Failed on processLLMResponse errors

- Restores test for direct transition from Initializing to ReadyForLLM - Updates CLAUDE.md with guidance to prefer positive assertions - Fix minor import issues in tests

- Removes duplicate mock implementation in taskrun_controller_test.go - Updates tests to use the shared MockRawOpenAIClient from llmclient package - Improves consistency and maintainability of the codebase 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

… then re-examing

ellipsis-dev

❌ Changes requested. Reviewed everything up to 3af52d5 in 1 minute and 58 seconds

More details

Looked at 192 lines of code in 6 files
Skipped 0 files when reviewing.
Skipped posting 17 drafted comments based on config settings.

1. kubechain/internal/controller/taskrun/taskrun_controller.go:444

Draft comment:
Using uuid.New().String()[:7] reduces uniqueness. Considering the use-case, it might be safer to use the full UUID or another robust mechanism to avoid collisions.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%
A UUID v4 has 122 bits of randomness. Taking first 7 hex chars gives 28 bits of randomness (7*4). For a temporary ID used to group related tool calls in a single TaskRun, this is likely sufficient - would need ~2^14 concurrent TaskRuns to hit 1% collision probability. The ID only needs to be unique within the scope of concurrent TaskRuns in the same namespace.
I may be underestimating the blast radius of a collision - perhaps it could cause tool call results to be mixed between TaskRuns. The full UUID would eliminate this risk entirely.
The impact of a collision would be limited since tool calls are also owned by their parent TaskRun via OwnerReferences. A collision would be quickly detected and retried. The brevity benefit for logs/debugging seems worth the negligible risk.
The truncated UUID provides sufficient uniqueness for this use case, where the ID just needs to temporarily group related objects. The tradeoff of readability vs theoretical collision risk is reasonable.

2. kubechain/internal/controller/taskrun/taskrun_controller.go:483

Draft comment:
Ensure that labels including 'kubechain.humanlayer.dev/toolcallrequest' are consistently applied across all related resources. This is key for reliable resource matching in the reconciler.
Reason this comment was not posted:
Comment looked like it was already resolved.

3. kubechain/config/crd/bases/kubechain.humanlayer.dev_taskruns.yaml:235

Draft comment:
The CRD schema now includes the 'toolCallRequestId' field. Confirm that its optional usage in code matches the schema and that no validation issues arise when the field is omitted.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
The comment is asking the PR author to confirm that the optional usage of a field matches the schema and that no validation issues arise. This falls under asking the author to double-check things, which is against the rules.

4. kubechain/internal/controller/taskrun/taskrun_controller_test.go:50

Draft comment:
Test expectations were updated for transitioning phases. Ensure that these tests align with the intended new logic for task runs—transitioning to ReadyForLLM when appropriate.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

5. kubechain/internal/controller/taskrun/taskrun_controller.go:443

Draft comment:
Using a truncated UUID (first 7 characters) for ToolCallRequestId could risk collisions under high load. Consider using the full UUID or a more robust method to ensure uniqueness.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
While the comment is about a changed line and raises a valid theoretical concern about UUID collisions, I need to consider: 1) How likely are collisions really with 7 chars (16^7 = ~268 million possibilities)? 2) What are the consequences of a collision? 3) Is brevity valuable here? The ID appears in logs and object names where shorter IDs are easier to read/work with.
I may be underestimating the collision risk at high scale. The birthday paradox means collisions become more likely than intuition suggests.
However, this appears to be an internal ID for grouping related tool calls within a single TaskRun. The scope and lifetime is limited, making collisions very unlikely in practice. The readability benefit of short IDs seems to outweigh the theoretical risk.
The comment raises a valid but likely theoretical concern. The current approach seems to be a reasonable tradeoff between uniqueness and usability.

6. kubechain/internal/controller/taskrun/taskrun_controller.go:482

Draft comment:
The naming convention in createToolCalls now includes the ToolCallRequestId, which improves uniqueness. Confirm that the truncated format (and any concatenation) complies with Kubernetes DNS-1123 naming conventions.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
This is a speculative comment asking for verification rather than pointing out a specific issue. While DNS-1123 compliance is important, the comment doesn't identify any actual violation. The format appears to use only alphanumeric characters and hyphens, which likely complies with DNS-1123. The comment starts with "Confirm that..." which is a red flag per the rules.
The name format could potentially violate DNS-1123 if statusUpdate.Name contains invalid characters. The UUID truncation to 7 chars is safe since UUIDs only use hex chars.
Even if statusUpdate.Name could contain invalid chars, that would be an issue with the parent resource name validation, not this specific concatenation. The comment doesn't identify any actual problem.
Delete this comment. It asks for verification without identifying a specific issue, and starts with "Confirm that..." which violates the rules for comments.

7. kubechain/internal/controller/taskrun/taskrun_controller_test.go:256

Draft comment:
Test expectations for phase transitions have been updated. Ensure that the behavior distinguishes between general errors (phase remaining ReadyForLLM) and 4xx errors (phase transitioning to Failed) and that documentation reflects this design.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
The comment is asking the author to ensure that the behavior of phase transitions is correctly implemented and documented. This falls under asking the author to ensure behavior, which is against the rules. The comment does not provide a specific suggestion or point out a specific issue in the code.

8. kubechain/internal/controller/taskrun/utils_test.go:249

Draft comment:
Test setups now include the 'kubechain.humanlayer.dev/toolcallrequest' label. Verify that this label is used consistently across all test fixtures to prevent potential flakiness in tests.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
The comment is asking the PR author to verify the consistent use of a label across test fixtures. This falls under asking the author to double-check things, which is against the rules. The comment does not provide a specific suggestion or point out a specific issue, making it not useful according to the guidelines.

9. CLAUDE.md:201

Draft comment:
Typographical error: The sentence 'just consider it in cases when you a Resource that is a union type.' is missing a word. It should read 'just consider it in cases when you have a Resource that is a union type.'
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

10. CLAUDE.md:239

Draft comment:
Typographical issue in code snippet: The 'else if {' block lacks a condition. It might confuse readers as it stands. Consider revising it to either 'else {' or provide a proper condition.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

11. kubechain/api/v1alpha1/taskrun_types.go:39

Draft comment:
Typo: Consider capitalizing 'Todo' to 'TODO' on the comment that reads "Todo(dex) what is this? This is used in the OpenAI converter but I think this is supposed to be in a ToolCall" for consistency with common practice.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

12. kubechain/internal/controller/taskrun/taskrun_controller_test.go:61

Draft comment:
Typo: In the comment, "dont" should be "don't".
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

13. kubechain/internal/controller/taskrun/taskrun_controller_test.go:469

Draft comment:
Typo: In the comment, "i think" should be "I think".
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

14. kubechain/internal/controller/taskrun/taskrun_controller_test.go:473

Draft comment:
Typo: In the comment, "i think" should be "I think".
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

15. kubechain/internal/controller/taskrun/taskrun_controller_test.go:476

Draft comment:
Typo: In the comment, "i think" should be "I think".
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

16. kubechain/internal/controller/taskrun/taskrun_controller_test.go:482

Draft comment:
Typo: Consider capitalizing 'todo dex' to 'TODO: Dex' or similar formatting for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

17. kubechain/internal/controller/taskrun/utils_test.go:16

Draft comment:
Consider capitalizing the 'todo' comment at line 16 to 'TODO:' to follow common conventions and improve clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_JCc7O1Ow7KEQvfeu

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

kubechain/api/v1alpha1/taskrun_types.go

CLAUDE.md

dexhorthy · 2025-04-03T21:11:17Z

kubechain/internal/controller/taskrun/taskrun_controller.go

 		// tool call branch: create TaskRunToolCall objects for each tool call returned by the LLM.
 		statusUpdate.Status.Output = ""
 		statusUpdate.Status.Phase = kubechainv1alpha1.TaskRunPhaseToolCallsPending
+		statusUpdate.Status.ToolCallRequestId = toolCallRequestId


so...this is a mutable field that tracks whatever "set of tool calls" we're waiting on - while this seems to work, I'm 🤔 about other design options

dexhorthy · 2025-04-03T21:11:44Z

kubechain/internal/controller/taskrun/taskrun_controller.go

+	// For each tool call, create a new TaskRunToolCall with a unique name using the ToolCallRequestId
 	for i, tc := range toolCalls {
-		newName := fmt.Sprintf("%s-tc-%02d", statusUpdate.Name, i+1)
+		newName := fmt.Sprintf("%s-%s-tc-%02d", statusUpdate.Name, statusUpdate.Status.ToolCallRequestId, i+1)


why not just use the openai tool call ID here? Do all the llm frameworks support that?

kubechain/internal/controller/taskrun/taskrun_controller.go

balanceiskey and others added 8 commits April 3, 2025 14:56

Remove unusued imports

0248c9c

Restore Initializing->ReadyForLLM test and update test style guidance

b989961

- Restores test for direct transition from Initializing to ReadyForLLM - Updates CLAUDE.md with guidance to prefer positive assertions - Fix minor import issues in tests

Some test repair

c52920f

still more repair, what happened here...

12f2616

Displeased with how these tests got shaken up, restoring original and…

1449548

… then re-examing

light touches

3af52d5

balanceiskey requested review from allisoneer and dexhorthy April 3, 2025 21:03

ellipsis-dev bot reviewed Apr 3, 2025

View reviewed changes

kubechain/api/v1alpha1/taskrun_types.go Outdated Show resolved Hide resolved

dexhorthy reviewed Apr 3, 2025

View reviewed changes

CLAUDE.md Show resolved Hide resolved

dexhorthy reviewed Apr 3, 2025

View reviewed changes

kubechain/internal/controller/taskrun/taskrun_controller.go Outdated Show resolved Hide resolved

ToolCallRequestId -> ToolCallRequestID

03f1334

balanceiskey requested a review from dexhorthy April 3, 2025 22:00

dexhorthy approved these changes Apr 3, 2025

View reviewed changes

balanceiskey merged commit eefeddd into humanlayer:main Apr 4, 2025
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Manage multi-tool scenarios by providing unique resource name and stamping tool request sets #53

Manage multi-tool scenarios by providing unique resource name and stamping tool request sets #53

Uh oh!

balanceiskey commented Apr 3, 2025 •

edited by ellipsis-dev bot

Loading

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

Uh oh!

Uh oh!

dexhorthy Apr 3, 2025

Uh oh!

dexhorthy Apr 3, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Manage multi-tool scenarios by providing unique resource name and stamping tool request sets #53

Manage multi-tool scenarios by providing unique resource name and stamping tool request sets #53

Uh oh!

Conversation

balanceiskey commented Apr 3, 2025 • edited by ellipsis-dev bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

dexhorthy Apr 3, 2025

Choose a reason for hiding this comment

Uh oh!

dexhorthy Apr 3, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

balanceiskey commented Apr 3, 2025 •

edited by ellipsis-dev bot

Loading