这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@balanceiskey
Copy link
Contributor

@balanceiskey balanceiskey commented Apr 3, 2025

This PR addresses a few key issues:

  • Creation of TRTCs with unique names
  • Guaranteeing we're processing TRTCs based on the last requested set tools
  • Transitioning to a terminal state when processing tool calls fails

Important

This PR adds unique identifiers for tool call requests, ensures proper handling of tool call sets, and transitions to a terminal state on processing failure.

  • Behavior:
    • Adds ToolCallRequestId to TaskRunStatus in taskrun_types.go to uniquely identify tool call sets.
    • Updates processToolCalls() in taskrun_controller.go to use ToolCallRequestId for listing tool calls.
    • Transitions to terminal state in Reconcile() in taskrun_controller.go when processing fails.
  • CRD:
    • Updates taskruns.yaml to include ToolCallRequestId in the schema.
  • Tests:
    • Updates tests in taskrun_controller_test.go to check for ToolCallRequestId and terminal state transitions.
    • Adds ToolCallRequestId to test setup in utils_test.go.

This description was created by Ellipsis for 3af52d5. It will automatically update as commits are pushed.

balanceiskey and others added 8 commits April 3, 2025 14:56
- Add ToolCallRequestId to track unique sets of tool calls from LLM
- Make createToolCalls use ToolCallRequestId in TRTC names
- Update processToolCalls to filter TRTCs by ToolCallRequestId
- Update tests to work with the new ToolCallRequestId field
- Transition TaskRun to Failed on processLLMResponse errors
- Restores test for direct transition from Initializing to ReadyForLLM
- Updates CLAUDE.md with guidance to prefer positive assertions
- Fix minor import issues in tests
- Removes duplicate mock implementation in taskrun_controller_test.go
- Updates tests to use the shared MockRawOpenAIClient from llmclient package
- Improves consistency and maintainability of the codebase

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 3af52d5 in 1 minute and 58 seconds

More details
  • Looked at 192 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 17 drafted comments based on config settings.
1. kubechain/internal/controller/taskrun/taskrun_controller.go:444
  • Draft comment:
    Using uuid.New().String()[:7] reduces uniqueness. Considering the use-case, it might be safer to use the full UUID or another robust mechanism to avoid collisions.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%
    A UUID v4 has 122 bits of randomness. Taking first 7 hex chars gives 28 bits of randomness (7*4). For a temporary ID used to group related tool calls in a single TaskRun, this is likely sufficient - would need ~2^14 concurrent TaskRuns to hit 1% collision probability. The ID only needs to be unique within the scope of concurrent TaskRuns in the same namespace.
    I may be underestimating the blast radius of a collision - perhaps it could cause tool call results to be mixed between TaskRuns. The full UUID would eliminate this risk entirely.
    The impact of a collision would be limited since tool calls are also owned by their parent TaskRun via OwnerReferences. A collision would be quickly detected and retried. The brevity benefit for logs/debugging seems worth the negligible risk.
    The truncated UUID provides sufficient uniqueness for this use case, where the ID just needs to temporarily group related objects. The tradeoff of readability vs theoretical collision risk is reasonable.
2. kubechain/internal/controller/taskrun/taskrun_controller.go:483
  • Draft comment:
    Ensure that labels including 'kubechain.humanlayer.dev/toolcallrequest' are consistently applied across all related resources. This is key for reliable resource matching in the reconciler.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. kubechain/config/crd/bases/kubechain.humanlayer.dev_taskruns.yaml:235
  • Draft comment:
    The CRD schema now includes the 'toolCallRequestId' field. Confirm that its optional usage in code matches the schema and that no validation issues arise when the field is omitted.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the PR author to confirm that the optional usage of a field matches the schema and that no validation issues arise. This falls under asking the author to double-check things, which is against the rules.
4. kubechain/internal/controller/taskrun/taskrun_controller_test.go:50
  • Draft comment:
    Test expectations were updated for transitioning phases. Ensure that these tests align with the intended new logic for task runs—transitioning to ReadyForLLM when appropriate.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. kubechain/internal/controller/taskrun/taskrun_controller.go:443
  • Draft comment:
    Using a truncated UUID (first 7 characters) for ToolCallRequestId could risk collisions under high load. Consider using the full UUID or a more robust method to ensure uniqueness.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    While the comment is about a changed line and raises a valid theoretical concern about UUID collisions, I need to consider: 1) How likely are collisions really with 7 chars (16^7 = ~268 million possibilities)? 2) What are the consequences of a collision? 3) Is brevity valuable here? The ID appears in logs and object names where shorter IDs are easier to read/work with.
    I may be underestimating the collision risk at high scale. The birthday paradox means collisions become more likely than intuition suggests.
    However, this appears to be an internal ID for grouping related tool calls within a single TaskRun. The scope and lifetime is limited, making collisions very unlikely in practice. The readability benefit of short IDs seems to outweigh the theoretical risk.
    The comment raises a valid but likely theoretical concern. The current approach seems to be a reasonable tradeoff between uniqueness and usability.
6. kubechain/internal/controller/taskrun/taskrun_controller.go:482
  • Draft comment:
    The naming convention in createToolCalls now includes the ToolCallRequestId, which improves uniqueness. Confirm that the truncated format (and any concatenation) complies with Kubernetes DNS-1123 naming conventions.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    This is a speculative comment asking for verification rather than pointing out a specific issue. While DNS-1123 compliance is important, the comment doesn't identify any actual violation. The format appears to use only alphanumeric characters and hyphens, which likely complies with DNS-1123. The comment starts with "Confirm that..." which is a red flag per the rules.
    The name format could potentially violate DNS-1123 if statusUpdate.Name contains invalid characters. The UUID truncation to 7 chars is safe since UUIDs only use hex chars.
    Even if statusUpdate.Name could contain invalid chars, that would be an issue with the parent resource name validation, not this specific concatenation. The comment doesn't identify any actual problem.
    Delete this comment. It asks for verification without identifying a specific issue, and starts with "Confirm that..." which violates the rules for comments.
7. kubechain/internal/controller/taskrun/taskrun_controller_test.go:256
  • Draft comment:
    Test expectations for phase transitions have been updated. Ensure that the behavior distinguishes between general errors (phase remaining ReadyForLLM) and 4xx errors (phase transitioning to Failed) and that documentation reflects this design.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the author to ensure that the behavior of phase transitions is correctly implemented and documented. This falls under asking the author to ensure behavior, which is against the rules. The comment does not provide a specific suggestion or point out a specific issue in the code.
8. kubechain/internal/controller/taskrun/utils_test.go:249
  • Draft comment:
    Test setups now include the 'kubechain.humanlayer.dev/toolcallrequest' label. Verify that this label is used consistently across all test fixtures to prevent potential flakiness in tests.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the PR author to verify the consistent use of a label across test fixtures. This falls under asking the author to double-check things, which is against the rules. The comment does not provide a specific suggestion or point out a specific issue, making it not useful according to the guidelines.
9. CLAUDE.md:201
  • Draft comment:
    Typographical error: The sentence 'just consider it in cases when you a Resource that is a union type.' is missing a word. It should read 'just consider it in cases when you have a Resource that is a union type.'
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. CLAUDE.md:239
  • Draft comment:
    Typographical issue in code snippet: The 'else if {' block lacks a condition. It might confuse readers as it stands. Consider revising it to either 'else {' or provide a proper condition.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. kubechain/api/v1alpha1/taskrun_types.go:39
  • Draft comment:
    Typo: Consider capitalizing 'Todo' to 'TODO' on the comment that reads "Todo(dex) what is this? This is used in the OpenAI converter but I think this is supposed to be in a ToolCall" for consistency with common practice.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. kubechain/internal/controller/taskrun/taskrun_controller_test.go:61
  • Draft comment:
    Typo: In the comment, "dont" should be "don't".
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
13. kubechain/internal/controller/taskrun/taskrun_controller_test.go:469
  • Draft comment:
    Typo: In the comment, "i think" should be "I think".
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
14. kubechain/internal/controller/taskrun/taskrun_controller_test.go:473
  • Draft comment:
    Typo: In the comment, "i think" should be "I think".
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
15. kubechain/internal/controller/taskrun/taskrun_controller_test.go:476
  • Draft comment:
    Typo: In the comment, "i think" should be "I think".
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
16. kubechain/internal/controller/taskrun/taskrun_controller_test.go:482
  • Draft comment:
    Typo: Consider capitalizing 'todo dex' to 'TODO: Dex' or similar formatting for clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
17. kubechain/internal/controller/taskrun/utils_test.go:16
  • Draft comment:
    Consider capitalizing the 'todo' comment at line 16 to 'TODO:' to follow common conventions and improve clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_JCc7O1Ow7KEQvfeu


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

// tool call branch: create TaskRunToolCall objects for each tool call returned by the LLM.
statusUpdate.Status.Output = ""
statusUpdate.Status.Phase = kubechainv1alpha1.TaskRunPhaseToolCallsPending
statusUpdate.Status.ToolCallRequestId = toolCallRequestId
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so...this is a mutable field that tracks whatever "set of tool calls" we're waiting on - while this seems to work, I'm 🤔 about other design options

// For each tool call, create a new TaskRunToolCall with a unique name using the ToolCallRequestId
for i, tc := range toolCalls {
newName := fmt.Sprintf("%s-tc-%02d", statusUpdate.Name, i+1)
newName := fmt.Sprintf("%s-%s-tc-%02d", statusUpdate.Name, statusUpdate.Status.ToolCallRequestId, i+1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use the openai tool call ID here? Do all the llm frameworks support that?

@balanceiskey balanceiskey requested a review from dexhorthy April 3, 2025 22:00
@balanceiskey balanceiskey merged commit eefeddd into humanlayer:main Apr 4, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants