-
Notifications
You must be signed in to change notification settings - Fork 42
otel spans for llm requests and overall taskrun #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 7ea1dc5 in 2 minutes and 32 seconds
More details
- Looked at
500lines of code in8files - Skipped
0files when reviewing. - Skipped posting
8drafted comments based on config settings.
1. kubechain/internal/controller/taskrun/taskrun_controller.go:649
- Draft comment:
When parsing the stored span IDs (lines 652-657), if conversion fails, consider logging the error so that tracing issues are more easily debugged. This would help catch issues early if the stored hex values are invalid. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
While error logging could help with debugging, the code is handling the errors gracefully by continuing execution. The errors would only occur if the stored IDs are invalid, which is an edge case. The code already has a fallback to use the current context if parsing fails. Adding logs may add noise without providing much value.
The comment identifies a potential observability gap that could make debugging harder. Silent failures can be problematic in distributed tracing systems.
However, this is a non-critical path - failed span parsing just means we lose some tracing context but doesn't impact functionality. The code handles the failure case appropriately.
The comment should be deleted as it suggests adding non-essential logging for an edge case that is already handled gracefully.
2. kubechain/internal/controller/taskrun/taskrun_controller.go:556
- Draft comment:
The comment on ending the span immediately (line 553-556) indicates this is a temporary workaround. Documenting a plan for maintaining continuous spans across reconciliations would clarify intent for future developers. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%
The comment suggests documenting a plan for future developers, which is not a specific code suggestion or a request for a test. It seems to be more about ensuring future clarity, which violates the rule against asking the PR author to update documentation or confirm intentions.
3. kubechain/internal/controller/taskrun/taskrun_controller.go:696
- Draft comment:
When recording and setting error status on the child span (lines 696-700), ensure that any additional context or metadata needed for troubleshooting is added. This can help with deeper insights in monitoring systems. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%
This comment is asking the PR author to ensure that additional context or metadata is added for troubleshooting, which is a form of asking them to double-check their work. This violates the rule against asking the author to ensure behavior is intended or to double-check things.
4. kubechain/internal/controller/taskrun/taskrun_controller.go:519
- Draft comment:
Note that ending the span immediately (line 526) creates separate spans per reconciliation. Ensure this is the intended trace model. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%
None
5. kubechain/internal/controller/taskrun/taskrun_controller.go:650
- Draft comment:
If parsing the parent's SpanContext fails (lines 652-657), consider logging the error to help diagnose trace linkage issues. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%
The suggestion is about a real code change in the PR. The current code silently fails which could make debugging tracing issues harder. However, these parsing errors would be rare edge cases since the trace/span IDs are generated by the system itself. The benefit of adding logging here is relatively minor.
Am I underestimating the importance of trace debugging? Silent failures in observability code could be more problematic than I'm assuming.
While trace debugging is important, these specific parsing errors would only occur if there was corruption in the stored trace IDs, which is extremely unlikely since they are system-generated. The logging would rarely if ever be useful.
The comment suggests a minor improvement that would rarely be useful. While technically valid, it's not important enough to warrant a code change.
6. kubechain/internal/controller/taskrun/taskrun_controller.go:687
- Draft comment:
Good use of a child span to track the LLM request. The error handling portion (lines 688-700) properly records errors in the span. - Reason this comment was not posted:
Confidence changes required:10%<= threshold50%
None
7. kubechain/cmd/main.go:198
- Draft comment:
The error message on line 198 reads "to initialize metrics certificate watcher". It would be clearer and more consistent with the other error messages if it said "failed to initialize metrics certificate watcher". - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. kubechain/internal/controller/taskrun/taskrun_controller.go:103
- Draft comment:
Minor typographical error: In the comment at line 103, please change 'dont' to 'don't' for proper grammar. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_fgZ5hIVJVu5UfLBC
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
| SpanContext *SpanContext `json:"spanContext,omitempty"` | ||
| } | ||
|
|
||
| type TaskRunStatusStatus string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using the existing TaskStatusType instead of creating a new type with identical values.
TaskStatusTypetype and constants (task_types.go)
| var childSpan trace.Span | ||
|
|
||
| // Use controller's tracer if available, otherwise get the global tracer | ||
| tracer := r.Tracer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern of checking if r.Tracer is nil and then calling otel.GetTracerProvider().Tracer("taskrun") is repeated several times. Consider refactoring this logic into a helper function to reduce duplication and improve maintainability.
| var spanID trace.SpanID | ||
|
|
||
| // Convert hex strings to byte arrays | ||
| traceIDBytes, err := trace.TraceIDFromHex(taskRun.Status.SpanContext.TraceID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider logging errors when converting TraceID/SpanID (lines 382-390) to aid debugging if conversion fails.
balanceiskey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lookin' good to me, excited to see this put to work
still needs
this should work out of the box with the default otel stack
https://www.loom.com/share/b6c27b635e544f9dbdf5ed6c863bf8ef
Important
Add OpenTelemetry spans for TaskRun lifecycle and LLM requests, update CRDs, and refactor status handling.
taskrun_controller.go.endTaskRunSpan()to close spans on errors or completion.TracerinTaskRunReconcilerfor span creation.SpanContexttoTaskRunStatusandTaskRunToolCallStatusintaskrun_types.goandtaskruntoolcall_types.go.SpanContextfields.TaskRunStatusStatustype intaskrun_types.go.Initializingstatus fromTaskRunStatusenum.main.goto initialize OpenTelemetry tracer and meter.otelimport tokubechainotelinmain.go.This description was created by
for 7ea1dc5. It will automatically update as commits are pushed.