这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@rahul2393
Copy link
Contributor

@rahul2393 rahul2393 commented Feb 25, 2025

Key Changes:

  • Centralized Span Management:
    • Introduced trace.go with startSpan/endSpan helpers to standardize span lifecycle management, including error recording and status propagation.
    • All spans are now uniformly tagged with common attributes for diagnostics (gcp.client.version, gcp.client.repo, gcp.client.artifact) and resource context (db.name, instance.name, cloud.region).
  • Refactored and Enriched Span Attributes:
    • Added a generic setSpanAttributes function in grpc_client.go to dynamically set request-specific attributes (db.statement, transaction.tag, etc.). The function also omits attributes with empty values.
    • Added trace attributes to capture server-timing headers on streaming and unary RPCs.
  • Consistent Naming:
    • All span names are now prefixed with cloud.google.com/go/spanner. for uniform identification in tracing backends.

@rahul2393 rahul2393 requested review from a team as code owners February 25, 2025 04:39
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Feb 25, 2025
@rahul2393 rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 25, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 25, 2025
@rahul2393 rahul2393 added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label May 9, 2025
@rahul2393 rahul2393 marked this pull request as draft May 9, 2025 06:46
@rahul2393 rahul2393 force-pushed the add-gcp-attribute-traces branch from 737e400 to 782047b Compare August 1, 2025 22:27
@rahul2393 rahul2393 removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Aug 1, 2025
@rahul2393 rahul2393 marked this pull request as ready for review August 1, 2025 22:27
@rahul2393 rahul2393 requested a review from olavloite August 1, 2025 22:28
@rahul2393
Copy link
Contributor Author

rahul2393 commented Aug 1, 2025

Screenshot 2025-08-03 at 11 26 43 AM

@rahul2393 rahul2393 force-pushed the add-gcp-attribute-traces branch from 9d27fab to 4261d64 Compare August 1, 2025 23:05
@rahul2393 rahul2393 force-pushed the add-gcp-attribute-traces branch from 4261d64 to 2fc0115 Compare August 1, 2025 23:06
Copy link
Contributor

@olavloite olavloite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also modify some of the existing test cases and/or add new test cases that verify that the addtional span attributes are present? The main reason for wanting that is not because I think that they are not there, but to prevent accidental regressions in the future where someone by accident removes them, or changes the key of (some of) them.

metrics := parseServerTimingHeader(md)
mt.currOp.currAttempt.setServerTimingMetrics(metrics)
latencies := parseServerTimingHeader(md)
span := otrace.SpanFromContext(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this guaranteed to always return a useful value? Or put another way: What is the customer is not using tracing. Do we need to execute these steps here? And if we still do, is there any chance of any errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We anyways need parseServerTimingHeader because of native metrics, in this PR we just pass it to setGFEAndAFESpanAttributes and if customer is not using tracing it won't be captured et-all

if !span.IsRecording() {
		return
}

And yes it always returns a useful value both for cloud path and direct path


// setSpanAttributes dynamically sets span attributes based on the request type.
func setSpanAttributes[T any](span oteltrace.Span, req T) {
if !span.IsRecording() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok. I guess this answers my question above.

}
for t, v := range latencyMap {
span.SetAttributes(
attribute.Float64(t[:3]+".latency_ms", float64(v.Nanoseconds())/1e6),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could use a small comment explaining what t[:3] does (gfe or afe latency). And are we absolutely sure that this will never change in the future? What if we introduce some new header key in the future that is shorter, and someone is still using this version of the client? Maybe we should only do the t[:3] for the well-known header keys, and include the full header key for other values (or completely skip them).


var attrs []attribute.KeyValue

if r, ok := any(req).(interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How expensive are these checks? It is executed for each request, and there is no way to enable/disable it, other than disabling tracing. It is useful information, so I would lean towards enabling it by default, but should we include an option to disable it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is executed for each request when span.IsRecording(), it uses generics, compared to interface assertions it's faster and cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a benchmark test BenchmarkSetSpanAttributes

High-Level Summary

  1. Not Recording is Cheap: When the span is not recording, the function's overhead is negligible and constant (~64 ns, 0 allocations). The function correctly returns early, making it very efficient in this state.
  2. Recording Cost is Proportional to Work: When the span is recording, the performance cost (latency and allocations) is directly proportional to the number and type of attributes being set on the span.
  3. Tail Latency is High: The p99 latencies are significantly higher than the median (p50) latencies. This indicates that while the function is fast on average, there are occasional spikes where it can take much longer (e.g., 2.6µs vs a median of 166ns for ExecuteSqlRequest). This is common in tracing systems and can be caused by factors like garbage collection, context switching or internal locking within the tracing SDK.

Detailed Breakdown

1. Not-Recording Spans

Across all request types, the performance is consistently fast and predictable:

  • Latency: ~64 ns
  • Allocations: 0 allocs/op (the 8 bytes reported are likely from the benchmark harness itself).
  • p50/p90/p99: All hover around 41-42 ns.

This confirms that the if !span.IsRecording() check is effective at making the function call a cheap no-op when tracing is disabled or sampled out.

2. Recording Spans

Here, performance varies based on the request type, as expected.

  • ExecuteBatchDmlRequest (Worst Case):

    • Latency: ~380 ns (p50: 250 ns, p99: 2.75 µs)
    • Allocations: 6 allocs/op (708 B)
    • Reason: This is the most expensive case because the function iterates over a slice of DML statements, creates a new slice of strings (sqls), and then adds multiple attributes. This involves more logic and memory allocation than other types.
  • ExecuteSqlRequest & ReadRequest:

    • Latency: ~270-280 ns (p50: 166 ns, p99: ~2.6 µs)
    • Allocations: 3 allocs/op (612 B)
    • Reason: These requests set three attributes (transaction.tag, statement.tag, and db.statement or db.table), leading to moderate latency and allocations.
  • CommitRequest:

    • Latency: ~200 ns (p50: 125 ns, p99: ~2.0 µs)
    • Allocations: 2 allocs/op (302 B)
    • Reason: This is faster as it only sets two attributes from the request options.
  • PartitionQueryRequest & PartitionReadRequest:

    • Latency: ~130 ns (p50: 83 ns, p99: ~0.4 µs)
    • Allocations: 1 alloc/op (123 B)
    • Reason: You correctly identified that these requests don't always have RequestOptions. After your change removing them, these requests only set one attribute (db.statement or db.table), making them much faster and more memory-efficient than the other requests.
  • Requests with No Attributes (BatchWriteRequest, Empty struct):

    • Latency: ~80 ns
    • Allocations: 0 allocs/op
    • Reason: The performance is very close to the not-recording case. The slightly higher latency comes from the additional type assertion checks inside the function which ultimately find no attributes to set.

Conclusion

The setSpanAttributes function is well-optimized. The primary performance cost is incurred only when a span is actively recording, and this cost scales linearly with the complexity of the request object. The worst-case median latency is low (~250 ns for a batch DML request), though tail latencies can be an order of magnitude higher. For requests that don't set any attributes, the overhead is minimal.

@rahul2393
Copy link
Contributor Author

rahul2393 commented Aug 3, 2025

@olavloite I updated test TestSpannerTracesWithOpenTelemetry to validate newly added attributes.
I also added benchmark tests to check how much setSpanAttributes overhead has, I added the summary here

goos: darwin
goarch: arm64
pkg: cloud.google.com/go/spanner
cpu: Apple M1 Pro
BenchmarkSetSpanAttributes
BenchmarkSetSpanAttributes/ExecuteSqlRequest
BenchmarkSetSpanAttributes/ExecuteSqlRequest/recording
BenchmarkSetSpanAttributes/ExecuteSqlRequest/recording-10         	 4235571	       278.8 ns/op	       166.0 p50-ns/op	       209.0 p90-ns/op	      2666 p99-ns/op	     612 B/op	       3 allocs/op
BenchmarkSetSpanAttributes/ExecuteSqlRequest/not-recording
BenchmarkSetSpanAttributes/ExecuteSqlRequest/not-recording-10     	18658722	        64.28 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest/recording
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest/recording-10    	 3070660	       377.6 ns/op	       250.0 p50-ns/op	       375.0 p90-ns/op	      2750 p99-ns/op	     708 B/op	       6 allocs/op
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest/not-recording
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest/not-recording-10         	18639424	        64.85 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/ReadRequest
BenchmarkSetSpanAttributes/ReadRequest/recording
BenchmarkSetSpanAttributes/ReadRequest/recording-10                        	 4137518	       280.3 ns/op	       166.0 p50-ns/op	       250.0 p90-ns/op	      2625 p99-ns/op	     612 B/op	       3 allocs/op
BenchmarkSetSpanAttributes/ReadRequest/not-recording
BenchmarkSetSpanAttributes/ReadRequest/not-recording-10                    	18299156	        65.73 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/CommitRequest
BenchmarkSetSpanAttributes/CommitRequest/recording
BenchmarkSetSpanAttributes/CommitRequest/recording-10                      	 5643200	       201.5 ns/op	       125.0 p50-ns/op	       125.0 p90-ns/op	      2083 p99-ns/op	     302 B/op	       2 allocs/op
BenchmarkSetSpanAttributes/CommitRequest/not-recording
BenchmarkSetSpanAttributes/CommitRequest/not-recording-10                  	18512185	        64.87 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/PartitionQueryRequest
BenchmarkSetSpanAttributes/PartitionQueryRequest/recording
BenchmarkSetSpanAttributes/PartitionQueryRequest/recording-10              	 8934859	       131.3 ns/op	        83.00 p50-ns/op	        84.00 p90-ns/op	       458.0 p99-ns/op	     123 B/op	       1 allocs/op
BenchmarkSetSpanAttributes/PartitionQueryRequest/not-recording
BenchmarkSetSpanAttributes/PartitionQueryRequest/not-recording-10          	18894426	        64.15 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/PartitionReadRequest
BenchmarkSetSpanAttributes/PartitionReadRequest/recording
BenchmarkSetSpanAttributes/PartitionReadRequest/recording-10               	 8729928	       129.4 ns/op	        83.00 p50-ns/op	        84.00 p90-ns/op	       417.0 p99-ns/op	     123 B/op	       1 allocs/op
BenchmarkSetSpanAttributes/PartitionReadRequest/not-recording
BenchmarkSetSpanAttributes/PartitionReadRequest/not-recording-10           	18753381	        64.12 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes/recording
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes/recording-10  	15157527	        79.71 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes/not-recording
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes/not-recording-10         	18684240	        63.63 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes/recording
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes/recording-10                  	15293596	        77.34 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes/not-recording
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes/not-recording-10              	18842408	        63.60 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
PASS
BenchmarkSetSpanAttributes
BenchmarkSetSpanAttributes/ExecuteSqlRequest
BenchmarkSetSpanAttributes/ExecuteSqlRequest/recording
BenchmarkSetSpanAttributes/ExecuteSqlRequest/recording-10         	 4186551	       271.2 ns/op	       166.0 p50-ns/op	       209.0 p90-ns/op	      2625 p99-ns/op	     612 B/op	       3 allocs/op
BenchmarkSetSpanAttributes/ExecuteSqlRequest/not-recording
BenchmarkSetSpanAttributes/ExecuteSqlRequest/not-recording-10     	18807032	        64.07 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest/recording
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest/recording-10    	 2980214	       382.3 ns/op	       250.0 p50-ns/op	       375.0 p90-ns/op	      2750 p99-ns/op	     708 B/op	       6 allocs/op
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest/not-recording
BenchmarkSetSpanAttributes/ExecuteBatchDmlRequest/not-recording-10         	18716343	        64.38 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/ReadRequest
BenchmarkSetSpanAttributes/ReadRequest/recording
BenchmarkSetSpanAttributes/ReadRequest/recording-10                        	 3819360	       280.6 ns/op	       166.0 p50-ns/op	       250.0 p90-ns/op	      2666 p99-ns/op	     612 B/op	       3 allocs/op
BenchmarkSetSpanAttributes/ReadRequest/not-recording
BenchmarkSetSpanAttributes/ReadRequest/not-recording-10                    	18305204	        64.17 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/CommitRequest
BenchmarkSetSpanAttributes/CommitRequest/recording
BenchmarkSetSpanAttributes/CommitRequest/recording-10                      	 5825875	       197.2 ns/op	       125.0 p50-ns/op	       125.0 p90-ns/op	      2042 p99-ns/op	     302 B/op	       2 allocs/op
BenchmarkSetSpanAttributes/CommitRequest/not-recording
BenchmarkSetSpanAttributes/CommitRequest/not-recording-10                  	18917187	        63.72 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/PartitionQueryRequest
BenchmarkSetSpanAttributes/PartitionQueryRequest/recording
BenchmarkSetSpanAttributes/PartitionQueryRequest/recording-10              	 8901163	       127.6 ns/op	        83.00 p50-ns/op	        84.00 p90-ns/op	       375.0 p99-ns/op	     123 B/op	       1 allocs/op
BenchmarkSetSpanAttributes/PartitionQueryRequest/not-recording
BenchmarkSetSpanAttributes/PartitionQueryRequest/not-recording-10          	18613368	        64.43 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/PartitionReadRequest
BenchmarkSetSpanAttributes/PartitionReadRequest/recording
BenchmarkSetSpanAttributes/PartitionReadRequest/recording-10               	 9125503	       126.6 ns/op	        83.00 p50-ns/op	        84.00 p90-ns/op	       375.0 p99-ns/op	     123 B/op	       1 allocs/op
BenchmarkSetSpanAttributes/PartitionReadRequest/not-recording
BenchmarkSetSpanAttributes/PartitionReadRequest/not-recording-10           	18841828	        63.94 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes/recording
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes/recording-10  	15046526	        78.93 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes/not-recording
BenchmarkSetSpanAttributes/BatchWriteRequest_-_no_attributes/not-recording-10         	18502504	        64.31 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes/recording
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes/recording-10                  	15261212	        77.10 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes/not-recording
BenchmarkSetSpanAttributes/Empty_struct_-_no_attributes/not-recording-10              	18373568	        64.93 ns/op	        41.00 p50-ns/op	        42.00 p90-ns/op	        42.00 p99-ns/op	       8 B/op	       0 allocs/op
PASS

@rahul2393 rahul2393 requested a review from olavloite August 3, 2025 06:04
@rahul2393 rahul2393 merged commit f59fcff into main Aug 3, 2025
13 checks passed
@rahul2393 rahul2393 deleted the add-gcp-attribute-traces branch August 3, 2025 07:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: spanner Issues related to the Spanner API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants