+
Skip to content

Conversation

rasviitanen
Copy link
Contributor

@rasviitanen rasviitanen commented Mar 8, 2021

Previously ExtensionsMut::insert was used without a guarantee that we had exclusive access to a SpanRef's extensions.
This could lead to a panic in the insert function in the case where the extension was inserted concurrently by multiple calls to eval_ctx.

By using replace the assertion is ignored and it no longer matters whether we have exclusive access or not to a span in eval_ctx.

Example

Suppose we have have two parallel calls A and B to eval_ctx:

A is called with a span-tree with the following spans: iter: [SpanId(1), SpanId(2), SpanId(3)]
B is called with a span-tree with the following spans: iter: [SpanId(1), SpanId(2), SpanId(4)]

Both A and B enters the for loop and we suppose that they both adds Span(1) and Span(2) to their path.

Then A continues to iterate over its spans and reaches Span(3), which matches with the arm that eventually executes:

// path: [SpanId(1), SpanId(2)]
for span_ref in path.into_iter() {
    let mut write_guard = span_ref.extensions_mut();
    write_guard.insert::<LazyTraceCtx<SpanId, TraceId>>(LazyTraceCtx(
        TraceCtx {
            trace_id: local_trace_root.trace_id.clone(),
            parent_span: None,
        },
    ));
}

Everything is OK so far.

Then B reaches the same loop once it proceeds to SpanId(4).
B then tries insert the extension for SpanId(1), but it is already inserted by A.
This is forbidden and insert will panic.

Solution

As LazyTraceCtx is private, there is no way for another Layer implementation to conflict with our extension, this means that we should be able to ignore the assertion in insert and call replace instead . It should be OK to replace old extensions as the replacement will be identical to what's already stored.

The replace function does the same thing insert does but without the assertion (which we should be able to safely ignore):
https://docs.rs/tracing-subscriber/0.2.15/src/tracing_subscriber/registry/extensions.rs.html#90-92

Previously `ExtensionsMut::insert` was used without a
guarantee that we had exclusive access to a SpanRef's extensions.
This could lead to a panic in the `insert` function in the case where
the extension was inserted concurrently by multiple calls to `eval_ctx`.

By using `replace` the assertion is ignored and it no longer matters
whether we have exclusive access or not to a span in `eval_ctx`.
@Fishrock123
Copy link
Contributor

Did you have this code panic? I don't think parallelism should matter here, as ExtensionsMut is a wrapper around an RwLockWriteGuard, which should already prevent competing thread access, although I can potentially see what you mean when we loop over the spans.

@rasviitanen
Copy link
Contributor Author

rasviitanen commented Mar 11, 2021

@Fishrock123 We did experience panics multiple times when testing our system under heavy load. Specifically this assertion here failed: https://tracing.rs/src/tracing_subscriber/registry/extensions.rs.html#86.

About the write guard, I don't necessarily think it protects against the issue here. What it locks is the span we are iterating on. This does not protect against having the same span in path.

Consider the example again. Where one worker has spans [1, 2, 3] and one has [1, 2, 4]. In the code below I have provided comments about each worker's state:

// WORKER A:
// path = [1, 2]
let mut path = Vec::new();

// iter = [3]
for span_ref in iter {
    // span_ref = 3

    let mut write_guard = span_ref.extensions_mut();
    // (write_guard protects span 3)
    
    // elided

    for span_ref in path.into_iter() {
        // span_ref = 1
        let mut write_guard = span_ref.extensions_mut();
        write_guard.insert::<LazyTraceCtx<SpanId, TraceId>>(LazyTraceCtx(
            // elided
        ));
    }
}
// ---------------------------
// WORKER B:
// path = [1, 2]
let mut path = Vec::new();

// iterm = [4]
for span_ref in iter {
    // span_ref = 4
    
    let mut write_guard = span_ref.extensions_mut();
     // (write_guard protects span 4)
     
    // elided

    for span_ref in path.into_iter() {
        // span_ref = 1
        let mut write_guard = span_ref.extensions_mut();
        write_guard.insert::<LazyTraceCtx<SpanId, TraceId>>(LazyTraceCtx(
            // elided
        ));
    }
}

In the example above, as worker A is locking span 3 and worker B is locking span 4, both of them can be in the "for-loop" at the same time. It's enough that any of the spans in the path is shared to cause this panic.

@Fishrock123
Copy link
Contributor

Hmm, seems reasonable enough (and unfortunate). I'll try to get a release out later today.


Also, cross linking the original issue: inanna-malick/tracing-honeycomb#14

@rasviitanen
Copy link
Contributor Author

rasviitanen commented Mar 12, 2021

Thank you! And thank you for the amazing work you have put into this crate!

We found another bug in libhoney that I thought you might be interested in/should be made aware of: nlopes/libhoney-rust#74

@Fishrock123 Fishrock123 merged commit 620ec61 into eaze:latest Mar 12, 2021
@Fishrock123
Copy link
Contributor

released:

  • eaze-tracing-distributed 0.2.0-eaze.3
  • eaze-tracing-honeycomb 0.2.1-eaze.3

Fishrock123 added a commit to Fishrock123/tracing-honeycomb that referenced this pull request Apr 15, 2021
- Relaxed dependency constraints.
    - Was included in 0.2.1-eaze.2
- Fixed a panic in case of multiple current callers. See [inanna-malick#3](eaze/tracing-honeycomb#3).
    - Was included in 0.2.1-eaze.3
- Added `parking_lot` feature.
    - Was included in 0.2.1-eaze.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载