这是indexloc提供的服务,不要输入任何密码
Skip to content
This repository was archived by the owner on Feb 13, 2025. It is now read-only.

Conversation

@harudark
Copy link
Contributor

@harudark harudark commented Nov 5, 2020

Description

In production, Bosun got panic with the stacktrace below (I removed the real alert name and expression):

error: expr.go:148: Error: interface conversion: interface {} is expr.Number, not expr.Series. Origin: Schedule: Alert Name: some_alert_name. Expression: some_expression_with_dropbool_function, Stack: goroutine 29496241 [running]:
runtime/debug.Stack(0x17eb5c0, 0x17d65e0, 0xc13ff32840)
        /usr/lib64/go/src/runtime/debug/stack.go:24 +0x9d
bosun.org/cmd/bosun/expr.errRecover(0xc12e50be58, 0xc12ac342c0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:148 +0x229
panic(0x17d65e0, 0xc13ff32840)
        /usr/lib64/go/src/runtime/panic.go:975 +0x3e3
bosun.org/cmd/bosun/expr.DropBool(0xc12ac342c0, 0xc0df5bea20, 0xc0ddf224e0, 0x0, 0x0, 0x0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/funcs.go:704 +0x479
reflect.Value.call(0x17c45a0, 0x1cf9490, 0x13, 0x1a5b622, 0x4, 0xc121e98d20, 0x3, 0x3, 0x3, 0x18, ...)
        /usr/lib64/go/src/reflect/value.go:460 +0x8ab
reflect.Value.Call(0x17c45a0, 0x1cf9490, 0x13, 0xc121e98d20, 0x3, 0x3, 0x2, 0x2, 0xe67162)
        /usr/lib64/go/src/reflect/value.go:321 +0xb4
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:797 +0xeb4
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08480, 0xe, 0xc12e4228c0)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a5a0, 0x0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:759 +0x1217
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08470, 0x9, 0xc12e4228a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a550, 0x7efda78a9b00)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:759 +0x1217
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08438, 0x7, 0xc12e422880)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a500, 0xe7441639944c3f8)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:759 +0x1217
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08460, 0xe, 0xc12e422860)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a4b0, 0x18)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:759 +0x1217
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08450, 0x9, 0xc12e422840)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a460, 0x2a)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walk(0xc12ac342c0, 0x203c2a0, 0xc00d33a460, 0x2a)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:501 +0x10d
bosun.org/cmd/bosun/expr.(*State).walkBinary(0xc12ac342c0, 0xc00d33f200, 0x40e296)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:523 +0x5a
bosun.org/cmd/bosun/expr.(*State).walk(0xc12ac342c0, 0x203c1e0, 0xc00d33f200, 0xc00edbade8)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:497 +0x1a3
bosun.org/cmd/bosun/expr.(*Expr).ExecuteState.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:135 +0x4c
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0x1a66781, 0xc, 0xc12e422820)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*Expr).ExecuteState(0xc00c9d4720, 0xc12ac342c0, 0x900, 0x0, 0x0, 0x0, 0x0, 0x0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:134 +0x148
bosun.org/cmd/bosun/expr.(*Expr).Execute(0xc00c9d4720, 0xc12d0042d0, 0xc12e466e00, 0x0, 0x0, 0x35253818, 0xed731fe12, 0x0, 0x0, 0x0, ...)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:124 +0xff
bosun.org/cmd/bosun/sched.(*Schedule).executeExpr(0x2d2f0a0, 0x0, 0x0, 0xc12d0023c0, 0xc00d33c460, 0xc00c9d4720, 0x0, 0x43f996, 0x1cff4f0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/sched/check.go:748 +0x221
bosun.org/cmd/bosun/sched.(*Schedule).CheckExpr.func2(0x2d2f0a0, 0x0, 0x0, 0xc12d0023c0, 0xc00d33c460, 0xc00c9d4720, 0xc0c131a9c0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/sched/check.go:771 +0x6d
created by bosun.org/cmd/bosun/sched.(*Schedule).CheckExpr
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/sched/check.go:770 +0x147
panic: interface conversion: interface {} is expr.Number, not expr.Series [recovered]
        panic: interface conversion: interface {} is expr.Number, not expr.Series

goroutine 29496241 [running]:
bosun.org/cmd/bosun/expr.errRecover(0xc12e50be58, 0xc12ac342c0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:149 +0x362
panic(0x17d65e0, 0xc13ff32840)
        /usr/lib64/go/src/runtime/panic.go:975 +0x3e3
bosun.org/cmd/bosun/expr.DropBool(0xc12ac342c0, 0xc0df5bea20, 0xc0ddf224e0, 0x0, 0x0, 0x0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/funcs.go:704 +0x479
reflect.Value.call(0x17c45a0, 0x1cf9490, 0x13, 0x1a5b622, 0x4, 0xc121e98d20, 0x3, 0x3, 0x3, 0x18, ...)
        /usr/lib64/go/src/reflect/value.go:460 +0x8ab
reflect.Value.Call(0x17c45a0, 0x1cf9490, 0x13, 0xc121e98d20, 0x3, 0x3, 0x2, 0x2, 0xe67162)
        /usr/lib64/go/src/reflect/value.go:321 +0xb4
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:797 +0xeb4
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08480, 0xe, 0xc12e4228c0)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a5a0, 0x0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:759 +0x1217
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08470, 0x9, 0xc12e4228a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a550, 0x7efda78a9b00)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:759 +0x1217
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08438, 0x7, 0xc12e422880)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a500, 0xe7441639944c3f8)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:759 +0x1217
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08460, 0xe, 0xc12e422860)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a4b0, 0x18)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walkFunc.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:759 +0x1217
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0xc12ac08450, 0x9, 0xc12e422840)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*State).walkFunc(0xc12ac342c0, 0xc00d33a460, 0x2a)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:749 +0xff
bosun.org/cmd/bosun/expr.(*State).walk(0xc12ac342c0, 0x203c2a0, 0xc00d33a460, 0x2a)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:501 +0x10d
bosun.org/cmd/bosun/expr.(*State).walkBinary(0xc12ac342c0, 0xc00d33f200, 0x40e296)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:523 +0x5a
bosun.org/cmd/bosun/expr.(*State).walk(0xc12ac342c0, 0x203c1e0, 0xc00d33f200, 0xc00edbade8)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:497 +0x1a3
bosun.org/cmd/bosun/expr.(*Expr).ExecuteState.func1(0x2035f40, 0xc256aeb7a0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:135 +0x4c
github.com/MiniProfiler/go/miniprofiler.(*Profile).Step(0xc256aeb7a0, 0x1a66781, 0xc, 0xc12e422820)
        /builddir/build/BUILD/bosun-0.8.0/GO/pkg/mod/github.com/!mini!profiler/go@v0.0.0-20160719195834-3296d396d472/miniprofiler/types.go:195 +0x76
bosun.org/cmd/bosun/expr.(*Expr).ExecuteState(0xc00c9d4720, 0xc12ac342c0, 0x900, 0x0, 0x0, 0x0, 0x0, 0x0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:134 +0x148
bosun.org/cmd/bosun/expr.(*Expr).Execute(0xc00c9d4720, 0xc12d0042d0, 0xc12e466e00, 0x0, 0x0, 0x35253818, 0xed731fe12, 0x0, 0x0, 0x0, ...)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/expr/expr.go:124 +0xff
bosun.org/cmd/bosun/sched.(*Schedule).executeExpr(0x2d2f0a0, 0x0, 0x0, 0xc12d0023c0, 0xc00d33c460, 0xc00c9d4720, 0x0, 0x43f996, 0x1cff4f0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/sched/check.go:748 +0x221
bosun.org/cmd/bosun/sched.(*Schedule).CheckExpr.func2(0x2d2f0a0, 0x0, 0x0, 0xc12d0023c0, 0xc00d33c460, 0xc00c9d4720, 0xc0c131a9c0)
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/sched/check.go:771 +0x6d
created by bosun.org/cmd/bosun/sched.(*Schedule).CheckExpr
        /builddir/build/BUILD/bosun-0.8.0/GO/src/bosun.org/cmd/bosun/sched/check.go:770 +0x147
  • It kept panic repeatedly for about 4 ~ 5 times before we disabled that alert.
  • This issue cannot be reproduced after that.
  • The expression is complicated, the short version is like:
    d = dropbool(graphite(A) / graphite(B) * 100, graphite(C) > 3600)
    t_avg = t(avg(dropbool(d)), "cluster")
    crit = len(dropbool(t_avg, t_avg > 10)) > 10
    
  • Each graphite() query takes over 10 seconds since it queries a lot of datapoints, over 5 million.
  • Usually, graphite(A), graphite(B) and graphite(C) should return same tagsets. But it's possible sometimes they return different tagsets considering 1) different metrics may not reach graphite at the same time 2) metrics have changed between graphite(A) query and graphite(C) query. As mentioned above, each query takes over 10 seconds.
  • union is called inside dropbool and it's possible union.A or union.B has NaN() value, which is a expr.Number, not expr.Series.
  • The fix checks if the type assertion can be done or not to avoid panic.
    • If union.A cannot do type assertion, we ignore this.
    • If union.A can but union.B cannot, dropbool keeps everything in union.A.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How has this been tested?

See TestDropBool added in cmd/bosun/expr/funcs_test.go

Checklist:

  • This contribution follows the project's code of conduct
  • This contribution follows the project's contributing guidelines
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

…having different tagsets

The panic is "Error: interface conversion: interface {} is expr.Number,
not expr.Series".
Copy link
Member

@muffix muffix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @harudark, thanks for the contribution. 👍

If I understand correctly, I believe your change would change the intended behaviour in Bosun. Bosun's operators are essentially working with sets, so they need to find the same groups on the left and right hand side of the operator (otherwise you'd see an unjoined group error). I think it should be the same for functions.

The docs describe dropbool() like this:

Drop datapoints where the corresponding value in the second series set is non-zero. (See Series Operations for what corresponding means).

"Corresponding values" are subject to group rules:

A corresponding datapoint is one which has the same timestamp (and normal group subset rules apply).

If we remember the group subset rules:

Various metrics can be combined by operators as long as one group is a subset of the other. A subset is when one of the groups contains all of the tag key-value pairs in the other. An empty group {} is a subset of all groups. {host=foo} is a subset of {host=foo,interface=eth0}, and neither {host=foo,interface=eth0} nor {host=foo,partition=/} are a subset of the other. Equal groups are considered subsets of each other.

So I think dropbool() should error when given series sets where the second isn't a subset of the first. It arguably shouldn't panic, but erroring is I think the correct response here.

What you can do if you really don't care about the tags in the second set (which might indicate a problem in the data being queried) is drop all the tags or not group the results in the first place. Taking your example from the test you added, the query dropbool(series("foo=bar", 0, 1, 1, 1, 2, 1, 3, 1), series("host=baz", 0, 1, 1, 2, 2, 3, 3, 2) != 2) should still error because the groups of the two series are incompatible.
However, dropbool(series("foo=bar", 0, 1, 1, 1, 2, 1, 3, 1), series("", 0, 1, 1, 2, 2, 3, 3, 2) != 2)correctly returns {"0": 1, "2": 1} in the current implementation.

Just to visualise what this query would do:

ts series1 series2 series2 != 2
0 1 1 1
1 1 2 0
2 1 3 1
3 1 2 0

So dropbool() drops the points from series1 where the value of series2 is != 2, if series2 is a subset of series1.

I think we shouldn't change the behaviour here. Does that make sense @johnewing1, @neilfordyce?

@muffix
Copy link
Member

muffix commented Nov 9, 2020

On second thought - I'm wondering if there's a mistake in the docs.

Drop datapoints where the corresponding value in the second series set is non-zero. (See Series Operations for what corresponding means).

The behaviour currently seems to be that the points are kept (rather than dropped) where the corresponding value in the second series set is non-zero. 🤔
That doesn't change the feedback about this PR, but would be a separate issue.

Edit: Raised #2496

@stale
Copy link

stale bot commented Nov 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 4, 2021
@stale stale bot closed this Dec 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants