This repository was archived by the owner on Feb 13, 2025. It is now read-only.

Splay alerts randomly over a check period #2065

@mvuets


Problem. I have over 500 active alert checks pulling thousands of Graphite metrics. Most of the checks run every 5 minutes, and I suspect that is what causes the regular high CPU load spikes. Arguably the load should be spread more evenly, leaving more headroom for the few important checks that run every minute.

Proposal. I have an idea, but since I am very new to Bosun I want to validate it with you first. The scheduler could add a random jitter before the very first check run of each alert. This would shift each alert's checks forward in time by a random number of seconds and so spread the load.

E.g., given three alerts and a 5-minute check period, the per-minute CPU load graph looks like this:
now: ▅▁▁▁▁▅▁▁▁▁▅▁▁▁▁
after: ▂▂▂▁▁▂▂▂▁▁▂▂▂▁▁

Implementation-wise, it can be a one-time random delay of at most DefaultRunEvery * CheckFrequency seconds before kicking off each check for the first time. This jitter could be a system and/or rule configuration option (in case someone needs a more deterministic schedule).

What do you think?
