This repository was archived by the owner on Feb 13, 2025. It is now read-only.

@captncraig
Contributor

The current data strategy is essentially to keep everything in memory and back it up to boltdb as a failsafe only. This has a few issues, including long startup times, long save pauses, inability to share data between multiple instances, etc.

We would like to move to using a redis datastore for bosun state data, but also don't want to alienate users who prefer a standalone application. ledisdb allows us to have an in-process, redis-compatible data store. A standard redis client can talk to it, or can be pointed at a real redis server instead.
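To illustrate what "a standard redis client can talk to it" means on the wire, here is a minimal sketch of the Redis request protocol (RESP) using only the Go standard library. The function names `encodeCommand` and `sendCommand` are made up for this example and are not part of bosun; the address passed in `main` is the embedded-mode address given in this description.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strconv"
	"strings"
	"time"
)

// encodeCommand renders a command as a RESP array of bulk strings,
// which is the request format redis-compatible servers accept.
func encodeCommand(args ...string) string {
	var b strings.Builder
	b.WriteString("*" + strconv.Itoa(len(args)) + "\r\n")
	for _, a := range args {
		b.WriteString("$" + strconv.Itoa(len(a)) + "\r\n" + a + "\r\n")
	}
	return b.String()
}

// sendCommand (hypothetical helper) sends one command to addr and returns
// the first line of the reply, without interpreting the reply type.
func sendCommand(addr string, args ...string) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return "", err
	}
	if _, err := conn.Write([]byte(encodeCommand(args...))); err != nil {
		return "", err
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimRight(line, "\r\n"), nil
}

func main() {
	// Show the bytes a client would send for PING.
	fmt.Printf("%q\n", encodeCommand("PING"))

	// Try the embedded-mode address; this reports a connection error
	// when no server is listening there.
	reply, err := sendCommand("127.0.0.1:9565", "PING")
	if err != nil {
		fmt.Println("no server reachable:", err)
		return
	}
	fmt.Println("reply:", reply)
}
```

Because the request bytes are the same for either backend, pointing the same client at a real redis server should only require changing the address.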

What will change

  • Instead of using the bolt state file, data will be stored in ledisdb or redis.
  • Data will be converted on bosun startup, and removed from the statefile.
  • Most data structures used by the sched package will need to be reworked into a more granular key/value access pattern.
  • If using embedded ledis mode, the ledis server should be available at 127.0.0.1:9565; redis clients should be able to interact with it for the most part.
  • Due to differences between ledis and redis, we will maintain a suite of tests to ensure all functionality works identically between implementations.
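As a rough sketch of the "granular key/value access pattern" and the shared test suite mentioned above: the interface, the `metadata:` key prefix, and the metric names below are all hypothetical, and the in-memory store only stands in for a ledis or redis backend. The idea is that one conformance check runs unchanged against every implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// MetadataStore is a hypothetical interface: each field is read or written
// on its own instead of loading and saving one large blob.
type MetadataStore interface {
	PutField(metric, field, value string) error
	GetFields(metric string) (map[string]string, error)
}

// memStore emulates a hash-per-metric layout in memory (a stand-in backend).
type memStore struct {
	hashes map[string]map[string]string
}

func newMemStore() *memStore {
	return &memStore{hashes: map[string]map[string]string{}}
}

func (m *memStore) PutField(metric, field, value string) error {
	key := "metadata:" + metric // assumed key scheme, not bosun's actual one
	if m.hashes[key] == nil {
		m.hashes[key] = map[string]string{}
	}
	m.hashes[key][field] = value
	return nil
}

func (m *memStore) GetFields(metric string) (map[string]string, error) {
	out := map[string]string{}
	for f, v := range m.hashes["metadata:"+metric] {
		out[f] = v
	}
	return out, nil
}

// checkStore is a conformance check meant to be run against every backend.
func checkStore(s MetadataStore) error {
	if err := s.PutField("os.cpu", "unit", "percent"); err != nil {
		return err
	}
	if err := s.PutField("os.cpu", "rate", "gauge"); err != nil {
		return err
	}
	got, err := s.GetFields("os.cpu")
	if err != nil {
		return err
	}
	if len(got) != 2 || got["unit"] != "percent" || got["rate"] != "gauge" {
		return fmt.Errorf("unexpected fields: %v", got)
	}
	missing, err := s.GetFields("no.such.metric")
	if err != nil {
		return err
	}
	if len(missing) != 0 {
		return errors.New("expected no fields for an unknown metric")
	}
	return nil
}

func main() {
	// A real suite would add ledis-backed and redis-backed stores here.
	backends := map[string]MetadataStore{"memory": newMemStore()}
	for name, s := range backends {
		if err := checkStore(s); err != nil {
			fmt.Println(name, "FAIL:", err)
			continue
		}
		fmt.Println(name, "ok")
	}
}
```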

Configuration

  • redisHost = myRedis:6379

OR

  • ledisDir = /opt/bosun/ledis_data

The default setup is ledisDir = ledis_data
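A minimal sketch of how these two settings could be resolved into a backend choice. The `Conf` and `Backend` types and the `chooseBackend` function are hypothetical, and letting redisHost win when both are set is an assumption; the description above only says to set one or the other.

```go
package main

import "fmt"

// Conf holds the two settings described above (hypothetical struct).
type Conf struct {
	RedisHost string // e.g. "myRedis:6379"
	LedisDir  string // e.g. "/opt/bosun/ledis_data"
}

// Backend describes which store to open and where (hypothetical struct).
type Backend struct {
	Kind   string // "redis" or "ledis"
	Target string // host:port for redis, directory for ledis
}

// chooseBackend prefers an external redis when RedisHost is set (assumed
// precedence) and otherwise falls back to embedded ledis, using the stated
// default directory "ledis_data" when LedisDir is empty.
func chooseBackend(c Conf) Backend {
	if c.RedisHost != "" {
		return Backend{Kind: "redis", Target: c.RedisHost}
	}
	dir := c.LedisDir
	if dir == "" {
		dir = "ledis_data"
	}
	return Backend{Kind: "ledis", Target: dir}
}

func main() {
	fmt.Println(chooseBackend(Conf{}))
	fmt.Println(chooseBackend(Conf{RedisHost: "myRedis:6379"}))
	fmt.Println(chooseBackend(Conf{LedisDir: "/opt/bosun/ledis_data"}))
}
```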

Migration

I strongly recommend setting one of the above config items before rolling out this change.

We will likely only convert one data structure at a time in order to test things thoroughly.

This PR only migrates metric metadata.

Future work

  • Migration app to move ledis -> redis or vice versa.
  • All data structures converted.

Contributor

Unused import

@gbrayut
Contributor

gbrayut commented Sep 24, 2015

Read through the changes, nothing jumps out except the unused import statement that I commented about. Some more notes:

  1. Should probably add ledisDir = ../ledis_data to the dev.sample.conf file
  2. This branch doesn't build (missing dependencies), but I see there is another commit/pr to add those.
  3. I built the ledisparty branch and am testing it now. First thing I noticed is the host metadata tab is empty, but this may just be scollector waiting an hour before sending the metadata. When I test using /api/metadata/get?metric=bosun.collect.sent (or ?metric=scollector.collect.sent) I see the desc and unit details, but nothing for rate.
  4. I don't think the Series Type = auto is working the same as before on the graph tab. If I use the above URL to see the metadata, and it is missing the name=rate settings, I would expect the graph page to return an error when using auto. The current master returns "no metadata for sum:elastic.cluster.status: cannot use auto rate" if no metadata exists for that metric, but this branch seems to default to gauge.

@gbrayut
Contributor

gbrayut commented Sep 24, 2015

An hour in and the metadata tab started working, so I think it is just an scollector metadata issue.

The series type = auto and /api/metadata/get?metric=bosun.collect.sent route missing rate details mentioned above are still occurring.

@gbrayut
Contributor

gbrayut commented Sep 24, 2015

Confirmed the series type is being set correctly now, but if I query for something that doesn't have rate metadata I would expect to get the error message indicated above. It seems to just default to gauge now instead of warning that you have to manually select one.

Not sure if you still want comments here or on the other PR.

@captncraig
Contributor Author

@gbrayut this should be the authority now, and should build. Dependencies have been previously vendored.

@gbrayut
Contributor

gbrayut commented Sep 25, 2015

Looks good now; the graph tab displays the correct error when there isn't metadata for gauge/counter.

Probably ready to start testing this on branchbosun, just make sure to check the host metadata tab to see if anything stops working there. I'm out in Seattle next week but ping me if you run into any issues.

captncraig pushed a commit that referenced this pull request Sep 28, 2015
bosun: convert datastore to Ledisdb/redis implementation.
@captncraig captncraig merged commit 26acd8a into master Sep 28, 2015
@captncraig captncraig deleted the ledis branch November 4, 2015 22:01
@krutaw

krutaw commented Mar 4, 2016

So, just to make sure I'm understanding this correctly: if we were to configure Bosun to interface with Redis, we'd be able to share the data across multiple Bosun instances and thus have a team of Bosun instances handling the work? Or am I completely off here?

@kylebrandt
Member

@krutaw Redis does not give us clustering. It does position us to have a redis replica and an instance of bosun that only reads the state. I don't think we were looking towards active-active, though. A read-only redis replica would seem to be the next logical step for us, but we are not there yet.

The main reason we brought in redis was performance. We had everything in big blobs, and lock times would cause 30 second delays in places as we started to grow our instance.

@krutaw

krutaw commented Mar 4, 2016

That makes a lot of sense. Is that something that is on the radar? Or, more to the point, how do you guys handle the whole "High Availability" question regarding Bosun internally?

@kylebrandt
Member

Currently manual failover and backups. Bosun also has to restart to change the config, which causes about a 20 second gap as it loads all the last data points from redis (although if you don't index your data to bosun, this doesn't matter).

In general though, I posted this earlier this week to show what we do at Stack: http://kbrandt.com/post/bosun_arch/

@krutaw

krutaw commented Mar 4, 2016

AWESOME post, seriously, thank you. So quick question, how do you detect when the bosun instance needs to be rebuilt from backup/restarted/etc?

@sahil1610

Is the migration app for ledis -> redis already in place?

@captncraig
Contributor Author

captncraig commented Aug 22, 2017

@sahil1610 you should be able to move keys just as you would between any two unrelated redis instances. Something like https://stackoverflow.com/a/26142152/121660 should do the trick. Just run bosun in silent / no check mode so the ledis instance is up, and copy the keys over.
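For illustration only, the "copy the keys over" step can be sketched as a loop that reads each key from the source and writes it to the destination. This is not the method from the linked answer. The `HashReader` and `HashWriter` interfaces and the in-memory `memHashes` type are hypothetical stand-ins for two redis-compatible connections, and the sketch assumes every key is a hash; other key types would need their own read and write calls.

```go
package main

import (
	"fmt"
	"sort"
)

// HashReader and HashWriter are hypothetical, minimal views of a source and
// a destination store, modeled on the redis KEYS, HGETALL and HSET commands.
type HashReader interface {
	Keys() ([]string, error)
	HGetAll(key string) (map[string]string, error)
}

type HashWriter interface {
	HSet(key, field, value string) error
}

// memHashes is an in-memory stand-in used to exercise the copy loop.
type memHashes map[string]map[string]string

func (m memHashes) Keys() ([]string, error) {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

func (m memHashes) HGetAll(key string) (map[string]string, error) {
	return m[key], nil
}

func (m memHashes) HSet(key, field, value string) error {
	if m[key] == nil {
		m[key] = map[string]string{}
	}
	m[key][field] = value
	return nil
}

// copyHashes copies every hash key from src to dst, field by field, and
// returns the number of keys processed.
func copyHashes(src HashReader, dst HashWriter) (int, error) {
	keys, err := src.Keys()
	if err != nil {
		return 0, err
	}
	for _, k := range keys {
		fields, err := src.HGetAll(k)
		if err != nil {
			return 0, err
		}
		for f, v := range fields {
			if err := dst.HSet(k, f, v); err != nil {
				return 0, err
			}
		}
	}
	return len(keys), nil
}

func main() {
	src := memHashes{"metadata:os.cpu": {"unit": "percent", "rate": "gauge"}}
	dst := memHashes{}
	n, err := copyHashes(src, dst)
	if err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	fmt.Println("copied keys:", n)
}
```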
