bosun: convert datastore to Ledisdb/redis implementation. #1332
opentsdb/tsdb_test.go
Unused import
|
Read through the changes, nothing jumps out except the unused import statement that I commented about. Some more notes:
|
|
An hour in and the metadata tab started working, so I think it is just an scollector metadata issue. The series type = auto and /api/metadata/get?metric=bosun.collect.sent route missing rate details mentioned above are still occurring. |
|
Confirmed the series type is being set correctly now, but if I query for something that doesn't have rate metadata I would expect to get the error message indicated above. It seems to just default to gauge now instead of warning that you have to manually select one. Not sure if you still want comments here or on the other PR. |
|
@gbrayut this should be the authority now, and should build. Dependencies have been previously vendored. |
|
looks good now, the graph tab displays the correct error when there isn't metadata for gauge/counter. Probably ready to start testing this on branchbosun, just make sure to check the host metadata tab to see if anything stops working there. I'm out in Seattle next week but ping me if you run into any issues. |
|
So, just to make sure I'm understanding this correctly, if we were to configure Bosun to interface with Redis, we'd be able to share the data across multiple Bosun instances and thus have a team of Bosun instances handling the work or am I completely off here? |
|
@krutaw Redis does not give us clustering. It does position us to have a redis replica and an instance of bosun that only reads the state. I don't think we were looking towards active-active; a read-only redis replica would seem to be the next logical step for us, but we are not there yet. The main reason we brought in redis was performance. We had everything in big blobs, and lock times would cause 30-second delays in places as we started to grow our instance. |
|
That makes a lot of sense. Is that something that is on the radar? Or, more to the point, how do you guys handle the whole "High Availability" question regarding Bosun internally? |
|
Currently manual failover and backups. Bosun also has to restart to change the config, which causes about a 20-second gap as it loads all the last data points from redis (although if you don't index your data to bosun, this doesn't matter). In general though, I posted this earlier this week to show what we do at Stack: http://kbrandt.com/post/bosun_arch/ |
|
AWESOME post, seriously, thank you. So quick question, how do you detect when the bosun instance needs to be rebuilt from backup/restarted/etc? |
|
Is the migration app for ledis -> redis already in place? |
|
@sahil1610 you should be able to move keys just as you would between any two unrelated redis instances. Something like https://stackoverflow.com/a/26142152/121660 should do the trick. Just run bosun in silent / no check mode so the ledis instance is up, and copy the keys over. |
The current data strategy is essentially to keep everything in memory and back it up to boltdb as a failsafe only. This has a few issues, including long startup times, long save pauses, inability to share data between multiple instances, etc.
We would like to move to using a redis datastore for bosun state data, but also don't want to alienate users who prefer a standalone application. ledisdb allows us to have an in-process, redis-compatible data store. A standard redis client can talk to it, or can be pointed at a real redis server instead.
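To illustrate what "a standard redis client can talk to it" means at the wire level, here is a minimal sketch using only the Go standard library. It encodes a command in the redis serialization protocol (RESP) and sends a PING. The address is the one mentioned in this description; the helper names `encodeCommand` and `ping` are hypothetical, and whether the embedded store answers exactly like a real redis server is an assumption to verify.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strconv"
	"strings"
	"time"
)

// encodeCommand renders a command as a RESP array of bulk strings,
// which is the format redis clients send to a server.
func encodeCommand(args ...string) string {
	var b strings.Builder
	b.WriteString("*" + strconv.Itoa(len(args)) + "\r\n")
	for _, a := range args {
		b.WriteString("$" + strconv.Itoa(len(a)) + "\r\n" + a + "\r\n")
	}
	return b.String()
}

// ping (hypothetical helper) sends PING to addr and returns the first
// reply line with the trailing CRLF removed.
func ping(addr string) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(2 * time.Second))

	if _, err := conn.Write([]byte(encodeCommand("PING"))); err != nil {
		return "", err
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimRight(line, "\r\n"), nil
}

func main() {
	// Assumption: the in-process store listens on this address.
	reply, err := ping("127.0.0.1:9565")
	if err != nil {
		fmt.Println("no redis-compatible server reachable:", err)
		return
	}
	fmt.Println("reply:", reply)
}
```

Pointing the same code at a real redis server only requires changing the address, which is the property the description relies on.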
What will change
127.0.0.1:9565
redis clients should be able to interact with this for the most part.
Configuration
redisHost = myRedis:6379
OR
ledisDir = /opt/bosun/ledis_data
default setup is
ledisDir = ledis_data
Migration
I strongly recommend setting one of the above config items before rolling this change.
We will likely only convert one data structure at a time in order to test things thoroughly.
This pr only migrates metric-metadata.
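As a rough illustration of the configuration choice described above (either `redisHost` or `ledisDir`, with `ledisDir = ledis_data` as the default), the selection logic could look something like the following sketch. The `Backend` type and `chooseBackend` function are hypothetical and are not taken from the bosun code base; rejecting a config that sets both keys is also an assumption, since the description only says "OR".

```go
package main

import (
	"errors"
	"fmt"
)

// Backend describes which datastore would be used (hypothetical type).
type Backend struct {
	Kind   string // "redis" or "ledis"
	Target string // host:port for redis, directory for ledis
}

// chooseBackend picks a datastore from the two config values.
// Empty strings mean "not set".
func chooseBackend(redisHost, ledisDir string) (Backend, error) {
	switch {
	case redisHost != "" && ledisDir != "":
		// Assumption: setting both is treated as a configuration error.
		return Backend{}, errors.New("set either redisHost or ledisDir, not both")
	case redisHost != "":
		return Backend{Kind: "redis", Target: redisHost}, nil
	case ledisDir != "":
		return Backend{Kind: "ledis", Target: ledisDir}, nil
	default:
		// Default setup from the description: ledisDir = ledis_data
		return Backend{Kind: "ledis", Target: "ledis_data"}, nil
	}
}

func main() {
	fmt.Println(chooseBackend("myRedis:6379", ""))
	fmt.Println(chooseBackend("", "/opt/bosun/ledis_data"))
	fmt.Println(chooseBackend("", ""))
}
```

Setting one of the two values explicitly, as recommended above, avoids depending on the relative default directory.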
future work