Add proxy support for TimescaleDB to drop partitions #155
base: release/7.0
Conversation
Adds partition management for the 'proxy_history' table through TimescaleDB, similar to zabbix_server.
The current state of this PR is enough to demonstrate the significant (positive) impact, but there are open questions that should be reviewed / addressed. I will call out the ones I am aware of in a review.
```perl
for ("proxy_history")
{
	print<<EOF
	PERFORM create_hypertable('$_', 'id', chunk_time_interval => 1000000, $flags);
```
In our testing, 1000000 is a good balance between low-throughput and high-throughput systems: it may be a few hours before a partition is dropped on a small proxy, while not too many partitions are created between housekeeping executions on a large proxy.
```c
const char* enable_timescale = getenv("ENABLE_TIMESCALEDB");

if (0 == strcmp(table,"proxy_history") && 0 == strcmp(enable_timescale, "true"))
```
The condition deciding whether partition management should be used instead of `delete from ...` is much simpler than on the server side: we do not need to worry about compression or overrides, since we should never use compression on the proxy, and no global retention overrides are relevant here. However, relying on an environment variable is probably not the best answer. I also don't know if relying on a value in the config table is the right answer (the way the server housekeeper does), especially since the config table doesn't look like it's even populated or ever used on a proxy.
```c
if (0 != config_local_buffer)
	condition = zbx_dsprintf(NULL, " or %s>=%d", clock_field, now - config_local_buffer * SEC_PER_HOUR);

result = zbx_db_select(
		"select coalesce(min(id),%d) from %s"
		" where (id>" ZBX_FS_UI64 " and %s>=%d) %s",
		maxid + 1, table, lastid,
		clock_field, now - config_offline_buffer * SEC_PER_HOUR,
		ZBX_NULL2EMPTY_STR(condition));
zbx_free(condition);

if (NULL == (row = zbx_db_fetch(result)) || SUCCEED == zbx_db_is_null(row[0]))
	goto rollback;

ZBX_STR2UINT64(keep_from, row[0]);
zbx_db_free_result(result);
```
Here we're building a query to determine the minimum id from which we should retain data, in a somewhat inverse way to how the delete below is built.
If we refactor keep_from to be the minimum id of the partition (TimescaleDB chunk) that contains our currently calculated keep_from, then the proper partitions will still be dropped AND the count of records that will be returned (determined below) will be correct, which ultimately gets logged later on.
Refactored keep_from to be `floor(coalesce(min(id),%d)/1000000)*1000000`
… proxy_history table
For high (N)VPS deployments where a proxy is deployed and responsible for lots of data, housekeeping on the proxy can become a bottleneck requiring lots of CPU/memory resources. As such, we're looking to add TimescaleDB (partitioning) support to the `proxy_history` table, along with modifications to the housekeeping process, similar to what the Zabbix server supports. However, in this case we use the `id` column for partitioning since it's an ever-increasing value, with a `chunk_time_interval` of 1,000,000.

This should be paired with zabbix/zabbix-docker#1755