Zero copy final aggregation for any/anyFirst/anyLast #84428

Open: wants to merge 3 commits into base: master

canhld94
Contributor

@canhld94 canhld94 commented Jul 25, 2025

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Reduce memory allocation and memory copying when selecting with FINAL from an AggregatingMergeTree table that has columns of type SimpleAggregateFunction(anyLast).

Technical details

  1. Context

We have many tables like this:

create table t
(
   key UInt64,
   value1 SimpleAggregateFunction(anyLast, Nullable(String)),
   value2 SimpleAggregateFunction(anyLast, Nullable(String)),
   value3 SimpleAggregateFunction(anyLast, Nullable(String)),
   value4 SimpleAggregateFunction(anyLast, Nullable(String)),
   value5 SimpleAggregateFunction(anyLast, Nullable(String)),
   value6 SimpleAggregateFunction(anyLast, Nullable(String)),
   value7 SimpleAggregateFunction(anyLast, Nullable(String)),
   ...
)
ENGINE = AggregatingMergeTree()
ORDER BY key

The motivation is to be able to insert an initial row first (some values may be missing, so they are NULL) and fill in the missing values later (a.k.a. a coalescing merge tree).
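
To make the intended semantics concrete, here is a minimal C++ sketch of the coalescing behaviour described above. It assumes that NULL values are skipped when rows with the same key are merged, which is what the stated motivation implies. `Row` and `coalesce` are hypothetical names used only for illustration and are not part of ClickHouse.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical row shape: one key plus several nullable string values.
struct Row
{
    std::uint64_t key;
    std::vector<std::optional<std::string>> values;
};

// Collapse rows that share a key, keeping the last non-NULL value per column.
// Rows are assumed to arrive in insertion order.
std::map<std::uint64_t, Row> coalesce(const std::vector<Row> & rows)
{
    std::map<std::uint64_t, Row> result;
    for (const Row & row : rows)
    {
        auto [it, inserted] = result.try_emplace(row.key, row);
        if (inserted)
            continue;  // first row for this key becomes the initial state

        Row & merged = it->second;
        if (merged.values.size() < row.values.size())
            merged.values.resize(row.values.size());

        // A later non-NULL value overwrites the earlier one; NULLs are skipped.
        for (std::size_t i = 0; i < row.values.size(); ++i)
            if (row.values[i].has_value())
                merged.values[i] = row.values[i];
    }
    return result;
}
```

With this model, inserting `(1, 'a', NULL)` followed by `(1, NULL, 'b')` yields a single row `(1, 'a', 'b')` for key 1.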

  2. Problem

A query with FINAL on such a table is significantly slower than the same query on a ReplacingMergeTree table. We identified two sources of the slowdown:

a. Lock contention in arena allocation (more details in #79056 (comment))
b. Overhead in memcpy when merging rows for each column.

  3. Solution

In AggregatingSortedAlgorithm, columns of type SimpleAggregateFunction(anyLast, Nullable(String)) are currently merged as follows:

  • When a new group starts, we create the state for the function (in this case, just a piece of memory that stores a string).
  • To update the state, we copy the value from the row into the state (this has to be done for each column).
  • When the primary key changes, we construct the final row by copying the value from the state into the final row.
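
The three steps above can be sketched in simplified C++ as follows. This is a toy model rather than the actual AggregatingSortedAlgorithm code: it handles a single string column, uses a `std::string` in place of the arena-backed state, and counts copied bytes so the cost is visible. `CopyMergeResult` and `mergeWithCopy` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: merged values plus a count of copied bytes.
struct CopyMergeResult
{
    std::vector<std::string> output;  // one value per key group
    std::size_t bytes_copied = 0;     // bytes copied into the state and into the output
};

// Merge one string column of rows sorted by key, keeping the last value per key.
CopyMergeResult mergeWithCopy(const std::vector<unsigned> & keys, const std::vector<std::string> & values)
{
    assert(keys.size() == values.size());

    CopyMergeResult result;
    std::string state;  // stands in for the per-group aggregate state

    for (std::size_t row = 0; row < keys.size(); ++row)
    {
        const bool key_changed = row != 0 && keys[row] != keys[row - 1];
        if (key_changed)
        {
            // Key changed: copy the state into the final row.
            result.bytes_copied += state.size();
            result.output.push_back(state);
        }

        // Update the state: copy the row's value into it.
        state = values[row];
        result.bytes_copied += values[row].size();
    }

    if (!keys.empty())
    {
        // Flush the last group.
        result.bytes_copied += state.size();
        result.output.push_back(state);
    }
    return result;
}
```

In this model every input value is copied once into the state, and every group's surviving value is copied a second time into the output.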

The proposal is to keep a reference to the original row (a column pointer and a row number) instead of copying the row into an intermediate state. This works because the value has to be copied into the final row in the end anyway, so the intermediate copy is redundant. This way, we avoid:

  • Arena allocation
  • Memcpy (reduced by at least half)
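
A minimal sketch of the row-reference idea, under the same simplifications (a single string column, rows sorted by key). It is not the code from this pull request. `RowRef`, `RefMergeResult` and `mergeWithRowRef` are hypothetical names, and the sketch assumes the source column outlives the reference until the group is flushed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reference to a value in a source column: column pointer plus row number.
struct RowRef
{
    const std::vector<std::string> * column = nullptr;
    std::size_t row = 0;
};

struct RefMergeResult
{
    std::vector<std::string> output;  // one value per key group
    std::size_t bytes_copied = 0;     // bytes copied into the output only
};

// Merge one string column of rows sorted by key, keeping the last value per key.
RefMergeResult mergeWithRowRef(const std::vector<unsigned> & keys, const std::vector<std::string> & values)
{
    assert(keys.size() == values.size());

    RefMergeResult result;
    RowRef last;  // no allocation: the state is just a pointer and an index

    // Copy the referenced value into the final row; this is the only copy made.
    auto flush = [&result](const RowRef & ref)
    {
        const std::string & value = (*ref.column)[ref.row];
        result.bytes_copied += value.size();
        result.output.push_back(value);
    };

    for (std::size_t row = 0; row < keys.size(); ++row)
    {
        const bool key_changed = row != 0 && keys[row] != keys[row - 1];
        if (key_changed)
            flush(last);

        // Update the state: remember where the value lives instead of copying it.
        last.column = &values;
        last.row = row;
    }

    if (!keys.empty())
        flush(last);
    return result;
}
```

Compared with the copy-based approach, only the surviving value of each group is copied, and the per-group state needs no memory from an arena. The trade-off in this sketch is a lifetime requirement: the referenced column must stay alive until the group has been written out.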

Some results from our case:

-- With normal aggregation 
1 row in set. Elapsed: 5.379 sec. Processed 68.60 million rows, 12.03 GB (12.75 million rows/s., 2.24 GB/s.)

Event name                                  Value               Progress            Documentation
ArenaAllocBytes                             63.86 GB            11.79 GB/s          Number of bytes allocated for memory Arena (used for GROUP BY and similar operations)
ArenaAllocChunks                            15.59 million       2.88 million/s      Number of chunks allocated for memory Arena (used for GROUP BY and similar operations)

-- With zero-copy aggregation
1 row in set. Elapsed: 1.580 sec. Processed 68.61 million rows, 12.03 GB (43.44 million rows/s., 7.62 GB/s.)

Event name                                  Value               Progress            Documentation
ArenaAllocBytes                             163.84 KB           102.57 KB/s         Number of bytes allocated for memory Arena (used for GROUP BY and similar operations)
ArenaAllocChunks                            40                  25.04/s             Number of chunks allocated for memory Arena (used for GROUP BY and similar operations)
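
The two runs above can be compared with a little arithmetic. The sketch below only restates the quoted figures; it assumes that "KB" in the profile output is a decimal unit (1 KB = 1000 bytes), and all names in it are hypothetical.

```cpp
#include <cstdint>

// Figures copied from the two runs quoted above.
constexpr double elapsed_copy_sec = 5.379;       // normal aggregation
constexpr double elapsed_zero_copy_sec = 1.580;  // zero-copy aggregation

// ArenaAllocBytes and ArenaAllocChunks from the zero-copy run.
constexpr std::uint64_t arena_bytes_zero_copy = 163840;  // 163.84 KB, decimal units assumed
constexpr std::uint64_t arena_chunks_zero_copy = 40;

// Ratio of elapsed times between the two runs.
constexpr double speedup()
{
    return elapsed_copy_sec / elapsed_zero_copy_sec;
}

// Average arena chunk size in the zero-copy run.
constexpr std::uint64_t bytesPerChunk()
{
    return arena_bytes_zero_copy / arena_chunks_zero_copy;
}
```

Under these assumptions the zero-copy run is about 3.4 times faster, and its remaining arena usage works out to 40 chunks of 4096 bytes each.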

canhld94 added 3 commits July 25, 2025 07:10
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
@GrigoryPervakov GrigoryPervakov added the can be tested Allows running workflows for external contributors label Jul 25, 2025

clickhouse-gh bot commented Jul 25, 2025

Workflow [PR], commit [3c6550d]

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Jul 25, 2025
@nickitat nickitat self-assigned this Jul 25, 2025