这是indexloc提供的服务,不要输入任何密码
Skip to content

Parallel replicas support projection #82807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

zoomxi
Copy link
Contributor

@zoomxi zoomxi commented Jun 28, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This PR closes #82659.

To simplify the implementation, support for projection is only enabled when parallel_replicas_local_plan is active.
A new setting parallel_replicas_support_projection has been added to control whether projection support is enabled.

Changes Overview

  • Coordinator + ReadPool Changes

    • Previous Behavior: The visibility of a part across replicas was determined by the part's part_info.
    • New Behavior: For projection parts, the visibility is now controlled by the parent part's part_info + projection_name.
  • Query Plan Changes for Projection Optimization

    • Single Stream:
      • If all parent parts are filtered out by projections, read the projections using parallel replicas.
    • Two Streams:
      • Read the projection only on the initiator; read the remaining parts using parallel replicas.
    • Min-Max Projection:
      • Read the projection only on the initiator.
    • Exact Count Optimization:
      • Read the projection only on the initiator; read the remaining parts (if any) using parallel replicas.

Documentation entry for user-facing changes

  • [*] Documentation is written (mandatory for new features)

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jun 28, 2025
Copy link

clickhouse-gh bot commented Jun 28, 2025

Workflow [PR], commit [ca610c2]

Summary:

job_name test_name status info comment
Build (fuzzers) error
Upgrade check (amd_asan) failure
S3_ERROR No such key thrown (see clickhouse-server.log or no_such_key_errors.txt) FAIL

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Jun 28, 2025
@zoomxi zoomxi force-pushed the support_projection branch 13 times, most recently from c7220dd to 6f968f4 Compare July 5, 2025 13:19
@zoomxi zoomxi changed the title [WIP] Parallel replicas support projection Parallel replicas support projection Jul 5, 2025
@zoomxi
Copy link
Contributor Author

zoomxi commented Jul 5, 2025

@zoomxi zoomxi force-pushed the support_projection branch from 6f968f4 to 3442cfe Compare July 7, 2025 08:40
@zoomxi
Copy link
Contributor Author

zoomxi commented Jul 8, 2025

PTAL @alexey-milovidov

@zoomxi zoomxi force-pushed the support_projection branch 3 times, most recently from 128b892 to 4a0e891 Compare July 14, 2025 07:20
@zoomxi zoomxi force-pushed the support_projection branch from 4a0e891 to 7fff9e3 Compare July 15, 2025 01:22
@nickitat nickitat self-assigned this Jul 15, 2025
@zoomxi zoomxi requested a review from nickitat July 16, 2025 08:14
@zoomxi
Copy link
Contributor Author

zoomxi commented Jul 17, 2025

@nickitat please take another look, thanks.

@@ -51,6 +51,7 @@ const VersionToSettingsChangesMap & getSettingsChangesHistory()
{"azure_connect_timeout_ms", 1000, 1000, "New setting"},
{"azure_sdk_use_native_client", false, true, "New setting"},
{"opentelemetry_trace_cpu_scheduling", false, false, "New setting to trace `cpu_slot_preemption` feature."},
{"parallel_replicas_support_projection", true, true, "New setting. Optimization of projections can be applied in parallel replicas. Effective only with enabled parallel_replicas_local_plan."},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be old_value=false, new_value=true, so when the compatibility is set to match one of the older versions, we will have this feature disabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done.

std::optional<size_t> getNumberOfCurrentReplica() { return number_of_current_replica; }
/// After projection optimization, ReadFromMergeTree may be replaced with a new reading step, and the ParallelReadingExtension must be forwarded to the new step.
/// Meanwhile, the ParallelReadingExtension originally in ReadFromMergeTree might be detached.
void detachParallelReadingExtension();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methods should have the following semantics:

x getX() {
  return x;
}
void cancel() {
  x.reset()/clear();
}
X detachX() {
  auto ret = move(x);
  x.reset()/clear();
  return move(ret);
}

I.e., detach is get + cancel.

So, getParallelReadingExtension now does what it is supposed to do. But detachParallelReadingExtension acts like cancel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your patient explanation of the meaning of detach. However, it may not fit the semantics here.

The original parent part reading step needs to pass ParallelReadingExtension to the projection reading step, but it doesn't necessarily require "detach" because the original parent part reading step may still need ParallelReadingExtension, and at the same time, the projection reading step may need to clear the received ParallelReadingExtension.

Therefore, perhaps clearParallelReadingExtension is the most semantically appropriate here.

@@ -59,6 +80,18 @@ void enableMemoryBoundMerging(QueryPlan::Node & node)
{
if (aggregating_step->memoryBoundMergingWillBeUsed())
{
/// When parallel_replicas_local_plan is enabled and the local plan has been optimized by a projection,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test case to ensure memory-bound merging and projections can coexist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I’ve added the test cases.

I’ve noticed that when parallel replicas support projections while aggregation_in_order is active, it often leads to errors, causing different replicas to select different coordinator modes. Therefore, it might be a good idea to disable parallel replicas from reading projections when aggregation_in_order is active

/// Previously we didn't maintain backward compatibility and every change was breaking.
/// Particularly, we had an equality check for the version. To work around that code
/// in previous server versions we now have to lie to them about the version.
const UInt64 version = initiator_protocol_version >= DBMS_MIN_REVISION_WITH_VERSIONED_PARALLEL_REPLICAS_PROTOCOL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When connecting to the initiator, we read the initiator_protocol_version value from the socket. In newer versions, it is set to the current DBMS_PARALLEL_REPLICAS_PROTOCOL_VERSION, but previous versions had different values. Apparently, the test for backward compatibility of the protocol was broken some time ago, so the issue does not reproduce in the CI. I will fix it shortly here, and you'll be able to see the problem.

@@ -0,0 +1,22 @@
---normal : contains both projections and parts ---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that this test will also succeed if we set enable_parallel_replicas=0, which is not good because it tests the behaviour of parallel replicas specifically. So, I'd like to ensure that the test will fail with disabled parallel replicas (because we have some traces of using parallel replicas in the reference file).

@zoomxi zoomxi requested a review from nickitat July 24, 2025 06:48
@zoomxi
Copy link
Contributor Author

zoomxi commented Jul 26, 2025

please take another look, thanks. @nickitat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot use projection when parallel replicas is enabled.
3 participants