-
Notifications
You must be signed in to change notification settings - Fork 10
Add 'replicas' option to create-zero-copy-locks command #393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add 'replicas' option to create-zero-copy-locks command #393
Conversation
Reviewer's GuideThis PR enhances the create-zero-copy-locks command with multi-replica and all-replicas support, refactors zookeeper path handling and remote table utilities, and updates feature tests to cover the new replica selection options. Sequence diagram for create-zero-copy-locks with multi-replica supportsequenceDiagram
participant User
participant CLI
participant TableInfo
participant ZKHelper
participant LockCreator
User->>CLI: Run create-zero-copy-locks (with --replicas or --all-replicas)
CLI->>TableInfo: list_tables (filter for Replicated tables)
CLI->>ZKHelper: _get_replicas_and_zk_path (get zk_path, replicas)
ZKHelper-->>CLI: Return zk_path, list of replicas
loop For each replica in replicas
CLI->>LockCreator: create_zero_copy_locks(..., replica, zk_path)
end
ER diagram for table and replica relationship in zero-copy lockserDiagram
TABLE {
string database
string name
string engine
}
REPLICA {
string fqdn
string zookeeper_path
}
TABLE ||--o{ REPLICA : has
REPLICA ||--o{ ZERO_COPY_LOCK : creates
ZERO_COPY_LOCK {
string part_name
string path_in_zk
}
Class diagram for updated replica and zookeeper path handlingclassDiagram
class CLI {
+create_zk_locks_command(..., replicas, all_replicas)
}
class ZKHelper {
+_get_replicas_and_zk_path(ctx, table, replicas, all_replicas)
}
class LockCreator {
+create_zero_copy_locks(ctx, disk, table_info, partition_id, part_id, replica, zk_path, dry_run)
}
class PartPathHelper {
+_get_part_path_in_zk(part, zookeeper_path, replica)
}
CLI --> ZKHelper
CLI --> LockCreator
LockCreator --> PartPathHelper
Class diagram for get_remote_table_for_hosts utilityclassDiagram
class Utils {
+get_remote_table_for_hosts(ctx, table, replicas) : str
}
class ClickhouseClient {
+user : str
+password : str
+check_port(port)
}
Utils --> ClickhouseClient
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `ch_tools/chadmin/cli/zookeeper_group.py:644` </location>
<code_context>
- create_zero_copy_locks(
- ctx, disk, table_info, partition_id, part_id, replica, dry_run
- )
+ replicas_ = replicas.split(",")
+ for replica in replicas_:
+ if replica not in replicas_list:
</code_context>
<issue_to_address>
**suggestion:** Splitting replicas by comma does not trim whitespace, which may cause issues with user input.
Use `[r.strip() for r in replicas.split(",")]` to ensure whitespace is removed from each replica, preventing issues with user input containing spaces.
```suggestion
replicas_ = [r.strip() for r in replicas.split(",")]
```
</issue_to_address>
### Comment 2
<location> `ch_tools/chadmin/internal/utils.py:96-97` </location>
<code_context>
+ else:
+ raise RuntimeError("For using remote() table function tcp port must be defined")
+
+ replicas_str = ",".join(replicas)
+ return f"{remote_clause}('{replicas_str}', {table}, '{user_name}', '{{user_password}}')"
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Joining replicas without handling empty or invalid entries may result in malformed remote table function calls.
Validate that the replicas list is non-empty and all entries are valid hostnames before joining.
```suggestion
import re
if not replicas or not all(isinstance(replica, str) and replica.strip() for replica in replicas):
raise ValueError("Replicas list must be non-empty and contain valid hostnames.")
hostname_regex = re.compile(
r"^(?=.{1,253}$)(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))*$"
)
for replica in replicas:
if not hostname_regex.match(replica):
raise ValueError(f"Invalid hostname in replicas list: '{replica}'")
replicas_str = ",".join(replicas)
return f"{remote_clause}('{replicas_str}', {table}, '{user_name}', '{{user_password}}')"
```
</issue_to_address>
### Comment 3
<location> `ch_tools/chadmin/internal/utils.py:81` </location>
<code_context>
)
+def get_remote_table_for_hosts(ctx: Context, table: str, replicas: list[str]) -> str:
+ """
+ Get remote/remoteSecure function for a table with given remote replicas.
</code_context>
<issue_to_address>
**suggestion (review_instructions):** Please ensure that the construction of the remote/remoteSecure table function matches ClickHouse's documented behavior, including argument order and escaping.
ClickHouse's remote/remoteSecure table functions require careful attention to argument order and escaping, especially for user/password and host lists. Please double-check that the generated string matches the documented format and handles edge cases (e.g., hostnames with special characters, empty user/password).
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*.py`
**Instructions:**
Code that deals with ClickHouse should take into account its implementation https://github.com/ClickHouse/ClickHouse
</details>
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
create_zero_copy_locks( | ||
ctx, disk, table_info, partition_id, part_id, replica, dry_run | ||
) | ||
replicas_ = replicas.split(",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Splitting replicas by comma does not trim whitespace, which may cause issues with user input.
Use [r.strip() for r in replicas.split(",")]
to ensure whitespace is removed from each replica, preventing issues with user input containing spaces.
replicas_ = replicas.split(",") | |
replicas_ = [r.strip() for r in replicas.split(",")] |
) | ||
|
||
|
||
def get_remote_table_for_hosts(ctx: Context, table: str, replicas: list[str]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (review_instructions): Please ensure that the construction of the remote/remoteSecure table function matches ClickHouse's documented behavior, including argument order and escaping.
ClickHouse's remote/remoteSecure table functions require careful attention to argument order and escaping, especially for user/password and host lists. Please double-check that the generated string matches the documented format and handles edge cases (e.g., hostnames with special characters, empty user/password).
Review instructions:
Path patterns: **/*.py
Instructions:
Code that deals with ClickHouse should take into account its implementation https://github.com/ClickHouse/ClickHouse
Summary by Sourcery
Enable creating zero-copy locks across multiple or all replicas by adding new CLI options, updating internal replica discovery and zookeeper path handling, and improving remote table querying
New Features:
Enhancements:
Tests: