
Conversation


@Alex-Burmak commented May 23, 2025

Summary by Sourcery

Implement ZooKeeper-based ClickHouse database migration with separate first and non-first replica flows, refactor CLI and metadata handling, introduce schema consistency checks, and extend BDD tests for migration scenarios.

New Features:

  • Introduce transactional ZooKeeper-based migration flow with helpers for creating database, query, replica, and metadata nodes
  • Add macro-based shard and replica extraction for migration using get_shard_and_replica_from_macros

Enhancements:

  • Refactor migration logic to separate first and non-first replica paths and remove temporary database usage
  • Add schema consistency checks and UUID update mechanism for migrated tables
  • Simplify chadmin migrate CLI to use create_database_nodes and improve error handling for replica determination
  • Introduce new utilities: read_local_table_metadata, replace_macros, format_path, and escape_for_zookeeper
  • Extend DatabaseMetadata with set_replicated method to update replicated engine settings

Tests:

  • Remove redundant restarts and sleeps in BDD tests and add version-specific scenario for schema mismatch during migration

Chores:

  • Add .sourcery.yaml configuration for code refactoring rules


sourcery-ai bot commented May 23, 2025

Reviewer's Guide

Refactor the database migration flow to use ZooKeeper transactions in place of the temporary-database approach, introduce helper routines for node and metadata management, update the CLI command flow, enhance table and database metadata handling, and adjust the feature tests.

Sequence Diagram: Database Migration - New First Replica Path

sequenceDiagram
    participant ActorUser as User
    participant CMD as migrate_engine_command
    participant CDN as migration.create_database_nodes
    participant MFR as migration.migrate_as_first_replica
    participant ULMFR as migration._update_local_metadata_first_replica
    participant ZK as Zookeeper
    participant CH as ClickHouse
    participant FS as LocalFileSystem

    ActorUser->>CMD: Run chadmin database migrate-engine --database DBNAME
    CMD->>CDN: create_database_nodes(DBNAME)
    CDN->>ZK: Check/Create /clickhouse path
    CDN->>ZK: TXN: Create common ZK nodes for DBNAME (e.g., /log, /replicas, /metadata)
    alt create_database_nodes succeeds (DBNAME not yet in ZK)
        CMD->>MFR: migrate_as_first_replica(DBNAME)
        MFR->>ZK: _generate_counter()
        MFR->>ZK: TXN Start
        MFR->>ZK: Create /clickhouse/DBNAME/first_replica_database_name
        MFR->>ZK: Create /clickhouse/DBNAME/log/query-{counter} structure
        MFR->>ZK: Create /clickhouse/DBNAME/replicas/{shard}|{replica} structure
        MFR->>ZK: Create /clickhouse/DBNAME/metadata/{table} for each table
        MFR->>ZK: TXN Commit
        MFR->>ULMFR: _update_local_metadata_first_replica(DBNAME)
        ULMFR->>CH: DETACH DATABASE DBNAME
        ULMFR->>FS: Update DBNAME.sql (set Replicated engine properties)
        ULMFR->>CH: ATTACH DATABASE DBNAME
    else NodeExistsError from create_database_nodes
        CMD-->>ActorUser: (Proceed to non-first replica flow)
    end

Class Diagram: Updated Metadata Handling and Migration Logic

classDiagram
    class DatabaseMetadata {
      +database_name: String
      +database_engine: DatabaseEngine
      +replica_path: String
      +shard: String
      +replica_name: String
      +update_metadata_file()
      +set_replicated()  // New method
    }

    class migration_module {
      <<Module>>
      +migrate_as_first_replica(ctx, migrating_database) // Refactored
      +migrate_as_non_first_replica(ctx, migrating_database) // Refactored
      +create_database_nodes(ctx, migrating_database) // New
      +is_table_schema_equal(ctx, db_name, table_name, path) // New
      +_update_local_metadata_first_replica(ctx, migrating_database) // New
      +_check_tables_consistent(ctx, database_name, tables_info) // New
      +_generate_counter(ctx, zk, migrating_database) // New
      +_check_result_txn(results) // New
      +get_shard_and_replica_from_macros(ctx) // New
      +_create_first_replica_database_name(ctx, txn, migrating_database) // New
      +_create_query_node(ctx, txn, migrating_database, counter) // New
      +_create_database_replica(ctx, txn, migrating_database) // New
      +_create_database_metadata_nodes(ctx, txn, migrating_database) // New
      +_attach_dbs(ctx, dbs) // New
      +_get_host_id(ctx, migrating_database, replica) // New
      +_change_tables_uuid(ctx, tables, database_name) bool // Modified (new return type, internal logic)
      +_get_tables_info_and_detach(ctx, database_name) dict // Modified (used in new flow)
      +_detach_dbs(ctx, dbs)
    }

    class table_module {
      <<Module>>
      +read_local_table_metadata(ctx, table_local_metadata_path) String // New
      +change_table_uuid(ctx, database, table, table_local_metadata_path, new_uuid, attached) // Modified (internal path construction)
    }

    migration_module ..> DatabaseMetadata : Uses
    migration_module ..> table_module : Uses

File-Level Changes

Refactor migration logic to use ZooKeeper transactions and remove temporary DB usage
  • Replace temp DB creation with create_database_nodes and zk_client context
  • Split migrate logic into helper functions (_generate_counter, _create_query_node, _create_database_replica, _create_database_metadata_nodes)
  • Add consistency checks (_check_tables_consistent) and local metadata updates (_update_local_metadata_first_replica)
  • Unify commit result handling via _check_result_txn
  • Remove obsolete functions for table-to-temp mapping and direct zk updates
Files: ch_tools/chadmin/internal/migration.py
Update CLI command to drive new migration flow
  • Remove temp_db variable and create_temp_db calls
  • Catch NodeExistsError to detect non-first replicas
  • Invoke create_database_nodes, then call new migrate functions without temp_db argument
  • Simplify exception logging with type information
Files: ch_tools/chadmin/cli/database_group.py
Enhance table metadata utilities
  • Introduce read_local_table_metadata for consistent file reading
  • Adjust change_table_uuid to use f-string for metadata path
  • Refactor schema equality comparison in is_table_schema_equal
Files: ch_tools/chadmin/internal/table.py
Extend database metadata model for replicated engines
  • Add set_replicated method to apply replicated engine and macros
  • Set shard and replica placeholders in metadata file
Files: ch_tools/chadmin/cli/database_metadata.py
Tidy ZooKeeper helper implementations
  • Adjust create_zk_nodes makepath parameter usage
  • Replace some direct zk updates with transaction-based calls
  • Import and use escape_for_zookeeper and format_path consistently
Files: ch_tools/chadmin/internal/zookeeper.py, ch_tools/chadmin/internal/migration.py
Revise feature tests to cover new behaviors and schema-change scenario
  • Add test scenario for MergeTree schema update error path
  • Insert @require_version_24.8 tag and adjust restart steps
  • Remove redundant sleep and restart steps
  • Extend validations for data and zk node listings
Files: tests/features/database_migrate.feature
Add project linting and refactoring configuration
  • Introduce .sourcery.yaml to define refactoring rules and disable specific lints
Files: .sourcery.yaml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help


@sourcery-ai bot left a comment


Hey @Alex-Burmak - I've reviewed your changes - here's some feedback:

  • Consider refactoring the repeated ZooKeeper transaction building (multiple txn.create calls) into a data‐driven helper or builder to reduce duplication and improve readability.
  • Extract the commit‐and‐check‐result pattern into a single helper to DRY up the repeated txn.commit() and _check_result_txn logic.
  • Replace the hardcoded UUID substring logic in is_table_schema_equal with a dedicated parser or regex to make the intent clearer and avoid magic numbers.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Review instructions: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

txn.create(
    path=format_path(
        ctx,
        f"/clickhouse/{migrating_database}/log/query-0000000001/finished/{shard}|{replica}",

issue: Use dynamic counter instead of hard-coded query index

Pass the dynamic counter to this function and use it in the path to ensure it matches the generated query ID.

logging.info(
    "New metadata for node from mapping table:\n{}\n===", target_metadata
)
txn.create(

issue (code-quality): Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

metadata_path = row["metadata_path"]

if match_str_ch_version(get_version(ctx), "25.1"):
    metadata_path = CLICKHOUSE_PATH + "/" + metadata_path

suggestion (code-quality): Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)

Suggested change:
-    metadata_path = CLICKHOUSE_PATH + "/" + metadata_path
+    metadata_path = f"{CLICKHOUSE_PATH}/{metadata_path}"
