
Conversation


@Alex-Burmak commented May 23, 2025

Summary by Sourcery

Implement ZooKeeper-based ClickHouse database migration with separate first and non-first replica flows, refactor CLI and metadata handling, introduce schema consistency checks, and extend BDD tests for migration scenarios.

New Features:

  • Introduce transactional ZooKeeper-based migration flow with helpers for creating database, query, replica, and metadata nodes
  • Add macro-based shard and replica extraction for migration using get_shard_and_replica_from_macros

Enhancements:

  • Refactor migration logic to separate first and non-first replica paths and remove temporary database usage
  • Add schema consistency checks and UUID update mechanism for migrated tables
  • Simplify chadmin migrate CLI to use create_database_nodes and improve error handling for replica determination
  • Introduce new utilities: read_local_table_metadata, replace_macros, format_path, and escape_for_zookeeper
  • Extend DatabaseMetadata with set_replicated method to update replicated engine settings

Tests:

  • Remove redundant restarts and sleeps in BDD tests and add version-specific scenario for schema mismatch during migration

Chores:

  • Add .sourcery.yaml configuration for code refactoring rules


sourcery-ai bot commented May 23, 2025

Reviewer's Guide

Refactor the database migration flow to use ZooKeeper transactions in place of the temporary-database approach, introduce helper routines for node and metadata management, update the CLI command flow, enhance table and database metadata handling, and adjust the feature tests.

Sequence Diagram: Database Migration - New First Replica Path

sequenceDiagram
    participant ActorUser as User
    participant CMD as migrate_engine_command
    participant CDN as migration.create_database_nodes
    participant MFR as migration.migrate_as_first_replica
    participant ULMFR as migration._update_local_metadata_first_replica
    participant ZK as Zookeeper
    participant CH as ClickHouse
    participant FS as LocalFileSystem

    ActorUser->>CMD: Run chadmin database migrate-engine --database DBNAME
    CMD->>CDN: create_database_nodes(DBNAME)
    CDN->>ZK: Check/Create /clickhouse path
    CDN->>ZK: TXN: Create common ZK nodes for DBNAME (e.g., /log, /replicas, /metadata)
    alt create_database_nodes succeeds (DBNAME not yet in ZK)
        CMD->>MFR: migrate_as_first_replica(DBNAME)
        MFR->>ZK: _generate_counter()
        MFR->>ZK: TXN Start
        MFR->>ZK: Create /clickhouse/DBNAME/first_replica_database_name
        MFR->>ZK: Create /clickhouse/DBNAME/log/query-{counter} structure
        MFR->>ZK: Create /clickhouse/DBNAME/replicas/{shard}|{replica} structure
        MFR->>ZK: Create /clickhouse/DBNAME/metadata/{table} for each table
        MFR->>ZK: TXN Commit
        MFR->>ULMFR: _update_local_metadata_first_replica(DBNAME)
        ULMFR->>CH: DETACH DATABASE DBNAME
        ULMFR->>FS: Update DBNAME.sql (set Replicated engine properties)
        ULMFR->>CH: ATTACH DATABASE DBNAME
    else NodeExistsError from create_database_nodes
        CMD-->>ActorUser: (Proceed to non-first replica flow)
    end

Class Diagram: Updated Metadata Handling and Migration Logic

classDiagram
    class DatabaseMetadata {
      +database_name: String
      +database_engine: DatabaseEngine
      +replica_path: String
      +shard: String
      +replica_name: String
      +update_metadata_file()
      +set_replicated()  // New method
    }

    class migration_module {
      <<Module>>
      +migrate_as_first_replica(ctx, migrating_database) // Refactored
      +migrate_as_non_first_replica(ctx, migrating_database) // Refactored
      +create_database_nodes(ctx, migrating_database) // New
      +is_table_schema_equal(ctx, db_name, table_name, path) // New
      +_update_local_metadata_first_replica(ctx, migrating_database) // New
      +_check_tables_consistent(ctx, database_name, tables_info) // New
      +_generate_counter(ctx, zk, migrating_database) // New
      +_check_result_txn(results) // New
      +get_shard_and_replica_from_macros(ctx) // New
      +_create_first_replica_database_name(ctx, txn, migrating_database) // New
      +_create_query_node(ctx, txn, migrating_database, counter) // New
      +_create_database_replica(ctx, txn, migrating_database) // New
      +_create_database_metadata_nodes(ctx, txn, migrating_database) // New
      +_attach_dbs(ctx, dbs) // New
      +_get_host_id(ctx, migrating_database, replica) // New
      +_change_tables_uuid(ctx, tables, database_name) bool // Modified (new return type, internal logic)
      +_get_tables_info_and_detach(ctx, database_name) dict // Modified (used in new flow)
      +_detach_dbs(ctx, dbs)
    }

    class table_module {
      <<Module>>
      +read_local_table_metadata(ctx, table_local_metadata_path) String // New
      +change_table_uuid(ctx, database, table, table_local_metadata_path, new_uuid, attached) // Modified (internal path construction)
    }

    migration_module ..> DatabaseMetadata : Uses
    migration_module ..> table_module : Uses

File-Level Changes

Refactor migration logic to use ZooKeeper transactions and remove temporary DB usage
  • Replace temp DB creation with create_database_nodes and zk_client context
  • Split migrate logic into helper functions (_generate_counter, _create_query_node, _create_database_replica, _create_database_metadata_nodes)
  • Add consistency checks (_check_tables_consistent) and local metadata updates (_update_local_metadata_first_replica)
  • Unify commit result handling via _check_result_txn
  • Remove obsolete functions for table-to-temp mapping and direct zk updates
Files: ch_tools/chadmin/internal/migration.py
Update CLI command to drive new migration flow
  • Remove temp_db variable and create_temp_db calls
  • Catch NodeExistsError to detect non-first replicas
  • Invoke create_database_nodes, then call new migrate functions without temp_db argument
  • Simplify exception logging with type information
Files: ch_tools/chadmin/cli/database_group.py
Enhance table metadata utilities
  • Introduce read_local_table_metadata for consistent file reading
  • Adjust change_table_uuid to use f-string for metadata path
  • Refactor schema equality comparison in is_table_schema_equal
Files: ch_tools/chadmin/internal/table.py
Extend database metadata model for replicated engines
  • Add set_replicated method to apply replicated engine and macros
  • Set shard and replica placeholders in metadata file
Files: ch_tools/chadmin/cli/database_metadata.py
Tidy ZooKeeper helper implementations
  • Adjust create_zk_nodes makepath parameter usage
  • Replace some direct zk updates with transaction-based calls
  • Import and use escape_for_zookeeper and format_path consistently
Files: ch_tools/chadmin/internal/zookeeper.py, ch_tools/chadmin/internal/migration.py
Revise feature tests to cover new behaviors and schema-change scenario
  • Add test scenario for MergeTree schema update error path
  • Insert @require_version_24.8 tag and adjust restart steps
  • Remove redundant sleep and restart steps
  • Extend validations for data and zk node listings
Files: tests/features/database_migrate.feature
Add project linting and refactoring configuration
  • Introduce .sourcery.yaml to define refactoring rules and disable specific lints
Files: .sourcery.yaml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help


@sourcery-ai bot left a comment


Hey @Alex-Burmak - I've reviewed your changes - here's some feedback:

  • Consider refactoring the repeated ZooKeeper transaction building (multiple txn.create calls) into a data‐driven helper or builder to reduce duplication and improve readability.
  • Extract the commit‐and‐check‐result pattern into a single helper to DRY up the repeated txn.commit() and _check_result_txn logic.
  • Replace the hardcoded UUID substring logic in is_table_schema_equal with a dedicated parser or regex to make the intent clearer and avoid magic numbers.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Review instructions: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

txn.create(
    path=format_path(
        ctx,
        f"/clickhouse/{migrating_database}/log/query-0000000001/finished/{shard}|{replica}",

issue: Use dynamic counter instead of hard-coded query index

Pass the dynamic counter to this function and use it in the path to ensure it matches the generated query ID.

logging.info(
    "New metadata for node from mapping table:\n{}\n===", target_metadata
)
txn.create(

issue (code-quality): Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

metadata_path = row["metadata_path"]

if match_str_ch_version(get_version(ctx), "25.1"):
    metadata_path = CLICKHOUSE_PATH + "/" + metadata_path

suggestion (code-quality): Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)

Suggested change:
-    metadata_path = CLICKHOUSE_PATH + "/" + metadata_path
+    metadata_path = f"{CLICKHOUSE_PATH}/{metadata_path}"
