-
Notifications
You must be signed in to change notification settings - Fork 11
Ducklake migration #552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ducklake migration #552
Conversation
4c7ef1e to
551073c
Compare
kaidaguerre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just invoke migration from the root command rather than wrap every run func with withMigration?
added initial comments - we nned to think carefully about failures and rollback, and what state would be after potential failures
internal/database/convertor.go
Outdated
| } | ||
| // create the DuckDB table fpr this partition if it does not already exist | ||
| if err := w.ensureDuckLakeTable(w.Partition.TableName); err != nil { | ||
| slog.Error("schema", "name", w.tableSchema.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logging should be removed?
|
|
||
| // determine whether we have a ducklake table for this table, and if so, whether it needs schema updating | ||
| func (w *Converter) ensureDuckLakeTable(tableName string) error { | ||
| func EnsureDuckLakeTable(columns []*schema.ColumnSchema, db *DuckDb, tableName string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename file ducklake_table.go
are there still any convertor methods in this file? if so move to an existing convertor file
|
|
||
| return schema, nil | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this code copied from somehwere or written from scratch - if its a copy of existing code, mention that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was old code which was used in inspect. Was removed in this version, so re-added them with 'legacy' named
| "github.com/turbot/tailpipe/internal/filepaths" | ||
| ) | ||
|
|
||
| func MigrateDataToDucklake(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function comment
| // if the ~/.tailpipe/data directory has a .db file, it means that this is the first time we are migrating | ||
| // if the ~/.tailpipe/migration/migrating directory has a .db file, it means that this is a resume migration | ||
| initialMigration := hasTailpipeDb(dataDefaultDir) | ||
| continueMigration := hasTailpipeDb(migratingDefaultDir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validate these are not both true
internal/migration/migration.go
Outdated
| if err := os.MkdirAll(config.GlobalWorkspaceProfile.GetMigratedDir(), 0755); err != nil { | ||
| return err | ||
| } | ||
| if err := utils.CopyFile(filepath.Join(dataDefaultDir, "tailpipe.db"), filepath.Join(config.GlobalWorkspaceProfile.GetMigratedDir(), "tailpipe.db")); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this move rather than copy to avoid the db file existing in 2 places? Otherwise if copy fails state will be confusing
Is MoveDirContents transactional? If it fails - will all files be in original location?
| } | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we handle/report parquet files for tables with no views?
internal/migration/migration.go
Outdated
| slog.Info("Found leaf node with parquet files", "table", tableName, "dir", dirPath, "files", filesInLeaf) | ||
|
|
||
| // Begin transaction | ||
| tx, err := db.BeginTx(ctx, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe put the actual file migration into a separate func
kaidaguerre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal/cmdconfig/cmd_hooks.go
Outdated
| // set the max memory if specified | ||
| setMemoryLimit() | ||
|
|
||
| // migrate legacy data to ducklake |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better comment - mention tailpipe version numbers and native parquet vs ducklake
internal/cmdconfig/cmd_hooks.go
Outdated
| sigCtx, stop := signal.NotifyContext(baseCtx, syscall.SIGINT, syscall.SIGTERM) | ||
| // store stop so we can cancel after command completes (postRunHook) | ||
| signalStopFn = stop | ||
| cmd.SetContext(sigCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this doing? why do we need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed. removednot needed. removed
| err := migration.MigrateDataToDucklake(sigCtx) | ||
| if error_helpers.IsContextCancelledError(err) { | ||
| // suppress Cobra's usage/errors only for this cancelled invocation | ||
| cmd.SilenceUsage = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Cobra prints usage when a command returns an error. My cancellation returns an error (context cancelled) from preRun, so Cobra assumes “user error” and shows help.
- The conditional snippet sets
cmd.SilenceUsage = trueandcmd.SilenceErrors = trueonly for cancellation, telling Cobra “don’t print usage or re-print the error.” Without it, you get the usage dump.
internal/cmdconfig/cmd_hooks.go
Outdated
| select { | ||
| case <-time.After(100 * time.Millisecond): | ||
| tasksCancelFn() | ||
| // ensure we cleanup signal context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better comment - whet is signalContext? why do we need this mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed. removed
internal/migration/migration.go
Outdated
|
|
||
| // validate: both should not be true - return that last migration left things in a bad state | ||
| if initialMigration && continueMigration { | ||
| return fmt.Errorf("Bad state - migration cannot proceed because both data and migrating directories contain tailpipe.db files.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we mention contacting support when we report this error?
internal/migration/migration.go
Outdated
|
|
||
| // STEP 3: If this is the first time we are migrating(tables in ~/.tailpipe/data) then move the whole contents of data dir | ||
| // into ~/.tailpipe/migration/migrating respecting the same folder structure. | ||
| // First-run: copy tailpipe.db into migrated immediately, then move data contents into migrating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is out of date? instead just mention thast the process is transactional - there will only event be a single copy of db file and it will be wherever the data files are...
internal/migration/migration.go
Outdated
| if err = os.MkdirAll(migratedDir, 0755); err != nil { | ||
| return err | ||
| } | ||
| if err = utils.CopyDir(dataDefaultDir, migratingDefaultDir); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have more explicit comments about the ordering of this and the reason
…g migration status. - Introduced function to pre-compute total parquet files across matched directories. - Updated struct to track total, migrated, and failed parquet files. - Modified migration functions to utilize new status updates for parquet files during migration. - Improved spinner status messages to include parquet file migration progress.
7c65ea9 to
beecf73
Compare
addressed the requested changes
Uh oh!
There was an error while loading. Please reload this page.