这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@pskrbasu
Copy link
Contributor

@pskrbasu pskrbasu commented Sep 9, 2025

Warning: DuckLake migration complete.
- Tables: 12/12 migrated (failed: 0, remaining: 0)
- Parquet files: 156/156 migrated (failed: 0, remaining: 0)
- Backup of migrated legacy data: '/Users/pskrbasu/.tailpipe/migration/migrated/default'

@pskrbasu pskrbasu self-assigned this Sep 9, 2025
@pskrbasu pskrbasu force-pushed the ducklake-migration branch 2 times, most recently from 4c7ef1e to 551073c Compare September 15, 2025 06:27
@pskrbasu pskrbasu marked this pull request as ready for review September 15, 2025 07:32
Copy link
Contributor

@kaidaguerre kaidaguerre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just invoke migration from the root command rather than wrap every run func with withMigration?

added initial comments - we nned to think carefully about failures and rollback, and what state would be after potential failures

}
// create the DuckDB table fpr this partition if it does not already exist
if err := w.ensureDuckLakeTable(w.Partition.TableName); err != nil {
slog.Error("schema", "name", w.tableSchema.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logging should be removed?


// determine whether we have a ducklake table for this table, and if so, whether it needs schema updating
func (w *Converter) ensureDuckLakeTable(tableName string) error {
func EnsureDuckLakeTable(columns []*schema.ColumnSchema, db *DuckDb, tableName string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename file ducklake_table.go
are there still any convertor methods in this file? if so move to an existing convertor file


return schema, nil
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this code copied from somehwere or written from scratch - if its a copy of existing code, mention that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was old code which was used in inspect. Was removed in this version, so re-added them with 'legacy' named

"github.com/turbot/tailpipe/internal/filepaths"
)

func MigrateDataToDucklake(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function comment

// if the ~/.tailpipe/data directory has a .db file, it means that this is the first time we are migrating
// if the ~/.tailpipe/migration/migrating directory has a .db file, it means that this is a resume migration
initialMigration := hasTailpipeDb(dataDefaultDir)
continueMigration := hasTailpipeDb(migratingDefaultDir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate these are not both true

if err := os.MkdirAll(config.GlobalWorkspaceProfile.GetMigratedDir(), 0755); err != nil {
return err
}
if err := utils.CopyFile(filepath.Join(dataDefaultDir, "tailpipe.db"), filepath.Join(config.GlobalWorkspaceProfile.GetMigratedDir(), "tailpipe.db")); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this move rather than copy to avoid the db file existing in 2 places? Otherwise if copy fails state will be confusing

Is MoveDirContents transactional? If it fails - will all files be in original location?

}
}

return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we handle/report parquet files for tables with no views?

slog.Info("Found leaf node with parquet files", "table", tableName, "dir", dirPath, "files", filesInLeaf)

// Begin transaction
tx, err := db.BeginTx(ctx, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe put the actual file migration into a separate func

Copy link
Contributor

@kaidaguerre kaidaguerre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// set the max memory if specified
setMemoryLimit()

// migrate legacy data to ducklake
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better comment - mention tailpipe version numbers and native parquet vs ducklake

sigCtx, stop := signal.NotifyContext(baseCtx, syscall.SIGINT, syscall.SIGTERM)
// store stop so we can cancel after command completes (postRunHook)
signalStopFn = stop
cmd.SetContext(sigCtx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this doing? why do we need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed. removednot needed. removed

err := migration.MigrateDataToDucklake(sigCtx)
if error_helpers.IsContextCancelledError(err) {
// suppress Cobra's usage/errors only for this cancelled invocation
cmd.SilenceUsage = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Cobra prints usage when a command returns an error. My cancellation returns an error (context cancelled) from preRun, so Cobra assumes “user error” and shows help.
  • The conditional snippet sets cmd.SilenceUsage = true and cmd.SilenceErrors = true only for cancellation, telling Cobra “don’t print usage or re-print the error.” Without it, you get the usage dump.

select {
case <-time.After(100 * time.Millisecond):
tasksCancelFn()
// ensure we cleanup signal context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better comment - whet is signalContext? why do we need this mechanism?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed. removed


// validate: both should not be true - return that last migration left things in a bad state
if initialMigration && continueMigration {
return fmt.Errorf("Bad state - migration cannot proceed because both data and migrating directories contain tailpipe.db files.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we mention contacting support when we report this error?


// STEP 3: If this is the first time we are migrating(tables in ~/.tailpipe/data) then move the whole contents of data dir
// into ~/.tailpipe/migration/migrating respecting the same folder structure.
// First-run: copy tailpipe.db into migrated immediately, then move data contents into migrating
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is out of date? instead just mention thast the process is transactional - there will only event be a single copy of db file and it will be wherever the data files are...

if err = os.MkdirAll(migratedDir, 0755); err != nil {
return err
}
if err = utils.CopyDir(dataDefaultDir, migratingDefaultDir); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have more explicit comments about the ordering of this and the reason

@pskrbasu pskrbasu dismissed kaidaguerre’s stale review September 16, 2025 13:35

addressed the requested changes

@pskrbasu pskrbasu merged commit cee6e42 into develop Sep 16, 2025
14 checks passed
@pskrbasu pskrbasu deleted the ducklake-migration branch September 16, 2025 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants