
Update Compact to migrate to new tp_index schema #413

@kaidaguerre

Description

We will change all plugins so that tp_index defaults to a single value, "default".

tailpipe compact should be modified to re-index existing files to the new scheme. For example, if you have already collected CloudTrail logs and the tp_index is the account ID, compact will change the tp_index of all rows to default and then compact the files into the new hive structure (all accounts in a single file per day, in the tp_index=default folder). If the user defines a tp_index (see below), tailpipe compact should re-index to that scheme instead.
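As a rough illustration of the default case, the sketch below plans which existing files would be merged into which new folder. It is not the tailpipe implementation: the helper names `reindexed_dir` and `plan_compaction` are hypothetical, and the `tp_table=/tp_partition=/tp_index=/tp_date=` path layout is an assumption about the hive structure. It only covers the path side of the change; the tp_index column inside each row would also need rewriting.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def reindexed_dir(path: str, new_index: str = "default") -> str:
    """Return the folder a file would be compacted into.

    Assumes (hypothetically) a hive layout with a `tp_index=<value>` segment;
    only that segment is replaced, every other segment is kept as-is.
    """
    parts = [
        f"tp_index={new_index}" if part.startswith("tp_index=") else part
        for part in PurePosixPath(path).parent.parts
    ]
    return str(PurePosixPath(*parts))


def plan_compaction(paths, new_index: str = "default"):
    """Group existing files by the folder they would land in after re-indexing."""
    plan = defaultdict(list)
    for path in paths:
        plan[reindexed_dir(path, new_index)].append(path)
    return dict(plan)


# Illustrative file names only: two accounts on one day, one account on the next.
PREFIX = "tp_table=aws_cloudtrail_log/tp_partition=s3_bucket_us_east_1"
files = [
    f"{PREFIX}/tp_index=111111111111/tp_date=2024-01-01/a.parquet",
    f"{PREFIX}/tp_index=222222222222/tp_date=2024-01-01/b.parquet",
    f"{PREFIX}/tp_index=111111111111/tp_date=2024-01-02/c.parquet",
]

for target, sources in plan_compaction(files).items():
    print(target, "<-", len(sources), "file(s)")
```

Each key in the plan is one tp_index=default folder for a single day, and its sources are the per-account files that would be merged into it. This constant-value approach does not apply when the user defines a column or function as tp_index, because the target folder then depends on the value in each row.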

A user may optionally use a column as the index on a per-partition basis, e.g.:

  partition "aws_cloudtrail_log" "s3_bucket_us_east_1" {
    source "aws_s3_bucket" {
      connection  = connection.aws.account_a
      bucket      = "aws-cloudtrail-logs-account-a"
      file_layout = `AWSLogs/(%{DATA:org_id}/)?%{NUMBER:account_id}/CloudTrail/us-east-1/%{DATA}.json.gz`   
    }
    tp_index = "account_id"
  }

The user may create a "composite index" by using a function instead. The syntax should be the same as for the transform column argument:

  partition "aws_cloudtrail_log" "s3_bucket_us_east_1" {
    source "aws_s3_bucket" {
      connection  = connection.aws.account_a
      bucket      = "aws-cloudtrail-logs-account-a"
      file_layout = `AWSLogs/(%{DATA:org_id}/)?%{NUMBER:account_id}/CloudTrail/us-east-1/%{DATA}.json.gz`   
    }
    tp_index = "concat(account_id, '_', region)"
  }
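One way to read the two forms above is that both are SQL expressions evaluated per row: a bare column name selects that column, and a function call builds a composite value. The sketch below illustrates that reading under stated assumptions. It uses SQLite from the Python standard library purely as a stand-in SQL engine, the `logs` table and the `compute_tp_index` helper are hypothetical, and a `concat` function is registered by hand so the example does not depend on the SQLite version.

```python
import sqlite3


def compute_tp_index(rows, tp_index_expr: str):
    """Evaluate a tp_index expression for each (account_id, region) row.

    Hypothetical helper: the expression is interpolated directly into SQL,
    which is acceptable for a sketch but would need validation in real code.
    """
    conn = sqlite3.connect(":memory:")
    # Register concat() explicitly; it accepts any number of arguments.
    conn.create_function("concat", -1, lambda *args: "".join(str(a) for a in args))
    conn.execute("CREATE TABLE logs (account_id TEXT, region TEXT)")
    conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)
    cursor = conn.execute(f"SELECT {tp_index_expr} FROM logs ORDER BY rowid")
    values = [row[0] for row in cursor]
    conn.close()
    return values


# Illustrative rows only.
rows = [("111111111111", "us-east-1"), ("222222222222", "eu-west-1")]

# Single-column index, as in tp_index = "account_id"
print(compute_tp_index(rows, "account_id"))

# Composite index, as in tp_index = "concat(account_id, '_', region)"
print(compute_tp_index(rows, "concat(account_id, '_', region)"))
```

Under this reading, tailpipe compact would compute the expression for every existing row and then group rows by the resulting value to decide which tp_index folder each row belongs in.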
