Closed
Labels
area:providers, good first issue, kind:feature, provider:amazon, provider:google
Description
Add a flatten_structure parameter to GCSToS3Operator that removes directory structure from transferred files, uploading only the filename to the S3 destination path.
Use case/motivation
Current Behavior:
The GCSToS3Operator always preserves the full GCS object path (including the prefix) when uploading to S3, regardless of the keep_directory_structure setting.
For example:
GCSToS3Operator(
    gcs_bucket="my-bucket",
    prefix="data/2025/01/15/file.parquet",
    dest_s3_key="s3://target-bucket/processed/2025/01/15/"
)
# GCS files: "data/2025/01/15/file.parquet"
# Results in: s3://target-bucket/processed/2025/01/15/data/2025/01/15/file.parquet
#                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Unwanted path duplication!
It can lead to unwanted path duplication when users want to reorganize directory structures.
This makes it impossible to reorganize file structure during transfer without creating intermediate buckets or complex workarounds.
Desired Behavior:
With flatten_structure=True, only the filename would be appended to dest_s3_key, which removes the path duplication:
GCSToS3Operator(
    gcs_bucket="my-bucket",
    prefix="data/2025/01/15/file.parquet",
    dest_s3_key="s3://target-bucket/processed/2025/01/15/",
    flatten_structure=True
)
# GCS files: "data/2025/01/15/file.parquet"
# Results in: s3://target-bucket/processed/2025/01/15/file.parquet
#                                         ^^^^^^^^^^^^^^^^^^^^^^^^
# Clean, organized path!
Implementation:
def _transform_file_path(self, file_path: str) -> str:
    # Keep only the final path component when flattening is enabled.
    if self.flatten_structure:
        return os.path.basename(file_path)
    return file_path
This feature enables:
- Flexible path reorganization during cross-cloud transfers
- Cleaner S3 directory structures without GCS-specific paths
- Simplified integration with legacy systems expecting flat structures
- No need for post-processing scripts to rename or move objects
- Reduced storage complexity and improved performance in S3 LIST operations
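The path transformation proposed above can be sketched as a standalone function, outside the operator. This is only an illustration under stated assumptions: `build_dest_key` and `find_collisions` are hypothetical helper names, not part of Airflow, and the destination is treated as a bare key prefix without the `s3://bucket` part. The sketch uses `posixpath` rather than `os.path` because GCS object names use `/` as the separator on every platform. The second helper flags one thing flattening may introduce: two objects with the same filename would map to the same S3 key.

```python
from __future__ import annotations

import posixpath
from collections import defaultdict


def build_dest_key(dest_prefix: str, gcs_object: str, flatten_structure: bool = False) -> str:
    """Return the S3 key for one GCS object (hypothetical helper, not Airflow API)."""
    # With flattening, keep only the filename; otherwise keep the full object path.
    relative = posixpath.basename(gcs_object) if flatten_structure else gcs_object
    return posixpath.join(dest_prefix, relative)


def find_collisions(gcs_objects: list[str]) -> dict[str, list[str]]:
    """Group objects whose filenames would clash once the structure is flattened."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for obj in gcs_objects:
        by_name[posixpath.basename(obj)].append(obj)
    return {name: objs for name, objs in by_name.items() if len(objs) > 1}


if __name__ == "__main__":
    objects = ["data/2025/01/15/file.parquet", "data/2025/01/16/file.parquet"]
    print(build_dest_key("processed/2025/01/15/", objects[0]))
    print(build_dest_key("processed/2025/01/15/", objects[0], flatten_structure=True))
    print(find_collisions(objects))
```

Whether colliding filenames should raise an error, be skipped, or overwrite each other is a design choice the proposal leaves open.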
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct