Migrate gsutil usage to gcloud storage #4327
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -184,11 +184,9 @@ | |
| "if BUCKET_URI is None or BUCKET_URI.strip() == \"\" or BUCKET_URI == \"gs://\":\n", | ||
| " BUCKET_URI = f\"gs://{PROJECT_ID}-tmp-{now}-{str(uuid.uuid4())[:4]}\"\n", | ||
| " BUCKET_NAME = \"/\".join(BUCKET_URI.split(\"/\")[:3])\n", | ||
| - " ! gsutil mb -l {REGION} {BUCKET_URI}\n", | ||
| - "else:\n", | ||
| + " ! gcloud storage buckets create --location={REGION} {BUCKET_URI}\n", "else:\n", | ||
| " assert BUCKET_URI.startswith(\"gs://\"), \"BUCKET_URI must start with `gs://`.\"\n", | ||
| - " shell_output = ! gsutil ls -Lb {BUCKET_NAME} | grep \"Location constraint:\" | sed \"s/Location constraint://\"\n", | ||
| - " bucket_region = shell_output[0].strip().lower()\n", | ||
| + " shell_output = ! gcloud storage ls --full --buckets {BUCKET_NAME} | grep \"Location constraint:\" | sed \"s/Location constraint://\"\n", " bucket_region = shell_output[0].strip().lower()\n", | ||
| " if bucket_region != REGION:\n", | ||
| " raise ValueError(\n", | ||
| " \"Bucket region %s is different from notebook region %s\"\n", | ||
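The region check in this hunk pipes `gcloud storage ls --full --buckets` through `grep` and `sed` and then indexes `shell_output[0]`, which raises `IndexError` when no line matches. A minimal pure-Python sketch of the same check, assuming (as the `grep` pattern does) that the output contains a line such as `Location constraint: US-CENTRAL1`. The helper names and the sample text are hypothetical; the sample is not captured CLI output.

```python
from typing import Optional

LOCATION_PREFIX = "Location constraint:"


def parse_location_constraint(ls_output: str) -> Optional[str]:
    """Return the lowercased value of the 'Location constraint:' line, or None."""
    for line in ls_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(LOCATION_PREFIX):
            return stripped[len(LOCATION_PREFIX):].strip().lower()
    return None


def check_bucket_region(ls_output: str, region: str) -> None:
    """Raise ValueError if the bucket region is missing or differs from `region`."""
    bucket_region = parse_location_constraint(ls_output)
    if bucket_region is None:
        raise ValueError("No 'Location constraint:' line found in ls output")
    # Compare case-insensitively; the notebook's REGION is lowercase, e.g. us-central1.
    if bucket_region != region.lower():
        raise ValueError(
            "Bucket region %s is different from notebook region %s"
            % (bucket_region, region)
        )


# Hypothetical sample text shaped like the line the notebook greps for.
sample = "gs://my-bucket/ :\n\tStorage class: STANDARD\n\tLocation constraint: US-CENTRAL1\n"
check_bucket_region(sample, "us-central1")  # no exception: regions match
```

Returning `None` and raising a descriptive `ValueError` makes a missing line fail with a clear message instead of an `IndexError`.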
@@ -212,8 +210,7 @@ | |
| "\n", | ||
| "\n", | ||
| "# Provision permissions to the SERVICE_ACCOUNT with the GCS bucket\n", | ||
| - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.admin $BUCKET_NAME\n", | ||
| - "\n", | ||
| + "! gcloud storage buckets add-iam-policy-binding $BUCKET_NAME --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/storage.admin\n", "\n", | ||
| "! gcloud config set project $PROJECT_ID\n", | ||
| "! gcloud projects add-iam-policy-binding --no-user-output-enabled {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=\"roles/storage.admin\"\n", | ||
| "! gcloud projects add-iam-policy-binding --no-user-output-enabled {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=\"roles/aiplatform.user\"" | ||
@@ -361,9 +358,7 @@ | |
| "if dataset_validation_util.is_gcs_path(pretrained_model_id):\n", | ||
| " # Download tokenizer.\n", | ||
| " ! mkdir tokenizer\n", | ||
| - " ! gsutil cp {pretrained_model_id}/tokenizer.json ./tokenizer\n", | ||
| - " ! gsutil cp {pretrained_model_id}/config.json ./tokenizer\n", | ||
| - " tokenizer_path = \"./tokenizer\"\n", | ||
| + " ! gcloud storage cp {pretrained_model_id}/tokenizer.json ./tokenizer\n", " ! gcloud storage cp {pretrained_model_id}/config.json ./tokenizer\n", " tokenizer_path = \"./tokenizer\"\n", | ||
| " access_token = \"\"\n", | ||
| "else:\n", | ||
| " tokenizer_path = pretrained_model_id\n", | ||
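The two `cp` calls in this hunk could also be a single invocation, because `gcloud storage cp` accepts several source URLs followed by one destination. A sketch in plain Python, using `os.makedirs(..., exist_ok=True)` in place of `! mkdir tokenizer` so that re-running the cell does not fail on an existing directory. The function names are hypothetical, and nothing is executed until `download_tokenizer` is called.

```python
import os
import subprocess
from typing import List


def tokenizer_download_cmd(model_uri: str, dest_dir: str) -> List[str]:
    """Build argv that copies tokenizer.json and config.json in one gcloud call."""
    base = model_uri.rstrip("/")
    return [
        "gcloud", "storage", "cp",
        f"{base}/tokenizer.json",
        f"{base}/config.json",
        dest_dir,  # with several sources, the destination is treated as a directory
    ]


def download_tokenizer(model_uri: str, dest_dir: str = "./tokenizer") -> str:
    """Create dest_dir if needed, copy both files with gcloud, and return dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)  # unlike plain `mkdir`, safe to re-run
    subprocess.run(tokenizer_download_cmd(model_uri, dest_dir), check=True)
    return dest_dir
```

In the notebook this would read `tokenizer_path = download_tokenizer(pretrained_model_id)`. Passing an argv list to `subprocess.run` also avoids any shell interpretation of the URIs.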
@@ -1031,8 +1026,7 @@ | |
| "\n", | ||
| "delete_bucket = False # @param {type:\"boolean\"}\n", | ||
| "if delete_bucket:\n", | ||
| - " ! gsutil -m rm -r $BUCKET_NAME" | ||
| - ] | ||
| + " ! gcloud storage rm --recursive $BUCKET_NAME" ] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -533,8 +533,7 @@ | |
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| - "! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI" | ||
| - ] | ||
| + "! gcloud storage buckets create --location=$REGION --project=$PROJECT_ID $BUCKET_URI" ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
@@ -553,8 +552,7 @@ | |
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| - "! gsutil ls -al $BUCKET_URI" | ||
| - ] | ||
| + "! gcloud storage ls --all-versions --long $BUCKET_URI" ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
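Most hunks in this PR swap short gsutil flags for long gcloud storage flags: `mb -l`/`-p` become `buckets create --location`/`--project`, `ls -a`/`-l` become `--all-versions`/`--long`, `ls -L`/`-b` become `--full`/`--buckets`, `-r`/`-R` become `--recursive`, and the top-level `-m` is dropped throughout. A sketch that encodes only these mappings as lookup tables, for example to spot-check other notebooks. It is not a general gsutil converter, and `translate_gsutil` is a hypothetical name. The `copy` entry reflects that `copy` is gsutil's alias for `cp`.

```python
from typing import Dict, List

# gcloud storage subcommand for each gsutil command seen in this PR.
SUBCOMMANDS: Dict[str, List[str]] = {
    "mb": ["buckets", "create"],
    "ls": ["ls"],
    "cp": ["cp"],
    "copy": ["cp"],  # gsutil alias for cp
    "rm": ["rm"],
    "cat": ["cat"],
}

# Short gsutil flag -> long gcloud storage flag, per command (only what this PR uses).
RECURSIVE = {"-r": "--recursive", "-R": "--recursive"}
FLAGS: Dict[str, Dict[str, str]] = {
    "mb": {"-l": "--location", "-p": "--project"},
    "ls": {"-a": "--all-versions", "-l": "--long", "-L": "--full", "-b": "--buckets"},
    "cp": RECURSIVE,
    "copy": RECURSIVE,
    "rm": RECURSIVE,
    "cat": {},
}

# Long flags that take a value; emitted as --flag=VALUE.
VALUE_FLAGS = {"--location", "--project"}


def translate_gsutil(argv: List[str]) -> List[str]:
    """Translate gsutil args (without the leading 'gsutil') into gcloud storage argv.

    Raises KeyError for any command or flag outside the tables above.
    """
    args = list(argv)
    if args and args[0] == "-m":
        args = args[1:]  # the PR drops the top-level -m everywhere
    command, rest = args[0], args[1:]

    # Split combined short flags such as "-al" into "-a", "-l".
    expanded: List[str] = []
    for arg in rest:
        if arg.startswith("-") and not arg.startswith("--") and len(arg) > 2:
            expanded.extend(f"-{ch}" for ch in arg[1:])
        else:
            expanded.append(arg)

    out = ["gcloud", "storage", *SUBCOMMANDS[command]]
    flags = FLAGS[command]
    i = 0
    while i < len(expanded):
        arg = expanded[i]
        if arg.startswith("-"):
            long_flag = flags[arg]  # KeyError for unmapped flags
            if long_flag in VALUE_FLAGS:
                out.append(f"{long_flag}={expanded[i + 1]}")
                i += 2
                continue
            out.append(long_flag)
        else:
            out.append(arg)  # positional URL or path, passed through unchanged
        i += 1
    return out
```

Raising `KeyError` for anything unmapped is deliberate: an unknown flag fails loudly instead of being passed through to a CLI that may interpret it differently.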
@@ -623,10 +621,10 @@ | |
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator {BUCKET_URI}\n", | ||
| + "# Note: Migrating scripts using gsutil iam ch is more complex than get or set. You need to replace the single iam ch command with a series of gcloud storage buckets add-iam-policy-binding and/or gcloud storage buckets remove-iam-policy-binding commands, or replicate the read-modify-write loop.\n", | ||
| + "! gcloud storage buckets add-iam-policy-binding {BUCKET_URI} --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/storage.objectCreator\n", | ||
| "\n", | ||
| - "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer {BUCKET_URI}" | ||
| - ] | ||
| + "! gcloud storage buckets add-iam-policy-binding {BUCKET_URI} --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/storage.objectViewer" ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
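The migration note in this hunk explains that one `gsutil iam ch` can expand into several `add-iam-policy-binding` and `remove-iam-policy-binding` calls. A minimal sketch of that expansion, assuming each binding is written `MEMBER_TYPE:ID:ROLE` with a full predefined role name (as in this notebook) and that `-d` marks only the binding immediately after it as a removal. Role shorthands such as `objectViewer` and role-less removals are rejected rather than guessed. `iam_ch_to_gcloud` is a hypothetical name, and the account and bucket values are placeholders. Printing each command with `shlex.join` quotes any value that needs it, which also speaks to the review suggestion about quoting `--member` and `--role`.

```python
import shlex
from typing import List


def iam_ch_to_gcloud(args: List[str]) -> List[List[str]]:
    """Translate `gsutil iam ch` arguments into gcloud storage argv lists.

    `args` is everything after `gsutil iam ch`: bindings (each optionally
    preceded by -d) followed by one bucket URL.
    """
    *bindings, bucket = args
    commands: List[List[str]] = []
    remove_next = False
    for token in bindings:
        if token == "-d":
            remove_next = True  # applies to the next binding only
            continue
        member, sep, role = token.rpartition(":")  # role follows the last colon
        if not sep or not role.startswith("roles/"):
            raise ValueError(f"unsupported binding: {token!r}")
        verb = "remove-iam-policy-binding" if remove_next else "add-iam-policy-binding"
        commands.append([
            "gcloud", "storage", "buckets", verb, bucket,
            f"--member={member}", f"--role={role}",
        ])
        remove_next = False
    return commands


for cmd in iam_ch_to_gcloud([
    "serviceAccount:sa@my-project.iam.gserviceaccount.com:roles/storage.objectCreator",
    "gs://my-bucket",
]):
    print(shlex.join(cmd))  # shell-safe string for a `!` cell
```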
@@ -760,8 +758,7 @@ | |
| "PUBLIC_DATA_URI = \"gs://cloud-samples-data/vertex-ai/pipeline-deployment/datasets/oracle_retail/orders.csv\"\n", | ||
| "RAW_DATA_URI = f\"{BUCKET_URI}/{DATA_PATH}/raw/orders.csv\"\n", | ||
| "\n", | ||
| - "! gsutil -m cp -R $PUBLIC_DATA_URI $RAW_DATA_URI" | ||
| - ] | ||
| + "! gcloud storage cp --recursive $PUBLIC_DATA_URI $RAW_DATA_URI" ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
@@ -780,8 +777,7 @@ | |
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| - "! gsutil cat {RAW_DATA_URI} | head" | ||
| - ] | ||
| + "! gcloud storage cat {RAW_DATA_URI} | head" ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
@@ -1377,14 +1373,11 @@ | |
| " + \"/evaluation_metrics\"\n", | ||
| " )\n", | ||
| " if tf.io.gfile.exists(EXECUTE_OUTPUT):\n", | ||
| - " ! gsutil cat $EXECUTE_OUTPUT\n", | ||
| - " return EXECUTE_OUTPUT\n", | ||
| + " ! gcloud storage cat $EXECUTE_OUTPUT\n", " return EXECUTE_OUTPUT\n", | ||
| " elif tf.io.gfile.exists(GCP_RESOURCES):\n", | ||
| - " ! gsutil cat $GCP_RESOURCES\n", | ||
| - " return GCP_RESOURCES\n", | ||
| + " ! gcloud storage cat $GCP_RESOURCES\n", " return GCP_RESOURCES\n", | ||
| " elif tf.io.gfile.exists(EVAL_METRICS):\n", | ||
| - " ! gsutil cat $EVAL_METRICS\n", | ||
| - " return EVAL_METRICS\n", | ||
| + " ! gcloud storage cat $EVAL_METRICS\n", " return EVAL_METRICS\n", | ||
| "\n", | ||
| " return None\n", | ||
| "\n", | ||
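The `if`/`elif` chain in this hunk checks three candidate outputs in a fixed order and prints the first one that exists. The same lookup can be written as a loop over an ordered list. A sketch with a hypothetical `first_existing` helper whose existence check is injectable: the notebook could pass `tf.io.gfile.exists`, which understands `gs://` paths, while this example defaults to `os.path.exists` for local files.

```python
import os
from typing import Callable, Iterable, Optional


def first_existing(
    paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the first path for which exists(path) is true, or None."""
    for path in paths:
        if exists(path):
            return path
    return None
```

In the notebook, `output = first_existing([EXECUTE_OUTPUT, GCP_RESOURCES, EVAL_METRICS], tf.io.gfile.exists)` would pick the file, followed by a single `! gcloud storage cat {output}` when `output` is not `None`. Adding a fourth candidate then means extending the list rather than adding another branch.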
@@ -1455,8 +1448,7 @@ | |
| "# delete bucket\n", | ||
| "delete_bucket = True\n", | ||
| "if os.getenv(\"IS_TESTING\") or delete_bucket:\n", | ||
| - " ! gsutil -m rm -r $BUCKET_URI\n", | ||
| - "\n", | ||
| + " ! gcloud storage rm --recursive $BUCKET_URI\n", "\n", | ||
| "\n", | ||
| "# Remove local resorces\n", | ||
| "! rm -rf {KFP_COMPONENTS_PATH}\n", | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -303,8 +303,7 @@ | |
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| - "! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}" | ||
| - ] | ||
| + "! gcloud storage buckets create --location={LOCATION} --project={PROJECT_ID} {BUCKET_URI}" ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
@@ -655,8 +654,7 @@ | |
| ")\n", | ||
| "TRAINING_URI = f\"{BUCKET_URI}/model-monitoring/churn/churn_training.csv\"\n", | ||
| "\n", | ||
| - "! gsutil copy $PUBLIC_TRAINING_DATASET $TRAINING_URI\n", | ||
| - "\n", | ||
| + "! gcloud storage cp $PUBLIC_TRAINING_DATASET $TRAINING_URI\n", "\n", | ||
| "TRAINING_DATASET = ml_monitoring.spec.MonitoringInput(\n", | ||
| " gcs_uri=TRAINING_URI, data_format=\"csv\"\n", | ||
| ")" | ||
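Migration tools can leave `gsutil` calls behind, or emit a placeholder such as `# TODO: Command "gsutil copy" not found in migration guide` instead of a command (`copy` is gsutil's alias for `cp`). When that happens, the data copy silently disappears and a later cell reads a missing file. A small lint sketch that loads a notebook's JSON and reports code-cell lines that still invoke `gsutil` through a shell escape or carry that TODO marker. The function names and regexes are hypothetical and only target the patterns visible in this PR.

```python
import json
import re
from typing import Dict, List, Tuple

# A shell escape that still invokes gsutil, e.g. "! gsutil cp ..." or "x = ! gsutil ls".
GSUTIL_CALL = re.compile(r"!\s*gsutil\b")
# The placeholder comment the migration tool emits for unmapped commands.
TODO_MARKER = re.compile(r"TODO: Command .* not found in migration guide")


def find_migration_leftovers(notebook: Dict) -> List[Tuple[int, str]]:
    """Return (cell_index, line) pairs in code cells that need manual review."""
    hits: List[Tuple[int, str]] = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # nbformat stores source as a list of strings or as one string.
        lines = source if isinstance(source, list) else source.splitlines(keepends=True)
        for line in lines:
            if GSUTIL_CALL.search(line) or TODO_MARKER.search(line):
                hits.append((index, line.rstrip("\n")))
    return hits


def scan_notebook_file(path: str) -> List[Tuple[int, str]]:
    """Load an .ipynb file (plain JSON) and scan its code cells."""
    with open(path, encoding="utf-8") as f:
        return find_migration_leftovers(json.load(f))
```

Running `scan_notebook_file` over each changed notebook before merging would have surfaced the dropped copy command in this hunk.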
@@ -1123,8 +1121,7 @@ | |
| "FEATURE_ATTRIBUTION_BASELINE_DATASET = (\n", | ||
| " f\"{BUCKET_URI}/model-monitoring/churn/churn_no_ground_truth.jsonl\"\n", | ||
| ")\n", | ||
| - "! gsutil cp gs://cloud-samples-data/vertex-ai/model-monitoring/churn/churn_no_ground_truth.jsonl $FEATURE_ATTRIBUTION_BASELINE_DATASET" | ||
| - ] | ||
| + "! gcloud storage cp gs://cloud-samples-data/vertex-ai/model-monitoring/churn/churn_no_ground_truth.jsonl $FEATURE_ATTRIBUTION_BASELINE_DATASET" ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
For consistency with the command on line 628, and to prevent potential shell parsing issues, it's recommended to quote the values of the `--member` and `--role` arguments.