管理長時間執行的作業 (LRO)

批次處理方法呼叫會傳回「長時間執行的作業」,因為這類作業的完成時間較長,不適合做為 API 回應。這樣一來,在處理大量文件時,呼叫執行緒就不會保持開啟狀態。每次透過 API 或用戶端程式庫呼叫 projects.locations.processors.batchProcess 時,Document AI API 都會建立 LRO。LRO 會追蹤處理作業的狀態。

您可以使用 Document AI API 提供的作業方法,檢查長時間執行的作業狀態。您也可以列出輪詢取消 LRO。用戶端程式庫會呼叫非同步方法,在內部輪詢,以啟用回呼。(如果是 Python,則會啟用 await)。也提供逾時參數。在 .batchProcess 傳回的主要 LRO 中,系統會為每份文件建立 LRO (因為批次頁數限制遠高於同步 process 呼叫,且可能需要相當長的時間處理)。主要 LRO 結束時,系統會提供每個文件 LRO 的詳細狀態。

LRO 是在 Google Cloud 專案和位置層級管理。 向 API 發出要求時,請一併提供 Google Cloud project 和 LRO 執行的位置。

LRO 記錄會在 LRO 完成後保留約 30 天,因此您無法在該時間點後查看或列出 LRO。

取得長時間執行的作業詳細資料

下列範例說明如何取得 LRO 的詳細資料。

REST

如要取得 LRO 的狀態並查看詳細資料,請呼叫 projects.locations.operations.get 方法。

假設您在呼叫 projects.locations.processors.batchProcess 後收到下列回應:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID"
}

回應中的 name 值顯示 Document AI API 已建立名為 projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID 的 LRO。

您也可以列出長時間執行的作業,以擷取 LRO 名稱。

使用任何要求資料之前,請先替換以下項目:

  • PROJECT_ID:您的 Google Cloud 專案 ID。
  • LOCATION:執行 LRO 的位置,例如:
    • us - 美國
    • eu - 歐盟
  • OPERATION_ID:作業 ID。ID 是作業名稱的最後一個元素。舉例來說:
    • 作業名稱:projects/PROJECT_ID/locations/LOCATION/operations/bc4e1d412863e626
    • 作業 ID:bc4e1d412863e626

HTTP 方法和網址:

GET https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID

如要傳送要求,請選擇以下其中一個選項:

curl

執行下列指令:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"

PowerShell

執行下列指令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID" | Select-Object -Expand Content

您應該會收到如下的 JSON 回應:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.documentai.v1.BatchProcessMetadata",
    "state": "SUCCEEDED",
    "stateMessage": "Processed 1 document(s) successfully",
    "createTime": "TIMESTAMP",
    "updateTime": "TIMESTAMP",
    "individualProcessStatuses": [
      {
        "inputGcsSource": "INPUT_BUCKET_FOLDER/DOCUMENT1.ext",
        "status": {},
        "outputGcsDestination": "OUTPUT_BUCKET_FOLDER/OPERATION_ID/0",
        "humanReviewStatus": {
          "state": "ERROR",
          "stateMessage": "Sharded document protos are not supported for human review."
        }
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.documentai.v1.BatchProcessResponse"
  }
}

Go

詳情請參閱 Document AI Go API 參考說明文件

如要向 Document AI 進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。


package main

import (
	"context"

	documentai "cloud.google.com/go/documentai/apiv1"
	longrunningpb "cloud.google.com/go/longrunning/autogen/longrunningpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := documentai.NewDocumentProcessorClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &longrunningpb.GetOperationRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/longrunning/autogen/longrunningpb#GetOperationRequest.
	}
	resp, err := c.GetOperation(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Python

詳情請參閱 Document AI Python API 參考說明文件

如要向 Document AI 進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。


from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore
from google.longrunning.operations_pb2 import GetOperationRequest  # type: ignore

# TODO(developer): Uncomment these variables before running the sample.
# location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"
# operation_name = "YOUR_OPERATION_NAME"  # Format is "projects/{project_id}/locations/{location}/operations/{operation_id}"


def get_operation_sample(location: str, operation_name: str) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    request = GetOperationRequest(name=operation_name)

    # Make GetOperation request
    operation = client.get_operation(request=request)
    # Print the Operation Information
    print(operation)

列出長時間執行的作業

下列範例說明如何列出 Google Cloud 專案和位置中的 LRO。

REST

如要列出 Google Cloud 專案和位置中的 LRO,請呼叫 projects.locations.operations.list 方法。

使用任何要求資料之前,請先替換以下項目:

  • PROJECT_ID:您的 Google Cloud 專案 ID。
  • LOCATION:一或多個 LRO 的執行位置,例如:
    • us - 美國
    • eu - 歐盟
  • FILTER:(必要) 要傳回的 LRO 查詢。選項:
    • TYPE:(必填) 要列出的 LRO 類型。選項:
      • BATCH_PROCESS_DOCUMENTS
      • CREATE_PROCESSOR_VERSION
      • DELETE_PROCESSOR
      • ENABLE_PROCESSOR
      • DISABLE_PROCESSOR
      • UPDATE_HUMAN_REVIEW_CONFIG
      • HUMAN_REVIEW_EVENT
      • CREATE_LABELER_POOL
      • UPDATE_LABELER_POOL
      • DELETE_LABELER_POOL
      • DEPLOY_PROCESSOR_VERSION
      • UNDEPLOY_PROCESSOR_VERSION
      • DELETE_PROCESSOR_VERSION
      • SET_DEFAULT_PROCESSOR_VERSION
      • EVALUATE_PROCESSOR_VERSION
      • EXPORT_PROCESSOR_VERSION
      • UPDATE_DATASET
      • IMPORT_DOCUMENTS
      • ANALYZE_HITL_DATA
      • BATCH_MOVE_DOCUMENTS
      • RESYNC_DATASET
      • BATCH_DELETE_DOCUMENTS
      • DELETE_DATA_LABELING_JOB
      • EXPORT_DOCUMENTS

HTTP 方法和網址:

GET https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations?filter=TYPE=TYPE

如要傳送要求,請選擇以下其中一個選項:

curl

執行下列指令:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations?filter=TYPE=TYPE"

PowerShell

執行下列指令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations?filter=TYPE=TYPE" | Select-Object -Expand Content

您應該會收到如下的 JSON 回應:

{
  "operations": [
    {
      "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.documentai.v1.BatchProcessMetadata",
        "state": "SUCCEEDED",
        "stateMessage": "Processed 1 document(s) successfully",
        "createTime": "TIMESTAMP",
        "updateTime": "TIMESTAMP",
        "individualProcessStatuses": [
          {
            "inputGcsSource": "INPUT_BUCKET_FOLDER/DOCUMENT1.ext",
            "status": {},
            "outputGcsDestination": "OUTPUT_BUCKET_FOLDER/OPERATION_ID/0",
            "humanReviewStatus": {
              "state": "ERROR",
              "stateMessage": "Sharded document protos are not supported for human review."
            }
          }
        ]
      },
      "done": true,
      "response": {
        "@type": "type.googleapis.com/google.cloud.documentai.v1.BatchProcessResponse"
      }
    },
    ...
  ]
}

Go

詳情請參閱 Document AI Go API 參考說明文件

如要向 Document AI 進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。


package main

import (
	"context"

	documentai "cloud.google.com/go/documentai/apiv1"
	longrunningpb "cloud.google.com/go/longrunning/autogen/longrunningpb"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := documentai.NewDocumentProcessorClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &longrunningpb.ListOperationsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/longrunning/autogen/longrunningpb#ListOperationsRequest.
	}
	it := c.ListOperations(ctx, req)
	for {
		resp, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			// TODO: Handle error.
		}
		// TODO: Use resp.
		_ = resp

		// If you need to access the underlying RPC response,
		// you can do so by casting the `Response` as below.
		// Otherwise, remove this line. Only populated after
		// first call to Next(). Not safe for concurrent access.
		_ = it.Response.(*longrunningpb.ListOperationsResponse)
	}
}

Python

詳情請參閱 Document AI Python API 參考說明文件

如要向 Document AI 進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。


from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore
from google.longrunning.operations_pb2 import ListOperationsRequest  # type: ignore

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"

# Create filter in https://google.aip.dev/160 syntax
# For full options, refer to:
# https://cloud.google.com/document-ai/docs/long-running-operations#listing_long-running_operations
# operations_filter = 'YOUR_FILTER'

# Example:
# operations_filter = "TYPE=BATCH_PROCESS_DOCUMENTS AND STATE=RUNNING"


def list_operations_sample(
    project_id: str, location: str, operations_filter: str
) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # Format: `projects/{project_id}/locations/{location}`
    name = client.common_location_path(project=project_id, location=location)
    request = ListOperationsRequest(
        name=f"{name}/operations",
        filter=operations_filter,
    )

    # Make ListOperations request
    operations = client.list_operations(request=request)

    # Print the Operation Information
    print(operations)

輪詢長時間執行的作業

下列範例說明如何輪詢 LRO 的狀態。

REST

如要輪詢 LRO,請重複呼叫 projects.locations.operations.get 方法,直到作業完成為止。請在每次輪詢要求之間使用輪詢,例如每 10 秒就進行輪詢。

使用下方的任何要求資料之前,請先替換以下項目:

  • PROJECT_ID:您的 Google Cloud 專案 ID
  • LOCATION:執行 LRO 的位置
  • OPERATION_ID:LRO 的 ID

HTTP 方法和網址:

GET https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID
  

如要傳送要求,請選擇以下其中一個選項:

curl

執行下列指令,每 10 秒輪詢一次 LRO 狀態:

while true; \
    do curl -X GET \
    -H "Authorization: Bearer "$(gcloud auth print-access-token) \
    "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"; \
    sleep 10; \
    done

您應該每 10 秒就會收到一次 JSON 回應。作業執行期間,回應會包含 "state": "RUNNING"。 作業完成後,回應會包含 "state": "SUCCEEDED""done": true

PowerShell

執行下列指令,每十秒輪詢一次 LRO 的狀態:

$cred = gcloud auth print-access-token
$headers = @{ Authorization = "Bearer $cred" }

Do {
  Invoke-WebRequest `
    -Method Get `
    -Headers $headers `
    -Uri "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID" | Select-Object -Expand Content
sleep 10
}

while ($true)

您應該每 10 秒就會收到一次 JSON 回應。作業執行期間,回應會包含 "state": "RUNNING"。 作業完成後,回應會包含 "state": "SUCCEEDED""done": true

Python

詳情請參閱 Document AI Python API 參考說明文件

如要向 Document AI 進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。


from time import sleep

from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore
from google.longrunning.operations_pb2 import GetOperationRequest  # type: ignore

# TODO(developer): Uncomment these variables before running the sample.
# location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"
# operation_name = "YOUR_OPERATION_NAME"  # Format is "projects/{project_id}/locations/{location}/operations/{operation_id}"


def poll_operation_sample(location: str, operation_name: str) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    request = GetOperationRequest(name=operation_name)

    while True:
        # Make GetOperation request
        operation = client.get_operation(request=request)
        # Print the Operation Information
        print(operation)

        # Stop polling when Operation is no longer running
        if operation.done:
            break

        # Wait 10 seconds before polling again
        sleep(10)

取消長時間執行的作業

下列範例說明如何在 LRO 執行期間取消作業。

REST

如要取消 LRO,請呼叫 projects.locations.operations.cancel 方法。

使用任何要求資料之前,請先替換以下項目:

  • PROJECT_ID:您的 Google Cloud 專案 ID。
  • LOCATION:執行 LRO 的位置,例如:
    • us - 美國
    • eu - 歐盟
  • OPERATION_ID:作業 ID。ID 是作業名稱的最後一個元素。舉例來說:
    • 作業名稱:projects/PROJECT_ID/locations/LOCATION/operations/bc4e1d412863e626
    • 作業 ID:bc4e1d412863e626

HTTP 方法和網址:

POST https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID:cancel

如要傳送要求,請選擇以下其中一個選項:

curl

執行下列指令:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d "" \
"https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID:cancel"

PowerShell

執行下列指令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-Uri "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID:cancel" | Select-Object -Expand Content

您應該會收到如下的 JSON 回應:

{}
如果您嘗試取消已完成的作業,應該會收到下列錯誤訊息:
  "error": {
      "code": 400,
      "message": "Operation has completed and cannot be cancelled: 'PROJECT_ID/locations/LOCATION/operations/OPERATION_ID'.",
      "status": "FAILED_PRECONDITION"
  }

Go

詳情請參閱 Document AI Go API 參考說明文件

如要向 Document AI 進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。


package main

import (
	"context"

	documentai "cloud.google.com/go/documentai/apiv1"
	longrunningpb "cloud.google.com/go/longrunning/autogen/longrunningpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := documentai.NewDocumentProcessorClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &longrunningpb.CancelOperationRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/longrunning/autogen/longrunningpb#CancelOperationRequest.
	}
	err = c.CancelOperation(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
}

Python

詳情請參閱 Document AI Python API 參考說明文件

如要向 Document AI 進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。


from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore
from google.longrunning.operations_pb2 import CancelOperationRequest  # type: ignore

# TODO(developer): Uncomment these variables before running the sample.
# location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"
# operation_name = "YOUR_OPERATION_NAME"  # Format is "projects/{project_id}/locations/{location}/operations/{operation_id}"


def cancel_operation_sample(location: str, operation_name: str) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    request = CancelOperationRequest(name=operation_name)

    # Make CancelOperation request
    client.cancel_operation(request=request)
    print(f"Operation {operation_name} cancelled")