This repository was archived by the owner on Nov 11, 2022. It is now read-only.

BigQueryTableInserter java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep #451

@dhalperi

Description


Problem

Dataflow SDK for Java 1.7.0 introduced a performance regression in BigQueryIO.Write when using BigQuery's streaming inserts. Users may see the following stack trace in their logs:

java.lang.IllegalArgumentException: timeout value is negative
at java.lang.Thread.sleep(Native Method)
at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.insertAll(BigQueryTableInserter.java:287)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.flushRows(BigQueryIO.java:2446)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.finishBundle(BigQueryIO.java:2404)
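The top frame shows where the exception originates: `Thread.sleep` rejects any negative argument. The sketch below is an illustration only, not the change made in #448. `NegativeSleepDemo`, `FixedBackoff`, `STOP`, `sleepUnsafe` and `sleepOrStop` are hypothetical names, and the sketch assumes a retry backoff that signals "give up" with a -1 sentinel (the value the Google HTTP client's `BackOff.STOP` uses). Passing such a value straight to `Thread.sleep` reproduces the error; checking for the sentinel and clamping the delay avoids it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class NegativeSleepDemo {
    // Hypothetical "stop retrying" sentinel; not taken from the Dataflow SDK.
    static final long STOP = -1L;

    // Hypothetical backoff that hands out a fixed sequence of delays, then STOP.
    static final class FixedBackoff {
        private final Deque<Long> delays = new ArrayDeque<>();

        FixedBackoff(long... millis) {
            for (long m : millis) {
                delays.add(m);
            }
        }

        long nextBackOffMillis() {
            return delays.isEmpty() ? STOP : delays.poll();
        }
    }

    // Unsafe: passes the value straight to Thread.sleep, which throws
    // IllegalArgumentException when millis < 0.
    static void sleepUnsafe(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Safe: returns false on STOP instead of sleeping, and clamps any other
    // negative delay to zero. Returns true if the caller should retry.
    static boolean sleepOrStop(FixedBackoff backoff) {
        long millis = backoff.nextBackOffMillis();
        if (millis == STOP) {
            return false;
        }
        try {
            Thread.sleep(Math.max(0L, millis));
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        try {
            sleepUnsafe(STOP);
        } catch (IllegalArgumentException e) {
            System.out.println("unsafe: " + e.getMessage());
        }
        FixedBackoff backoff = new FixedBackoff(1, 2);
        int retries = 0;
        while (sleepOrStop(backoff)) {
            retries++;
        }
        System.out.println("safe: retried " + retries + " times");
    }
}
```

Running `main` prints the exception message from the unsafe call and then reports two retries from the safe loop. Returning `false` on interruption, after restoring the interrupt flag, is a design choice so that callers stop retrying when the thread is being shut down.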

Solution

The fix for this issue was merged into the GitHub master branch in #448. It will be included in the upcoming 1.8.0 release of the Dataflow SDK for Java, which is expected to be available by October 4th.

Impact

Streaming

When run on the Cloud Dataflow service in streaming mode, this issue will result in a slightly higher error rate. However, thanks to Dataflow's and BigQuery's retry policies, there will be no lost or duplicated data.

Most streaming jobs should see little impact from this regression. However, jobs near the BigQuery quota of 100K inserts/sec may cross that threshold because of the additional retries, which could cause the job to fall further and further behind. Users running such jobs are advised to stay on version 1.6.1 of the SDK for now, or to update existing 1.7.0 jobs back to the 1.6.1 SDK.
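A back-of-the-envelope estimate shows why retries matter near the quota. Assume, purely for illustration, that a job sends R = 95,000 inserts/sec and that a fraction f = 6% of them fail once and are retried; these numbers are hypothetical, not measurements from this issue. The load BigQuery sees is then approximately

$$
R_{\text{eff}} = R\,(1 + f) = 95{,}000 \times 1.06 = 100{,}700 \ \text{inserts/sec} > 100{,}000 \ \text{inserts/sec}
$$

Once the job is over quota, the excess inserts are rejected and must themselves be retried, which adds still more load.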

Batch

Normal batch usage of BigQueryIO.Write is not affected by this issue.

It is possible to encounter this issue in batch when using BigQueryIO.Write with per-window sharding, though the BigQueryIO.Write documentation already warns against this unsupported use. Batch pipelines that use this unsupported code may fail due to the increased error rate.
