
[destination-bigquery] When using the GCS staging loading mechanism, a CSV file containing the ASCII 0 character cannot be loaded into BigQuery #62134

@Amoodaa

Description

Connector Name

destination-bigquery

Connector Version

3.0.1

At what step did the error happen?

During the sync

Relevant information

The data contains an ASCII 0 character. I had been loading this same data for the past 3 weeks with the latest 2.x connector version, but after upgrading all dependencies (Airbyte to 1.7.1, destination-bigquery to 3.0.1), the full refresh now fails. I'm currently trying the BigQuery Write API instead of GCS staging, but I don't think it will work out.

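The BigQuery docs say that, by default, a CSV file containing the ASCII 0 (NULL) character can't be loaded, and that --preserve_ascii_control_characters=true must be set on the load job to allow it. The load job configuration in the log below has preserveAsciiControlCharacters=null: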
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#:~:text=Note%3A%20By%20default%2C%20if%20the%20CSV%20file%20contains%20the%20ASCII%200%20(NULL)%20character%2C%20you%20can%27t%20load%20the%20data%20into%20BigQuery.%20If%20you%20want%20to%20allow%20ASCII%200%20and%20other%20ASCII%20control%20characters%2C%20then%20set%20%2D%2Dpreserve_ascii_control_characters%3Dtrue%20to%20your%20load%20jobs.
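To confirm which rows carry the NUL bytes, here is a minimal Python sketch. It assumes the staged object (for example the .csv.gz named in the log output) has been copied locally. It scans the decompressed bytes for ASCII 0 and reports each affected physical line together with the byte offset to its start, loosely mirroring the line_number and byte_offset_to_start_of_line fields in BigQuery's error. `find_nul_lines` and `scan_gzip_csv` are hypothetical helpers, not part of Airbyte or the BigQuery client. Because the load job sets allowQuotedNewLines=true, a single record can span several physical lines, so this numbering may not match BigQuery's exactly.

```python
import gzip
import io
from typing import BinaryIO, Iterator, List, Tuple


def find_nul_lines(stream: BinaryIO) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, byte_offset_to_start_of_line, nul_count) for each
    physical line containing at least one ASCII 0 (NUL) byte.

    line_number is 1-based; offsets are measured in the decompressed data.
    """
    offset = 0
    for line_number, line in enumerate(stream, start=1):
        nul_count = line.count(b"\x00")
        if nul_count:
            yield line_number, offset, nul_count
        offset += len(line)


def scan_gzip_csv(path: str) -> List[Tuple[int, int, int]]:
    """Scan a local copy of a gzip-compressed CSV (hypothetical local path)."""
    with gzip.open(path, "rb") as f:
        return list(find_nul_lines(f))


if __name__ == "__main__":
    # Self-contained demo: a tiny in-memory gzip CSV with one bad cell.
    raw = b'id,reference\n1,"ok"\n2,"Pen Test ref\x00"\n'
    with gzip.open(io.BytesIO(gzip.compress(raw)), "rb") as f:
        for hit in find_nul_lines(f):
            print(hit)  # prints (3, 20, 1)
```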

airbyte_debug_logs.txt

Relevant log output

2025-06-28 09:53:59 replication-orchestrator INFO Failures: [ {
  "failureOrigin" : "destination",
  "failureType" : "system_error",
  "internalMessage" : "java.lang.RuntimeException: Failed to load CSV data from gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz to table airbyte_internal.airbyte_catco_dbrecords8360cc3a9463aa60e3102abea0255eb2",
  "externalMessage" : "Failed to load CSV data from gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz to table airbyte_internal.airbyte_catco_dbrecords8360cc3a9463aa60e3102abea0255eb2",
  "metadata" : {
    "attemptNumber" : 4,
    "jobId" : 9,
    "from_trace_message" : true,
    "connector_command" : "write"
  },
  "stacktrace" : "java.lang.RuntimeException: Failed to load CSV data from gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz to table airbyte_internal.airbyte_catco_dbrecords8360cc3a9463aa60e3102abea0255eb2\n\tat io.airbyte.integrations.destination.bigquery.write.bulk_loader.BigQueryBulkLoader.load(BigQueryBulkLoader.kt:64)\n\tat io.airbyte.integrations.destination.bigquery.write.bulk_loader.BigQueryBulkLoader.load(BigQueryBulkLoader.kt:33)\n\tat io.airbyte.cdk.load.pipeline.db.BulkLoaderTableLoader.accept(BulkLoaderTableLoader.kt:51)\n\tat io.airbyte.cdk.load.pipeline.db.BulkLoaderTableLoader.accept(BulkLoaderTableLoader.kt:23)\n\tat io.airbyte.cdk.load.task.internal.LoadPipelineStepTask$execute$$inlined$fold$1.emit(Reduce.kt:225)\n\tat kotlinx.coroutines.flow.FlowKt__ChannelsKt.emitAllImpl$FlowKt__ChannelsKt(Channels.kt:33)\n\tat kotlinx.coroutines.flow.FlowKt__ChannelsKt.access$emitAllImpl$FlowKt__ChannelsKt(Channels.kt:1)\n\tat kotlinx.coroutines.flow.FlowKt__ChannelsKt$emitAllImpl$1.invokeSuspend(Channels.kt)\n\tat kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)\n\tat kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)\n\tat kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:124)\n\tat kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717)\n\tat kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704)\nCaused by: com.google.cloud.bigquery.BigQueryException: An error occurred during execution of job: Job{job=JobId{project=cat-copwa, 
job=63e9d279-b113-413a-b92b-4cddc25fe29e, location=europe-west1}, status=JobStatus{state=RUNNING, error=null, executionErrors=null}, statistics=LoadStatistics{creationTime=1751093530755, endTime=null, startTime=1751093530839, numChildJobs=null, parentJobId=null, scriptStatistics=null, reservationUsage=null, transactionInfo=null, sessionInfo=null, totalSlotMs=null, inputBytes=null, inputFiles=null, outputBytes=null, outputRows=null, badRecords=null}, userEmail=airbyte-bq-write@cat-copwa.iam.gserviceaccount.com, etag=MnI4dyfPavAXOJiLHwqmsA==, generatedId=cat-copwa:europe-west1.63e9d279-b113-413a-b92b-4cddc25fe29e, selfLink=https://bigquery.googleapis.com/bigquery/v2/projects/cat-copwa/jobs/63e9d279-b113-413a-b92b-4cddc25fe29e?location=europe-west1, configuration=LoadJobConfiguration{type=LOAD, destinationTable=GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=airbyte_internal, projectId=cat-copwa, tableId=airbyte_catco_dbrecords8360cc3a9463aa60e3102abea0255eb2}}, decimalTargetTypes=null, destinationEncryptionConfiguration=null, createDisposition=null, writeDisposition=WRITE_APPEND, formatOptions=CsvOptions{type=CSV, allowJaggedRows=true, allowQuotedNewLines=true, encoding=null, fieldDelimiter=null, nullMarker=null, quote=null, skipLeadingRows=1, preserveAsciiControlCharacters=null}, nullMarker=\\N, maxBadRecords=null, schema=Schema{fields=[Field{name=_airbyte_raw_id, type=STRING, mode=REQUIRED, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=_airbyte_extracted_at, type=TIMESTAMP, mode=REQUIRED, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=_airbyte_meta, type=JSON, mode=REQUIRED, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, 
Field{name=_airbyte_generation_id, type=INTEGER, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=__v, type=NUMERIC, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=_id, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=notes, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=stage, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=state, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=images, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=labels, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=damaged, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=actionAt, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=closedAt, type=STRING, mode=null, description=null, policyTags=null, 
maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=location, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=lockedBy, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=metadata, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=reviewAt, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=assetType, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=createdAt, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=createdBy, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=decisions, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=deletedBy, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=flaggedAt, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, 
defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=reference, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=responses, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=startedAt, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=updatedAt, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=updatedBy, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=workspace, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=attributes, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=lastAction, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=prevention, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=reasonCode, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, 
rangeElementType=null}, Field{name=recordType, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=subscribed, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=accessGroup, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=lockedUntil, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=submittedAt, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=locationName, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=reviewStatus, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=stageChanges, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=linkedRecords, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=rulesActioned, type=JSON, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, 
Field{name=userReference, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=_ab_cdc_cursor, type=INTEGER, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=clientResponse, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=timeToDecision, type=NUMERIC, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=submittedByRole, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=submittedByEmail, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=_ab_cdc_deleted_at, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}, Field{name=_ab_cdc_updated_at, type=STRING, mode=null, description=null, policyTags=null, maxLength=null, scale=null, precision=null, defaultValueExpression=null, collation=null, rangeElementType=null}]}, ignoreUnknownValue=null, sourceUris=[gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz], fileSetSpecType=null, columnNameCharacterMap=null, schemaUpdateOptions=null, autodetect=null, timePartitioning=null, clustering=null, useAvroLogicalTypes=null, labels=null, jobTimeoutMs=600000, 
rangePartitioning=null, hivePartitioningOptions=null, referenceFileSchemaUri=null, connectionProperties=null, createSession=null}}, \n For more details see Big Query Error collection: BigQueryError{reason=invalid, location=null, message=Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 1095765; errors: 5; max bad: 0; error percent: 0},\n BigQueryError{reason=invalid, location=gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz, message=Error while reading data, error message: Bad character (ASCII 0) encountered.; line_number: 90969 byte_offset_to_start_of_line: 172499870 column_index: 24 column_name: \"reference\" column_type: STRING value: \"Pen Test ref 6756...\" File: gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz},\n BigQueryError{reason=invalid, location=gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz, message=Error while reading data, error message: Bad character (ASCII 0) encountered.; line_number: 91402 byte_offset_to_start_of_line: 173242187 column_index: 39 column_name: \"locationName\" column_type: STRING value: \"..\\\\..\\\\..\\\\..\\\\..\\\\.....\" File: gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz},\n BigQueryError{reason=invalid, location=gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz, message=Error while reading data, error message: Bad character (ASCII 0) encountered.; line_number: 91408 byte_offset_to_start_of_line: 173250722 column_index: 39 column_name: 
\"locationName\" column_type: STRING value: \"../../../../../.....\" File: gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz},\n BigQueryError{reason=invalid, location=gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz, message=Error while reading data, error message: Bad character (ASCII 0) encountered.; line_number: 91963 byte_offset_to_start_of_line: 174365931 column_index: 39 column_name: \"locationName\" column_type: STRING value: \"..\\\\..\\\\..\\\\..\\\\..\\\\.....\" File: gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz},\n BigQueryError{reason=invalid, location=gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz, message=Error while reading data, error message: Bad character (ASCII 0) encountered.; line_number: 91969 byte_offset_to_start_of_line: 174373909 column_index: 39 column_name: \"locationName\" column_type: STRING value: \"../../../../../.....\" File: gs://cat-co-airbyte-bq-staging/sync/airbyte_mdb_euw1_cat-co_db/entries/2025/06/28/06/4acb80c2-c1fe-45d9-aa93-60335213444c2025_06_28_1751093320392_0.csv.gz}:\n\tat io.airbyte.integrations.destination.bigquery.BigQueryUtils.waitForJobFinish(BigQueryUtils.kt:223)\n\tat io.airbyte.integrations.destination.bigquery.write.bulk_loader.BigQueryBulkLoader.load(BigQueryBulkLoader.kt:62)\n\t... 15 more\nCaused by: com.google.cloud.bigquery.BigQueryException: Error while reading data, error message: CSV processing encountered too many errors, giving up. 
Rows: 1095765; errors: 5; max bad: 0; error percent: 0\n\tat com.google.cloud.bigquery.Job.reload(Job.java:471)\n\tat com.google.cloud.bigquery.Job.waitForInternal(Job.java:290)\n\tat com.google.cloud.bigquery.Job.waitFor(Job.java:202)\n\tat io.airbyte.integrations.destination.bigquery.BigQueryUtils.waitForJobFinish(BigQueryUtils.kt:195)\n\t... 16 more\n",
  "timestamp" : 1751093637153
}
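To summarize which columns are affected across a large error collection like the one above, here is a minimal Python sketch. It extracts line_number, column_index and column_name from the "Bad character (ASCII 0) encountered." messages with a regular expression. The pattern is an assumption inferred only from the message format visible in this log, not from a documented BigQuery error schema. It accepts both the JSON-escaped (\") and unescaped quote forms. `parse_nul_errors` and `NulError` are hypothetical names.

```python
import re
from collections import Counter
from typing import List, NamedTuple


class NulError(NamedTuple):
    line_number: int
    column_index: int
    column_name: str


# Assumed pattern, inferred from the messages in this log only.
NUL_ERROR_RE = re.compile(
    r"Bad character \(ASCII 0\) encountered\.; line_number: (\d+) "
    r"byte_offset_to_start_of_line: \d+ column_index: (\d+) "
    r'column_name: \\?"(\w+)\\?"'
)


def parse_nul_errors(text: str) -> List[NulError]:
    """Extract every ASCII 0 error (line, column index, column name) from a log blob."""
    return [
        NulError(int(line), int(col), name)
        for line, col, name in NUL_ERROR_RE.findall(text)
    ]


if __name__ == "__main__":
    # Abbreviated copies of two messages from the stacktrace above.
    log = (
        "Bad character (ASCII 0) encountered.; line_number: 90969 "
        "byte_offset_to_start_of_line: 172499870 column_index: 24 "
        'column_name: \\"reference\\" column_type: STRING ...\n'
        "Bad character (ASCII 0) encountered.; line_number: 91402 "
        "byte_offset_to_start_of_line: 173242187 column_index: 39 "
        'column_name: \\"locationName\\" column_type: STRING ...'
    )
    errors = parse_nul_errors(log)
    print(Counter(e.column_name for e in errors))
```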

Contribute

  • Yes, I want to contribute
