[Bug] Bloom Filter file pruning misbehavior on Linux with Iceberg 1.9.x (Spark 3.5.3) #13635

@wfxxh

Apache Iceberg version

1.9.2 (latest release)

Query engine

Spark

Please describe the bug 🐞

🧭 Problem Summary

My environment:

  • iceberg: 1.7.x to 1.9.x
  • spark: 3.5.3
  • jdk: openjdk version "17.0.2" 2022-01-18
  • linux: 5.12.5-1.el7.elrepo.x86_64

When querying an Iceberg table partitioned by (year, month) with a Parquet bloom filter enabled on a STRING column (resource_id), the query returns 0 rows on Linux (Iceberg 1.9.x + Spark 3.5.3). The same query returns the correct rows on Windows, and on Linux after downgrading to Iceberg 1.7.x.

This discrepancy leads to incorrect query results and is platform-dependent.


📦 Table DDL

CREATE TABLE IF NOT EXISTS iceberg_catalog.test.xxh (
    date_time    TIMESTAMP,
    operate_type INT,
    resource_id  STRING,
    year         INT,
    month        INT,
    day          INT
)
USING iceberg
PARTITIONED BY (year, month)
TBLPROPERTIES (
    'write.distribution-mode' = 'hash',
    'write.metadata.delete-after-commit.enabled' = 'true',
    'write.metadata.previous-versions-max' = '2',
    'write.parquet.bloom-filter-enabled.column.resource_id' = 'true',
    'write.parquet.compression-codec' = 'zstd',
    'write.target-file-size-bytes' = '4294967296'
);
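
The exact query appears only in the screenshots. As a rough sketch of the kind of query described in the summary (an equality filter on the bloom-filtered column, optionally combined with the partition columns), it would look something like the following. The literal values are placeholders, not taken from the attached data.

```sql
-- Hypothetical reproduction query; the literals below are placeholders.
-- The equality predicate on resource_id is the one that can be evaluated
-- against the Parquet bloom filter written for that column.
SELECT *
FROM iceberg_catalog.test.xxh
WHERE year = 2025                       -- placeholder partition value
  AND month = 7                         -- placeholder partition value
  AND resource_id = 'some-resource-id'; -- placeholder id
```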


On Linux, Iceberg v1.7.2 returns the correct result, but v1.9.2 does not:

Image

On Windows, Iceberg v1.9.2 returns the correct result:

Image

With a Spark version later than 3.5.3 and Iceberg 1.7.1, the query fails with a different error:

Image

When I use this code, the query works correctly. But I know how to set vectorization-enabled in SQL:

Image
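
For reference, assuming the code in the screenshot works by disabling vectorized Parquet reads, the same setting can be applied from SQL in two ways. This is a sketch of the configuration only; the text here does not confirm whether it avoids the bloom filter problem in every case.

```sql
-- Option 1: session-level override (Iceberg's Spark SQL configuration).
SET spark.sql.iceberg.vectorization.enabled = false;

-- Option 2: table-level default (Iceberg read property on the table).
ALTER TABLE iceberg_catalog.test.xxh
SET TBLPROPERTIES ('read.parquet.vectorization.enabled' = 'false');
```

The session setting affects only the current Spark session, while the table property changes the default for every reader of the table.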

this is my data file

data.zip

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Labels: bug (Something isn't working)
