Labels: bug (Something isn't working)
Description
Apache Iceberg version
1.7.1
Query engine
Spark
Please describe the bug 🐞
Hello, I'd like some help with a compaction OOM in Spark. Calling rewrite_data_files on a relatively large partition in an AWS Glue 5.0 job (Spark 3.5.4, Iceberg 1.7.1) with 2 R.8X workers (256 GB each) consistently fails with java.lang.OutOfMemoryError. This looks similar to the closed issue #10054.
CALL system.rewrite_data_files(
  table => 'some_db.some_table',
  where => "(partition_id = 'some_partition_id')",
  strategy => 'binpack',
  options => map(
    'partial-progress.enabled', 'true',
    'rewrite-job-order', 'bytes-asc',
    'target-file-size-bytes', '134217728',
    'partial-progress.max-commits', '50',
    'partial-progress.max-failed-commits', '1000',
    'max-file-group-size-bytes', '536870912',
    'max-concurrent-file-group-rewrites', '1',
    'min-input-files', '10'
  )
)
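
(A pre-step I am considering, sketched under the assumption that it applies here: compacting the 271 position delete files first with the rewrite_position_delete_files procedure. Note it only rewrites position deletes, while the OOM below happens while loading equality deletes, so it may not be sufficient.)

-- Hedged sketch: compact position delete files before the data-file rewrite.
-- rewrite_position_delete_files does not touch equality deletes.
CALL system.rewrite_position_delete_files(
  table => 'some_db.some_table',
  options => map('rewrite-all', 'true')
)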
Partition:
  partition = Row(partition_id='some_partition_id')
  spec_id = 0
  record_count = 45984621
  file_count = 589
  position_delete_record_count = 5
  position_delete_file_count = 271
  equality_delete_record_count = 17
  equality_delete_file_count = 585
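
(For reference, these stats come from the partitions metadata table; a query along these lines, assuming the standard Spark metadata-table syntax with names adapted to this table, reproduces them:)

SELECT
  partition,
  record_count,
  file_count,
  position_delete_record_count,
  position_delete_file_count,
  equality_delete_record_count,
  equality_delete_file_count
FROM some_db.some_table.partitions
WHERE partition.partition_id = 'some_partition_id'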
Stacktrace:
WARN 2025-07-24T20:57:01,498 226477 org.apache.spark.scheduler.TaskSetManager [task-result-getter-1] 72 Lost task 2.0 in stage 1.0 (TID 3) (172.34.121.38 executor 1): java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.HashMap.resize(HashMap.java:702)
at java.base/java.util.HashMap.putVal(HashMap.java:661)
at java.base/java.util.HashMap.put(HashMap.java:610)
at java.base/java.util.HashSet.add(HashSet.java:221)
at org.apache.iceberg.util.StructLikeSet.add(StructLikeSet.java:102)
at org.apache.iceberg.util.StructLikeSet.add(StructLikeSet.java:32)
at org.apache.iceberg.relocated.com.google.common.collect.Iterators.addAll(Iterators.java:366)
at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:333)
at org.apache.iceberg.data.BaseDeleteLoader.loadEqualityDeletes(BaseDeleteLoader.java:110)
at org.apache.iceberg.data.DeleteFilter.applyEqDeletes(DeleteFilter.java:190)
at org.apache.iceberg.data.DeleteFilter.eqDeletedRowFilter(DeleteFilter.java:220)
at org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader$ColumnBatchLoader.applyEqDelete(ColumnarBatchReader.java:230)
at org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader$ColumnBatchLoader.loadDataToColumnBatch(ColumnarBatchReader.java:104)
at org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader.read(ColumnarBatchReader.java:72)
at org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader.read(ColumnarBatchReader.java:44)
at org.apache.iceberg.parquet.VectorizedParquetReader$CachedFileIterator.next(VectorizedParquetReader.java:272)
at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:171)
at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1$$Lambda$1336/0x00007f6230b92000.apply(Unknown Source)
at scala.Option.exists(Option.scala:376)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:97)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:35)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hasNext(Unknown Source)
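
The stack shows BaseDeleteLoader materializing every equality delete for the task into an in-memory StructLikeSet. One session-level knob I am aware of is the Iceberg executor delete cache (the spark.sql.iceberg.executor-cache.* properties); the sketch below assumes those properties exist in 1.7.x and influence this code path, which I have not confirmed:

-- Assumption: capping or disabling the executor delete cache reduces heap
-- pressure on this path; the size value below is illustrative only.
SET spark.sql.iceberg.executor-cache.enabled = false
-- or, keep the cache but bound its total size:
SET spark.sql.iceberg.executor-cache.max-total-size = 134217728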
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time
Reported by jamesdrabinsky