Build the data like this: df.repartition(20, col("id")).write.parquet(path)
Then, for a filter such as filter(col("id") === 123), we can prune 19 of the 20 repartitioned files, without any overhead.
It is very simple to implement: we don't need to create an index. We just call the same hash function that Dataset#repartition uses and pick the matching file in listFilesWithIndexSupport.
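To make the idea concrete, here is a minimal sketch of the lookup, not the actual patch. It assumes that repartition(n, col) assigns rows with Murmur3Hash (default seed) followed by Pmod, as Spark's HashPartitioning does, and that each output file name starts with part-<5-digit partition id>-, which depends on the Spark version and writer. The object RepartitionPruningSketch and the helpers partitionIdFor and pruneFiles are hypothetical names for illustration only; how this hooks into listFilesWithIndexSupport is not shown.

```scala
import org.apache.spark.sql.catalyst.expressions.{Literal, Murmur3Hash, Pmod}

object RepartitionPruningSketch {

  // Hypothetical helper: the partition id a hash repartition on a single
  // column would assign to `value`. Assumes Murmur3Hash + Pmod, mirroring
  // HashPartitioning. The literal's type must match the column type
  // (an Int and a Long with the same value hash differently).
  def partitionIdFor(value: Any, numPartitions: Int): Int = {
    val expr = Pmod(new Murmur3Hash(Seq(Literal(value))), Literal(numPartitions))
    // All children are literals, so no input row is needed.
    expr.eval().asInstanceOf[Int]
  }

  // Hypothetical helper: keep only the files whose name carries the given
  // partition id. Assumes names of the form "part-00005-<suffix>".
  def pruneFiles(files: Seq[String], partitionId: Int): Seq[String] = {
    val prefix = f"part-$partitionId%05d-"
    files.filter(_.split('/').last.startsWith(prefix))
  }
}
```

For filter(col("id") === 123) on data written with 20 partitions, the candidate files would be pruneFiles(allFiles, partitionIdFor(123, 20)), where allFiles is the listing of the output directory. A real implementation would also need to confirm that the data was in fact written with this partitioning, and should fall back to scanning all files otherwise.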
I am almost done with this, but I have a small concern about the entry point that enables it. (Currently we create the index when we find that none exists; there seems to be no perfect way to inject this, unless we implement a new MetastoreSupport.)
If you are OK with this feature, I can submit a PR first. Looking forward to your advice.