Description
Thanks for creating this project to index Parquet tables. I tried to create the index metastore in S3 with the following setting:
spark.sql.index.metastore=s3a://xx/yy/index_metastore
and got this exception:
java.lang.IllegalArgumentException: Wrong FS: s3a://xxx/index_metastore/catalog/parquet/s3a/yyy/part-f-00002, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:518)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:694)
at org.apache.spark.sql.execution.datasources.parquet.ParquetStatisticsRDD$$anonfun$1.apply(ParquetStatisticsRDD.scala:141)
at org.apache.spark.sql.execution.datasources.parquet.ParquetStatisticsRDD$$anonfun$1.apply(ParquetStatisticsRDD.scala:138)
I updated https://github.com/lightcopy/parquet-index/blob/master/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetStatisticsRDD.scala#L122 to:
val filterURI = new URI(configuration.get(ParquetMetastoreSupport.FILTER_DIR))
val fs = FileSystem.get(filterURI, configuration)
This seems to work fine, but I wanted to let you know and ask whether it is the right fix.
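For context, the stack trace goes through RawLocalFileSystem.mkdirs, which suggests the file system was resolved from the default (file:///) rather than from the s3a:// path. Below is a minimal sketch of the difference, assuming hadoop-common is on the classpath and no core-site.xml overrides fs.defaultFS. The object name FilterDirFileSystem, the helper fileSystemFor, and the file:///tmp/index_metastore path are hypothetical and not part of parquet-index. It uses Path.getFileSystem, which resolves by the path's own URI in the same way as FileSystem.get(uri, configuration).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object FilterDirFileSystem {
  // Hypothetical helper: resolve the file system from the directory's own
  // URI scheme (s3a://, hdfs://, file://) instead of from fs.defaultFS.
  def fileSystemFor(dir: String, conf: Configuration): FileSystem =
    new Path(dir).getFileSystem(conf)

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // FileSystem.get(conf) ignores the target path and returns whatever
    // fs.defaultFS points to, which is the local file system unless overridden.
    val defaultFs = FileSystem.get(conf)
    println(s"default file system: ${defaultFs.getUri}")

    // Resolving from the path picks the implementation registered for its scheme.
    // A local path is used here so the sketch runs without S3 credentials.
    val fs = fileSystemFor("file:///tmp/index_metastore", conf)
    println(s"file system for path: ${fs.getUri}")
  }
}
```

With an s3a:// directory the same helper should return the S3A file system, provided hadoop-aws and credentials are configured, so mkdirs would then run against the bucket instead of failing the local checkPath check.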