Archive

Archive for August, 2020

Spark error – Parquet does not support decimal. See HIVE-6384

August 5, 2020 1 comment

 
I was creating a Hive table in Databricks Notebook from a Parquet file located in Azure Data Lake store by following command:

val df = spark.read.parquet(
 "abfss://adlsstore@MyStorageAccount.dfs.core.windows.net/x/y/z/*.parquet")

df.write.mode("overwrite").saveAsTable("tblOrderDetail")

But I was getting following error:

warning: there was one feature warning; re-run with -feature for details
java.lang.UnsupportedOperationException: Parquet does not support decimal. See HIVE-6384

 
As per the above error it relates to some Hive version conflict, so I tried checking the Hive version by running below command and found that it is pointing to an old version (0.13.0). This version of Hive metastore did not support the BINARY datatypes for parquet formatted files.

spark.conf.get("spark.sql.hive.metastore.version")


 

Also as per this Jira Task on HIVE-6384 the support for multiple datatypes was implemented for Parquet SerDe in Hive 1.2.0 version.

 
So to update the Hive metastore to the current version you just need to add below commands in the configuration of the cluster you are using.

Click on “Clusters” –> click “Edit” on the top –> expand “Advanced Options” –> under “Spark” tab and “Spark Config” box add the below two commands:

spark.sql.hive.metastore.version 1.2.1
spark.sql.hive.metastore.jars builtin

You just need to restart the cluster so that the new settings are in use.
 


 

Some similar errors:
– Parquet does not support date
– Parquet does not support timestamp