[spark] FIXED_LEN_BYTE_ARRAY error

Study/spark 2024. 10. 16. 10:19
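When reading a parquet file, you may run into the error shown below.

It can occasionally occur when the parquet file contains a column of decimal type.

When a parquet file is read, the vectorized parquet reader is enabled and decodes the data as a binary type. If a decimal-type column is present at that point, the error occurs.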

error message

org.apache.spark.SparkException: Task failed while writing rows.
Caused by: com.databricks.sql.io.FileReadException: Error while reading file s3://bucket-name/landing/edw/xxx/part-xxxx-tid-c00.snappy.parquet. Parquet column cannot be converted. Column: [Col1], Expected: DecimalType(10,0), Found: FIXED_LEN_BYTE_ARRAY
Caused by: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException.
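When the failing job reads many files, it helps to pull the column name and the two types out of the message to see which column is affected. The snippet below is a minimal sketch that assumes the message keeps the same shape as the one quoted above. `parse_convert_error` and `ConvertError` are hypothetical names used only for illustration and are not part of Spark.

```python
import re
from typing import NamedTuple, Optional

# Pattern modeled on the single message quoted in this post; other Spark or
# Databricks versions may word the message differently (assumption).
_PATTERN = re.compile(
    r"Column: \[(?P<column>[^\]]+)\], "
    r"Expected: (?P<expected>.+?), "
    r"Found: (?P<found>\w+)"
)


class ConvertError(NamedTuple):
    column: str    # column name as reported by Spark
    expected: str  # Spark SQL type the reader expected
    found: str     # Parquet physical type found in the file


def parse_convert_error(message: str) -> Optional[ConvertError]:
    """Return the column and types named in the message, or None if no match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return ConvertError(match["column"], match["expected"], match["found"])


sample = (
    "Parquet column cannot be converted. Column: [Col1], "
    "Expected: DecimalType(10,0), Found: FIXED_LEN_BYTE_ARRAY"
)
print(parse_convert_error(sample))
```

If `expected` starts with `DecimalType` and `found` is `FIXED_LEN_BYTE_ARRAY`, the message matches the case described in this post.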

solution

Disable the vectorized parquet reader option.

spark.conf.set("spark.sql.parquet.enableVectorizedReader","false")
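The setting above stays off for the rest of the session. If you would rather turn it off only for the affected read, one option is to wrap it in a context manager that restores the previous value afterwards. This is a minimal sketch, not an official pattern: `vectorized_reader_disabled` is a hypothetical helper name, and it assumes `spark` is an existing SparkSession whose `conf` supports `get(key, default)` and `set(key, value)`.

```python
from contextlib import contextmanager

VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader"


@contextmanager
def vectorized_reader_disabled(spark):
    """Turn off the vectorized Parquet reader inside the with-block only."""
    # Remember the current value; "true" is Spark's default for this key.
    previous = spark.conf.get(VECTORIZED_READER_KEY, "true")
    spark.conf.set(VECTORIZED_READER_KEY, "false")
    try:
        yield spark
    finally:
        # Restore the earlier value even if the read or write fails.
        spark.conf.set(VECTORIZED_READER_KEY, previous)


# Usage (paths are placeholders, not taken from the post):
# with vectorized_reader_disabled(spark):
#     df = spark.read.parquet("s3://bucket-name/path/to/parquet")
#     df.write.format("delta").save("s3://bucket-name/path/to/delta")
```

Because Spark evaluates reads lazily, the action that actually triggers the read (here the write) should run inside the `with` block, not after it.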

Sources

https://kb.databricks.com/scala/spark-job-fail-parquet-column-convert

Apache Spark job fails with Parquet column cannot be converted error


https://dataninjago.com/2021/12/12/databricks-deep-dive-4-vectorised-parquet-reading/

Parquet for Spark Deep Dive (4) – Vectorised Parquet Reading
