spark

Spark Journal : Change the data type of columns in dataframe.

Hey there, recently, we found a functional issue in our parquet files, where the data type of columns was not accurate. The datatype was supposed to be a DECIMAL with some precision and scale, but it was found to be mix and match of String and Double.
The Challenge at hands was to make sure the columns have accurate datatype as needed.

How to make sure we assign the right data type to column in dataframe
We will take an example of converting a integer column to DECIMAL here.

Approach 1
If going with Spark SQL option, use CAST and convert to required data type.
the Dataframe reflects the correct data type

val df = spark.sql("""SELECT  CAST(1 as DECIMAL(14,4)) AS id union SELECT  CAST(2 as DECIMAL(14,4)) AS id""")
df.printSchema
df.show

df: org.apache.spark.sql.DataFrame = [id: decimal(14,4)]
root
 |-- id: decimal(14,4) (nullable = false)

+------+
|    id|
+------+
|1.0000|
|2.0000|
+------+

Approach 2

If going with Spark DataFrame API, we can still use cast method directly as below.

import org.apache.spark.sql.types.{DecimalType}
val dataList1 = List((1,"abc",99),(2,"def",99),(2,"def",99),(2,"def",99),(2,"def",99))
val df1 = dataList1.toDF("id","Name","Marks").select(col("Marks").cast(DecimalType(14,4)).alias("Marks_changed"), col("Marks"))
df1.printSchema
df1.show

import org.apache.spark.sql.types.DecimalType
dataList1: List[(Int, String, Int)] = List((1,abc,99), (2,def,99), (2,def,99), (2,def,99), (2,def,99))
df1: org.apache.spark.sql.DataFrame = [Marks_changed: decimal(14,4), Marks: int]
root
 |-- Marks_changed: decimal(14,4) (nullable = false)
 |-- Marks: integer (nullable = false)

+-------------+-----+
|Marks_changed|Marks|
+-------------+-----+
|      99.0000|   99|
|      99.0000|   99|
|      99.0000|   99|
|      99.0000|   99|
|      99.0000|   99|
+-------------+-----+

For other data types available with Spark API, refer to this link https://spark.apache.org/docs/latest/sql-reference.html.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s