Spark Journal: Using the Scala CASE construct with the DataFrame API

This post is very similar to my last post on using the Scala IF construct with the DataFrame API in Spark. The only difference is that here we replace the IF construct with the CASE construct (a Scala match expression).

So the task is the same.
Problem statement: We want to add a new column to a DataFrame dynamically. If the flag for getting this column's data is true, show -1; otherwise show -2.

Approach 1
Compute the value into a variable first, then use that variable directly in the API call.

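// Both imports are already in scope in spark-shell; a standalone app needs them.
import org.apache.spark.sql.functions.{col, lit}
import spark.implicits._  // enables toDF; assumes a SparkSession named spark

val getColorCode = true
val dataList1 = List((1,"red"),(2,"blue"))
val df1 = dataList1.toDF("id","Name")

// match is an expression: it is evaluated once on the driver and yields an Int
val colorcd = getColorCode match {
 case true => -1
 case false => -2
}

// lit wraps the plain Int in a Column so it can be selected
val df = df1.select(col("id"), col("Name"), lit(colorcd).as("colorcd"))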

df.show

+---+----+-------+
| id|Name|colorcd|
+---+----+-------+
|  1| red|     -1|
|  2|blue|     -1|
+---+----+-------+
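
To see why this works, it can help to look at the match on its own, without Spark. The following is only a minimal sketch: the object name ColorCodeSketch and the helper colorCode are made up for illustration and are not part of the original example. It shows that the match simply produces an ordinary Int, which is what lit later wraps.

```scala
object ColorCodeSketch {
  // Hypothetical helper (not from the post): the match is an expression,
  // so its result can be returned or assigned like any other value.
  def colorCode(getColorCode: Boolean): Int = getColorCode match {
    case true  => -1
    case false => -2
  }

  def main(args: Array[String]): Unit = {
    println(colorCode(true))   // prints -1
    println(colorCode(false))  // prints -2
  }
}
```

Because the flag is a plain Scala Boolean, the choice is made once, before the DataFrame query is built, and every row receives the same value.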

Approach 2
Instead of using an intermediate variable, we use the case clause directly inside the DataFrame API call. This is possible because a Scala match is an expression that yields a value.

val getColorCode = true
val dataList1 = List((1,"red"),(2,"blue"))
val df1 = dataList1.toDF("id","Name")

// The match yields an Int, which lit turns into a Column
val df = df1.select(col("id"), col("Name"), lit(getColorCode match {
 case true => -1
 case false => -2
}).as("colorcd"))
 
df.show

+---+----+-------+
| id|Name|colorcd|
+---+----+-------+
|  1| red|     -1|
|  2|blue|     -1|
+---+----+-------+

Approach 3
This is more of an improvement on Approach 2. Here the lit function is applied inside each case branch instead of around the whole match, so a branch can return any Column expression rather than only a literal.
Below, we have intentionally set the flag to false, and in the case clause we use col("id") as the alternative transformation instead of the lit we have been using so far.
The resulting DataFrame shows three columns, where the first and third columns are identical.

val getColorCode = false
val dataList1 = List((1,"red"),(2,"blue"))
val df1 = dataList1.toDF("id","Name")

// Every branch returns a Column, so .as can be applied to the match result
val df = df1.select(col("id"), col("Name"), (getColorCode match {
 case true => lit(-1)
 case false => col("id")
}).as("colorcd"))
 
df.show

+---+----+-------+
| id|Name|colorcd|
+---+----+-------+
|  1| red|      1|
|  2|blue|      2|
+---+----+-------+
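
The same idea can be pulled out into a small function whose return type is Column, which makes it explicit that every branch must produce a Column. This is only a sketch under some assumptions: the names ColorColumnSketch and colorColumn are hypothetical, and the local SparkSession is created here just to make the example self-contained.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object ColorColumnSketch {
  // Hypothetical helper (not from the post): picks a Column expression
  // on the driver, based on a plain Scala Boolean.
  def colorColumn(getColorCode: Boolean): Column = getColorCode match {
    case true  => lit(-1)    // constant column
    case false => col("id")  // reuse an existing column
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only for this illustration
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("case-sketch")
      .getOrCreate()
    import spark.implicits._

    val df1 = List((1, "red"), (2, "blue")).toDF("id", "Name")

    // With the flag set to false, colorcd mirrors the id column
    df1.select(col("id"), col("Name"), colorColumn(false).as("colorcd")).show()

    spark.stop()
  }
}
```

Note that the match runs once while the query is being built, not once per row. For a condition that depends on the values in each row, Spark's own when and otherwise column functions are the usual tool.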

The CASE construct is a very powerful feature of Scala that helps solve a wide range of problems, and it is one of the most used constructs in the language. I will try to cover it in detail in upcoming posts.
