Spark Journal : Using alias for column names on dataframes

If you have read my previous article on using the SELECT API on dataframes in the Spark framework, this one is a continuation of it.
We often come across scenarios where we need aliases to represent the columns of a dataframe properly. I know that, given a choice, you would opt for writing a SQL SELECT statement over the dataframe and aliasing columns the conventional way. Yes, that is easily possible with Spark dataframes.
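
For comparison, here is a minimal sketch of that SQL route, written as a standalone application rather than a spark-shell session. The object name SqlAliasSketch, the helper aliasWithSql, and the view name people are hypothetical names picked for illustration, and the local master setting is an assumption that suits a laptop test only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlAliasSketch {
  // Registers `df` as a temporary view and aliases both columns in plain SQL.
  // The view name "people" is a hypothetical name chosen for this sketch.
  def aliasWithSql(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("people")
    // Backticks are required because the aliases contain a space.
    spark.sql("SELECT id AS `unique id`, Name AS `Actual Name` FROM people")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("sql-alias-sketch").getOrCreate()
    import spark.implicits._ // needed for toDF on a local List

    val df = List((1, "abc"), (2, "def")).toDF("id", "Name")
    aliasWithSql(spark, df).show()

    spark.stop()
  }
}
```

The result carries the same two aliased columns that the API-based approaches below produce; the only extra step is registering the view first.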

However, I am stepping out of my comfort zone and writing the complete SELECT statement with the select API on dataframes. So how do you add column aliases to a dataframe when using that API?

Approach 1 : Using withColumnRenamed

val dataList = List((1,"abc"),(2,"def"))
val df = dataList.toDF("id","Name")

df.select("*").withColumnRenamed("id","unique id").show

+---------+----+
|unique id|Name|
+---------+----+
|        1| abc|
|        2| def|
+---------+----+
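
A minimal sketch of two things worth knowing about withColumnRenamed: it takes one column per call, so renaming several columns means chaining calls, and it silently does nothing when the source column is not in the schema. The object name RenameSketch, the helper renameAll, and the column name no_such_column are hypothetical, chosen only for this illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameSketch {
  // Renames both columns by chaining one withColumnRenamed call per column.
  def renameAll(df: DataFrame): DataFrame =
    df.withColumnRenamed("id", "unique id")
      .withColumnRenamed("Name", "Actual Name")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rename-sketch").getOrCreate()
    import spark.implicits._ // needed for toDF on a local List

    val df = List((1, "abc"), (2, "def")).toDF("id", "Name")
    renameAll(df).show()

    // "no_such_column" is deliberately absent: the call is a no-op,
    // so the schema is unchanged and this prints "id, Name".
    println(df.withColumnRenamed("no_such_column", "x").columns.mkString(", "))

    spark.stop()
  }
}
```

Because a misspelled source column does not raise an error, it is worth checking the resulting column names after a rename.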

Approach 2 : Using the alias method

val dataList = List((1,"abc"),(2,"def"))
val df = dataList.toDF("id","Name")

df.select(col("id").alias("unique id")).show

+---------+
|unique id|
+---------+
|        1|
|        2|
+---------+

df.select(col("id").alias("unique id"), col("Name").alias("Actual Name")).show
+---------+-----------+
|unique id|Actual Name|
+---------+-----------+
|        1|        abc|
|        2|        def|
+---------+-----------+

Approach 3 : Using the as method

val dataList = List((1,"abc"),(2,"def"))
val df = dataList.toDF("id","Name")

df.select(col("id").as("unique id"), col("Name").as("Actual Name")).show
+---------+-----------+
|unique id|Actual Name|
+---------+-----------+
|        1|        abc|
|        2|        def|
+---------+-----------+

df.select($"id".as("unique id"), $"Name".as("Actual Name")).show
+---------+-----------+
|unique id|Actual Name|
+---------+-----------+
|        1|        abc|
|        2|        def|
+---------+-----------+
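
The snippets in this article rely on names that spark-shell brings into scope automatically. Outside the shell, col comes from org.apache.spark.sql.functions, while toDF and the $"..." syntax come from spark.implicits._. Here is a minimal standalone sketch, assuming a local session; the object name DollarSyntaxSketch and the helper aliasBoth are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DollarSyntaxSketch {
  // Aliases one column through col(...) and the other through $"...".
  def aliasBoth(spark: SparkSession, df: DataFrame): DataFrame = {
    // Brings the $"..." string interpolator into scope.
    import spark.implicits._
    df.select(col("id").as("unique id"), $"Name".as("Actual Name"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("dollar-syntax-sketch").getOrCreate()
    // Also needed here, for toDF on a local List.
    import spark.implicits._

    val df = List((1, "abc"), (2, "def")).toDF("id", "Name")
    aliasBoth(spark, df).show()

    spark.stop()
  }
}
```

Both forms yield a Column, so they can be mixed freely inside a single select.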

Approach 4 : Using the name method

val dataList = List((1,"abc"),(2,"def"))
val df = dataList.toDF("id","Name")

df.select(col("id").name("unique id"), col("Name").name("Actual Name")).show

+---------+-----------+
|unique id|Actual Name|
+---------+-----------+
|        1|        abc|
|        2|        def|
+---------+-----------+
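
Approaches 2 to 4 differ only in the method name. A minimal sketch to confirm that alias, as, and name give the same resulting column name; the object name AliasEquivalenceSketch and the helper resultingNames are hypothetical, and the local session is an assumption for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AliasEquivalenceSketch {
  // Column name produced by each of the three methods for the same input column.
  def resultingNames(df: DataFrame): Seq[String] = Seq(
    df.select(col("id").alias("unique id")).columns.head,
    df.select(col("id").as("unique id")).columns.head,
    df.select(col("id").name("unique id")).columns.head
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("alias-equivalence-sketch").getOrCreate()
    import spark.implicits._ // needed for toDF on a local List

    val df = List((1, "abc"), (2, "def")).toDF("id", "Name")
    // Prints "unique id" three times, once per method.
    resultingNames(df).foreach(println)

    spark.stop()
  }
}
```

Which one to use is mostly a matter of taste; as reads closest to SQL.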

There are some more ways of doing this efficiently.
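
Two further options that exist on the dataframe API are selectExpr and toDF. They are not necessarily the ones I have in mind for the follow-up, so treat this as a sketch under that assumption; the object name OtherAliasSketch and both helper names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object OtherAliasSketch {
  // SQL-style expressions passed as strings; backticks quote the spaced aliases.
  def viaSelectExpr(df: DataFrame): DataFrame =
    df.selectExpr("id AS `unique id`", "Name AS `Actual Name`")

  // Replaces every column name positionally; the number of names
  // must match the number of columns in the dataframe.
  def viaToDF(df: DataFrame): DataFrame =
    df.toDF("unique id", "Actual Name")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("other-alias-sketch").getOrCreate()
    import spark.implicits._ // needed for toDF on a local List

    val df = List((1, "abc"), (2, "def")).toDF("id", "Name")
    viaSelectExpr(df).show()
    viaToDF(df).show()

    spark.stop()
  }
}
```

selectExpr is handy when only some columns need an alias, while toDF renames every column in one call.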
In the next article, I will try to cover how to add column aliases dynamically when many columns need to be aliased.
