Spark Journal : Converting a DataFrame to a List

This one is going to be a very short article.
We will cover how to use the Spark API to convert a DataFrame to a List.
The converted list is of type List[Row]. We can iterate over it and apply the usual List operations, just as with any regular list.

Suppose you have a use case where a DataFrame needs to be converted to a list. You can do it easily using the approach below.
Here we call the collect and toList methods in sequence.
collect : returns all rows of the DataFrame as an array (Array[Row]), so every row becomes one element of the array.
Quick Tip : Make sure the DataFrame you call collect on is not huge, because collect pulls every row into the driver program and consumes memory on the driver node. If the data is too large, collect may fail with an OutOfMemoryError (OOM). Ideally, call collect only after reducing the data, for example with filter.
toList : converts the array to a List.
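
The quick tip above can be sketched roughly as follows. This is only an illustration under a few assumptions: the object name FilterBeforeCollect, the helper filterThenCollect, the extra sample row, and the local[*] session settings are all made up for the example and are not part of the original snippet.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col

object FilterBeforeCollect {
  // Hypothetical helper: shrink the DataFrame with filter first,
  // so that collect only brings the remaining rows to the driver.
  def filterThenCollect(df: DataFrame, minId: Int): List[Row] =
    df.filter(col("id") > minId).collect().toList

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("filter-before-collect")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = List((1, "abc"), (2, "def"), (3, "ghi")).toDF("id", "Name")

    // Only rows with id > 1 reach the driver.
    val smallList = filterThenCollect(df, 1)
    smallList.foreach(println)

    spark.stop()
  }
}
```

The filter runs on the executors, so the driver only has to hold the rows that survive it. On a real dataset the filter condition would of course be whatever narrows your data enough to fit comfortably in driver memory.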

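// converting a dataframe to list
// toDF needs the session implicits; `spark` is assumed to be the active
// SparkSession (spark-shell creates it and imports these automatically)
import spark.implicits._

val dataList = List((1,"abc"),(2,"def"))
val df = dataList.toDF("id","Name")

// collect() returns Array[Row]; toList turns it into List[Row]
val dList = df.collect().toList

dList.foreach { e => println(e) }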

[1,abc]
[2,def]
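
Since each element of the list is a Row, you will often want to pull typed values out of it. Here is a minimal sketch of one way to do that, assuming the same id and Name columns as above; the object name RowListExample, the helper toPairs, and the local session settings are hypothetical and only there to make the example self-contained.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object RowListExample {
  // Hypothetical helper: read each Row's fields by column name
  // and return plain (Int, String) tuples.
  def toPairs(rows: List[Row]): List[(Int, String)] =
    rows.map(r => (r.getAs[Int]("id"), r.getAs[String]("Name")))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("row-list-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val dList: List[Row] =
      List((1, "abc"), (2, "def")).toDF("id", "Name").collect().toList

    // From here on it is an ordinary Scala list of tuples.
    val pairs = toPairs(dList)
    pairs.foreach { case (id, name) => println(s"$id -> $name") }

    spark.stop()
  }
}
```

Looking fields up by name with getAs works here because rows returned by collect carry the DataFrame schema. Positional getters such as getInt(0) and getString(1) are an alternative when you prefer to go by column order.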

Go ahead and use this method; I am sure it will come in handy in your daily tasks.
