I recently started learning the Scala language, along with the Spark framework, while working on our big data stack. Not having much experience with Java, I find it a challenge to learn the fundamentals, but I am still learning, and there is a long way to go.
I am publishing small bits of useful information here so that beginners like me can benefit from them.
So the task in Spark was to create empty DataFrames (I won't go into DataFrame details for now, as I am still learning them myself). If you have worked with the pandas library in Python, you should already be familiar with the term DataFrame.
If not, here is a quick mental model: think of it as a two-dimensional table with named columns, which holds its data in memory (and can be extended to disk).
A DataFrame is usually created by reading data files, in whatever format they come in. But we needed to create an empty DataFrame, and for this we used the approach below.
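For contrast, a minimal sketch (assuming local mode just for experimenting; the object and method names are made up for illustration): Spark's built-in spark.emptyDataFrame already gives you an empty DataFrame, but it has no columns at all, so a schema still has to come from somewhere.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, only for illustration.
object BuiltInEmptyDataFrame {
  // Returns (number of columns, number of rows) of Spark's built-in empty DataFrame.
  def shape(spark: SparkSession): (Int, Long) = {
    val noSchema = spark.emptyDataFrame
    (noSchema.columns.length, noSchema.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("builtin-empty-dataframe")
      .master("local[*]") // assumption: local mode, just for experimenting
      .getOrCreate()
    try {
      val (cols, rows) = shape(spark)
      println(s"columns = $cols, rows = $rows") // prints "columns = 0, rows = 0"
    } finally spark.stop()
  }
}
```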
Using a case class
The case class is a very frequently used construct in Scala. We will use a case class to define the schema of the DataFrame.
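To see what schema Spark actually derives from a case class, here is a minimal sketch using Encoders.product, which reads the field names and types without needing a SparkSession. The object name CaseClassSchema is hypothetical; the case class mirrors the one used in this post. Primitive fields such as Int and Double become non-nullable columns, while String becomes a nullable one.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Same shape as the case class in this post; defined at top level so that
// Spark can derive an encoder for it.
case class model(id: Int, Name: String, marks: Double)

// Hypothetical object name, only for illustration.
object CaseClassSchema {
  // The product encoder turns the case class fields (names and Scala types)
  // into a Spark StructType.
  val schema: StructType = Encoders.product[model].schema

  def main(args: Array[String]): Unit = {
    // Prints the column names, Spark types and nullability as a tree.
    schema.printTreeString()
  }
}
```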
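In the snippet below we use Seq.empty[model] to create an empty sequence whose element type is the case class model, and then call toDF, which converts that sequence into a DataFrame whose columns follow the fields of the case class. Note that Seq is a Scala collection type rather than a keyword, and toDF is not a method of Seq itself: Spark adds it to local collections such as Seq and List through an implicit conversion that comes from import spark.implicits._ (spark-shell runs that import for you).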
import spark.implicits._   // provides toDF; spark-shell already imports this automatically

case class model(id: Int, Name: String, marks: Double)

val emptyDf = Seq.empty[model].toDF()
emptyDf.show()

Output:

+---+----+-----+
| id|Name|marks|
+---+----+-----+
+---+----+-----+
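The snippet above is meant for spark-shell, where a SparkSession named spark already exists. As a standalone application you have to build the session yourself and import its implicits before calling toDF. A minimal sketch, assuming local mode and a hypothetical object name EmptyDataFrameApp:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Top-level case class, so Spark can derive an encoder for it.
case class model(id: Int, Name: String, marks: Double)

// Hypothetical application object, only for illustration.
object EmptyDataFrameApp {
  def emptyModelDf(spark: SparkSession): DataFrame = {
    // Brings in the implicit conversion that adds toDF to local Seqs,
    // plus the encoder for the case class.
    import spark.implicits._
    Seq.empty[model].toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-dataframe")
      .master("local[*]") // assumption: local mode for trying this out
      .getOrCreate()
    try {
      val emptyDf = emptyModelDf(spark)
      emptyDf.printSchema() // columns come from the case class fields
      emptyDf.show()        // header only, no rows
    } finally spark.stop()
  }
}
```

Keeping the DataFrame construction in its own method makes it easy to reuse the same empty DataFrame with any SparkSession, for example in tests.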
So you can define whatever schema you need as a case class and then create an empty DataFrame from it, as shown above.
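If the schema is only known at runtime (for example, built from configuration), a case class cannot be written ahead of time. A commonly used alternative, a different technique from the case class approach in this post, is to pair an empty RDD[Row] with an explicit StructType. A minimal sketch; the object name is hypothetical and the schema simply mirrors the model case class:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical object name, only for illustration.
object EmptyDataFrameFromStructType {
  // Schema written out explicitly instead of derived from a case class.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("Name", StringType, nullable = true),
    StructField("marks", DoubleType, nullable = false)
  ))

  // An empty RDD of rows plus an explicit schema gives an empty DataFrame.
  def emptyDf(spark: SparkSession): DataFrame =
    spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-dataframe-structtype")
      .master("local[*]") // assumption: local mode for trying this out
      .getOrCreate()
    try {
      val df = emptyDf(spark)
      df.printSchema()
      df.show()
    } finally spark.stop()
  }
}
```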