How to Use UDFs in Spark Without Registering Them
This article shows a code snippet demonstrating how to use a UDF in Spark without registering it.
Here, we will demonstrate the use of a UDF via a small example.
Use Case: We need to add a prefix or suffix to the value of an existing column of a DataFrame/Dataset and store the result in a new column.
// Code snippet: create and use a UDF in Spark without registering it.
import org.apache.spark.sql.functions._

// Prepend a random number (0-99) to the input value; return null for null input.
val rowKeyGenerator = udf((n: String) => {
  if (n == null) null
  else {
    val randomNB = scala.util.Random.nextInt(100).toString
    randomNB.concat(n)
  }
})

// "Name" is a column of type String in the source DataFrame.
val ds2 = dfFromFile.withColumn("NameNewValue", rowKeyGenerator(col("Name")))
ds2.show()
Note: We can also change the return type from String to any other supported type, as per individual requirements. Make sure to handle null inputs while developing, as unhandled nulls are a common cause of errors. UDFs are a black box to the Spark optimizer, whereas built-in functions that take a Column argument and return a Column are not, so Catalyst can neither inspect nor optimize the logic inside a UDF. For performance, it is always recommended to prefer Spark's native API/expressions over UDFs.
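Because the function wrapped by udf is plain Scala, the null-handling advice can be sketched and unit-tested without Spark at all. Below is a minimal sketch of such a UDF body; the prefixValue helper is hypothetical, and a fixed prefix stands in for the random number so the behavior is deterministic:

```scala
// Hypothetical helper mirroring a prefixing UDF body, with a null guard.
def prefixValue(prefix: String, n: String): String =
  if (n == null) null          // propagate null instead of throwing an NPE
  else prefix.concat(n)

println(prefixValue("42", "deviceX")) // prints "42deviceX"
println(prefixValue("42", null))      // prints "null"
```

Keeping the body in a named function like this also makes the UDF logic easy to test in isolation before wiring it into a DataFrame.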