How to Use UDFs in Spark Without Registering Them
This article shows a code snippet demonstrating how to use a UDF in Spark without registering it.
Here, we will demonstrate the use of a UDF via a small example.
Use Case: We need to add a prefix or suffix to the value of an existing column of a DataFrame/Dataset and store the result in a new column.
// Code snippet: create and use a UDF in Spark without registering it.
import org.apache.spark.sql.functions._

// Prepend a random number (0-99) to the input value; return null for null input.
val rowKeyGenerator = udf((n: String) => {
  if (n == null) null
  else {
    val randomNB = scala.util.Random.nextInt(100).toString
    randomNB.concat(n)
  }
})

// "Name" is a column of type String in the source DataFrame.
val ds2 = dfFromFile.withColumn("NameNewValue", rowKeyGenerator(col("Name")))
ds2.show()
Note: We can also change the return type from String to any other supported type, as per individual requirements. Make sure to handle null inputs while developing, as unhandled nulls are a common cause of errors. UDFs are a black box to the Spark optimizer, whereas built-in functions that take a Column argument and return a Column are not, so Catalyst can neither inspect nor optimize the logic inside a UDF. For performance, it is always recommended to prefer Spark's native API/expressions over UDFs.
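Because the function wrapped by udf is plain Scala, the null-handling advice can be sketched and unit-tested without Spark at all. Below is a minimal sketch of such a UDF body; the prefixValue helper is hypothetical, and a fixed prefix stands in for the random number so the behavior is deterministic:

```scala
// Hypothetical helper mirroring a prefixing UDF body, with a null guard.
def prefixValue(prefix: String, n: String): String =
  if (n == null) null          // propagate null instead of throwing an NPE
  else prefix.concat(n)

println(prefixValue("42", "deviceX")) // prints "42deviceX"
println(prefixValue("42", null))      // prints "null"
```

Keeping the body in a named function like this also makes the UDF logic easy to test in isolation before wiring it into a DataFrame.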