Okay, so you’ve looked at sparkhello; now what? How can we extend it and make use of some Scala code of our own?
Simple example: the method needs to live in a Scala object so that sparklyr can invoke it statically; the R code below expects it at SparkHello.HelloWorld:
package SparkHello
import org.apache.spark.sql.DataFrame
object HelloWorld {
  def addOne(df: DataFrame): DataFrame = {
    val colname: String = "test"
    df.withColumn("test1", df(colname) + 1)
  }
}
How do we call this from R?
#' @import sparklyr
#' @export
spark_addOne <- function(df) {
  sdf_register(
    sparklyr::invoke_static(
      spark_connection(df), "SparkHello.HelloWorld", "addOne",
      spark_dataframe(df)
    )
  )
}
It is important to pass the underlying Spark DataFrame via spark_dataframe(). What is interesting is that we don’t need to specify “sc” explicitly: spark_connection(df) derives the connection from the DataFrame itself.
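Here is a minimal usage sketch (the local connection and the toy “numbers” table are just for illustration; any Spark DataFrame with a test column will do):
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# A toy Spark DataFrame with the "test" column that addOne expects
numbers <- copy_to(sc, data.frame(test = 1:3), "numbers")

# Returns a registered tbl_spark with the extra "test1" column
numbers %>% spark_addOne()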
Naturally we can add parameters to this; the method goes in the same SparkHello.HelloWorld object:
def addOneCols(df: DataFrame, inputcolname: String, outputcolname: String): DataFrame = {
  df.withColumn(outputcolname, df(inputcolname) + 1)
}
#' @import sparklyr
#' @export
spark_addOneCols <- function(df, input_col, output_col) {
  sdf_register(
    sparklyr::invoke_static(
      spark_connection(df), "SparkHello.HelloWorld", "addOneCols",
      spark_dataframe(df),
      ensure_scalar_character(input_col),
      ensure_scalar_character(output_col)
    )
  )
}
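Usage is analogous; continuing the sketch above (the output column name is made up):
# Add one to "test", writing the result to a new "plus_one" column
numbers %>% spark_addOneCols(input_col = "test", output_col = "plus_one")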
Which raises the question: how can we automatically generate R packages when we are writing Scala applications?