Building Spark JAR Files with SBT
Spark JAR files let you package a project into a single file so it can be run on a Spark cluster. Many developers write Spark code in browser […]
The Spark rlike method allows you to write powerful string matching algorithms with regular expressions (regexp). This blog post will outline tactics to detect strings that match multiple different patterns […]
The spark-slack library can be used to send notifications to Slack from your Spark programs and handle Slack Slash command responses. You can send Slack notifications to alert stakeholders when […]
The uTest Scala testing framework can be used to elegantly test your Spark code. The other popular Scala testing frameworks (ScalaTest and specs2) provide multiple ways to solve the […]
PySpark code should generally be organized as single-purpose DataFrame transformations that can be chained together for production analyses (e.g. generating a datamart). This blog post demonstrates how to monkey […]
Implicit classes or the Dataset#transform method can be used to chain DataFrame transformations in Spark. This blog post will demonstrate how to chain DataFrame transformations and explain why the Dataset#transform […]