Limiting Order Dependencies in Spark Functions
Spark codebases can easily become a collection of order-dependent custom transformations (see this blog post for background on custom transformations). Your library will be difficult to use if many […]
Spark Structured Streaming and Trigger.Once can be used to incrementally update Spark extracts with ease. An extract that updates incrementally will take the same amount of time as a normal […]
JitPack is a package repository that provides easy access to your Spark projects that are checked into GitHub. JitPack is easier to use than Maven for open source projects and […]
Logistic regression models are a powerful way to predict binary outcomes (e.g. winning a game or surviving a shipwreck). Multiple explanatory variables (aka “features”) are used to train the model […]
Environment config files return different values for the test, development, staging, and production environments. In Spark projects, you will often want a variable to point to a local CSV file […]
Spark JAR files let you package a project into a single file so it can be run on a Spark cluster. A lot of developers develop Spark code in browser […]
The Spark rlike method allows you to write powerful string matching algorithms with regular expressions (regexp). This blog post will outline tactics to detect strings that match multiple different patterns […]
The spark-slack library can be used to speak notifications to Slack from your Spark programs and handle Slack Slash command responses. You can speak Slack notifications to alert stakeholders when […]
The uTest Scala testing framework can be used to elegantly test your Spark code. The other popular Scala testing frameworks (ScalaTest and specs2) provide multiple different ways to solve the […]