Shading Dependencies in Spark Projects with SBT
sbt-assembly makes it easy to shade dependencies in your Spark projects when you create fat JAR files. This blog post will explain why it’s useful to shade dependencies and will […]
Spark SQL functions make it easy to perform DataFrame analyses. This post will show you how to use the built-in Spark SQL functions and how to build your own SQL […]
Spark DataFrames are similar to tables in relational databases – they store data in columns and rows and support a variety of operations to manipulate the data. Here’s an example […]
Spark codebases can easily become a collection of order-dependent custom transformations (see this blog post for background on custom transformations). Your library will be difficult to use if many […]
Spark Structured Streaming and Trigger.Once can be used to incrementally update Spark extracts with ease. An extract that updates incrementally will take the same amount of time as a normal […]
JitPack is a package repository that provides easy access to your Spark projects that are checked into GitHub. JitPack is easier to use than Maven for open source projects and […]
Logistic regression models are a powerful way to predict binary outcomes (e.g. winning a game or surviving a shipwreck). Multiple explanatory variables (aka “features”) are used to train the model […]
Environment config files return different values for the test, development, staging, and production environments. In Spark projects, you will often want a variable to point to a local CSV file […]
Spark JAR files let you package a project into a single file so it can be run on a Spark cluster. A lot of developers develop Spark code in browser […]
The Spark rlike method allows you to write powerful string matching algorithms with regular expressions (regexp). This blog post will outline tactics to detect strings that match multiple different patterns […]