The Spark Column class defines a variety of column methods that are vital for manipulating DataFrames. This blog post demonstrates how to instantiate Column objects and covers the commonly used […]

Spark Datasets / DataFrames are filled with null values and you should write code that gracefully handles these null values. You don’t want to write code that thows NullPointerExceptions – […]

Spark supports DateType and TimestampType columns and defines a rich API of functions to make working with dates and times easy. This blog post will demonstrates how to make DataFrames […]

Spark broadcast joins are perfect for joining a large DataFrame with a small DataFrame. Broadcast joins cannot be used when joining two large DataFrames. This post explains how to do […]