
Scala Spark vs Python PySpark: Which is better?
Apache Spark code can be written with the Scala, Java, Python, or R APIs. Scala and Python are the most popular APIs. This blog post performs a detailed comparison of […]
This post explains how to perform type 2 upserts for slowly changing dimension tables with Delta Lake. We’ll start out by covering the basics of type 2 SCDs and when […]
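The type 2 logic itself can be shown without Delta Lake. The sketch below is a plain, in-memory Scala version, not the Delta Lake `merge` approach the post describes. The `DimRow` fields and the `upsertType2` helper are hypothetical names chosen for illustration. When an attribute changes, the current row is closed out (`isCurrent = false` plus an `endDate`) and a new current row is appended, so history is preserved. Unchanged keys are left alone, and unseen keys are inserted.

```scala
import java.time.LocalDate

// Hypothetical dimension row for illustration; not a schema taken from the post
final case class DimRow(
    key: Int,
    value: String,
    isCurrent: Boolean,
    effectiveDate: LocalDate,
    endDate: Option[LocalDate]
)

object Scd2Sketch {
  // Applies a batch of (key -> newValue) updates with type 2 semantics
  def upsertType2(
      table: Vector[DimRow],
      updates: Map[Int, String],
      asOf: LocalDate
  ): Vector[DimRow] = {
    val currentByKey = table.filter(_.isCurrent).map(r => r.key -> r).toMap

    // Keys whose value actually changed, and keys never seen before
    val changedKeys = updates.collect {
      case (k, v) if currentByKey.get(k).exists(_.value != v) => k
    }.toSet
    val newKeys = updates.keySet -- currentByKey.keySet

    // Close out the current version of every changed key
    val closed = table.map { r =>
      if (r.isCurrent && changedKeys.contains(r.key))
        r.copy(isCurrent = false, endDate = Some(asOf))
      else r
    }

    // Append a fresh current row for changed and brand-new keys
    val inserted = (changedKeys ++ newKeys).toVector.sorted.map { k =>
      DimRow(k, updates(k), isCurrent = true, effectiveDate = asOf, endDate = None)
    }

    closed ++ inserted
  }
}
```

With two current rows for keys 1 (`"NY"`) and 2 (`"CA"`), applying `Map(1 -> "TX", 2 -> "CA", 3 -> "WA")` leaves four rows: the closed `"NY"` row, the untouched `"CA"` row, and new current rows for `"TX"` and `"WA"`.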
Datasets are available to Spark Scala/Java users and offer more type safety than DataFrames. Python and R infer types at runtime, so these APIs cannot support Datasets. This post […]
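A small sketch of the difference, assuming Spark 3.x with Scala; `Employee`, `adultNames`, and `adultNamesDf` are hypothetical names. The typed `Dataset` query works with `Employee` fields directly, while the `DataFrame` query refers to columns by string name.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type; defined at the top level so Spark can derive an Encoder
final case class Employee(name: String, age: Int)

object DatasetVsDataFrame {
  // Typed Dataset API: the lambda receives an Employee, so a typo such as
  // `e.agee` is a compile error, and the element type is tracked throughout
  def adultNames(spark: SparkSession, people: Seq[Employee]): Seq[String] = {
    import spark.implicits._
    val ds: Dataset[Employee] = people.toDS()
    ds.filter(e => e.age >= 18).map(_.name).collect().toSeq
  }

  // Untyped DataFrame API: columns are looked up by name, so a typo such as
  // col("agee") compiles and only fails when Spark analyzes the query at runtime
  def adultNamesDf(spark: SparkSession, people: Seq[Employee]): Seq[String] = {
    import spark.implicits._
    val df: DataFrame = people.toDF()
    df.filter(col("age") >= 18).select("name").as[String].collect().toSeq
  }
}
```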
This post shows how to create beginningOfMonthDate and endOfMonthDate functions by leveraging the native Spark datetime functions. The native Spark datetime functions are not easy to use, so it’s important […]
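One way such helpers might look, assuming they take and return a `Column`. The excerpt doesn't show the post's implementation, so this is only a sketch built on two built-in functions from `org.apache.spark.sql.functions`: `trunc` with the `"month"` format and `last_day`.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{last_day, trunc}

object MonthBoundaries {
  // First day of the month containing `date`, e.g. 2020-02-10 -> 2020-02-01
  def beginningOfMonthDate(date: Column): Column = trunc(date, "month")

  // Last day of the month containing `date`, e.g. 2020-02-10 -> 2020-02-29
  def endOfMonthDate(date: Column): Column = last_day(date)
}
```

Both helpers return date columns, so they can be used directly in `withColumn`, for example `df.withColumn("month_start", MonthBoundaries.beginningOfMonthDate(col("some_date")))`, where `df` and `some_date` are placeholders.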
You can use native Spark functions to compute the beginning and end dates for a week, but the code isn’t intuitive. This blog post demonstrates how to wrap the complex […]
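A possible sketch, assuming Sunday-to-Saturday weeks. The `endOfWeekDate`/`beginningOfWeekDate` names and the `lastDayOfWeek` parameter are hypothetical, not taken from the post. It combines the built-in `next_day` and `date_sub` functions, which is the kind of non-obvious expression worth hiding behind a named helper.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{date_sub, next_day}

object WeekBoundaries {
  // Last day of the week containing `date`, assuming weeks end on `lastDayOfWeek`
  // ("Sat" by default, i.e. Sunday-to-Saturday weeks).
  // next_day returns the first matching day strictly after its input, so step
  // back one day first to keep a date that already is `lastDayOfWeek` unchanged.
  def endOfWeekDate(date: Column, lastDayOfWeek: String = "Sat"): Column =
    next_day(date_sub(date, 1), lastDayOfWeek)

  // First day of the same week: six days before the week's last day
  def beginningOfWeekDate(date: Column, lastDayOfWeek: String = "Sat"): Column =
    date_sub(endOfWeekDate(date, lastDayOfWeek), 6)
}
```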
This post explains how to wrap a Java library with a Scala interface. You can instantiate Java classes directly in Scala, but it’s best to wrap the Java code, so […]
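A minimal sketch of the idea, using the JDK's `java.time` API as a stand-in for "a Java library" (the excerpt doesn't name the library the post wraps). The hypothetical `JavaWrappers` object turns thrown exceptions and `null` returns into `Option`, and uses a Scala default argument in place of Java-style overloads.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical Scala facade over Java APIs, for illustration only
object JavaWrappers {
  // LocalDate.parse throws DateTimeParseException on bad input;
  // the wrapper turns any failure into None and gives the pattern a default
  def parseDate(s: String, pattern: String = "yyyy-MM-dd"): Option[LocalDate] =
    Try(LocalDate.parse(s, DateTimeFormatter.ofPattern(pattern))).toOption

  // System.getProperty returns null for missing keys; Option(...) maps null to None
  def systemProperty(key: String): Option[String] =
    Option(System.getProperty(key))
}
```

Callers then work with `Option` values and never see `null` or Java exceptions leak out of the wrapped API.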
This blog post shows how to serialize and deserialize Scala case classes with the JSON file format. Serialization is important when persisting data to disk or transferring data over the […]
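A round-trip sketch, assuming the uPickle library (the excerpt doesn't say which JSON library this post uses); `Person` and `CaseClassJson` are hypothetical. `macroRW` derives a `ReadWriter` for the case class, and the JSON string is persisted with `java.nio.file.Files`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import upickle.default.{macroRW, read, write, ReadWriter}

// Hypothetical case class for illustration
case class Person(name: String, age: Int)

object Person {
  // uPickle derives the JSON codec for the case class at compile time
  implicit val rw: ReadWriter[Person] = macroRW
}

object CaseClassJson {
  // Serialize to a JSON string and persist it to disk
  def save(people: Seq[Person], path: Path): Unit =
    Files.write(path, write(people).getBytes(StandardCharsets.UTF_8))

  // Read the file back and deserialize into case classes
  def load(path: Path): Seq[Person] =
    read[Seq[Person]](new String(Files.readAllBytes(path), StandardCharsets.UTF_8))
}
```

With this codec, `write(Person("alice", 30))` produces `{"name":"alice","age":30}`.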
This blog post explains how to read and write JSON with Scala using the uPickle / uJSON library. This library makes it easy to work with JSON files in Scala. […]
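A small read/modify/write sketch with uJSON; the `bumpVersion` helper and the `version`/`updated` keys are hypothetical. `ujson.read` parses into a `ujson.Value` tree whose objects can be updated in place, and `ujson.write` renders it back to a string.

```scala
// Assumes the ujson library (bundled with uPickle) is on the classpath
object UJsonExample {
  // Parse JSON, increment a numeric field, add a new field, and render it back
  def bumpVersion(input: String): String = {
    val json = ujson.read(input)                 // ujson.Value tree
    json("version") = json("version").num + 1    // JSON numbers are read as Double
    json("updated") = true                       // adds a new key to the object
    ujson.write(json)
  }
}
```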
Writing open source software gives you the opportunity to collaborate with highly motivated developers and build awesome code that’s used by folks around the world. You can influence community best […]
Basic filesystem operations have traditionally been complex in Scala. A simple operation like copying a file is a one-liner in some languages like Ruby, but a multi-line / multi-import mess […]
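For comparison, here is a file copy written against only the JDK's `java.nio.file` API. The excerpt doesn't say which library the post recommends, so this is not its approach, and the `FileCopy.copy` helper is hypothetical. Even with `Files.copy`, you need several imports plus extra handling for missing parent directories and existing targets.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object FileCopy {
  // Copy `src` to `dest` with plain java.nio: create missing parent
  // directories first, then overwrite `dest` if it already exists
  def copy(src: Path, dest: Path): Path = {
    Option(dest.getParent).foreach(parent => Files.createDirectories(parent))
    Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING)
  }
}
```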