Splitting Large CSV files with Python
This blog post demonstrates different approaches for splitting a large CSV file into smaller CSV files and outlines the costs / benefits of the different approaches. TL;DR It’s faster to […]
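As a rough illustration of one way to split a CSV file, here is a minimal sketch that uses only Python's standard library `csv` module. It is not necessarily one of the approaches the post compares. The function name `split_csv`, the `part_0000.csv` naming scheme, and the `rows_per_file` parameter are assumptions made for this example.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_file):
    """Split src into numbered CSV files with at most rows_per_file data rows.

    The header row is repeated at the top of every output file.
    Returns the list of paths that were written.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty input: nothing to split
            return written
        out = None
        writer = None
        for i, row in enumerate(reader):
            # Start a new output file every rows_per_file data rows.
            if i % rows_per_file == 0:
                if out is not None:
                    out.close()
                path = out_dir / f"part_{len(written):04d}.csv"  # hypothetical naming
                out = open(path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                written.append(path)
            writer.writerow(row)
        if out is not None:
            out.close()
    return written
```

The reader streams one row at a time, so memory use stays small regardless of the input size. Calling `split_csv("big.csv", "chunks", 100_000)` would write files of up to 100,000 data rows each into a `chunks` directory.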
Poetry makes it easy to install Pandas and Jupyter to perform data analyses. Poetry is a robust dependency management system and makes it easy to make Python libraries accessible in […]
Directed Acyclic Graphs (DAGs) are a critical data structure for data science / data engineering workflows. DAGs are used extensively by popular projects like Apache Airflow and Apache Spark. This […]
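To make the idea concrete, here is a minimal sketch of a DAG of tasks and a valid execution order for it, using the standard library `graphlib` module (Python 3.9+). The task names and the `run_order` helper are hypothetical, and this is not how Airflow or Spark implement their schedulers.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key maps to the set of tasks it depends on.
dependencies = {
    "load": {"transform"},
    "transform": {"extract"},
    "extract": set(),
}


def run_order(deps):
    """Return the tasks in an order where every dependency comes first.

    Raises graphlib.CycleError if the graph contains a cycle,
    meaning it is not a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

A topological ordering like this only exists because the graph has no cycles, which is why workflow tools require the graph to be acyclic.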
pyenv lets you manage multiple versions of Python on your computer. This blog post focuses on how pyenv uses the shim design pattern to provide a wonderful user experience (it […]
This blog post shows how to convert a CSV file to Parquet with Pandas, Spark, PyArrow and Dask. It discusses the pros and cons of each approach and explains how […]