Devin Petersohn

Senior Software Engineer

Devin focuses on building solutions for data scientists and engineers. He has a passion for distributed systems and big data technologies. Devin is the creator of Modin, an open-source library that scales pandas workflows by running them on distributed execution engines. He is dedicated to helping users get the full potential out of their data through robust and scalable platforms.

Open Source

Beyond toPandas(): Streaming PySpark Data to DuckDB via Apache Arrow

Bypass Spark’s toPandas() memory bottleneck by streaming PySpark DataFrames to DuckDB via Apache Arrow, using the PyCapsule interface for efficient, zero-copy data transfer.

Devin Petersohn

FEB 20, 2026 | 3 min read

MORE POSTS FROM Devin Petersohn

Data Engineering

Building an Apache Spark™ Connect Server Powered by Snowflake

Learn how Snowflake built Snowpark Connect for Apache Spark, including technical insights, best practices, and engineering lessons from our team.

Devin Petersohn

OCT 06, 2025 | 9 min read
