Koalas: Pandas API on Apache Spark - PyCon SG 2019

Published on: Monday, 11 November 2019

Speaker: Ben Sadeghi, Solutions Architect, Databricks

Pandas is the de facto standard (single-node) DataFrame implementation in Python, while Spark is the de facto standard for big data processing. With the recently open-sourced Koalas package, anyone already familiar with pandas can be immediately productive with Spark, with no learning curve, and can keep a single codebase that works both with pandas (tests, smaller datasets) and with Spark (distributed datasets). In this talk, we'll go through the basics of Koalas, along with demos.
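To illustrate the single-codebase idea, here is a minimal sketch, not taken from the talk. It assumes Koalas is imported as `databricks.koalas`, as it was distributed at the time. The column names, the sample data, and the helpers `mean_fare_by_city` and `run_on_spark` are hypothetical. The same function body is meant to run unchanged on a pandas DataFrame and on a Koalas DataFrame.

```python
import pandas as pd


def mean_fare_by_city(df):
    """Hypothetical helper: uses only the pandas-style groupby API,
    so it should accept either a pandas or a Koalas DataFrame."""
    return df.groupby("city")["fare"].mean()


def run_on_spark(pdf):
    """Run the same logic on Spark via Koalas.

    Assumption: the `koalas` package and a working Spark installation
    are available; this function is not called in the local run below.
    """
    import databricks.koalas as ks  # imported lazily so pandas-only runs work

    kdf = ks.from_pandas(pdf)        # distribute the pandas DataFrame
    return mean_fare_by_city(kdf).to_pandas()  # collect the small result


# Made-up sample data for a local, pandas-only run.
pdf = pd.DataFrame(
    {"city": ["SG", "SG", "KL"], "fare": [10.0, 14.0, 6.0]}
)

pandas_result = mean_fare_by_city(pdf)
print(pandas_result)
```

On a machine with Spark and Koalas installed, calling `run_on_spark(pdf)` should give the same per-city means, which is the workflow the abstract describes: test on pandas with small data, then run on Spark with distributed data.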

About the speaker:

Ben Sadeghi is a Partner Solutions Architect at Databricks, covering Asia Pacific and Japan, focusing on Microsoft and its partner ecosystem. Having spent several years with Microsoft as a Big Data & Advanced Analytics Technology Specialist, he has helped various companies and partners implement cloud-based, data-driven, machine learning solutions on the Azure platform. Prior to Databricks and Microsoft, Ben was engaged as a data scientist with Hadoop/Spark distributor MapR Technologies (APAC), developed internal and external data products at Wego.com, a travel meta-search site, and worked in the Internet of Things domain at Jawbone, where he implemented analytics and predictive applications for the UP Band physical activity monitor. Before moving to the private sector, Ben contributed to several NASA and JAXA space missions. Ben has been a user of Python for over 10 years, and is an active member of the open-source Julia language community. He holds an M.Sc. in computational physics, with an astrophysics emphasis.

Event Page: https://pycon.sg/

Produced by Engineers.SG