Building an analytics data pipeline using Airflow and PySpark - PyCon SG 2019

Published on: Monday, 11 November 2019

Speaker: Yohei Onishi, Data Engineer

I have been building an analytics data pipeline for logistics processes in the retail industry using Airflow and Spark, and I have used Python to author the pipeline. In this session I will talk about a real-world use case of Airflow, including an overview of Airflow, how to create your own Airflow cluster, how to integrate Airflow with Spark, and how to reduce operating costs using Cloud Composer (a managed Airflow cluster service). Note: this talk is based on my talk at PyCon PH 2019, but I will explain more about Airflow and Spark integration.

About the speaker:

Yohei is a data engineer at a global retail company. He has recently been working on building analytics data pipelines using Apache Airflow. He likes OSS and cloud services for data engineering, such as Airflow, Spark, Beam and BigQuery.

Event Page: https://pycon.sg/

Produced by Engineers.SG
