How to build stream data pipeline with Apache Kafka and Spark Structured Streaming - PyCon SG 2019

Published on: Monday, 11 November 2019

Speaker: Takanori Aoki, Data Scientist, HOOQ

Objective: The main purpose of this session is to familiarise the audience with how to develop stream data processing applications using Apache Kafka and Spark Structured Streaming, and to encourage them to start experimenting with these technologies.

Description: In the Big Data era, massive amounts of data are generated at high speed by many types of devices. Stream processing technology plays an important role in making such data consumable by real-time applications. In this talk, Takanori will present how to implement a stream data pipeline and its applications using Apache Kafka and Spark Structured Streaming with Python. He will focus on how to develop the applications rather than on system architecture, so that the audience becomes familiar with implementing stream processing in Python. Takanori will introduce example applications using Tweet data and pseudo-data from mobile devices. In addition, he will explain how to feed streaming data into other data stores such as Apache Cassandra and Elasticsearch.

Note: The Python code for these applications will be uploaded to GitHub.
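As a rough illustration of the kind of pipeline described above, a minimal Spark Structured Streaming job that consumes a Kafka topic and prints messages to the console might look like the following. This is a sketch only, not the speaker's code; the broker address, topic name ("tweets"), and connector package version are assumptions, and running it requires a Spark installation and a live Kafka broker:

```python
# Sketch: consume a Kafka topic with Spark Structured Streaming.
# Broker address, topic name, and package version are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kafka-structured-streaming-demo")
    # The spark-sql-kafka connector must match your Spark/Scala versions.
    .config("spark.jars.packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0")
    .getOrCreate()
)

# Subscribe to a Kafka topic as an unbounded streaming DataFrame.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "tweets")
    .load()
)

# Kafka values arrive as bytes; cast to string for downstream processing.
messages = events.selectExpr("CAST(value AS STRING) AS message")

# Write each micro-batch to the console until the query is stopped.
query = (
    messages.writeStream
    .outputMode("append")
    .format("console")
    .start()
)
query.awaitTermination()
```

Swapping the `console` sink for `foreachBatch` is one common way to route the same stream into stores such as Cassandra or Elasticsearch, as the talk covers.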

About the speaker:

Takanori Aoki works as a Data Scientist, developing data-driven solutions to provide a better customer experience in an on-demand video streaming service. He has been using Python for three years to conduct exploratory data analysis, develop production ETL pipelines, and build machine learning models. He built a recommendation feature for movies and TV shows in Python as a production system. He is interested not only in machine learning algorithms but also in data engineering and software engineering, in order to build robust production systems. LinkedIn profile: https://sg.linkedin.com/in/takanori-aoki-7900a438

Event Page: https://pycon.sg/

Produced by Engineers.SG
