How Traveloka Runs Cloud-Scale Apache Spark in Production Since 2017

Published on: Wednesday, 29 May 2019

"How Traveloka Runs Cloud-Scale Apache Spark in Production Since 2017" - in this Level 301 knowledge transfer, Traveloka's Data Engineering and Data Science team will share how staff submit their cloud-scale Spark jobs today. The talk discusses pros and cons and the integration of Apache Spark with CI/CD components, schedulers, Airflow, Key Management Systems (KMS), and templates. The journey starts with the early days of a self-managed, on-premises Spark cluster and walks through the adoption of AWS EMR, Qubole, Databricks, and Dataproc. It also covers how multiple back-end data sets helped transform Traveloka from a meta-search engine into a fully integrated online travel booking agency and one of the top Indonesian unicorn startups!

Speakers for Talk #1:
• Nisrina Luthfiyati joined the Traveloka Data Team as a Software Engineer in 2014. She has been (and still is) working on the various infrastructures, platforms, and libraries that make up Traveloka's data processing pipelines and storage.
• Andri Lauw joined Traveloka in 2018. He currently works on various data infrastructure and platforms at Traveloka and worked on similar systems in his previous role. His main interests lie in distributed systems, and at present he works extensively on an end-to-end general data development and processing platform.
• Didik Achmadi joined Traveloka in 2017. He currently manages a few teams working on a number of areas, ranging from cloud management and data infrastructure to data engineering work for all business units within Traveloka. Previously, he worked on various engineering teams in the visual effects/animation, telco, and healthcare industries.

Event Page: https://www.meetup.com/Spark-Singapore/events/261637175/

Produced by Engineers.SG
Recorded by: Michael Cheng
