ABOUT THE TALK
Traveloka is an app that provides a wide range of travel-related products and services, such as flights, hotels, apartments, theme parks, and even international roaming packages. Having a wide-ranging business makes data modeling particularly challenging: it is like building many data warehouses for different business flows in one place.
To address that, we developed a modeling method and framework that lets us model data across business units and keeps the data uniform across the board, so that data scientists can make sense of it across all products and services.
We developed a data model schema with an inheritance and business glossary concept. The concept enforces uniformity of the data and consistent definitions across all our products. The schema enables data architects to model data schemas, data analysts to describe data definitions, data governance specialists to protect personal data, and data engineers to define cleansing rules, all in one place!
The framework is built on top of the Apache Beam Python SDK and currently runs on Google Cloud Dataflow. Building on Apache Beam enables us to run the very same framework on both our batch and streaming pipelines. The framework is inspired by JSON Schema and the BigQuery schema format. We call it NeoDDL.
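To make the inheritance and business glossary idea more concrete, here is a minimal sketch in plain Python. It is not NeoDDL's actual syntax or API, which this description does not show; every name in it (GLOSSARY, Field, Schema, cleanse_row, the example fields) is a hypothetical stand-in chosen for illustration. It shows a product schema inheriting shared fields from a base schema, each field tied to one glossary definition, with a personal-data flag and a cleansing rule declared in the same place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical business glossary: one shared definition per term,
# so every product describes the same concept the same way.
GLOSSARY = {
    "booking_id": "Unique identifier of a booking, shared by all products.",
    "contact_email": "Email address the customer gave for booking updates.",
    "hotel_id": "Identifier of the booked hotel.",
}


@dataclass(frozen=True)
class Field:
    name: str
    type: str                  # BigQuery-style type name, e.g. "STRING"
    term: str                  # key into GLOSSARY (the agreed definition)
    pii: bool = False          # marks personal data for governance
    cleanse: Optional[Callable[[str], str]] = None  # optional cleansing rule


@dataclass
class Schema:
    name: str
    fields: List[Field]
    parent: Optional["Schema"] = None

    def resolved(self) -> Dict[str, Field]:
        """Return inherited fields first, then this schema's own fields."""
        merged = dict(self.parent.resolved()) if self.parent else {}
        for f in self.fields:
            if f.term not in GLOSSARY:
                raise KeyError(f"{f.name}: term {f.term!r} is not in the glossary")
            merged[f.name] = f
        return merged

    def pii_fields(self) -> List[str]:
        """Names of all fields (own and inherited) flagged as personal data."""
        return sorted(n for n, f in self.resolved().items() if f.pii)

    def cleanse_row(self, row: dict) -> dict:
        """Apply each field's cleansing rule to one record."""
        out = {}
        for name, f in self.resolved().items():
            value = row.get(name)
            if value is not None and f.cleanse is not None:
                value = f.cleanse(value)
            out[name] = value
        return out


# Base schema shared by every product.
booking = Schema(
    name="booking",
    fields=[
        Field("booking_id", "STRING", term="booking_id"),
        Field("contact_email", "STRING", term="contact_email", pii=True,
              cleanse=lambda s: s.strip().lower()),
    ],
)

# Product-specific schema that inherits the shared fields.
hotel_booking = Schema(
    name="hotel_booking",
    fields=[Field("hotel_id", "STRING", term="hotel_id")],
    parent=booking,
)

row = hotel_booking.cleanse_row(
    {"booking_id": "B1", "contact_email": "  Ana@Example.COM ", "hotel_id": "H9"}
)
```

Because cleanse_row in this sketch is an ordinary per-record function, it could in principle be handed to a per-element transform such as Apache Beam's beam.Map in either a batch or a streaming pipeline. That is one plausible reading of how a single schema definition can serve both modes, not a description of the framework's internals.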
ABOUT THE SPEAKERS
Rendy is currently a Data System Architect at Traveloka. He built Traveloka's data pipeline from scratch and scaled it to handle 10,000x growth in data. He also established a batch and real-time data platform that powers organizational insights and serves data-intensive application use cases. He is currently focusing on solving the data modeling and processing challenges that Traveloka faces as a travel SuperApp with various businesses. Last but not least, he is a devoted dad to a cute daughter and aspires to contribute to environmental informatics.
Joshua has been a Data Engineer at Traveloka since 2018, where he is developing a framework to create and manage end-to-end data warehouse pipelines. He holds a bachelor's degree in computer science from Nanyang Technological University.
ABOUT DATA COUNCIL:
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.
FOLLOW DATA COUNCIL: