Speaker: Arun Manivannan (ThoughtWorks)
Arun is a data engineer at ThoughtWorks.
He is a curious and passionate engineer with extensive experience in the architecture and development of highly scalable distributed applications using Scala, Hadoop, Akka and Spark. He is the author of “Scala Data Analysis Cookbook” and co-author of “Scala: Guide for Data Science Professionals”, which focuses on Spark, Spark ML and Hadoop for solving various analytics problems.
Architecting well-rounded and evolvable data platforms
More and more organizations realise the value in data, be it their own or acquired. Data lakes serve as a unified store for all their structured, unstructured and semi-structured data. Once we see a need for a unified data lake, we generally go about identifying our initial set of data sources, crank up a pipeline and dump the data into the lake, only to find that no one is using it, or that it is not being used to the extent we expected.
We realise, at a later point, that there are a whole lot of things we forgot while building the lake: metadata management, lineage tracking, reconciliation, retention, security, or even data formats. This presentation aims to cover these cross-cutting concerns, which every data lake must address in order to be accessible.
Event Website: https://voxxeddays.com/singapore/