Incremental Data Processing with Apache Spark on Azure HDInsight - PyDataSG

Published on: Tuesday, 8 November 2016

Speaker: Rita Zhang

Synopsis: Social media platforms such as Facebook and Twitter have data feeds that contain a wealth of information that can aid in trend discovery. We recently worked with the United Nations to parse incoming social feed information so that the UN could watch for trending keywords that might alert them to potential humanitarian crises such as food shortages or terrorist attacks. A common pattern for efficiently processing evolving datasets like these is to process each new slice of data incrementally and merge the results with the previous results. In this talk, I will discuss how you can use incremental processing to make your data pipeline more efficient.
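The "process the new slice, then merge with previous results" pattern can be illustrated with a small plain-Python sketch. This is not the speaker's Spark/HDInsight pipeline: it assumes the per-slice result is a count of tracked keywords, and the function names, keyword set, and sample posts are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def count_keywords(posts: Iterable[str], keywords: Set[str]) -> Counter:
    """Count occurrences of tracked keywords in ONE new slice of posts (hypothetical helper)."""
    counts = Counter()
    for post in posts:
        for token in post.lower().split():
            # Strip simple punctuation and hashtag/mention markers before matching.
            word = token.strip(".,!?#@")
            if word in keywords:
                counts[word] += 1
    return counts


def merge_counts(previous: Counter, new_slice: Counter) -> Counter:
    """Merge the new slice's counts into the running totals without rescanning old data."""
    merged = Counter(previous)  # copy so the stored previous result is not mutated
    merged.update(new_slice)    # Counter.update adds counts rather than replacing them
    return merged


# Usage: each batch stands in for a newly arrived slice of a social feed (sample data is made up).
tracked = {"flood", "famine", "shortage"}
totals = Counter()
for batch in [
    ["Food shortage reported", "Flood warning issued"],
    ["Another flood hits the coast", "Shortage of clean water"],
]:
    totals = merge_counts(totals, count_keywords(batch, tracked))
print(totals)
```

Because each run scans only the newly arrived posts and then merges a small table of counts, its cost grows with the size of the new slice rather than with the full history. A Spark job could apply the same idea by aggregating only the newly arrived data and combining it with previously saved aggregates.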

About the speaker: Rita Zhang is an Open Source Engineer at Microsoft, based in San Francisco, who hacks away with engineering teams, open source communities, and startups using emerging open source technologies, and shares technical collateral with the developer community. In her spare time, she develops new smart home gadgets for her startup.

Event Page:

Produced by Engineers.SG
