Engineers.SG

Published on: Wednesday, 6 July 2016

Speaker: Amit Kapoor

Description
We rarely use the power of visualisation to understand our models better. Model evaluation is largely limited to numerical summaries. Visualising models helps us better understand the shape of the model, the impact of parameters on the model, the impact of different input data, the model fit and where it can be improved. This talk summarizes the learnings and key takeaways from communicating model results.

Abstract
For a data scientist building predictive models, the following are important:

How good is the model?
How good is it compared to competing/alternate models?
Can we identify what worked in the models built so far, and leverage it to build something even better?
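For the first two questions, one way to go beyond a single score is to compare candidate models on held-out data and look at their residuals, not just their error. A minimal sketch, assuming polynomial models fitted with NumPy and plotted with matplotlib (one of the libraries the talk mentions); the function names `holdout_comparison` and `plot_residuals` and the synthetic data are hypothetical, not taken from the talk.

```python
import numpy as np


def holdout_comparison(x, y, degrees=(1, 2), test_frac=0.3, seed=0):
    """Fit one polynomial per degree on a training split; return held-out
    predictions, residuals and RMSE for each degree."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test = int(round(len(x) * test_frac))
    test, train = idx[:n_test], idx[n_test:]
    results = {}
    for d in degrees:
        coefs = np.polyfit(x[train], y[train], deg=d)
        pred = np.polyval(coefs, x[test])
        resid = y[test] - pred
        results[d] = {
            "pred": pred,
            "resid": resid,
            "rmse": float(np.sqrt(np.mean(resid ** 2))),
        }
    return results


def plot_residuals(results):
    """Residuals vs. fitted values, one panel per candidate model."""
    import matplotlib.pyplot as plt  # lazy import: the comparison itself needs only NumPy

    fig, axes = plt.subplots(1, len(results), sharey=True, squeeze=False,
                             figsize=(4 * len(results), 3))
    for ax, (d, r) in zip(axes[0], results.items()):
        ax.scatter(r["pred"], r["resid"], s=10)
        ax.axhline(0, color="grey", linewidth=1)
        ax.set_title(f"degree {d}, held-out RMSE {r['rmse']:.2f}")
        ax.set_xlabel("fitted value")
    axes[0][0].set_ylabel("residual")
    return fig


# Synthetic data with a curved trend (an assumption for illustration).
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 200)
y = 0.5 * x ** 2 - x + rng.normal(scale=0.5, size=x.size)
res = holdout_comparison(x, y)
print({d: round(r["rmse"], 2) for d, r in res.items()})
```

Calling `plot_residuals(res)` draws the panels side by side: curvature left in the linear model's residuals shows *what* it is missing, which a single RMSE number hides.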
The stakeholder or end user who finally consumes the model's output, and for whom the ML process is mostly a black box, is concerned with the following:
1. How to trust the model output?
2. How to understand the drivers?
3. How to do what-if analysis?
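One common way to address the driver and what-if questions is a partial dependence curve: force a single input to each value on a grid for every row, keep the other inputs as observed, and average the model's predictions. The talk does not say which technique it used; this is a sketch under that assumption, generic over any `predict` callable. The two-driver model (`predict`, with price and ad spend columns) and the helper names are hypothetical.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` of X is set to each grid value.

    predict: callable mapping an (n, p) array to n predictions.
    Returns one averaged prediction per grid value.
    """
    curve = []
    for value in grid:
        X_what_if = X.copy()           # leave the caller's data untouched
        X_what_if[:, feature] = value  # "what if every row had this value?"
        curve.append(predict(X_what_if).mean())
    return np.array(curve)


def plot_partial_dependence(grid, curve, name):
    import matplotlib.pyplot as plt  # only needed for drawing

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(grid, curve)
    ax.set_xlabel(name)
    ax.set_ylabel("average prediction")
    return fig


# Hypothetical fitted model with two drivers: price (column 0) and ad spend (column 1).
def predict(X):
    return 100 - 3.0 * X[:, 0] + 0.5 * np.sqrt(X[:, 1])


rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(5, 15, 500), rng.uniform(0, 400, 500)])
grid = np.linspace(5, 15, 11)
pdp_price = partial_dependence(predict, X, feature=0, grid=grid)
```

Because the curve only needs `predict`, the same function works for a black-box model; a stakeholder can read the price effect directly off `plot_partial_dependence(grid, pdp_price, "price")`.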

The unifying theme that could answer most of the above questions is visualization. The biggest challenge is to find a way to visualize the model, the model fitting process and the impact of drivers. This talk summarizes the learnings and key takeaways from communicating model results.

Even though exploratory data analysis (EDA) is an integral part of the data science pipeline and helps us understand the portrait of the data, we rarely leverage the power of visualisation to understand our models better. Model evaluation is still largely restricted to numerical summaries. Visualising models can help us better understand the shape of the model, the impact of parameters on the model, the impact of different input data on the model, the fit of the model and where it can be improved.
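To see the impact of a parameter on the shape of a model, fit the same model at several parameter values and draw every resulting curve over the data. A minimal sketch, assuming ridge regression on polynomial features with the regularisation strength `alpha` as the parameter being varied; the model choice, the helper names and the synthetic data are illustrative assumptions, not from the talk.

```python
import numpy as np


def ridge_poly_fit(x, y, degree, alpha):
    """Closed-form ridge regression on polynomial features of a 1-D input."""
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    penalty = alpha * np.eye(degree + 1)
    penalty[0, 0] = 0.0  # leave the intercept unpenalised
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def curves_by_alpha(x, y, alphas, degree=9, n_grid=200):
    """Evaluate the fitted model across the input range for each alpha."""
    grid = np.linspace(x.min(), x.max(), n_grid)
    G = np.vander(grid, degree + 1, increasing=True)
    return grid, {a: G @ ridge_poly_fit(x, y, degree, a) for a in alphas}


def plot_curves(x, y, grid, curves):
    import matplotlib.pyplot as plt  # only needed for drawing

    fig, ax = plt.subplots(figsize=(5, 3))
    ax.scatter(x, y, s=10, color="grey", label="data")
    for a, c in curves.items():
        ax.plot(grid, c, label=f"alpha = {a:g}")
    ax.legend()
    return fig


rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)  # synthetic data
grid, curves = curves_by_alpha(x, y, alphas=(1e-4, 1e-2, 1.0))
```

Overlaying the curves from `plot_curves(x, y, grid, curves)` shows the wiggly small-`alpha` fit flattening as the penalty grows, which is the kind of parameter effect a single validation score does not convey.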

Inspired by "Visualising Statistical Models" by Hadley Wickham et al., we tried several visualisation techniques and present them here. In this talk we look at practical examples of methods that can help us better understand a model: showing the model in data space, plotting multiple models rather than just one, and exploring the process of fitting rather than only the final result. This talk summarises the learnings and key takeaways.
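Exploring the process of fitting, and showing the model in data space, can be combined by recording intermediate parameters while a model is fitted and drawing each intermediate fit over the data. A minimal sketch, assuming a straight-line model fitted by plain gradient descent on mean squared error; the optimiser, learning rate, helper names and data are illustrative choices, not the talk's.

```python
import numpy as np


def fit_with_history(x, y, lr=0.1, n_iter=200, every=20):
    """Gradient descent for y ~ b0 + b1 * x under mean squared error.

    Records the loss at every step and (step, b0, b1) every `every` steps,
    so the fitting process can be replayed rather than only its end point.
    """
    b0, b1 = 0.0, 0.0
    losses, snapshots = [], []
    for step in range(n_iter):
        resid = (b0 + b1 * x) - y
        losses.append(float(np.mean(resid ** 2)))
        if step % every == 0:
            snapshots.append((step, b0, b1))
        # Gradient of the mean squared error with respect to b0 and b1.
        b0 -= lr * 2 * np.mean(resid)
        b1 -= lr * 2 * np.mean(resid * x)
    return (b0, b1), losses, snapshots


def plot_fitting_process(x, y, losses, snapshots):
    """Left: intermediate fits drawn in data space. Right: loss per iteration."""
    import matplotlib.pyplot as plt  # only needed for drawing

    fig, (ax_data, ax_loss) = plt.subplots(1, 2, figsize=(8, 3))
    ax_data.scatter(x, y, s=10, color="grey")
    xs = np.array([x.min(), x.max()])
    for i, (step, b0, b1) in enumerate(snapshots):
        # Later snapshots are drawn more opaque, so the path to the final fit shows.
        ax_data.plot(xs, b0 + b1 * xs, color="C0", alpha=(i + 1) / len(snapshots))
    ax_data.set_title("fits in data space")
    ax_loss.plot(losses)
    ax_loss.set_xlabel("iteration")
    ax_loss.set_ylabel("mean squared error")
    return fig


rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 100)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=x.size)  # synthetic data
(b0, b1), losses, snapshots = fit_with_history(x, y)
```

The final parameters should match an ordinary least-squares fit; the value of the sketch is in the snapshots, which show how quickly the intercept and slope settle and whether the loss curve has flattened.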

Most of the visualisations were done using matplotlib, seaborn and bokeh.

Event Page: https://pycon.sg

Produced by Engineers.SG
