A Machine Learning Data Pipeline - PyData SG

Published on: Tuesday, 10 May 2016

Using Luigi and Scikit-Learn to create a machine learning pipeline that trains a model and serves predictions through a REST API

Speaker: Atreya Biswas

Synopsis: A machine learning pipeline can broadly be thought of as a series of tasks: data ingestion, data cleaning, feature extraction, model training, hyperparameter optimization, model evaluation and model deployment. Luigi is Spotify's open-source Python framework for batch data processing, covering dependency resolution, workflow management, visualisation, failure handling and monitoring. Scikit-Learn is one of the most popular and widely used machine learning libraries in Python. We will demonstrate how Luigi and Scikit-Learn can be used together to orchestrate these machine learning tasks, creating a cohesive machine learning pipeline.
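To make the idea of orchestrating pipeline stages through dependency resolution concrete, here is a minimal, stdlib-only sketch. It is NOT Luigi's API and not the speaker's code: the Task base class, the build() helper, the WORKDIR scratch directory and the toy data are all hypothetical. It only imitates the shape of a Luigi task (requires(), output(), run(), and "complete once the output exists"), chaining ingestion, cleaning, training and evaluation. The training step is a hand-rolled least-squares fit standing in for a scikit-learn estimator; feature extraction, hyperparameter optimization and deployment are left out for brevity.

```python
# A toy, stdlib-only imitation of the Luigi task pattern (requires/output/run).
# Hypothetical: these classes are NOT Luigi's API; they only mimic its shape.
import json
import os
import tempfile

WORKDIR = tempfile.mkdtemp()  # hypothetical scratch directory for task outputs


class Task:
    def requires(self):
        return []  # upstream tasks that must finish first

    def output(self):
        raise NotImplementedError  # path of the file this task writes

    def run(self):
        raise NotImplementedError

    def complete(self):
        # As in Luigi, a task counts as done once its output exists.
        return os.path.exists(self.output())

    def save(self, obj):
        with open(self.output(), "w") as f:
            json.dump(obj, f)

    def load(self):
        with open(self.output()) as f:
            return json.load(f)


def build(task, log):
    """Resolve dependencies depth-first; skip tasks whose output already exists."""
    for dep in task.requires():
        build(dep, log)
    if not task.complete():
        task.run()
        log.append(type(task).__name__)


class IngestData(Task):
    def output(self):
        return os.path.join(WORKDIR, "raw.json")

    def run(self):
        # Hypothetical (x, y) records; None marks a missing value.
        self.save([[1, 2.1], [2, 3.9], [None, 5.0], [3, 6.2], [4, 7.8]])


class CleanData(Task):
    def requires(self):
        return [IngestData()]

    def output(self):
        return os.path.join(WORKDIR, "clean.json")

    def run(self):
        rows = self.requires()[0].load()
        self.save([r for r in rows if None not in r])  # drop incomplete rows


class TrainModel(Task):
    def requires(self):
        return [CleanData()]

    def output(self):
        return os.path.join(WORKDIR, "model.json")

    def run(self):
        rows = self.requires()[0].load()
        mx = sum(x for x, _ in rows) / len(rows)
        my = sum(y for _, y in rows) / len(rows)
        # Closed-form least squares for y = a*x + b, standing in for a
        # scikit-learn estimator such as LinearRegression.
        a = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x, _ in rows)
        self.save({"a": a, "b": my - a * mx})


class EvaluateModel(Task):
    def requires(self):
        return [CleanData(), TrainModel()]

    def output(self):
        return os.path.join(WORKDIR, "metrics.json")

    def run(self):
        rows, model = self.requires()[0].load(), self.requires()[1].load()
        mse = sum((model["a"] * x + model["b"] - y) ** 2 for x, y in rows) / len(rows)
        self.save({"mse": mse})


run_log = []
build(EvaluateModel(), run_log)  # runs ingest -> clean -> train -> evaluate
```

Calling build(EvaluateModel(), ...) a second time runs nothing, because every output file already exists; this skip-if-complete behaviour is what lets a file-based pipeline resume after a failure without redoing finished work. In actual Luigi the same structure would subclass luigi.Task, return luigi.LocalTarget objects from output(), and be launched with luigi.build([...], local_scheduler=True).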

Speaker: Atreya is currently working as a Data Scientist at Pocketmath, a digital advertising buying platform with real-time bidding. In his day-to-day work he processes terabytes of data using Hadoop and Spark and applies machine learning techniques to it.

Before joining Pocketmath, he was pursuing a Master's in Enterprise Business Analytics at the National University of Singapore while also working as a Machine Learning Associate at Newcleus, a CRM data analytics platform. At Newcleus, he was responsible for productising a machine learning platform that ingests CRM data from Salesforce and applies data cleaning and machine learning to it. His final-year thesis was done in association with Dailymotion, a video platform for web and mobile, where he was exposed to natural language processing and text mining on Twitter data, with the aim of improving Dailymotion's existing recommendation system using Twitter trending topics. He also has 2 years of experience at SAP Labs in the Research and Development team, building enterprise applications in the mobile and big data space.

He has been using Python for almost 2.5 years for data analysis and backend development. Libraries he uses day to day include numpy, scipy, pandas, scikit-learn, luigi, hyperopt and flask.

Apart from work and technology, he is a football aficionado and an amateur wine connoisseur who loves travelling to new places and reading comics.

Event Page: http://www.meetup.com/PyData-SG/events/227687789/

Produced by Engineers.SG
