Speaker: Weimin Wang
Synopsis: This talk presents a binary classification problem (product recommendation) solved with PySpark on the Hadoop platform. Using an IPython notebook, the presentation walks through: 1) data pre-processing, 2) using the MLlib random forest classifier for binary classification, 3) measuring performance with the AUC score, and 4) different strategies for handling an unbalanced dataset.
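As a rough illustration of items 3 and 4 in the synopsis, the sketch below computes an AUC score and balances a dataset by random undersampling. It is plain Python rather than PySpark, it is not the speaker's code, and undersampling is only one possible strategy, assumed here for illustration. The function names `auc_score` and `undersample_majority` are hypothetical.

```python
import random


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def undersample_majority(rows, label_of, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [r for r in rows if label_of(r) == 1]
    neg = [r for r in rows if label_of(r) == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Toy data: 3 positives among 20 rows, stored as (id, label) tuples.
    rows = [(i, 1 if i < 3 else 0) for i in range(20)]
    balanced = undersample_majority(rows, label_of=lambda r: r[1])
    print(len(balanced), "rows after undersampling")
    print("AUC:", auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC loop is quadratic in the number of examples, so it only suits small samples; on a cluster, a distributed evaluator would be the more realistic choice.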
Speaker: Weimin Wang works as a Data Scientist at Merck Singapore, where he focuses on advanced analytics and bioinformatics research. With solid knowledge of data mining and machine learning, he is also actively involved in data science competitions such as Kaggle and Data Science Game. His interests lie in machine learning, deep learning and natural language processing.
Produced by Engineers.SG