An average data scientist (ML practitioner, AI expert) spends a significant amount of time designing and running machine learning experiments, and waiting for them to complete. This involves trying out various training algorithms, doing feature engineering, changing preprocessing steps to get more homogeneous data, trying different hyperparameter settings, and testing models on different datasets.
A lot goes into creating and running experiments. Yet the only thing we seem to have for keeping track of performance is the source code of the best-performing experiments. This is why we hear the following phrases quite frequently:
“It was working yesterday” – highlighting how common reproducibility problems are.
“I don’t remember what the actual scores are but using feature X didn’t help” – documentation issue.
“I fixed a bug but I ran so many previous experiments with that bug” – code dependency issue.
“I am using the same parameters as experiment 4, why is it not working” – reproducibility and documentation issue.
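The complaints above mostly come down to parameters, scores, and the code version not being recorded alongside each run. As a minimal sketch of what such a record could look like, assuming a plain JSON Lines log file: the helpers `log_experiment` and `load_experiments` and the field names below are hypothetical and chosen for illustration, not taken from the talk or from any particular tool.

```python
import hashlib
import json
import time
from pathlib import Path


def log_experiment(log_path, name, params, metrics, source_files=()):
    """Append one experiment record to a JSON Lines file and return it."""
    # Hash the source files so that a later bug fix shows up as a
    # different code version in the log.
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in source_files):
        digest.update(Path(path).read_bytes())
    record = {
        "name": name,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "params": params,    # hyperparameters, random seed, dataset name, ...
        "metrics": metrics,  # the actual scores, so nobody has to remember them
        "code_sha256": digest.hexdigest(),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_experiments(log_path):
    """Read back every logged record, oldest first."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

With scikit-learn, `params` could be built from an estimator's `get_params()` output, provided the values are JSON-serializable; comparing the `params` and `code_sha256` fields of two records is then one way to check whether a run really used "the same parameters as experiment 4".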
What You'll Learn
- The process that machine learning practitioners and data scientists follow, using Python and scikit-learn as a use case, and the recurring issues that we are starting to see with this process
- Best practices to follow to help reproducibility
- Tools that startups are building to address the gaps in machine learning experiment management
Slides on Experiment Management for Machine Learning can be found here.
Dr. Rutu Mulkar - Rutu Mulkar is the founder of Hunchera, and previously the founder of Ticary Solutions (acquired by Sigmoidal). She received her Ph.D. in Natural Language Processing from USC and has contributed to IBM's Watson system that defeated humans in Jeopardy! She is interested in solving problems related to Natural Language Processing, specifically Topic Modeling, Recommender Systems, Information Extraction, Semantics, and Search, to name a few, and in applying them to various domains such as SEO and healthcare.