I set out to evaluate an ML model (an emotion classifier) from a human/user perspective. The heart of my attempt was going to be designing the right dataset to evaluate the model's performance. Very quickly, I realized that there is more to this task than meets the eye. In this post, I will share several problems […]