Artificial Intelligence and Machine Learning (AI/ML) have become a part of everything we do in our daily work. From personalised recommendations to automated decision-making, these technologies are everywhere. As AI/ML systems become more advanced, it’s crucial to ensure they are reliable and accurate. In this blog, we’ll explore simple and effective testing strategies to help improve these products, making them […]
Metamorphic testing with SHAP Analysis
Fine Tuning Model Evaluation using ROC and Precision Recall curves
Evaluating machine learning models is crucial for understanding their performance characteristics. In this blog post, we explore how ROC and Precision Recall curves can be used to improve the way we evaluate models. Additionally, we delve into the practical aspect of using these curves across various thresholds, customizing the model for specific requirements and achieving optimal performance. Why this post […]
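To give a flavour of the idea, here is a minimal sketch using scikit-learn’s roc_curve and precision_recall_curve; the synthetic dataset and the 0.9 precision requirement are made up purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, precision_recall_curve, auc
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced binary classification data stands in for the real problem
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# ROC curve: true positive rate vs false positive rate across all thresholds
fpr, tpr, _ = roc_curve(y_test, y_scores)
print("ROC AUC:", auc(fpr, tpr))

# Precision-Recall curve: more informative when classes are imbalanced
precision, recall, thresholds = precision_recall_curve(y_test, y_scores)

# Example of choosing a decision threshold that meets a minimum precision of 0.9
candidates = [t for p, t in zip(precision[:-1], thresholds) if p >= 0.9]
print("First threshold meeting precision >= 0.9:", candidates[0] if candidates else None)
```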
Understanding Text Classification Models with LIME
Why this post? I’ve always wondered how machine learning models function as black boxes, making predictions based on patterns learned from data. Despite their impressive accuracy, understanding the factors and features that influenced a particular prediction, and the decision-making process behind it, is a crucial and challenging task. The lack of transparency in these models adds complexity, making their internal workings less […]
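As a taste of the technique, here is a minimal sketch of explaining one prediction with LIME’s LimeTextExplainer; the tiny sentiment pipeline and example texts are illustrative stand-ins, not the model discussed in the post:

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny illustrative sentiment classifier; the post works with its own model
texts = ["great product, works well", "terrible, broke in a day",
         "really happy with it", "awful experience, do not buy"]
labels = [1, 0, 1, 0]
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

# LIME perturbs the input text and fits a local surrogate model around the prediction
explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "the product is great but shipping was awful",
    pipeline.predict_proba,   # any callable that returns class probabilities
    num_features=5,
)
print(explanation.as_list())  # (word, weight) pairs that drove the prediction
```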
Insights and strategies on testing Machine Learning Models
Once a machine learning model is developed and its accuracy and related metrics have been thoroughly examined, it might seem like the model is ready for real-world deployment. However, in reality, this is hardly the case. A major part of testing begins when the model is integrated into the application it was designed for. We at Qxf2 Services feel most of […]
Data quality matters when building and refining a Classification Model
In the world of machine learning, data takes centre stage. It’s often said that data is the key to success. In this blog post, we emphasise the significance of data, especially when building a comment classification model. We will delve into how data quality, quantity, and biases significantly influence machine learning model performance. Additionally, we’ll explore techniques like undersampling as […]
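To illustrate the undersampling technique mentioned above, here is a minimal sketch using imbalanced-learn’s RandomUnderSampler on a synthetic, imbalanced dataset (the data is made up for the example):

```python
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Synthetic, heavily imbalanced data: ~90% of samples in the majority class
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
print("Before undersampling:", Counter(y))

# RandomUnderSampler drops majority-class samples until the classes are balanced
sampler = RandomUnderSampler(random_state=0)
X_resampled, y_resampled = sampler.fit_resample(X, y)
print("After undersampling:", Counter(y_resampled))
```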
Build a semantic search tool using FAISS
This post provides an overview of implementing semantic search. Why? Because too often, we notice testers skip testing more complex features like autocomplete. This might be ok in most applications. But in domain-specific applications, testing the autocomplete capabilities of the product is important. Since testers can benefit from understanding implementation details, in this post, we will look at how autocomplete […]
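A minimal sketch of the approach, pairing sentence embeddings with a FAISS index; the sentence-transformers model name and the toy corpus are assumptions chosen only for illustration:

```python
import faiss
from sentence_transformers import SentenceTransformer

# Encode a small corpus into dense vectors (the model choice is illustrative)
model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["reset my password", "update billing address",
          "cancel my subscription", "change email notifications"]
embeddings = model.encode(corpus).astype("float32")

# Build a flat (exact) L2 index over the corpus embeddings
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Semantic search: the query does not need to share words with the match
query = model.encode(["how do I stop being charged"]).astype("float32")
distances, ids = index.search(query, 2)
print([corpus[i] for i in ids[0]])  # nearest corpus entries by meaning
```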
Data Generation for Text Classification
I set out to evaluate an ML model (an emotion classifier) from a human/user perspective. The heart of my attempt was going to be around designing the right set of data to evaluate the performance of the model. Very quickly, I realized that there is more to this task than meets the eye. In this post, I will share several problems […]
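One simple starting point for such a dataset is template-based generation; the templates, fillers, and emotion labels below are hypothetical, not the ones used in the post:

```python
import itertools
import random

# Hypothetical templates and fillers for probing an emotion classifier
templates = {
    "joy": ["I am so {adv} happy about {topic}", "{topic} made my day"],
    "anger": ["I am {adv} furious about {topic}", "{topic} is unacceptable"],
}
fillers = {"adv": ["really", "incredibly"],
           "topic": ["the release", "my order", "this update"]}

random.seed(0)
test_cases = []
for label, patterns in templates.items():
    for pattern, topic in itertools.product(patterns, fillers["topic"]):
        text = pattern.format(adv=random.choice(fillers["adv"]), topic=topic)
        test_cases.append((text, label))

# Each generated sentence carries the emotion label it is expected to trigger
for text, label in test_cases[:4]:
    print(label, "->", text)
```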
Robustness Testing of Machine Learning Models
In the world of machine learning, assessing a model’s performance under real-world conditions is important to ensure its reliability and robustness. Real-world data is rarely perfect: it may be messy or contain noise, outliers, and variations. During model training, these types of data could be limited, and the model may not have received sufficient training to handle […]
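A minimal sketch of the underlying idea: score a model on clean test data and on the same data with synthetic Gaussian noise added; the dataset, model, and noise level are chosen only for illustration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Train on clean data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Perturb the test features with Gaussian noise scaled to each feature's spread
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.2 * X_test.std(axis=0), size=X_test.shape)
X_noisy = X_test + noise

# A large drop between the two scores signals a robustness problem
print("Accuracy on clean data:", model.score(X_test, y_test))
print("Accuracy on noisy data:", model.score(X_noisy, y_test))
```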
Context-based question answering using LLM
Companies are going to want to query their own internal documents – especially with the rise of LLMs and improvements in AI. Qxf2 has already heard of several CEOs who want to use AI/ML models to glean insights from internal knowledge stores. What does this mean for a tester? Well, you can expect to test such systems in the coming […]
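As a rough sketch of context-based question answering, the snippet below stuffs the relevant document text into the prompt and asks the model to answer only from it; the OpenAI client, model name, and example context are assumptions for illustration and may differ from what the post uses:

```python
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set in the environment

client = OpenAI()

context = """Qxf2 Services provides QA for startups.
The team works remotely and is distributed across India."""
question = "How does the Qxf2 team work?"

# Constrain the model to the supplied context so answers stay grounded in the document
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "Answer using only the provided context. "
                    "If the answer is not in the context, say you do not know."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```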
Baseline Model Comparison for Performance Evaluation
Machine learning models evolve. As a tester, how do we know the newer version of the model is better? How do we know that the model did not get worse in other areas? The most intuitive approach would be to design a ‘good’ labelled dataset and then calculate an evaluation score, such as the F1 score, for the model under test. […]
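A minimal sketch of such a comparison: evaluate the baseline and the candidate model on the same labelled dataset and compare their F1 scores; the models and data here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# One fixed, labelled evaluation set shared by both model versions
X, y = make_classification(n_samples=1500, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # existing model
candidate = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)  # newer model

f1_baseline = f1_score(y_test, baseline.predict(X_test))
f1_candidate = f1_score(y_test, candidate.predict(X_test))

# The candidate should at least match the baseline before it ships
print(f"Baseline F1:  {f1_baseline:.3f}")
print(f"Candidate F1: {f1_candidate:.3f}")
```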