Data Validation Using Assistants API: Exploring AI-driven approach
This post extends my previous exploration of conducting data validation tasks using Large Language Models like ChatGPT. To provide context, at Qxf2, we execute a series of data quality tests using Great Expectations. Initially, we explored the possibility of employing ChatGPT for these validations, but it faced challenges in performing them effectively. Now, with the recent release of more advanced […]
Fine Tuning Model Evaluation using ROC and Precision Recall curves
Evaluating machine learning models is crucial for understanding their performance characteristics. In this blog post, we explore how ROC and Precision Recall curves can be used to improve the way we evaluate models. Additionally, we delve into the practical aspect of using these curves across various thresholds, customizing the model for specific requirements and achieving optimal performance. Why this post […]
Testing Charts using GPT-4 with Vision model
This post builds upon my prior exploration of testing charts with Transformers using the Visual Question Answering approach. I had presented charts to Transformer models like Pix2Struct and MatCha from Google (which were specifically trained on charts) and then queried them with questions. The outcomes proved satisfactory when the charts were well-defined with clearly labeled data points. Now, with the recent […]
Testing DALL-E by creating single panel cartoons
I tested DALL-E for a specific real-world use case. I wanted to see how good it was at producing single panel cartoons. My testing uncovered several promising aspects, some problems that need to be addressed, and an interesting testing technique for DALL-E and ChatGPT-like applications. I tried summarizing my findings in a blog post like an engineer would. […]
Insights and strategies on testing Machine Learning Models
Once a machine learning model is developed and its accuracy and related metrics have been thoroughly examined, it might seem like the model is ready for real-world deployment. In reality, however, this is hardly ever the case. A major part of testing begins when the model is integrated into the application it was designed for. We at Qxf2 Services feel most of […]
Testing Charts with Transformers using Visual Question Answering (VQA)
I tried testing charts using VQA. What that means is that I showed several charts to an AI model and made it answer questions about them. My idea was to use these answers as part of test automation. This post will show you what (sort of) worked for me and which techniques did not work. I hope people use this […]
Baseline Model Comparison for Performance Evaluation
Machine learning models evolve. As testers, how do we know the newer version of the model is better? How do we know that the model did not get worse in other areas? The most intuitive approach would be to design a ‘good’ labelled dataset and then calculate an evaluation score, such as the F1 score, for the model under test. […]
My experience with Auto-GPT
Qxf2 was intrigued by the rising trend of LLMs. We decided to venture beyond ChatGPT. With Auto-GPT’s increasing popularity and the widespread claims about it, we were eager to explore its capabilities. Given my background as an engineer, I was well equipped to dive into the intricacies of Auto-GPT. And as a curious tester, I wanted to get a sense of how we […]
Testing OpenAI Whisper with Indian Languages
In a previous blog post, we tested OpenAI Whisper on English spoken with different accents and observed that it did a great job. We also described how we generated the audio files, along with the setup and test details. In this post, we test OpenAI Whisper’s ability to transcribe and translate Indian languages. At Qxf2, our teammates work from different regions of India, and everyone […]
Testing OpenAI Whisper with different accents
At Qxf2, we did some black box testing on OpenAI Whisper – a tool that does speech recognition well. OpenAI Whisper is also capable of language detection and translation. This model can be tested in various ways, by adjusting different voice attributes such as volume, pace, pitch, rate, etc. However, in this particular case, we have chosen to test it […]