Context-based question answering using LLM
Companies are going to want to query their own internal documents – especially with the rise of LLMs and improvements in AI. Qxf2 has already heard of several CEOs who want to use AI/ML models to glean insights from internal knowledge stores. What does this mean for a tester? Well, you can expect to test such systems in the coming […]
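The core idea behind context-based question answering is simple: supply the relevant passage along with the question and ask the model to answer only from that context. A minimal sketch, assuming the OpenAI Python client (the post's actual stack, model name and prompt wording may differ):

# Minimal sketch of context-based question answering with an LLM.
# The model name, prompt wording and sample context are assumptions
# for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_from_context(context: str, question: str) -> str:
    """Ask the model to answer using only the supplied context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": "Answer using only the context provided. "
                        "If the answer is not in the context, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    context = "Qxf2 provides QA services for startups and runs weekly technical sessions."
    print(answer_from_context(context, "What does Qxf2 do?"))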
Baseline Model Comparison for Performance Evaluation
Machine learning models evolve. As testers, how do we know the newer version of a model is better? How do we know that the model did not get worse in other areas? The most intuitive approach would be to design a ‘good’ labelled dataset and then calculate an evaluation score, such as the F1 score, for the model under test. […]
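In its simplest form, a baseline comparison means scoring both model versions against the same labelled dataset and flagging a drop. A small sketch using scikit-learn, with made-up labels and predictions rather than data from the actual study:

# Sketch: compare a candidate model against a baseline on the same
# labelled dataset using the F1 score. All values below are placeholders.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]           # hand-labelled ground truth
baseline_preds = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # current (baseline) model
candidate_preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # newer model under test

baseline_f1 = f1_score(y_true, baseline_preds)
candidate_f1 = f1_score(y_true, candidate_preds)

print(f"Baseline F1:  {baseline_f1:.3f}")
print(f"Candidate F1: {candidate_f1:.3f}")

# Flag a possible regression if the newer model scores below the baseline.
if candidate_f1 < baseline_f1:
    print("Possible regression: candidate scores below the baseline.")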
My experience with Auto-GPT
Qxf2 was intrigued by the rising trend of LLMs. We decided to venture beyond ChatGPT. With Auto-GPT’s increasing popularity and the widespread claims around it, we were eager to explore its capabilities. Given my background as an engineer, I was well equipped to dive into the intricacies of Auto-GPT. And as a curious tester, I wanted to get a sense of how we […]
Data Validation with ChatGPT: Trials and Insights
We conducted a study to explore the feasibility of using large language models like ChatGPT to validate numerical data. At Qxf2, we run a set of data quality tests using Great Expectations. Our goal was to assess how efficiently ChatGPT could carry out these validations instead. To achieve this, I selected two specific scenarios. […]
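The two scenarios themselves are not spelled out in this excerpt. Purely as an illustration of the approach, a numeric range check of the kind one might otherwise express in Great Expectations could be handed to ChatGPT like this; the prompt, model name and sample data are placeholders, not the ones used in the study:

# Illustrative sketch of asking ChatGPT to validate numerical data,
# e.g. "all values must lie between 0 and 100". Prompt, model and data
# are assumptions for illustration only.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

values = [12.5, 48.0, 97.3, 101.2, 55.1]  # made-up sample column

prompt = (
    "You are a data validation assistant. Check whether every value in the "
    f"following list lies between 0 and 100: {json.dumps(values)}. "
    "Reply with PASS or FAIL, followed by any offending values."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)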