Testing Charts with Transformers using Visual Question Answering (VQA)

I tried testing charts using VQA. What that means is that I showed several charts to an AI model and made it answer questions about it. My idea was to use these answers as part of test automation. This post will show you what (sort of) worked for me and what techniques did not work. I hope people use this work and build upon it. If you successfully use AI models to test your charts, please share your work with us.


Context

Recently, I had the opportunity to explore an interesting way of testing images. It is using a Visual Question Answering (VQA) approach, utilizing the Pix2Struct model, which is based on Transformers. I did for an image that is generated by one of the lambdas at Qxf2. You can find more details about it in this blog post. Building upon this experience, I decided to extend the same approach to perform testing on charts. As a tester, my focus is on evaluating the visual aspects and data accuracy of charts, a process that typically involves manual examination and analysis. However, automating these checks would be highly beneficial, as it not only saves time but also has the potential to enhance efficiency and accuracy in chart quality assurance.
To give automated checking of charts a try, I set out on experimenting VQA approach on charts using Pix2Struct Transformers models. While this approach is a widely used technique for reasoning questions on charts, I wanted to try this for testing charts. In this blog, I will share my experiments and the challenges I encountered while conducting these tests.


The Challenge of performing VQA on Charts

Testing charts using visual question answering, similar to regular images, presents significant challenges. The typical computer vision techniques designed for regular images, containing scenes and objects, become complex when applied to charts. Charts visually encode data using attributes like color, length, and shape, crucial for effective information conveyance. Visual Question Answering, successful with regular images, may not yield the same results for charts. Nevertheless, it’s worth exploring VQA techniques for chart testing as they have the potential to facilitate automation.


Scope definition

Given the vast domain of charts consisting of various types and testing possibilities, I defined the scope of my testing approach. For this task, I narrowed down my focus to specific aspects that I aimed to verify in charts:

  • Chart type and layout
    • Verify chart type, layout, axes, labels, titles and legends are displayed
  • Data Accuracy
    • Verify chart’s values match the source data

Navigating the Challenge

To perform Visual Question Answering, I used Pix2Struct models which had undergone fine-tuning specifically for Chart QA dataset. I tested using two models – google/pix2struct-chartqa-base and google/matcha-chartqa. The results for both models were quite similar, irrespective of whether they were positive or negative. Afterward, I experimented with another model called google/deplot, a plot-to-table translation model. DePlot operates by converting an image of a plot or chart into a linearized table. The resulting table can then be directly fed as input to a pretrained large language model. The results showed improvement; however, DePlot has limitations regarding the visual attributes of the chart which are lost during the modality conversion. To assess the visual aspects and data accuracy, I generated a set of questions with their corresponding answers. Subsequently, I fed these questions to the models and observed the results. Next, I will describe my attempts as I conducted these tests.


1. Attempt with Pix2Struct models

In this section, I showcase some of the charts I experimented with, along with the corresponding questions I used for testing. Additionally, I have included links to the code snippets and the list of questions for reference.

1.1) Testing a Basic Bar Chart (Poor)

To begin with, I used the pix2struct-chartqa-base model and a basic bar chart that I obtained online. I came up with simple questions to directly assess the data represented by the bars in the chart. And certain visual aspects like labels and colors represented in the legend.

I created a simple script to utilize the model and perform inference, following the instructions provided in the model’s usage section. You can find my code here.

With the script in place, I proceeded to test the chart with the questions I had prepared. I observed that most of the answers were incorrect. Although, I didn’t expect the answers to have exact values, but some of them were not even close to what was actually represented by the bars in the chart.

Below is the chart I tried.

This image shows a bar chart which shows summaries of different countries for three different years

The following is the list of questions along with the answers provided by the pix2struct-chartqa-base model.

Question: What is the chart type?
Answer: Country
Expected Answer: Bar

Question: What is the title of the chart?
Answer: Country
Expected Answer: Country Summary

Question: What is the value for Austria for the year 2004?
Answer: 90298
Expected Answer: 57000

Question: What is the value for Brazil for the year 2003?
Answer: 3183
Expected Answer: 20000

Question: What is the value for France for the year 2004?
Answer: 11742
Expected Answer: 46000

Question: Which color represents the year 2003?
Answer: Country
Expected Answer: Red

Question: Which country has highest value for the year 2004?
Answer: Country
Expected Answer: Italy

Question: For the country USA is the value for the year 2005 more than the year 2005?
Answer: 5
Expected Answer: Yes

Question: What is the label or title of the X axis?
Answer: Country
Expected Answer: Country

Question: Which country has highest value for year 2005?
Answer: Country
Expected Answer: USA

Question: What is the value for France for the year 2005?
Answer: 11142
Expected Answer: 19000

Next, I attempted using the google/matcha-chartqa model. However, the results did not differ much from the previous model. Out of the ten questions, only one was approximately correct in its response.

Question: What is the chart type?
Answer: 17112
Expected Answer: Bar

Question: What is the title of the chart?
Answer: 120600
Expected Answer: Country Summary

Question: What is the value for Austria for the year 2004?
Answer: 2003
Expected Answer: 57000

Question: What is the value for Brazil for the year 2003?
Answer: 46000
Expected Answer: 20000

Question: What is the value for France for the year 2004?
Answer: 19600
Expected Answer: 46000

Question: Which color represents the year 2003?
Answer: blue
Expected Answer: Red

Question: Which country has highest value for the year 2004?
Answer: Austria
Expected Answer: Italy

Question: For the country USA is the value for the year 2005 more than the year 2005?
Answer: 16000
Expected Answer: Yes

Question: What is the label or title of the X axis?
Answer: TITLE | Country Summary
Expected Answer: Country

Question: Which country has highest value for year 2005?
Answer: Germany
Expected Answer: USA

Question: What is the value for France for the year 2005?
Answer: 19600
Expected Answer: 19000

1.2) Experimenting with Self-Created Chart (Unsatisfactory)

I thought maybe the chart was a bit complex, with multiple data points and bars. So, created a simple Bar chart with straightforward data. For creating the chart, I used Apache ECharts and picked the data from ourworldindata.org.

This images shows a simple bar chart which shows investment in AI across different years

I tested the chart against both the models. Here are the answers provided by the MatCha model. The results were slightly better this time as at least 4 of the questions related to data points being right. However, the MatCha model was unable to capture title, labels, and other visual elements, it actually provided numeric answers for those questions.

Question: What type of chart is this?
Answer: 2021
Expected Answer: Bar

Question: What is the title of the chart?
Answer: 271.82
Expected Answer: Investment in AI

Question: What is the value for the year 2018?
Answer: 277
Expected Answer: 85

Question: What is the value for the year 2019?
Answer: 277
Expected Answer: 101

Question: What is the value for the year 2017?
Answer: 60.6
Expected Answer: 60

Question: What is the value for the year 2022?
Answer: 175.27
Expected Answer: 175

Question: For which year is the Investment highest?
Answer: 2021
Expected Answer: 2021

Question: Is the value for the year 2020 more than the year 2019?
Answer: Yes
Expected Answer: Yes

Question: What is the color of the bars in the chart?
Answer: Blue
Expected Answer: Green

Question: What is the label or title on Y-axis?
Answer: 127.5
Expected Answer: value

Question: What is the label or title on X-axis?
Answer: 127.5
Expected Answer: Years

Question: Are all the bars in the chart or graph vertical?
Answer: 329
Expected Answer: Yes

Question: What is the value represented by the smallest bar in the chart?
Answer: 279.5
Expected Answer: 60

Question: What does the legend represent or display?
Answer: display
Expected Answer: billion dollars

Question: How many billion dollars were invested in the year 2021?
Answer: 273
Expected Answer: 276

The csv file containing the above questions can be found here.

1.3) Try out a Chart from ChartQA Dataset (Good)

Unsatisfied with these results, I wanted to examine a chart picked from the ChartQA dataset. This dataset was created with human-written questions emphasizing visual and logical reasoning.

This image shows a bar chart which plots the turnover in million GBP

Below are the answers provided by the MatCha model. Interestingly, the answers to the data-related questions were correct this time.

Question: What type of chart is this?
Answer: [71.5, 72.5]
Expected Answer: Bar

Question: What is label or title on Y-axis?
Answer: TITLE | title
Expected Answer: Turnover in million GBP

Question: Are all the bars in the chart or graph vertical?
Answer: 68.5
Expected Answer: Yes

Question: What is the value of the largest bar in the chart?
Answer: 73.4
Expected Answer: 73.4

Question: What is the value for the year 2018?
Answer: 68.5
Expected Answer: 68.5

Question: What is the value for the year 2016?
Answer: 58.5
Expected Answer: 58.5

Question: What is the turnover for the year 2010?
Answer: 33
Expected Answer: 33

Question: For which year is the turnover highest?
Answer: 2019
Expected Answer: 2019

Question: Is the turnover for the year 2017 more than the year 2016?
Answer: Yes
Expected Answer: Yes

Question: What is the color of the bars in the chart?
Answer: Blue
Expected Answer: Blue

I felt the model performed better on the above chart since it was picked from its training dataset. Although, I realized the model was able to answer the data-related questions well probably because the chart had marked data points. This likely contributed to enhanced clarity and, as a result, improved accuracy. Taking a cue from this observation, I decided to mark data points on my chart as well to see if it could result in similar improvements.

1.4) Enhanced self-created chart plotted with data markers (Satisfactory)

Accordingly, I improvised the chart by plotting the data points and retested it against the models. This time, I found that the answers to the questions related to data were mostly accurate.

This image shows a bar chart which shows investment in AI with data markers on the chart

Here are the answers provided by the google/pix2struct-chartqa-base model. While the answers related to the labels remained incorrect, I noticed that the data-related questions had accurate answers.

Question: What is the chart type?
Answer: 2021
Expected Answer: Bar

Question: What is the title of the chart?
Answer: 276
Expected Answer: Investment in AI

Question: What is the value for the year 2018?
Answer: 85
Expected Answer: 85

Question: What is the value for the year 2019?
Answer: 101
Expected Answer: 101

Question: What is the value for the year 2017?
Answer: 60
Expected Answer: 60

Question: What is the value for the year 2022?
Answer: 175
Expected Answer: 175

Question: For which year is the Investment highest?
Answer: 2021
Expected Answer: 2021

Question: Is the value for the year 2020 more than the year 2019?
Answer: Yes
Expected Answer: Yes

Question: What is the color of the bars in the chart?
Answer: Blue
Expected Answer: Green

Question: What is the label or title of the Y-axis?
Answer: 111
Expected Answer: value

Question: What is the label or title of the X-axis?
Answer: 118
Expected Answer: Years

Question: Are all the bars in the chart or graph vertical?
Answer: 329
Expected Answer: Yes

Question: What is the value represented by the smallest bar in the chart?
Answer: 175
Expected Answer: 60

Question: What does the legend represent or display?
Answer: display
Expected Answer: billion dollars

Question: How many billion dollars were invested in the year 2021?
Answer: 276
Expected Answer: 276

Those were some of my experiments with the zero-shot approach for visual question answering on chart images using Pix2Struct models. Next, I decided to explore a different approach.


2. Testing with DePlot model

Instead of directly applying visual question answering on the charts, I began by extracting the underlying data table of the chart. Then, conducted visual question answering on the extracted data table. This approach is suggested in the paper introducing the DePlot model, which is a plot-to-table conversion model. It efficiently translates the chart into a linearized table format, and the resulting table can be directly used to prompt a pretrained large language model for visual question answering tasks.

1. Extract the data table using DePlot

The DePlot model takes the chart image and the text “Generate underlying data table of the figure below:” as input and extracts the chart data into a linearized table. I have used the code provided on the model page. You can find my code snippet here.

Here is the output I got when I applied the DePlot model on my self-created chart.

TITLE | Investment in AI Years | billion dollars 2017 | 60 2018 | 85 2019 | 101 2020 | 154 2021 | 276 2022 | 175

This image shows a bar chart which shows investment in AI across different years

With the chart data table obtained, I proceeded to conduct Visual Question Answering on it.

2. Performing Visual Question Answering on the table (Good)

To conduct visual question answering on the table, I Large Language Model, GPT-3. Utilizing the table data as a system prompt, I provided it as context for all the questions. You can find the code snippet here.

Below are the answers provided by the GPT-3 Language Model. The responses are more explanatory and not limited to single-word answers.

Question: What is the chart type?
Answer: The chart type in this case is a line chart.
Actual answer: Bar

Question: What is the title of the chart?
Answer: The title of the chart is “Investment in AI (Artificial Intelligence) Years”
Actual answer: Investment in AI

Question: What is the value for the year 2018?
Answer: The value for the year 2018 is 85 billion dollars.
Actual answer: 85

Question: What is the value for the year 2019?
Answer: According to the given data, the value for the year 2019 is 101 billion dollars.
Actual answer: 101

Question: What is the value for the year 2017?
Answer: The value for the year 2017 in terms of investment in AI is $60 billion.
Actual answer: 60

Question: What is the value for the year 2022?
Answer: The value for the year 2022 in terms of investment in AI is 175 billion dollars.
Actual answer: 175

Question: For which year is the Investment highest?
Answer: The highest investment in AI is in the year 2021, with a total of 276 billion dollars.
Actual answer: 2021

Question: Is the value for the year 2020 more than the year 2019?
Answer: Yes, the value for the year 2020 (154 billion dollars) is more than the year 2019 (101 billion dollars).
Actual answer: Yes

Question: What is the color of the bars in the chart?
Answer: The color of the bars in the chart is not specified in the given information.
Actual answer: Green

Question: What is the label or title of the Y-axis?
Answer: The label or title of the Y-axis is “Investment in AI (billion dollars)”.
Actual answer: value

Question: What is the label or title of the X-axis?
Answer: The label or title of the X-axis in this case is “Years”.
Actual answer: Years

Question: Are all the bars in the chart or graph vertical?
Answer: No, not all of the bars in the graph are vertical. The data points for the years 2017, 2018, 2019, and 2020 are represented by vertical bars, but the bars for the years 2021
Actual answer: Yes

Question: What is the value represented by the smallest bar in the chart?
Answer: The value represented by the smallest bar in the chart is 60 billion dollars.
Actual answer: 60

Question: What does the legend represent or display?
Answer: The legend in this case represents the amount of investment in Artificial Intelligence (AI) in billions of dollars for the years 2017, 2018, 2019, 2020, 2021, and 2022. The legend
Actual answer: billion dollars

Question: How many billion dollars were invested in the year 2021?
Answer: In the year 2021, a total of 276 billion dollars were invested.
Actual answer: 276

This approach for first extracting the table and then performing the question answering has more probability of accurate answers. However, need to design the prompt such that it returns a single-word answer. It is important to note that this approach cannot test the visual aspects of the chart since the DePlot model does not capture them.


Conclusion

From my experiments, I observed that the success of Visual Question Answering (VQA) on charts also depends on how well the chart is designed. Charts with clear visual attributes, properly marked data points, and effective data representation tend to produce more accurate VQA results. Additionally, fine-tuning the models specifically for chart-related tasks might be necessary to achieve improved accuracy and reliability. Despite the challenges, exploring chart testing with VQA remains an interesting approach. With further advancements and model fine-tuning, VQA has the potential to become a valuable tool in chart testing and quality assurance.


Hire Qxf2

Hire Qxf2 to benefit from our experienced and diverse team of technical testers. We specialize in testing early-stage products built with modern technical stacks, enabling us to provide valuable insights and thorough testing. Our approach goes beyond traditional test automation, allowing us to address complex testing problems and ensure your software meets the highest standards of quality. With Qxf2 on board, you can confidently accelerate development cycles and deliver exceptional products to your customers.


Leave a Reply

Your email address will not be published. Required fields are marked *