{"id":19106,"date":"2023-07-31T08:14:49","date_gmt":"2023-07-31T12:14:49","guid":{"rendered":"https:\/\/qxf2.com\/blog\/?p=19106"},"modified":"2023-07-31T08:14:49","modified_gmt":"2023-07-31T12:14:49","slug":"tesing-charts-using-vqa","status":"publish","type":"post","link":"https:\/\/qxf2.com\/blog\/tesing-charts-using-vqa\/","title":{"rendered":"Testing Charts with Transformers using Visual Question Answering (VQA)"},"content":{"rendered":"<p>I tried testing charts using VQA. What that means is that I showed several charts to an AI model and made it answer questions about them. My idea was to use these answers as part of test automation. This post will show you what (sort of) worked for me and what techniques did not work. I hope people use this work and build upon it. If you successfully use AI models to test your charts, please share your work with us.<\/p>\n<hr>\n<h4>Context<\/h4>\n<p>Recently, I had the opportunity to explore an interesting way of testing images. It uses a Visual Question Answering (VQA) approach with the <a href=\"https:\/\/huggingface.co\/docs\/transformers\/main\/model_doc\/pix2struct\" rel=\"noopener\" target=\"_blank\">Pix2Struct<\/a> model, which is based on <a href=\"https:\/\/huggingface.co\/docs\/transformers\/index\" rel=\"noopener\" target=\"_blank\">Transformers<\/a>. I did this for an image generated by one of the lambdas <a href=\"https:\/\/www.qxf2.com\/?utm_source=chart_testing_vqa&#038;utm_medium=click&#038;utm_campaign=From%20blog\" rel=\"noopener\" target=\"_blank\">at Qxf2<\/a>. You can find more details about it in <a href=\"https:\/\/qxf2.com\/blog\/testing-images-using-pix2struct\/\" rel=\"noopener\" target=\"_blank\">this<\/a> blog post. Building upon this experience, I decided to extend the same approach to perform testing on charts. As a tester, my focus is on evaluating the visual aspects and data accuracy of charts, a process that typically involves manual examination and analysis. 
However, automating these checks would be highly beneficial, as it not only saves time but also has the potential to enhance efficiency and accuracy in chart quality assurance.<br \/>\nTo give automated checking of charts a try, I set out to experiment with the VQA approach on charts using Pix2Struct Transformers models. While this approach is widely used for answering reasoning questions about charts, I wanted to try it for testing charts. In this blog post, I will share my experiments and the challenges I encountered while conducting these tests.<\/p>\n<hr>\n<h4>The Challenge of performing VQA on Charts<\/h4>\n<p>Testing charts using visual question answering, similar to regular images, presents significant challenges. Computer vision techniques designed for regular images, which contain scenes and objects, become harder to apply to charts. Charts visually encode data using attributes like color, length, and shape, which are crucial for conveying information effectively. Visual Question Answering, successful with regular images, may not yield the same results for charts. Nevertheless, it&#8217;s worth exploring VQA techniques for chart testing as they have the potential to facilitate automation.<\/p>\n<hr>\n<h4>Scope definition<\/h4>\n<p>Given the vast domain of charts consisting of various types and testing possibilities, I defined the scope of my testing approach. For this task, I narrowed down my focus to specific aspects that I aimed to verify in charts:<\/p>\n<ul>\n<li>Chart type and layout<\/li>\n<ul>\n<li>Verify chart type, layout, axes, labels, titles and legends are displayed<\/li>\n<\/ul>\n<li>Data Accuracy<\/li>\n<ul>\n<li>Verify chart&#8217;s values match the source data<\/li>\n<\/ul>\n<\/ul>\n<hr>\n<h4>Navigating the Challenge<\/h4>\n<p>To perform Visual Question Answering, I used Pix2Struct models that had been fine-tuned specifically on the ChartQA dataset. 
I tested using two models &#8211; <a href=\"https:\/\/huggingface.co\/google\/pix2struct-chartqa-base\" rel=\"noopener\" target=\"_blank\">google\/pix2struct-chartqa-base<\/a> and <a href=\"https:\/\/huggingface.co\/google\/matcha-chartqa\" rel=\"noopener\" target=\"_blank\">google\/matcha-chartqa<\/a>. Both models produced quite similar results, whether their answers were right or wrong. Afterward, I experimented with another model called <a href=\"https:\/\/huggingface.co\/google\/deplot\" rel=\"noopener\" target=\"_blank\">google\/deplot<\/a>, a plot-to-table translation model. DePlot operates by converting an image of a plot or chart into a linearized table. The resulting table can then be directly fed as input to a pretrained large language model. The results improved; however, DePlot cannot help with the visual attributes of the chart, which are lost during the modality conversion. To assess the visual aspects and data accuracy, I prepared a set of questions with their corresponding answers. Subsequently, I fed these questions to the models and observed the results. Next, I will describe my attempts as I conducted these tests.<\/p>\n<hr>\n<h5>1. Attempt with Pix2Struct models<\/h5>\n<p>In this section, I showcase some of the charts I experimented with, along with the corresponding questions I used for testing. Additionally, I have included links to the code snippets and the list of questions for reference.<\/p>\n<h5>1.1) Testing a Basic Bar Chart <span style=\"color:Tomato; font-size:medium\">(Poor)<\/span><\/h5>\n<p>To begin with, I used the pix2struct-chartqa-base model and a basic bar chart that I obtained <a href=\"https:\/\/docs.oracle.com\/javase\/8\/javafx\/user-interface-tutorial\/bar-chart.htm\" rel=\"noopener\" target=\"_blank\">online<\/a>. I came up with simple questions to directly assess the data represented by the bars in the chart. 
I also included questions on certain visual aspects, such as the labels and the colors shown in the legend. <\/p>\n<p>I created a simple script to utilize the model and perform inference, following the instructions provided in the model&#8217;s usage section. You can find my code <a href=\"https:\/\/gist.github.com\/sravantit25\/e05e90619de224fb48c73d7b1b390d3e#file-perform_vqa_charts-py\" rel=\"noopener\" target=\"_blank\">here<\/a>.<\/p>\n<p>With the script in place, I proceeded to test the chart with the questions I had prepared. I observed that most of the answers were incorrect. Although I didn&#8217;t expect the answers to have exact values, some of them were not even close to what was actually represented by the bars in the chart. <\/p>\n<p>Below is the chart I tried.  <\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/bar-sample.png\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/bar-sample.png\" alt=\"This image shows a bar chart which shows summaries of different countries for three different years\" width=\"538\" height=\"413\" class=\"size-full wp-image-19105\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/bar-sample.png 538w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/bar-sample-300x230.png 300w\" sizes=\"auto, (max-width: 538px) 100vw, 538px\" \/><\/a><\/p>\n<p>The following is the list of questions along with the answers provided by the pix2struct-chartqa-base model.<\/p>\n<div style=\"height: 500px; width: 700px; overflow-y: auto; color: white; background-color: black; font-family: 'Helvetica Neue', Arial, sans-serif; word-break: break-all;\">\n<p>Question: What is the chart type?<br \/>\nAnswer: Country<br \/>\nExpected Answer: Bar<\/p>\n<p>Question: What is the title of the chart?<br \/>\nAnswer: Country<br \/>\nExpected Answer: Country Summary<\/p>\n<p>Question: What is the 
value for Austria for the year 2004?<br \/>\nAnswer: 90298<br \/>\nExpected Answer: 57000<\/p>\n<p>Question: What is the value for Brazil for the year 2003?<br \/>\nAnswer: 3183<br \/>\nExpected Answer: 20000<\/p>\n<p>Question: What is the value for France for the year 2004?<br \/>\nAnswer: 11742<br \/>\nExpected Answer: 46000<\/p>\n<p>Question: Which color represents the year 2003?<br \/>\nAnswer: Country<br \/>\nExpected Answer: Red<\/p>\n<p>Question: Which country has highest value for the year 2004?<br \/>\nAnswer: Country<br \/>\nExpected Answer: Italy<\/p>\n<p>Question: For the country USA is the value for the year 2005 more than the year 2005?<br \/>\nAnswer: 5<br \/>\nExpected Answer: Yes<\/p>\n<p>Question: What is the label or title of the X axis?<br \/>\nAnswer: Country<br \/>\nExpected Answer: Country<\/p>\n<p>Question: Which country has highest value for year 2005?<br \/>\nAnswer: Country<br \/>\nExpected Answer: USA<\/p>\n<p>Question: What is the value for France for the year 2005?<br \/>\nAnswer: 11142<br \/>\nExpected Answer: 19000<\/p>\n<\/div>\n<p>\nNext, I attempted using the google\/matcha-chartqa model. However, the results did not differ much from the previous model. 
Of the eleven questions listed below, only one received an approximately correct answer.\n<\/p>\n<div style=\"height: 500px; width: 700px; overflow-y: auto; color: white; background-color: black; font-family: 'Helvetica Neue', Arial, sans-serif; word-break: break-all;\">\n<p>Question: What is the chart type?<br \/>\nAnswer: 17112<br \/>\nExpected Answer: Bar<\/p>\n<p>Question: What is the title of the chart?<br \/>\nAnswer: 120600<br \/>\nExpected Answer: Country Summary<\/p>\n<p>Question: What is the value for Austria for the year 2004?<br \/>\nAnswer: 2003<br \/>\nExpected Answer: 57000<\/p>\n<p>Question: What is the value for Brazil for the year 2003?<br \/>\nAnswer: 46000<br \/>\nExpected Answer: 20000<\/p>\n<p>Question: What is the value for France for the year 2004?<br \/>\nAnswer: 19600<br \/>\nExpected Answer: 46000<\/p>\n<p>Question: Which color represents the year 2003?<br \/>\nAnswer: blue<br \/>\nExpected Answer: Red<\/p>\n<p>Question: Which country has highest value for the year 2004?<br \/>\nAnswer: Austria<br \/>\nExpected Answer: Italy<\/p>\n<p>Question: For the country USA is the value for the year 2005 more than the year 2005?<br \/>\nAnswer: 16000<br \/>\nExpected Answer: Yes<\/p>\n<p>Question: What is the label or title of the X axis?<br \/>\nAnswer: TITLE | Country Summary<br \/>\nExpected Answer: Country<\/p>\n<p>Question: Which country has highest value for year 2005?<br \/>\nAnswer: Germany<br \/>\nExpected Answer: USA<\/p>\n<p>Question: What is the value for France for the year 2005?<br \/>\nAnswer: 19600<br \/>\nExpected Answer: 19000<\/p>\n<\/div>\n<h5>1.2) Experimenting with Self-Created Chart <span style=\"color:Tomato; font-size:medium\"> (Unsatisfactory)<\/span><\/h5>\n<p>I thought maybe the chart was a bit complex, with multiple data points and bars. So, I created a simple bar chart with straightforward data. 
To create the chart, I used <a href=\"https:\/\/echarts.apache.org\/en\/index.html\" rel=\"noopener\" target=\"_blank\">Apache ECharts<\/a> and picked the data from <a href=\"https:\/\/ourworldindata.org\/grapher\/corporate-investment-in-artificial-intelligence-by-type?facet=none\" rel=\"noopener\" target=\"_blank\">ourworldindata.org<\/a>. <\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/investment_chart.png\" data-rel=\"lightbox-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/investment_chart.png\" alt=\"This image shows a simple bar chart which shows investment in AI across different years\" width=\"600\" height=\"428\" class=\"alignnone size-full wp-image-19184\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/investment_chart.png 600w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/investment_chart-300x214.png 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>I tested the chart against both the models. Here are the answers provided by the MatCha model. The results were slightly better this time, with at least 4 of the questions related to data points answered correctly. However, the MatCha model was unable to capture the title, labels, and other visual elements; it returned numeric answers for those questions. 
<\/p>\n<div style=\"height: 500px; width: 700px; overflow-y: auto; color: white; background-color: black; font-family: 'Helvetica Neue', Arial, sans-serif; word-break: break-all;\">\nQuestion: What type of chart is this?<br \/>\nAnswer: 2021<br \/>\nExpected Answer: Bar<\/p>\n<p>Question: What is the title of the chart?<br \/>\nAnswer: 271.82<br \/>\nExpected Answer: Investment in AI<\/p>\n<p>Question: What is the value for the year 2018?<br \/>\nAnswer: 277<br \/>\nExpected Answer: 85<\/p>\n<p>Question: What is the value for the year 2019?<br \/>\nAnswer: 277<br \/>\nExpected Answer: 101<\/p>\n<p>Question: What is the value for the year 2017?<br \/>\nAnswer: 60.6<br \/>\nExpected Answer: 60<\/p>\n<p>Question: What is the value for the year 2022?<br \/>\nAnswer: 175.27<br \/>\nExpected Answer: 175<\/p>\n<p>Question: For which year is the Investment highest?<br \/>\nAnswer: 2021<br \/>\nExpected Answer: 2021<\/p>\n<p>Question: Is the value for the year 2020 more than the year 2019?<br \/>\nAnswer: Yes<br \/>\nExpected Answer: Yes<\/p>\n<p>Question: What is the color of the bars in the chart?<br \/>\nAnswer: Blue<br \/>\nExpected Answer: Green<\/p>\n<p>Question: What is the label or title on Y-axis?<br \/>\nAnswer: 127.5<br \/>\nExpected Answer: value<\/p>\n<p>Question: What is the label or title on X-axis?<br \/>\nAnswer: 127.5<br \/>\nExpected Answer: Years<\/p>\n<p>Question: Are all the bars in the chart or graph vertical?<br \/>\nAnswer: 329<br \/>\nExpected Answer: Yes<\/p>\n<p>Question: What is the value represented by the smallest bar in the chart?<br \/>\nAnswer: 279.5<br \/>\nExpected Answer: 60<\/p>\n<p>Question: What does the legend represent or display?<br \/>\nAnswer: display<br \/>\nExpected Answer: billion dollars<\/p>\n<p>Question: How many billion dollars were invested in the year 2021?<br \/>\nAnswer: 273<br \/>\nExpected Answer: 276\n<\/p><\/div>\n<p>The csv file containing the above questions can be found <a 
href=\"https:\/\/gist.github.com\/sravantit25\/e05e90619de224fb48c73d7b1b390d3e#file-questions-csv\" rel=\"noopener\" target=\"_blank\">here<\/a>.<\/p>\n<h5>1.3) Trying out a Chart from the ChartQA Dataset<span style=\"color:MediumSeaGreen; font-size:medium\"> (Good)<\/span><\/h5>\n<p>Unsatisfied with these results, I wanted to examine a chart picked from the ChartQA dataset. This dataset was created with human-written questions emphasizing visual and logical reasoning. <\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/turnover.png\" data-rel=\"lightbox-image-2\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/turnover.png\" alt=\"This image shows a bar chart which plots the turnover in million GBP\" width=\"574\" height=\"400\" class=\"size-full wp-image-19108\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/turnover.png 574w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/turnover-300x209.png 300w\" sizes=\"auto, (max-width: 574px) 100vw, 574px\" \/><\/a><\/p>\n<p>Below are the answers provided by the MatCha model. Interestingly, the answers to the data-related questions were correct this time. 
<\/p>\n<div style=\"height: 500px; width: 700px; overflow-y: auto; color: white; background-color: black; font-family: 'Helvetica Neue', Arial, sans-serif; word-break: break-all;\">\nQuestion: What type of chart is this?<br \/>\nAnswer: [71.5, 72.5]<br \/>\nExpected Answer: Bar<\/p>\n<p>Question: What is label or title on Y-axis?<br \/>\nAnswer: TITLE | title<br \/>\nExpected Answer: Turnover in million GBP<\/p>\n<p>Question: Are all the bars in the chart or graph vertical?<br \/>\nAnswer: 68.5<br \/>\nExpected Answer: Yes<\/p>\n<p>Question: What is the value of the largest bar in the chart?<br \/>\nAnswer: 73.4<br \/>\nExpected Answer: 73.4<\/p>\n<p>Question: What is the value for the year 2018?<br \/>\nAnswer: 68.5<br \/>\nExpected Answer: 68.5<\/p>\n<p>Question: What is the value for the year 2016?<br \/>\nAnswer: 58.5<br \/>\nExpected Answer: 58.5<\/p>\n<p>Question: What is the turnover for the year 2010?<br \/>\nAnswer: 33<br \/>\nExpected Answer: 33<\/p>\n<p>Question: For which year is the turnover highest?<br \/>\nAnswer: 2019<br \/>\nExpected Answer: 2019<\/p>\n<p>Question: Is the turnover for the year 2017 more than the year 2016?<br \/>\nAnswer: Yes<br \/>\nExpected Answer: Yes<\/p>\n<p>Question: What is the color of the bars in the chart?<br \/>\nAnswer: Blue<br \/>\nExpected Answer: Blue<\/p>\n<\/div>\n<p>I felt the model performed better on the above chart since it was picked from its training dataset. I also realized that the model probably answered the data-related questions well because the chart had marked data points. This likely contributed to enhanced clarity and, as a result, improved accuracy. 
Taking a cue from this observation, I decided to mark data points on my chart as well to see if it could result in similar improvements.<\/p>\n<h5>1.4) Enhanced self-created chart plotted with data markers<span style=\"color:MediumSeaGreen; font-size:medium\"> (Satisfactory)<\/span><\/h5>\n<p>Accordingly, I improved the chart by plotting the data points and retested it against the models. This time, I found that the answers to the questions related to data were mostly accurate.<\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/bar-simple-self-markers.png\" data-rel=\"lightbox-image-3\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/bar-simple-self-markers.png\" alt=\"This image shows a bar chart which shows investment in AI with data markers on the chart\" width=\"600\" height=\"428\" class=\"alignnone size-full wp-image-19139\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/bar-simple-self-markers.png 600w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/bar-simple-self-markers-300x214.png 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>Here are the answers provided by the google\/pix2struct-chartqa-base model. 
While the answers related to the labels remained incorrect, I noticed that the data-related questions had accurate answers.<\/p>\n<div style=\"height: 500px; width: 700px; overflow-y: auto; color: white; background-color: black; font-family: 'Helvetica Neue', Arial, sans-serif;\">\nQuestion: What is the chart type?<br \/>\nAnswer: 2021<br \/>\nExpected Answer: Bar<\/p>\n<p>Question: What is the title of the chart?<br \/>\nAnswer: 276<br \/>\nExpected Answer: Investment in AI<\/p>\n<p>Question: What is the value for the year 2018?<br \/>\nAnswer: 85<br \/>\nExpected Answer: 85<\/p>\n<p>Question: What is the value for the year 2019?<br \/>\nAnswer: 101<br \/>\nExpected Answer: 101<\/p>\n<p>Question: What is the value for the year 2017?<br \/>\nAnswer: 60<br \/>\nExpected Answer: 60<\/p>\n<p>Question: What is the value for the year 2022?<br \/>\nAnswer: 175<br \/>\nExpected Answer: 175<\/p>\n<p>Question: For which year is the Investment highest?<br \/>\nAnswer: 2021<br \/>\nExpected Answer: 2021<\/p>\n<p>Question: Is the value for the year 2020 more than the year 2019?<br \/>\nAnswer: Yes<br \/>\nExpected Answer: Yes<\/p>\n<p>Question: What is the color of the bars in the chart?<br \/>\nAnswer: Blue<br \/>\nExpected Answer: Green<\/p>\n<p>Question: What is the label or title of the Y-axis?<br \/>\nAnswer: 111<br \/>\nExpected Answer: value<\/p>\n<p>Question: What is the label or title of the X-axis?<br \/>\nAnswer: 118<br \/>\nExpected Answer: Years<\/p>\n<p>Question: Are all the bars in the chart or graph vertical?<br \/>\nAnswer: 329<br \/>\nExpected Answer: Yes<\/p>\n<p>Question: What is the value represented by the smallest bar in the chart?<br \/>\nAnswer: 175<br \/>\nExpected Answer: 60<\/p>\n<p>Question: What does the legend represent or display?<br \/>\nAnswer: display<br \/>\nExpected Answer: billion dollars<\/p>\n<p>Question: How many billion dollars were invested in the year 2021?<br \/>\nAnswer: 276<br \/>\nExpected Answer: 276<\/p>\n<\/div>\n<p>Those were 
some of my experiments with the zero-shot approach for visual question answering on chart images using Pix2Struct models. Next, I decided to explore a different approach.<\/p>\n<hr>\n<h4>2. Testing with DePlot model<\/h4>\n<p>Instead of directly applying visual question answering on the charts, I began by extracting the underlying data table of the chart. Then, I conducted visual question answering on the extracted data table. This approach is suggested in the <a href=\"https:\/\/arxiv.org\/pdf\/2212.10505v2.pdf\">paper<\/a> introducing the DePlot model, which is a plot-to-table conversion model. It translates the chart into a linearized table format, and the resulting table can be directly used to prompt a pretrained large language model for visual question answering tasks.<\/p>\n<h5>1. Extract the data table using DePlot<\/h5>\n<p>The DePlot model takes the chart image and the text &#8220;Generate underlying data table of the figure below:&#8221; as input and extracts the chart data into a linearized table. I used the code provided on the model page. You can find my code snippet <a href=\"https:\/\/gist.github.com\/sravantit25\/e05e90619de224fb48c73d7b1b390d3e#file-extract_table_using_deplot-py\" rel=\"noopener\" target=\"_blank\">here<\/a>. 
<\/p>\n<p>Here is the output I got when I applied the DePlot model on my self-created chart.<\/p>\n<pre lang=\"python\">\r\nTITLE | Investment in AI Years | billion dollars 2017 | 60 2018 | 85 2019 | 101 2020 | 154 2021 | 276 2022 | 175\r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/investment_chart.png\" data-rel=\"lightbox-image-4\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/investment_chart.png\" alt=\"This image shows a bar chart which shows investment in AI across different years\" width=\"600\" height=\"428\" class=\"alignnone size-full wp-image-19184\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/investment_chart.png 600w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/investment_chart-300x214.png 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>With the chart data table obtained, I proceeded to conduct Visual Question Answering on it.<\/p>\n<h5>2. Performing Visual Question Answering on the table<span style=\"color:MediumSeaGreen; font-size:medium\"> (Good)<\/span><\/h5>\n<p>To conduct visual question answering on the table, I used a Large Language Model, GPT-3. I supplied the table data as the system prompt so that it served as context for all the questions. You can find the code snippet <a href=\"https:\/\/gist.github.com\/sravantit25\/e05e90619de224fb48c73d7b1b390d3e#file-query_gpt-py\" rel=\"noopener\" target=\"_blank\">here<\/a>.<\/p>\n<p>Below are the answers provided by the GPT-3 Language Model. 
The responses are more explanatory and not limited to single-word answers.<\/p>\n<div style=\"height: 500px; width: 700px; overflow-y: auto; color: white; background-color: black; font-family: 'Helvetica Neue', Arial, sans-serif;\">\n<p>Question: What is the chart type?<br \/>\nAnswer: The chart type in this case is a line chart.<br \/>\nActual answer: Bar<\/p>\n<p>Question: What is the title of the chart?<br \/>\nAnswer: The title of the chart is &#8220;Investment in AI (Artificial Intelligence) Years&#8221;<br \/>\nActual answer: Investment in AI<\/p>\n<p>Question: What is the value for the year 2018?<br \/>\nAnswer: The value for the year 2018 is 85 billion dollars.<br \/>\nActual answer: 85<\/p>\n<p>Question: What is the value for the year 2019?<br \/>\nAnswer: According to the given data, the value for the year 2019 is 101 billion dollars.<br \/>\nActual answer: 101<\/p>\n<p>Question: What is the value for the year 2017?<br \/>\nAnswer: The value for the year 2017 in terms of investment in AI is $60 billion.<br \/>\nActual answer: 60<\/p>\n<p>Question: What is the value for the year 2022?<br \/>\nAnswer: The value for the year 2022 in terms of investment in AI is 175 billion dollars.<br \/>\nActual answer: 175<\/p>\n<p>Question: For which year is the Investment highest?<br \/>\nAnswer: The highest investment in AI is in the year 2021, with a total of 276 billion dollars.<br \/>\nActual answer: 2021<\/p>\n<p>Question: Is the value for the year 2020 more than the year 2019?<br \/>\nAnswer: Yes, the value for the year 2020 (154 billion dollars) is more than the year 2019 (101 billion dollars).<br \/>\nActual answer: Yes<\/p>\n<p>Question: What is the color of the bars in the chart?<br \/>\nAnswer: The color of the bars in the chart is not specified in the given information.<br \/>\nActual answer: Green<\/p>\n<p>Question: What is the label or title of the Y-axis?<br \/>\nAnswer: The label or title of the Y-axis is &#8220;Investment in AI (billion 
dollars)&#8221;.<br \/>\nActual answer: value<\/p>\n<p>Question: What is the label or title of the X-axis?<br \/>\nAnswer: The label or title of the X-axis in this case is &#8220;Years&#8221;.<br \/>\nActual answer: Years<\/p>\n<p>Question: Are all the bars in the chart or graph vertical?<br \/>\nAnswer: No, not all of the bars in the graph are vertical. The data points for the years 2017, 2018, 2019, and 2020 are represented by vertical bars, but the bars for the years 2021<br \/>\nActual answer: Yes<\/p>\n<p>Question: What is the value represented by the smallest bar in the chart?<br \/>\nAnswer: The value represented by the smallest bar in the chart is 60 billion dollars.<br \/>\nActual answer: 60<\/p>\n<p>Question: What does the legend represent or display?<br \/>\nAnswer: The legend in this case represents the amount of investment in Artificial Intelligence (AI) in billions of dollars for the years 2017, 2018, 2019, 2020, 2021, and 2022. The legend<br \/>\nActual answer: billion dollars<\/p>\n<p>Question: How many billion dollars were invested in the year 2021?<br \/>\nAnswer: In the year 2021, a total of 276 billion dollars were invested.<br \/>\nActual answer: 276<\/p>\n<\/div>\n<p>This approach of first extracting the table and then performing the question answering is more likely to produce accurate answers. However, the prompt needs to be designed so that it returns a single-word answer. It is important to note that this approach cannot test the visual aspects of the chart since the DePlot model does not capture them.<\/p>\n<hr>\n<h4>Conclusion<\/h4>\n<p>From my experiments, I observed that the success of Visual Question Answering (VQA) on charts also depends on how well the chart is designed. Charts with clear visual attributes, properly marked data points, and effective data representation tend to produce more accurate VQA results. 
Additionally, fine-tuning the models specifically for chart-related tasks might be necessary to achieve improved accuracy and reliability. Despite the challenges, exploring chart testing with VQA remains an interesting approach. With further advancements and model fine-tuning, VQA has the potential to become a valuable tool in chart testing and quality assurance.<\/p>\n<hr>\n<h4>Hire Qxf2<\/h4>\n<p><a href=\"https:\/\/www.qxf2.com\/contact?utm_source=chart_testing_vqa&#038;utm_medium=click&#038;utm_campaign=From%20blog\" rel=\"noopener\" target=\"_blank\">Hire Qxf2<\/a> to benefit from our experienced and diverse team of technical testers. We specialize in testing early-stage products built with modern technical stacks, enabling us to provide valuable insights and thorough testing. Our approach goes beyond traditional test automation, allowing us to address complex testing problems and ensure your software meets the highest standards of quality. With Qxf2 on board, you can confidently accelerate development cycles and deliver exceptional products to your customers.<\/p>\n<hr>\n","protected":false},"excerpt":{"rendered":"<p>I tried testing charts using VQA. What that means is that I showed several charts to an AI model and made it answer questions about it. My idea was to use these answers as part of test automation. This post will show you what (sort of) worked for me and what techniques did not work. 
I hope people use this [&hellip;]<\/p>\n","protected":false},"author":27,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[270,362,369,370],"tags":[],"class_list":["post-19106","post","type-post","status-publish","format-standard","hentry","category-ai","category-pix2struct","category-transformers","category-vqa"],"_links":{"self":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/19106","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/users\/27"}],"replies":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/comments?post=19106"}],"version-history":[{"count":39,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/19106\/revisions"}],"predecessor-version":[{"id":19247,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/19106\/revisions\/19247"}],"wp:attachment":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/media?parent=19106"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/categories?post=19106"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/tags?post=19106"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}