{"id":17803,"date":"2023-03-24T09:40:19","date_gmt":"2023-03-24T13:40:19","guid":{"rendered":"https:\/\/qxf2.com\/blog\/?p=17803"},"modified":"2023-04-02T09:36:41","modified_gmt":"2023-04-02T13:36:41","slug":"investigating-data-loss-in-qxf2s-newsletter-automation","status":"publish","type":"post","link":"https:\/\/qxf2.com\/blog\/investigating-data-loss-in-qxf2s-newsletter-automation\/","title":{"rendered":"Investigating Data Loss: A Forensic Approach"},"content":{"rendered":"<p>Testing these days spills onto production too. Testers from <a href=\"https:\/\/qxf2.com\/?utm_source=Investigating_newsletter_delete_Debezium&#038;utm_medium=click&#038;utm_campaign=From%20blog\">Qxf2<\/a> have been able to show value by knowing how to trace transactions across multiple forensic tools. This post is for testers who have always wanted some insight into how to investigate issues in production but never had the opportunity.<\/p>\n<h4>Background<\/h4>\n<p>Testing in production <i>also<\/i> is a good idea. We were uncomfortable about the idea when <a href=\"https:\/\/copyconstruct.medium.com\/testing-microservices-the-sane-way-9bb31d158c16\">this article<\/a> came out. But that was because we were pretty inexperienced at testing loosely coupled microservices. After having wrestled with these systems over the last 2-3 years, we wholeheartedly embraced the concept of testing in production also. These are habits that are easy enough to pick up once you have the opportunity. But most testers rarely get the opportunity to learn about such practices. So, we simulated a story for this post using one of our internal applications. 
Testers interested in this kind of detective work can follow along.<\/p>\n<h4>The Problem:<\/h4>\n<p>At <a href=\"https:\/\/qxf2.com\/?utm_source=Investigating_nesletter_delete_Debezium&#038;utm_medium=click&#038;utm_campaign=From%20blog\">Qxf2<\/a>, we use the <a href=\"https:\/\/github.com\/qxf2\/newsletter_automation\" rel=\"noopener\" target=\"_blank\">Newsletter Automation Application<\/a> to streamline the process of sending out newsletters to our subscribers. The application is built using Python Flask and uses a MySQL database. This app allows users to create new articles and add details such as the article title, description, and estimated reading time for inclusion in the newsletter. However, we recently encountered an issue where a user reported that they couldn&#8217;t find the articles they had added. This was a major concern for our team as it threatened the integrity of our newsletter distribution process. To investigate the issue, we turned to the range of forensic tools we had set up to monitor the application. In this post, we will detail the steps we took to identify and troubleshoot the problem.<\/p>\n<h4>Forensic tools used to monitor the application:<\/h4>\n<p>As a software testing firm, Qxf2 often uses a range of tools to ensure software quality. Our in-house Newsletter Automation Application is no exception to this. We have set up a number of tools to analyze and monitor the application. Some of these tools include:<\/p>\n<ol>\n<p>1. <strong><a href=\"https:\/\/debezium.io\/\" rel=\"noopener\" target=\"_blank\">Debezium<\/a>:<\/strong> Debezium is an open-source tool that helps you capture changes in your database. Essentially, it&#8217;s like having a camera on your database, recording everything that happens so you can easily track down issues when they arise. With Debezium, you can quickly identify what changed and when it changed, making it a super useful change data capture tool.<\/p>\n<p>2. 
<strong><a href=\"https:\/\/github.com\/obsidiandynamics\/kafdrop\" rel=\"noopener\" target=\"_blank\">Kafdrop<\/a>:<\/strong> Kafdrop is a cool tool used to view data stored in Kafka in a readable format. It&#8217;s like a magic window that allows you to see Kafka messages and inspect their contents without having to write code.<\/p>\n<p>3. <strong><a href=\"https:\/\/prometheus.io\/\" rel=\"noopener\" target=\"_blank\">Prometheus<\/a>:<\/strong> Prometheus is a tool that helps monitor metrics from different systems. It&#8217;s commonly used to keep track of the health and performance of various applications. For our application, we have set up Prometheus to monitor the Debezium metrics that capture the data changes in the newsletter automation database.<\/p>\n<p>4. <strong><a href=\"https:\/\/grafana.com\/\" rel=\"noopener\" target=\"_blank\">Grafana<\/a>:<\/strong> Grafana is an open source tool that can visualize data in a way that&#8217;s easy to understand. It can take data from various sources like Prometheus, MySQL, etc., and display it as charts or graphs. We have set up Grafana for the newsletter automation project and configured it with the Prometheus data source that monitors metrics from Debezium.<\/p>\n<p>5. <a href=\"https:\/\/aws.amazon.com\/cloudwatch\/\" rel=\"noopener\" target=\"_blank\"><strong>AWS CloudWatch<\/strong><\/a>: CloudWatch is a monitoring and logging service provided by Amazon Web Services. Basically, it lets you keep track of your AWS applications in real time.<\/p>\n<\/ol>\n<p>Thanks to these tools, we were able to get to the root cause of the problem and resolve the issue efficiently.<\/p>\n<h4>Verifying the addition of articles through CloudWatch<\/h4>\n<p>As the first step in investigating this issue, we wanted to make sure that the articles were indeed added to the database.<\/p>\n<ol>\n<p>1. At Qxf2, a Skype listener passes messages to an AWS Lambda that extracts URLs from the messages if they are present. 
The lambda then stores the extracted URLs in the Newsletter automation database.<\/p>\n<p>2. To validate the user&#8217;s claim that the articles were added, we examined logs from AWS CloudWatch to check if the lambda had picked up the articles.<\/p>\n<p>3. In our case, the user had reported two missing articles with the URLs <code>https:\/\/levelup.gitconnected.com\/rust-unit-testing-in-a-nutshell-aaa053ed425a<\/code> and <code>https:\/\/sdtimes.com\/test\/ai-in-api-and-ui-software-test-automation<\/code>, which they claimed to have added on <code>March 7, 2023<\/code>.<\/p>\n<p>4. Based on this information, we located the logs for the time frame in which the user claimed to have added the articles.<\/p>\n<p>5. We were able to see the articles processed by the lambda in the logs, confirming that the article URLs were picked up by the lambda and saved in the newsletter automation database.<br \/>\n<figure id=\"attachment_17842\" aria-describedby=\"caption-attachment-17842\" style=\"width: 900px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloudwatch_merged-scaled.jpg\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloudwatch_merged-1024x153.jpg\" alt=\"CloudWatch logs showing the articles processed by the lambda\" width=\"900\" height=\"134\" class=\"size-large wp-image-17842\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloudwatch_merged-1024x153.jpg 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloudwatch_merged-300x45.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloudwatch_merged-768x115.jpg 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloudwatch_merged-1536x229.jpg 1536w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloudwatch_merged-2048x306.jpg 
2048w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><figcaption id=\"caption-attachment-17842\" class=\"wp-caption-text\">AWS CloudWatch logs showing the article URL&#8217;s processed by the URL filtering lambda<\/figcaption><\/figure>\n<\/ol>\n<h4>Using Grafana &#038; Prometheus to identify the Delete Events<\/h4>\n<p>As the next step in our investigation, we analyzed the Grafana graph that keeps track of the DELETE events. <\/p>\n<ol>\n1. The graph is set up with the Prometheus datasource and runs following metrics query to track all the delete operations:<\/p>\n<pre lang=\"promql\">\r\ndebezium_metrics_TotalNumberOfDeleteEventsSeen{context=\"streaming\", instance=\"debezium:8080\", job=\"debezium\", name=\"dbserver1\", plugin=\"mysql\"} \r\n<\/pre>\n<p>2. Using this graph, we were able to determine if any delete operations had taken place since the time the articles were added. <\/p>\n<p>3. To track the delete operations, we filtered the graph by specifying a custom date range. The date range spanned from the date when the articles were added to the date they were reported missing. <\/p>\n<p>4. 
By doing so, we were able to isolate all the delete operations that occurred within the given timeframe.<br \/>\n<figure id=\"attachment_17851\" aria-describedby=\"caption-attachment-17851\" style=\"width: 900px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/delete_graph.png\" data-rel=\"lightbox-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/delete_graph-1024x377.png\" alt=\"Monitoring delete events using Grafana\" width=\"900\" height=\"331\" class=\"size-large wp-image-17851\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/delete_graph-1024x377.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/delete_graph-300x110.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/delete_graph-768x283.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/delete_graph-1536x565.png 1536w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/delete_graph.png 1801w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><figcaption id=\"caption-attachment-17851\" class=\"wp-caption-text\">Graph monitoring the delete operations in Grafana<\/figcaption><\/figure><\/p>\n<p>5. As you can see from the above image, there were three delete operations recorded within the specified time period. Two of the delete operations took place at <code>2023-03-08 11:46<\/code> while another delete operation was performed at <code>2023-03-08 13:11<\/code>.<\/p>\n<\/ol>\n<p>Next, we&#8217;ll see how further investigation was carried out to pinpoint the exact timestamps when the reported articles were deleted.<\/p>\n<h4>Verifying the delete operations with Kafdrop<\/h4>\n<p>We need a way to visualize the change data captured by Debezium, right? 
To solve this, we set up Kafdrop with the newsletter application to view and analyze data changes captured by Debezium.<\/p>\n<ol>\n<p>1. After analyzing the delete events graph in Grafana and visualizing the change data in Kafdrop, we cross-checked the timestamps of the delete operations. This allowed us to pinpoint the exact time when the articles were deleted and obtain the before and after data.<\/p>\n<p>2. However, since Kafka records message timestamps in UTC, we first had to convert the timestamps fetched from Grafana to UTC. In our case, the UTC timestamps of the delete operations were <code>2023-03-08 06:15<\/code> and <code>2023-03-08 07:41<\/code>, respectively.<\/p>\n<p>3. By analyzing the data that was changed at these timestamps, we figured out that both the reported articles were deleted one after the other at UTC <code>2023-03-08 06:15<\/code>.<br \/>\n<figure id=\"attachment_17853\" aria-describedby=\"caption-attachment-17853\" style=\"width: 900px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop.png\" data-rel=\"lightbox-image-2\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-1024x568.png\" alt=\"Change data visualization\" width=\"900\" height=\"499\" class=\"size-large wp-image-17853\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-1024x568.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-300x166.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-768x426.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop.png 1448w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><figcaption id=\"caption-attachment-17853\" class=\"wp-caption-text\">Change data captured by Debezium visualized using 
Kafdrop<\/figcaption><\/figure>\n<\/ol>\n<p>Following this, we were also able to look at the actual data that was deleted.<\/p>\n<h4>Viewing the Deleted Data<\/h4>\n<p>Now that we had confirmed the timestamps of the delete operations, we wanted to look at the actual data records that were removed too.<\/p>\n<ol>\n<p>1. Fortunately, Debezium stores both the before and after data for each change, so we were easily able to look at the data before the change was made to it.<\/p>\n<p>2. In addition, Kafdrop also helped us by formatting this data in a readable form. By expanding the Kafka messages in Kafdrop, we were able to see the data before changes were made.<br \/>\n<figure id=\"attachment_17854\" aria-describedby=\"caption-attachment-17854\" style=\"width: 900px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-AI.png\" data-rel=\"lightbox-image-3\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-AI-1024x229.png\" alt=\"Data before deletion of first article\" width=\"900\" height=\"201\" class=\"size-large wp-image-17854\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-AI-1024x229.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-AI-300x67.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-AI-768x172.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/kafdrop-AI.png 1333w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><figcaption id=\"caption-attachment-17854\" class=\"wp-caption-text\">Data before and after deletion of first article<\/figcaption><\/figure><\/p>\n<figure id=\"attachment_17855\" aria-describedby=\"caption-attachment-17855\" style=\"width: 900px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/Kafdrop-Rust.png\" 
data-rel=\"lightbox-image-4\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/Kafdrop-Rust-1024x220.png\" alt=\"Data before deletion of second article\" width=\"900\" height=\"193\" class=\"size-large wp-image-17855\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/Kafdrop-Rust-1024x220.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/Kafdrop-Rust-300x65.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/Kafdrop-Rust-768x165.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/Kafdrop-Rust.png 1371w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><figcaption id=\"caption-attachment-17855\" class=\"wp-caption-text\">Data before and after deletion of second article<\/figcaption><\/figure>\n<p>Finally, to wrap up our investigation, we had to find out how the articles got deleted.<\/p>\n<\/ol>\n<h4>Finding the Culprit:<\/h4>\n<p>We had to determine whether the deletion was performed manually by a user or whether the test suite for the newsletter automation application was responsible for removing the articles.<\/p>\n<ol>\n<p>1. To determine the source of the delete operation, we revisited the AWS CloudWatch logs to investigate the timeframe when the articles were deleted.<\/p>\n<p>2. We knew that the articles were deleted around <code>2023-03-08 11:46<\/code>. 
So, we filtered the CloudWatch logs based on that timeframe.<br \/>\n<figure id=\"attachment_17876\" aria-describedby=\"caption-attachment-17876\" style=\"width: 900px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloud_watch_confirmation-1.png\" data-rel=\"lightbox-image-5\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloud_watch_confirmation-1-1024x339.png\" alt=\"CloudWatch logs around the time the reported articles were deleted, to check if any inserts happened simultaneously with the deletion\" width=\"900\" height=\"298\" class=\"size-large wp-image-17876\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloud_watch_confirmation-1-1024x339.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloud_watch_confirmation-1-300x99.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloud_watch_confirmation-1-768x255.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/cloud_watch_confirmation-1.png 1478w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><figcaption id=\"caption-attachment-17876\" class=\"wp-caption-text\">AWS CloudWatch logs showing the insertion of articles to the database around the timeframe the reported articles were deleted<\/figcaption><\/figure><\/p>\n<p>3. We then examined the log that was created at <code>2023-03-08 11:30<\/code>, which can be seen in the above image, to check if any article insertion step from the test suite had been executed. But we found none.<\/p>\n<p>4. As a result, we were able to confirm that the test suite for the newsletter automation app was not responsible for the delete operation, because none of the test suite&#8217;s article-insertion steps were executed at the time the articles were deleted.<\/p>
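\n<p>As an aside, the timestamp conversion we performed earlier (Grafana local time to UTC, so that timestamps can be cross-checked against Kafka messages) is easy to script. Below is a minimal Python sketch; it assumes the Grafana dashboard displays IST (UTC+5:30), and the helper name is our own:<\/p>\n<pre lang=\"python\">\r\nfrom datetime import datetime\r\nfrom zoneinfo import ZoneInfo\r\n\r\n# Convert a Grafana timestamp (assumed to be IST, UTC+5:30) to UTC\r\n# so that it can be matched against Kafka message timestamps\r\ndef ist_to_utc(timestamp):\r\n    local = datetime.strptime(timestamp, '%Y-%m-%d %H:%M').replace(tzinfo=ZoneInfo('Asia\/Kolkata'))\r\n    return local.astimezone(ZoneInfo('UTC')).strftime('%Y-%m-%d %H:%M')\r\n\r\nprint(ist_to_utc('2023-03-08 13:11'))  # 2023-03-08 07:41\r\n<\/pre>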
<\/ol>\n<p>This helped us to conclude that the delete was manually performed by a user.<\/p>\n<h4>Hire Qxf2: Modern testing for modern stacks<\/h4>\n<p>Qxf2 facilitates modern testing of modern stacks. Our versatile engineers excel at testing and love using tools. If you want technically inclined testers to join your project, reach out to Qxf2 via <a href=\"https:\/\/qxf2.com\/contact?utm_source=Investigating_newsletter_delete_Debezium&#038;utm_medium=click&#038;utm_campaign=From%20blog\">this simple form<\/a>.<\/p>\n<hr>\n","protected":false},"excerpt":{"rendered":"<p>Testing these days spills onto production too. Testers from Qxf2 have been able to show value by knowing how to trace transactions across multiple forensic tools. This post is for testers who have always wanted some insight into how to investigate issues in production but never had the opportunity. Background Testing in production also is a good idea. We were [&hellip;]<\/p>\n","protected":false},"author":29,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[345,333,334,336,335],"tags":[],"class_list":["post-17803","post","type-post","status-publish","format-standard","hentry","category-cloudwatch-aws","category-debezium","category-grafana","category-kafdrop","category-prometheus"],"_links":{"self":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/17803","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/users\/29"}],"replies":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/comments?post=17803"}],"version-history":[{"count":58,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/17803\/revisions"}],"predecessor-version":[{"id":17900,"href":"https:\/\/qxf2.
com\/blog\/wp-json\/wp\/v2\/posts\/17803\/revisions\/17900"}],"wp:attachment":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/media?parent=17803"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/categories?post=17803"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/tags?post=17803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}