{"id":17986,"date":"2023-04-18T09:59:51","date_gmt":"2023-04-18T13:59:51","guid":{"rendered":"https:\/\/qxf2.com\/blog\/?p=17986"},"modified":"2023-04-18T09:59:51","modified_gmt":"2023-04-18T13:59:51","slug":"testing-aws-lambda-functions-with-batch-messages","status":"publish","type":"post","link":"https:\/\/qxf2.com\/blog\/testing-aws-lambda-functions-with-batch-messages\/","title":{"rendered":"Testing AWS Lambda Functions with Batch Messages"},"content":{"rendered":"<p>We find most testers just run the most basic tests on Lambdas. One technique that has helped us discover more bugs when testing Lambdas at <a href=\"https:\/\/qxf2.com\/?utm_source=batch_message_testing&#038;utm_medium=click&#038;utm_campaign=From%20blog\">Qxf2<\/a> is to send batch messages to the SQS queue that triggers the Lambda. Often, when you (the tester) send batch messages and &#8220;something goes wrong&#8221;, you need to be able to triage the cause of the error. In this post, we will outline an investigation we recently undertook.<\/p>\n<p><strong>Note:<\/strong> Not all the quick and dirty triaging tricks here will be applicable to you. So pick and choose as it suits you.<\/p>\n<h4>Background:<\/h4>\n<p>Testers sometimes treat Lambdas as black boxes. The supposed &#8220;serverless&#8221; technology runs on microVMs. You can read more about it <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/firecracker-lightweight-virtualization-for-serverless-computing\" rel=\"noopener\" target=\"_blank\">here<\/a>. When you do go through the architecture, you will realize that there are many ways in which the infrastructure itself can produce errors. For example, a Firecracker microVM stays around for about 15 minutes. With concurrency, you can have multiple instances of the Lambda running at the same time. 
These lead to many interesting possibilities that need to be tested well.<br \/>\nWhile working with a client, one of our colleagues wanted to test whether the AWS Lambda function processed all the messages when they were sent in batches. She used the <a href=\"https:\/\/www.npmjs.com\/package\/aws-sdk\" rel=\"noopener\" target=\"_blank\">AWS SDK<\/a> to test the Lambda by sending batch messages to the SQS queue that triggers the Lambda function. She noticed that some events were triggered multiple times and that the order in which the messages were sent was not preserved.<br \/>\nSince we cannot reveal any information about our client&#8217;s application, we decided to simulate a similar scenario on one of our own internal applications, the &#8220;<a href=\"https:\/\/github.com\/qxf2\/newsletter_automation\/tree\/main\/newsletter_automation\" rel=\"noopener\" target=\"_blank\">Qxf2 Newsletter Automation<\/a>&#8221; tool.<\/p>\n<h4>Why testing with Batch Messages is important:<\/h4>\n<p>Batch messaging is a common technique used to process multiple messages simultaneously. Testing your Lambda function with batch messages could help uncover vital issues in your workflow.<br \/>\nThe order of the messages might not be preserved when batch messages are sent. Therefore, it is important to test whether this can cause any issues in the workflow. There can also be issues like message duplication arising from the retry policy of the Lambda. Duplicates can increase processing time, waste resources, and produce inaccurate results, so it is essential to test for them. Furthermore, it is also necessary to ensure that the Lambda processes every message in the batch by testing it with a large number of messages in the batch. 
This could help identify potential issues where some important messages or records could be missed by your Lambda function.<\/p>\n<h4>Testing with Batch Messages:<\/h4>\n<p>As we mentioned earlier, we decided to test our <em><strong><a href=\"https:\/\/github.com\/qxf2\/newsletter_automation\" rel=\"noopener\" target=\"_blank\">Newsletter Automation<\/a><\/strong><\/em> application with batch messages. <\/p>\n<p>1. The Newsletter Application has a Lambda function that is triggered by an <code>SQS<\/code>. The <code>SQS<\/code> is subscribed to an <code>SNS<\/code> topic &#8220;<strong><em>Skype Listener<\/em><\/strong>&#8221; , which actively listens to our official skype group, and broadcasts the messages sent in the group.<\/p>\n<p>2. Every time the <code>SQS<\/code> gets a message, the Lambda is triggered. The Lambda checks if the message contains a URL in it. If it does, then the URL is extracted from the message and stored in the <code>Newsletter Automation<\/code> database.<\/p>\n<figure id=\"attachment_18065\" aria-describedby=\"caption-attachment-18065\" style=\"width: 900px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/newsletter_flow_5.png\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/newsletter_flow_5-1024x244.png\" alt=\"Newsletter Automation workflow architecture.\" width=\"900\" height=\"214\" class=\"size-large wp-image-18065\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/newsletter_flow_5-1024x244.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/newsletter_flow_5-300x71.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/newsletter_flow_5-768x183.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/newsletter_flow_5.png 1139w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" 
\/><\/a><figcaption id=\"caption-attachment-18065\" class=\"wp-caption-text\">Architecture diagram of Newsletter Automation application<\/figcaption><\/figure>\n<p>3. We wrote a test in Python, using the <code><a href=\"https:\/\/pypi.org\/project\/aws-sqs-batchlib\/\" rel=\"noopener\" target=\"_blank\">aws-sqs-batchlib<\/a><\/code> library, to send batch messages to the SQS queue that triggers the Lambda function.<\/p>\n<pre lang=\"python\">\r\nimport json\r\n\r\nimport aws_sqs_batchlib\r\n\r\n# articles_sent, messages and queue_url are assumed to be defined at module level\r\n\r\ndef send_batch_messages():\r\n    \"Send batch messages to the SQS and verify if all messages are sent\"\r\n\r\n    # Create a list of messages to send\r\n    message_count = 10\r\n    for message in range(message_count):\r\n        url = f\"https:\/\/stage-newsletter-lambda-test-{message}.com\"\r\n        articles_sent.append(url)\r\n        message_body = {\"Message\" : \"{\\\"user_id\\\": \\\".cid.f9000d4f3453e385\\\", \\\"chat_id\\\": \\\"19:1941d15dada14943b5d742f2acdb99bb@thread.skype\\\", \\\"msg\\\": \\\"\" + url + \"\\\"}\"}\r\n        messages.append({\"Id\": str(message), \"MessageBody\": json.dumps(message_body)})\r\n\r\n    # Send the messages to the queue\r\n    response = aws_sqs_batchlib.send_message_batch(\r\n        QueueUrl=queue_url,\r\n        Entries=messages,\r\n    )\r\n    return len(response['Successful']) == message_count\r\n<\/pre>\n<p>As you can see in the above code snippet, we use the <code>send_message_batch()<\/code> method from the <code>aws_sqs_batchlib<\/code> library to send the batch messages to the <code>SQS<\/code> queue. It requires two mandatory arguments, <code>QueueUrl<\/code> and <code>Entries<\/code>. <code>Entries<\/code> is a list of messages that you wish to send to the <code>SQS<\/code> queue. Each message in the list should have an <code>Id<\/code> and a <code>MessageBody<\/code>.<\/p>\n<p>4. We initially tested the Lambda by passing about 50 messages in the batch. Upon running this test, we noticed that not all messages sent in the batch were processed by the Lambda.<\/p>\n<p>5. 
We then reduced the number of messages to 10 to see whether a smaller batch would be processed successfully. However, the Lambda still failed to process all 10 messages.<\/p>\n<p>6. Hoping to find traces of this error, we went through the <code>AWS CloudWatch<\/code> logs to see if we could find the cause of the missing articles in the database. However, we couldn&#8217;t find any failures or errors recorded in the logs.<\/p>\n<h4>Load Testing the API endpoint with Locust:<\/h4>\n<p>On examining the Lambda metrics, we noticed spikes in the invocation count around the time the messages were sent to the Lambda. This indicated that the Lambda was getting invoked when the messages were sent, but most of these messages were not being processed. This made us suspect that it might be an issue with the API endpoint used to post the URLs to the database. So, we decided to load test the API endpoint to check if it could handle a large number of requests simultaneously.<br \/>\nTo do this, we used Locust, a load testing tool that is easy to set up and get started with, to write a simple load test for the endpoint.<\/p>\n<p>1. We first tested the endpoint with a load of about 15 requests per second. 
The endpoint was able to process all the requests without any failures<br \/>\n<figure id=\"attachment_18014\" aria-describedby=\"caption-attachment-18014\" style=\"width: 900px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680245377.png\" data-rel=\"lightbox-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680245377-1024x758.png\" alt=\"Graph depicting performance of API endpoint under a load of about 15 request per second\" width=\"900\" height=\"666\" class=\"size-large wp-image-18014\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680245377-1024x758.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680245377-300x222.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680245377-768x568.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680245377.png 1499w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><figcaption id=\"caption-attachment-18014\" class=\"wp-caption-text\">API performance for a load of about 15 requests per second<\/figcaption><\/figure><\/p>\n<p>2. By this point we knew that the API was not the source of the problem. 
However, we decided to continue testing it by increasing the load to about 56 requests per second, because the question of higher load was bound to come up eventually.<br \/>\n<figure id=\"attachment_18016\" aria-describedby=\"caption-attachment-18016\" style=\"width: 900px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680248390.png\" data-rel=\"lightbox-image-2\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680248390-1024x758.png\" alt=\"Graph depicting performance of API endpoint under a load of about 56 requests per second\" width=\"900\" height=\"666\" class=\"size-large wp-image-18016\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680248390-1024x758.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680248390-300x222.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680248390-768x568.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/03\/total_requests_per_second_1680248390.png 1499w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><figcaption id=\"caption-attachment-18016\" class=\"wp-caption-text\">API performance for a load of about 56 requests per second<\/figcaption><\/figure><\/p>\n<p>3. Interestingly enough, the API endpoint was able to handle all the requests without a single failure.<\/p>\n<p>Now that we had eliminated the API endpoint as a suspect, we half-heartedly decided to test if it was an issue with the Lambda concurrency. Well, not really. We were fairly certain that Lambda concurrency would not be a problem for such a simple case. But for the sake of completeness of this blog post, we are going to pretend that we suspected Lambda concurrency. 
<\/p>\n<h4>Testing Lambda Concurrency:<\/h4>\n<p>In order to verify if it was an issue with the Lambda concurrency, we had to invoke the Lambda directly by passing the payload to it. This invocation had to happen multiple times simultaneously.<\/p>\n<p>1. Therefore, we decided to use threads in Python to invoke the Lambda function concurrently.<\/p>\n<p>2. To achieve this, we first wrote a function to invoke the Lambda using the boto3 library in Python.<\/p>\n<pre lang=\"python\">\r\nimport boto3\r\n\r\nresponse_data = []\r\n# Create a Boto3 client for AWS Lambda\r\nlambda_client = boto3.client('lambda')\r\n\r\n# Define a function to invoke the Lambda function with a given payload\r\ndef invoke_lambda(payload):\r\n    \"Method to invoke the lambda function\"\r\n    # Name of the Lambda function to invoke\r\n    function_name = 'staging-newsletter-url-filter'\r\n\r\n    # Invoke the Lambda function asynchronously with the payload data\r\n    response = lambda_client.invoke(\r\n        FunctionName=function_name,\r\n        Payload=payload,\r\n        InvocationType='Event'\r\n    )\r\n    response_data.append(response)\r\n<\/pre>\n<p>3. 
We then added a test method that uses threads to invoke the above function concurrently, once for each payload.<\/p>\n<pre lang=\"python\">\r\nimport concurrent.futures\r\nimport json\r\n\r\ndef test_invoke_lambda_concurrently():\r\n    \"Method to execute the function to invoke the lambda multiple times concurrently\"\r\n    for url_count in range(140, 150):\r\n        url = f\"https:\/\/lambda-batch-test{url_count}.com\"\r\n        articles_sent.append(url)\r\n        payload_data.append(json.dumps({\r\n        \"Records\": [\r\n            {\r\n            \"body\": \"{\\\"Message\\\":\\\"{\\\\\\\"msg\\\\\\\": \\\\\\\"\" + url + \"\\\\\\\", \\\\\\\"chat_id\\\\\\\": \\\\\\\"19:1941d15dada14943b5d742f2acdb99bb@thread.skype\\\\\\\", \\\\\\\"user_id\\\\\\\":\\\\\\\".cid.f9000d4f3453e567\\\\\\\"}\\\"}\"\r\n            }\r\n        ]\r\n        }))\r\n\r\n    # Use a ThreadPoolExecutor to invoke the Lambda function concurrently with multiple payloads\r\n    with concurrent.futures.ThreadPoolExecutor() as executor:\r\n        # Invoke the Lambda function with each payload in the payload data list\r\n        executor.map(invoke_lambda, payload_data)\r\n    assert len(response_data) == len(payload_data), \"Not all payloads were invoked\"\r\n<\/pre>\n<p>4. On running this test, as expected, the Lambda was able to process all the payloads that were sent.<\/p>\n<p>5. This helped us eliminate Lambda concurrency as the cause of the issue.<\/p>\n<p>Now that we knew neither the API endpoint nor Lambda concurrency was responsible for the issue, we wanted to make sure it wasn&#8217;t a problem with the SQS queue itself.<\/p>\n<h4>Testing the SQS:<\/h4>\n<p>1. In order to test the SQS queue, we created a dummy queue with the same configuration as the actual queue but without a Lambda trigger.<\/p>\n<p>2. 
We then sent the batch messages to this dummy queue and polled for messages to check if all the sent messages were present in the queue.<\/p>\n<pre lang=\"python\">\r\ndef test_receive_messages():\r\n    \"Retrieve messages from the SQS and verify if all the messages sent are retrieved\"\r\n    # Receive the messages from the queue\r\n    res = aws_sqs_batchlib.receive_message(\r\n        QueueUrl=queue_url,\r\n        MaxNumberOfMessages=100,\r\n        WaitTimeSeconds=20,\r\n    )\r\n\r\n    # Extract the message body from the received messages\r\n    received_messages = {msg['Body'] for msg in res['Messages']}\r\n\r\n    for message in received_messages:\r\n        json_message = json.loads(message)\r\n        received_msg_body = json.loads(json_message['Message'])\r\n        # Append the message content to a list\r\n        received_articles.append(received_msg_body['msg'])\r\n\r\n    # Check if all the messages sent are present in the received messages\r\n    assert all(item in received_articles for item in articles_sent), \"Not all articles sent were received from the SQS\"\r\n<\/pre>\n<p>3. You can see in the above code snippet that we are using the <code>receive_message()<\/code> method from the <code>aws_sqs_batchlib<\/code> library to poll for messages from the queue. Once we get the messages, we extract each message body and store it in a list. We then compare this list to the list of articles that were sent to the SQS queue.<\/p>\n<p>4. 
The above test passed, indicating that all the articles sent were received from the dummy queue.<br \/>\n<figure id=\"attachment_18045\" aria-describedby=\"caption-attachment-18045\" style=\"width: 900px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/sqs_recieve_message.png\" data-rel=\"lightbox-image-3\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/sqs_recieve_message-1024x177.png\" alt=\"Batch message test results for dummy queue\" width=\"900\" height=\"156\" class=\"size-large wp-image-18045\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/sqs_recieve_message-1024x177.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/sqs_recieve_message-300x52.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/sqs_recieve_message-768x133.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/sqs_recieve_message-1536x265.png 1536w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/04\/sqs_recieve_message.png 1872w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><figcaption id=\"caption-attachment-18045\" class=\"wp-caption-text\">Batch message test passes for Queue without any triggers<\/figcaption><\/figure><br \/>\nThis test helped us confirm that the SQS queue was not the source of the problem either.<\/p>\n<h4>Finding the cause for the issue:<\/h4>\n<p>Having eliminated all other suspects, we were now sure the problem was with our code. So we started digging through StackOverflow for any possible pointers and, yay, finally found one <a href=\"https:\/\/stackoverflow.com\/questions\/69011162\/not-all-sqs-messages-end-up-in-lambda-most-just-disappear\" rel=\"noopener\" target=\"_blank\">thread<\/a> that helped us get to the core of the issue.<\/p>\n<p>1. 
Our investigation had narrowed down to the way the SQS events were being handled by the Lambda function. The StackOverflow thread that we stumbled upon helped us confirm our suspicion.<\/p>\n<p>2. The problem was that our Lambda assumed there would only ever be a single record in an event. Therefore, it was fetching only the first record from the event, blatantly ignoring the rest of the records.<\/p>\n<pre lang=\"python\">\r\ndef get_message_contents(event):\r\n    \"Retrieve the message contents from the SQS event\"\r\n    # Bug: only the first record is read; any other records in the event are ignored\r\n    record = event.get('Records')[0]\r\n    message = record.get('body')\r\n    message = json.loads(message)['Message']\r\n    message = json.loads(message)\r\n<\/pre>\n<p>3. As you can see from the line <code>record = event.get('Records')[0]<\/code> in the above code snippet, we are only getting the first record from the event.<\/p>\n<p>4. However, when a batch of messages is sent, there can be multiple records in a single event. Since the Lambda fetched only the first record, the rest of the records were not processed at all.<\/p>\n<p>Finally, after a long investigation, we were able to debug the issue and get to the root cause of the problem!<\/p>\n<h4>Hire technical testers from Qxf2<\/h4>\n<p>Qxf2 employs highly technical testers. Our approach is to go well beyond standard &#8216;test automation&#8217;. From this post you must have picked up the fact that we know about the underlying architecture of AWS Lambdas. Our testers are versatile and enable teams to test better. You can reach out to us <a href=\"https:\/\/qxf2.com\/contact?utm_source=batch_message_testing&#038;utm_medium=click&#038;utm_campaign=From%20blog\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We find most testers just run the most basic tests on Lambdas. One technique that has helped us discover more bugs when testing Lambdas at Qxf2 is to send batch messages to the SQS that triggers the Lambda. 
Often times, when you (the tester) send batch messages and &#8220;something goes wrong&#8221;, you need to be able to triage the cause [&hellip;]<\/p>\n","protected":false},"author":29,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[282,347,346,251],"tags":[],"class_list":["post-17986","post","type-post","status-publish","format-standard","hentry","category-aws-lambda","category-aws-sqs-batchlib","category-batch-message-testing","category-boto3"],"_links":{"self":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/17986","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/users\/29"}],"replies":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/comments?post=17986"}],"version-history":[{"count":54,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/17986\/revisions"}],"predecessor-version":[{"id":18048,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/17986\/revisions\/18048"}],"wp:attachment":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/media?parent=17986"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/categories?post=17986"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/tags?post=17986"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}