{"id":9564,"date":"2018-08-08T07:00:53","date_gmt":"2018-08-08T11:00:53","guid":{"rendered":"https:\/\/qxf2.com\/blog\/?p=9564"},"modified":"2018-08-13T06:43:02","modified_gmt":"2018-08-13T10:43:02","slug":"exploring-wallaroo-giles-sender","status":"publish","type":"post","link":"https:\/\/qxf2.com\/blog\/exploring-wallaroo-giles-sender\/","title":{"rendered":"Exploring Wallaroo &#8211; Giles Sender"},"content":{"rendered":"<p>We have been exploring <a href=\"https:\/\/www.wallaroolabs.com\/\">Wallaroo<\/a>, a framework that makes it easy to handle streaming data and write event processing applications quickly. Wallaroo&#8217;s examples and documentation are excellent. They cover the core concepts well. But the documentation is sparse for some of the tools that Wallaroo provides for testing an application independently. One such tool is the Giles Sender, which is used in all the examples. This post is for folks who want to know Giles Sender better.<\/p>\n<hr>\n<h3>Overview<\/h3>\n<p>1. Setup<br \/>\n2. Generate an input binary file<br \/>\n3. Run sender and check the output<br \/>\n4. Rerun the sender<br \/>\n5. Send an exact number of messages in our input binary file<br \/>\n6. Control the batch size and interval<br \/>\n7. Repeat your messages<\/p>\n<hr>\n<p><strong>1. Setup<\/strong><\/p>\n<p>We will rely on Wallaroo&#8217;s existing <a href=\"https:\/\/docs.wallaroolabs.com\/book\/python\/writing-your-own-stateful-application.html\">alphabet<\/a> application to help us understand the Giles Sender better. To follow along, make sure you are set up with <a href=\"https:\/\/docs.wallaroolabs.com\/book\/getting-started\/docker-setup.html\">Wallaroo + Python in Docker<\/a>. The setup instructions are clear and easy to follow.<\/p>\n<p><strong>Note:<\/strong> In the steps below, we ask you to open multiple terminal prompts, one each for the sender, the receiver and the application. 
We are doing this to make it extremely clear what each component is doing.<\/p>\n<p>Once you are done with the Docker setup:<\/p>\n<p>a. Start the Docker container by running the command:<\/p>\n<pre lang=\"bash\">\r\ndocker run --rm -it --privileged -p 4000:4000 \\\r\n-v \/tmp\/wallaroo-docker\/wallaroo-src:\/src\/wallaroo \\\r\n-v \/tmp\/wallaroo-docker\/python-virtualenv:\/src\/python-virtualenv \\\r\n--name wally \\\r\nwallaroo-labs-docker-wallaroolabs.bintray.io\/release\/wallaroo:0.5.1\r\n<\/pre>\n<p>You should see output similar to the image below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.20.23-PM.png\" alt=\"Wallaroo: docker run\" width=\"816\" height=\"250\" class=\"aligncenter size-full wp-image-9568\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.20.23-PM.png 816w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.20.23-PM-300x92.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.20.23-PM-768x235.png 768w\" sizes=\"auto, (max-width: 816px) 100vw, 816px\" \/><\/p>\n<p>b. Start the data receiver:<\/p>\n<p>In a new terminal prompt, enter the container:<\/p>\n<pre lang=\"bash\">\r\ndocker exec -it wally env-setup\r\n<\/pre>\n<p>You should see the prompt within the container. In the prompt of the Docker container, start the receiver:<\/p>\n<pre lang=\"bash\">\r\ndata_receiver --ponythreads=1 --ponynoblock \\\r\n  --listen 127.0.0.1:7002\r\n<\/pre>\n<p>Leave this terminal prompt open. Once we run our application and start our sender, this prompt will show the results of sending messages to our alphabet application.<\/p>\n<p>c. 
Start the alphabet application:<\/p>\n<p>In a new terminal prompt, as usual, enter the container:<\/p>\n<pre lang=\"bash\">\r\ndocker exec -it wally env-setup\r\n<\/pre>\n<p>Then, in the prompt of the Docker container, start the application:<\/p>\n<pre lang=\"bash\">\r\nmachida --application-module alphabet --in 127.0.0.1:7010 \\\r\n  --out 127.0.0.1:7002 --metrics 127.0.0.1:5001 --control 127.0.0.1:6000 \\\r\n  --external 127.0.0.1:5050 --cluster-initializer --data 127.0.0.1:6001 \\\r\n  --name worker-name --ponythreads=1 --ponynoblock\r\n<\/pre>\n<p>You should see output that looks something like the screenshot below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.22.22-PM.png\" alt=\"starting Wallaroo success\" width=\"809\" height=\"701\" class=\"aligncenter size-full wp-image-9569\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.22.22-PM.png 809w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.22.22-PM-300x260.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.22.22-PM-768x665.png 768w\" sizes=\"auto, (max-width: 809px) 100vw, 809px\" \/><\/p>\n<p>d. Prepare the sender:<\/p>\n<p>In a new terminal prompt, as usual, enter the container:<\/p>\n<pre lang=\"bash\">\r\ndocker exec -it wally env-setup\r\n<\/pre>\n<p>Then, in the prompt of the Docker container, go to the alphabet application&#8217;s directory:<\/p>\n<pre lang=\"bash\">\r\ncd wallaroo\/examples\/python\/alphabet\/\r\n<\/pre>\n<p>This is the terminal prompt we will use in all subsequent steps.<\/p>\n<hr>\n<p><strong>2. Generate an input binary file<\/strong><\/p>\n<p>Wallaroo&#8217;s alphabet application uses a <code>votes.msg<\/code> file as input. We will produce our own input binary file with limited data. 
This will make it easy to predict the output and to tie each change in the output to the sender arguments we changed.<\/p>\n<p>The alphabet application expects 9-byte messages:<br \/>\na) a 4-byte header that holds just the length of the payload (5)<br \/>\nb) a 1-byte alphabet (a single letter)<br \/>\nc) a 4-byte number that represents the number of votes for that alphabet<\/p>\n<p>To make things easy for us, we will design the file to have 52 messages. Each of the first 26 messages will pair an alphabet (a to z) with the number 1. Each of the second 26 messages will pair an alphabet with the number 2.<\/p>\n<p>Save the following code as (say) <code>generate_binary_votes.py<\/code> in your <code>\/src\/wallaroo\/examples\/python\/alphabet<\/code> directory.<\/p>\n<pre lang=\"python\">\r\n\"\"\"\r\nA scratch script to produce a file similar to votes.msg for Wallaroo's alphabet application\r\n\r\nDESIGN:\r\n1. Open a file fake_votes.msg in binary format\r\n2. Build 5-byte payloads: a 1-byte alphabet followed by a 4-byte vote count\r\n3. Prepend a 4-byte header holding the number 5 (unsigned int) because every payload is 5 bytes\r\n4. Write this binary string to a file\r\n5. To make it easy to test for correctness, we will design the file to have only 52 messages\r\n6. The first 26 messages will be an alphabet and the number 1\r\n7. 
The second 26 messages will be an alphabet and the number 2\r\n\"\"\"\r\n\r\nimport string\r\nimport struct\r\n\r\ndef generate_alphabet_input(filename):\r\n    \"Generate a binary file with alphabets and votes\"\r\n    alphabets = list(string.ascii_lowercase)\r\n    with open(filename,'wb') as fp:\r\n        for i in range(1,3):  # vote value per message: 1 for the first pass, 2 for the second\r\n            for alphabet in alphabets:\r\n                # '>IsI' = big-endian 4-byte length, 1-byte alphabet, 4-byte votes (9 bytes)\r\n                # encode() yields the bytes object Python 3 requires; it also works on Python 2\r\n                msg = struct.pack('>IsI',5,alphabet.encode('ascii'),i)\r\n                fp.write(msg)\r\n\r\n\r\n#----START OF SCRIPT\r\nif __name__=='__main__':\r\n    generate_alphabet_input('fake_votes.msg')\r\n<\/pre>\n<p>Now, run the Python script to produce a new 468-byte (=52*9) binary input file called <code>fake_votes.msg<\/code>.<\/p>\n<pre lang=\"bash\">\r\npython generate_binary_votes.py\r\n<\/pre>\n<p>At the end of this step, you should have a <code>fake_votes.msg<\/code> file in your current directory.<\/p>\n<p><strong>3. Run sender and check the output<\/strong><br \/>\nLet us run the sender and make sure we have a useful input file. We will set the <code>messages<\/code> argument to 52 to send all the messages in the file. This will make it clear that the messages in our file are being relayed and processed correctly. We expect every alphabet to have a vote count of 1 after the first 26 messages and a vote count of 3 (=1+2) once the second half of our input file has been processed. To do this, on your sender&#8217;s terminal prompt, run the following command:<\/p>\n<pre lang=\"bash\">\r\nsender --host 127.0.0.1:7010 --file fake_votes.msg \\\r\n  --messages 52 --binary --msg-size 9 --ponythreads=1 \\\r\n  --ponynoblock --no-write\r\n<\/pre>\n<p>Observe the output in the receiver&#8217;s terminal prompt. You may need to scroll back. You will notice the first 26 lines of the output show each alphabet with a score of 1. 
The next 26 lines show each alphabet with a score of 3.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.57.39-PM.png\" alt=\"example output alphabet\" width=\"811\" height=\"777\" class=\"aligncenter size-full wp-image-9570\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.57.39-PM.png 811w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.57.39-PM-300x287.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-08-at-2.57.39-PM-768x736.png 768w\" sizes=\"auto, (max-width: 811px) 100vw, 811px\" \/><\/p>\n<p><strong>4. Rerun the sender<\/strong><br \/>\nWe are going to rerun the sender to verify that the application keeps its state between runs and that the votes get updated when new messages come along.<\/p>\n<pre lang=\"bash\">\r\nsender --host 127.0.0.1:7010 --file fake_votes.msg \\\r\n  --messages 52 --binary --msg-size 9 --ponythreads=1 \\\r\n  --ponynoblock --no-write\r\n<\/pre>\n<p>Observe the output in the receiver&#8217;s terminal prompt. The final vote count for each alphabet should be 6 (=3+3): the count before we reran the sender was 3, and rerunning it added 3 (=1+2) more votes to each alphabet.<\/p>\n<p><strong>5. Send an exact number of messages in our input binary file<\/strong><br \/>\nThe sender allows you to specify exactly how many messages should be sent to the application. In this step, we will send only 26 messages even though our file holds 52.<\/p>\n<pre lang=\"bash\">\r\nsender --host 127.0.0.1:7010 --file fake_votes.msg \\\r\n  --messages 26 --binary --msg-size 9 --ponythreads=1 \\\r\n  --ponynoblock --no-write\r\n<\/pre>\n<p>The first 26 messages in our file each have an alphabet and the number 1. That means we expect the vote count in our output to be 7 (=6+1) once we run the above command.<\/p>\n<p><strong>6. 
Control the batch size and interval<\/strong><br \/>\nSometimes, when testing our streaming application, we end up using really large files. It makes little sense to read or process the entire file all at once. So Giles Sender lets us control the rate at which input messages are sent through two arguments: <code>batch-size<\/code> and <code>interval<\/code>. The <code>batch-size<\/code> argument is helpful when you are reading really large input files that cannot be processed all at once. Our example is laughably small and does not lend itself to illustrating this use case well. Instead, we will make our <code>interval<\/code> really large (1 second) so that you can watch the output grow by <code>batch-size<\/code> messages every <code>interval<\/code> nanoseconds. To do so, run the following command:<\/p>\n<pre lang=\"bash\">\r\nsender --host 127.0.0.1:7010 --file fake_votes.msg \\\r\n  --batch-size 1 --interval 1_000_000_000 --messages 52 \\\r\n  --binary --msg-size 9 --ponythreads=1 --ponynoblock --no-write\r\n<\/pre>\n<p>If things are going well, you will see one updated message appear on your receiver screen every second (1_000_000_000 nanoseconds), so sending all 52 messages takes roughly 52 seconds. If you had set <code>batch-size<\/code> to 2, you would have seen two updated messages every second. At the end of this step, the vote count for each alphabet should be 10 (=7+3).<\/p>\n<p><strong>7. Repeat your messages<\/strong><br \/>\nAnother common need when testing a streaming application is to take a small set of well-designed messages and repeat them many times. To do this, you can simply use the <code>repeat<\/code> argument. In this example, we will send 520 messages by cycling through the 52 messages we created. 
To do so, run:<\/p>\n<pre lang=\"bash\">\r\nsender --host 127.0.0.1:7010 --file fake_votes.msg \\\r\n  --messages 520 --binary --msg-size 9 --repeat \\\r\n  --ponythreads=1 --ponynoblock --no-write\r\n<\/pre>\n<p>At the end of this step, the final vote count for each alphabet should be 40 (=10 + 3*10): the 520 messages amount to 10 passes over our 52-message file, and each pass adds 3 votes to each alphabet.<\/p>\n<hr>\n<p>By creating a simple input file and then playing around with the <code>messages<\/code>, <code>batch-size<\/code>, <code>interval<\/code> and <code>repeat<\/code> arguments, we now have a better understanding of how to use the Giles Sender to test streaming applications written with Wallaroo.<\/p>\n<hr>\n<h3>References<\/h3>\n<p>1. <a href=\"https:\/\/docs.wallaroolabs.com\/book\/wallaroo-tools\/giles-sender.html\">Giles Sender documentation<\/a><\/p>\n<hr>\n","protected":false},"excerpt":{"rendered":"<p>We have been exploring Wallaroo, a framework that makes it easy to handle streaming data and write event processing applications quickly. Wallaroo&#8217;s examples and documentation are excellent. They cover the core concepts well. But their documentation is sparse for certain tools that Wallaroo has developed to make it easy to independently test the developed application. 
One such example is the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,154],"tags":[],"class_list":["post-9564","post","type-post","status-publish","format-standard","hentry","category-python","category-wallaroo"],"_links":{"self":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/9564","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/comments?post=9564"}],"version-history":[{"count":21,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/9564\/revisions"}],"predecessor-version":[{"id":9629,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/9564\/revisions\/9629"}],"wp:attachment":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/media?parent=9564"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/categories?post=9564"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/tags?post=9564"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}