Getting started with Wallaroo

We used to think writing (near) real-time applications that process multiple data streams was for the high IQ crowd and well-funded teams. This belief was probably strengthened by the fact that we (at Qxf2) love Python … and our favorite language lacked good complex event processors and stream processors. So we were excited to discover that Wallaroo has set out to fill that gap.

Wallaroo describes itself as “a fast, elastic data processing engine that rapidly takes you from prototype to production by eliminating infrastructure complexity”. The charm for us lay in the fact that it could do stream processing and gave us an easy way to designate nodes, write business logic for them and define the structure of a distributed, stream processing application.

We decided to explore Wallaroo. We used a Docker image with sample applications that Wallaroo Labs published. We were able to do the tutorials fairly easily. In this post, we will walk you through one of the examples Wallaroo Labs has provided.

Wait! Are we using somebody else’s code for a post?

Yeah, we will be using Wallaroo Labs’ Docker image and their own code in this post. So no new code from our end. We know that seems lazy, but we have a few good reasons:

a) The excellent posts on Wallaroo Labs’ blog are written for a far more technical audience. But we know there is a class of engineers who want to first see something work before they actually spend time learning something new. This post is meant for such engineers.

b) Google is struggling to rank Wallaroo Labs’ material well. So there is no simple “30-minute guide to Wallaroo” that a reader can scan and grok without having to try things out.

c) This post helped us articulate our understanding (or lack thereof!) of Wallaroo.


Getting started with Wallaroo using Docker

In this post, we will show you how to set up a Wallaroo environment using a Docker image and then run a sample “Celsius to Fahrenheit” application. We will be using a Linux box. Below are the steps involved:

A) Install Docker Community Edition (CE)
B) Get the official Wallaroo Docker image
C) Run a Wallaroo application in Docker
D) Start the Metrics UI
E) Run Giles Receiver
F) Run the “Celsius to Fahrenheit” application
G) Send Data to the application
H) Open “Celsius To Fahrenheit” application
I) Shut down the Cluster
J) Shut down Giles Sender and Giles Receiver
K) Shut down the Metrics UI
L) Shut down the Wallaroo container


A) Install Docker Community Edition (CE)

Get Docker Community Edition (CE) and set up Docker on your system so that you can pull the Wallaroo Docker image.

B) Get the official Wallaroo Docker image

Once Docker is installed on your system, pull the official Wallaroo image.

Open a new terminal and run the following command.

docker pull wallaroo-labs-docker-wallaroolabs.bintray.io/release/wallaroo:0.3.1

Wallaroo Labs uses a number of names/words to describe the various components of Wallaroo. We are beginners, but we will take a shot at giving you some sort of idea (even if it is inaccurate) about the different components. The following things are included in the Docker image:

  • Machida: this is Wallaroo’s run-time Python environment. Technology that makes it easy to write applications that handle distributed data streams understandably needs its own run-time environment.
  • Giles Sender: mimics an incoming data stream and supplies data to Wallaroo applications over TCP.
  • Giles Receiver: mimics a data sink and receives data from Wallaroo over TCP.
  • Cluster Shutdown tool: notifies the cluster to shut down cleanly.
  • Metrics UI: receives and displays metrics for running Wallaroo applications.
  • Wallaroo Source Code: full source code is provided, including Python example applications.

C) Run a Wallaroo application in Docker

Start the Wallaroo container in Docker by running the following command in a new terminal.

docker run --rm -it --privileged -p 4000:4000 \
-v /tmp/wallaroo-docker/wallaroo-src:/src/wallaroo \
-v /tmp/wallaroo-docker/python-virtualenv:/src/python-virtualenv \
--name wally \
wallaroo-labs-docker-wallaroolabs.bintray.io/release/wallaroo:0.3.1

D) Start the Metrics UI

Start the Metrics UI to receive and display metrics for running Wallaroo applications. Open a new terminal and run the following commands.

1. Enter the Wallaroo Docker container

docker exec -it wally environment-setup.sh

2. Start the Metrics UI

metrics_reporter_ui start

Verify it started up correctly by visiting http://localhost:4000
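If you would rather check from a script than from a browser, a minimal Python sketch along these lines can confirm that something is accepting TCP connections on the published port. It only tests that the port is open, not that the Metrics UI itself is healthy, and the helper name `port_is_open` is ours, not part of Wallaroo.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 4000 is the one published with `-p 4000:4000` when the
    # container was started.
    print(port_is_open("localhost", 4000))
```

Run it on the Docker host (not inside the container), since that is where port 4000 is published.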

E) Run Giles Receiver

Run Giles Receiver to receive data from Wallaroo over TCP. Open a new terminal and run the following commands.

1. Enter the Wallaroo Docker container

docker exec -it wally environment-setup.sh

2. Listen for data from the Wallaroo application

receiver --listen 127.0.0.1:5555 --no-write --ponythreads=1 --ponynoblock

You should see the line “Listening for data”, which indicates that Giles Receiver is up and running.

F) Run the “Celsius to Fahrenheit” application

Next, run the “Celsius to Fahrenheit” application. This is a stateless application that takes a floating point Celsius value and sends out a floating point Fahrenheit value. Open a new terminal and run the following commands.

1. Enter the Wallaroo Docker container

docker exec -it wally environment-setup.sh

2. Go to the Python Celsius example directory

cd /src/wallaroo/examples/python/celsius

3. Run the “Celsius to Fahrenheit” application

machida --application-module celsius --in 127.0.0.1:7000 \
 --out 127.0.0.1:5555 --metrics 127.0.0.1:5001 --control 127.0.0.1:6000 \
 --data 127.0.0.1:6001 --name worker-name --external 127.0.0.1:5050 \
 --cluster-initializer --ponythreads=1 --ponynoblock

This tells the “Celsius to Fahrenheit” application that it should listen on port 7000 for incoming data, write outgoing data to port 5555, and send metrics data to port 5001.
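To make “stateless” concrete: each output depends only on the message being processed, never on earlier ones. The plain-Python sketch below shows the arithmetic involved. It is our own illustration and deliberately does not use the Wallaroo API; the function names are hypothetical, and splitting the work into two small steps is just one way to structure it.

```python
def multiply(celsius):
    """Scale a Celsius reading by 9/5 (first half of the conversion)."""
    return celsius * 1.8


def add(scaled):
    """Add the 32-degree offset (second half of the conversion)."""
    return scaled + 32.0


def celsius_to_fahrenheit(celsius):
    """Chain the two stateless steps; no data is kept between calls."""
    return add(multiply(celsius))


if __name__ == "__main__":
    for reading in (0.0, 37.0, 100.0):
        print(reading, "->", celsius_to_fahrenheit(reading))
```

Because neither step keeps any state, the same reading always produces the same result, no matter how many messages came before it.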

G) Send Data to the application

To send the data to the application, open a new terminal and run the following commands.

1. Enter the Wallaroo Docker container

docker exec -it wally environment-setup.sh

2. Start the sender with the following command

sender --host 127.0.0.1:7000 --messages 25000000 --binary \
--batch-size 50 --interval 10_000_000 --repeat --no-write \
--msg-size 8 --ponythreads=1 --ponynoblock \
--file /src/wallaroo/examples/python/celsius/celsius.msg

Giles Sender replays a pre-generated data file, sending its messages repeatedly until 25,000,000 messages have been sent to the application.

If the sender is working correctly, you should see Connected printed to the screen.
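For a feel of what a binary input file such as celsius.msg might contain, here is a hedged Python sketch. The sender command only tells us that messages are binary and 8 bytes long (`--binary --msg-size 8`); the layout used here, a 4-byte big-endian length header followed by a 4-byte big-endian float, is our assumption and should be checked against the example's decoder. The helper names are hypothetical.

```python
import struct

# Assumed layout: 4-byte big-endian payload length, then a 4-byte float.
FRAME = struct.Struct(">If")


def encode_reading(celsius):
    """Pack one Celsius reading into an 8-byte framed message."""
    return FRAME.pack(4, celsius)


def decode_stream(data):
    """Split a byte string into readings, one per complete 8-byte frame."""
    readings = []
    for offset in range(0, len(data) - FRAME.size + 1, FRAME.size):
        length, value = FRAME.unpack_from(data, offset)
        if length != 4:
            raise ValueError("unexpected payload length: %d" % length)
        readings.append(value)
    return readings


if __name__ == "__main__":
    payload = b"".join(encode_reading(c) for c in (0.0, 25.5, 100.0))
    print(len(payload), decode_stream(payload))
```

Fixed-size frames like this are what let a sender be told a single message size up front instead of having to parse the payload.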

H) Open “Celsius To Fahrenheit” application

To view the “Celsius To Fahrenheit” application in a browser, visit http://localhost:4000. You can look at different metrics related to the pipeline, worker and computations by clicking on the hyperlinks. The metric stats will be updated as data is processed through the application.

I) Shut down the Cluster

To shut down the cluster cleanly, open a new terminal and run the following commands.

1. Enter the Wallaroo Docker container

docker exec -it wally environment-setup.sh

2. Shut down the cluster

cluster_shutdown 127.0.0.1:5050

J) Shut down Giles Sender and Giles Receiver

Press Ctrl-C in the Giles Sender and Giles Receiver shells.

K) Shut down the Metrics UI

metrics_reporter_ui stop

L) Shut down the Wallaroo container

docker stop wally

Getting set up with Wallaroo using Docker was fairly simple. Our next step is to try and build an application using Wallaroo. Stay tuned for further updates…

If you liked what you read, know more about Qxf2.


References:

Here are some useful references which we followed.
1. Setting Up Your Environment for Wallaroo in Docker
2. Run a Wallaroo Application
