In this post, I outline a technique I have used to create repeatable data patterns when testing algorithms that operate on streaming data. Imagine an API endpoint that returns the value of a real-time metric, such as CPU load, a stock price or a patient's glucose level.
An application integrating with such an endpoint would typically apply a business rule to the returned data and display the result to the user. In the examples above, that could mean displaying an alert when CPU load exceeds 80%, triggering a buy action when the stock price falls below a certain price, or paging a nurse when a patient's glucose level crosses a certain threshold.
Background
Repeatedly testing algorithms that rely on real-time data poses some problems. Most testers I have seen rely on natural data, but that makes automating some acceptance criteria extremely tricky. They usually either use the real feed or work from a database snapshot with some masking and/or manipulation of time fields. I feel we should also add controlled synthetic data, which lets us exercise the business rules more deliberately.
In this post, I'll show you one way to mimic such an endpoint so that it sends back data in the exact order/pattern you design. You can get all the code used in this post at this GitHub Gist.
The feature being tested
To keep things relatable, let us assume that our app regularly queries a /glucose-level endpoint, plots a graph of the last 12 values and marks the values that are above 200 in red. If you are a tester using real data, you might have to hunt around for patients whose glucose levels are above 200 and then check that the graph looks correct. That is still a good check to perform, but you can add a repeatable test using synthetic data in the following manner.
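To make the target of the test concrete, here is a minimal sketch of the rule the app is assumed to apply: keep the last 12 readings and flag every reading strictly above 200. The class and method names (GlucoseChart, add_reading, flagged_points) are hypothetical and not taken from any real app; only the window size and the threshold come from the feature description above.

```python
from collections import deque

WINDOW_SIZE = 12     # the graph shows the last 12 values
RED_THRESHOLD = 200  # values strictly above this are drawn in red


class GlucoseChart:
    """Hypothetical model of the chart: a rolling window of recent readings."""

    def __init__(self, window_size=WINDOW_SIZE, threshold=RED_THRESHOLD):
        # A deque with maxlen silently drops the oldest reading when full
        self.readings = deque(maxlen=window_size)
        self.threshold = threshold

    def add_reading(self, value):
        """Store one reading as returned by the endpoint (string or number)."""
        self.readings.append(float(value))

    def flagged_points(self):
        """Return (position, value) pairs that should be plotted in red."""
        return [(i, v) for i, v in enumerate(self.readings) if v > self.threshold]


chart = GlucoseChart()
for reading in ["120", "210", "180", "250"]:
    chart.add_reading(reading)
print(chart.flagged_points())  # [(1, 210.0), (3, 250.0)]
```

Note the strict comparison: a reading of exactly 200 is not flagged. That boundary is exactly what a well-designed synthetic data pattern should probe.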
Designing our test
We will perform the following steps to get an HTTP endpoint that returns the data pattern we want, one data point at a time.
- Design the data pattern
- Create a simple Flask app for this endpoint
- Start the Flask app
- Test that the endpoint returns the data pattern
1. Design the data pattern
In our case, since we will be looking at a graph with 12 data points, let us design the levels so that the pattern is instantly recognizable. We will create a file called glucose_levels.csv and add the glucose levels to it, one value per line. The data will have a base level of ~100 for four values, ~200 for four values and ~100 for the last four values. The business rule says a data point is shown in red if the glucose level is above 200, so let us design our data to perform some simple boundary value analysis by including values at and just around the boundary (200, 199.99 and 200.01). Only 200.01 is strictly above 200, so the moment I see the graph, I know there should be exactly one red data point. Zero red points, or more than one, will immediately tell me that something is wrong with either the business rule or the UI.
```
97
98
99
100
199
200
200.01
199.99
105
104
103
102
```
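If you would rather generate the file than type it by hand, the sketch below writes the twelve values one per line and checks the design assumption that exactly one value is strictly above the 200 threshold. The helper names write_pattern and count_red_points are hypothetical; only the file name, the values and the threshold come from this post.

```python
from pathlib import Path

# The designed pattern: ~100, then values around the 200 boundary, then ~100 again
GLUCOSE_PATTERN = [97, 98, 99, 100, 199, 200, 200.01, 199.99, 105, 104, 103, 102]
THRESHOLD = 200


def write_pattern(path, values):
    """Write one value per line, the format the Flask app reads."""
    Path(path).write_text("\n".join(str(v) for v in values) + "\n")


def count_red_points(values, threshold=THRESHOLD):
    """Count values strictly above the threshold (the ones drawn in red)."""
    return sum(1 for v in values if float(v) > threshold)


# Only 200.01 crosses the boundary; 200 and 199.99 must stay un-flagged
assert count_red_points(GLUCOSE_PATTERN) == 1
```

Calling write_pattern("glucose_levels.csv", GLUCOSE_PATTERN) in the same folder as the Flask app produces the file it expects, and the assertion guards against accidentally editing the pattern into one that no longer has exactly one red point.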
2. Create a simple Flask app for this endpoint
The code to create such an endpoint is fairly straightforward. We use a session variable to count how many times the current session has already called the endpoint. Based on that call count, we return the appropriate line of the CSV file.
Note: In our example, the first call must return the first line of the CSV, and the 7th call should return the 7th line. The 13th call, however, should wrap around and return the first line again. Since session_count is the number of calls the session has already made (0 on the first call), the zero-based line index should be:
```python
line_number = session_count % num_lines_in_csv
```
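A quick way to convince yourself that the formula matches the note is to compute the mapping for a few calls. Here call numbers and line numbers are 1-based, as in the note, while session_count is the number of earlier calls; the helper name line_for_call is only for illustration.

```python
NUM_LINES_IN_CSV = 12


def line_for_call(call_number, num_lines=NUM_LINES_IN_CSV):
    """Map a 1-based call number to a 1-based CSV line number."""
    session_count = call_number - 1           # calls made before this one
    line_number = session_count % num_lines   # 0-based index into the file
    return line_number + 1                    # convert to 1-based for humans


for call in (1, 7, 12, 13, 14):
    print(f"call {call:2d} -> line {line_for_call(call)}")
```

This prints line 1 for call 1, line 7 for call 7, line 12 for call 12, and then wraps around to line 1 for call 13 and line 2 for call 14.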
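The code for such an endpoint looks something like this:

```python
"""
Mimic an endpoint that returns
- the glucose levels of a patient
- as a time series
- picking up from the last call made in the session
"""
from flask import Flask, session

app = Flask(__name__)
# Flask signs the session cookie with this key
app.secret_key = b'DONT make your key public unless it is part of a throwaway example!'

# Load the designed pattern once at start-up, one value per line.
# Stripping whitespace and skipping blank lines avoids serving an empty
# value if the file ends with an extra newline.
with open('glucose_levels.csv', 'r') as fp:
    GLUCOSE_DATA = [line.strip() for line in fp if line.strip()]


@app.route('/glucose-level')
def glucose_levels():
    "Return the next glucose level for this session"
    # Number of calls this session has already made (0 on the first call)
    session_count = session.get('count', 0)
    # Wrap around to the first line after the last one has been served
    line_number = session_count % len(GLUCOSE_DATA)
    data = GLUCOSE_DATA[line_number]
    session['count'] = session_count + 1
    return data


#----START OF SCRIPT
if __name__ == '__main__':
    app.run('0.0.0.0', debug=True, port=6464)
```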
3. Start the Flask app
You can start the application by running `python example_streaming.py` in a terminal. Make sure you have installed Flask (`pip install Flask`) first.
4. Test that the endpoint returns the data pattern
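To check that this works, you can simply visit http://localhost:6464/glucose-level in a browser and refresh the page a few times. The browser sends the session cookie back on each request, so each refresh should show the next value.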
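Because the call counter lives in Flask's cookie-based session, each client keeps its own position in the pattern, and a client that never sends the cookie back looks like a first call every time. The plain-Python model below mimics that behaviour so you can reason about it; it does not use Flask at all, and SyntheticFeed and the session ids "alice" and "bob" are hypothetical.

```python
class SyntheticFeed:
    """Plain-Python model of the endpoint: one call counter per session."""

    def __init__(self, lines):
        self.lines = lines
        self.counts = {}  # session id -> number of calls made so far

    def get(self, session_id=None):
        # A client without a session (no cookie) always looks like a first call
        count = self.counts.get(session_id, 0) if session_id else 0
        value = self.lines[count % len(self.lines)]
        if session_id:
            self.counts[session_id] = count + 1
        return value


feed = SyntheticFeed(["97", "98", "99"])
print([feed.get("alice") for _ in range(4)])  # ['97', '98', '99', '97']
print(feed.get("bob"))                        # 97 (a new session starts over)
print([feed.get() for _ in range(2)])         # ['97', '97'] (no session, no progress)
```

This is why a scripted check needs a client that keeps cookies between requests, such as a requests.Session; separate one-off requests that drop the cookie would keep receiving the first value.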
You can also check this using Python. With the above app running in a terminal, open another terminal and try:
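```python
import requests

# requests.Session keeps the session cookie between calls,
# so each request gets the next value in the pattern
session_handle = requests.Session()
# 14 calls: one full cycle of 12 values plus the first two again
for i in range(1, 15):
    response = session_handle.get('http://localhost:6464/glucose-level')
    print(response.text)
```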
If all goes well, you should see the data pattern we designed repeating itself!
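Eyeballing printed output works for a demo, but for an automated check you can compare what the endpoint returned against the designed pattern. A minimal sketch, assuming you have already collected the response bodies as strings (for example via response.text); matches_pattern is a hypothetical helper, not part of requests or Flask.

```python
def matches_pattern(received, pattern):
    """Return True if received[i] equals pattern[i % len(pattern)] for every i.

    Values are compared as stripped strings so a trailing newline in a
    response body does not cause a false mismatch.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    expected = [str(p).strip() for p in pattern]
    return all(
        str(value).strip() == expected[i % len(expected)]
        for i, value in enumerate(received)
    )


pattern = ["97", "98", "99", "100", "199", "200",
           "200.01", "199.99", "105", "104", "103", "102"]
# 14 responses: one full cycle plus the first two values again
received = pattern + ["97\n", "98\n"]
print(matches_pattern(received, pattern))  # True
```

Checking the wrap-around (responses 13 and 14) is deliberate: it confirms the modulo logic, not just the first pass through the file.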
I'd like to end by emphasizing one point: use this synthetic-data technique in addition to what you are already doing with natural data. It is not a complete replacement. But it is an excellent way to set up your UI tests and to perform specific, controlled tests on algorithms that rely on streaming data. Hope this helped!
I want to find out what conditions produce remarkable software. A few years ago, I chose to work as the first professional tester at a startup. I successfully won credibility for testers and established a world-class team. I have led the testing for early versions of multiple products. Today, I run Qxf2 Services. Qxf2 provides software testing services for startups. If you are interested in what Qxf2 offers or simply want to talk about testing, you can contact me at: [email protected]. I like testing, math, chess and dogs.