Compare JSON objects in an AWS S3 bucket using deepdiff

Recently, I got a chance to work with AWS S3, where I used deepdiff to compare JSON files stored in an S3 bucket with a predefined data structure stored as a dictionary object. I can't replicate the entire system I tested, so for this blog I have come up with the following prerequisites and setup:


1. An AWS login is required, with the details AWS_ACCOUNT_ID, AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY kept in a configuration file (the script imports it as conf.aws_configuration_conf). These details are user specific.

2. Create an S3 bucket named compare-json.

3. Keep the JSON file sample.json in the S3 bucket.

4. Keep expected_message.json in the samples folder; sample.json will be compared against it.
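
The bucket and the object from steps 2 and 3 can also be set up from Python instead of the AWS console. The sketch below is one possible way to do that, assuming boto3 is installed and credentials are configured. The helper name setup_bucket and its region parameter are hypothetical names used for illustration; they are not part of the original setup. The S3 client is passed in as a parameter so the helper can be tried with a stand-in object before pointing it at a real account.

```python
import json


def setup_bucket(s3_client, bucket, key, payload, region="us-east-1"):
    """Create the bucket and upload payload (a dict) as a JSON object.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    if region == "us-east-1":
        # us-east-1 is the default location and takes no location constraint.
        s3_client.create_bucket(Bucket=bucket)
    else:
        s3_client.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    # Serialise the dict and store it under the given key.
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )


# Usage (needs real AWS credentials, so it is left commented out):
# import boto3
# setup_bucket(boto3.client("s3"), "compare-json", "sample.json",
#              {"MATCHES WON": 577, "MATCHES LOST": 123})
```

Bucket names are shared across all AWS accounts, so the name compare-json may need to be changed to something unique.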


In the following sections, I will walk through these steps:

1. Create an S3 bucket.

2. Create sample.json in the S3 bucket, which will be referred to as Key in this blog.

3. Store expected_message.json in the samples directory (the template_directory in the script).

4. Execute the Python script. I have kept the source code here.


1. Created an S3 bucket in AWS. The article here will help you create an S3 bucket in AWS.

2. Created sample.json in the S3 bucket, which will be referred to as Key in this blog. My sample.json looks like this:

   "MATCHES WON":577,
   "MATCHES LOST":123,
   "TURNED PRO":"2000-02-29",

3. Used the expected_message.json below:

   "MATCHES WON":400,
   "MATCHES LOST":100,
   "TURNED PRO":"9999-12-31",

Note that some values differ between the two JSON files. I kept these differences on purpose to demo the sample code.

4. Wrote the following Python script to compare the Key with the expected JSON. The compare_dict method compares the dictionary objects created from sample.json and expected_message.json. DeepDiff is used to find the difference between the two dictionary objects.

This file contains the following method and class:
1. The compare_dict method.
2. The s3utilities class, which has the following methods:
2.a. Get the response from the S3 client.
2.b. Convert the response into a dict object.
2.c. Get the response dict object.
2.d. Get the expected dict from the file stored as the expected JSON.

import boto3
import collections
import deepdiff
import json
import logging
import os
import re
import sys
import conf.aws_configuration_conf as aws_conf
from pythonjsonlogger import jsonlogger
from pprint import pprint
# logging
log_handler = logging.StreamHandler()
logger = logging.getLogger()
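# setting environment variables (the four user-specific details kept in the conf file)
os.environ["AWS_ACCOUNT_ID"] = aws_conf.AWS_ACCOUNT_ID
os.environ['AWS_DEFAULT_REGION'] = aws_conf.AWS_DEFAULT_REGION
os.environ['AWS_ACCESS_KEY_ID'] = aws_conf.AWS_ACCESS_KEY_ID
os.environ['AWS_SECRET_ACCESS_KEY'] = aws_conf.AWS_SECRET_ACCESS_KEY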
# Defining method to compare dict
def compare_dict(response_dict, expected_dict):
    exclude_paths = re.compile(r"\'TURNED PRO\'|\'NAME\'")
    # paths matching the regex are left out of the comparison
    diff = deepdiff.DeepDiff(expected_dict, response_dict,
                             exclude_regex_paths=[exclude_paths])
    return diff
# class to write s3 utilities
class s3utilities():
    logger = logging.getLogger(__name__)
    def __init__(self, s3_bucket, key, template_directory):
        # initialising the class
        self.logger.info('s3 utilities activated')
        self.s3_bucket = s3_bucket
        self.key = key
        self.template_directory = template_directory
        self.s3_client = boto3.client('s3')
    def get_response(self, bucket, key):
        # Get Response s3 client object
        response = self.s3_client.get_object(Bucket=bucket, Key=key)
        return response
    def convert_dict_from_response(self,response):
        # Convert response into dict object
        response_json = ""
        for line in response["Body"].iter_lines():
            response_json += line.decode("utf-8")
        response_dict = json.loads(response_json)
        return response_dict
    def get_response_dict(self):
        # Get Response dict object
        response = self.get_response(self.s3_bucket,self.key)
        response_dict = self.convert_dict_from_response(response)
        return response_dict
    def get_expected_dict(self):
        # Get expected dict from json stored as expected json
        current_directory = os.path.dirname(os.path.realpath(__file__))
        message_template = os.path.join(current_directory,
                                        self.template_directory,
                                        'expected_message.json')
        with open(message_template,'r') as fp:
            expected_dict = json.loads(fp.read())
        return expected_dict
if __name__ == "__main__":
    # Testing s3utilities
    s3_bucket = "compare-json"
    key = 'sample.json'
    template_directory = 'samples'
    s3utilities_obj = s3utilities(s3_bucket, key, template_directory)
    response_dict = s3utilities_obj.get_response_dict()
    expected_dict = s3utilities_obj.get_expected_dict()
    diff = compare_dict(response_dict, expected_dict)
    # print the label, then pretty-print the diff object itself
    print('Actual difference between two jsons is:')
    pprint(diff)
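
The convert_dict_from_response method above builds the JSON string by joining the body line by line. A shorter variation, which is not taken from the original script, reads the whole body in one call and hands the bytes to json.loads. The sketch below assumes the object is small enough to fit in memory; the function name body_to_dict is hypothetical, and io.BytesIO stands in for the S3 response body so the example runs without AWS access.

```python
import io
import json


def body_to_dict(response):
    """Parse the JSON payload of a get_object-style response in one read."""
    # response["Body"] is a file-like object; read() returns the raw bytes,
    # and json.loads accepts UTF-8 encoded bytes directly.
    return json.loads(response["Body"].read())


# Stand-in for an S3 response: io.BytesIO offers the same read() method.
fake_response = {"Body": io.BytesIO(b'{"MATCHES WON": 577,\n "MATCHES LOST": 123}')}
parsed = body_to_dict(fake_response)
print(parsed)
```

Either way, the object is parsed from the response stream without first being saved to disk.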

When I ran the script with the python command, the values that differ between the expected JSON and the sample JSON were shown on the console. Note that TURNED PRO and NAME also differ between the two JSON files, but they are filtered out of the result because they are excluded in the following code snippet:

exclude_paths = re.compile(r"\'TURNED PRO\'|\'NAME\'")
diff = deepdiff.DeepDiff(expected_dict, response_dict,
                         exclude_regex_paths=[exclude_paths])

I hope you have liked the blog. The source code is available here. You can find some useful documentation about deepdiff here.


One comment

  1. A good solution that does not require the extra cost of transferring S3 object to disk. The exclusion of the ‘Turned Pro’ was a nice icing for other uses. Thanks for sharing
