{"id":13493,"date":"2020-08-27T01:51:13","date_gmt":"2020-08-27T05:51:13","guid":{"rendered":"https:\/\/qxf2.com\/blog\/?p=13493"},"modified":"2024-10-17T10:19:57","modified_gmt":"2024-10-17T14:19:57","slug":"comapre-pre-defined-json-structure-with-json-object-stored-in-the-aws-s3-bucket-using-deepdiff","status":"publish","type":"post","link":"https:\/\/qxf2.com\/blog\/comapre-pre-defined-json-structure-with-json-object-stored-in-the-aws-s3-bucket-using-deepdiff\/","title":{"rendered":"Compare json objects in AWS S3 bucket using deepdiff"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Recently, I got a chance to work with AWS S3, where I used deepdiff to compare JSON files stored in an S3 bucket against a pre-defined data structure stored as a dictionary object. I can&#8217;t replicate the entire system I tested, so for this blog I have come up with the following prerequisites\/setup\/flow:<br \/>\n<\/span><\/p>\n<hr \/>\n<h3><b>Pre-requisite:<\/b><\/h3>\n<p>1. An AWS login is required. Store the details <code>AWS_ACCOUNT_ID<\/code>, <code>AWS_DEFAULT_REGION<\/code>, <code>AWS_ACCESS_KEY_ID<\/code>, <code>AWS_SECRET_ACCESS_KEY<\/code> in the <code>aws_configuration_conf.py<\/code> file. These details are user-specific.<\/p>\n<p>2. Create an S3 bucket named <code>compare-json<\/code>.<\/p>\n<p>3. Keep the JSON file <code>sample.json<\/code> in the S3 bucket.<\/p>\n<p>4. Keep <code>expected_message.json<\/code>, which will be compared against <code>sample.json<\/code>, in the <code>samples<\/code> folder.<\/p>\n<hr \/>\n<h3><b>Summary:<\/b><\/h3>\n<p>In the following sections, I will walk through these steps:<\/p>\n<p>1. Create an S3 bucket.<\/p>\n<p>2. Create <code>sample.json<\/code> in the S3 bucket, which will be referred to as the Key in this blog.<\/p>\n<p>3. Store <code>expected_message.json<\/code> in the <code>samples<\/code> directory (the <code>template_directory<\/code> in the script).<\/p>\n<p>4. Execute the Python script <code>s3_compare_json.py<\/code>. 
I have kept the source code <a href=\"https:\/\/github.com\/rahul-bhave\/s3-python\/blob\/master\/s3_compare_json.py\">here<\/a>.<\/p>\n<hr \/>\n<h3><b>Steps:<\/b><\/h3>\n<p><strong>1<\/strong>. I created an S3 bucket in AWS. The article <a href=https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/user-guide\/create-bucket.html>here<\/a> will help you create an S3 bucket in AWS.<\/p>\n<p><strong>2<\/strong>. I created <code>sample.json<\/code> in the S3 bucket, which will be referred to as the Key in this blog. My <code>sample.json<\/code> looks like this:<\/p>\n<pre lang=\"JSON\">\r\n{\r\n   \"PROFESSIONAL PLAYER\":true,\r\n   \"NAME\":\"NADAL\",\r\n   \"MATCHES PLAYED\":750,\r\n   \"MATCHES WON\":577,\r\n   \"MATCHES LOST\":123,\r\n   \"STATUS\":\"ACTIVE\",\r\n   \"COUNTRY\":\"ESP\",\r\n   \"TURNED PRO\":\"2000-02-29\",\r\n   \"PRICE MONEY\":{\r\n      \"AMOUNT\":8900005,\r\n      \"CURRENCY\":\"USD\"\r\n   },\r\n   \"ENDORSEMENT FEE\":{\r\n      \"AMOUNT\":400000,\r\n      \"CURRENCY\":\"INR\"\r\n   }\r\n}\r\n<\/pre>\n<p><strong>3<\/strong>. I used the following <code>expected_message.json<\/code>:<\/p>\n<pre lang=\"JSON\">\r\n{\r\n   \"PROFESSIONAL PLAYER\":true,\r\n   \"NAME\":\"ABCDEF\",\r\n   \"MATCHES PLAYED\":500,\r\n   \"MATCHES WON\":400,\r\n   \"MATCHES LOST\":100,\r\n   \"STATUS\":\"ACTIVE\",\r\n   \"COUNTRY\":\"IND\",\r\n   \"TURNED PRO\":\"9999-12-31\",\r\n   \"PRICE MONEY\":{\r\n      \"AMOUNT\":1000000,\r\n      \"CURRENCY\":\"USD\"\r\n   },\r\n   \"ENDORSEMENT FEE\":{\r\n      \"AMOUNT\":500000,\r\n      \"CURRENCY\":\"USD\"\r\n   }\r\n}\r\n<\/pre>\n<p>Note that some values differ between the two JSON files; I kept these differences on purpose to demo the sample code.<\/p>\n<p><strong>4<\/strong>. I wrote the following Python script <code>s3_compare_json.py<\/code> to compare the Key with the expected JSON format. 
The <code>compare_dict<\/code> method compares the dictionary objects created from <code>sample.json<\/code> and <code>expected_message.json<\/code>. <code>DeepDiff<\/code> from the <code>deepdiff<\/code> package finds the differences between the two dictionary objects.<\/p>\n<pre lang=\"Python\">\r\n\"\"\"\r\nThis file contains the following method and class:\r\n1. Compare dict method.\r\n2. s3utilities class, which has the following methods:\r\n2.a. Get Response from s3 client.\r\n2.b. Convert response into dict object.\r\n2.c. Get Response dict object\r\n2.d. Get expected dict from json stored as expected json\r\n\"\"\"\r\nimport boto3\r\nimport collections\r\nimport deepdiff\r\nimport json\r\nimport logging\r\nimport os\r\nimport re\r\nimport sys\r\nsys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))\r\nimport conf.aws_configuration_conf as aws_conf\r\nfrom pythonjsonlogger import jsonlogger\r\nfrom pprint import pprint\r\n\r\n# logging\r\nlog_handler = logging.StreamHandler()\r\nlog_handler.setFormatter(jsonlogger.JsonFormatter())\r\nlogger = logging.getLogger()\r\nlogger.setLevel(logging.INFO)\r\nlogger.addHandler(log_handler)\r\n\r\n#setting environment variable\r\nos.environ[\"AWS_ACCOUNT_ID\"]= aws_conf.AWS_ACCOUNT_ID\r\nos.environ['AWS_DEFAULT_REGION'] = aws_conf.AWS_DEFAULT_REGION\r\nos.environ['AWS_ACCESS_KEY_ID'] = aws_conf.AWS_ACCESS_KEY_ID\r\nos.environ['AWS_SECRET_ACCESS_KEY'] = aws_conf.AWS_SECRET_ACCESS_KEY\r\n\r\n# Defining method to compare dict\r\ndef compare_dict(response_dict, expected_dict):\r\n    exclude_paths = re.compile(r\"\\'TURNED PRO\\'|\\'NAME\\'\")\r\n    diff = deepdiff.DeepDiff(expected_dict, response_dict,\\\r\n        exclude_regex_paths=[exclude_paths],verbose_level=0)\r\n\r\n    return diff\r\n\r\n# class to write s3 utilities\r\nclass s3utilities():\r\n    logger = logging.getLogger(__name__)\r\n\r\n    def __init__(self, s3_bucket, key, template_directory):\r\n        # initialising the class\r\n        self.logger.info(f's3 
utilities activated')\r\n        self.s3_bucket = s3_bucket\r\n        self.key = key\r\n        self.template_directory = template_directory\r\n        self.s3_client = boto3.client('s3')\r\n\r\n    def get_response(self, bucket, key):\r\n        # Get Response s3 client object\r\n        response = self.s3_client.get_object(Bucket=bucket, Key=key)\r\n\r\n        return response\r\n\r\n    def convert_dict_from_response(self,response):\r\n        # Convert response into dict object\r\n        response_json = \"\"\r\n        for line in response[\"Body\"].iter_lines():\r\n            response_json += line.decode(\"utf-8\")\r\n        response_dict = json.loads(response_json)\r\n\r\n        return response_dict\r\n\r\n    def get_response_dict(self):\r\n        # Get Response dict object\r\n        response = self.get_response(self.s3_bucket,self.key)\r\n        response_dict = self.convert_dict_from_response(response)\r\n\r\n        return response_dict\r\n\r\n    def get_expected_dict(self):\r\n        # Get expected dict from json stored as expected json\r\n        current_directory = os.path.dirname(os.path.realpath(__file__))\r\n        message_template = os.path.join(current_directory,\\\r\n            self.template_directory,'expected_message.json')\r\n        with open(message_template,'r') as fp:\r\n            expected_dict = json.loads(fp.read())\r\n\r\n        return expected_dict\r\n\r\nif __name__ == \"__main__\":\r\n    # Testing s3utilities\r\n    s3_bucket = \"compare-json\"\r\n    key = 'sample.json'\r\n    template_directory = 'samples'\r\n    s3utilities_obj = s3utilities(s3_bucket, key, template_directory)\r\n    response_dict = s3utilities_obj.get_response_dict()\r\n    expected_dict = s3utilities_obj.get_expected_dict()\r\n    diff = compare_dict(response_dict, expected_dict)\r\n    print(\"=========================================================\")\r\n    pprint(f'Actual difference between two jsons is: \\n {diff}')\r\n    
print(\"=========================================================\")\r\n\r\n  \r\n<\/pre>\n<p>When I ran the script using the command <code>python s3_compare_json.py<\/code>, the values that differ between the expected JSON and the sample JSON were shown on the console. Note that <code>TURNED PRO<\/code> and <code>NAME<\/code> also differ between the two JSON files, but they are filtered out of the result because they are excluded by the following code snippet:<\/p>\n<pre lang=\"Python\">\r\nexclude_paths = re.compile(r\"\\'TURNED PRO\\'|\\'NAME\\'\")\r\ndiff = deepdiff.DeepDiff(expected_dict, response_dict,\\\r\n     exclude_regex_paths=[exclude_paths],verbose_level=0)\r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/08\/s3_utilities-4.png\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/08\/s3_utilities-4.png\" alt=\"\" \/><\/a><\/p>\n<hr\/>\n<p>I hope you liked the blog. The source code is available <a href=\"https:\/\/github.com\/rahul-bhave\/s3-python\/blob\/master\/s3_compare_json.py\">here<\/a>. You can find some useful documentation about <code>deepdiff<\/code> <a href=\"https:\/\/pypi.org\/project\/deepdiff\/\">here<\/a>.<\/p>\n<hr\/>\n<h3>Work with Qxf2 for top-tier startup QA<\/h3>\n<p> At Qxf2, we don\u2019t just test\u2014we help build better products. Our technical testers work alongside your developers to create sustainable QA processes that scale. Looking for quality assurance expertise that fits your startup\u2019s pace? 
Check out our <a href=\"https:\/\/qxf2.com\/?utm_source=deepdiff-json&#038;utm_medium=click&#038;utm_campaign=From%20blog\">startup-focused QA solutions<\/a> and get in touch with us today!<\/p>\n<hr \/>\n","protected":false},"excerpt":{"rendered":"<p>Recently, I got a chance to work on the AWS S3 bucket, where I compared the JSON files stored in the S3 bucket with the pre-defined data structure stored as a dictionary object using deepdiff. I can&#8217;t actually replicate, the entire system, I had tested. For the blog purpose I have come up with the following prerequisites\/setup\/flow: Pre-requisite: 1. AWS [&hellip;]<\/p>\n","protected":false},"author":28,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[244],"tags":[245],"class_list":["post-13493","post","type-post","status-publish","format-standard","hentry","category-aws-s3","tag-aws-s3"],"_links":{"self":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/13493","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/users\/28"}],"replies":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/comments?post=13493"}],"version-history":[{"count":69,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/13493\/revisions"}],"predecessor-version":[{"id":22936,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/13493\/revisions\/22936"}],"wp:attachment":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/media?parent=13493"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/categories?post=13493"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/tags?post=13493"}],"curies":[{"name":
"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}