{"id":12848,"date":"2020-05-29T01:23:22","date_gmt":"2020-05-29T05:23:22","guid":{"rendered":"https:\/\/qxf2.com\/blog\/?p=12848"},"modified":"2020-06-05T07:15:15","modified_gmt":"2020-06-05T11:15:15","slug":"collect-real-time-streaming-data-and-store-in-amazon-s3-using-amazon-kinesis-data-firehose","status":"publish","type":"post","link":"https:\/\/qxf2.com\/blog\/collect-real-time-streaming-data-and-store-in-amazon-s3-using-amazon-kinesis-data-firehose\/","title":{"rendered":"Collect real-time streaming data and store in Amazon S3 using Amazon Kinesis Data Firehose"},"content":{"rendered":"<p>This post is a quick tutorial on understanding and using Amazon Kinesis Data Firehose. As a hands-on exercise, we will host a sample website using the Apache web server on an EC2 Linux instance and deliver the real-time logs of the website to Amazon S3 using Kinesis Data Firehose.<\/p>\n<hr \/>\n<h3>Why this post?<\/h3>\n<p>Streaming data is data that is continuously generated by various data sources. AWS offers a service for working with it, <a href=\"https:\/\/aws.amazon.com\/kinesis\/\">Amazon Kinesis<\/a>.<br \/>\nUsing this service, real-time continuous data can be collected, transformed, and stored in data stores. 
The stored data can then be explored with visualization tools.<br \/>\nExamples of real-time streaming data include log files generated by an application, stock market data, and IoT device data.<\/p>\n<hr \/>\n<h3>About Amazon Kinesis<\/h3>\n<p>Amazon Kinesis has four capabilities:<\/p>\n<ul>\n<li><strong>Kinesis Data Streams<\/strong> \u2014 Used to capture, process, and store data streams.<\/li>\n<li><strong>Kinesis Data Firehose<\/strong> \u2014 Used to load data streams into AWS data stores.<\/li>\n<li><strong>Kinesis Data Analytics<\/strong> \u2014 Used to analyze data streams with SQL or Java.<\/li>\n<li><strong>Kinesis Video Streams<\/strong> \u2014 Used to capture, process, and store video streams.<\/li>\n<\/ul>\n<p>In this blog, we will see how to use Amazon Kinesis Data Firehose.<br \/>\n<strong>Amazon Kinesis Data Firehose<\/strong> captures real-time streaming data, optionally transforms it using an AWS Lambda function, and stores it in a data store. The available data stores include Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk.<br \/>\nFor our example, we will be using Amazon Simple Storage Service (S3) as the data store.<br \/>\nTo follow along, I assume you have an AWS account and are familiar with basic AWS services. Make sure the IAM user (preferred) or root user has permission to work with EC2, Kinesis, IAM roles, and S3.<\/p>\n<hr \/>\n<h3>Implementation<\/h3>\n<p>1) Set up the EC2 server<\/p>\n<ul>\n<li style=\"list-style-type: none;\">a. Install the AWS CLI. Launch a Linux instance and <a href=\"https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/ec2-instance-connect-methods.html#ec2-instance-connect-connecting-aws-cli\">connect<\/a> to it.<br \/>\nb. 
Install the <a href=\"https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/install-LAMP.html\">LAMP web server<\/a> on the Linux instance so that the sample website can be hosted with the Apache web server, then start Apache.<\/p>\n<pre lang=\"bash\">$ sudo yum install -y httpd24 php72 mysql57-server php72-mysqlnd\r\n$ sudo service httpd start<\/pre>\n<p>Verify that the web server is installed correctly by opening the EC2 instance&#8217;s public IP address on the default port 80 in a browser (<span>Public IP<\/span>:80).<a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Check_Webser_page.png\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-12917\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Check_Webser_page-1024x367.png\" alt=\"\" width=\"1024\" height=\"367\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Check_Webser_page-1024x367.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Check_Webser_page-300x108.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Check_Webser_page-768x275.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Check_Webser_page.png 1905w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><br \/>\nc. Navigate to the web root: cd \/var\/www\/html<br \/>\nd. Download a sample website template. Here I am downloading a zip file from this <a href=\"https:\/\/www.free-css.com\/free-css-templates\">site<\/a> using wget.<\/p>\n<pre lang=\"bash\">$ wget https:\/\/www.free-css.com\/assets\/files\/free-css-templates\/download\/page253\/estateagency.zip<\/pre>\n<p>e. Check that the file has been downloaded to the html directory using the ls command.<br \/>\nf. Unzip the downloaded html template.<\/p>\n<pre lang=\"bash\">$ unzip estateagency.zip<\/pre>\n<p>g. Verify that the sample website is hosted correctly by opening the public IP address followed by the template name in a browser (<span>Public IP<\/span>\/EstateAgency).<br \/>\nh. 
The website logs are written to &#8220;\/var\/log\/httpd\/access_log&#8221;. Each request to the website appends an entry to this file. Now we will see how to store these continuous logs. Before proceeding, change the permissions of the file so that it is <span>readable, writable, and executable by any user. <\/span>This broad permission is a shortcut for the demo; on a real server, grant only the read access the Kinesis agent needs.<\/p>\n<pre lang=\"bash\">$ sudo chmod 777 \/var\/log\/httpd\/access_log<\/pre>\n<\/li>\n<\/ul>\n<hr \/>\n<p>2) Set up the Amazon Kinesis Data Firehose delivery stream<\/p>\n<ul>\n<li style=\"list-style-type: none;\">a. We will create the Kinesis Data Firehose delivery stream via the console. The delivery stream can be updated and modified at any time after it has been created.<br \/>\nb. Search for Kinesis in the console and click on &#8220;Create delivery stream&#8221;.<br \/>\nc. Provide a name for the delivery stream and choose &#8220;Direct PUT&#8221;.<a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/DeliveryStream_Page_1.png\" data-rel=\"lightbox-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-12901\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/DeliveryStream_Page_1.png\" alt=\"\" width=\"893\" height=\"626\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/DeliveryStream_Page_1.png 893w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/DeliveryStream_Page_1-300x210.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/DeliveryStream_Page_1-768x538.png 768w\" sizes=\"auto, (max-width: 893px) 100vw, 893px\" \/><\/a>d. Navigate to the next page. Keep the AWS Lambda transformation disabled. A Lambda function can be used to transform records, but in this example we collect the logs as they are.<br \/>\ne. On the next page, keep S3 as the destination and either create a new bucket or choose an existing one.<br \/>\nf. 
On the next page, set the Buffer interval to the minimum (60 seconds) and leave the remaining settings at their defaults.<br \/>\ng. Under IAM role, click on Create new. A new IAM role with the required permissions will be created and assigned to this Kinesis delivery stream. Click Allow to return to the Kinesis screen.<a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/IAM_role.png\" data-rel=\"lightbox-image-2\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-12909\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/IAM_role.png\" alt=\"\" width=\"1363\" height=\"584\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/IAM_role.png 1363w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/IAM_role-300x129.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/IAM_role-768x329.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/IAM_role-1024x439.png 1024w\" sizes=\"auto, (max-width: 1363px) 100vw, 1363px\" \/><\/a>h. On the next page, review the details provided and create the delivery stream. 
You have now successfully created the Kinesis Data Firehose delivery stream.<a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Created_Kinesis_data_firehose.png\" data-rel=\"lightbox-image-3\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-12910\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Created_Kinesis_data_firehose.png\" alt=\"\" width=\"1366\" height=\"587\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Created_Kinesis_data_firehose.png 1366w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Created_Kinesis_data_firehose-300x129.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Created_Kinesis_data_firehose-768x330.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Created_Kinesis_data_firehose-1024x440.png 1024w\" sizes=\"auto, (max-width: 1366px) 100vw, 1366px\" \/><\/a>i. Test the delivery stream by sending demo data and verify that the data arrives in the S3 bucket.<a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Test_Kineisis.png\" data-rel=\"lightbox-image-4\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-12911\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Test_Kineisis.png\" alt=\"\" width=\"1364\" height=\"583\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Test_Kineisis.png 1364w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Test_Kineisis-300x128.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Test_Kineisis-768x328.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/Test_Kineisis-1024x438.png 1024w\" sizes=\"auto, (max-width: 1364px) 100vw, 1364px\" \/><\/a><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/S3_Demo_data.png\" data-rel=\"lightbox-image-5\" data-rl_title=\"\" 
data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-12908\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/S3_Demo_data.png\" alt=\"\" width=\"1362\" height=\"587\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/S3_Demo_data.png 1362w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/S3_Demo_data-300x129.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/S3_Demo_data-768x331.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2020\/05\/S3_Demo_data-1024x441.png 1024w\" sizes=\"auto, (max-width: 1362px) 100vw, 1362px\" \/><\/a><\/li>\n<\/ul>\n<hr \/>\n<p>3) Configure the Kinesis agent<\/p>\n<ul>\n<li style=\"list-style-type: none;\">a. <a href=\"https:\/\/docs.aws.amazon.com\/streams\/latest\/dev\/writing-with-agents.html\">Install<\/a> the Kinesis agent on the instance to use Kinesis Data Firehose.\n<pre lang=\"bash\">$ sudo yum install -y aws-kinesis-agent<\/pre>\n<p>b. After installing the Kinesis agent, update the JSON configuration file at \/etc\/aws-kinesis\/agent.json as shown below.<\/p>\n<pre lang=\"bash\">{\r\n  \"cloudwatch.emitMetrics\": true,\r\n  \"cloudwatch.endpoint\": \"monitoring.us-east-2.amazonaws.com\",\r\n  \"firehose.endpoint\": \"firehose.us-east-2.amazonaws.com\",\r\n\r\n  \"flows\": [\r\n    {\r\n      \"filePattern\": \"\/var\/log\/httpd\/access_log\",\r\n      \"deliveryStream\": \"SampleDeliveryStream\"\r\n    }\r\n  ]\r\n}<\/pre>\n<p>Make sure &#8220;filePattern&#8221; is set to <span>the log file path<\/span>, &#8220;deliveryStream&#8221; is set to the name of the delivery stream you created, and the endpoints use the region of your delivery stream (us-east-2 in this example).<br \/>\nc. Run the Kinesis agent on the instance. 
The first command starts the agent; the second configures it to start automatically on every system boot.<\/p>\n<pre lang=\"bash\">$ sudo service aws-kinesis-agent start\r\n$ sudo chkconfig aws-kinesis-agent on<\/pre>\n<\/li>\n<\/ul>\n<hr \/>\n<p>That&#8217;s it! We have successfully created a Kinesis Data Firehose delivery stream that writes to S3. To test it, open the sample website in multiple browsers or click around the website (<span>Public IP<\/span>\/EstateAgency); the related logs will be delivered to the configured S3 bucket.<\/p>\n<p>I hope this post helps you understand Kinesis Data Firehose and that, with this example, you will be able to collect real-time streaming data for your own requirements. See the <a href=\"https:\/\/docs.aws.amazon.com\/firehose\/latest\/dev\/what-is-this-service.html\">documentation<\/a> <span>to go more in depth <\/span>on Amazon Kinesis Data Firehose.<\/p>\n<hr \/>\n","protected":false},"excerpt":{"rendered":"<p>This post will serve as a quick tutorial to understand and use Amazon Kinesis Data Firehose. As a hands-on experience, here we will learn how to host a sample website using the apache web server on the EC2 Linux instance and collect the real-time logs of the website to AWS S3 using Kinesis Data Firehose. Why this post? 
Streaming data [&hellip;]<\/p>\n","protected":false},"author":18,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38,177,234],"tags":[],"class_list":["post-12848","post","type-post","status-publish","format-standard","hentry","category-automation","category-aws","category-aws-kinesis-data-firehose"],"_links":{"self":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/12848","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/users\/18"}],"replies":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/comments?post=12848"}],"version-history":[{"count":36,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/12848\/revisions"}],"predecessor-version":[{"id":13077,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/12848\/revisions\/13077"}],"wp:attachment":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/media?parent=12848"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/categories?post=12848"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/tags?post=12848"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}