Lab Details
This lab shows you how to create a real-time data streaming system with Amazon Kinesis Data Streams.
This lab uses Amazon Kinesis, AWS Lambda, Amazon S3, and IAM.
Time: 1 hour and 20 minutes
AWS Region: US East (N. Virginia), us-east-1
Introduction
Amazon Kinesis Data Streams
Data streaming technology allows customers to access, process, and analyze large amounts of data from many sources.
Kinesis Data Streams is a scalable, durable, and reliable service for streaming data in real time.
A Kinesis data stream is a sequence of data records that can be written to and read from in real time.
In provisioned capacity mode, data streams are priced per shard.
Components
Data record - The unit of data stored in a Kinesis data stream.
Data stream - A group of data records. The data records in a data stream are distributed across shards.
Retention period - The length of time that data records remain accessible in a stream. Kinesis Data Streams stores records for 24 hours by default and for up to 365 days.
Kinesis Client Library - Ensures that a record processor is running for each shard.
Producer - Puts data records into a data stream.
Consumer - Gets data records from shards and processes them.
Shard - A sequence of data records that are part of a stream.
A stream can have more than one shard. When you create the data stream, you must specify how many shards it needs.
The total capacity of a stream is the sum of the capacities of its shards.
Ingest rate per shard - 1 MB per second or 1,000 records per second.
Data read rate per shard - 2 MB per second.
Partition key - Determines which shard a data record is written to, so records with the same partition key are grouped in the same shard.
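To make the role of the partition key concrete, here is a minimal sketch of how a key can be mapped to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The function names below are hypothetical, and the sketch assumes the shards split the hash space evenly, which is not guaranteed after shards have been split or merged.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value


def hash_key(partition_key):
    """Return the partition key's MD5 hash as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_hash_ranges(shard_count):
    """Split the hash space into inclusive (start, end) ranges, one per shard.

    Assumption: shards cover equal slices of the hash space.
    """
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash ranges do not cover the hash space")


if __name__ == "__main__":
    ranges = even_hash_ranges(4)
    for key in ("orders.txt", "users.txt", "orders.txt"):
        print(key, "-> shard", shard_for_key(key, ranges))
```

The same key always lands in the same shard, which is why a poorly chosen key (for example, one constant value) can overload a single shard while the others sit idle.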
Instead of building consumer applications, you can deliver stream records directly to services such as Elasticsearch, Redshift, and S3, for example through Amazon Kinesis Data Firehose.
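The per-shard limits listed under Components can be turned into a rough sizing estimate. The sketch below uses only the figures stated in this lab (1 MB or 1,000 records per second in, 2 MB per second out, per shard). The function names are hypothetical, and it assumes that all consumers share the per-shard read limit.

```python
import math

# Per-shard limits as stated in this lab.
INGEST_MB_PER_SHARD = 1.0
INGEST_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Estimate the shard count that satisfies all three per-shard limits."""
    by_write_size = math.ceil(write_mb_per_s / INGEST_MB_PER_SHARD)
    by_write_count = math.ceil(records_per_s / INGEST_RECORDS_PER_SHARD)
    by_read_size = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_size, by_write_count, by_read_size)


def stream_capacity(shard_count):
    """Total stream capacity is the sum of the capacities of its shards."""
    return {
        "ingest_mb_per_s": shard_count * INGEST_MB_PER_SHARD,
        "ingest_records_per_s": shard_count * INGEST_RECORDS_PER_SHARD,
        "read_mb_per_s": shard_count * READ_MB_PER_SHARD,
    }


if __name__ == "__main__":
    # Hypothetical workload: 3.5 MB/s written as 2,500 records/s,
    # read in full by two consumers (2 x 3.5 = 7 MB/s).
    count = shards_needed(3.5, 2500, 7.0)
    print(count, stream_capacity(count))
```

For this hypothetical workload the write size and the read volume both call for four shards, while the record count alone would need only three, so the estimate is four.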
Case Study
An application uploads text files to an S3 bucket.
Uploading a file to the S3 bucket triggers a Lambda function.
The Lambda function acts as a data producer. It reads the data from the S3 bucket and pushes it to the Kinesis data stream.
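A producer function along these lines might look like the sketch below. It is not the lab's actual code: the stream name is a placeholder, sending one record per non-empty line is an assumption, and using the object key as the partition key is just one possible choice. The boto3 calls used are `get_object` and `put_records`, and `put_records` accepts at most 500 records per call, so larger files are sent in batches.

```python
import urllib.parse

STREAM_NAME = "lab-data-stream"  # placeholder; use your own stream name
MAX_BATCH = 500  # PutRecords accepts at most 500 records per call


def build_records(text, partition_key):
    """Turn each non-empty line of the file into one Kinesis record."""
    return [
        {"Data": line.encode("utf-8"), "PartitionKey": partition_key}
        for line in text.splitlines()
        if line.strip()
    ]


def send_records(kinesis, stream_name, records):
    """Send records in batches and return how many the service rejected."""
    failed = 0
    for i in range(0, len(records), MAX_BATCH):
        response = kinesis.put_records(
            StreamName=stream_name, Records=records[i:i + MAX_BATCH]
        )
        failed += response.get("FailedRecordCount", 0)
    return failed


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without AWS access;
    # in a real function the clients are usually created at module level.
    import boto3

    s3 = boto3.client("s3")
    kinesis = boto3.client("kinesis")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        records = build_records(body.decode("utf-8"), key)
        failed = send_records(kinesis, STREAM_NAME, records)
        print(f"{key}: sent {len(records)} records, {failed} failed")
```

Because every line of a file shares one partition key here, all records from that file go to the same shard and keep their order. A production producer would also retry the records counted as failed.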
Two consumers read data from the stream.
Consumers can use the data in many different ways.
For example, a consumer could read the data and email the information to clients, publish it on social media platforms, or save it in a database.
In this lab, the data is logged and then verified.
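As an illustration of the logging step, here is a minimal consumer sketch. It assumes the consumers are Lambda functions triggered by the stream, which the text above does not state explicitly. When Lambda reads from Kinesis, each record's payload arrives base64-encoded under `record["kinesis"]["data"]`. The helper name `decode_records` is hypothetical.

```python
import base64


def decode_records(event):
    """Extract and decode the payload of every Kinesis record in the event."""
    decoded = []
    for record in event.get("Records", []):
        payload = record["kinesis"]
        decoded.append(
            {
                "partition_key": payload["partitionKey"],
                "sequence_number": payload["sequenceNumber"],
                # Kinesis record data is delivered to Lambda base64-encoded.
                "data": base64.b64decode(payload["data"]).decode("utf-8"),
            }
        )
    return decoded


def lambda_handler(event, context):
    records = decode_records(event)
    for item in records:
        # Output from print() ends up in the function's CloudWatch log stream.
        print(f"{item['partition_key']} {item['sequence_number']}: {item['data']}")
    return {"processed": len(records)}
```

Attaching the same handler to the stream from two separate functions would give the two independent consumers described in the case study, each with its own log to check.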