Kinesis
Note: Kinesis Agent cannot write to a Kinesis Data Firehose delivery stream whose source is already set to Kinesis Data Streams.
Kinesis Data Streams (KDS)
- Real-time data streaming service
- Used to ingest data in real time directly from source
- Capacity Modes
  - Provisioned
    - Publishing (write): 1 MB/sec or 1,000 records/sec per shard
    - Consuming (read):
      - Shared fan-out: 2 MB/sec per shard (throughput shared between all consumers)
      - Enhanced Fan-Out: 2 MB/sec per shard per consumer (dedicated throughput for each consumer)
    - Throughput scales with the number of shards (manual scaling)
    - Pay per shard provisioned per hour
  - On-demand
    - No need to provision or manage capacity
    - Default write capacity: 4 MB/sec or 4,000 records/sec
    - Scales automatically based on the observed throughput peak during the last 30 days
    - Pay per stream per hour and per GB of data in/out
- Provisioned
- Not Serverless
- Data Retention: 1 day (default) to 365 days
- A record consists of a partition key (determines which shard the record is written to) and a data blob (max 1 MB)
- Records are ordered within each shard (records with the same partition key go to the same shard)
- Producers use SDK, Kinesis Producer Library (KPL) or Kinesis Agent to publish records
- Consumers use SDK or Kinesis Client Library (KCL) to consume the records
- Once data is inserted in Kinesis, it can't be modified or deleted (immutability)
- Ability to reprocess (replay) data
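The capacity and partitioning rules above can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not an AWS API: `estimate_shards` and `shard_for_partition_key` are hypothetical helpers. The shard estimate assumes every consumer reads the full stream. The partition-key mapping assumes the 128-bit hash key space is split evenly across shards. Kinesis does hash partition keys with MD5, but real shard ranges change after splits and merges.

```python
import hashlib
import math

# Provisioned-mode limits per shard, as listed in the notes above
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def estimate_shards(write_mb_per_sec, records_per_sec, consumers, enhanced_fan_out=False):
    """Hypothetical helper: rough shard count for a provisioned stream.

    Assumes each consumer reads everything that is written.
    """
    # Writes are capped by whichever per-shard limit is hit first
    write_shards = max(write_mb_per_sec / WRITE_MB_PER_SHARD,
                       records_per_sec / WRITE_RECORDS_PER_SHARD)
    if enhanced_fan_out:
        # Each consumer gets its own 2 MB/sec per shard
        read_shards = write_mb_per_sec / READ_MB_PER_SHARD
    else:
        # All consumers share 2 MB/sec per shard
        read_shards = write_mb_per_sec * consumers / READ_MB_PER_SHARD
    return max(1, math.ceil(max(write_shards, read_shards)))


def shard_for_partition_key(partition_key, shard_count):
    """Hypothetical helper: pick a shard index for a partition key.

    Assumes the 128-bit hash key space is split into equal, contiguous ranges.
    """
    # MD5 turns the partition key into a 128-bit unsigned integer
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Find which equal-sized range of [0, 2**128) the hash falls into
    return hash_key * shard_count // 2**128


if __name__ == "__main__":
    # 3 MB/sec and 2,500 records/sec, read by 4 consumers
    print(estimate_shards(3, 2500, consumers=4))                         # 6
    print(estimate_shards(3, 2500, consumers=4, enhanced_fan_out=True))  # 3
    # The same partition key always lands on the same shard
    print(shard_for_partition_key("user-42", 4) == shard_for_partition_key("user-42", 4))  # True
```

In this sketch, four consumers sharing the classic 2 MB/sec per shard double the shard count. Enhanced fan-out brings the count back down to what the write rate alone requires.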
Kinesis Data Firehose (KDF)
- Used to load streaming data into a target location
- Writes data in batches efficiently (near real time)
  - Buffer size (size of the batch): 1 MB to 128 MB (default 5 MB)
  - Buffer interval (how long to wait for the buffer to fill up): 60 s to 900 s (default 300 s)
  - A batch is delivered when either the buffer size or the buffer interval is reached, whichever comes first
  - The greater the buffer size, the higher the write efficiency, but the longer it takes to fill the buffer
- Can ingest data in real time directly from source
- Auto-scaling
- Serverless
- Destinations:
  - AWS: Redshift, S3, OpenSearch
  - 3rd party: Splunk, MongoDB, Datadog, New Relic, etc.
  - Custom HTTP endpoint
- Pay for data going through Firehose (no provisioning)
- Supports custom data transformation using Lambda functions
- No replay capability (does not store data like KDS)
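The buffering rule above (deliver when the buffer size or buffer interval is reached, whichever comes first) can be simulated with a small, hypothetical `batch_records` function. It is an offline model, not the Firehose API. It only checks the interval when a new record arrives, and it treats whatever is left at the end as a final batch that the interval timer would flush. Sizes are in MB and times in seconds. The allowed ranges are the ones quoted in the notes.

```python
def batch_records(events, buffer_size_mb=5, buffer_interval_s=300):
    """Hypothetical model of Firehose buffering.

    events: list of (arrival_time_s, size_mb) tuples, sorted by time.
    Returns a list of batches, each a list of events delivered together.
    """
    if not 1 <= buffer_size_mb <= 128:
        raise ValueError("buffer size must be between 1 and 128 MB")
    if not 60 <= buffer_interval_s <= 900:
        raise ValueError("buffer interval must be between 60 and 900 seconds")

    batches, current, current_mb, opened_at = [], [], 0, None
    for arrival, size in events:
        # Interval rule: the open buffer has waited long enough, so deliver it first
        if current and arrival - opened_at >= buffer_interval_s:
            batches.append(current)
            current, current_mb, opened_at = [], 0, None
        if not current:
            opened_at = arrival  # the interval clock starts with the first buffered record
        current.append((arrival, size))
        current_mb += size
        # Size rule: deliver as soon as the buffer is full
        if current_mb >= buffer_size_mb:
            batches.append(current)
            current, current_mb, opened_at = [], 0, None
    if current:
        batches.append(current)  # leftover data, delivered when its interval expires
    return batches


if __name__ == "__main__":
    events = [(0, 2), (10, 2), (20, 2), (30, 1), (400, 0.5)]
    for batch in batch_records(events):
        print(batch)
```

With the defaults, the first three records fill the 5 MB buffer and are delivered together. The record at 30 s sits alone until the 300 s interval expires. A larger buffer would batch more records per write but hold them longer, which is the trade-off noted above.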
Kinesis Data Analytics (KDA)
- Perform real-time analytics on Kinesis streams using SQL
- Can write SQL query results to new output streams
- Cannot ingest data directly from source (ingests data from KDS or KDF)
- Auto-scaling
- Serverless
- Pay for the data processed (no provisioning)
- Use cases:
  - Time-series analytics
  - Real-time dashboards
  - Real-time metrics
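Time-series analytics in KDA usually means aggregating a stream over time windows. The sketch below is written in plain Python rather than KDA's SQL dialect. It shows what a tumbling-window count computes: events are grouped into fixed, non-overlapping windows (60 seconds here, an arbitrary choice) and counted per key. `tumbling_window_counts` is a hypothetical helper for illustration only. A KDA application would express this as a windowed SQL query over the input stream and emit results continuously, rather than returning them at the end.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_s=60):
    """Hypothetical helper: count events per key in fixed, non-overlapping windows.

    events: iterable of (timestamp_s, key) pairs.
    Returns {window_start_s: {key: count}}, ordered by window start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Align each timestamp to the start of the window it falls into
        window_start = (timestamp // window_s) * window_s
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


if __name__ == "__main__":
    clicks = [(5, "home"), (30, "cart"), (59, "home"), (60, "home"), (125, "cart")]
    print(tumbling_window_counts(clicks))
    # {0: {'home': 2, 'cart': 1}, 60: {'home': 1}, 120: {'cart': 1}}
```

Each window's counts are the kind of per-interval metric a real-time dashboard would plot.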
Kinesis Video Streams
- Capture, process and store video streams