Ayhan Sipahi

AWS Kinesis: Four Services Behind One Name

Kinesis is four AWS services, not one. A guide to the four, the Data Streams shard engine underneath, its cost shape, and when to pick something else.

“Kinesis” is one of the most overloaded names in AWS. It is not a single product but a brand covering four separate services that solve different problems, so a team that picks “Kinesis” from a console dropdown often discovers later that it cannot do what they assumed. When an engineer says “we’ll put it on Kinesis,” they almost always mean Data Streams: the raw, shard-based, ordered, replayable log. The other three are a delivery pipe, a stream-processing engine, and a video service. What follows disambiguates the four, then goes deep on the Data Streams mechanics, cost shape, and decision boundaries you need before you commit.

The four services, disambiguated

Two of the four still carry the “Kinesis” prefix; the other two were renamed away from it but kept their old APIs. Here is each one in a paragraph, with the single job it exists for.

Amazon Kinesis Data Streams is the raw real-time data stream. It is shard-based, ordered within a shard, and replayable within a retention window. You write the producers and the consumers; AWS manages durability and (optionally) capacity. This is the service people mean when they say “Kinesis” without qualifying it, and the deep-dive subject below.

Amazon Data Firehose (formerly Amazon Kinesis Data Firehose, renamed 2024-02-09) is a fully managed delivery and ETL pipe. It captures, optionally transforms, and delivers streams into sinks like S3, Redshift, OpenSearch, Splunk, and Snowflake. There are no shards to manage and delivery is buffered: you wait for a buffer interval or size to fill, not for sub-second arrival. Reach for it when the requirement is “I just want this data to land in S3 or a warehouse.” Firehose can even consume a Data Stream as its source, so the two compose rather than compete.

Amazon Managed Service for Apache Flink (formerly Amazon Kinesis Data Analytics, renamed 2023-08-30) is serverless Apache Flink for stateful stream processing: windowed aggregations, joins, and SQL or notebook-based analytics over streams. Reach for it when you need to run compute over the stream, not just move it.

Amazon Kinesis Video Streams ingests, stores, and processes media (video, audio, and other time-encoded data) for playback, analytics, and machine learning. It has a separate API surface and pricing from Data Streams. It is the odd one out: media, not JSON records.

One detail trips people up. The 2024 and 2023 renames touched only the console, documentation, and marketing. The APIs, CLI commands, IAM action names, and CloudWatch metric namespaces kept their old identifiers. So aws firehose and the kinesisanalytics API names still look “wrong” against the new console labels. That is expected, not a bug.

| Service | Ingests | Who manages partitioning | Typical output | Say this when you mean it |
| --- | --- | --- | --- | --- |
| Kinesis Data Streams | JSON or binary records | You (partition key, shards) | Your consumers | “ordered, replayable real-time stream” |
| Amazon Data Firehose | Records (or a Data Stream) | Fully managed | S3, Redshift, OpenSearch, Snowflake | “just land this in a sink” |
| Managed Service for Apache Flink | A stream (Kinesis, Kafka) | Fully managed | Aggregates, derived streams, sinks | “run compute over the stream” |
| Kinesis Video Streams | Media (video, audio) | Fully managed | Playback, ML, media analytics | “ingest and process video” |

Everything below is about the first row.

How Data Streams works: shards and capacity

A Data Stream is a set of shards. The shard is the unit of capacity, and it has hard limits. Per shard, in provisioned mode, you can write up to 1 MB/s or 1,000 records/s (whichever ceiling hits first) and read up to 2 MB/s or 2,000 records/s through GetRecords, with that read budget shared across all shared-throughput consumers. Capacity scales linearly with shard count: ten shards give ten times the throughput. Sizing a stream means turning your expected throughput into a shard count.
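The sizing step can be sketched as a small function: take each per-shard limit quoted above, divide the expected load by it, and let the largest result win. This is a minimal sketch; `StreamLoad` and `shardsNeeded` are hypothetical names, not part of any AWS SDK, and the read-side term assumes all consumers use shared throughput.

```typescript
// Per-shard provisioned limits as quoted in the text above.
const WRITE_MB_PER_SHARD = 1;
const WRITE_RECORDS_PER_SHARD = 1_000;
const SHARED_READ_MB_PER_SHARD = 2;

// Hypothetical description of the expected peak load on a stream.
interface StreamLoad {
  writeMBps: number; // peak ingest in MB/s
  writeRecordsPerSec: number; // peak record rate
  sharedReadMBps: number; // combined read rate of all shared-throughput consumers
}

// The shard count is set by whichever limit binds first.
function shardsNeeded(load: StreamLoad): number {
  const byBytes = Math.ceil(load.writeMBps / WRITE_MB_PER_SHARD);
  const byRecords = Math.ceil(load.writeRecordsPerSec / WRITE_RECORDS_PER_SHARD);
  const byReads = Math.ceil(load.sharedReadMBps / SHARED_READ_MB_PER_SHARD);
  return Math.max(1, byBytes, byRecords, byReads);
}

// Many small records: the record-rate limit binds before the byte rate does.
const smallRecordShards = shardsNeeded({
  writeMBps: 0.5,
  writeRecordsPerSec: 4_500,
  sharedReadMBps: 1,
});
console.log(smallRecordShards);
```

The point of taking the maximum is that a workload of many tiny records can need more shards than its byte rate alone suggests.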

A few API limits matter for sizing. A single record’s data blob can be up to 10 MiB (raised from the old 1 MB cap); Kinesis uses burst capacity for intermittent 1-10 MiB records. Do not confuse this per-record cap with the per-shard-per-second write rate; “1 MB” is the throughput ceiling, not the record-size limit. PutRecords batches up to 500 records or 10 MiB per request. GetRecords returns up to 10,000 records or 10 MB per call, with 5 read transactions/s per shard; if one call returns the full 10 MB, the next calls within the following 5 seconds throw ProvisionedThroughputExceededException.

Partition key, hash, and shard placement

When a producer writes a record, it supplies a partition key (a string). Kinesis MD5-hashes that key into a 128-bit integer hash-key space. Each shard owns a contiguous range of that space, so a given key always lands on the same shard. This is the rule that makes ordering work: records are ordered within a shard, that is, per partition key, never across the whole stream.

[Diagram: Partition key (string) → MD5 to 128-bit hash → Which range owns the hash? → low range: Shard 1; mid range: Shard 2; high range: Shard 3]

If you ever need to force two different keys onto the same shard, you can override placement with an ExplicitHashKey. That is a deliberate co-location tool, not an everyday setting.
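Placement can be reproduced locally to check how a candidate key would distribute. The sketch below assumes evenly split hash ranges, which a real stream only has until it is resharded; `evenShardRanges`, `hashKeyFor`, and `shardFor` are hypothetical helpers, and the shard ids are made up for illustration.

```typescript
import { createHash } from "node:crypto";

interface ShardRange {
  shardId: string;
  startingHashKey: bigint; // inclusive
  endingHashKey: bigint; // inclusive
}

// Top of the 128-bit hash-key space: 2^128 - 1.
const MAX_HASH_KEY = (BigInt(1) << BigInt(128)) - BigInt(1);

// Split the hash-key space into `count` contiguous ranges of equal width.
function evenShardRanges(count: number): ShardRange[] {
  const width = (MAX_HASH_KEY + BigInt(1)) / BigInt(count);
  return Array.from({ length: count }, (_, i) => ({
    shardId: `shard-${i + 1}`,
    startingHashKey: width * BigInt(i),
    // The last range absorbs the remainder so the whole space is covered.
    endingHashKey: i === count - 1 ? MAX_HASH_KEY : width * BigInt(i + 1) - BigInt(1),
  }));
}

// MD5 the partition key and read the 16-byte digest as one 128-bit integer.
function hashKeyFor(partitionKey: string): bigint {
  const hex = createHash("md5").update(partitionKey, "utf8").digest("hex");
  return BigInt(`0x${hex}`);
}

// Find the shard whose range contains the key's hash.
function shardFor(partitionKey: string, shards: ShardRange[]): ShardRange {
  const hashKey = hashKeyFor(partitionKey);
  const owner = shards.find(
    (s) => hashKey >= s.startingHashKey && hashKey <= s.endingHashKey,
  );
  if (!owner) throw new Error("shard ranges do not cover the hash-key space");
  return owner;
}

const shards = evenShardRanges(3);
console.log(shardFor("tenant-42#order-1001", shards).shardId);
```

Running a sample of real keys through `shardFor` and counting per shard is a cheap way to spot skew before production traffic does. An ExplicitHashKey corresponds to supplying the integer directly instead of deriving it from the key.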

The hot shard problem

Because placement follows the key, a skewed key distribution sends disproportionate traffic to one shard. That shard throttles (WriteProvisionedThroughputExceeded) while the stream looks under-utilised in aggregate, so dashboards that watch only stream-level totals show plenty of headroom. This is the single most common Data Streams design mistake. A partition key like country or tenant_id looks reasonable until one country or one large tenant dominates. The fix is a higher-cardinality, better-distributed key. If you genuinely need per-tenant ordering, accept that a hot tenant needs its own shard budget, and size for the busiest key rather than the average.

Resharding: split and merge

You change capacity in provisioned mode by resharding. There are two primitives. SplitShard turns one shard into two children, raising capacity and cost. MergeShards combines two adjacent shards into one child, lowering both. Splits and merges are always pairwise. Parent shards close, and the children inherit the hash range. For provisioned scaling, UpdateShardCount is the higher-level call that figures out the splits and merges for you. On-demand mode does all of this automatically, so resharding APIs are a provisioned-mode concern.

One ordering rule constrains consumers during resharding: a consumer must read all records from a parent shard before reading its children, otherwise per-key order breaks across the boundary. The Kinesis Client Library handles this automatically; a hand-rolled consumer must respect it.
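For a hand-rolled consumer, the parent-before-child rule amounts to a topological ordering over shard lineage. The sketch below is one way to compute it, assuming shard descriptions that carry a parent id and, after a merge, an adjacent parent id. `ShardInfo` and `readOrder` are hypothetical names, and treating a parent that is missing from the list as already drained is an assumption of this sketch.

```typescript
interface ShardInfo {
  shardId: string;
  parentShardId?: string; // set on the children of a split or merge
  adjacentParentShardId?: string; // second parent, present only after a merge
}

// Return shard ids in an order where every parent precedes its children.
function readOrder(shards: ShardInfo[]): string[] {
  const known = new Set(shards.map((s) => s.shardId));
  const done = new Set<string>();
  const order: string[] = [];
  let remaining = [...shards];

  while (remaining.length > 0) {
    // A shard is ready once each of its parents is drained or no longer listed.
    const ready = remaining.filter((s) =>
      [s.parentShardId, s.adjacentParentShardId].every(
        (p) => p === undefined || !known.has(p) || done.has(p),
      ),
    );
    if (ready.length === 0) throw new Error("cycle in shard lineage");
    for (const s of ready) {
      done.add(s.shardId);
      order.push(s.shardId);
    }
    remaining = remaining.filter((s) => !done.has(s.shardId));
  }
  return order;
}

// p1 was split (child-a is one child); child-a and p2 were later merged.
const lineage: ShardInfo[] = [
  { shardId: "child-a", parentShardId: "p1" },
  { shardId: "p1" },
  { shardId: "merged", parentShardId: "child-a", adjacentParentShardId: "p2" },
  { shardId: "p2" },
];
console.log(readOrder(lineage));
```

A real consumer would also need to confirm that it has reached the end of each closed parent before moving on; this sketch covers only the ordering.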

Capacity modes: provisioned vs on-demand

Provisioned mode means you set and scale the shard count yourself. It is the cheapest option at steady, predictable, high throughput, because you are not paying a per-GB data charge; the trade-off is that you own the sizing and the resharding.

On-demand mode means AWS manages shards and auto-scales for you. A new on-demand stream starts at 4 shards (4 MB/s write, 8 MB/s read) and auto-scales to double the peak write throughput seen in the trailing 30 days. Because scaling is reactive, a sudden spike beyond 2x the prior peak within roughly 15 minutes can still throttle. The default on-demand ceiling is 200 MB/s write and 400 MB/s read, raised to 10 GB/s write and 20 GB/s read in us-east-1, us-west-2, and eu-west-1 (a support request unlocks the higher tier elsewhere). You can switch a stream between modes twice per 24 hours.
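The reactive-scaling rule can be written down as arithmetic to sanity-check a planned launch or load test. This is a rough sketch using only the figures quoted above; the function names are hypothetical, and keeping the 4 MB/s starting capacity as a floor is an assumption of the sketch.

```typescript
// Starting write capacity of a new on-demand stream, per the text above.
const ON_DEMAND_BASELINE_WRITE_MBPS = 4;

// Capacity on-demand keeps ready: double the trailing 30-day peak.
// Treating the baseline as a floor is an assumption of this sketch.
function onDemandWriteCapacityMBps(peak30dMBps: number): number {
  return Math.max(ON_DEMAND_BASELINE_WRITE_MBPS, 2 * peak30dMBps);
}

// True when a sudden spike exceeds what the stream has already scaled for.
function spikeMayThrottle(spikeMBps: number, peak30dMBps: number): boolean {
  return spikeMBps > onDemandWriteCapacityMBps(peak30dMBps);
}

console.log(spikeMayThrottle(25, 10)); // spike above 2x the prior peak
console.log(spikeMayThrottle(15, 10)); // spike within the 2x headroom
```

If a planned event fails this check, ramping traffic up gradually ahead of time is the usual way to raise the recorded peak first.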

There is now a third option worth treating as a first-class part of the on-demand story. On-demand Advantage mode (announced 2025-11-20) is an account-level setting with roughly 60% lower per-GB pricing than On-demand Standard and a higher enhanced fan-out consumer cap (50 versus 20). It is the current cost-and-scale play for high, steady on-demand volume, with a catch covered in the cost section: an account-level minimum commitment.

Tip: A sensible default for a new stream with unknown traffic is on-demand, then a move to provisioned once the shape is known and steady. Paying the per-GB premium to skip sizing decisions is worth it early; paying it forever on a predictable high-volume stream is not.

Retention, delivery, and ordering

Default retention is 24 hours. You can extend it up to a maximum of 8,760 hours (365 days) with IncreaseStreamRetentionPeriod; the minimum is 24 hours. Retention beyond 24 hours is billed, and the extended tier (24h to 7d) and long-term tier (over 7d to 365d) have different pricing. Long-term retention is the feature that makes backtesting, backfills, and audit replays possible without an external archive.
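Because the billing tier depends on where the retention period falls, a small guard in deployment tooling can make the tier explicit before the value is sent to IncreaseStreamRetentionPeriod. The sketch below encodes only the bounds and tier boundaries stated above; `retentionTier` and the tier labels are hypothetical names.

```typescript
type RetentionTier = "default" | "extended" | "long-term";

const MIN_RETENTION_HOURS = 24;
const EXTENDED_MAX_HOURS = 7 * 24; // 7 days
const MAX_RETENTION_HOURS = 8_760; // 365 days

// Validate a retention period and report which billing tier it falls in.
function retentionTier(hours: number): RetentionTier {
  if (
    !Number.isInteger(hours) ||
    hours < MIN_RETENTION_HOURS ||
    hours > MAX_RETENTION_HOURS
  ) {
    throw new RangeError(
      `retention must be a whole number of hours between ${MIN_RETENTION_HOURS} and ${MAX_RETENTION_HOURS}`,
    );
  }
  if (hours === MIN_RETENTION_HOURS) return "default";
  return hours <= EXTENDED_MAX_HOURS ? "extended" : "long-term";
}

console.log(retentionTier(72));
```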

Two semantic facts must shape every consumer you write. Delivery is at-least-once, so consumers must be idempotent; duplicates happen on retries and around resharding. And producers can create duplicates too, on PUT retries. Combined with ordering that holds only within a shard, the consequence is concrete: deduplication and ordering-sensitive logic belong on the consumer side, keyed by an idempotency key and a dedup store.

Consumer models

There are four ways to read a Data Stream, and the difference between them is mostly about how the 2 MB/s per-shard read budget is shared.

Shared-throughput polling is the default. Every shared consumer splits that 2 MB/s (and the 5 read transactions/s) per shard. Add a second consumer and each effectively gets about 1 MB/s. Add a third and they start to starve: GetRecords.IteratorAgeMilliseconds climbs as consumers fall behind. This is the mechanism behind the wrong conclusion that “Kinesis is slow.” It is not slow; the shared read budget is being divided.

The Kinesis Client Library (KCL) is for self-run consumer fleets. It coordinates one lease per shard across workers, checkpoints the per-shard sequence number to a DynamoDB table, load-balances shards across workers, and reacts to resharding automatically by draining parent shards before children. KCL 3.x (the current major version) adds two more DynamoDB metadata tables (worker-metrics and coordinator-state) and a global secondary index on leaseOwner to cut read-capacity cost. It also adds graceful lease handoff: a worker finishes checkpointing before transferring a lease, which reduces reprocessing. Note the cost implication: the DynamoDB lease table is a real, billable side-dependency that teams routinely forget when they budget a KCL consumer.

Enhanced fan-out (EFO) gives each registered consumer a dedicated 2 MB/s per shard, pushed over HTTP/2 via SubscribeToShard rather than polled, typically delivered within about 70 ms (roughly 65% lower latency than GetRecords, by AWS’s own published numbers). Consumers no longer contend with each other. The registered-consumer cap is 20 per stream on On-demand Standard and Provisioned, and 50 per stream on On-demand Advantage. EFO costs extra (a consumer-shard-hour charge plus a per-GB retrieval charge, additive per consumer), so it earns its keep past roughly two or three consumers, or whenever latency genuinely matters.

Lambda as a consumer, via an event source mapping (ESM), is the serverless default. Lambda polls each shard (about once per second for the standard iterator) or subscribes via EFO if you pass a consumer ARN, batches the records, and invokes your function. You tune it with a batching window up to 5 minutes, a payload up to 6 MB, and a ParallelizationFactor of 1 to 10 (concurrent batches per shard, still in order per partition key).

The Lambda failure behaviour is the gotcha worth memorising. By default, MaximumRetryAttempts and MaximumRecordAgeInSeconds are -1, meaning infinite. So a single poison-pill batch blocks its shard and retries until those records age out of retention; the whole shard stalls behind one bad record. Three mitigations matter: BisectBatchOnFunctionError splits a failing batch to isolate the bad record (and does not consume retry quota), finite MaximumRetryAttempts or MaximumRecordAgeInSeconds bound the stall, and an OnFailure destination (SQS, SNS, or S3) captures what could not be processed. One caveat on that destination: for SQS and SNS, Lambda sends only metadata (streamArn, shardId, startSequenceNumber, endSequenceNumber), not the record bodies, so your dead-letter handler must re-read the stream to recover them.
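The three mitigations map onto a handful of event source mapping parameters. The sketch below models just that subset as a plain object so the difference from the defaults is visible. The retry count, record age, and queue ARN are hypothetical example values, not recommendations, and `stallsUntilRetentionExpiry` is a made-up helper name.

```typescript
// The subset of event source mapping settings discussed above.
interface StreamFailureSettings {
  BisectBatchOnFunctionError: boolean;
  MaximumRetryAttempts: number; // -1 means retry until the record expires
  MaximumRecordAgeInSeconds: number; // -1 means never discard by age
  DestinationConfig?: { OnFailure: { Destination: string } };
}

// What you get if you set none of these.
const LAMBDA_DEFAULTS: StreamFailureSettings = {
  BisectBatchOnFunctionError: false,
  MaximumRetryAttempts: -1,
  MaximumRecordAgeInSeconds: -1,
};

// A bounded configuration; the numbers are illustrative assumptions.
function boundedFailureSettings(onFailureArn: string): StreamFailureSettings {
  return {
    BisectBatchOnFunctionError: true, // isolate the bad record by splitting
    MaximumRetryAttempts: 3, // hypothetical retry budget
    MaximumRecordAgeInSeconds: 3_600, // hypothetical: give up after one hour
    DestinationConfig: { OnFailure: { Destination: onFailureArn } },
  };
}

// With both limits unbounded, one poison batch blocks its shard until retention expiry.
function stallsUntilRetentionExpiry(s: StreamFailureSettings): boolean {
  return s.MaximumRetryAttempts === -1 && s.MaximumRecordAgeInSeconds === -1;
}

const bounded = boundedFailureSettings(
  "arn:aws:sqs:us-east-1:123456789012:order-events-failures", // hypothetical queue
);
console.log(stallsUntilRetentionExpiry(LAMBDA_DEFAULTS), stallsUntilRetentionExpiry(bounded));
```

A check like `stallsUntilRetentionExpiry` fits naturally in an infrastructure lint step, so an unbounded mapping never reaches production unnoticed.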

| Model | Per-shard read budget | Latency | When to use |
| --- | --- | --- | --- |
| Shared polling | 2 MB/s split across all consumers | Poll interval | One or two consumers, cost-sensitive |
| KCL 3.x | Shared (or EFO) | Poll or push | Self-run fleet needing checkpointing and rebalancing |
| Enhanced fan-out | 2 MB/s dedicated per consumer | ~70 ms push | Three or more consumers, or latency-sensitive |
| Lambda ESM | Shared or EFO | Poll (~1s) or push | Serverless processing without running workers |
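The shared-versus-dedicated distinction is easiest to see as arithmetic. The sketch below divides the per-shard read budget across shared consumers and applies the rule of thumb from this section; the helper names are hypothetical, and the even split is a simplification, since real consumers compete for the budget rather than receiving fixed shares.

```typescript
const SHARD_READ_MBPS = 2;
const SHARD_READ_CALLS_PER_SEC = 5;

// Average per-consumer share of one shard when everyone uses shared polling.
function sharedBudgetPerConsumer(consumers: number): {
  mbPerSec: number;
  getRecordsCallsPerSec: number;
} {
  if (!Number.isInteger(consumers) || consumers < 1) {
    throw new RangeError("consumers must be a positive integer");
  }
  return {
    mbPerSec: SHARD_READ_MBPS / consumers,
    getRecordsCallsPerSec: SHARD_READ_CALLS_PER_SEC / consumers,
  };
}

// Rule of thumb from the text: EFO for three or more consumers, or when latency matters.
function suggestReadModel(
  consumers: number,
  latencySensitive: boolean,
): "shared polling" | "enhanced fan-out" {
  return latencySensitive || consumers >= 3 ? "enhanced fan-out" : "shared polling";
}

console.log(sharedBudgetPerConsumer(4), suggestReadModel(4, false));
```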

A minimal producer and a stream check

With the AWS SDK for JavaScript v3 running server-side, a Lambda or service producer looks like this. Note that the partition key is what decides the shard.

import {
  KinesisClient,
  PutRecordCommand,
} from "@aws-sdk/client-kinesis";

const kinesis = new KinesisClient({ region: "us-east-1" });

export async function publishOrderEvent(order: {
  id: string;
  tenantId: string;
}) {
  await kinesis.send(
    new PutRecordCommand({
      StreamName: "order-events",
      // High-cardinality key spreads load across shards.
      // Using tenantId alone risks a hot shard for a large tenant.
      PartitionKey: `${order.tenantId}#${order.id}`,
      Data: new TextEncoder().encode(JSON.stringify(order)),
    }),
  );
}

To register an EFO consumer once at deploy time, the CLI is the simplest path:

aws kinesis register-stream-consumer \
  --stream-arn arn:aws:kinesis:us-east-1:123456789012:stream/order-events \
  --consumer-name analytics-efo
{
    "Consumer": {
        "ConsumerName": "analytics-efo",
        "ConsumerARN": "arn:aws:kinesis:us-east-1:123456789012:stream/order-events/consumer/analytics-efo:1719532800",
        "ConsumerStatus": "CREATING",
        "ConsumerCreationTimestamp": "2026-06-28T00:00:00+00:00"
    }
}

When you want a quick read of a stream’s capacity mode and shard count without parsing the full shard list, describe-stream-summary is the right call:

aws kinesis describe-stream-summary --stream-name order-events
{
    "StreamDescriptionSummary": {
        "StreamName": "order-events",
        "StreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/order-events",
        "StreamStatus": "ACTIVE",
        "StreamModeDetails": {
            "StreamMode": "ON_DEMAND"
        },
        "RetentionPeriodHours": 24,
        "OpenShardCount": 4,
        "ConsumerCount": 1,
        "EnhancedMonitoring": [
            {
                "ShardLevelMetrics": []
            }
        ]
    }
}

That empty ShardLevelMetrics is a reminder: per-shard metrics, the ones that actually reveal a hot shard, are opt-in via EnableEnhancedMonitoring and bill as additional CloudWatch custom metrics. Stream-level metrics like IncomingBytes, IncomingRecords, and GetRecords.IteratorAgeMilliseconds are emitted by default at one-minute granularity, but they hide skew. If you suspect a hot shard, the per-shard view is where you confirm it.

Common use cases

Strip away the domain labels and the canonical Data Streams use cases all buy the same three properties from the mechanics above: sub-second intake at scale, per-shard ordering, and replayable fan-out to independent consumers. AWS’s own list (log intake, real-time metrics, real-time analytics, and complex multi-stage processing) is those properties wearing different clothes, with many producers in, one ordered durable log, and several independent readers out.

| Use case | What it leans on | Where it sits in the family |
| --- | --- | --- |
| Log and event intake | sub-second intake with no producer-side batching, so data survives a front-end crash | Data Streams in, Firehose to land raw logs in S3 |
| Real-time metrics and dashboards | ordering plus a put-to-get delay under a second | Data Streams to Lambda or Managed Service for Apache Flink |
| Clickstream and product analytics | replay plus parallel, independent multi-consumer reads | Data Streams to several apps at once over enhanced fan-out |
| IoT and telemetry ingestion | per-device ordering via the partition key, at high throughput | Data Streams keyed by device id |
| Streaming ETL and aggregation | stateful windowed processing, then landing curated data | Data Streams to Flink to Firehose to a warehouse |
| Multi-stage (DAG) processing | one stage’s output feeds the next stage’s stream | Data Streams chained into Data Streams |

Notice how often the shape repeats: Data Streams for the ordered, replayable log, then Firehose or Flink for the work on top. That is the four services earning their separate names one more time, now from the use-case side rather than the definition side.

Putting Data Streams to work: CDC, routing, and idempotency

The mechanics earn their keep in three patterns that show up in almost every real Data Streams deployment: capturing database changes as a stream, routing records by content, and consuming them idempotently because delivery is at least once, not exactly once.

[Diagram: DynamoDB (native CDC) and a relational DB (via AWS DMS) feed Kinesis Data Streams (ordered, replayable); the downstream nodes are EventBridge Pipes (filter + enrich), a materialized view (idempotent write), a search index, and Firehose to S3 (audit + backfill)]

Change data capture into a stream

Change data capture turns every insert, update, and delete in a database into an ordered event, and Data Streams is a natural home for them: ordered per shard, replayable, and readable by many consumers at once. Two on-ramps cover most cases, and each has a record shape and a few knobs worth knowing before you wire it up.

DynamoDB, natively. Turn on Kinesis Data Streams for DynamoDB and the table emits, for every item-level change, a record carrying the change time, the item’s primary key, and both a before and an after image of the item. It captures only from the moment you enable it, so seed history with an export or scan before cutover; there is no backfill. The reach is what separates it from plain DynamoDB Streams: DynamoDB Streams keeps 24 hours and has its own KCL adapter, while the Kinesis path gives retention up to 365 days, enhanced fan-out, and Firehose or Managed Service for Apache Flink as consumers. Two behaviours the docs call out directly shape your consumer: records can arrive out of order, and the same change can appear more than once. You reconcile both with the ApproximateCreationDateTime stamp on each record, which you can set to millisecond or microsecond precision. One more sharp edge: binary attributes are base64-encoded a second time on the way in, so consumers decode twice.

Relational databases, through AWS DMS. A DMS task targets a Data Stream and writes each row change as a JSON record, or as JSON_UNFORMATTED (a single line) when you want Firehose to land it in S3 for Athena. You shape the record and pick the partition key with object mapping. The endpoint settings decide how much context rides along: IncludeTransactionDetails adds the commit timestamp and transaction_id, IncludeTableAlterOperations carries DDL such as add-column, and the before-image settings attach the row’s prior values so a consumer can diff old against new. The throttling trap is concrete: if you key by primary key and thousands of narrow-keyed tables share a range, the same key from every table piles onto one shard. PartitionIncludeSchemaTable prefixes the schema and table name to the key to spread that load. For throughput, DMS applies CDC with up to 32 ParallelApplyThreads; when you turn that on, the partition key defaults to the table’s primary key, so each row still serializes onto one shard while different rows go in parallel.

The rule underneath both is the same: per-entity order holds only within a shard, so the partition key has to be the entity’s primary key. Key changes by customerId and one customer’s history stays in order on one shard; key by something coarse and a delete can overtake the insert it was meant to follow. A single hot entity then becomes a hot shard, and you cannot perfectly order one key and parallelize that same key at once, which is why the version-based idempotency below makes an occasional reorder harmless.

Routing: Kinesis partitions, EventBridge routes

Data Streams does not route by content. Its only routing is the partition-key hash that picks a shard, and every consumer reads every record on the shards it owns and decides for itself what to ignore. The moment the requirement becomes “orders over USD 1,000 go to this handler, refunds go to that one”, that is content routing, and it belongs to EventBridge rather than Kinesis.

EventBridge Pipes is the connective tissue, and it is more configurable than it first looks. A pipe maps to a Data Stream as either a shared-throughput consumer or a dedicated enhanced-fan-out one, polls each shard about once a second, and buffers up to a five-minute window or 6 MB before it acts. It can run up to ten batches per shard in parallel and still preserve order at the partition-key level, and it reports partial batch failures so one poison record does not force the whole batch to retry, which is the shard-blocking problem from the consumer section solved declaratively. Each record reaches the pipe as an envelope (partitionKey, sequenceNumber, base64 data, approximateArrivalTimestamp); the filter stage matches an EventBridge event pattern against those fields and drops non-matching records before they cost any enrichment or target compute. Because data is base64, deep content filtering usually rides on the enrichment step, a Lambda or Step Functions workflow or API call capped at a 6 MB response, rather than the raw pattern.

A pipe goes to exactly one target. To fan a stream out to many content-routed destinations, the documented shape is Kinesis to a pipe to an EventBridge event bus as the target, then bus rules route to as many targets as you need. That is the clean division of labour: the stream is the durable ordered log you replay, the bus is the content router and fan-out with no replay, and the pipe is the one-way bridge between them.

| Tool | Routing model | Replay | Billed |
| --- | --- | --- | --- |
| Kinesis Data Streams | partition-key hash to a shard, not by content | yes, within retention | shard-hour or per-GB |
| EventBridge bus | content rules to many targets, fan-out | no | per event |
| EventBridge Pipes | one source, filter and enrich, one target, order-preserving | no, reads from the source | per request |

Idempotency: at-least-once means design for duplicates

Data Streams delivers at least once, so duplicates are part of the contract, not an edge case, and as the CDC on-ramps showed, the source can reorder and repeat on its own. Four things produce them: a producer retries a PutRecord after a timeout and writes twice; a consumer crashes after processing a batch but before checkpointing and reprocesses on restart; a reshard replays records around the boundary; and DynamoDB’s own change stream is documented to reorder and duplicate. Plan for all four.

The defense is a dedup gate on a stable identity. Every record carries a sequenceNumber, unique per partition key within its shard, so pair it with the shard id for a stream-wide key (the pipe envelope carries that pair as eventID, alongside the fields listed earlier), or use a business idempotency key from the payload. Either one keys a conditional write in which applying the effect and recording the key happen atomically, so a replay becomes a no-op. The non-obvious part is the window: a stream can be replayed across its whole retention, up to 365 days, so a dedup table with a one-hour TTL silently stops protecting you the moment a backfill replays older history. Size the TTL to the longest replay you will actually run, and treat the table as the real, billable dependency it is.
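The TTL pitfall is easier to see in a toy model. The sketch below uses an in-memory map as a stand-in for a durable dedup table, which a real consumer would replace with an atomic conditional write. `DedupGate` and `dedupKey` are hypothetical names, and the shard id and sequence number are made-up values.

```typescript
interface StreamRecordId {
  shardId: string;
  sequenceNumber: string;
}

// Stream-wide identity: shard id plus sequence number.
function dedupKey(id: StreamRecordId): string {
  return `${id.shardId}:${id.sequenceNumber}`;
}

// In-memory stand-in for a dedup table whose entries expire after a TTL.
class DedupGate {
  private readonly seen = new Map<string, number>(); // key -> expiry (epoch ms)
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // True the first time a key is seen within the TTL, false on a replay.
  firstSeen(key: string, nowMs: number): boolean {
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > nowMs) return false;
    this.seen.set(key, nowMs + this.ttlMs);
    return true;
  }
}

const ONE_HOUR_MS = 3_600_000;
const gate = new DedupGate(ONE_HOUR_MS);
const key = dedupKey({ shardId: "shardId-000000000000", sequenceNumber: "4959033827" });

const first = gate.firstSeen(key, 0); // original delivery
const retry = gate.firstSeen(key, 1_000); // retry one second later
const backfill = gate.firstSeen(key, 2 * ONE_HOUR_MS); // replay after the TTL lapsed
console.log(first, retry, backfill);
```

The retry is caught, but the backfill two hours later passes the gate as if it were new: the one-hour TTL has already forgotten the record.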

For CDC, go a step further and make the write idempotent on a version, not just a key, because once records reorder, “have I seen this?” is the wrong question and “is this newer than what I have?” is the right one. Use DynamoDB’s ApproximateCreationDateTime, or the commit timestamp from DMS, as the version and apply a change only when it is newer than the stored one. A re-delivered or out-of-order older change then lands as a no-op instead of resurrecting stale data.

import {
  DynamoDBClient,
  UpdateItemCommand,
  ConditionalCheckFailedException,
} from "@aws-sdk/client-dynamodb";

const db = new DynamoDBClient({ region: "us-east-1" });

// Apply a CDC change only if it is newer than what we already stored.
// A re-delivered or out-of-order record fails the condition and is skipped.
async function applyChange(change: {
  entityId: string;
  version: number;
  state: string;
}) {
  try {
    await db.send(
      new UpdateItemCommand({
        TableName: "projection",
        Key: { id: { S: change.entityId } },
        UpdateExpression: "SET #s = :state, version = :v",
        ConditionExpression: "attribute_not_exists(version) OR version < :v",
        ExpressionAttributeNames: { "#s": "state" },
        ExpressionAttributeValues: {
          ":state": { S: change.state },
          ":v": { N: String(change.version) },
        },
      }),
    );
  } catch (err) {
    if (err instanceof ConditionalCheckFailedException) return; // duplicate or stale
    throw err;
  }
}

Two details bite in practice. If your producers use the Kinesis Producer Library, it packs many user records into one Kinesis record and the KCL unpacks them, so deduplicate on your own payload key rather than the Kinesis record. And if hand-rolling all of this sounds heavy, that is the case for Managed Service for Apache Flink: its checkpointing gives exactly-once processing over its own state, trading the dedup store you operate for a Flink application you operate.

The same idea covers the general case; for a fuller treatment of keys, windows, and storage, see the idempotency guide.

The cost model

Pricing changes, so treat the model as the durable part and every dollar figure as a dated anchor. All figures below are for US East (N. Virginia), from the AWS pricing page, retrieved 2026-06-28; confirm they are still current before relying on them.

Provisioned mode bills these line items:

  • Shard hour: USD 0.015 per shard-hour (one shard is 1 MB/s write, 2 MB/s read).
  • PUT payload units: USD 0.014 per million units, where one unit is a 25 KB chunk of a record, rounded up. A 5 KB record is 1 unit; a 30 KB record is 2 units; a 1 MB record is 40 units. This is the line that bites high-rate, small-record workloads.
  • Extended retention (24h to 7d): an extra USD 0.020 per shard-hour.
  • Long-term retention (over 7d to 365d): storage at USD 0.023 per GB-month plus retrieval at USD 0.021 per GB through GetRecords.
  • Enhanced fan-out: USD 0.015 per consumer-shard-hour plus USD 0.013 per GB retrieved, additive for each registered consumer.
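The PUT payload unit rounding is the least intuitive line item, so here it is as arithmetic. This is a minimal sketch using the dated price above; sizes are taken in KB with 1 MB treated as 1,000 KB to match the examples in the list, and the function names are hypothetical.

```typescript
const PAYLOAD_UNIT_KB = 25;
const USD_PER_MILLION_UNITS = 0.014; // dated figure from the list above

// Each record is billed in 25 KB chunks, rounded up, with a minimum of one unit.
function putPayloadUnits(recordKB: number): number {
  return Math.max(1, Math.ceil(recordKB / PAYLOAD_UNIT_KB));
}

// Monthly PUT charge for a steady record rate; 730 hours approximates a month.
function monthlyPutCostUSD(
  recordsPerSec: number,
  recordKB: number,
  hours: number = 730,
): number {
  const units = recordsPerSec * putPayloadUnits(recordKB) * 3_600 * hours;
  return (units / 1_000_000) * USD_PER_MILLION_UNITS;
}

console.log(putPayloadUnits(5), putPayloadUnits(30), putPayloadUnits(1_000));
console.log(monthlyPutCostUSD(1_000, 3).toFixed(2));
```

Because a 1 KB record and a 25 KB record both cost one unit, aggregating small records on the producer side directly reduces this line.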

On-demand Standard bills differently: USD 0.040 per stream-hour, plus USD 0.08 per GB ingested (with 1 KB rounding), plus USD 0.040 per GB retrieved, with EFO billed per GB on top. There is no shard to count and no shard-hour.

On-demand Advantage (2025-11-20) bills USD 0.032 per GB ingested and USD 0.016 per GB retrieved, roughly 60% under Standard, with no per-stream hourly charge and EFO included in the data-out rate. The catch is an account-level minimum commitment of 25 MB/s ingest plus 25 MB/s retrieval. That minimum is exactly why Advantage is a high-steady-throughput play, not a default for small streams.

A worked example

Take a workload of about 1,000 records/s of roughly 3 KB each, which is about 3 MB/s of ingest. The arithmetic below is the method; re-run it at current prices for your own numbers.

In provisioned mode, 3 MB/s of throughput needs ceil(3 / 1) = 3 shards (the record-rate cap of 1,000/s/shard is not the binding constraint here, the byte rate is). Add two EFO consumers.

| Line item | Calculation | Monthly |
| --- | --- | --- |
| Shard hours | 3 shards x USD 0.015 x 730 h | ~USD 32.85 |
| PUT payload units | ~2.63 billion units x USD 0.014/million | ~USD 36.80 |
| EFO consumer-shard hours | 2 x 3 shards x USD 0.015 x 730 h | ~USD 65.70 |
| EFO + retrieval | per-GB retrieval on top | adds with volume |

The same 3 MB/s under On-demand Standard ingests about 7.78 TB over a 30-day month, which at USD 0.08 per GB is roughly USD 622 in ingest charges alone, before retrieval. That gap is the whole point: provisioned wins at steady, predictable load; on-demand wins at spiky or unknown load. The crossover is real and it is large, so the capacity-mode choice is a cost decision, not just an operational one.
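To re-run the comparison at current prices, the arithmetic fits in a few lines. The sketch below reproduces the fixed provisioned line items and the on-demand ingest charge using the dated figures from this section. It deliberately leaves out per-GB retrieval and the on-demand stream-hour charge, and `Workload`, `provisionedMonthlyUSD`, and `onDemandIngestMonthlyUSD` are hypothetical names.

```typescript
// Dated US East prices quoted in this section; replace with current values.
const PRICES = {
  shardHour: 0.015,
  putPerMillionUnits: 0.014,
  efoConsumerShardHour: 0.015,
  onDemandIngestPerGB: 0.08,
};

interface Workload {
  recordsPerSec: number;
  recordKB: number;
  efoConsumers: number;
}

// Fixed provisioned line items: shard hours, PUT payload units, EFO shard hours.
function provisionedMonthlyUSD(w: Workload, hours: number) {
  const mbPerSec = (w.recordsPerSec * w.recordKB) / 1_000;
  const shards = Math.max(1, Math.ceil(mbPerSec), Math.ceil(w.recordsPerSec / 1_000));
  const unitsPerRecord = Math.max(1, Math.ceil(w.recordKB / 25));
  const shardHours = shards * PRICES.shardHour * hours;
  const putUnits =
    ((w.recordsPerSec * unitsPerRecord * 3_600 * hours) / 1_000_000) *
    PRICES.putPerMillionUnits;
  const efoShardHours = w.efoConsumers * shards * PRICES.efoConsumerShardHour * hours;
  return { shards, shardHours, putUnits, efoShardHours, total: shardHours + putUnits + efoShardHours };
}

// On-demand Standard ingest charge only (no retrieval, no stream-hour).
function onDemandIngestMonthlyUSD(w: Workload, hours: number): number {
  const gbIngested = (w.recordsPerSec * w.recordKB * 3_600 * hours) / 1_000_000;
  return gbIngested * PRICES.onDemandIngestPerGB;
}

const workload: Workload = { recordsPerSec: 1_000, recordKB: 3, efoConsumers: 2 };
const provisioned = provisionedMonthlyUSD(workload, 730); // table uses 730 h
const onDemandIngest = onDemandIngestMonthlyUSD(workload, 720); // 30-day month
console.log(provisioned.total.toFixed(2), onDemandIngest.toFixed(2));
```

The two calls use different hour counts only to match the figures in the text: the table uses 730 hours, while the 7.78 TB ingest figure corresponds to a 30-day month. Pick one convention when running your own numbers.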

Across services the shape of the bill differs more than the exact numbers. Data Streams charges for provisioned throughput (shard-hours) or per GB (on-demand). SQS charges per request (Standard is roughly USD 0.40 per million requests in us-east-1 at this date; verify) and scales to zero, but offers no ordering outside FIFO, no replay, and no one-to-many fan-out from a single queue. MSK charges for a cluster (broker-hours plus storage) that bills even when idle, in exchange for Kafka semantics and a higher floor. Pick by workload shape, and the bill follows the shape.

When Data Streams is the right answer, and when it isn’t

Reach for Kinesis Data Streams when you need an ordered, replayable, multi-consumer real-time stream and you want a managed, shard-sized service without operating Kafka. That is its sweet spot, and within the streaming use case it is the sensible default. The moment one of those three properties (ordering, replay, multiple independent consumers) is not actually required, a simpler service usually wins.

[Decision diagram: Need ordering, replay, AND multiple independent consumers?]

  • No, just decouple work: SQS work queue
  • No, route and filter events to AWS targets: EventBridge
  • No, react to table changes: DynamoDB Streams
  • Yes, and need Kafka semantics or portability: Amazon MSK
  • Yes, AWS-native and managed: Kinesis Data Streams, then “What next?”
  • What next? Just land it in S3 or a warehouse: Amazon Data Firehose
  • What next? Run stateful compute over it: Managed Service for Apache Flink
  • What next? Attach your own consumers: your KCL or Lambda consumers

These boundaries are already argued in depth elsewhere on this site, so rather than re-litigate them here: if the question is Kafka versus a bus, see Kafka or event bus: migration signals. For SNS and SQS fan-out patterns, see SNS to SQS cross-account fan-out and Event fan-out to isolated consumer accounts. For the broader landscape of event-system options, see Event-driven systems tools comparison.

Common pitfalls

A few mistakes account for most of the pain teams report with Data Streams.

  • Picking Firehose when you needed Data Streams, or the reverse. Firehose is buffered delivery only; it lands data in a sink and is not a replayable log you attach many independent consumers to. Conversely, hand-rolling S3 delivery on Data Streams when Firehose (which can consume a Data Stream as its source) would have been zero-code.
  • A low-cardinality partition key creating a hot shard. Distribute the key; use an explicit hash key only when co-location is the goal.
  • Adding a third or fourth shared consumer and blaming Kinesis for the latency instead of moving to enhanced fan-out.
  • Non-idempotent consumers under at-least-once delivery, which causes double-processing on retries and around resharding.
  • Forgetting that a failing Lambda batch can block its shard until the records age out; not configuring BisectBatchOnFunctionError, finite retries, or an on-failure destination.
  • Assuming retention is free and unlimited. Extended and long-term retention are billed, and the two tiers price differently.
  • Treating shard egress as per-consumer in shared mode. It is a shared 2 MB/s; only EFO makes it per-consumer.

Closing

Kinesis is four services, and most of the time you mean Data Streams: an ordered, replayable, shard-sized stream you attach your own consumers to. Size it by turning throughput into shards, choose the partition key so no single shard runs hot, pick the capacity mode from your traffic predictability rather than habit, and make every consumer idempotent because delivery is at-least-once. The boundary matters as much as the default. If you do not need ordering, replay, and independent multi-consumer fan-out together, a work queue, an event router, or a table stream is the simpler and often cheaper answer; a Kafka-heavy shop is better served by MSK. The next concrete step before you commit is to write your expected peak throughput and consumer count on paper and check both against the per-shard limits; the sizing falls out of that, and so does the bill.
