
Event Fan-Out to Isolated Consumer Accounts: Zero-Touch Producer, Per-Domain Ownership

A platform-engineering default for multi-team AWS orgs: one event, many consumers, each in its own account with its own SQS and DLQ; fan-out lives in the event bus layer.

A producer service collects new downstream consumers like barnacles. Every new team that wants events adds another if (enabled(consumer)) publish(event) branch, and every new consumer needs a producer deploy. The branches are a junction box pretending to be application code, and the blast radius of any mistake sits in the same account as payments.

This post is for platform and backend engineers who own the event backbone for a multi-team AWS org. It describes a default topology for Audit, Customer Center, Marketing, and any future consumer, and shows when to reach for a different backbone.

Thesis: fan-out lives in the event bus layer, not in application code

One event, many consumers, each in its own account with its own SQS and DLQ; fan-out lives in the event bus layer, not in application code.

Concretely, this means a central EventBridge custom bus in a dedicated events account, cross-account rules from that bus to a receiver bus in every consumer account, and per-consumer SQS plus DLQ plus compute plus store inside each consumer account. The producer service never learns who reads its events. Onboarding a new consumer is a platform pull request, not a producer deploy.

The payoff is organisational. Each domain team owns a consumer end-to-end: budget, compliance posture, on-call, and rollback. The producer team keeps shipping features without integration meetings. The Turkish shorthand for this is "bağımsız yönetilebilir yapı", an independently manageable structure.

Service selection: which bus, which queue

Before wiring the default, name the constraints that would break it. EventBridge custom bus wins by default for its schema registry, native cross-account bus-to-bus rules, per-target dead-letter queues, and filter-at-the-bus semantics. Four constraints knock it off that default: strict per-key ordering, replay of historical events, sustained high throughput, and an existing Kafka ecosystem. Each one maps to a specific service that takes over.

EventBridge custom bus is the router. Filter rules at the bus, schema registry as the producer-consumer contract, cross-account bus-to-bus delivery via an IAM role in the sender (required since March 2023), and one DLQ per target. It stays the default as long as the four constraints above are soft.

Kinesis Data Streams is the stream. Per-shard ordering keyed by partition key, 24 hours to 365 days of replay by sequence number, and enhanced fan-out for low-latency consumers. The cost model is shard-based, not per-event, so it rewards steady high throughput and punishes spiky low volume. Cross-account consumption exists but needs more plumbing than EventBridge bus-to-bus.

Managed Streaming for Kafka (MSK) is Kinesis's peer when the organisation already runs Kafka tooling. Debezium sinks natively, Schema Registry is Kafka-native, and existing producers and consumers stay source-compatible. Operational surface area is larger than Kinesis; pick it for ecosystem fit, not feature parity.

SNS plus SQS fan-out is the legacy pub/sub pattern and still the correct answer when the targets are SMS, email, or raw HTTPS webhooks. EventBridge does not deliver to a phone number. Keep SNS plus SQS for that slice only; do not use it as the backbone for AWS-native consumers when EventBridge is on the table.

DynamoDB Streams is a source, not a bus. It pairs with EventBridge Pipes (as in the default snippet below), Lambda, or Kinesis. When the producer's substrate is DynamoDB, this is how zero-touch capture is achieved.

Kinesis Data Firehose is a delivery sink, not a bus. It shines as the Audit consumer's substrate: batches events into S3 Parquet, compresses, partitions by event time, and hands off to Glue. A bus still sits in front of it.

Step Functions is orchestration, not a bus. Use it inside a consumer when the consumer is a multi-step workflow, for example a Customer Center cold-restore that triggers an S3 Restore, polls until the object is warm, and then notifies the agent. The bus delivers the trigger; Step Functions owns the state machine.
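That cold-restore shape can be sketched with Step Functions' CDK bindings. Everything concrete below is an assumption for illustration: the bucket name, the 30-minute polling interval, the Standard restore tier, and the input field `objectKey`.

```typescript
import { Stack, StackProps, Duration } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as sfn from "aws-cdk-lib/aws-stepfunctions";
import * as tasks from "aws-cdk-lib/aws-stepfunctions-tasks";

export class ColdRestoreStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Kick off the S3 restore; bucket name and tier are illustrative.
    const startRestore = new tasks.CallAwsService(this, "StartRestore", {
      service: "s3",
      action: "restoreObject",
      parameters: {
        Bucket: "customer-center-cold",
        "Key.$": "$.objectKey",
        RestoreRequest: { Days: 2, GlacierJobParameters: { Tier: "Standard" } },
      },
      iamResources: ["arn:aws:s3:::customer-center-cold/*"],
      resultPath: sfn.JsonPath.DISCARD,
    });

    // Poll HeadObject until the Restore header reports completion.
    const wait = new sfn.Wait(this, "WaitForRestore", {
      time: sfn.WaitTime.duration(Duration.minutes(30)),
    });
    const checkStatus = new tasks.CallAwsService(this, "CheckStatus", {
      service: "s3",
      action: "headObject",
      parameters: { Bucket: "customer-center-cold", "Key.$": "$.objectKey" },
      iamResources: ["arn:aws:s3:::customer-center-cold/*"],
      resultPath: "$.head",
    });

    const done = new sfn.Succeed(this, "NotifyAgent");
    const isWarm = new sfn.Choice(this, "IsWarm")
      .when(
        sfn.Condition.stringMatches("$.head.Restore", '*ongoing-request="false"*'),
        done,
      )
      .otherwise(wait);

    new sfn.StateMachine(this, "ColdRestore", {
      definition: startRestore.next(wait).next(checkStatus).next(isWarm),
    });
  }
}
```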

The decision is not religious. EventBridge's weaknesses are Kinesis and MSK's strengths; the question is how many of the four constraints are real today, not in a projected scale plan a year out.

The default: DynamoDB Streams and EventBridge Pipes on the producer side

Start with the mechanism that requires zero producer code. If the producer's substrate is DynamoDB, a DDB stream plus EventBridge Pipes is the shortest path from database write to custom bus.

```typescript
// aws-cdk-lib v2
import { Stack, StackProps, RemovalPolicy } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
import * as events from "aws-cdk-lib/aws-events";
import * as pipes from "aws-cdk-lib/aws-pipes";
import * as iam from "aws-cdk-lib/aws-iam";

export class ProducerSideStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const orders = new dynamodb.Table(this, "Orders", {
      partitionKey: { name: "pk", type: dynamodb.AttributeType.STRING },
      stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES,
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: RemovalPolicy.RETAIN,
    });

    const bus = new events.EventBus(this, "CentralBus", {
      eventBusName: "platform-events",
    });

    const pipeRole = new iam.Role(this, "PipeRole", {
      assumedBy: new iam.ServicePrincipal("pipes.amazonaws.com"),
    });
    orders.grantStreamRead(pipeRole);
    bus.grantPutEventsTo(pipeRole);

    new pipes.CfnPipe(this, "OrdersPipe", {
      roleArn: pipeRole.roleArn,
      source: orders.tableStreamArn!,
      target: bus.eventBusArn,
      sourceParameters: {
        dynamoDbStreamParameters: {
          startingPosition: "LATEST",
          batchSize: 10,
        },
        filterCriteria: {
          filters: [
            { pattern: JSON.stringify({ eventName: ["INSERT", "MODIFY"] }) },
          ],
        },
      },
      targetParameters: {
        eventBridgeEventBusParameters: {
          source: "com.example.orders",
          detailType: "order.changed",
        },
      },
    });
  }
}
```

This shape is zero-touch: the producer's application code writes to DynamoDB as it already does, and the pipe turns each change into a bus event. Ordering is preserved per partition key through the pipe. It is lost downstream when the bus fans out, which is fine for most domains that dedupe by event id. If the producer is on Postgres or MySQL instead, the outbox plus CDC shape gives stronger transactional guarantees at the cost of one extra moving part. Options include Debezium tailing the WAL, or DMS serverless to Kinesis with a small Lambda relay to PutEvents. Pick one default per substrate and make it a paved road.
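For the Kinesis-relay variant, the Lambda's core job is reshaping CDC records into PutEvents entries, which is a pure transformation. A minimal sketch under assumed names: the bus name, source, and detail-type values are placeholders, and the actual PutEventsCommand call is left as a comment.

```typescript
// Shape of a Kinesis record as the Lambda event delivers it (subset).
interface KinesisRecord {
  kinesis: { data: string }; // base64-encoded CDC payload
}

// Map a batch of CDC records to EventBridge PutEvents entries.
// Bus name, source, and detail-type here are illustrative placeholders.
export function toPutEventsEntries(records: KinesisRecord[]) {
  return records.map((r) => {
    const change = JSON.parse(
      Buffer.from(r.kinesis.data, "base64").toString("utf8"),
    );
    return {
      EventBusName: "platform-events",
      Source: "com.example.orders",
      DetailType: "order.changed",
      Detail: JSON.stringify(change),
    };
  });
}

// In the handler, send in chunks of at most 10 entries per call,
// the PutEvents API limit:
//   await client.send(new PutEventsCommand({ Entries: chunk }));
```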

Fan-out mechanics: one producer bus, many consumer buses

Cross-account delivery in EventBridge is bus-to-bus. The sender account has one rule per consumer; the target is the consumer account's event bus ARN. Since March 2023, every cross-account event-bus target must be invoked via an IAM role in the sender account with events:PutEvents on the target. The receiver side adds a resource policy on its own bus that allows PutEvents from the sender's bus ARN, or from the whole AWS Organization.
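The organization-wide variant of that receiver resource policy can be sketched as a small CDK helper; the org id passed in is a placeholder, and the sid is required by EventBridge bus policies.

```typescript
import * as events from "aws-cdk-lib/aws-events";
import * as iam from "aws-cdk-lib/aws-iam";

// Attach a resource policy that lets any account in the AWS Organization
// put events onto this receiver bus; callers supply the real org id.
export function allowOrgPutEvents(bus: events.EventBus, orgId: string): void {
  bus.addToResourcePolicy(new iam.PolicyStatement({
    sid: "AllowOrgPutEvents", // EventBridge bus policies require a sid
    principals: [new iam.OrganizationPrincipal(orgId)],
    actions: ["events:PutEvents"],
    resources: [bus.eventBusArn],
  }));
}
```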

The CDK for the sender-side rule with a cross-account target looks like this.

```typescript
import { Stack, StackProps, Duration } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";
import * as iam from "aws-cdk-lib/aws-iam";
import * as sqs from "aws-cdk-lib/aws-sqs";

interface Props extends StackProps {
  centralBusArn: string;
  auditBusArn: string; // arn:aws:events:eu-central-1:<audit-acct>:event-bus/audit-in
}

export class SenderRulesStack extends Stack {
  constructor(scope: Construct, id: string, props: Props) {
    super(scope, id, props);

    const bus = events.EventBus.fromEventBusArn(this, "Bus", props.centralBusArn);

    // The role EventBridge assumes to put events onto the consumer bus.
    const crossAccountRole = new iam.Role(this, "AuditTargetRole", {
      assumedBy: new iam.ServicePrincipal("events.amazonaws.com"),
    });
    crossAccountRole.addToPolicy(new iam.PolicyStatement({
      actions: ["events:PutEvents"],
      resources: [props.auditBusArn],
    }));

    // Rule-level DLQ: catches deliveries the receiver bus rejects.
    const ruleDlq = new sqs.Queue(this, "AuditRuleDlq", {
      retentionPeriod: Duration.days(14),
    });

    new events.Rule(this, "AuditRule", {
      eventBus: bus,
      eventPattern: {
        source: ["com.example.orders", "com.example.payments"],
        detailType: ["order.changed", "payment.settled"],
      },
      targets: [
        new targets.EventBus(
          events.EventBus.fromEventBusArn(this, "AuditBus", props.auditBusArn),
          {
            role: crossAccountRole,
            deadLetterQueue: ruleDlq,
          },
        ),
      ],
    });
  }
}
```

Two details bite if missed. First, the rule itself needs a dead-letter queue, not only the downstream SQS in the consumer account. When the receiver bus resource policy is wrong, the sender rule reports success but the delivery is dropped, and only the rule-level DLQ catches it. Second, if the receiver SQS uses a customer-managed KMS key, that key's policy must name the sender account's EventBridge service as a principal for kms:GenerateDataKey and kms:Decrypt. An AWS-managed key silently rejects cross-account writes.
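The KMS fix can be sketched as a key-policy statement in CDK. Scoping by aws:SourceArn to the sender rule's ARN is an assumption worth tightening per setup, not a requirement of the pattern.

```typescript
import * as kms from "aws-cdk-lib/aws-kms";
import * as iam from "aws-cdk-lib/aws-iam";

// In the receiver account: let EventBridge use the customer-managed key
// that encrypts the consumer queue.
export function allowEventBridgeToUseKey(key: kms.Key, senderRuleArn: string): void {
  key.addToResourcePolicy(new iam.PolicyStatement({
    principals: [new iam.ServicePrincipal("events.amazonaws.com")],
    actions: ["kms:GenerateDataKey", "kms:Decrypt"],
    resources: ["*"], // in a key policy, "*" means this key
    conditions: {
      // Assumption: restrict usage to the specific sender rule.
      ArnEquals: { "aws:SourceArn": senderRuleArn },
    },
  }));
}
```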

On the receiver side, each consumer account owns the four-layer paved road: receiver bus, consumer-specific rule, SQS plus DLQ, compute plus store. The rule filters only the event types that consumer cares about; everything else is ignored at zero cost.

```typescript
import { Stack, StackProps, Duration } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";
import * as iam from "aws-cdk-lib/aws-iam";
import * as sqs from "aws-cdk-lib/aws-sqs";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";

interface ReceiverProps extends StackProps {
  senderAccountId: string;
}

export class AuditReceiverStack extends Stack {
  constructor(scope: Construct, id: string, props: ReceiverProps) {
    super(scope, id, props);

    const bus = new events.EventBus(this, "AuditBus", {
      eventBusName: "audit-in",
    });
    // Resource policy: allow the sender account to put events on this bus.
    bus.grantPutEventsTo(new iam.AccountPrincipal(props.senderAccountId));

    const dlq = new sqs.Queue(this, "AuditDlq", {
      retentionPeriod: Duration.days(14),
    });
    const queue = new sqs.Queue(this, "AuditQueue", {
      visibilityTimeout: Duration.seconds(60),
      deadLetterQueue: { queue: dlq, maxReceiveCount: 5 },
    });

    new events.Rule(this, "AuditIngest", {
      eventBus: bus,
      eventPattern: { source: ["com.example.orders", "com.example.payments"] },
      targets: [new targets.SqsQueue(queue)],
    });

    new cloudwatch.Alarm(this, "DlqAlarm", {
      metric: dlq.metricApproximateNumberOfMessagesVisible(),
      threshold: 1,
      evaluationPeriods: 1,
      alarmDescription: "Audit DLQ has messages, investigate",
    });
  }
}
```

Onboarding a new consumer is now a three-file change in the platform repo: one sender rule, one IAM role, one receiver stack. The producer does not ship.

Three consumers, three terminal stores

Each consumer account keeps the same four-layer shape. What differs is the store, the retention, and the retrieval SLO. The consumer boundary is where domain-specific concerns live, not the producer.

Audit: S3 plus Object Lock plus Glacier lifecycle

The audit consumer writes every event to S3 with Object Lock in Compliance mode and Versioning enabled. A lifecycle rule transitions objects to Glacier Deep Archive after 90 days. Legal Hold is a separate flag applied at the object level; it has no expiration and survives the retention window, which is what satisfies litigation requirements. The retrieval SLO is hours, not minutes: Standard retrieval from Deep Archive takes roughly 12 hours, Bulk up to 48 hours, and retrievals must be explicitly requested.
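A CDK sketch of that bucket shape follows; the one-year default retention is a placeholder for the real compliance policy, and bucket naming is left to the stack.

```typescript
import { Stack, StackProps, Duration } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as s3 from "aws-cdk-lib/aws-s3";

export class AuditBucketStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    new s3.Bucket(this, "AuditLog", {
      versioned: true, // Object Lock requires versioning
      objectLockEnabled: true,
      // Compliance mode: nobody, including root, can shorten retention.
      // The one-year default is a placeholder, not a recommendation.
      objectLockDefaultRetention: s3.ObjectLockRetention.compliance(Duration.days(365)),
      lifecycleRules: [{
        transitions: [{
          storageClass: s3.StorageClass.DEEP_ARCHIVE,
          transitionAfter: Duration.days(90),
        }],
      }],
    });
  }
}
```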

The KMS key that encrypts the audit bucket lives in the audit account. The producer team cannot decrypt audit records even if they wanted to; the access boundary is enforced by the key policy, not by trust. For queries, Athena over the S3 prefix is adequate for most incident-response workflows; cold data stays cold until someone asks.

Customer Center: DynamoDB hot, S3 plus Athena warm, restore-on-demand cold

The Customer Center consumer projects events into a shape that support agents use in real time. The most recent 90 days sit in DynamoDB keyed by customer id for sub-second lookups; 90 days to 2 years live in S3 partitioned by date, queried with Athena in seconds; anything older is in Glacier and must be explicitly restored with a minutes-to-hours SLO. An optional OpenSearch mirror sits next to DynamoDB when agent search UX needs full-text over the hot tier.

The consumer Lambda handles the projection logic, which is domain-specific and iterates faster than the producer's data model. When an agent opens a customer record older than 2 years, the UI renders a "restore in progress" state and re-queries when the restore completes. The tiering is an implementation detail of this account, invisible to the producer, and changeable without coordination.
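The tier-routing decision behind that UI can be sketched as a pure function using the thresholds above; the tier names are illustrative.

```typescript
type Tier = "hot-dynamodb" | "warm-athena" | "cold-restore";

const DAY_MS = 24 * 60 * 60 * 1000;

// Route a lookup to a storage tier by event age, using this consumer's
// retention design: 90 days hot, 2 years warm, everything else cold.
export function tierFor(eventTime: Date, now: Date = new Date()): Tier {
  const ageDays = (now.getTime() - eventTime.getTime()) / DAY_MS;
  if (ageDays <= 90) return "hot-dynamodb";
  if (ageDays <= 730) return "warm-athena";
  return "cold-restore"; // triggers S3 restore; UI shows "restore in progress"
}
```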

Marketing: warehouse or CDP, PII filter at the consumer boundary

Marketing writes events into Snowflake or Redshift, or into a CDP like Segment, via Firehose. The critical rule here is that the PII filter lives at this consumer boundary, not at the producer. The producer does not know which consumer needs email, which needs anonymised identifiers, and which needs neither. Putting the filter upstream means Marketing files tickets against the producer team every time they want a new field.
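One way to sketch that boundary filter, with illustrative field names and a one-way hash standing in for the raw customer id:

```typescript
import { createHash } from "node:crypto";

// Keep only the fields Marketing is allowed to see, and replace the raw
// customer id with a stable one-way hash. Field names are illustrative.
export function scrubForMarketing(
  event: Record<string, unknown>,
  allowlist: string[],
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const field of allowlist) {
    if (field in event) out[field] = event[field];
  }
  if (typeof event.customerId === "string") {
    out.customerRef = createHash("sha256").update(event.customerId).digest("hex");
  }
  return out;
}
```

Because the allowlist lives in the Marketing account, adding a field is a one-line change there, not a producer ticket.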

Retention is short, typically 90 days on raw event data plus an opt-out lineage table that supports GDPR Article 17 right-to-erasure requests. Cost per million events is lower than Audit and Customer Center because the warehouse is a shared resource and the retention window is narrow. Segment-building is batch or near-real-time, not sub-second.

When to override

The bus choice is covered above; this section is for overrides that stay on EventBridge but relax the multi-account or zero-touch defaults.

The first override case is the small team with one AWS account and two or three in-process consumers. The multi-account shape is worth its overhead once there are at least three domains with distinct compliance postures; below that, SNS to SQS fan-out inside one account keeps the pattern recognisable without paying for cross-account tooling. Promote to the multi-account shape the moment a third consumer appears with a retention or PII policy that does not match the first two.
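The single-account shape can be sketched with SNS filter policies; the consumer names and event types below are illustrative, not prescriptive.

```typescript
import { Stack, StackProps } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as sns from "aws-cdk-lib/aws-sns";
import * as subs from "aws-cdk-lib/aws-sns-subscriptions";
import * as sqs from "aws-cdk-lib/aws-sqs";

export class SingleAccountFanOutStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const topic = new sns.Topic(this, "Events");

    // One queue plus DLQ per in-process consumer, filtered by attribute,
    // so the four-layer shape stays recognisable for later promotion.
    for (const consumer of ["Audit", "Marketing"]) {
      const dlq = new sqs.Queue(this, `${consumer}Dlq`);
      const queue = new sqs.Queue(this, `${consumer}Queue`, {
        deadLetterQueue: { queue: dlq, maxReceiveCount: 5 },
      });
      topic.addSubscription(new subs.SqsSubscription(queue, {
        filterPolicy: {
          detailType: sns.SubscriptionFilter.stringFilter({
            allowlist: ["order.changed", "payment.settled"],
          }),
        },
      }));
    }
  }
}
```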

The second override is the brownfield Postgres service where the DMS or Debezium bill is not approved. A logical replication slot read by a small Lambda that calls PutEvents is cheaper operationally for low-to-mid volumes, at the cost of owning the slot's bookkeeping yourself. This is a fallback, not a default, because slot management errors are silent and corrupt the event stream without raising an alarm.

Common pitfalls

The cross-account IAM role on the sender side is the most common miss. Rule creation succeeds without it, and delivery silently fails; the March 2023 change is still missed in older tutorials. Any cross-account bus-to-bus snippet written before that date is stale.

PII scrubbing at the producer boundary is the second. When the producer owns the scrub, every Marketing request to add or remove a field becomes a producer ticket, and every Audit or Customer Center team has to re-negotiate for the raw fields they actually need. Push the filter into the consumer account where the domain knowledge lives.

Idempotency is the third. EventBridge delivery is at-least-once, and DynamoDB Streams through Pipes can replay on Lambda errors. Every consumer must dedupe by event id, usually by persisting an event_id column with a unique constraint, or by assigning an id at ingestion (crypto.randomUUID() when the source lacks one) and checking a short-TTL cache keyed by that id before the terminal write. A duplicate order.changed that reaches the warehouse twice is a silent double-count in a campaign report; a duplicate in Audit is a WORM-protected duplicate that cannot be deleted.
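The short-TTL cache half of that dedupe can be sketched in a few lines; it is one layer of defence in front of the durable unique constraint, not a replacement for it.

```typescript
// A short-TTL in-process dedupe cache. The injectable clock exists only
// to make the behaviour testable.
export class TtlDedupeCache {
  private seenAt = new Map<string, number>();

  constructor(
    private ttlMs: number,
    private now: () => number = () => Date.now(),
  ) {}

  // Returns true if this event id was already seen within the TTL,
  // and records it otherwise.
  seen(eventId: string): boolean {
    const t = this.now();
    const prev = this.seenAt.get(eventId);
    if (prev !== undefined && t - prev < this.ttlMs) return true;
    this.seenAt.set(eventId, t);
    return false;
  }
}
```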

FIFO SQS "just in case" is the fourth. Because EventBridge fan-out does not preserve ordering across the bus anyway, FIFO buys almost nothing downstream and halves throughput per partition. Keep FIFO for cases where per-key ordering is business-critical and the consumer genuinely reads from a single partition, such as account state machines.

Closing

The recommended default holds when there are at least three domain teams, distinct compliance or retention postures per consumer, and an AWS Organization boundary already in place. Override to Kinesis or MSK when ordering, replay, or throughput crosses the thresholds above, and stay in a single account when the consumer count is small and the retention policies agree. The next step for most teams is to pick one substrate (DynamoDB Streams plus Pipes, or outbox plus CDC) and make it the paved road, then onboard the second consumer without touching the producer.
