wasmCloud + NATS: Why the Event Bus Is Where Lock-In Really Lives

An exploration thesis: vendor lock-in in event-driven systems lives in the bus topology, not the runtime; wasmCloud and NATS turn the bus into a portable primitive worth investigating.

The Problem, Made Concrete

Most conversations about serverless lock-in fixate on the runtime: Lambda versus Cloud Functions, Workers versus Containers. The event bus gets treated as plumbing. In event-driven systems that framing is backwards, because the runtime is replaceable in weeks and the bus topology is replaceable in quarters. This post is a thesis piece for engineers already comfortable with Lambda and EventBridge who want to know whether wasmCloud and NATS are worth a closer look.

The Thesis

In event-driven systems, vendor lock-in does not live at the runtime layer. It lives in the event bus topology. Lambda plus EventBridge make publishing events trivial, but you do not own the bus. The routing rules, the cross-account permissions, the archive and replay, the schema registry, the dead-letter flow, the retention semantics: all of it is AWS-shaped, and all of it travels with you whether you want it to or not. wasmCloud plus NATS turn the bus into a portable primitive: the same subject hierarchy, the same JetStream retention, the same lattice can run in AWS today, on a colocation box tomorrow, and at an edge point-of-presence the quarter after, without re-plumbing the event contracts. That is why this space is worth exploring.

Lambda and EventBridge are not the villains of this story. They are excellent tools, and for teams fully committed to AWS the pair is often the right call. The thesis is narrower: if the bus itself matters to your long-term architecture, owning the bus matters more than owning the runtime.

The Mental Model Shift

The centralized managed bus and the federated lattice are not small variations on the same shape. They make different bets about where the coordination surface lives.

With EventBridge, the bus is a service you consume. Your producer writes to a bus ARN, the routing table lives in IAM and EventBridge rules, and your consumer is a Lambda target. You ship events into a box you do not own and you trust the box to fan them out by rules you express in its dialect. When you move clouds, the producer code may port in a week; the bus topology never ports at all.
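
To make "rules you express in its dialect" concrete: here is roughly what one routing rule looks like through the aws-sdk-go-v2 EventBridge client. The bus name, rule name, and event pattern are hypothetical placeholders, not a reference design.

```go
// Sketch: a routing rule in EventBridge's pattern dialect, created via
// aws-sdk-go-v2. Bus name, rule name, and pattern are placeholders.
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/eventbridge"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	client := eventbridge.NewFromConfig(cfg)

	// The routing decision lives here, expressed in EventBridge's JSON
	// pattern language and bound to an account and region, not to your code.
	_, err = client.PutRule(context.Background(), &eventbridge.PutRuleInput{
		Name:         aws.String("payments-webhook-received"),
		EventBusName: aws.String("payments-bus"),
		EventPattern: aws.String(`{"source":["app.payments"],"detail-type":["webhook.received"]}`),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```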

With wasmCloud on a NATS lattice, the bus is a primitive you run. Producers publish to subjects like payments.webhook.received. Consumers subscribe with queue groups for load balancing or fan-out subscriptions for broadcast. JetStream streams retain what you choose, for as long as you choose, on storage you choose. The lattice is self-forming across VPCs, regions, leaf nodes in an on-prem datacenter, and browser clients if you need that, all addressed by subject rather than endpoint.
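
A minimal sketch of the first two primitives with the nats.go client; the lattice URL, subjects, and queue-group name are illustrative, and JetStream shows up in the fan-out section below.

```go
// A toy producer and two consumer styles on one connection. The URL,
// subjects, and queue-group name are illustrative, not prescriptive.
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://lattice.internal:4222") // hypothetical lattice endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Queue group: the server delivers each message to exactly one
	// member of the group, which is how you load-balance workers.
	nc.QueueSubscribe("payments.webhook.*", "workers", func(m *nats.Msg) {
		log.Printf("one worker handles %s", m.Subject)
	})

	// Plain subscription: every subscriber gets its own copy (broadcast).
	nc.Subscribe("payments.webhook.*", func(m *nats.Msg) {
		log.Printf("broadcast copy of %s", m.Subject)
	})

	// The producer addresses a subject, not an endpoint.
	nc.Publish("payments.webhook.received", []byte(`{"id":"evt_123"}`))

	time.Sleep(time.Second) // give the toy handlers a moment before exit
}
```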

The EventBridge side has a proprietary bus with proprietary rules in the middle. The NATS side has a subject hierarchy in the middle, which is just a naming convention, and the transport that carries it is a set of open-source processes you control.

The practical consequence: on the managed side the bus topology is an AWS artifact and migrating means re-expressing it. On the lattice side the bus topology is a set of subject names and stream configs that travel with your code.

What a Webhook Fan-Out Exploration Would Look Like

The clearest way to test the topology claim is a workload that stresses the bus rather than the scheduler. Webhook fan-out fits: a single incoming event must reach several consumers with different durability and latency needs, which is exactly what a routing layer has to handle well.

The shape I want to try looks like this. A wasmCloud HTTP component receives an inbound webhook from a payment provider. It verifies the signature, extracts the tenant, and publishes to a subject tree of the form payments.{tenant}.webhook.{event-type}. Three kinds of consumers attach to that tree, sketched in code just after this list:

  1. A latency-sensitive consumer that joins a queue group so exactly one worker per message updates the entitlement cache.
  2. A durability-sensitive consumer backed by a JetStream work-queue stream that drives invoicing, which must not lose messages even if the consumer is down for a day.
  3. An analytics consumer on an interest-based stream that lets late-attaching aggregators backfill from the retained log.
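
Here is roughly how those three attachments look through nats.go. One simplification worth flagging: JetStream rejects two streams in the same account whose subjects overlap, so this sketch assumes a single limits-retained stream named WEBHOOKS covers the whole tree, with the invoicing and analytics paths as durable consumers on it. Whether that collapse is acceptable, or whether the tree needs splitting so a dedicated work-queue stream can exist, folds into the subject-design question below. Every name here is hypothetical.

```go
// Sketch only: assumes a stream named WEBHOOKS with limits-based
// retention already holds payments.*.webhook.*; all names hypothetical.
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://lattice.internal:4222")
	if err != nil {
		log.Fatal(err)
	}
	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// 1. Latency-sensitive: core NATS queue group. At-most-once delivery;
	// exactly one member of the group handles each message.
	nc.QueueSubscribe("payments.*.webhook.*", "entitlement-workers", func(m *nats.Msg) {
		log.Printf("refresh entitlement cache for %s", m.Subject)
	})

	// 2. Durability-sensitive: durable pull consumer bound to the stream.
	// The stream retains messages independently of this consumer, and the
	// durable tracks its ack position, so a day-long outage resumes cleanly.
	pull, err := js.PullSubscribe("payments.*.webhook.*", "invoicer", nats.BindStream("WEBHOOKS"))
	if err != nil {
		log.Fatal(err)
	}
	go func() {
		for {
			msgs, err := pull.Fetch(10, nats.MaxWait(5*time.Second))
			if err != nil {
				continue // fetch timeouts are normal when the queue is empty
			}
			for _, m := range msgs {
				log.Printf("drive invoicing from %s", m.Subject)
				m.Ack()
			}
		}
	}()

	// 3. Analytics: durable consumer that starts at the beginning of the
	// retained log, so a late-attaching aggregator can backfill history.
	js.Subscribe("payments.*.webhook.*", func(m *nats.Msg) {
		log.Printf("aggregate %s", m.Subject)
		m.Ack()
	}, nats.BindStream("WEBHOOKS"), nats.Durable("aggregator"), nats.DeliverAll())

	select {} // block; a real component wires this into its lifecycle
}
```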

The specific questions I want answered by the exploration, not asserted ahead of it (a config sketch follows the list):

  • Subject design. Does a tenant-scoped prefix like payments.{tenant}.* give enough isolation, or do per-tenant streams scale better when the tenant count climbs? The retention policy choice (LimitsPolicy vs WorkQueuePolicy vs InterestPolicy) depends on the answer.
  • JetStream retention. For the invoicing path, is a WorkQueuePolicy stream with a max-age of 7 days the right shape, or does the business actually want LimitsPolicy with replay? These are different failure modes under a down-consumer scenario.
  • Leaf-node shape. If a region needs local consumption, does a leaf node in that region with a filtered JetStream mirror give the same semantics as the managed side, or are there timing gaps that only show up under load?
  • Cross-region lattice. Can a single logical lattice span two cloud regions and an on-prem leaf without the producer code knowing the difference? The lattice docs say yes; whether it holds under a partition is what a POC has to answer.
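
The retention question is ultimately a choice between two StreamConfig shapes, so it is worth writing both down before measuring anything. In this nats.go sketch every name, the limits, and the 7-day max-age are placeholders to be tested, not recommendations; the mirror at the end is the leaf-node question expressed as config.

```go
// Sketch of the configs under comparison; values are placeholders.
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://lattice.internal:4222")
	if err != nil {
		log.Fatal(err)
	}
	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Option A: work-queue retention. A message is deleted as soon as a
	// consumer acks it; a consumer down past MaxAge loses data outright.
	workQueue := nats.StreamConfig{
		Name:      "WEBHOOKS",
		Subjects:  []string{"payments.*.webhook.*"},
		Retention: nats.WorkQueuePolicy,
		MaxAge:    7 * 24 * time.Hour,
		Storage:   nats.FileStorage,
	}

	// Option B: limits-based retention. The log is kept until age or size
	// limits are hit, regardless of consumption, which is what makes
	// replay and late backfill possible.
	limits := nats.StreamConfig{
		Name:      "WEBHOOKS",
		Subjects:  []string{"payments.*.webhook.*"},
		Retention: nats.LimitsPolicy,
		MaxAge:    7 * 24 * time.Hour,
		Storage:   nats.FileStorage,
	}
	_ = limits // the alternative shape for the same stream name

	// Pick one shape per stream; they are alternatives, not coexisting.
	if _, err := js.AddStream(&workQueue); err != nil {
		log.Fatal(err)
	}

	// Leaf-node question in config form: a filtered mirror for a region
	// that needs local consumption. Whether this composes with the
	// work-queue shape, and how it behaves under a partition, is exactly
	// what the POC has to answer.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name: "WEBHOOKS_EU",
		Mirror: &nats.StreamSource{
			Name:          "WEBHOOKS",
			FilterSubject: "payments.*.webhook.*",
		},
	}); err != nil {
		log.Fatal(err)
	}
}
```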

I am deliberately not giving latency numbers, throughput figures, or cost comparisons in this post. Those claims belong to a run that has not happened yet. The shape above is what I plan to try, not what I have measured.

The Honest Nuance: wasmCloud's Roadmap Weakens the "Requires NATS" Pitch

A clean version of the thesis would be "wasmCloud is great because it uses NATS." That framing is already out of date. The wasmCloud Q3 2025 roadmap is actively decoupling the scheduler and the providers from NATS-as-default, moving toward a transport-pluggable wRPC layer where NATS becomes one option among several (TCP, WebTransport, Unix Domain Sockets, and QUIC land next to it).

That change strengthens the thesis rather than weakening it. The topology claim is not about NATS specifically. It is about owning the routing primitive rather than consuming a managed router. If wRPC makes the transport even more pluggable, the set of places a wasmCloud lattice can run grows, and the lock-in surface shrinks further. Today NATS is the default and the exploration above works through it. Tomorrow the same actor manifest might pin a different transport for a specific provider without the producer code noticing.

The honest statement is not "wasmCloud forever requires NATS" but "wasmCloud today is mature through NATS, and the roadmap is making it more pluggable, not less."

One more piece of context worth naming: wasmCloud is commercially backed by Cosmonic, which donated it to the CNCF in 2021; NATS is stewarded by Synadia, whose April 2025 proposal to relicense the NATS server under the Business Source License (BUSL) drew a public response from the wasmCloud project. Both facts matter when you are weighing a portability thesis: the "own the bus" argument is stronger precisely because the commercial layers around these primitives can shift.

The Operational Boundary You Trade For

The thesis has a boundary worth naming directly, because otherwise it reads as an uncritical sell. A portable bus trades a vendor boundary for an operational boundary. With EventBridge, AWS runs the bus: you do not monitor it, do not patch it, do not wake up when its disk fills. With a NATS lattice, you run the bus yourself, and all of that becomes your problem: JetStream storage sizing, leaf-node health, cluster upgrade cycles, TLS rotation across the mesh.

The side of the fence I come down on: for a team that already runs stateful infrastructure (Postgres, Redis, Kafka), adding a NATS cluster is an incremental burden, and the portability benefit is real. For a team whose entire ops posture is "everything is managed," the operational boundary is a step-function increase, and EventBridge is probably still the right answer even if it means owning less of the bus.

The thesis is strongest for the first kind of team, and I think that team is underserved by the current discourse because the loudest voices optimize for the second.

Closing

If your event bus is a minor piece of plumbing in a short-lived product, use EventBridge and move on. If the bus topology is going to outlive two runtime choices and one cloud vendor, the portable-bus path is worth investigating, and wasmCloud on NATS is the most mature stack in that direction today; the wRPC roadmap only makes that investigation more future-safe. This is not a production migration blueprint; it is an exploration thesis. The next step is a small webhook fan-out POC that tests the subject-design and leaf-node questions above, and a follow-up post that reports what actually happened.
