Ayhan Sipahi 2025-12-12

Platform Engineering: Building Internal Developer Platforms That Developers Actually Want to Use

A practical guide to building Internal Developer Platforms with golden paths, self-service infrastructure, and product thinking, using Backstage, Port, and AWS.

Platform Engineering creates Internal Developer Platforms that enable self-service infrastructure through golden paths and standardized workflows. This guide covers IDP architecture, implementation patterns, tooling options (Backstage, Port, AWS services), success metrics, and common pitfalls for teams building platforms developers will actually adopt.

Understanding Platform Engineering

Platform Engineering is fundamentally about designing toolchains and workflows that enable self-service capabilities for software engineering organizations. The key shift is treating internal developers as customers rather than users to be controlled.

What makes Platform Engineering different from traditional DevOps?

The core difference is the platform-as-a-product mindset. Instead of DevOps teams acting as service providers responding to tickets, platform teams become product owners who build tools that developers voluntarily choose because they provide superior experiences.

This isn’t just semantics; it’s a fundamental change in approach:

DevOps: “We’ll deploy that for you (ticket required)”
Platform Engineering: “Here’s how you deploy it yourself in 5 minutes”

Why Platform Engineering keeps gaining traction:

The market data tells the story. The platform engineering services market was valued at $7.19B in 2024 and is projected to reach $40.17B by 2032 (23.99% CAGR). Gartner predicts that 80% of large software engineering organizations will have platform teams by 2026 (up from 45% in 2022).

The driver? A developer productivity crisis. Research shows 75% of developers lose 6+ hours weekly to tool fragmentation. Organizations need to scale DevOps practices without creating bottlenecks, while managing ever-increasing cloud-native complexity.

Core Philosophy:

These principles matter most in practice:

Developer empowerment through self-service (not ticket-driven workflows)
Standardization without reducing flexibility (golden paths, not rigid mandates)
Reducing cognitive load (one way that works well beats ten options)
Enabling speed AND safety simultaneously (not trading one for the other)
Product thinking applied to internal tools (developers are customers)

Core Components of an Internal Developer Platform

An effective IDP consists of several interconnected components:

Software Catalog / Service Catalog

A central registry of all services, APIs, resources, and teams. This isn’t just documentation; it’s a living inventory that answers “who owns this?” and “what depends on what?”

Backstage uses YAML-based catalog entities (Component, System, API, Resource, User, Group, Domain). The key is version control integration with GitHub or GitLab for catalog-as-code:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: user-service
  description: User management microservice
  annotations:
    github.com/project-slug: org/user-service
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: production
  owner: team-platform
  system: user-management
  providesApis:
    - user-api
  consumesApis:
    - auth-api
  dependsOn:
    - resource:user-database
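
To make the "who owns this?" and "what depends on what?" questions concrete, here is a minimal sketch that works on already-parsed catalog data. The `CatalogEntity` shape, the sample data, and both helper functions are hypothetical simplifications for illustration; they are not Backstage's actual types or API.

```typescript
// Hypothetical, simplified entity shape; Backstage's real model is richer.
interface CatalogEntity {
  kind: "Component" | "Resource" | "API";
  entityName: string;
  owner: string;
  dependsOn: string[]; // references in "kind:name" form
}

// Sample data loosely mirroring the YAML above. billing-service and the
// owner of user-database are invented for illustration.
const catalogEntities: CatalogEntity[] = [
  { kind: "Component", entityName: "user-service", owner: "team-platform", dependsOn: ["resource:user-database"] },
  { kind: "Component", entityName: "billing-service", owner: "team-billing", dependsOn: ["resource:user-database"] },
  { kind: "Resource", entityName: "user-database", owner: "team-platform", dependsOn: [] },
];

// Build the "kind:name" reference used in dependsOn lists.
function entityRef(e: CatalogEntity): string {
  return `${e.kind.toLowerCase()}:${e.entityName}`;
}

// "Who owns this?" Returns undefined when the reference is unknown.
function ownerOf(entities: CatalogEntity[], ref: string): string | undefined {
  const match = entities.filter((e) => entityRef(e) === ref)[0];
  return match ? match.owner : undefined;
}

// "What depends on this?" A reverse lookup over every entity's dependsOn list.
function dependentsOf(entities: CatalogEntity[], ref: string): string[] {
  return entities
    .filter((e) => e.dependsOn.some((d) => d === ref))
    .map(entityRef);
}
```

With this sample data, asking for the dependents of `resource:user-database` returns both components, which is the kind of impact analysis a catalog makes cheap before a schema change.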

Golden Paths / Paved Roads

The term “Golden Paths” comes from Spotify (Netflix calls the same concept “Paved Roads”). The definition that works best: “An opinionated, well-documented, supported way of building and deploying software”.

Characteristics of effective golden paths:

Fully self-serviceable (no ticket filing required)
Minimal cognitive load (sensible defaults, clear documentation)
Discoverable by anyone in the organization
Optional but most convenient (adoption through superiority, not mandate)
Drives standardization naturally (people choose it because it’s better)

Technical examples of golden paths:

Standardized service templates (new microservice in 30 minutes)
Pre-configured CI/CD pipelines (testing + deployment wired up)
Infrastructure-as-Code modules (common patterns ready to use)
Containerization blueprints (security baseline included)
Monitoring and observability pre-wired (no after-the-fact instrumentation)

Self-Service Actions / Developer Portal

The UI layer for platform capabilities. This is where developers interact with the platform to create services, deploy environments, provision databases, all without waiting.

Common implementations include web portals, CLI tools, IDE plugins, and Slack/ChatOps integrations. The best platforms support all of these, letting developers work however they prefer.
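
One way to keep web, CLI, and ChatOps front ends consistent is to have them all call the same backend action definitions. The sketch below assumes a hypothetical `SelfServiceAction` interface in which each action validates its own input and then runs without a ticket or approval step; the `create-database` action and its limits are illustrative only.

```typescript
// Hypothetical self-service action: validates its own input, then runs.
interface SelfServiceAction<I> {
  actionId: string;
  validate(input: I): string[]; // list of problems; empty means valid
  run(input: I): string;        // human-readable result
}

interface CreateDatabaseInput {
  serviceName: string;
  sizeGb: number;
}

// Illustrative action; the naming rule and the size limits are assumptions.
const createDatabaseAction: SelfServiceAction<CreateDatabaseInput> = {
  actionId: "create-database",
  validate(input) {
    const problems: string[] = [];
    if (!/^[a-z][a-z0-9-]*$/.test(input.serviceName)) {
      problems.push("serviceName must use lowercase letters, digits, and dashes");
    }
    if (input.sizeGb < 1 || input.sizeGb > 100) {
      problems.push("sizeGb must be between 1 and 100");
    }
    return problems;
  },
  run(input) {
    // A real platform would call its provisioning backend here.
    return `requested ${input.sizeGb} GB database for ${input.serviceName}`;
  },
};

type ActionResult =
  | { outcome: "completed"; message: string }
  | { outcome: "rejected"; problems: string[] };

// Shared entry point that any front end (portal, CLI, chat bot) could call.
function executeAction<I>(action: SelfServiceAction<I>, input: I): ActionResult {
  const problems = action.validate(input);
  if (problems.length > 0) {
    return { outcome: "rejected", problems };
  }
  return { outcome: "completed", message: action.run(input) };
}
```

Because validation lives with the action rather than in any single UI, a rejected request produces the same actionable feedback regardless of where the developer triggered it.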

Infrastructure as Code (IaC) Orchestration

Centralized IaC templates and modules enable consistent infrastructure provisioning. Version-controlled infrastructure definitions ensure repeatability.

Examples include AWS CDK constructs, Terraform modules, and Pulumi components. Note that AWS Proton is being discontinued (October 2026), so teams should plan alternatives.

CI/CD Integration

Standardized pipeline templates with automated testing, quality gates, and progressive deployment strategies (canary, blue-green). Integration with GitHub Actions, GitLab CI, Jenkins, or CircleCI.
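
As a rough illustration of the canary strategy mentioned above, the following sketch advances traffic in steps and rolls back as soon as the observed error rate crosses a threshold. The `CanaryPlan` shape and the `observeErrorRate` callback are assumptions made for this example; a real pipeline would query its metrics backend and shift traffic through its deployment tool.

```typescript
// Hypothetical canary plan: traffic percentages to step through and an error budget.
interface CanaryPlan {
  trafficSteps: number[]; // percent of traffic on the new version at each step
  maxErrorRate: number;   // e.g. 0.01 allows 1% of requests to fail
}

type CanaryOutcome =
  | { result: "promoted" }
  | { result: "rolled-back"; atStep: number; observedErrorRate: number };

// observeErrorRate stands in for a real metrics query (an assumption of this sketch).
function runCanary(
  plan: CanaryPlan,
  observeErrorRate: (trafficPercent: number) => number
): CanaryOutcome {
  for (const step of plan.trafficSteps) {
    const errorRate = observeErrorRate(step);
    if (errorRate > plan.maxErrorRate) {
      // Stop at the first unhealthy step instead of exposing more users.
      return { result: "rolled-back", atStep: step, observedErrorRate: errorRate };
    }
  }
  return { result: "promoted" };
}
```

Shipping a plan like this as part of the pipeline template means teams get gradual rollout behavior by default instead of having to design it per service.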

Observability and Monitoring

Pre-configured monitoring dashboards, standardized logging and tracing, alerting templates, and cost tracking. Tools like Prometheus, Datadog, New Relic, or CloudWatch integrate here.

Security and Compliance

Policy-as-code enforcement, security scanning automation (SAST, DAST, dependency scanning), compliance guardrails built into golden paths, and secret management integration.

Tools like Snyk, Dependabot, AWS Security Hub, and OPA (Open Policy Agent) automate security at platform level.
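
To show the general shape of policy-as-code, here is a small sketch in which policies are plain data plus a check function evaluated before provisioning. It is not how OPA works internally (OPA policies are written in Rego); the `ResourceRequest` and `PlatformPolicy` types and both rules are hypothetical.

```typescript
// Hypothetical description of a resource requested through a golden path.
interface ResourceRequest {
  resourceType: string;
  tags: Record<string, string>;
  publiclyAccessible: boolean;
}

// A policy returns a violation message, or null when the resource complies.
interface PlatformPolicy {
  policyId: string;
  check(resource: ResourceRequest): string | null;
}

// Illustrative rules only; real guardrails would come from your compliance needs.
const platformPolicies: PlatformPolicy[] = [
  {
    policyId: "require-team-tag",
    check: (r) => (r.tags["Team"] ? null : "missing required tag: Team"),
  },
  {
    policyId: "no-public-databases",
    check: (r) =>
      r.resourceType === "database" && r.publiclyAccessible
        ? "databases must not be publicly accessible"
        : null,
  },
];

// Evaluate every policy and collect all violations, so developers see them at once.
function evaluatePolicies(resource: ResourceRequest, policies: PlatformPolicy[]): string[] {
  const violations: string[] = [];
  for (const policy of policies) {
    const message = policy.check(resource);
    if (message !== null) {
      violations.push(`${policy.policyId}: ${message}`);
    }
  }
  return violations;
}
```

Returning every violation in one pass, rather than failing on the first, tends to fit self-service workflows better because the developer can fix everything in a single iteration.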

Platform as a Product Mindset

The most important principle when building platforms: developers are customers, not users to be controlled.

Platform success is measured by voluntary adoption. If you mandate platform usage, you’ve lost the most important feedback signal: whether developers actually find it valuable.

Critical practices that work:

Research developer pain points before building
Gather continuous feedback loops (surveys, office hours, Slack channels)
Measure developer satisfaction quarterly (NPS or custom metrics)
Don’t mandate platform adoption (it closes feedback loops)
Build what developers need, not what you think they need
Maintain a product roadmap and communicate changes transparently

Product thinking checklist:

Defined user personas (different dev team types)
User research conducted (surveys, interviews)
Clear value proposition for developers
Onboarding experience designed
Documentation written for humans
Support channels established
Success metrics defined and tracked
Roadmap shared transparently
Feedback mechanisms in place

Implementation Patterns and Best Practices

Starting Your Platform Journey

Don’t start with:

Biggest, most critical service (too risky for nascent platform)
Portal/UI first (need solid backend APIs first)
Top-down mandates (kills adoption)
“Build it and they will come” mentality
Everything in-house (“Not Invented Here” syndrome)

Do start with:

Minimum viable platform (MVP) with core value
Pilot with friendly, interested team
Backend APIs and orchestration first, UI second
One golden path done really well
Existing pain point that everyone feels
Developer research and feedback

Build vs. Buy Considerations

Build custom when:

Highly specific organizational requirements
Deep integration with legacy systems needed
Have dedicated platform engineering team (3+ engineers)
Long-term investment commitment
React/TypeScript expertise available (for Backstage)

Buy/adopt existing when:

Standard requirements covered by existing tools
Limited platform engineering resources
Need fast time-to-value (weeks, not months)
Prefer managed solutions over self-hosting
Want active community and ecosystem

Team Structure Recommendations

Platform Engineering Team Composition:

Former infrastructure engineers (already have expertise)
Product-minded engineers (user empathy)
Developer experience (DevEx) specialists
Technical writers (documentation matters)

Avoid: Moving all senior engineers to platform team (creates knowledge gaps in dev teams)

Team size by organization:

Small (< 50 engineers): 1-2 platform engineers
Medium (50-200 engineers): 3-5 platform engineers
Large (200-500 engineers): 5-10 platform engineers
Enterprise (500+ engineers): 10-20+ platform engineers

Common guideline (not industry standard): ~1 platform engineer per 30-50 application developers. This ratio varies significantly based on platform maturity, organizational complexity, and automation level.
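
The ratio above reduces to simple arithmetic. As a sketch, the hypothetical helper below turns a developer headcount into a low and high estimate using the 1:50 and 1:30 ends of the guideline; treat the output as a starting point for discussion, not a staffing formula.

```typescript
interface TeamSizeEstimate {
  low: number;  // one platform engineer per 50 application developers
  high: number; // one platform engineer per 30 application developers
}

// Hypothetical helper applying the ~1 per 30-50 guideline from the text.
function estimatePlatformTeamSize(appDevelopers: number): TeamSizeEstimate {
  if (!isFinite(appDevelopers) || appDevelopers < 0) {
    throw new RangeError("appDevelopers must be a non-negative number");
  }
  return {
    low: Math.ceil(appDevelopers / 50),
    high: Math.ceil(appDevelopers / 30),
  };
}
```

For 300 application developers this yields a range of 6 to 10 platform engineers, which sits close to the "Large" tier listed above.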

Tooling Landscape

Backstage (Open Source by Spotify)

Open-sourced in 2020, Backstage has the largest ecosystem and market share. It has a plugin-based architecture with a React/TypeScript frontend and a Node.js backend.

Strengths:

Highly customizable for complex requirements
Rich plugin ecosystem (100+ plugins)
Free and open source
Large community support
Unified developer experience

Challenges:

Significant implementation effort (3-6 months typical)
Requires React/TypeScript/SAML expertise
Self-hosting required (or use managed service like Roadie)
Steep learning curve
Ongoing maintenance burden

Important reality check: Organizations often underestimate that Backstage is “not ready-to-use” (Gartner warning). It’s a framework, not a product. You’re building your IDP using Backstage components.

Data point: Companies report 2x code change improvement and 17% cycle time decrease after Backstage implementation.

When to choose Backstage:

Large-scale, complex environments
Need deep customization
Have dedicated development resources
React/TypeScript expertise available
Open source preference

Port (Commercial Platform)

Founded in 2022, Port raised $35M Series B (2024), bringing total funding to $60M. It’s a no-code platform for IDP creation, hosted as SaaS.

Strengths:

Much faster time-to-value (weeks vs months)
No React/TypeScript expertise required
Excellent onboarding experience
Auto-import from GitHub/GitLab
Less maintenance overhead
Built-in tutorials and guides
Dynamic inventorying (CI/CD flows, clusters, environments)
Advanced search and RBAC

Challenges:

Full rollout can still be lengthy (3-6 months reported), even when initial setup is fast
Significant licensing costs (higher than competitors)
Less customizable than Backstage
Vendor lock-in concerns
Higher total cost of ownership

When to choose Port:

Want user-friendly, hassle-free IDP
Limited technical expertise for platform development
Prefer SaaS over self-hosted
Need fast deployment
Budget for commercial tooling

AWS Services for Platform Engineering

AWS Proton (Note: scheduled for discontinuation on October 7, 2026)

Managed service for infrastructure template vending
Platform engineers define standards, developers self-serve
Organizations should plan alternatives

Amazon EKS (Elastic Kubernetes Service)

Fully managed Kubernetes for platform foundations
EKS Blueprints: Curated templates for complete EKS setup
Platform engineering patterns well-supported
Integrates with Backstage, Port, custom IDPs

AWS CDK (Cloud Development Kit)

Infrastructure as Code in programming languages. Perfect for building platform golden paths:

// Platform team creates reusable construct
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import { Construct } from 'constructs';

export interface StandardApiServiceProps {
  serviceName: string;
  team: string;
  runtime: lambda.Runtime;
  handler: string;
  codeAsset: lambda.Code;
}

export class StandardApiService extends Construct {
  public readonly api: apigateway.RestApi;
  public readonly function: lambda.Function;

  constructor(scope: Construct, id: string, props: StandardApiServiceProps) {
    super(scope, id);

    // Lambda with standard configuration
    this.function = new lambda.Function(this, 'Handler', {
      runtime: props.runtime,
      handler: props.handler,
      code: props.codeAsset,
      timeout: cdk.Duration.seconds(30),
      memorySize: 1024,
      environment: {
        SERVICE_NAME: props.serviceName,
        TEAM: props.team,
      },
      // Observability built-in
      tracing: lambda.Tracing.ACTIVE,
      insightsVersion: lambda.LambdaInsightsVersion.VERSION_1_0_229_0,
    });

    // API Gateway with standard settings
    this.api = new apigateway.RestApi(this, 'Api', {
      restApiName: props.serviceName,
      // CORS defaults (permissive for illustration; restrict allowOrigins in production)
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
      },
    });

    // Standard integration
    const integration = new apigateway.LambdaIntegration(this.function);
    this.api.root.addProxy({ defaultIntegration: integration });

    // Standard tags
    cdk.Tags.of(this).add('Service', props.serviceName);
    cdk.Tags.of(this).add('Team', props.team);
    cdk.Tags.of(this).add('ManagedBy', 'Platform');
  }
}

// Developers use the golden path (inside a Stack or Construct, where `this` is the scope)
const service = new StandardApiService(this, 'UserService', {
  serviceName: 'user-service',
  team: 'platform',
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  codeAsset: lambda.Code.fromAsset('./dist'),
});

AWS Service Catalog

Vending machine for approved AWS resources
Pre-configured product portfolios
Budget controls and governance
Alternative to AWS Proton

Other Notable Tools

Humanitec: Platform orchestration, focuses on application configuration
Kratix: Platform-as-a-product framework on Kubernetes
Crossplane: Infrastructure composition using Kubernetes APIs
Terraform Cloud/Enterprise: Workspace management for teams
Pulumi: Multi-language IaC with state management
ArgoCD / Flux: GitOps for Kubernetes deployments
Cortex: Developer scorecards and standards tracking

Measuring Platform Success

DORA Metrics (Traditional DevOps)

Four Key Metrics:

Deployment Frequency: How often code deploys to production
Lead Time for Changes: Time from commit to production
Time to Restore Service: How long to recover from failure
Change Failure Rate: Percentage of deployments causing issues

Elite Performers (DORA 2024):

Deploy multiple times per day
Lead time < 1 day
Recovery time < 1 hour
Failure rate < 5%
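
To show how these four metrics can be derived from raw deployment data, here is a minimal sketch. The `DeploymentRecord` shape is hypothetical, lead time is summarized with a median, and "multiple times per day" is interpreted as more than one deploy per day on average; all of these are assumptions rather than DORA's official methodology.

```typescript
// Hypothetical deployment record; timestamps are epoch milliseconds.
interface DeploymentRecord {
  commitAt: number;
  deployedAt: number;
  failed: boolean;
  restoredAt?: number; // set when a failed deployment was recovered
}

interface DoraSummary {
  deploysPerDay: number;
  medianLeadTimeHours: number;
  changeFailureRate: number;      // fraction between 0 and 1
  meanTimeToRestoreHours: number; // 0 when no failures were restored
}

const HOUR_MS = 3600000;

function medianOf(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function summarizeDora(records: DeploymentRecord[], periodDays: number): DoraSummary {
  if (records.length === 0 || periodDays <= 0) {
    throw new RangeError("need at least one deployment and a positive period");
  }
  const failures = records.filter((r) => r.failed);
  const restoreHours: number[] = [];
  for (const r of failures) {
    if (r.restoredAt !== undefined) {
      restoreHours.push((r.restoredAt - r.deployedAt) / HOUR_MS);
    }
  }
  const totalRestore = restoreHours.reduce((sum, h) => sum + h, 0);
  return {
    deploysPerDay: records.length / periodDays,
    medianLeadTimeHours: medianOf(records.map((r) => (r.deployedAt - r.commitAt) / HOUR_MS)),
    changeFailureRate: failures.length / records.length,
    meanTimeToRestoreHours: restoreHours.length === 0 ? 0 : totalRestore / restoreHours.length,
  };
}

// Thresholds taken from the elite-performer list above.
function isElitePerformer(s: DoraSummary): boolean {
  return (
    s.deploysPerDay > 1 &&
    s.medianLeadTimeHours < 24 &&
    s.meanTimeToRestoreHours < 1 &&
    s.changeFailureRate < 0.05
  );
}
```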

DORA Limitations for Platform Engineering

Critical gap: DORA measures software delivery performance, NOT platform engineering effectiveness.

What DORA misses:

Infrastructure management quality
Security and compliance improvements
Platform usability and developer happiness
Tech debt reduction (unless it causes failures)
Scalability and maintainability work
Day 2-N operations improvements

Key insight: You cannot evaluate platform engineering teams on DORA metrics alone.

Platform-Specific Metrics

Developer Experience (DevEx) Metrics:

Platform Adoption Rate: % of teams using IDP vs alternatives
Self-Service Success Rate: % of self-service actions completed without help
Time to First Deployment: How long for new team to deploy using platform
Developer Satisfaction Score: Quarterly surveys (NPS or custom)
Tool Fragmentation Score: Number of tools developers must use
Onboarding Time: Days to productivity for new engineers
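
If you use NPS for the satisfaction score, the calculation is the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 through 6). A minimal sketch, with a hypothetical function name and rounding to the nearest whole number as an assumption:

```typescript
// Net Promoter Score from 0-10 survey answers.
function netPromoterScore(scores: number[]): number {
  if (scores.length === 0) {
    throw new RangeError("no survey responses");
  }
  let promoters = 0;
  let detractors = 0;
  for (const s of scores) {
    if (s < 0 || s > 10) {
      throw new RangeError(`score out of range: ${s}`);
    }
    if (s >= 9) {
      promoters++;
    } else if (s <= 6) {
      detractors++;
    }
    // Scores of 7 and 8 are passives: counted in the total, but in neither group.
  }
  return Math.round(((promoters - detractors) / scores.length) * 100);
}
```

Ten responses with five promoters and three detractors give a score of 20. Tracking the response count alongside the score matters, since a high score from a handful of respondents says little.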

Platform Health Metrics:

Golden Path Usage: % of deployments using standard templates
Support Ticket Volume: Platform-related help requests (downward trend = success)
Platform Uptime: Availability of platform services
Template Update Velocity: How quickly platform capabilities improve
Documentation Coverage: % of platform features documented

Business Impact Metrics:

Cost Optimization: Infrastructure spend reduction through standardization
Security Posture: Vulnerability reduction, compliance improvements
Velocity Impact: Team throughput before/after platform adoption
Operational Efficiency: Reduced toil, automation coverage

Recommended Approach (DX Core 4 Framework):

Combine DORA metrics with developer-experience and business measures across four dimensions:

Speed: Deployment frequency, lead time
Effectiveness: Self-service success, time to value
Quality: Change failure rate, security posture
Business Impact: Cost savings, team efficiency

Organizations using this balanced approach report 3-12% efficiency gains, a 14% increase in R&D focus, and a 15% improvement in developer engagement.

Platform metrics tracking example:

// Platform metrics collection example
interface PlatformMetrics {
  deployments: {
    total: number;
    viaGoldenPath: number;
    selfService: number;
    requiredSupport: number;
  };
  adoption: {
    totalTeams: number;
    teamsUsingPlatform: number;
    activeUsers: number;
  };
  performance: {
    avgTimeToFirstDeploy: number; // days
    avgSelfServiceDuration: number; // minutes
    supportTickets: number;
  };
  satisfaction: {
    npsScore: number;
    surveyResponses: number;
  };
}

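// In-memory metrics store (a real platform would persist these values)
const metrics: PlatformMetrics = {
  deployments: { total: 0, viaGoldenPath: 0, selfService: 0, requiredSupport: 0 },
  adoption: { totalTeams: 0, teamsUsingPlatform: 0, activeUsers: 0 },
  performance: { avgTimeToFirstDeploy: 0, avgSelfServiceDuration: 0, supportTickets: 0 },
  satisfaction: { npsScore: 0, surveyResponses: 0 },
};

// Track golden path usage and report the adoption rate
function trackDeployment(method: 'golden-path' | 'custom' | 'manual'): void {
  metrics.deployments.total++;
  if (method === 'golden-path') {
    metrics.deployments.viaGoldenPath++;
  }

  // Golden path adoption rate as a percentage of all deployments
  const adoptionRate =
    (metrics.deployments.viaGoldenPath / metrics.deployments.total) * 100;

  console.log(`Golden path adoption: ${adoptionRate.toFixed(1)}%`);
}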

Common Anti-Patterns and Pitfalls

Strategic Anti-Patterns

Confusing Platform with Portal

Reality: The platform is the backend APIs, orchestration, and golden paths; the portal is just the UI layer
Fix: Build solid backend first, UI second

Not Treating Platform as Product

Symptom: Low adoption, developer frustration
Root cause: Building without user research
Fix: Apply product management practices, treat developers as customers

Top-Down Mandates Without Buy-In

Problem: Forces developers to use tools they don’t want
Impact: Resistance, workarounds, shadow IT
Fix: Make platform best option through superior experience

“Field of Dreams” Mentality

Mistake: “Build it and they will come”
Reality: Platform must solve actual developer pain
Fix: Start with research, validate with pilots

Implementation Anti-Patterns

Starting with Biggest/Most Critical Service

Risk: Too much pressure on nascent platform
Impact: Failed pilot damages platform credibility
Fix: Start with friendly team, non-critical service

Overly Complex Platforms

Symptoms: Unfamiliar config formats, no documentation, inconsistent APIs
Impact: Developers avoid platform
Fix: Simplicity first, consistency always, document everything

Over-Reliance on Ticket Systems

Problem: Tickets create bottlenecks, reduce autonomy
Impact: Slow delivery, developer frustration
Fix: True self-service, minimal approval workflows

Templates-as-a-Service Only

Problem: Rigid templates, no customization
Impact: Workarounds, shadow IT, abandoned templates
Fix: Templates as starting point, allow customization within boundaries
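
"Customization within boundaries" can be as simple as template defaults plus a validated override step. In the sketch below, the defaults for memory and timeout echo the CDK example earlier in this guide, while the replica default, the allowed ranges, and all names are hypothetical.

```typescript
interface ServiceSettings {
  memoryMb: number;
  timeoutSeconds: number;
  replicas: number;
}

// Template defaults; teams get these when they override nothing.
const templateDefaults: ServiceSettings = { memoryMb: 1024, timeoutSeconds: 30, replicas: 2 };

// Boundaries set by the platform team as [min, max] (illustrative values).
const allowedRanges: Record<keyof ServiceSettings, [number, number]> = {
  memoryMb: [256, 4096],
  timeoutSeconds: [1, 120],
  replicas: [1, 10],
};

// Merge overrides into the defaults, rejecting anything outside the allowed range.
function applyOverrides(overrides: Partial<ServiceSettings>): ServiceSettings {
  const settings: ServiceSettings = { ...templateDefaults };
  for (const key of Object.keys(overrides) as (keyof ServiceSettings)[]) {
    const value = overrides[key];
    if (value === undefined) {
      continue;
    }
    const [min, max] = allowedRanges[key];
    if (value < min || value > max) {
      throw new RangeError(`${key} must be between ${min} and ${max}, got ${value}`);
    }
    settings[key] = value;
  }
  return settings;
}
```

A team that needs more memory can pass `{ memoryMb: 2048 }` and keep every other default, while a request for 50 replicas fails fast with a message that names the limit.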

Organizational Anti-Patterns

Skill Concentration Trap

Mistake: Moving all senior engineers to platform team
Impact: Knowledge gaps in development teams
Fix: Balanced team distribution, rotate people

Underinvested Platforms

Symptom: Platform team disbands after initial delivery
Impact: Platform becomes unmaintained anchor
Fix: Long-term investment commitment, ongoing team

Lack of Cost Controls

Problem: Self-service without spending limits
Impact: Runaway cloud costs
Fix: Budget controls, automated spending limits, cost visibility
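
An automated spending limit can be sketched as a check that runs before any self-service provisioning request is accepted. The `TeamBudget` shape, the function name, and the idea of comparing an estimated monthly cost against a per-team limit are assumptions for illustration; real cost data would come from your cloud provider's billing tools.

```typescript
// Hypothetical per-team budget state.
interface TeamBudget {
  team: string;
  monthlyLimitUsd: number;
  spentThisMonthUsd: number;
}

type BudgetDecision =
  | { decision: "approved"; remainingUsd: number }
  | { decision: "denied"; reason: string };

// Approve a request only if projected spend stays within the team's monthly limit.
function checkProvisioningRequest(budget: TeamBudget, estimatedMonthlyCostUsd: number): BudgetDecision {
  if (estimatedMonthlyCostUsd < 0) {
    return { decision: "denied", reason: "estimated cost must not be negative" };
  }
  const projected = budget.spentThisMonthUsd + estimatedMonthlyCostUsd;
  if (projected > budget.monthlyLimitUsd) {
    return {
      decision: "denied",
      reason: `projected spend $${projected} exceeds ${budget.team} limit of $${budget.monthlyLimitUsd}`,
    };
  }
  return { decision: "approved", remainingUsd: budget.monthlyLimitUsd - projected };
}
```

Returning the remaining budget on approval doubles as cost visibility: the developer sees the impact of the request at the moment they make it.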

Anti-Pattern	Symptom	Fix
Portal-First	No backend APIs	Build orchestration first
No Product Thinking	Low adoption	User research, feedback loops
Mandatory Platform	Resistance	Make it best option
Complex Config	Developers avoid it	Simplify, document
Ticket-Driven	Bottlenecks	True self-service
Skill Concentration	Knowledge gaps	Balanced distribution
Underinvestment	Abandoned platform	Long-term commitment
No Cost Controls	Cloud bill shock	Automated limits

Getting Started: Practical Roadmap

Phase 1: Foundation (Weeks 1-4)

Identify 3-5 biggest developer pain points (surveys, interviews)
Define platform vision and success metrics
Form initial platform team (2-3 people)
Choose pilot team (friendly, non-critical service)
Document current state (tools, workflows, pain points)

Phase 2: MVP Development (Weeks 5-12)

Build one golden path (e.g., standard API service template)
Create basic service catalog (manual is fine)
Implement self-service workflow (CLI or simple UI)
Write clear documentation
Deploy pilot with friendly team

Phase 3: Validation (Weeks 13-16)

Gather pilot feedback (what works, what doesn’t)
Measure baseline metrics (time to deploy, satisfaction)
Iterate on golden path based on feedback
Expand to 2-3 more teams
Document learnings and adjust roadmap

Phase 4: Scale (Months 5-12)

Add 2-3 more golden paths (common use cases)
Build or adopt developer portal (Backstage/Port decision)
Integrate observability and security
Implement cost controls
Scale to 25-50% of organization
Establish support channels

Phase 5: Mature (Year 2+)

Continuous improvement based on metrics
Advanced features (AI assistance, automated optimization)
Cross-team collaboration features
Platform API stability and versioning
Community building (internal user groups)

Quick win ideas for early momentum:

Standardized service template: New service in < 30 minutes
One-click environment: Ephemeral dev/test environments
Automated security scanning: Build into pipelines
Cost dashboards: Show teams their spend
Onboarding automation: New engineer productivity in < 1 day

Key Takeaways

For Engineering Leaders:

Platform engineering is product management applied to internal tools
Measure success through developer satisfaction, not just DORA metrics
Long-term investment required (people, time, budget)
Start small, validate, scale based on feedback

For Platform Engineers:

Backend APIs before UI/portal
One golden path done well beats many mediocre ones
Developer research prevents building wrong things
Documentation and onboarding are product features, not afterthoughts
Security and cost controls from day one

For Development Teams:

Platforms succeed when developers choose them voluntarily
Provide feedback to platform teams (they need it)
Golden paths are shortcuts, not cages
Self-service reduces friction but requires platform investment

Critical Success Factors:

Product mindset (developers as customers)
Start with real pain points
Measure what matters (not just DORA)
Avoid common anti-patterns
Long-term commitment and iteration

Platform Engineering is fundamentally about enabling developers to do their best work. Treating the platform as a product and developers as customers produces something they’ll actually want to use; that adoption is what determines whether the investment pays off.
