Database Selection Guide: From Classical to Edge - A Complete Engineering Perspective

Comprehensive guide to choosing the right database for your project - covering SQL, NoSQL, NewSQL, and edge solutions with real-world implementation stories and performance benchmarks.

Last month, I watched a startup burn through $50K in a single weekend because they chose the wrong database. A simple product catalog that worked perfectly with 1,000 products suddenly ground to a halt at 100,000. MongoDB collections weren't indexed properly, queries were scanning entire collections, and their "web-scale" solution became a very expensive lesson in database fundamentals.

This isn't uncommon. Database choices derail more projects than most engineers realize. The database is your application's foundation - get it wrong, and everything built on top starts to crack.

Why Database Choice Matters More Than You Think#

The Real Cost of Wrong Choices#

Technical Debt Explosion: Switching databases mid-project isn't just a migration - it's architectural surgery. One system I inherited was using MySQL for time-series data. Every query was a nightmare of date functions and subqueries. Moving to InfluxDB took 6 months and required rewriting half the application logic.

Team Productivity Impact: Your choice directly affects development velocity. A team comfortable with SQL but struggling with MongoDB document queries can lose 40% of its productivity for months. Conversely, NoSQL experts forced into rigid SQL schemas often over-engineer simple problems.

Hidden Operational Costs:

  • PostgreSQL self-hosted: $200/month server + 20 hours admin time
  • RDS PostgreSQL: $400/month but 2 hours admin time
  • DynamoDB: $50/month usage + zero admin time (when designed properly)

The cheapest option often costs the most when you factor in engineering time.

Classical Database Categories#

Relational (SQL) Databases#

PostgreSQL: The Swiss Army Knife#

PostgreSQL has become my default choice for most projects. It's boring in the best possible way - reliable, well-documented, and handles edge cases gracefully.

When PostgreSQL Shines:

  • Complex business logic requiring ACID transactions
  • Analytics workloads with sophisticated queries
  • Applications needing both relational and document storage (JSONB)
  • Teams comfortable with SQL

Production Story: We migrated a Rails app from MySQL to PostgreSQL specifically for its JSON capabilities. Being able to store flexible metadata alongside relational data eliminated the need for a separate document store. Query performance improved 3x, and we avoided the complexity of maintaining two database systems.

TypeScript
// PostgreSQL with JSONB - best of both worlds
const user = await db.query(`
  SELECT id, email, 
         preferences->>'theme' as theme,
         preferences->'notifications'->>'email' as email_notifications
  FROM users 
  WHERE preferences @> '{"beta_features": true}'
`);

Gotchas:

  • Write amplification with frequent updates (use HOT updates wisely)
  • Connection management - use pgBouncer in production
  • Vacuum tuning required for high-write workloads (sketch below)
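
If vacuum tuning sounds abstract, per-table autovacuum overrides are the usual first lever for high-churn tables. A minimal sketch, where the events table and the thresholds are illustrative starting points, not universal values:

TypeScript
// Hypothetical high-churn table; thresholds are illustrative
await db.query(`
  ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.05,   -- vacuum after ~5% dead tuples
    autovacuum_analyze_scale_factor = 0.02   -- refresh planner stats sooner
  )
`);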

MySQL: The Web-Scale Veteran#

MySQL earned its reputation powering the web's biggest sites. It's fast, well-understood, and has an ecosystem built around web applications.

When MySQL Works:

  • Read-heavy web applications
  • Applications requiring master-slave replication
  • Teams with existing MySQL expertise
  • Cost-conscious projects (excellent community support)

Real-World Performance: At a previous company, our MySQL cluster handled 50K QPS across 12 read replicas. The key was treating it like a cache layer - denormalized data, aggressive indexing, and strategic partitioning.

SQL
-- MySQL optimized for read performance
CREATE TABLE user_stats (
  user_id INT PRIMARY KEY,
  total_orders INT DEFAULT 0,
  last_order_date DATE,
  lifetime_value DECIMAL(10,2),
  INDEX idx_lifetime_value (lifetime_value DESC),
  INDEX idx_last_order (last_order_date)
) ENGINE=InnoDB;

Trade-offs:

  • Less sophisticated query planner than PostgreSQL
  • JSON support exists but feels bolted-on
  • Replication lag can be tricky in multi-master setups

SQLite: The Embedded Champion#

Don't underestimate SQLite. It's not just for mobile apps anymore. With proper configuration, it can handle surprising workloads.

Perfect For:

  • Edge applications with local data requirements
  • Development and testing environments
  • Applications with <100GB data and modest concurrency
  • Embedded systems and IoT devices

Performance Reality Check: SQLite can handle 100K reads/second on modern hardware. The bottleneck is usually concurrent writes, not reads.

TypeScript
// SQLite with WAL mode for better concurrency (better-sqlite3 API)
import Database from 'better-sqlite3';

const db = new Database('app.db');
db.pragma('journal_mode = WAL');    // readers no longer block the writer
db.pragma('synchronous = NORMAL');  // safe with WAL, fewer fsyncs
db.pragma('cache_size = -64000');   // 64MB page cache
db.pragma('temp_store = MEMORY');   // keep temp tables in RAM

NoSQL Databases#

MongoDB: The Document Store#

MongoDB gets a lot of hate, often deserved, but it genuinely excels in specific scenarios. The key is understanding its strengths and designing around its limitations.

Where MongoDB Excels:

  • Rapid prototyping with evolving schemas
  • Content management systems
  • Catalog systems with varied product attributes
  • Applications where document structure matches business logic

Hard-Learned Lessons: Always design your indexes first. MongoDB without proper indexes is like a Ferrari without wheels - impressive specs but unusable performance.

JavaScript
// MongoDB indexing strategy for e-commerce
db.products.createIndex({
  "category": 1,
  "price": 1,
  "createdAt": -1
});

// Compound index for faceted search
db.products.createIndex({
  "category": 1,
  "attributes.brand": 1,
  "attributes.color": 1,
  "price": 1
});

Production Gotchas:

  • Memory usage grows with working set size
  • Aggregation pipelines can be memory-intensive (see the sketch below)
  • Sharding requires careful planning of shard keys
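
For the aggregation memory issue in particular, large pipelines can be allowed to spill to disk rather than fail at the in-memory stage limit. A sketch using the Node driver, with illustrative collection and field names:

TypeScript
// Sketch: let a heavy aggregation spill to disk (allowDiskUse)
const totals = await db.collection('orders').aggregate([
  { $match: { createdAt: { $gte: new Date('2024-01-01') } } },
  { $group: { _id: '$customerId', total: { $sum: '$amount' } } },
  { $sort: { total: -1 } }
], { allowDiskUse: true }).toArray();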

Redis: The Speed Demon#

Redis isn't just a cache - it's a data structure server that can solve complex problems elegantly.

Redis Use Cases Beyond Caching:

  • Session storage with automatic expiration
  • Rate limiting with sliding windows
  • Real-time leaderboards and counters
  • Pub/sub for real-time features
  • Distributed locks for coordination (see the lock sketch below)

Battle-Tested Pattern: Using Redis for distributed rate limiting across microservices:

TypeScript
// Sliding window rate limiter backed by a Redis sorted set
async function checkRateLimit(userId: string, limit: number, windowMs: number) {
  const key = `rate_limit:${userId}`;
  const now = Date.now();
  const windowStart = now - windowMs;
  
  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart);      // evict expired entries
  pipeline.zadd(key, now, `${now}-${Math.random()}`);  // unique member per request
  pipeline.zcard(key);                                 // count includes this request
  pipeline.expire(key, Math.ceil(windowMs / 1000));
  
  const results = await pipeline.exec();
  const currentCount = results[2][1] as number;
  
  return currentCount <= limit;
}
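
The distributed-lock use case from the list above follows a similar recipe. A minimal single-instance sketch (SET NX with a TTL, plus an atomic compare-and-delete release; ioredis is assumed, and for multi-node guarantees you'd want something like Redlock):

TypeScript
// Sketch: single-instance Redis lock with a token-checked release
import { randomUUID } from 'crypto';

async function withLock<T>(name: string, ttlMs: number, fn: () => Promise<T>): Promise<T> {
  const key = `lock:${name}`;
  const token = randomUUID(); // proves ownership at release time
  const acquired = await redis.set(key, token, 'PX', ttlMs, 'NX');
  if (!acquired) throw new Error(`Lock busy: ${name}`);
  try {
    return await fn();
  } finally {
    // Delete only if we still own the lock (atomic via Lua)
    await redis.eval(
      'if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) end',
      1, key, token
    );
  }
}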

DynamoDB: The Serverless Powerhouse#

DynamoDB is either amazing or terrible depending on how well you understand its data model. There's no middle ground.

DynamoDB Strengths:

  • True serverless with pay-per-use pricing
  • Predictable single-digit millisecond latency
  • Automatic scaling and backup
  • Global tables for multi-region applications

The DynamoDB Mental Model: Stop thinking in SQL. Start thinking in access patterns. Design your table structure around how you'll query the data, not how you'll store it.

TypeScript
// DynamoDB single-table design pattern
interface GameRecord {
  PK: string;     // USER#123 or GAME#456
  SK: string;     // PROFILE or SCORE#2024-01-15
  Type: string;   // USER or GAME or SCORE
  GSI1PK?: string; // For secondary access patterns
  GSI1SK?: string;
  // ... other attributes
}

// Query user's recent scores
const scores = await dynamodb.query({
  TableName: 'GameData',
  KeyConditionExpression: 'PK = :pk AND begins_with(SK, :sk)',
  ExpressionAttributeValues: {
    ':pk': 'USER#123',
    ':sk': 'SCORE#'
  },
  ScanIndexForward: false, // Latest first
  Limit: 10
}).promise();

DynamoDB Gotchas:

  • Hot partitions can throttle your entire application (see the write-sharding sketch below)
  • Query patterns must be known upfront
  • Complex relationships require careful GSI design
  • FilterExpressions still consume read capacity
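
The standard mitigation for hot partitions is write sharding: spread one logical key across N physical partition keys and fan reads back out. A minimal sketch, with an illustrative shard count and key format:

TypeScript
// Sketch: write sharding for a hot partition key
const SHARD_COUNT = 10;

function shardedPK(logicalKey: string): string {
  const shard = Math.floor(Math.random() * SHARD_COUNT); // random spread on write
  return `${logicalKey}#${shard}`; // e.g. 'EVENT#2024-01-15#7'
}

// Reads must query all N shard keys and merge results client-side
function allShardPKs(logicalKey: string): string[] {
  return Array.from({ length: SHARD_COUNT }, (_, i) => `${logicalKey}#${i}`);
}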

NewSQL: Best of Both Worlds#

CockroachDB: Distributed SQL Done Right#

CockroachDB promises PostgreSQL compatibility with global distribution. In practice, it delivers on most of these promises with some important caveats.

When CockroachDB Makes Sense:

  • Global applications requiring strong consistency
  • Financial systems needing ACID across regions
  • Applications outgrowing single-node PostgreSQL
  • Teams wanting SQL with automatic sharding

Real Experience: We used CockroachDB for a fintech application spanning US and EU. The automatic geo-partitioning kept user data in the right regions for compliance, while maintaining strong consistency for financial transactions.

SQL
-- CockroachDB geo-partitioning (the partition column must be a
-- prefix of the primary key)
CREATE TABLE users (
  region STRING NOT NULL,
  id UUID DEFAULT gen_random_uuid(),
  email STRING UNIQUE,
  created_at TIMESTAMPTZ DEFAULT now(),
  PRIMARY KEY (region, id)
) PARTITION BY LIST (region) (
  PARTITION us_users VALUES IN ('us-east', 'us-west'),
  PARTITION eu_users VALUES IN ('eu-west', 'eu-central')
);

Trade-offs:

  • Higher latency than single-node databases due to consensus
  • More expensive than traditional PostgreSQL
  • Some PostgreSQL features still missing or different

Edge Database Solutions#

PouchDB/CouchDB: Offline-First Architecture#

For applications that need to work offline, CouchDB's replication model is unmatched. PouchDB brings this to the browser seamlessly.

Perfect For:

  • Field service applications
  • Mobile apps in areas with poor connectivity
  • Collaborative applications with eventual consistency needs

Implementation Pattern:

JavaScript
// PouchDB offline-first pattern
const localDB = new PouchDB('local-data');
const remoteDB = new PouchDB('https://server.com/data');

// Two-way sync with conflict resolution
const sync = localDB.sync(remoteDB, {
  live: true,
  retry: true
}).on('change', (info) => {
  console.log('Sync change:', info);
}).on('error', (err) => {
  console.log('Sync error:', err);
});

// App works offline, syncs when online
await localDB.put({
  _id: 'user-123',
  name: 'John Doe',
  lastModified: new Date().toISOString()
});
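
The sync handlers above surface conflicts but don't resolve them. A minimal last-write-wins sketch, assuming the lastModified field from the document above:

TypeScript
// Sketch: last-write-wins conflict resolution for a single document
let doc = await localDB.get('user-123', { conflicts: true });

for (const rev of doc._conflicts ?? []) {
  const rival = await localDB.get('user-123', { rev });
  if (rival.lastModified > doc.lastModified) {
    // Rival wins: copy its content onto the winning branch
    const res = await localDB.put({ ...rival, _id: doc._id, _rev: doc._rev });
    doc = { ...rival, _id: doc._id, _rev: res.rev };
  }
  await localDB.remove(rival._id, rival._rev); // prune the losing revision
}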

InfluxDB: Time-Series Specialist#

When you're dealing with metrics, logs, or IoT data, specialized time-series databases like InfluxDB outperform general-purpose databases dramatically.

InfluxDB Advantages:

  • Automatic downsampling and retention policies
  • Built-in time-based functions and aggregations
  • Efficient storage for time-series data
  • Native integration with monitoring tools

InfluxQL
-- InfluxDB query for system metrics
SELECT mean("cpu_usage") 
FROM "system_metrics" 
WHERE time >= now() - 24h 
GROUP BY time(1h), "host"

Database Selection Matrix#

By Use Case#

E-commerce Platform:

  • Catalog: PostgreSQL (structured product data + JSONB for attributes)
  • Sessions: Redis (fast access + automatic expiration)
  • Orders: PostgreSQL (ACID compliance for financial data)
  • Analytics: ClickHouse or BigQuery (analytical workloads)

IoT Application:

  • Device State: Redis (real-time updates)
  • Time Series: InfluxDB (sensor data)
  • Configuration: PostgreSQL (device management)
  • Edge Cache: SQLite (local device storage)

Social Media App:

  • User Profiles: PostgreSQL (relational data)
  • Posts/Timeline: DynamoDB (high scale, simple queries)
  • Real-time: Redis Streams (notifications, chat)
  • Search: Elasticsearch (content discovery)

By Scale Requirements#

Small Scale (1K-100K users): PostgreSQL + Redis covers 90% of use cases. Simple, well-understood, cost-effective.

Medium Scale (100K-10M users):

  • Read replicas for PostgreSQL
  • DynamoDB for high-traffic features
  • Elasticsearch for search
  • Redis cluster for caching

Large Scale (10M+ users):

  • Sharded PostgreSQL or CockroachDB
  • DynamoDB with careful partition design
  • Redis Cluster with consistent hashing
  • Specialized databases for specific workloads

Selection Criteria Deep Dive#

Consistency Requirements#

Strong Consistency (ACID): PostgreSQL, CockroachDB, SQL Server

  • Financial transactions
  • Inventory management
  • User authentication

Eventual Consistency (BASE): DynamoDB, MongoDB, Cassandra

  • Social media feeds
  • Content catalogs
  • Analytics data

Choose Strong When: Data integrity is more important than availability.
Choose Eventual When: Availability and partition tolerance are the priority.

Performance Patterns#

Read-Heavy Workloads: MySQL with read replicas, Redis caching layer
Write-Heavy Workloads: DynamoDB, Cassandra, or sharded PostgreSQL
Mixed Workloads: PostgreSQL with proper indexing and connection pooling

Latency Requirements:

  • <1ms: Redis (in-memory)
  • <10ms: DynamoDB, well-tuned PostgreSQL
  • <100ms: Most SQL databases with proper indexing
  • >100ms: Acceptable for analytical workloads

Real-World Migration Stories#

The MongoDB to PostgreSQL Migration#

The Problem: A content management system using MongoDB was struggling with complex queries. Aggregation pipelines were becoming unmaintainable, and the lack of schema validation was causing data quality issues.

The Solution: Migrated to PostgreSQL with JSONB columns for flexible content, maintaining the benefits of document storage while gaining SQL's query power.

Timeline: 3 months with zero downtime using a dual-write pattern:

TypeScript
// Dual-write migration pattern
class ContentService {
  async createPost(post: Post) {
    // Write to new PostgreSQL database
    const pgResult = await this.postgresDB.insert(post);
    
    try {
      // Write to legacy MongoDB (for rollback safety)
      await this.mongoDB.insertOne(post);
    } catch (error) {
      // MongoDB failure shouldn't break the flow
      console.error('MongoDB write failed:', error);
    }
    
    return pgResult;
  }
}
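
Dual writes only cover new records; historical data still needs a one-off backfill. A sketch, where the column names and the query helper are illustrative:

TypeScript
// Sketch: backfill historical posts from MongoDB into PostgreSQL
async function backfillPosts() {
  const cursor = mongoDB.find(); // the legacy posts collection
  for await (const doc of cursor) {
    // ON CONFLICT skips rows the dual-write path already copied
    await postgresDB.query(
      `INSERT INTO posts (id, title, body, metadata)
       VALUES ($1, $2, $3, $4)
       ON CONFLICT (id) DO NOTHING`,
      [doc._id.toString(), doc.title, doc.body, JSON.stringify(doc.metadata ?? {})]
    );
  }
}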

Lessons Learned:

  • Schema validation in PostgreSQL caught data quality issues early
  • JSONB queries were faster than MongoDB aggregations for our use case
  • The migration improved developer productivity significantly

The Single-Region to Multi-Region Challenge#

The Challenge: A growing SaaS needed to expand from US-only to global, requiring data residency compliance and low latency worldwide.

The Solution: Migrated from single PostgreSQL to CockroachDB with geo-partitioning. User data stayed in their regions while maintaining global consistency for billing and analytics.

Implementation:

SQL
-- Geo-partitioned user data: pin each partition to its home region
ALTER PARTITION us_users OF TABLE users
  CONFIGURE ZONE USING constraints = '[+region=us-east1]';
ALTER PARTITION eu_users OF TABLE users
  CONFIGURE ZONE USING constraints = '[+region=eu-west1]';
-- (user_profiles is partitioned and pinned the same way)

-- Global data (billing, analytics) carries no region constraint
ALTER TABLE subscriptions CONFIGURE ZONE USING constraints = '[]';

Results:

  • Latency reduced from 200ms to 50ms for European users
  • GDPR compliance achieved through data localization
  • Development complexity increased but operational benefits justified the cost

Performance Benchmarking#

Read Performance Comparison#

Based on real-world testing with 1M records:

Simple Key Lookups (ops/second):

  • Redis: 100,000+
  • DynamoDB: 50,000
  • PostgreSQL (indexed): 25,000
  • MongoDB (indexed): 20,000
  • MySQL (indexed): 22,000

Complex Queries (analytical workloads):

  • PostgreSQL: Excellent (sophisticated query planner)
  • CockroachDB: Good (distributed but still SQL)
  • MongoDB: Poor (aggregation pipelines)
  • DynamoDB: Not applicable (limited query capabilities)

Write Performance Under Load#

Concurrent Writes (1000 clients):

  • DynamoDB: Scales automatically, consistent performance
  • Redis: Excellent until memory limit
  • PostgreSQL: Good with proper connection pooling
  • MongoDB: Degrades with document size growth

Implementation Patterns#

Database Sharding Strategies#

Horizontal Sharding (dividing data across servers):

TypeScript
// User-based sharding: hash the user ID to pick a shard
import { createHash } from 'crypto';

function getShardForUser(userId: string): string {
  const hash = createHash('md5').update(userId).digest('hex');
  const shardIndex = parseInt(hash.substring(0, 8), 16) % NUM_SHARDS;
  return `shard_${shardIndex}`;
}

// Route queries to appropriate shard
class ShardedUserService {
  async getUser(userId: string) {
    const shard = getShardForUser(userId);
    return this.databases[shard].query('SELECT * FROM users WHERE id = ?', [userId]);
  }
}

Vertical Sharding (separating by feature):

TypeScript
// Separate databases by domain
class UserService {
  profiles = new DatabaseConnection('user_profiles_db');
  preferences = new DatabaseConnection('user_preferences_db');
  analytics = new DatabaseConnection('user_analytics_db');
  
  async getFullUser(userId: string) {
    const [profile, preferences, analytics] = await Promise.all([
      this.profiles.getUser(userId),
      this.preferences.getUser(userId),
      this.analytics.getUser(userId)
    ]);
    
    return { ...profile, preferences, analytics };
  }
}

Connection Management#

PostgreSQL Connection Pooling:

TypeScript
// Production PostgreSQL setup
import { Pool } from 'pg';

const pool = new Pool({
  host: process.env.DB_HOST,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  // Critical production settings
  max: 20,                    // Maximum connections
  idleTimeoutMillis: 30000,   // Close idle connections
  connectionTimeoutMillis: 2000, // Fail fast on connection issues
  maxUses: 7500,              // Rotate connections to prevent memory leaks
});

// Always use transactions for data consistency
async function transferMoney(fromUserId: string, toUserId: string, amount: number) {
  const client = await pool.connect();
  
  try {
    await client.query('BEGIN');
    
    await client.query(
      'UPDATE accounts SET balance = balance - $1 WHERE user_id = $2',
      [amount, fromUserId]
    );
    
    await client.query(
      'UPDATE accounts SET balance = balance + $1 WHERE user_id = $2',
      [amount, toUserId]
    );
    
    await client.query('COMMIT');
  } catch (error) {
    await client.query('ROLLBACK');
    throw error;
  } finally {
    client.release();
  }
}

Monitoring and Troubleshooting#

Key Metrics to Track#

PostgreSQL Essential Metrics:

  • Connection usage (pg_stat_activity)
  • Query performance (pg_stat_statements)
  • Index usage (pg_stat_user_indexes)
  • Replication lag (pg_stat_replication)

SQL
-- PostgreSQL health check queries
-- Long-running queries
SELECT pid, now() - query_start as duration, query 
FROM pg_stat_activity 
WHERE now() - query_start > interval '5 minutes';

-- Index usage statistics
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes 
ORDER BY idx_scan DESC;

-- Connection count by state
SELECT state, count(*) 
FROM pg_stat_activity 
GROUP BY state;

DynamoDB CloudWatch Metrics:

  • ConsumedReadCapacityUnits / ConsumedWriteCapacityUnits
  • ThrottledRequests (critical! See the alarm sketch below)
  • SuccessfulRequestLatency
  • SystemErrors
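
ThrottledRequests is the one worth an alarm. A sketch with the AWS SDK v2, where the table name, thresholds, and the ALERT_TOPIC_ARN environment variable are illustrative:

TypeScript
// Sketch: page when a DynamoDB table starts throttling
import { CloudWatch } from 'aws-sdk';

await new CloudWatch().putMetricAlarm({
  AlarmName: 'GameData-throttled-requests',
  Namespace: 'AWS/DynamoDB',
  MetricName: 'ThrottledRequests',
  Dimensions: [{ Name: 'TableName', Value: 'GameData' }],
  Statistic: 'Sum',
  Period: 60,               // one-minute windows
  EvaluationPeriods: 1,
  Threshold: 1,             // any throttling is worth knowing about
  ComparisonOperator: 'GreaterThanOrEqualToThreshold',
  AlarmActions: [process.env.ALERT_TOPIC_ARN!], // hypothetical SNS topic
}).promise();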

MongoDB Key Metrics:

  • Operations per second (opcounters)
  • Working set size vs available memory
  • Lock percentage
  • Replication lag
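
Most of these come straight out of serverStatus; a minimal sampling sketch with the Node driver:

TypeScript
// Sketch: sample opcounters and memory from serverStatus
const status = await db.admin().serverStatus();
console.log({
  opcounters: status.opcounters,      // cumulative - diff two samples for ops/sec
  residentMemMB: status.mem.resident, // compare against working set size
});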

Common Performance Issues#

The N+1 Query Problem:

TypeScript
// BAD: N+1 queries
async function getUsersWithPosts() {
  const users = await db.query('SELECT * FROM users');
  
  for (const user of users) {
    user.posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
  }
  
  return users;
}

// GOOD: Single query with JOIN (rows come back flat; regroup by u.id in app code)
async function getUsersWithPosts() {
  return db.query(`
    SELECT u.*, p.id as post_id, p.title, p.content
    FROM users u
    LEFT JOIN posts p ON u.id = p.user_id
    ORDER BY u.id, p.created_at DESC
  `);
}

Connection Pool Exhaustion:

TypeScript
// Monitoring connection pool health
setInterval(() => {
  console.log({
    totalConnections: pool.totalCount,
    idleConnections: pool.idleCount,
    waitingClients: pool.waitingCount
  });
  
  if (pool.waitingCount > 5) {
    console.warn('Connection pool under pressure!');
  }
}, 30000);

Future-Proofing Your Database Choice#

Technology Trends to Watch#

Vector Databases for AI/ML: pgvector for PostgreSQL, Pinecone, Weaviate

  • Embedding storage for semantic search
  • RAG (Retrieval-Augmented Generation) applications
  • Image and document similarity search
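
On the pgvector side, embeddings live in an ordinary PostgreSQL column and nearest-neighbour search is a single ORDER BY. A sketch, where the table name, embedding size, and queryEmbedding variable are illustrative:

TypeScript
// Sketch: semantic search with the pgvector extension
await db.query('CREATE EXTENSION IF NOT EXISTS vector');
await db.query(`
  CREATE TABLE IF NOT EXISTS docs (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
  )
`);

// Five nearest documents by cosine distance (the <=> operator)
const { rows } = await db.query(
  'SELECT id, content FROM docs ORDER BY embedding <=> $1 LIMIT 5',
  [JSON.stringify(queryEmbedding)] // pgvector accepts '[0.1, 0.2, ...]' literals
);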

Multi-Model Databases: FaunaDB, Azure Cosmos DB

  • Single database supporting multiple data models
  • Reduced operational complexity
  • Unified query interfaces

Serverless-First Architectures:

  • PlanetScale (serverless MySQL)
  • Neon (serverless PostgreSQL)
  • FaunaDB (serverless transactional)

Planning for Growth#

Capacity Planning Framework:

TypeScript
// Database growth projection model
interface GrowthProjection {
  currentUsers: number;
  userGrowthRate: number; // monthly growth as a fraction, e.g. 0.05 for 5%
  avgDataPerUser: number; // in KB
  queryGrowthMultiplier: number; // queries grow faster than users
}

function projectDatabaseNeeds(projection: GrowthProjection, months: number) {
  const futureUsers = projection.currentUsers * Math.pow(1 + projection.userGrowthRate, months);
  const futureDataSize = futureUsers * projection.avgDataPerUser;
  const futureQPS = futureUsers * projection.queryGrowthMultiplier;
  
  return {
    estimatedUsers: Math.round(futureUsers),
    estimatedDataSizeGB: Math.round(futureDataSize / 1024 / 1024),
    estimatedQPS: Math.round(futureQPS),
    recommendedShards: Math.ceil(futureQPS / 10000) // Assuming 10K QPS per shard
  };
}

Team Development Strategy#

Skill Building Path:

  1. Foundation: Master one SQL database deeply (PostgreSQL recommended)
  2. NoSQL Understanding: Learn one document store (MongoDB) and one key-value (Redis)
  3. Cloud Native: Understand one cloud database (DynamoDB or Cosmos DB)
  4. Specialization: Deep dive into domain-specific databases (time-series, graph, etc.)

Knowledge Sharing Practices:

  • Database design reviews for all new features
  • Regular performance analysis sessions
  • Post-mortem analysis of database-related incidents
  • Cross-training on different database technologies

Decision Framework#

When choosing a database for a new project, ask these questions in order:

1. Consistency Requirements#

  • Do you need ACID transactions? → SQL databases
  • Can you work with eventual consistency? → NoSQL options open up

2. Query Complexity#

  • Complex analytical queries? → PostgreSQL, CockroachDB
  • Simple key-value lookups? → Redis, DynamoDB
  • Full-text search required? → Elasticsearch + primary database

3. Scale and Performance#

  • Current scale: <100K users → PostgreSQL + Redis
  • Growth trajectory: >1M users → Consider sharding or cloud-native options
  • Latency requirements: <10ms → In-memory (Redis) or optimized NoSQL

4. Team and Operational Constraints#

  • Team expertise: Stick close to existing skills initially
  • Operational budget: Managed services vs. self-hosted
  • Compliance requirements: Data residency, encryption, audit trails

5. Future Flexibility#

  • How likely is the data model to change? → Document stores for high change rate
  • Multi-region expansion planned? → Consider distributed databases early
  • Integration requirements: What other systems need to connect?