Multi-Cloud Strategies

Design and implement multi-cloud architectures, avoid vendor lock-in, use abstraction layers, and manage complexity across cloud providers

What is Multi-Cloud?

A multi-cloud strategy involves using services from two or more cloud providers (AWS, GCP, Azure) to run your applications and infrastructure. Unlike hybrid cloud (which combines on-premises with cloud), multi-cloud specifically refers to using multiple public cloud providers. Organizations adopt multi-cloud to avoid vendor lock-in, leverage best-of-breed services, improve resilience, and optimize costs.

Benefits vs Challenges

  • Benefit - Reduced Vendor Lock-in: Freedom to move workloads between providers
  • Benefit - Best-of-Breed: Use BigQuery from GCP, Lambda from AWS, Entra ID from Azure
  • Benefit - Resilience: Survive a full cloud provider outage, provided workloads and data are replicated on a second provider
  • Challenge - Complexity: Managing multiple platforms, tools, and APIs
  • Challenge - Cost: Data transfer fees, duplicate infrastructure, skills gap
  • Challenge - Consistency: Different IAM, networking, and storage models
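
The data transfer cost listed above is easy to underestimate, so a rough calculation can help. The sketch below multiplies monthly cross-cloud traffic by a per-GB rate. The `EgressFlow` type, the function name, and the rates are assumptions for illustration only; real egress pricing is tiered and varies by provider and region.

```typescript
// Rough estimate of monthly cross-cloud data transfer cost.
// All rates below are hypothetical placeholders, not published prices.
interface EgressFlow {
  from: string;        // provider sending the data
  to: string;          // provider receiving the data
  gbPerMonth: number;  // traffic volume
  usdPerGb: number;    // assumed flat egress rate charged by the sender
}

// Sum volume times rate over every flow.
function monthlyEgressCostUsd(flows: EgressFlow[]): number {
  return flows.reduce((total, flow) => total + flow.gbPerMonth * flow.usdPerGb, 0);
}

const exampleFlows: EgressFlow[] = [
  { from: "aws", to: "gcp", gbPerMonth: 2000, usdPerGb: 0.09 },
  { from: "gcp", to: "aws", gbPerMonth: 500, usdPerGb: 0.12 },
];

console.log(`Estimated egress: $${monthlyEgressCostUsd(exampleFlows).toFixed(2)} per month`);
```

A flat rate keeps the arithmetic visible; a more realistic model would apply each provider's pricing tiers.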

Multi-Cloud Architecture Patterns

Pattern 1: Cloud-Agnostic with Kubernetes

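Run Kubernetes everywhere (EKS, GKE, AKS) and deploy the same application manifests across providers. Kubernetes provides the abstraction layer that makes core workloads such as Deployments and Services portable. Provider-specific pieces, including storage classes, load balancer annotations, and IAM integration, still differ between clusters.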

# multi-cluster-deployment.yaml
# This same manifest works on EKS, GKE, and AKS
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: ghcr.io/myorg/my-app:1.0.0  # Cloud-agnostic registry
        ports:
        - containerPort: 3000
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
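
One way to roll the manifest above out to several clusters is to run `kubectl apply` once per kubeconfig context. The sketch below only builds the command lines so they can be reviewed or passed to a process runner. The `ClusterTarget` type and the context names are hypothetical; substitute the contexts from your own kubeconfig.

```typescript
// Build one `kubectl apply` invocation per target cluster.
interface ClusterTarget {
  provider: "eks" | "gke" | "aks";
  kubeContext: string; // hypothetical kubeconfig context name
}

// Returns argv arrays rather than shell strings to avoid quoting problems.
function buildApplyCommands(manifestPath: string, targets: ClusterTarget[]): string[][] {
  return targets.map((target) => [
    "kubectl",
    "--context", target.kubeContext,
    "apply",
    "-f", manifestPath,
  ]);
}

const clusterTargets: ClusterTarget[] = [
  { provider: "eks", kubeContext: "prod-eks" },
  { provider: "gke", kubeContext: "prod-gke" },
];

for (const argv of buildApplyCommands("multi-cluster-deployment.yaml", clusterTargets)) {
  console.log(argv.join(" "));
}
```

The `db-credentials` Secret referenced by the manifest has to exist in every target cluster before the Deployment can start.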

Pattern 2: Best-of-Breed Services

Use each cloud for what it does best: GCP for data analytics (BigQuery), AWS for compute (EC2/Lambda), Azure for identity (Entra ID). This pattern typically needs a service mesh or API gateway to manage cross-cloud communication, and every cross-cloud call adds latency and data transfer charges.
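
To make the gateway idea concrete, here is a minimal sketch of a routing table that maps request paths to the cloud hosting each service. The `Route` type, the path prefixes, and the upstream URLs are all assumptions for illustration; a real API gateway or service mesh expresses the same idea in its own configuration format.

```typescript
// Path-prefix routing table for a best-of-breed setup.
interface Route {
  prefix: string;
  cloud: "aws" | "gcp" | "azure";
  upstream: string; // hypothetical internal URL
}

const routeTable: Route[] = [
  { prefix: "/analytics", cloud: "gcp", upstream: "https://analytics.gcp.example.internal" },
  { prefix: "/api", cloud: "aws", upstream: "https://api.aws.example.internal" },
  { prefix: "/auth", cloud: "azure", upstream: "https://auth.azure.example.internal" },
];

// Pick the route with the longest prefix that matches on a path-segment boundary.
function resolveRoute(path: string, table: Route[]): Route | undefined {
  let best: Route | undefined;
  for (const route of table) {
    const matches = path === route.prefix || path.startsWith(route.prefix + "/");
    if (matches && (best === undefined || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best;
}

console.log(resolveRoute("/analytics/query", routeTable)?.upstream);
```

Matching on segment boundaries keeps `/apiary` from being routed to the `/api` backend.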

Pattern 3: Active-Active for Disaster Recovery

Run identical stacks on two cloud providers, with traffic split between them via health-checked DNS (Amazon Route 53, Cloudflare). If one provider experiences an outage, its records are withdrawn and all traffic goes to the other.
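
The failover decision itself is simple enough to sketch. The code below filters endpoints by health status and computes each remaining provider's share of traffic from its weight. The `Endpoint` type, the addresses (taken from documentation IP ranges), and the fail-open behavior are assumptions; managed DNS services implement this logic for you.

```typescript
// Weighted active-active routing with health-based failover.
interface Endpoint {
  provider: string;
  address: string; // hypothetical address from a documentation range
  weight: number;
}

// Keep only endpoints whose provider reports healthy.
// Assumption: if every check fails, return all endpoints (fail open),
// since the health checker itself may be the component that is broken.
function selectActiveEndpoints(endpoints: Endpoint[], healthy: Record<string, boolean>): Endpoint[] {
  const up = endpoints.filter((endpoint) => healthy[endpoint.provider] === true);
  return up.length > 0 ? up : endpoints;
}

// Fraction of traffic each active provider receives.
function trafficShare(active: Endpoint[]): Record<string, number> {
  const totalWeight = active.reduce((sum, endpoint) => sum + endpoint.weight, 0);
  const share: Record<string, number> = {};
  for (const endpoint of active) {
    share[endpoint.provider] = totalWeight === 0 ? 0 : endpoint.weight / totalWeight;
  }
  return share;
}

const stackEndpoints: Endpoint[] = [
  { provider: "aws", address: "203.0.113.10", weight: 1 },
  { provider: "gcp", address: "198.51.100.20", weight: 1 },
];

// Simulate an AWS outage: all traffic shifts to GCP.
console.log(trafficShare(selectActiveEndpoints(stackEndpoints, { aws: false, gcp: true })));
```

Keep DNS TTLs short for these records, because resolvers cache the old answer until it expires.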

Terraform for Multi-Cloud

# multi-cloud main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "my-gcp-project"
  region  = "us-central1"
}

provider "azurerm" {
  features {}
}

# EKS cluster on AWS
module "eks" {
  source = "./modules/eks"
  # ...
}

# GKE cluster on GCP
module "gke" {
  source = "./modules/gke"
  # ...
}

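# Use the Kubernetes provider for each cluster.
# The cluster_endpoint, cluster_token, and cluster_ca_cert outputs
# must be exposed by the local modules above.
# Resources pick a cluster with `provider = kubernetes.eks` or `provider = kubernetes.gke`.
provider "kubernetes" {
  alias                  = "eks"
  host                   = module.eks.cluster_endpoint
  token                  = module.eks.cluster_token
  cluster_ca_certificate = base64decode(module.eks.cluster_ca_cert)
}

provider "kubernetes" {
  alias                  = "gke"
  host                   = module.gke.cluster_endpoint
  token                  = module.gke.cluster_token
  cluster_ca_certificate = base64decode(module.gke.cluster_ca_cert)
}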

Abstraction Layers for Portability

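// cloud-storage-abstraction.ts
// Abstract cloud storage operations for portability
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { Storage } from '@google-cloud/storage';
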
interface CloudStorageProvider {
  upload(bucket: string, key: string, data: Buffer): Promise<string>;
  download(bucket: string, key: string): Promise<Buffer>;
  delete(bucket: string, key: string): Promise<void>;
  listObjects(bucket: string, prefix: string): Promise<string[]>;
}

class AWSStorageProvider implements CloudStorageProvider {
  private s3: S3Client;

  constructor(region: string) {
    this.s3 = new S3Client({ region });
  }

  async upload(bucket: string, key: string, data: Buffer): Promise<string> {
    await this.s3.send(new PutObjectCommand({ Bucket: bucket, Key: key, Body: data }));
    return `s3://${bucket}/${key}`;
  }

  async download(bucket: string, key: string): Promise<Buffer> {
    const response = await this.s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    return Buffer.from(await response.Body!.transformToByteArray());
  }

  // ... delete, listObjects
}

class GCPStorageProvider implements CloudStorageProvider {
  private storage: Storage;

  constructor() {
    this.storage = new Storage();
  }

  async upload(bucket: string, key: string, data: Buffer): Promise<string> {
    await this.storage.bucket(bucket).file(key).save(data);
    return `gs://${bucket}/${key}`;
  }

  // ... download, delete, listObjects
}

// Factory function based on environment
function createStorageProvider(): CloudStorageProvider {
  switch (process.env.CLOUD_PROVIDER) {
    case 'aws': return new AWSStorageProvider(process.env.AWS_REGION!);
    case 'gcp': return new GCPStorageProvider();
    default: throw new Error(`Unknown provider: ${process.env.CLOUD_PROVIDER}`);
  }
}

export const storage = createStorageProvider();
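
A side benefit of the interface is that tests can run without any cloud credentials. The sketch below is a hypothetical in-memory implementation, not part of either SDK. It restates the interface with `Uint8Array` in place of `Buffer` so the block stands alone without Node typings, and the `mem://` scheme is made up for illustration.

```typescript
// Same shape as the interface above, restated so this block is self-contained.
interface CloudStorageProvider {
  upload(bucket: string, key: string, data: Uint8Array): Promise<string>;
  download(bucket: string, key: string): Promise<Uint8Array>;
  delete(bucket: string, key: string): Promise<void>;
  listObjects(bucket: string, prefix: string): Promise<string[]>;
}

// Hypothetical test double that keeps objects in a Map.
// Assumes bucket names contain no "/" characters.
class InMemoryStorageProvider implements CloudStorageProvider {
  private objects = new Map<string, Uint8Array>();

  private objectId(bucket: string, key: string): string {
    return `${bucket}/${key}`;
  }

  async upload(bucket: string, key: string, data: Uint8Array): Promise<string> {
    // Copy the bytes so later mutation by the caller cannot change stored data.
    this.objects.set(this.objectId(bucket, key), data.slice());
    return `mem://${bucket}/${key}`;
  }

  async download(bucket: string, key: string): Promise<Uint8Array> {
    const data = this.objects.get(this.objectId(bucket, key));
    if (data === undefined) {
      throw new Error(`Object not found: ${bucket}/${key}`);
    }
    return data.slice();
  }

  async delete(bucket: string, key: string): Promise<void> {
    this.objects.delete(this.objectId(bucket, key));
  }

  async listObjects(bucket: string, prefix: string): Promise<string[]> {
    const start = this.objectId(bucket, prefix);
    return Array.from(this.objects.keys())
      .filter((id) => id.startsWith(start))
      .map((id) => id.slice(bucket.length + 1)) // strip the "bucket/" part
      .sort();
  }
}
```

Application code that depends only on `CloudStorageProvider` can receive this class in unit tests and a cloud-backed class in production.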

Cross-Cloud Networking

# Set up a site-to-site VPN between AWS and GCP (gateway creation only)

# AWS: Create a VPN Gateway
aws ec2 create-vpn-gateway --type ipsec.1
aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-xxx --vpc-id vpc-xxx

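# GCP: Create a Cloud VPN gateway
gcloud compute vpn-gateways create my-vpn-gw \
  --network=default \
  --region=us-central1

# Still required on both sides before traffic flows: peer (customer) gateway
# definitions, VPN tunnels with shared secrets, and routes (static or BGP).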

# Alternative: Use a service mesh like Istio for cross-cluster communication
# Install Istio on both clusters and configure multi-cluster mesh
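
Whichever option you choose, routing between the two networks only works if their address ranges do not overlap. The sketch below checks two IPv4 CIDR blocks for overlap before you connect them. The function names and the example ranges are assumptions for illustration.

```typescript
// Convert an IPv4 CIDR block such as "10.0.0.0/16" into its numeric address range.
function parseCidr(cidr: string): { start: number; end: number } {
  const match = /^(\d+)\.(\d+)\.(\d+)\.(\d+)\/(\d+)$/.exec(cidr);
  if (match === null) {
    throw new Error(`Invalid CIDR: ${cidr}`);
  }
  const octets = match.slice(1, 5).map(Number);
  const prefixBits = Number(match[5]);
  if (octets.some((octet) => octet > 255) || prefixBits > 32) {
    throw new Error(`Invalid CIDR: ${cidr}`);
  }
  // Plain arithmetic instead of bit shifts, which are limited to signed 32-bit values.
  const address = octets.reduce((acc, octet) => acc * 256 + octet, 0);
  const size = Math.pow(2, 32 - prefixBits);
  const start = Math.floor(address / size) * size; // align to the network boundary
  return { start, end: start + size - 1 };
}

// Two ranges overlap when each one starts before the other ends.
function cidrsOverlap(a: string, b: string): boolean {
  const rangeA = parseCidr(a);
  const rangeB = parseCidr(b);
  return rangeA.start <= rangeB.end && rangeB.start <= rangeA.end;
}

// Hypothetical VPC ranges: the AWS VPC and the GCP subnet must not collide.
console.log(cidrsOverlap("10.0.0.0/16", "10.1.0.0/16"));
```

Planning distinct ranges per provider up front is much cheaper than renumbering a live network later.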

Avoiding Vendor Lock-in

Strategies for Portability

  • Kubernetes: Use K8s as the compute abstraction — EKS, GKE, and AKS all run the same workloads
  • Terraform: Define infrastructure in Terraform with provider-agnostic modules where possible
  • OCI Images: Use cloud-neutral container registries (GHCR, Docker Hub)
  • Standard APIs: Prefer S3-compatible object storage and PostgreSQL over proprietary storage APIs and databases
  • Abstraction Layers: Wrap cloud-specific SDKs behind interfaces in your application code
  • Open Standards: Use OpenTelemetry, Prometheus, Grafana instead of cloud-specific monitoring

Key Takeaways

  • Multi-cloud reduces vendor lock-in but adds complexity, so weigh the benefits against the costs
  • Kubernetes is the most practical portability layer for workloads, since EKS, GKE, and AKS all run the same manifests
  • Terraform manages infrastructure across all major clouds from a single tool
  • Use abstraction layers and interfaces to keep application code cloud-agnostic
  • Start with a primary cloud and expand to multi-cloud only when the business case is clear
