What is Multi-Cloud?
A multi-cloud strategy uses services from two or more public cloud providers, such as AWS, GCP, and Azure, to run applications and infrastructure. Unlike hybrid cloud, which combines on-premises infrastructure with public cloud, multi-cloud specifically refers to using multiple public cloud providers. Organizations adopt multi-cloud to avoid vendor lock-in, use best-of-breed services, improve resilience, and optimize costs.
Benefits vs Challenges
- Benefit - No Vendor Lock-in: Freedom to move workloads between providers
- Benefit - Best-of-Breed: Use BigQuery from GCP, Lambda from AWS, Entra ID (formerly Azure AD) from Azure
- Benefit - Resilience: Keep serving users through a provider-wide outage, provided workloads actually run on more than one cloud
- Challenge - Complexity: Managing multiple platforms, tools, and APIs
- Challenge - Cost: Cross-cloud data transfer (egress) fees, duplicate infrastructure, skills gap
- Challenge - Consistency: Different IAM, networking, and storage models
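The cost challenge is easiest to reason about with numbers. The sketch below estimates monthly cross-cloud egress from daily transfer volume. The flat `usdPerGb` rate is a placeholder assumption: real pricing is tiered and varies by provider, region, and destination, so substitute your provider's current published price.

```typescript
// Back-of-the-envelope estimate of cross-cloud data transfer (egress) cost.
// ASSUMPTION: a single flat per-GB rate. Real provider pricing is tiered,
// varies by region and destination, and changes over time.
interface EgressEstimateInput {
  gbPerDay: number;      // average data leaving one cloud for another, per day
  usdPerGb: number;      // placeholder rate: look up your provider's current price
  daysPerMonth?: number; // defaults to 30
}

function estimateMonthlyEgressUsd({ gbPerDay, usdPerGb, daysPerMonth = 30 }: EgressEstimateInput): number {
  if (gbPerDay < 0 || usdPerGb < 0 || daysPerMonth <= 0) {
    throw new RangeError("gbPerDay and usdPerGb must be >= 0, daysPerMonth > 0");
  }
  return gbPerDay * daysPerMonth * usdPerGb;
}

// Example: replicating 500 GB/day between clouds at a hypothetical $0.09/GB
const monthlyEgress = estimateMonthlyEgressUsd({ gbPerDay: 500, usdPerGb: 0.09 });
console.log(`~$${monthlyEgress.toFixed(2)} per month`);
```

Even at modest volumes, continuous replication between clouds is a recurring line item, which is why data placement deserves attention before choosing a multi-cloud pattern.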
Multi-Cloud Architecture Patterns
Pattern 1: Cloud-Agnostic with Kubernetes
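Run Kubernetes on every provider (EKS, GKE, AKS) and deploy the same application manifests across them. Kubernetes provides the abstraction layer that makes workloads portable, although storage classes, load balancer annotations, and IAM integrations still differ between providers.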
```yaml
# multi-cluster-deployment.yaml
# This same manifest works on EKS, GKE, and AKS
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: ghcr.io/myorg/my-app:1.0.0 # Cloud-agnostic registry
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  # The db-credentials Secret must exist in every cluster
                  name: db-credentials
                  key: url
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
```
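In practice a few fields usually still differ per cluster, such as replica counts, labels, or provider-specific annotations. A common way to keep one portable manifest is a shared base plus small per-cloud overlays, which is what Kustomize overlays and Helm values files provide. The TypeScript sketch below only illustrates the merge idea; `deepMerge` and the overlay contents are hypothetical and not part of any tool's API.

```typescript
// Base manifest + per-cloud overlay, illustrated with a hypothetical deepMerge.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(v: Json): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merges `overlay` into `base` without mutating either input.
// Objects merge key by key; scalars and arrays in the overlay replace the base value.
function deepMerge(base: Json, overlay: Json): Json {
  if (!isObject(base) || !isObject(overlay)) return overlay;
  const out: { [key: string]: Json } = { ...base };
  for (const [key, value] of Object.entries(overlay)) {
    out[key] = Object.prototype.hasOwnProperty.call(base, key) ? deepMerge(base[key], value) : value;
  }
  return out;
}

// Trimmed-down version of the Deployment above
const baseDeployment: Json = {
  apiVersion: "apps/v1",
  kind: "Deployment",
  metadata: { name: "my-app", labels: { app: "my-app" } },
  spec: { replicas: 3 },
};

// Hypothetical overlays: more replicas on the primary cloud, an extra label on GKE
const overlays: Record<string, Json> = {
  eks: { spec: { replicas: 5 } },
  gke: { metadata: { labels: { cloud: "gcp" } } },
};

for (const [cluster, overlay] of Object.entries(overlays)) {
  console.log(cluster, JSON.stringify(deepMerge(baseDeployment, overlay)));
}
```

Keeping overlays small makes drift between clusters visible in review: anything not in an overlay is guaranteed to match the base.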
Pattern 2: Best-of-Breed Services
Use each cloud for what it does best: GCP for data analytics (BigQuery), AWS for compute (EC2/Lambda), Azure for identity (Entra ID). Cross-cloud calls are typically routed through a service mesh or an API gateway, which gives one place to manage routing, authentication, and encryption between providers.
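At its core, the gateway in this pattern keeps a routing table from request paths to the cloud that hosts each service. A minimal sketch of that lookup using longest-prefix matching; the prefixes and `*.example.com` upstreams are hypothetical placeholders.

```typescript
// Routing decision of an API gateway in a best-of-breed setup.
// All prefixes and upstream URLs below are hypothetical.
interface Route {
  prefix: string;
  provider: "aws" | "gcp" | "azure";
  upstream: string;
}

const routes: Route[] = [
  { prefix: "/api", provider: "aws", upstream: "https://api.aws.example.com" },
  { prefix: "/api/reports", provider: "gcp", upstream: "https://reports.gcp.example.com" },
  { prefix: "/auth", provider: "azure", upstream: "https://auth.azure.example.com" },
];

// Longest-prefix match: a more specific route ("/api/reports") wins over a
// general one ("/api"). Matching on segment boundaries keeps "/apiary" from
// matching "/api".
function resolveRoute(path: string, table: Route[]): Route | undefined {
  return table
    .filter((r) => path === r.prefix || path.startsWith(r.prefix + "/"))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
}

console.log(resolveRoute("/api/reports/daily", routes)?.upstream); // https://reports.gcp.example.com
```

A production gateway or service mesh expresses this as configuration rather than application code and adds TLS, retries, and authentication on top; the sketch shows only the routing decision.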
Pattern 3: Active-Active for Disaster Recovery
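Run identical stacks on two cloud providers and route traffic to both via DNS (Route 53, Cloudflare). Health checks detect when one provider's endpoints fail and DNS stops returning them, so traffic fails over to the other provider; how quickly clients move depends on the record TTLs they have cached.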
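The failover decision can be sketched as weighted selection over health-checked endpoints. The endpoint names, equal weights, and the fail-open rule (answer with all endpoints if none pass health checks) are illustrative assumptions rather than the documented behavior of any particular DNS provider.

```typescript
// Weighted, health-aware endpoint selection for an active-active setup.
interface Endpoint {
  name: string;
  weight: number;   // relative share of traffic while healthy
  healthy: boolean; // result of the latest health check
}

// Picks one endpoint, weighted by `weight`, among healthy endpoints only.
// If every endpoint is unhealthy it fails open and considers all of them,
// so clients still get an answer. `rand` is injectable for deterministic tests.
function pickEndpoint(endpoints: Endpoint[], rand: () => number = Math.random): Endpoint {
  const healthy = endpoints.filter((e) => e.healthy && e.weight > 0);
  const pool = healthy.length > 0 ? healthy : endpoints.filter((e) => e.weight > 0);
  if (pool.length === 0) throw new Error("no endpoints with positive weight");
  const total = pool.reduce((sum, e) => sum + e.weight, 0);
  let r = rand() * total;
  for (const e of pool) {
    r -= e.weight;
    if (r < 0) return e;
  }
  return pool[pool.length - 1]; // guards against floating-point rounding
}

const endpoints: Endpoint[] = [
  { name: "aws-us-east-1", weight: 50, healthy: true },
  { name: "gcp-us-central1", weight: 50, healthy: false }, // simulated outage
];
console.log(pickEndpoint(endpoints).name); // always aws-us-east-1 while GCP is unhealthy
```

Failing open is a deliberate choice: if health checks themselves break, sending traffic somewhere is usually better than returning no answer at all.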
Terraform for Multi-Cloud
```hcl
# multi-cloud main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "my-gcp-project"
  region  = "us-central1"
}

provider "azurerm" {
  features {}
}

# EKS cluster on AWS (local module; inputs omitted)
module "eks" {
  source = "./modules/eks"
  # ...
}

# GKE cluster on GCP (local module; inputs omitted)
# An AKS module would follow the same pattern.
module "gke" {
  source = "./modules/gke"
  # ...
}

# Fetch short-lived credentials at plan/apply time instead of storing tokens.
# Assumes the local modules export cluster_name, cluster_endpoint, and cluster_ca_cert.
data "aws_eks_cluster_auth" "eks" {
  name = module.eks.cluster_name
}

data "google_client_config" "default" {}

# One aliased Kubernetes provider per cluster
provider "kubernetes" {
  alias                  = "eks"
  host                   = module.eks.cluster_endpoint
  token                  = data.aws_eks_cluster_auth.eks.token
  cluster_ca_certificate = base64decode(module.eks.cluster_ca_cert)
}

provider "kubernetes" {
  alias                  = "gke"
  host                   = module.gke.cluster_endpoint # prefix "https://" if the module outputs a bare IP
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(module.gke.cluster_ca_cert)
}
```
Abstraction Layers for Portability
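```typescript
// cloud-storage-abstraction.ts
// Abstract cloud storage operations for portability
import {
  S3Client,
  PutObjectCommand,
  GetObjectCommand,
  DeleteObjectCommand,
  ListObjectsV2Command,
} from "@aws-sdk/client-s3";
import { Storage } from "@google-cloud/storage";

interface CloudStorageProvider {
  upload(bucket: string, key: string, data: Buffer): Promise<string>;
  download(bucket: string, key: string): Promise<Buffer>;
  delete(bucket: string, key: string): Promise<void>;
  listObjects(bucket: string, prefix: string): Promise<string[]>;
}

class AWSStorageProvider implements CloudStorageProvider {
  private s3: S3Client;
  constructor(region: string) {
    this.s3 = new S3Client({ region });
  }
  async upload(bucket: string, key: string, data: Buffer): Promise<string> {
    await this.s3.send(new PutObjectCommand({ Bucket: bucket, Key: key, Body: data }));
    return `s3://${bucket}/${key}`;
  }
  async download(bucket: string, key: string): Promise<Buffer> {
    const response = await this.s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    return Buffer.from(await response.Body!.transformToByteArray());
  }
  async delete(bucket: string, key: string): Promise<void> {
    await this.s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: key }));
  }
  async listObjects(bucket: string, prefix: string): Promise<string[]> {
    const keys: string[] = [];
    let token: string | undefined;
    do {
      // ListObjectsV2 returns at most 1,000 keys per page, so follow continuation tokens
      const page = await this.s3.send(
        new ListObjectsV2Command({ Bucket: bucket, Prefix: prefix, ContinuationToken: token })
      );
      for (const obj of page.Contents ?? []) if (obj.Key) keys.push(obj.Key);
      token = page.IsTruncated ? page.NextContinuationToken : undefined;
    } while (token);
    return keys;
  }
}

class GCPStorageProvider implements CloudStorageProvider {
  private storage: Storage;
  constructor() {
    this.storage = new Storage(); // uses Application Default Credentials
  }
  async upload(bucket: string, key: string, data: Buffer): Promise<string> {
    await this.storage.bucket(bucket).file(key).save(data);
    return `gs://${bucket}/${key}`;
  }
  async download(bucket: string, key: string): Promise<Buffer> {
    const [contents] = await this.storage.bucket(bucket).file(key).download();
    return contents;
  }
  async delete(bucket: string, key: string): Promise<void> {
    await this.storage.bucket(bucket).file(key).delete();
  }
  async listObjects(bucket: string, prefix: string): Promise<string[]> {
    // getFiles follows pagination automatically by default
    const [files] = await this.storage.bucket(bucket).getFiles({ prefix });
    return files.map((f) => f.name);
  }
}

// Factory function: choose the provider from the CLOUD_PROVIDER environment variable
function createStorageProvider(): CloudStorageProvider {
  switch (process.env.CLOUD_PROVIDER) {
    case 'aws': return new AWSStorageProvider(process.env.AWS_REGION!);
    case 'gcp': return new GCPStorageProvider();
    default: throw new Error(`Unknown provider: ${process.env.CLOUD_PROVIDER}`);
  }
}

export const storage = createStorageProvider();
```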
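One payoff of coding against the `CloudStorageProvider` interface is that tests and local development can use a fake instead of a real bucket. A minimal in-memory sketch; `InMemoryStorageProvider` and its `mem://` URI scheme are hypothetical and exist only for illustration.

```typescript
// Hypothetical in-memory implementation of the storage interface, for tests.
interface CloudStorageProvider {
  upload(bucket: string, key: string, data: Buffer): Promise<string>;
  download(bucket: string, key: string): Promise<Buffer>;
  delete(bucket: string, key: string): Promise<void>;
  listObjects(bucket: string, prefix: string): Promise<string[]>;
}

class InMemoryStorageProvider implements CloudStorageProvider {
  // Objects keyed by "bucket/key"; bucket names cannot contain "/"
  private objects = new Map<string, Buffer>();

  async upload(bucket: string, key: string, data: Buffer): Promise<string> {
    this.objects.set(`${bucket}/${key}`, Buffer.from(data)); // copy so callers can't mutate stored data
    return `mem://${bucket}/${key}`;
  }
  async download(bucket: string, key: string): Promise<Buffer> {
    const data = this.objects.get(`${bucket}/${key}`);
    if (!data) throw new Error(`Not found: ${bucket}/${key}`);
    return Buffer.from(data);
  }
  async delete(bucket: string, key: string): Promise<void> {
    this.objects.delete(`${bucket}/${key}`);
  }
  async listObjects(bucket: string, prefix: string): Promise<string[]> {
    const bucketPrefix = `${bucket}/`;
    return Array.from(this.objects.keys())
      .filter((k) => k.startsWith(bucketPrefix + prefix))
      .map((k) => k.slice(bucketPrefix.length));
  }
}

async function demo(): Promise<void> {
  const store: CloudStorageProvider = new InMemoryStorageProvider();
  await store.upload("reports", "2024/q1.csv", Buffer.from("a,b\n1,2\n"));
  console.log(await store.listObjects("reports", "2024/")); // [ '2024/q1.csv' ]
}
demo();
```

Application code that receives a `CloudStorageProvider` cannot tell the fake from the S3 or GCS implementation, which is the same property that makes switching clouds a configuration change.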
Cross-Cloud Networking
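```bash
# Set up a site-to-site IPsec VPN between AWS and GCP (first steps only)
# AWS: create a virtual private gateway and attach it to the VPC
aws ec2 create-vpn-gateway --type ipsec.1
aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-xxx --vpc-id vpc-xxx

# GCP: create an HA VPN gateway
gcloud compute vpn-gateways create my-vpn-gw \
  --network=default \
  --region=us-central1

# Still required before traffic flows: an AWS customer gateway and VPN connection
# pointing at the GCP gateway's public IPs, plus a GCP Cloud Router, an external
# VPN gateway resource describing the AWS endpoints, and the VPN tunnels.

# Alternative: use a service mesh such as Istio for cross-cluster communication.
# Install Istio on both clusters and configure a multi-cluster mesh.
```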
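Before building tunnels, check that the address ranges on each side don't overlap; overlapping CIDRs make routes across the VPN ambiguous. The sketch below compares IPv4 CIDR blocks. The example ranges are hypothetical, except that GCP auto-mode networks (the `default` network used above is typically one) allocate subnets from 10.128.0.0/9.

```typescript
// Parses an IPv4 CIDR such as "10.0.0.0/16" into an inclusive numeric range.
function cidrRange(cidr: string): [number, number] {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(cidr);
  if (!match) throw new Error(`Invalid IPv4 CIDR: ${cidr}`);
  const octets = match.slice(1, 5).map(Number);
  const prefixLen = Number(match[5]);
  if (octets.some((o) => o > 255) || prefixLen > 32) {
    throw new Error(`Invalid IPv4 CIDR: ${cidr}`);
  }
  // Plain arithmetic (not bitwise ops) keeps values in the unsigned 32-bit range
  const addr = octets.reduce((acc, o) => acc * 256 + o, 0);
  const size = 2 ** (32 - prefixLen);
  const start = Math.floor(addr / size) * size; // network address
  return [start, start + size - 1];
}

// Two ranges overlap if each one starts before the other ends.
function cidrsOverlap(a: string, b: string): boolean {
  const [aStart, aEnd] = cidrRange(a);
  const [bStart, bEnd] = cidrRange(b);
  return aStart <= bEnd && bStart <= aEnd;
}

console.log(cidrsOverlap("10.0.0.0/16", "10.128.0.0/9"));  // false: AWS VPC vs GCP auto-mode range
console.log(cidrsOverlap("10.0.0.0/16", "10.1.0.0/16"));   // false
console.log(cidrsOverlap("10.0.0.0/16", "10.0.128.0/20")); // true: the /20 sits inside the /16
```

Running a check like this over every VPC and subnet range during planning is cheaper than renumbering a network after the tunnels are live.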
Avoiding Vendor Lock-in
Strategies for Portability
- Kubernetes: Use K8s as the compute abstraction — EKS, GKE, and AKS all run the same workloads
- Terraform: Manage every provider's infrastructure from one tool, giving modules consistent inputs and outputs even though the resources inside each module are provider-specific
- OCI Images: Use cloud-neutral container registries (GHCR, Docker Hub)
- Standard APIs: Prefer S3-compatible object storage and open-source databases such as PostgreSQL over proprietary services
- Abstraction Layers: Wrap cloud-specific SDKs behind interfaces in your application code
- Open Standards: Use OpenTelemetry, Prometheus, Grafana instead of cloud-specific monitoring
Key Takeaways
- Multi-cloud avoids vendor lock-in but adds complexity — weigh benefits against costs
- Kubernetes provides the best workload portability across cloud providers
- Terraform manages infrastructure across all major clouds from a single tool
- Use abstraction layers and interfaces to keep application code cloud-agnostic
- Start with a primary cloud and expand to multi-cloud only when the business case is clear