Problem Statement
Design a notification system that can send notifications across multiple channels: push notifications, SMS, email, and in-app notifications. The system must handle high throughput, respect user preferences, and provide delivery tracking. Notification systems are a critical component of virtually every modern application.
Step 1: Requirements
Functional Requirements
- Support multiple notification types: push (iOS/Android), SMS, email, in-app
- Template-based notification content with personalization
- User preference management (opt-in/opt-out per channel and category)
- Priority levels: critical, high, medium, low
- Delivery tracking and analytics
- Rate limiting to prevent notification fatigue
- Scheduled notifications
Non-Functional Requirements
- High throughput: 10 million notifications per day
- Low latency for critical notifications (<1 second)
- At-least-once delivery guarantee
- Soft real-time (most notifications delivered within 5 seconds)
- Graceful degradation if a channel provider is down
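To make the throughput target concrete, a quick back-of-the-envelope estimate helps. The sketch below converts the daily volume into an average per-second rate. The peak multiplier of 10 is an assumed value for illustration only, not a figure from the requirements.

```typescript
// Back-of-the-envelope capacity estimate for the throughput requirement.
const NOTIFICATIONS_PER_DAY = 10 * 1000 * 1000; // from the requirements above
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Assumption: traffic is bursty, so peak load is some multiple of the average.
const ASSUMED_PEAK_MULTIPLIER = 10;

const averagePerSecond = NOTIFICATIONS_PER_DAY / SECONDS_PER_DAY;
const assumedPeakPerSecond = averagePerSecond * ASSUMED_PEAK_MULTIPLIER;

console.log(`average: ~${Math.round(averagePerSecond)} notifications/s`);
console.log(`assumed peak: ~${Math.round(assumedPeakPerSecond)} notifications/s`);
```

The average works out to roughly 116 notifications per second, so queues and worker pools should be sized for bursts rather than for the daily mean.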
Step 2: Notification Types Deep Dive
Channel Comparison
| Channel | Latency | Cost | Reach | Provider |
|---|---|---|---|---|
| Push (iOS) | ~1s | Free | App installed | APNs |
| Push (Android) | ~1s | Free | App installed | FCM |
| Email | 1-30s | Low | Has email | SendGrid, SES |
| SMS | 1-5s | High | Has phone | Twilio, SNS |
| In-App | Instant | Free | In the app | WebSocket/SSE |
Step 3: System Architecture
// Core notification data model
interface Notification {
id: string;
userId: string;
type: "push" | "sms" | "email" | "in_app";
category: string; // "marketing", "transactional", "social", "security"
priority: "critical" | "high" | "medium" | "low";
templateId: string;
templateData: Record<string, any>; // Variables for personalization
scheduledAt?: Date;
status: "pending" | "queued" | "sent" | "delivered" | "failed" | "read";
createdAt: Date;
sentAt?: Date;
deliveredAt?: Date;
readAt?: Date;
retryCount: number;
metadata: Record<string, any>;
}
// User notification preferences
interface UserNotificationPreferences {
userId: string;
channels: {
push: boolean;
email: boolean;
sms: boolean;
inApp: boolean;
};
categories: {
marketing: boolean;
social: boolean;
transactional: boolean; // Usually can't be disabled
security: boolean; // Usually can't be disabled
};
quietHours?: {
start: string; // "22:00"
end: string; // "08:00"
timezone: string;
};
}
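The preference model above implies two checks: whether a channel and category are allowed, and whether the current time falls inside quiet hours. A minimal sketch of both follows, under stated assumptions: `Prefs` is a simplified stand-in for `UserNotificationPreferences`, the caller has already converted the current time to minutes since midnight in the user's timezone, and `MANDATORY_CATEGORIES` and `minutesUntilQuietEnd` are hypothetical names introduced here.

```typescript
// Simplified shapes mirroring UserNotificationPreferences (timezone handling omitted).
type Channel = "push" | "email" | "sms" | "inApp";

interface Prefs {
  channels: Record<Channel, boolean>;
  categories: Record<string, boolean>;
  quietHours?: { start: string; end: string }; // "HH:MM" in the user's local time
}

// Assumption: these categories cannot be opted out of at the category level.
const MANDATORY_CATEGORIES = ["transactional", "security"];

function isAllowed(prefs: Prefs, channel: Channel, category: string): boolean {
  // Assumption: a channel opt-out always wins, even for mandatory categories.
  if (!prefs.channels[channel]) return false;
  if (MANDATORY_CATEGORIES.indexOf(category) !== -1) return true;
  // Categories the user has no setting for are treated as opted out.
  return prefs.categories[category] === true;
}

// Parse "HH:MM" into minutes since midnight.
function toMinutes(hhmm: string): number {
  const parts = hhmm.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

// nowMinutes: minutes since midnight in the user's timezone.
function isQuietHours(prefs: Prefs, nowMinutes: number): boolean {
  const quiet = prefs.quietHours;
  if (!quiet) return false;
  const start = toMinutes(quiet.start);
  const end = toMinutes(quiet.end);
  if (start === end) return false; // assumption: an empty window disables quiet hours
  return start < end
    ? nowMinutes >= start && nowMinutes < end // window within one day
    : nowMinutes >= start || nowMinutes < end; // window wraps past midnight
}

// Minutes to wait before a deferred notification may be sent (0 if not in quiet hours).
function minutesUntilQuietEnd(prefs: Prefs, nowMinutes: number): number {
  const quiet = prefs.quietHours;
  if (!quiet || !isQuietHours(prefs, nowMinutes)) return 0;
  return (toMinutes(quiet.end) - nowMinutes + 1440) % 1440;
}

const examplePrefs: Prefs = {
  channels: { push: true, email: true, sms: false, inApp: true },
  categories: { marketing: false, social: true },
  quietHours: { start: "22:00", end: "08:00" },
};
```

The wrap-around branch matters because a window such as 22:00 to 08:00 spans midnight, so a plain `start <= now < end` comparison would never match.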
Architecture Components
- Notification Service (API): Accepts notification requests from other services, validates, and enqueues them
- Preference Service: Checks user preferences and filters out unwanted notifications
- Priority Queue (Kafka/RabbitMQ): Separate queues (topics, in Kafka) for each priority level
- Rate Limiter: Prevents sending too many notifications to a single user
- Template Engine: Renders notification content from templates and user data
- Channel Workers: Separate worker pools for each delivery channel
- Delivery Tracker: Records delivery status and generates analytics
class NotificationService {
private preferenceService: PreferenceService;
private rateLimiter: RateLimiter;
private templateEngine: TemplateEngine;
private queue: MessageQueue;
async send(request: NotificationRequest): Promise<string> {
// Step 1: Validate the request
this.validateRequest(request);
// Step 2: Check user preferences
const prefs = await this.preferenceService.getPreferences(request.userId);
if (!this.isAllowed(request, prefs)) {
return "FILTERED_BY_PREFERENCE";
}
// Step 3: Check rate limits
const allowed = await this.rateLimiter.checkLimit(
request.userId,
request.type,
request.category
);
if (!allowed) {
return "RATE_LIMITED";
}
// Step 4: Check quiet hours
if (request.priority !== "critical" && this.isQuietHours(prefs)) {
// Schedule for after quiet hours
request.scheduledAt = this.getEndOfQuietHours(prefs);
}
// Step 5: Render template
const content = await this.templateEngine.render(
request.templateId,
request.templateData
);
// Step 6: Create notification record
const notification: Notification = {
id: generateId(),
userId: request.userId,
type: request.type,
category: request.category,
priority: request.priority,
templateId: request.templateId,
templateData: request.templateData,
scheduledAt: request.scheduledAt, // Set in Step 4 when delivery is deferred past quiet hours
status: "queued",
createdAt: new Date(),
retryCount: 0,
metadata: { renderedContent: content },
};
// Step 7: Enqueue to priority queue
const queueName = `notifications.${request.priority}`;
await this.queue.publish(queueName, notification);
return notification.id;
}
}
Step 4: Priority Queues and Rate Limiting
Not all notifications are equally urgent. A security alert should be delivered immediately, while a marketing email can wait. Use separate queues for each priority level, with different consumer concurrency settings.
// Priority queue configuration
const queueConfig = {
critical: { concurrency: 100, maxRetries: 5, retryDelay: 1000 }, // 2FA codes, security alerts
high: { concurrency: 50, maxRetries: 3, retryDelay: 5000 }, // Order confirmations
medium: { concurrency: 20, maxRetries: 3, retryDelay: 30000 }, // Social notifications
low: { concurrency: 5, maxRetries: 2, retryDelay: 60000 }, // Marketing, digests
};
// Rate limiter implementation
class NotificationRateLimiter {
private redis: RedisClient;
// Rate limit rules
private rules = {
push: { perHour: 10, perDay: 50 },
email: { perHour: 5, perDay: 20 },
sms: { perHour: 3, perDay: 10 },
in_app: { perHour: 30, perDay: 100 }, // Key must match the "in_app" value of Notification.type
};
async checkLimit(
userId: string,
channel: string,
category: string
): Promise<boolean> {
// Transactional and security notifications bypass rate limits
if (category === "transactional" || category === "security") {
return true;
}
const rule = this.rules[channel];
if (!rule) {
throw new Error(`No rate limit rule for channel "${channel}"`);
}
const hourKey = `ratelimit:${userId}:${channel}:hour:${currentHour()}`;
const dayKey = `ratelimit:${userId}:${channel}:day:${currentDay()}`;
const [hourCount, dayCount] = await Promise.all([
this.redis.incr(hourKey),
this.redis.incr(dayKey),
]);
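// Set expiry on first increment. INCR and EXPIRE are separate commands, so a crash
// between them leaves a key without a TTL; a Lua script or MULTI/EXEC closes that gap.
if (hourCount === 1) await this.redis.expire(hourKey, 3600);
if (dayCount === 1) await this.redis.expire(dayKey, 86400);
// Note: rejected attempts were already counted above, so they still consume quota.
return hourCount <= rule.perHour && dayCount <= rule.perDay;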
}
}
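The limiter above increments first and checks afterwards, so rejected attempts still count against the user. An alternative is to check every window before recording the send. The sketch below shows that variant as an in-memory fixed-window counter. The `FixedWindowCounter` class, the `WindowRule` shape, and the plain object standing in for Redis are all assumptions made for illustration. A Redis-backed version would need the check and increment to run atomically, for example in a Lua script.

```typescript
interface WindowRule {
  limit: number; // maximum sends per window
  windowMs: number; // window length in milliseconds
}

class FixedWindowCounter {
  // Key -> count for one window. A plain object stands in for Redis here,
  // and old windows are never evicted in this sketch.
  private counts: Record<string, number> = {};

  // Returns true and records the send only if every window is under its limit.
  tryAcquire(userId: string, channel: string, rules: WindowRule[], nowMs: number): boolean {
    const keys: string[] = [];
    for (const rule of rules) {
      const windowIndex = Math.floor(nowMs / rule.windowMs);
      const key = `${userId}:${channel}:${rule.windowMs}:${windowIndex}`;
      if ((this.counts[key] || 0) >= rule.limit) return false; // nothing is counted on rejection
      keys.push(key);
    }
    for (const key of keys) {
      this.counts[key] = (this.counts[key] || 0) + 1;
    }
    return true;
  }
}

// Mirrors the SMS rule above: 3 per hour, 10 per day.
const smsRules: WindowRule[] = [
  { limit: 3, windowMs: 60 * 60 * 1000 },
  { limit: 10, windowMs: 24 * 60 * 60 * 1000 },
];
```

Passing the clock in as `nowMs` keeps the sketch deterministic and easy to test. Fixed windows still allow a burst of up to twice the limit across a window boundary, which a sliding-window or token-bucket limiter would smooth out.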
Step 5: Template Management
Notifications should not contain hardcoded text. A template system allows non-engineers to modify notification content without code changes, and supports localization.
interface NotificationTemplate {
id: string;
name: string;
channels: {
push?: { title: string; body: string; };
email?: { subject: string; htmlBody: string; textBody: string; };
sms?: { body: string; };
inApp?: { title: string; body: string; actionUrl: string; };
};
variables: string[]; // Required template variables
locale: string; // "en", "es", "fr"
}
// Example template
const orderShippedTemplate: NotificationTemplate = {
id: "order_shipped_v2",
name: "Order Shipped",
channels: {
push: {
title: "Your order is on its way!",
body: "Order #{{orderId}} has shipped. Track: {{trackingUrl}}",
},
email: {
subject: "Your order #{{orderId}} has shipped",
htmlBody: "<h1>Great news, {{userName}}!</h1><p>Your order has shipped...</p>",
textBody: "Great news, {{userName}}! Your order has shipped...",
},
sms: {
body: "Your order #{{orderId}} shipped! Track at {{trackingUrl}}",
},
},
variables: ["orderId", "userName", "trackingUrl"],
locale: "en",
};
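The `{{variable}}` placeholders above need a rendering step that substitutes values and rejects incomplete data. The following is a minimal sketch of that step, not the actual `TemplateEngine`. The `renderTemplate` function and its signature are assumptions for illustration.

```typescript
// Replace {{name}} placeholders with values from `data`.
// Throws if any required variable is absent, so a half-rendered message is never sent.
function renderTemplate(
  text: string,
  requiredVariables: string[],
  data: Record<string, string | number>
): string {
  const has = (key: string): boolean => Object.prototype.hasOwnProperty.call(data, key);

  const missing = requiredVariables.filter((name) => !has(name));
  if (missing.length > 0) {
    throw new Error(`Missing template variables: ${missing.join(", ")}`);
  }

  // Placeholders that are not in `data` are left untouched rather than blanked out.
  return text.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) =>
    has(name) ? String(data[name]) : match
  );
}

const renderedPush = renderTemplate(
  "Order #{{orderId}} has shipped. Track: {{trackingUrl}}",
  ["orderId", "trackingUrl"],
  { orderId: 1234, trackingUrl: "https://example.com/t/1234" }
);
```

For the `htmlBody` channel, values would also need HTML escaping before substitution. That step is omitted here.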
Step 6: Delivery Tracking and Analytics
class DeliveryTracker {
// Track delivery status changes
async updateStatus(
notificationId: string,
status: Notification["status"],
metadata?: Record<string, any>
): Promise<void> {
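// Only "sent", "delivered", and "read" have timestamp fields on Notification
const hasTimestamp = status === "sent" || status === "delivered" || status === "read";
await this.db.notifications.update(notificationId, {
status,
...(hasTimestamp ? { [`${status}At`]: new Date() } : {}),
// Assumes update() merges metadata; if it replaces the field, renderedContent would be lost
metadata: { ...metadata },
});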
// Emit event for analytics pipeline
await this.eventBus.emit("notification.status_changed", {
notificationId,
status,
timestamp: Date.now(),
});
}
// Analytics queries
async getDeliveryStats(
timeRange: { start: Date; end: Date },
groupBy: "channel" | "category" | "priority"
): Promise<DeliveryStats[]> {
// Returns: sent count, delivered count, read count, failed count
// Grouped by the specified dimension
return this.analyticsDB.query({
metrics: ["sent", "delivered", "read", "failed"],
dimensions: [groupBy],
timeRange,
});
}
}
// Delivery rate metrics to track:
// - Send rate: notifications sent per second
// - Delivery rate: % of sent notifications confirmed delivered
// - Open rate: % of delivered notifications opened/read (email, push)
// - Click-through rate: % that clicked on a CTA
// - Bounce rate: % that failed delivery (email bounces, invalid tokens)
// - Unsubscribe rate: users opting out after receiving
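The rate metrics listed above can be derived from the per-status counts that `getDeliveryStats` returns. The sketch below shows one way to compute them. It assumes the counts are cumulative (a delivered notification is also counted as sent) and that failed notifications never reached the sent state. The `StatusCounts` shape and `computeRates` function are hypothetical names.

```typescript
interface StatusCounts {
  sent: number;
  delivered: number;
  read: number;
  failed: number;
}

// Avoid division by zero for empty time ranges.
function safeRatio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function computeRates(counts: StatusCounts) {
  return {
    // Share of sent notifications that the provider confirmed as delivered.
    deliveryRate: safeRatio(counts.delivered, counts.sent),
    // Share of delivered notifications that were opened or read.
    openRate: safeRatio(counts.read, counts.delivered),
    // Share of all attempts (sent + failed) that ended in failure.
    failureRate: safeRatio(counts.failed, counts.sent + counts.failed),
  };
}

const exampleRates = computeRates({ sent: 1000, delivered: 950, read: 380, failed: 50 });
```

With these example counts the delivery rate is 95% and the open rate is 40%. Channels that provide no delivery receipt would need a different denominator.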
Step 7: Retry Mechanisms
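Notification delivery can fail for various reasons: provider outages, invalid device tokens, rate limits from external providers, or network issues. A robust retry strategy is essential, and it should distinguish transient failures (outages, throttling, network errors), which are worth retrying, from permanent ones such as an invalid device token, which are not.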
class NotificationWorker {
async processNotification(notification: Notification): Promise<void> {
try {
const result = await this.deliverByChannel(notification);
if (result.success) {
await this.tracker.updateStatus(notification.id, "sent");
} else {
throw new Error(result.error);
}
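} catch (error) {
// A caught value is not guaranteed to be an Error, so normalize it first
const failure = error instanceof Error ? error : new Error(String(error));
await this.handleFailure(notification, failure);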
}
}
private async handleFailure(
notification: Notification,
error: Error
): Promise<void> {
const config = queueConfig[notification.priority];
if (notification.retryCount >= config.maxRetries) {
// Max retries exceeded - mark as failed
await this.tracker.updateStatus(notification.id, "failed", {
error: error.message,
finalRetryAt: new Date(),
});
// Move to dead letter queue for investigation
await this.queue.publish("notifications.dead_letter", notification);
return;
}
// Exponential backoff
const delay = config.retryDelay * Math.pow(2, notification.retryCount);
notification.retryCount++;
await this.queue.publishWithDelay(
`notifications.${notification.priority}`,
notification,
delay
);
}
private async deliverByChannel(notification: Notification): Promise<DeliveryResult> {
switch (notification.type) {
case "push":
return this.pushProvider.send(notification);
case "email":
return this.emailProvider.send(notification);
case "sms":
return this.smsProvider.send(notification);
case "in_app":
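return this.inAppDelivery.send(notification);
default:
// Messages arrive from a queue, so guard against types the compiler cannot rule out
throw new Error(`Unsupported notification type: ${String(notification.type)}`);
}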
}
}
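To see what the backoff formula in `handleFailure` produces, the sketch below computes the delay schedule for a given queue configuration. The delay cap and the jitter variant are additions that the code above does not have. `maxDelayMs`, `backoffWithJitter`, and `retrySchedule` are hypothetical names, and the 10-minute cap is an assumed value.

```typescript
// Delay before the next attempt: baseDelayMs * 2^retryCount, capped at maxDelayMs.
function backoffDelay(baseDelayMs: number, retryCount: number, maxDelayMs: number): number {
  return Math.min(baseDelayMs * Math.pow(2, retryCount), maxDelayMs);
}

// "Full jitter" variant: a uniformly random delay in [0, capped delay), which
// spreads out retries that would otherwise hit a recovering provider together.
// The random source is injectable so the function can be tested deterministically.
function backoffWithJitter(
  baseDelayMs: number,
  retryCount: number,
  maxDelayMs: number,
  random: () => number = Math.random
): number {
  return Math.floor(random() * backoffDelay(baseDelayMs, retryCount, maxDelayMs));
}

// All delays a notification goes through before it is moved to the dead letter queue.
function retrySchedule(baseDelayMs: number, maxRetries: number, maxDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retryCount = 0; retryCount < maxRetries; retryCount++) {
    delays.push(backoffDelay(baseDelayMs, retryCount, maxDelayMs));
  }
  return delays;
}

const ASSUMED_MAX_DELAY_MS = 10 * 60 * 1000; // assumed cap of 10 minutes

// Critical queue from queueConfig: retryDelay 1000 ms, maxRetries 5.
const criticalSchedule = retrySchedule(1000, 5, ASSUMED_MAX_DELAY_MS);
console.log(criticalSchedule); // [1000, 2000, 4000, 8000, 16000]
```

For the critical queue the delays add up to 31 seconds, so a notification that needs every retry lands well outside the sub-second latency target. That target realistically applies to the first attempt only.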
Key Design Principles
- Decouple sending from delivery: Use message queues between the notification API and channel workers
- Respect user preferences: Always check opt-in/opt-out before sending
- Idempotency: Use notification IDs to prevent duplicate sends on retry
- Provider abstraction: Use an adapter pattern so you can swap providers (e.g., switch from Twilio to Vonage) without changing core logic
- Graceful degradation: If push notifications fail, fall back to email or in-app notifications
- Observability: Log every state transition and track delivery metrics per channel
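The idempotency principle deserves a concrete illustration, because at-least-once delivery combined with retries means a worker can receive the same notification more than once. The sketch below guards a send with a record of completed notification IDs. It is a simplified, synchronous illustration: `IdempotentSender`, `SendFn`, and `SendOutcome` are hypothetical names, and the in-memory record stands in for a shared store.

```typescript
// Stand-in for a provider call; returns true on success. Synchronous for brevity.
type SendFn = (notificationId: string, payload: string) => boolean;
type SendOutcome = "sent" | "duplicate" | "failed";

class IdempotentSender {
  // IDs that were already sent successfully. In a multi-worker deployment this
  // would live in a shared store rather than in process memory.
  private completed: Record<string, boolean> = {};
  private sendFn: SendFn;

  constructor(sendFn: SendFn) {
    this.sendFn = sendFn;
  }

  send(notificationId: string, payload: string): SendOutcome {
    // A redelivered message for an ID that already succeeded is skipped.
    if (this.completed[notificationId] === true) return "duplicate";
    // Failed sends are not recorded, so a later retry can still go through.
    if (!this.sendFn(notificationId, payload)) return "failed";
    this.completed[notificationId] = true;
    return "sent";
  }
}
```

Because the ID is recorded only after the provider call succeeds, a crash between the two steps can still produce a duplicate. This narrows the window for duplicates but does not give exactly-once delivery.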