Incident Response

Prepare for and respond to security incidents with structured playbooks, communication plans, and post-incident reviews.

What is Incident Response?

Incident response is the organized approach to addressing and managing the aftermath of a security breach or cyberattack. The goal is to handle the situation in a way that limits damage, reduces recovery time and costs, and prevents recurrence. No matter how strong your defenses are, incidents will happen — what matters is how quickly and effectively you respond.

A well-prepared incident response plan can be the difference between a minor security event and a catastrophic breach. Organizations without incident response plans take an average of 287 days to identify and contain a breach, compared to 214 days for those with plans. The resulting difference in cost can reach millions of dollars.

Incident Response Phases (NIST Framework)

Preparation: Establish policies, procedures, tools, and team roles before an incident occurs. Train regularly.
Detection and Analysis: Identify that a security event has occurred, determine its scope and severity, and classify it.
Containment: Limit the damage by isolating affected systems while preserving evidence for investigation.
Eradication: Remove the threat completely from your environment by patching vulnerabilities, removing malware, and revoking compromised credentials.
Recovery: Restore systems to normal operation, verify they are clean, and monitor for recurrence.
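Post-Incident Review: Document lessons learned, identify process improvements, and update your response plan.

Note: NIST SP 800-61 Rev. 2 groups containment, eradication, and recovery into a single phase and calls the final phase "post-incident activity". They are listed separately here so that each step is explicit.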
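
The phases above form a loose sequence, which can be modelled as a small state machine so that tooling can reject out-of-order status changes. This is a minimal sketch: the names `IrPhase`, `allowedTransitions`, `canTransition`, and `advance` are hypothetical, and the transitions chosen (including the loops back to detection when new evidence appears) are an illustrative assumption rather than part of the NIST guidance.

```typescript
// Hypothetical model of the six phases listed above.
type IrPhase =
  | "preparation"
  | "detection_and_analysis"
  | "containment"
  | "eradication"
  | "recovery"
  | "post_incident_review";

// Assumed transitions: mostly linear, with loops back to detection when new
// evidence surfaces, and from the review back to preparation.
const allowedTransitions: Record<IrPhase, IrPhase[]> = {
  preparation: ["detection_and_analysis"],
  detection_and_analysis: ["containment"],
  containment: ["eradication", "detection_and_analysis"],
  eradication: ["recovery", "detection_and_analysis"],
  recovery: ["post_incident_review", "detection_and_analysis"],
  post_incident_review: ["preparation"],
};

// Returns true when moving from one phase to the other is allowed.
function canTransition(from: IrPhase, to: IrPhase): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// Returns the next phase, or throws if the transition is not allowed.
function advance(current: IrPhase, next: IrPhase): IrPhase {
  if (!canTransition(current, next)) {
    throw new Error(`Invalid phase transition: ${current} -> ${next}`);
  }
  return next;
}
```

An incident tracker could call `advance` whenever a responder changes the incident status, so that, for example, an incident cannot be marked as recovered before it has been contained.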

Building Detection Systems

You cannot respond to incidents you do not detect. Comprehensive logging, monitoring, and alerting are the foundation of incident detection. Log all security-relevant events, set up alerts for suspicious patterns, and regularly review logs for anomalies.

// Security event logging and anomaly detection
// Assumes logger, db, slack, pagerDuty, email, and formatAlert are provided by your application.
interface SecurityEvent {
  timestamp: string;
  eventType: string;
  severity: "info" | "warning" | "critical";
  userId?: string;
  ip: string;
  userAgent: string;
  resource: string;
  action: string;
  outcome: "success" | "failure";
  details: Record<string, unknown>;
}

// Shape of the alerts raised by SecurityMonitor
interface SecurityAlert {
  type: string;
  severity: "info" | "warning" | "critical";
  message: string;
  ip?: string;
  userId?: string;
  recommendation: string;
}

class SecurityMonitor {
  private events: SecurityEvent[] = [];
  private alertThresholds = {
    failedLogins: { count: 5, windowMinutes: 15 },
    apiErrors: { count: 50, windowMinutes: 5 },
    newAdminAccounts: { count: 1, windowMinutes: 1 },
    dataExport: { count: 3, windowMinutes: 60 },
  };

  async recordEvent(event: SecurityEvent): Promise<void> {
    // Store event in your logging system (ELK, Datadog, CloudWatch).
    // countRecentEvents assumes events also land in the security_events table.
    await logger.info("security_event", event);

    // Check for anomalies
    await this.checkAnomalies(event);
  }

  private async checkAnomalies(event: SecurityEvent): Promise<void> {
    // Detect brute force login attempts
    if (event.eventType === "LOGIN" && event.outcome === "failure") {
      const recentFailures = await this.countRecentEvents(
        "LOGIN",
        "failure",
        event.ip,
        this.alertThresholds.failedLogins.windowMinutes
      );

      if (recentFailures >= this.alertThresholds.failedLogins.count) {
        await this.triggerAlert({
          type: "BRUTE_FORCE_DETECTED",
          severity: "critical",
          message: `${recentFailures} failed login attempts from IP ${event.ip}`,
          ip: event.ip,
          recommendation: "Block IP and investigate",
        });
      }
    }

    // Detect privilege escalation
    if (event.eventType === "ROLE_CHANGE" && event.details.newRole === "admin") {
      await this.triggerAlert({
        type: "PRIVILEGE_ESCALATION",
        severity: "critical",
        message: `New admin account created: ${event.userId}`,
        userId: event.userId,
        recommendation: "Verify this was authorized",
      });
    }

    // Detect unusual data access patterns (skipped when the event has no user ID)
    if (event.eventType === "DATA_EXPORT" && event.userId) {
      const recentExports = await this.countRecentEvents(
        "DATA_EXPORT",
        "success",
        event.userId,
        this.alertThresholds.dataExport.windowMinutes
      );

      if (recentExports >= this.alertThresholds.dataExport.count) {
        await this.triggerAlert({
          type: "DATA_EXFILTRATION_SUSPECTED",
          severity: "critical",
          message: `Unusual data export volume by user ${event.userId}`,
          userId: event.userId,
          recommendation: "Disable account and investigate",
        });
      }
    }
  }

  private async countRecentEvents(
    type: string,
    outcome: string,
    identifier: string,
    windowMinutes: number
  ): Promise<number> {
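    // Query your event store for matching events inside the time window.
    // The identifier is matched against either the source IP or the user ID.
    const since = new Date(Date.now() - windowMinutes * 60 * 1000);
    const result = await db.query(
      "SELECT COUNT(*) FROM security_events WHERE event_type = $1 AND outcome = $2 AND (ip = $3 OR user_id = $3) AND timestamp > $4",
      [type, outcome, identifier, since]
    );
    // COUNT(*) comes back as a string, so parse it explicitly as base 10
    return parseInt(result.rows[0].count, 10);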
  }

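  private async triggerAlert(alert: SecurityAlert): Promise<void> {
    // Send alerts through multiple channels. allSettled ensures that one failing
    // channel (for example, Slack being down) does not block the others.
    const results = await Promise.allSettled([
      slack.sendToChannel("#security-alerts", formatAlert(alert)),
      pagerDuty.createIncident(alert),
      email.sendToSecurityTeam(alert),
      db.query(
        "INSERT INTO security_alerts (type, severity, message, details, created_at) VALUES ($1, $2, $3, $4, NOW())",
        [alert.type, alert.severity, alert.message, JSON.stringify(alert)]
      ),
    ]);

    // Record delivery failures (logger.error is assumed to mirror logger.info)
    for (const result of results) {
      if (result.status === "rejected") {
        await logger.error("alert_delivery_failed", { type: alert.type, reason: String(result.reason) });
      }
    }
  }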
}
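
The threshold checks in `checkAnomalies` come down to counting events per key (an IP address or a user ID) inside a time window. The sketch below shows that idea with an in-memory counter, which can be handy for unit tests or a single-process service. `SlidingWindowCounter` is a hypothetical name; a production system would more likely count events in a shared store, as `countRecentEvents` does, so that counts survive restarts and cover all instances.

```typescript
// Hypothetical in-memory sliding-window counter, for illustration only.
class SlidingWindowCounter {
  private windowMs: number;
  private threshold: number;
  // Event timestamps (ms since epoch) grouped by key, e.g. a source IP.
  // Object.create(null) avoids clashes with inherited property names.
  private timestampsByKey: { [key: string]: number[] } = Object.create(null);

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Records one event for the key and returns true when the number of events
  // inside the window has reached the threshold.
  record(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.timestampsByKey[key] || []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.timestampsByKey[key] = recent;
    return recent.length >= this.threshold;
  }
}

// Mirrors the failedLogins threshold above: 5 failures within 15 minutes.
const failedLoginCounter = new SlidingWindowCounter(15 * 60 * 1000, 5);
```

Calling `failedLoginCounter.record(ip, Date.now())` on each failed login returns `true` once the fifth failure from that IP falls inside the same 15-minute window. Passing the current time as a parameter keeps the class easy to test.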

Incident Response Playbooks

Playbooks are step-by-step procedures for handling specific types of incidents. Having pre-written playbooks ensures consistent, thorough responses even during the stress and chaos of an active incident. Each playbook should define who to notify, what actions to take, and how to communicate with stakeholders.

// Incident response playbook as code
interface IncidentPlaybook {
  type: string;
  severity: "low" | "medium" | "high" | "critical";
  steps: PlaybookStep[];
  communicationPlan: CommunicationPlan;
  rollbackProcedure?: string;
}

interface PlaybookStep {
  order: number;
  action: string;
  responsible: string;
  automated: boolean;
  timeLimit: string;
}

// Who to notify, when, and through which channel
interface NotificationRule {
  when: string;
  who: string;
  channel: string;
}

interface CommunicationPlan {
  internal: NotificationRule[];
  external: NotificationRule[];
}

const credentialLeakPlaybook: IncidentPlaybook = {
  type: "CREDENTIAL_LEAK",
  severity: "critical",
  steps: [
    {
      order: 1,
      action: "Rotate all exposed credentials immediately",
      responsible: "On-call engineer",
      automated: true,
      timeLimit: "15 minutes",
    },
    {
      order: 2,
      action: "Revoke all active sessions for affected users/services",
      responsible: "On-call engineer",
      automated: true,
      timeLimit: "15 minutes",
    },
    {
      order: 3,
      action: "Audit access logs for unauthorized usage of leaked credentials",
      responsible: "Security team",
      automated: false,
      timeLimit: "1 hour",
    },
    {
      order: 4,
      action: "Identify the source of the leak (git commit, log, etc.)",
      responsible: "Security team",
      automated: false,
      timeLimit: "2 hours",
    },
    {
      order: 5,
      action: "Remove credentials from the source (git history, logs, etc.)",
      responsible: "On-call engineer",
      automated: false,
      timeLimit: "4 hours",
    },
    {
      order: 6,
      action: "Implement controls to prevent recurrence",
      responsible: "Engineering lead",
      automated: false,
      timeLimit: "1 week",
    },
  ],
  communicationPlan: {
    internal: [
      { when: "immediately", who: "Security team lead", channel: "PagerDuty" },
      { when: "within 30min", who: "Engineering manager", channel: "Slack DM" },
      { when: "within 1hr", who: "CTO", channel: "Phone" },
    ],
    external: [
      { when: "if user data affected", who: "Affected users", channel: "Email" },
      { when: "if legally required", who: "Regulatory bodies", channel: "Formal notice" },
    ],
  },
};

// Automated incident response actions.
// SecurityIncident, rotateApiKey, rotateDatabasePassword, and revokeAllSessions
// are assumed to be defined elsewhere in your codebase.
async function executeCredentialLeakResponse(incident: SecurityIncident): Promise<string[]> {
  const timeline: string[] = [];
  // Timestamp every entry so the timeline can feed the post-incident review
  const record = (message: string) => {
    timeline.push(`${new Date().toISOString()} - ${message}`);
  };

  // Step 1: Rotate credentials
  record("Starting credential rotation");

  if (incident.details.credentialType === "api_key") {
    await rotateApiKey(incident.details.keyId);
    record("API key rotated");
  } else if (incident.details.credentialType === "database") {
    await rotateDatabasePassword(incident.details.database);
    record("Database password rotated");
  } else {
    // Make unsupported credential types visible instead of silently skipping them
    record(`No automated rotation for credential type "${incident.details.credentialType}"; rotate manually`);
  }

  // Step 2: Revoke active sessions
  for (const userId of incident.details.affectedUsers ?? []) {
    await revokeAllSessions(userId);
    record(`Sessions revoked for user ${userId}`);
  }

  // Step 3: Create incident record
  await db.query(
    "INSERT INTO incidents (type, severity, timeline, status, created_at) VALUES ($1, $2, $3, $4, NOW())",
    [incident.type, incident.severity, JSON.stringify(timeline), "investigating"]
  );

  return timeline;
}
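
The `timeLimit` values in the playbook above are human-readable strings, so they have to be converted into durations before tooling can flag steps that are running late. The sketch below assumes the "<number> <unit>" format used in the playbook ("15 minutes", "1 hour", "1 week") and assumes that every limit is measured from the start of the incident; the playbook itself does not say which reference point applies. `parseTimeLimit`, `findOverdueSteps`, and `TimedStep` are hypothetical names.

```typescript
// Milliseconds per supported unit (singular form).
const unitMs: { [unit: string]: number } = {
  minute: 60 * 1000,
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  week: 7 * 24 * 60 * 60 * 1000,
};

// Converts strings such as "15 minutes" or "1 week" to milliseconds.
function parseTimeLimit(limit: string): number {
  const match = /^(\d+)\s+(minute|hour|day|week)s?$/.exec(limit.trim().toLowerCase());
  if (!match) {
    throw new Error(`Unrecognized time limit: "${limit}"`);
  }
  return parseInt(match[1], 10) * unitMs[match[2]];
}

// Only the PlaybookStep fields this helper needs.
interface TimedStep {
  order: number;
  action: string;
  timeLimit: string;
}

// Returns the steps that are not completed and whose deadline
// (incident start + time limit) has already passed.
function findOverdueSteps(
  steps: TimedStep[],
  startedAt: Date,
  now: Date,
  completedOrders: number[]
): TimedStep[] {
  return steps.filter((step) => {
    const deadline = startedAt.getTime() + parseTimeLimit(step.timeLimit);
    const isDone = completedOrders.indexOf(step.order) !== -1;
    return !isDone && now.getTime() > deadline;
  });
}
```

For example, `findOverdueSteps(credentialLeakPlaybook.steps, incidentStart, new Date(), [1, 2])` would list the manual steps that have exceeded their limit once the two automated steps are done.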

Post-Incident Review

After every security incident, conduct a blameless post-incident review (also called a post-mortem or retrospective). The goal is to understand what happened, why it happened, and how to prevent it from happening again. Focus on systemic improvements rather than blaming individuals.

// Post-incident review template
interface PostIncidentReview {
  incidentId: string;
  title: string;
  severity: string;
  timeline: TimelineEntry[];
  rootCause: string;
  impact: {
    usersAffected: number;
    dataExposed: string;
    downtime: string;
    financialImpact: string;
  };
  whatWentWell: string[];
  whatCouldBeImproved: string[];
  actionItems: ActionItem[];
}

interface ActionItem {
  description: string;
  owner: string;
  deadline: string;
  priority: "P0" | "P1" | "P2";
  status: "open" | "in_progress" | "completed";
}

interface TimelineEntry {
  time: string;
  event: string;
}

// Example post-incident review
const exampleReview: PostIncidentReview = {
  incidentId: "INC-2025-042",
  title: "API Key Exposed in Public GitHub Repository",
  severity: "High",
  timeline: [
    { time: "14:00 UTC", event: "Developer committed .env file to public repo" },
    { time: "14:15 UTC", event: "GitHub secret scanning alert received" },
    { time: "14:20 UTC", event: "On-call engineer acknowledged alert" },
    { time: "14:25 UTC", event: "API key rotated, old key revoked" },
    { time: "14:30 UTC", event: "Audit of API key usage showed no unauthorized access" },
    { time: "14:45 UTC", event: "Commit removed, git history rewritten" },
    { time: "15:00 UTC", event: "Incident resolved" },
  ],
  rootCause: "Missing .env in .gitignore template for new projects. No pre-commit hook to prevent secret commits.",
  impact: {
    usersAffected: 0,
    dataExposed: "No data exposure confirmed",
    downtime: "None",
    financialImpact: "Minimal - 1 hour of engineering time",
  },
  whatWentWell: [
    "GitHub secret scanning caught the leak within 15 minutes",
    "On-call responded quickly and followed the playbook",
    "API key rotation was automated and completed in 5 minutes",
    "Audit logs confirmed no unauthorized usage",
  ],
  whatCouldBeImproved: [
    "Pre-commit hooks should have prevented the commit",
    "Project templates should include .env in .gitignore by default",
    "Developer training on secret handling needs a refresh",
  ],
  actionItems: [
    {
      description: "Add git-secrets pre-commit hook to all repositories",
      owner: "Platform team",
      deadline: "2025-05-01",
      priority: "P0",
      status: "in_progress",
    },
    {
      description: "Update project templates to include .env in .gitignore",
      owner: "DevEx team",
      deadline: "2025-04-15",
      priority: "P1",
      status: "open",
    },
    {
      description: "Schedule security refresher training for all developers",
      owner: "Security team",
      deadline: "2025-05-15",
      priority: "P1",
      status: "open",
    },
  ],
};
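
A review only pays off if its action items are followed up. As a rough sketch, the helper below lists unfinished items that are past their deadline, highest priority first. It assumes deadlines are ISO dates in `YYYY-MM-DD` form, as in the example above, so that plain string comparison matches date order. `TrackedActionItem` repeats the `ActionItem` fields to keep the example self-contained, and `findOverdueActionItems` is a hypothetical name.

```typescript
// Same fields as ActionItem above, repeated so the sketch stands alone.
interface TrackedActionItem {
  description: string;
  owner: string;
  deadline: string; // ISO date, e.g. "2025-05-01"
  priority: "P0" | "P1" | "P2";
  status: "open" | "in_progress" | "completed";
}

// Returns unfinished items whose deadline is before `today` (also an ISO date),
// sorted by priority (P0 first) and then by earliest deadline.
function findOverdueActionItems(
  items: TrackedActionItem[],
  today: string
): TrackedActionItem[] {
  return items
    .filter((item) => item.status !== "completed" && item.deadline < today)
    .sort(
      (a, b) =>
        a.priority.localeCompare(b.priority) ||
        a.deadline.localeCompare(b.deadline)
    );
}
```

Running `findOverdueActionItems(exampleReview.actionItems, "2025-05-20")` against the example review would return the P0 pre-commit hook item first, followed by any P1 items that are still unfinished.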

Security Warning: Incident Response Pitfalls

Do not panic: Follow your playbook. Hasty actions can make things worse or destroy evidence.
Preserve evidence: Do not wipe or restart compromised systems before forensic analysis. Take snapshots first.
Communicate carefully: Avoid sharing details in public channels. Use secure, out-of-band communication.
Do not blame individuals: Blameless post-mortems encourage honesty and lead to better systemic improvements.
Know your legal obligations: Many jurisdictions require breach notification within specific timeframes (e.g., GDPR requires notifying the supervisory authority within 72 hours of becoming aware of a personal data breach).
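
Notification deadlines are easy to lose track of during an active incident, so it can help to compute them as soon as the breach is confirmed. The sketch below assumes the 72-hour GDPR window mentioned above, counted from the moment the organization became aware of the breach. Other regimes use different windows, which is why the window is a parameter. The function and constant names are hypothetical.

```typescript
// 72 hours, per the GDPR example above. Other regulations differ.
const GDPR_NOTIFICATION_WINDOW_HOURS = 72;

const MS_PER_HOUR = 60 * 60 * 1000;

// Returns the latest time by which the notification should be sent.
function notificationDeadline(awareAt: Date, windowHours: number): Date {
  return new Date(awareAt.getTime() + windowHours * MS_PER_HOUR);
}

// Returns the hours left before the deadline. Negative means it has passed.
function hoursRemaining(awareAt: Date, now: Date, windowHours: number): number {
  const msLeft = notificationDeadline(awareAt, windowHours).getTime() - now.getTime();
  return msLeft / MS_PER_HOUR;
}
```

A responder could surface `hoursRemaining(awareAt, new Date(), GDPR_NOTIFICATION_WINDOW_HOURS)` in the incident channel so that the deadline stays visible while the investigation continues.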

Incident Response Preparation Checklist

Document your plan: Write incident response procedures before you need them.
Establish an on-call rotation: Ensure someone is always available to respond to security alerts.
Set up alerting: Configure alerts for security events with appropriate severity levels.
Practice regularly: Run tabletop exercises and simulated incidents at least quarterly.
Maintain contact lists: Keep updated emergency contacts for security team, legal, PR, and executive leadership.
Prepare communication templates: Draft breach notification templates in advance for faster response.
Review and update: After every incident and every drill, update your plan based on lessons learned.
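
For the on-call rotation item in the checklist above, the person on duty can be derived from a roster and a start date instead of being maintained by hand. This is a minimal sketch assuming a weekly round-robin rotation. `onCallFor` and its parameters are hypothetical, and real schedules usually also need overrides for holidays and shift swaps, which a dedicated on-call tool handles better.

```typescript
const MS_PER_WEEK = 7 * 24 * 60 * 60 * 1000;

// Returns who is on call at time `at`, assuming each person covers one full
// week in roster order, starting at `rotationStart`.
function onCallFor(roster: string[], rotationStart: Date, at: Date): string {
  if (roster.length === 0) {
    throw new Error("Roster is empty");
  }
  const weeksElapsed = Math.floor(
    (at.getTime() - rotationStart.getTime()) / MS_PER_WEEK
  );
  // The double modulo keeps the index valid for dates before rotationStart.
  const index = ((weeksElapsed % roster.length) + roster.length) % roster.length;
  return roster[index];
}
```

For example, with a roster of three engineers, the fourth week of the rotation wraps around to the first engineer again.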
