Intermediate · 40 min · Full Guide

Deep Learning

Advanced neural networks with multiple layers for complex pattern recognition

What is Deep Learning?

Deep Learning is a subset of machine learning that uses neural networks with multiple hidden layers (hence "deep") to learn hierarchical representations of data. While traditional neural networks might have 1-2 hidden layers, deep learning models can have dozens or even hundreds of layers.

🚀 Why "Deep" Matters:

Each layer learns increasingly abstract features. In image recognition: first layers detect edges, middle layers detect shapes, final layers recognize objects. This hierarchical learning is what makes deep learning so powerful!

Shallow vs Deep Networks

Shallow Network

Input Layer
Hidden Layer (1-2)
Output Layer
  • ✓ Simple patterns
  • ✓ Faster training
  • ✓ Less data needed
  • ✗ Limited complexity

Deep Network

Input Layer
Hidden Layer 1
Hidden Layer 2
Hidden Layer 3+
Output Layer
  • ✓ Complex patterns
  • ✓ Hierarchical features
  • ✓ Better accuracy
  • ✗ More data & compute
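
To make the contrast concrete, here is a minimal sketch (plain JavaScript with made-up layer sizes and random, untrained weights) of a forward pass through stacked fully connected layers; the shallow and deep networks differ only in how many layers are stacked:

// Minimal fully connected layer (illustrative only, random weights)
class DenseLayer {
  constructor(inputSize, outputSize) {
    this.weights = Array(outputSize).fill(0).map(() =>
      Array(inputSize).fill(0).map(() => Math.random() * 0.2 - 0.1)
    );
    this.bias = Array(outputSize).fill(0);
  }

  forward(input) {
    // output[i] = ReLU(sum_j weights[i][j] * input[j] + bias[i])
    return this.weights.map((row, i) =>
      Math.max(0, row.reduce((sum, w, j) => sum + w * input[j], this.bias[i]))
    );
  }
}

// Shallow: one hidden layer. Deep: several hidden layers stacked.
const shallow = [new DenseLayer(4, 8), new DenseLayer(8, 2)];
const deep = [
  new DenseLayer(4, 16), new DenseLayer(16, 16),
  new DenseLayer(16, 16), new DenseLayer(16, 2)
];

const runNetwork = (layers, input) =>
  layers.reduce((activation, layer) => layer.forward(activation), input);

console.log("Shallow output:", runNetwork(shallow, [0.1, 0.5, 0.2, 0.9]));
console.log("Deep output:", runNetwork(deep, [0.1, 0.5, 0.2, 0.9]));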

Key Deep Learning Architectures

🖼️ Convolutional Neural Networks (CNNs)

Specialized for processing grid-like data, especially images. Uses convolutional layers to detect spatial patterns.

Best for:

Image classification, object detection, face recognition, medical imaging, self-driving cars

📝 Recurrent Neural Networks (RNNs)

Has memory of previous inputs through feedback loops. Processes sequential data one step at a time.

Best for:

Text generation, speech recognition, time series prediction, language translation, video analysis

🎯 Long Short-Term Memory (LSTM)

Advanced RNN that can remember long-term dependencies through gating (a minimal sketch of one gated step follows below). Mitigates the vanishing gradient problem that limits plain RNNs.

Best for:

Machine translation, text summarization, sentiment analysis, music generation, stock prediction
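
The gate arithmetic is easiest to see on a single step. Here is a minimal, hypothetical scalar LSTM step (invented weights, not a trained model) showing how the forget, input, and output gates control the cell state:

// Minimal scalar LSTM step (illustrative weights, not a trained model)
function lstmStep(x, prevH, prevC, w) {
  const sigmoid = z => 1 / (1 + Math.exp(-z));

  const forgetGate = sigmoid(w.wf * x + w.uf * prevH + w.bf); // what to erase from the cell
  const inputGate  = sigmoid(w.wi * x + w.ui * prevH + w.bi); // how much new info to write
  const outputGate = sigmoid(w.wo * x + w.uo * prevH + w.bo); // how much of the cell to expose
  const candidate  = Math.tanh(w.wc * x + w.uc * prevH + w.bc);

  const c = forgetGate * prevC + inputGate * candidate; // long-term cell state
  const h = outputGate * Math.tanh(c);                  // short-term hidden state
  return { h, c };
}

// Example with arbitrary weights
const lstmWeights = {
  wf: 0.5, uf: 0.1, bf: 0, wi: 0.4, ui: 0.2, bi: 0,
  wo: 0.3, uo: 0.1, bo: 0, wc: 0.6, uc: 0.2, bc: 0
};
let state = { h: 0, c: 0 };
for (const x of [1, 0, 1, 1]) {
  state = lstmStep(x, state.h, state.c, lstmWeights);
}
console.log("Hidden state after sequence:", state.h.toFixed(4));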

Generative Adversarial Networks (GANs)

Two networks compete: a generator creates fake data while a discriminator tries to tell real from fake (the training loop is sketched below).

Best for:

Image generation, style transfer, data augmentation, deepfakes, art creation, super-resolution
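
The adversarial objective can be sketched with stand-in functions. The generator and discriminator below are hypothetical one-line stubs, not real networks; the sketch only shows how the two loss terms pull in opposite directions during each training step:

// Sketch of the GAN training loop (stub networks, no real parameter updates)
const generator = noise => noise * 2 - 1;                    // stand-in: noise -> "fake" sample
const discriminator = sample => 1 / (1 + Math.exp(-sample)); // stand-in: sample -> P(real)

for (let step = 0; step < 3; step++) {
  // 1. Discriminator step: push D(real) toward 1 and D(fake) toward 0
  const real = Math.random();            // pretend sample from the real data distribution
  const fake = generator(Math.random()); // generator output from random noise
  const dLoss = -Math.log(discriminator(real)) - Math.log(1 - discriminator(fake));

  // 2. Generator step: push D(fake) toward 1 (i.e., fool the discriminator)
  const gLoss = -Math.log(discriminator(generator(Math.random())));

  console.log(`Step ${step}: D loss = ${dLoss.toFixed(3)}, G loss = ${gLoss.toFixed(3)}`);
  // In a real GAN, gradients of dLoss and gLoss would update D and G alternately
}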

Convolutional Neural Network Implementation

Let's build a simple CNN for image classification:

// Simplified CNN Layer Implementation
class ConvLayer {
  constructor(filterSize, numFilters, stride = 1) {
    this.filterSize = filterSize;
    this.numFilters = numFilters;
    this.stride = stride;
    
    // Initialize filters randomly
    this.filters = Array(numFilters).fill(0).map(() =>
      Array(filterSize).fill(0).map(() =>
        Array(filterSize).fill(0).map(() => Math.random() * 2 - 1)
      )
    );
  }

  // Convolution operation
  convolve(input, filter) {
    const inputHeight = input.length;
    const inputWidth = input[0].length;
    const filterSize = filter.length;
    
    const outputHeight = Math.floor((inputHeight - filterSize) / this.stride) + 1;
    const outputWidth = Math.floor((inputWidth - filterSize) / this.stride) + 1;
    
    const output = Array(outputHeight).fill(0).map(() => Array(outputWidth).fill(0));
    
    for (let i = 0; i < outputHeight; i++) {
      for (let j = 0; j < outputWidth; j++) {
        let sum = 0;
        for (let fi = 0; fi < filterSize; fi++) {
          for (let fj = 0; fj < filterSize; fj++) {
            const row = i * this.stride + fi;
            const col = j * this.stride + fj;
            sum += input[row][col] * filter[fi][fj];
          }
        }
        output[i][j] = Math.max(0, sum); // ReLU activation
      }
    }
    
    return output;
  }

  forward(input) {
    // Apply each filter to input
    return this.filters.map(filter => this.convolve(input, filter));
  }
}

// Max Pooling Layer
class MaxPoolLayer {
  constructor(poolSize = 2) {
    this.poolSize = poolSize;
  }

  forward(input) {
    const height = input.length;
    const width = input[0].length;
    
    const outputHeight = Math.floor(height / this.poolSize);
    const outputWidth = Math.floor(width / this.poolSize);
    
    const output = Array(outputHeight).fill(0).map(() => Array(outputWidth).fill(0));
    
    for (let i = 0; i < outputHeight; i++) {
      for (let j = 0; j < outputWidth; j++) {
        let maxVal = -Infinity;
        
        for (let pi = 0; pi < this.poolSize; pi++) {
          for (let pj = 0; pj < this.poolSize; pj++) {
            const row = i * this.poolSize + pi;
            const col = j * this.poolSize + pj;
            maxVal = Math.max(maxVal, input[row][col]);
          }
        }
        
        output[i][j] = maxVal;
      }
    }
    
    return output;
  }
}

// Simple CNN for digit recognition
class SimpleCNN {
  constructor() {
    this.conv1 = new ConvLayer(3, 8);  // 8 filters of size 3x3
    this.pool1 = new MaxPoolLayer(2);   // 2x2 pooling
    this.conv2 = new ConvLayer(3, 16); // 16 filters of size 3x3
    this.pool2 = new MaxPoolLayer(2);
  }

  forward(image) {
    console.log("Input image size:", image.length, "x", image[0].length);
    
    // First conv + pool
    let features = this.conv1.forward(image);
    console.log("After Conv1:", features.length, "feature maps");
    features = features.map(fm => this.pool1.forward(fm));
    console.log("After Pool1:", features[0].length, "x", features[0][0].length);
    
    // Second conv + pool
    // Note: this simplified conv layer filters each feature map independently,
    // so channel counts multiply (8 maps x 16 filters = 128 maps); a real CNN
    // sums the convolution across all input channels instead.
    const finalFeatures = [];
    for (const featureMap of features) {
      const conv2Output = this.conv2.forward(featureMap);
      const pooled = conv2Output.map(fm => this.pool2.forward(fm));
      finalFeatures.push(...pooled);
    }
    
    console.log("Final features:", finalFeatures.length, "feature maps");
    
    return finalFeatures;
  }
}

// Example: Process a 28x28 image (like MNIST digits)
const image = Array(28).fill(0).map(() => 
  Array(28).fill(0).map(() => Math.random())
);

const cnn = new SimpleCNN();
console.log("Processing image through CNN...");
const features = cnn.forward(image);
console.log("Extracted", features.length, "feature maps for classification");

🔍 CNN Key Concepts:

  • Convolution: Slide filters over image to detect features (edges, patterns)
  • Pooling: Reduce spatial dimensions while keeping important information
  • Feature Maps: Each filter produces one feature map highlighting specific patterns
  • Hierarchical Learning: early layers detect edges → middle layers detect shapes → later layers recognize objects

Recurrent Neural Network (RNN) for Sequences

RNNs process sequences by maintaining hidden state:

// Simple RNN Cell Implementation
class RNNCell {
  constructor(inputSize, hiddenSize) {
    this.hiddenSize = hiddenSize;
    
    // Initialize weights
    this.Wxh = this.initWeights(inputSize, hiddenSize);  // Input to hidden
    this.Whh = this.initWeights(hiddenSize, hiddenSize); // Hidden to hidden
    this.Why = this.initWeights(hiddenSize, inputSize);  // Hidden to output
    
    this.bh = Array(hiddenSize).fill(0);  // Hidden bias
    this.by = Array(inputSize).fill(0);   // Output bias
  }

  initWeights(rows, cols) {
    return Array(rows).fill(0).map(() =>
      Array(cols).fill(0).map(() => Math.random() * 0.1 - 0.05)
    );
  }

  tanh(x) {
    return Math.tanh(x);
  }

  softmax(arr) {
    const max = Math.max(...arr);
    const exps = arr.map(x => Math.exp(x - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map(x => x / sum);
  }

  // Forward pass for one timestep
  forward(x, hPrev) {
    // Calculate hidden state: h = tanh(Wxh * x + Whh * hPrev + bh)
    const hidden = [];
    for (let i = 0; i < this.hiddenSize; i++) {
      let sum = this.bh[i];
      
      // Input contribution
      for (let j = 0; j < x.length; j++) {
        sum += this.Wxh[j][i] * x[j];
      }
      
      // Previous hidden state contribution
      for (let j = 0; j < hPrev.length; j++) {
        sum += this.Whh[j][i] * hPrev[j];
      }
      
      hidden.push(this.tanh(sum));
    }

    // Calculate output: y = Why * h + by
    const output = [];
    for (let i = 0; i < this.Why[0].length; i++) {
      let sum = this.by[i];
      for (let j = 0; j < hidden.length; j++) {
        sum += this.Why[j][i] * hidden[j];
      }
      output.push(sum);
    }

    return {
      hidden: hidden,
      output: this.softmax(output)
    };
  }

  // Process entire sequence
  processSequence(sequence) {
    let hidden = Array(this.hiddenSize).fill(0);
    const outputs = [];

    for (const input of sequence) {
      const result = this.forward(input, hidden);
      hidden = result.hidden;
      outputs.push(result.output);
    }

    return { hidden, outputs };
  }
}

// Example: Character-level text prediction
const vocab = ['h', 'e', 'l', 'o', 'w', 'r', 'd'];
const vocabSize = vocab.length;

// One-hot encode characters
function encodeChar(char) {
  const idx = vocab.indexOf(char);
  const encoded = Array(vocabSize).fill(0);
  if (idx !== -1) encoded[idx] = 1;
  return encoded;
}

function decodeOutput(output) {
  const maxIdx = output.indexOf(Math.max(...output));
  return vocab[maxIdx];
}

// Create RNN
const rnn = new RNNCell(vocabSize, 64);

// Process sequence: "hello"
const sequence = ['h', 'e', 'l', 'l', 'o'].map(encodeChar);
console.log("Processing sequence through RNN...");

const result = rnn.processSequence(sequence);
console.log("\nFinal hidden state dimension:", result.hidden.length);
console.log("Number of outputs:", result.outputs.length);

// Predict next character
const lastOutput = result.outputs[result.outputs.length - 1];
const predicted = decodeOutput(lastOutput);
console.log("Predicted next character:", predicted);

🔄 RNN Key Features:

  • Hidden State: Maintains memory of previous inputs
  • Sequential Processing: Processes data one step at a time
  • Weight Sharing: Same weights used across all time steps
  • Backpropagation Through Time: Training accounts for sequence history

Training Deep Networks: Advanced Techniques

Optimization Algorithms

Adam, used below, combines momentum (a running average of gradients) with RMSprop-style scaling (a running average of squared gradients) to adapt the step size for each parameter:

// Adam Optimizer
class AdamOptimizer {
  constructor(learningRate = 0.001) {
    this.lr = learningRate;
    this.beta1 = 0.9;   // Momentum decay
    this.beta2 = 0.999; // RMSprop decay
    this.epsilon = 1e-8;
    this.m = {};  // First moment
    this.v = {};  // Second moment
    this.t = 0;   // Timestep
  }

  update(paramName, param, gradient) {
    // Note: t counts calls to update(); a full implementation would advance
    // the timestep once per optimization step when updating many parameters.
    this.t++;
    
    // Initialize moments if first time
    if (!this.m[paramName]) {
      this.m[paramName] = 0;
      this.v[paramName] = 0;
    }

    // Update biased first moment
    this.m[paramName] = this.beta1 * this.m[paramName] + 
                        (1 - this.beta1) * gradient;
    
    // Update biased second moment
    this.v[paramName] = this.beta2 * this.v[paramName] + 
                        (1 - this.beta2) * gradient * gradient;
    
    // Bias correction
    const mHat = this.m[paramName] / (1 - Math.pow(this.beta1, this.t));
    const vHat = this.v[paramName] / (1 - Math.pow(this.beta2, this.t));
    
    // Update parameter
    return param - this.lr * mHat / (Math.sqrt(vHat) + this.epsilon);
  }
}

// Usage
const optimizer = new AdamOptimizer(0.001);
let weight = 0.5;
const gradient = 0.1;
weight = optimizer.update('w1', weight, gradient);

Regularization: Dropout

Dropout randomly zeroes a fraction of activations during training so the network cannot rely on any single neuron; at inference time all activations are kept:

// Dropout Layer
class Dropout {
  constructor(rate = 0.5) {
    this.rate = rate;
    this.mask = null;
  }

  forward(input, training = true) {
    if (!training) {
      return input; // No dropout during inference
    }

    // Create dropout mask
    this.mask = input.map(() => 
      Math.random() > this.rate ? 1 : 0
    );

    // Apply mask and scale
    return input.map((val, i) => 
      val * this.mask[i] / (1 - this.rate)
    );
  }

  backward(gradient) {
    // Only propagate gradients where mask is 1
    return gradient.map((val, i) => 
      val * this.mask[i] / (1 - this.rate)
    );
  }
}

// Example
const dropout = new Dropout(0.5);
const activations = [0.5, 0.8, 0.3, 0.9, 0.2];

console.log("Original:", activations);
const dropped = dropout.forward(activations, true);
console.log("With dropout:", dropped);
// Some values become 0, others scaled up

Batch Normalization

Batch normalization standardizes each feature across the mini-batch and then rescales it with learnable parameters (gamma and beta), which stabilizes and speeds up training:

// Batch Normalization
class BatchNorm {
  constructor(numFeatures, momentum = 0.9) {
    this.momentum = momentum;
    this.epsilon = 1e-5;
    
    // Learnable parameters
    this.gamma = Array(numFeatures).fill(1);
    this.beta = Array(numFeatures).fill(0);
    
    // Running statistics
    this.runningMean = Array(numFeatures).fill(0);
    this.runningVar = Array(numFeatures).fill(1);
  }

  forward(batch, training = true) {
    if (training) {
      // Calculate batch statistics
      const batchMean = this.calculateMean(batch);
      const batchVar = this.calculateVariance(batch, batchMean);
      
      // Update running statistics
      this.runningMean = this.runningMean.map((rm, i) =>
        this.momentum * rm + (1 - this.momentum) * batchMean[i]
      );
      this.runningVar = this.runningVar.map((rv, i) =>
        this.momentum * rv + (1 - this.momentum) * batchVar[i]
      );
      
      // Normalize and transform
      return this.normalize(batch, batchMean, batchVar);
    } else {
      return this.normalize(batch, this.runningMean, this.runningVar);
    }
  }

  calculateMean(batch) {
    const numSamples = batch.length;
    const numFeatures = batch[0].length;
    const means = Array(numFeatures).fill(0);
    
    for (const sample of batch) {
      sample.forEach((val, i) => means[i] += val);
    }
    
    return means.map(m => m / numSamples);
  }

  calculateVariance(batch, mean) {
    const numSamples = batch.length;
    const numFeatures = batch[0].length;
    const vars = Array(numFeatures).fill(0);
    
    for (const sample of batch) {
      sample.forEach((val, i) => {
        vars[i] += Math.pow(val - mean[i], 2);
      });
    }
    
    return vars.map(v => v / numSamples);
  }

  normalize(batch, mean, variance) {
    return batch.map(sample =>
      sample.map((val, i) => {
        const normalized = (val - mean[i]) / 
                          Math.sqrt(variance[i] + this.epsilon);
        return this.gamma[i] * normalized + this.beta[i];
      })
    );
  }
}

Learning Rate Scheduling

Decaying the learning rate over training lets the model take large steps early and smaller, fine-tuning steps later:

// Learning Rate Scheduler
class LRScheduler {
  constructor(initialLR, strategy = 'step') {
    this.initialLR = initialLR;
    this.strategy = strategy;
    this.currentLR = initialLR;
  }

  // Step decay: reduce LR every N epochs
  stepDecay(epoch, stepSize = 10, gamma = 0.1) {
    const factor = Math.pow(gamma, Math.floor(epoch / stepSize));
    return this.initialLR * factor;
  }

  // Exponential decay
  expDecay(epoch, gamma = 0.95) {
    return this.initialLR * Math.pow(gamma, epoch);
  }

  // Cosine annealing
  cosineAnnealing(epoch, totalEpochs) {
    return this.initialLR * 0.5 * 
      (1 + Math.cos(Math.PI * epoch / totalEpochs));
  }

  getLR(epoch, totalEpochs = 100) {
    switch(this.strategy) {
      case 'step':
        return this.stepDecay(epoch);
      case 'exp':
        return this.expDecay(epoch);
      case 'cosine':
        return this.cosineAnnealing(epoch, totalEpochs);
      default:
        return this.initialLR;
    }
  }
}

// Example
const scheduler = new LRScheduler(0.1, 'cosine');
console.log("Learning rate schedule:");
for (let epoch = 0; epoch <= 100; epoch += 10) {
  const lr = scheduler.getLR(epoch, 100);
  console.log(`Epoch ${epoch}: LR = ${lr.toFixed(6)}`);
}

⚠️ Deep Learning Challenges

  • Vanishing/Exploding Gradients: Gradients become too small or too large as they are backpropagated through many layers (see the numeric sketch after this list)
  • Overfitting: Model memorizes training data instead of learning patterns
  • Computational Cost: Deep models require significant GPU power and time
  • Data Requirements: Need large labeled datasets for good performance
  • Hyperparameter Tuning: Many parameters to optimize (learning rate, batch size, etc.)
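
The vanishing gradient problem is easy to see numerically: backpropagation multiplies one local derivative per layer, and the sigmoid derivative is at most 0.25. A minimal sketch of the effect:

// Why gradients vanish: each layer multiplies in another local derivative
const sigmoidDerivative = x => {
  const s = 1 / (1 + Math.exp(-x));
  return s * (1 - s); // maximum value is 0.25, at x = 0
};

let gradient = 1.0;
for (let layer = 1; layer <= 30; layer++) {
  gradient *= sigmoidDerivative(0); // best case for the sigmoid
  if (layer % 10 === 0) {
    console.log(`After ${layer} layers: gradient ≈ ${gradient.toExponential(2)}`);
  }
}
// After 30 layers the gradient has dropped below 1e-18, so the earliest layers
// barely learn; this is one reason deep nets favor ReLU activations and normalization.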

📚 Key Takeaways

  • Deep learning uses many layers to learn hierarchical features
  • CNNs are best for images using convolution and pooling
  • RNNs handle sequences by maintaining hidden state
  • Advanced optimizers like Adam improve training speed
  • Regularization techniques (dropout, batch norm) prevent overfitting
  • Learning rate scheduling helps models converge better