Intermediate · 40 min · Full Guide

Deep Learning

Advanced neural networks with multiple layers for complex pattern recognition

What is Deep Learning?

Deep Learning is a subset of machine learning that uses neural networks with multiple hidden layers (hence "deep") to learn hierarchical representations of data. While traditional neural networks might have 1-2 hidden layers, deep learning models can have dozens or even hundreds of layers.

🚀 Why "Deep" Matters:

Each layer learns increasingly abstract features. In image recognition: first layers detect edges, middle layers detect shapes, final layers recognize objects. This hierarchical learning is what makes deep learning so powerful!

Shallow vs Deep Networks

Shallow Network

Input Layer
Hidden Layer (1-2)
Output Layer
  • ✓ Simple patterns
  • ✓ Faster training
  • ✓ Less data needed
  • ✗ Limited complexity

Deep Network

Input Layer
Hidden Layer 1
Hidden Layer 2
Hidden Layer 3+
Output Layer
  • ✓ Complex patterns
  • ✓ Hierarchical features
  • ✓ Better accuracy
  • ✗ More data & compute
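
To make the contrast concrete, here is a minimal sketch (plain JavaScript with made-up layer sizes and random, untrained weights) of a forward pass through stacked fully connected layers; the shallow and deep networks differ only in how many layers are stacked:

// Minimal fully connected layer (illustrative only, random weights)
class DenseLayer {
  constructor(inputSize, outputSize) {
    this.weights = Array(outputSize).fill(0).map(() =>
      Array(inputSize).fill(0).map(() => Math.random() * 0.2 - 0.1)
    );
    this.bias = Array(outputSize).fill(0);
  }

  forward(input) {
    // output[i] = ReLU(sum_j weights[i][j] * input[j] + bias[i])
    return this.weights.map((row, i) =>
      Math.max(0, row.reduce((sum, w, j) => sum + w * input[j], this.bias[i]))
    );
  }
}

// Shallow: one hidden layer. Deep: several hidden layers stacked.
const shallow = [new DenseLayer(4, 8), new DenseLayer(8, 2)];
const deep = [
  new DenseLayer(4, 16), new DenseLayer(16, 16),
  new DenseLayer(16, 16), new DenseLayer(16, 2)
];

const runNetwork = (layers, input) =>
  layers.reduce((activation, layer) => layer.forward(activation), input);

console.log("Shallow output:", runNetwork(shallow, [0.1, 0.5, 0.2, 0.9]));
console.log("Deep output:", runNetwork(deep, [0.1, 0.5, 0.2, 0.9]));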

Key Deep Learning Architectures

🖼️ Convolutional Neural Networks (CNNs)

Specialized for processing grid-like data, especially images. Uses convolutional layers to detect spatial patterns.

Best for:

Image classification, object detection, face recognition, medical imaging, self-driving cars

📝 Recurrent Neural Networks (RNNs)

Has memory of previous inputs through feedback loops. Processes sequential data one step at a time.

Best for:

Text generation, speech recognition, time series prediction, language translation, video analysis

🎯 Long Short-Term Memory (LSTM)

Advanced RNN that can remember long-term dependencies through gating (a minimal sketch of one gated step follows below). Mitigates the vanishing gradient problem that limits plain RNNs.

Best for:

Machine translation, text summarization, sentiment analysis, music generation, stock prediction
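
The gate arithmetic is easiest to see on a single step. Here is a minimal, hypothetical scalar LSTM step (invented weights, not a trained model) showing how the forget, input, and output gates control the cell state:

// Minimal scalar LSTM step (illustrative weights, not a trained model)
function lstmStep(x, prevH, prevC, w) {
  const sigmoid = z => 1 / (1 + Math.exp(-z));

  const forgetGate = sigmoid(w.wf * x + w.uf * prevH + w.bf); // what to erase from the cell
  const inputGate  = sigmoid(w.wi * x + w.ui * prevH + w.bi); // how much new info to write
  const outputGate = sigmoid(w.wo * x + w.uo * prevH + w.bo); // how much of the cell to expose
  const candidate  = Math.tanh(w.wc * x + w.uc * prevH + w.bc);

  const c = forgetGate * prevC + inputGate * candidate; // long-term cell state
  const h = outputGate * Math.tanh(c);                  // short-term hidden state
  return { h, c };
}

// Example with arbitrary weights
const lstmWeights = {
  wf: 0.5, uf: 0.1, bf: 0, wi: 0.4, ui: 0.2, bi: 0,
  wo: 0.3, uo: 0.1, bo: 0, wc: 0.6, uc: 0.2, bc: 0
};
let state = { h: 0, c: 0 };
for (const x of [1, 0, 1, 1]) {
  state = lstmStep(x, state.h, state.c, lstmWeights);
}
console.log("Hidden state after sequence:", state.h.toFixed(4));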

Generative Adversarial Networks (GANs)

Two networks compete: a generator creates fake data while a discriminator tries to tell real from fake (the training loop is sketched below).

Best for:

Image generation, style transfer, data augmentation, deepfakes, art creation, super-resolution
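
The adversarial objective can be sketched with stand-in functions. The generator and discriminator below are hypothetical one-line stubs, not real networks; the sketch only shows how the two loss terms pull in opposite directions during each training step:

// Sketch of the GAN training loop (stub networks, no real parameter updates)
const generator = noise => noise * 2 - 1;                    // stand-in: noise -> "fake" sample
const discriminator = sample => 1 / (1 + Math.exp(-sample)); // stand-in: sample -> P(real)

for (let step = 0; step < 3; step++) {
  // 1. Discriminator step: push D(real) toward 1 and D(fake) toward 0
  const real = Math.random();            // pretend sample from the real data distribution
  const fake = generator(Math.random()); // generator output from random noise
  const dLoss = -Math.log(discriminator(real)) - Math.log(1 - discriminator(fake));

  // 2. Generator step: push D(fake) toward 1 (i.e., fool the discriminator)
  const gLoss = -Math.log(discriminator(generator(Math.random())));

  console.log(`Step ${step}: D loss = ${dLoss.toFixed(3)}, G loss = ${gLoss.toFixed(3)}`);
  // In a real GAN, gradients of dLoss and gLoss would update D and G alternately
}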

Convolutional Neural Network Implementation

Let's build a simple CNN for image classification:

// Simplified CNN Layer Implementation
class ConvLayer {
  constructor(filterSize, numFilters, stride = 1) {
    this.filterSize = filterSize;
    this.numFilters = numFilters;
    this.stride = stride;
    
    // Initialize filters randomly
    this.filters = Array(numFilters).fill(0).map(() =>
      Array(filterSize).fill(0).map(() =>
        Array(filterSize).fill(0).map(() => Math.random() * 2 - 1)
      )
    );
  }

  // Convolution operation
  convolve(input, filter) {
    const inputHeight = input.length;
    const inputWidth = input[0].length;
    const filterSize = filter.length;
    
    const outputHeight = Math.floor((inputHeight - filterSize) / this.stride) + 1;
    const outputWidth = Math.floor((inputWidth - filterSize) / this.stride) + 1;
    
    const output = Array(outputHeight).fill(0).map(() => Array(outputWidth).fill(0));
    
    for (let i = 0; i < outputHeight; i++) {
      for (let j = 0; j < outputWidth; j++) {
        let sum = 0;
        for (let fi = 0; fi < filterSize; fi++) {
          for (let fj = 0; fj < filterSize; fj++) {
            const row = i * this.stride + fi;
            const col = j * this.stride + fj;
            sum += input[row][col] * filter[fi][fj];
          }
        }
        output[i][j] = Math.max(0, sum); // ReLU activation
      }
    }
    
    return output;
  }

  forward(input) {
    // Apply each filter to input
    return this.filters.map(filter => this.convolve(input, filter));
  }
}

// Max Pooling Layer
class MaxPoolLayer {
  constructor(poolSize = 2) {
    this.poolSize = poolSize;
  }

  forward(input) {
    const height = input.length;
    const width = input[0].length;
    
    const outputHeight = Math.floor(height / this.poolSize);
    const outputWidth = Math.floor(width / this.poolSize);
    
    const output = Array(outputHeight).fill(0).map(() => Array(outputWidth).fill(0));
    
    for (let i = 0; i < outputHeight; i++) {
      for (let j = 0; j < outputWidth; j++) {
        let maxVal = -Infinity;
        
        for (let pi = 0; pi < this.poolSize; pi++) {
          for (let pj = 0; pj < this.poolSize; pj++) {
            const row = i * this.poolSize + pi;
            const col = j * this.poolSize + pj;
            maxVal = Math.max(maxVal, input[row][col]);
          }
        }
        
        output[i][j] = maxVal;
      }
    }
    
    return output;
  }
}

// Simple CNN for digit recognition
class SimpleCNN {
  constructor() {
    this.conv1 = new ConvLayer(3, 8);  // 8 filters of size 3x3
    this.pool1 = new MaxPoolLayer(2);   // 2x2 pooling
    this.conv2 = new ConvLayer(3, 16); // 16 filters of size 3x3
    this.pool2 = new MaxPoolLayer(2);
  }

  forward(image) {
    console.log("Input image size:", image.length, "x", image[0].length);
    
    // First conv + pool
    let features = this.conv1.forward(image);
    console.log("After Conv1:", features.length, "feature maps");
    features = features.map(fm => this.pool1.forward(fm));
    console.log("After Pool1:", features[0].length, "x", features[0][0].length);
    
    // Second conv + pool
    // Note: this simplified conv layer filters each feature map independently,
    // so channel counts multiply (8 maps x 16 filters = 128 maps); a real CNN
    // sums the convolution across all input channels instead.
    const finalFeatures = [];
    for (const featureMap of features) {
      const conv2Output = this.conv2.forward(featureMap);
      const pooled = conv2Output.map(fm => this.pool2.forward(fm));
      finalFeatures.push(...pooled);
    }
    
    console.log("Final features:", finalFeatures.length, "feature maps");
    
    return finalFeatures;
  }
}

// Example: Process a 28x28 image (like MNIST digits)
const image = Array(28).fill(0).map(() => 
  Array(28).fill(0).map(() => Math.random())
);

const cnn = new SimpleCNN();
console.log("Processing image through CNN...");
const features = cnn.forward(image);
console.log("Extracted", features.length, "feature maps for classification");

🔍 CNN Key Concepts:

  • Convolution: Slide filters over image to detect features (edges, patterns)
  • Pooling: Reduce spatial dimensions while keeping important information
  • Feature Maps: Each filter produces one feature map highlighting specific patterns
  • Hierarchical Learning: early layers detect edges → middle layers detect shapes → later layers recognize objects

Recurrent Neural Network (RNN) for Sequences

RNNs process sequences by maintaining hidden state:

// Simple RNN Cell Implementation
class RNNCell {
  constructor(inputSize, hiddenSize) {
    this.hiddenSize = hiddenSize;
    
    // Initialize weights
    this.Wxh = this.initWeights(inputSize, hiddenSize);  // Input to hidden
    this.Whh = this.initWeights(hiddenSize, hiddenSize); // Hidden to hidden
    this.Why = this.initWeights(hiddenSize, inputSize);  // Hidden to output
    
    this.bh = Array(hiddenSize).fill(0);  // Hidden bias
    this.by = Array(inputSize).fill(0);   // Output bias
  }

  initWeights(rows, cols) {
    return Array(rows).fill(0).map(() =>
      Array(cols).fill(0).map(() => Math.random() * 0.1 - 0.05)
    );
  }

  tanh(x) {
    return Math.tanh(x);
  }

  softmax(arr) {
    const max = Math.max(...arr);
    const exps = arr.map(x => Math.exp(x - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map(x => x / sum);
  }

  // Forward pass for one timestep
  forward(x, hPrev) {
    // Calculate hidden state: h = tanh(Wxh * x + Whh * hPrev + bh)
    const hidden = [];
    for (let i = 0; i < this.hiddenSize; i++) {
      let sum = this.bh[i];
      
      // Input contribution
      for (let j = 0; j < x.length; j++) {
        sum += this.Wxh[j][i] * x[j];
      }
      
      // Previous hidden state contribution
      for (let j = 0; j < hPrev.length; j++) {
        sum += this.Whh[j][i] * hPrev[j];
      }
      
      hidden.push(this.tanh(sum));
    }

    // Calculate output: y = Why * h + by
    const output = [];
    for (let i = 0; i < this.Why[0].length; i++) {
      let sum = this.by[i];
      for (let j = 0; j < hidden.length; j++) {
        sum += this.Why[j][i] * hidden[j];
      }
      output.push(sum);
    }

    return {
      hidden: hidden,
      output: this.softmax(output)
    };
  }

  // Process entire sequence
  processSequence(sequence) {
    let hidden = Array(this.hiddenSize).fill(0);
    const outputs = [];

    for (const input of sequence) {
      const result = this.forward(input, hidden);
      hidden = result.hidden;
      outputs.push(result.output);
    }

    return { hidden, outputs };
  }
}

// Example: Character-level text prediction
const vocab = ['h', 'e', 'l', 'o', 'w', 'r', 'd'];
const vocabSize = vocab.length;

// One-hot encode characters
function encodeChar(char) {
  const idx = vocab.indexOf(char);
  const encoded = Array(vocabSize).fill(0);
  if (idx !== -1) encoded[idx] = 1;
  return encoded;
}

function decodeOutput(output) {
  const maxIdx = output.indexOf(Math.max(...output));
  return vocab[maxIdx];
}

// Create RNN
const rnn = new RNNCell(vocabSize, 64);

// Process sequence: "hello"
const sequence = ['h', 'e', 'l', 'l', 'o'].map(encodeChar);
console.log("Processing sequence through RNN...");

const result = rnn.processSequence(sequence);
console.log("\nFinal hidden state dimension:", result.hidden.length);
console.log("Number of outputs:", result.outputs.length);

// Predict next character
const lastOutput = result.outputs[result.outputs.length - 1];
const predicted = decodeOutput(lastOutput);
console.log("Predicted next character:", predicted);

🔄 RNN Key Features:

  • Hidden State: Maintains memory of previous inputs
  • Sequential Processing: Processes data one step at a time
  • Weight Sharing: Same weights used across all time steps
  • Backpropagation Through Time: Training accounts for sequence history

Training Deep Networks: Advanced Techniques

Optimization Algorithms

Adam, used below, combines momentum (a running average of gradients) with RMSprop-style scaling (a running average of squared gradients) to adapt the step size for each parameter:

// Adam Optimizer
class AdamOptimizer {
  constructor(learningRate = 0.001) {
    this.lr = learningRate;
    this.beta1 = 0.9;   // Momentum decay
    this.beta2 = 0.999; // RMSprop decay
    this.epsilon = 1e-8;
    this.m = {};  // First moment
    this.v = {};  // Second moment
    this.t = 0;   // Timestep
  }

  update(paramName, param, gradient) {
    // Note: t counts calls to update(); a full implementation would advance
    // the timestep once per optimization step when updating many parameters.
    this.t++;
    
    // Initialize moments if first time
    if (!this.m[paramName]) {
      this.m[paramName] = 0;
      this.v[paramName] = 0;
    }

    // Update biased first moment
    this.m[paramName] = this.beta1 * this.m[paramName] + 
                        (1 - this.beta1) * gradient;
    
    // Update biased second moment
    this.v[paramName] = this.beta2 * this.v[paramName] + 
                        (1 - this.beta2) * gradient * gradient;
    
    // Bias correction
    const mHat = this.m[paramName] / (1 - Math.pow(this.beta1, this.t));
    const vHat = this.v[paramName] / (1 - Math.pow(this.beta2, this.t));
    
    // Update parameter
    return param - this.lr * mHat / (Math.sqrt(vHat) + this.epsilon);
  }
}

// Usage
const optimizer = new AdamOptimizer(0.001);
let weight = 0.5;
const gradient = 0.1;
weight = optimizer.update('w1', weight, gradient);

Regularization: Dropout

Dropout randomly zeroes a fraction of activations during training so the network cannot rely on any single neuron; at inference time all activations are kept:

// Dropout Layer
class Dropout {
  constructor(rate = 0.5) {
    this.rate = rate;
    this.mask = null;
  }

  forward(input, training = true) {
    if (!training) {
      return input; // No dropout during inference
    }

    // Create dropout mask
    this.mask = input.map(() => 
      Math.random() > this.rate ? 1 : 0
    );

    // Apply mask and scale
    return input.map((val, i) => 
      val * this.mask[i] / (1 - this.rate)
    );
  }

  backward(gradient) {
    // Only propagate gradients where mask is 1
    return gradient.map((val, i) => 
      val * this.mask[i] / (1 - this.rate)
    );
  }
}

// Example
const dropout = new Dropout(0.5);
const activations = [0.5, 0.8, 0.3, 0.9, 0.2];

console.log("Original:", activations);
const dropped = dropout.forward(activations, true);
console.log("With dropout:", dropped);
// Some values become 0, others scaled up

Batch Normalization

Batch normalization standardizes each feature across the mini-batch and then rescales it with learnable parameters (gamma and beta), which stabilizes and speeds up training:

// Batch Normalization
class BatchNorm {
  constructor(numFeatures, momentum = 0.9) {
    this.momentum = momentum;
    this.epsilon = 1e-5;
    
    // Learnable parameters
    this.gamma = Array(numFeatures).fill(1);
    this.beta = Array(numFeatures).fill(0);
    
    // Running statistics
    this.runningMean = Array(numFeatures).fill(0);
    this.runningVar = Array(numFeatures).fill(1);
  }

  forward(batch, training = true) {
    if (training) {
      // Calculate batch statistics
      const batchMean = this.calculateMean(batch);
      const batchVar = this.calculateVariance(batch, batchMean);
      
      // Update running statistics
      this.runningMean = this.runningMean.map((rm, i) =>
        this.momentum * rm + (1 - this.momentum) * batchMean[i]
      );
      this.runningVar = this.runningVar.map((rv, i) =>
        this.momentum * rv + (1 - this.momentum) * batchVar[i]
      );
      
      // Normalize and transform
      return this.normalize(batch, batchMean, batchVar);
    } else {
      return this.normalize(batch, this.runningMean, this.runningVar);
    }
  }

  calculateMean(batch) {
    const numSamples = batch.length;
    const numFeatures = batch[0].length;
    const means = Array(numFeatures).fill(0);
    
    for (const sample of batch) {
      sample.forEach((val, i) => means[i] += val);
    }
    
    return means.map(m => m / numSamples);
  }

  calculateVariance(batch, mean) {
    const numSamples = batch.length;
    const numFeatures = batch[0].length;
    const vars = Array(numFeatures).fill(0);
    
    for (const sample of batch) {
      sample.forEach((val, i) => {
        vars[i] += Math.pow(val - mean[i], 2);
      });
    }
    
    return vars.map(v => v / numSamples);
  }

  normalize(batch, mean, variance) {
    return batch.map(sample =>
      sample.map((val, i) => {
        const normalized = (val - mean[i]) / 
                          Math.sqrt(variance[i] + this.epsilon);
        return this.gamma[i] * normalized + this.beta[i];
      })
    );
  }
}

Learning Rate Scheduling

Decaying the learning rate over training lets the model take large steps early and smaller, fine-tuning steps later:

// Learning Rate Scheduler
class LRScheduler {
  constructor(initialLR, strategy = 'step') {
    this.initialLR = initialLR;
    this.strategy = strategy;
    this.currentLR = initialLR;
  }

  // Step decay: reduce LR every N epochs
  stepDecay(epoch, stepSize = 10, gamma = 0.1) {
    const factor = Math.pow(gamma, Math.floor(epoch / stepSize));
    return this.initialLR * factor;
  }

  // Exponential decay
  expDecay(epoch, gamma = 0.95) {
    return this.initialLR * Math.pow(gamma, epoch);
  }

  // Cosine annealing
  cosineAnnealing(epoch, totalEpochs) {
    return this.initialLR * 0.5 * 
      (1 + Math.cos(Math.PI * epoch / totalEpochs));
  }

  getLR(epoch, totalEpochs = 100) {
    switch(this.strategy) {
      case 'step':
        return this.stepDecay(epoch);
      case 'exp':
        return this.expDecay(epoch);
      case 'cosine':
        return this.cosineAnnealing(epoch, totalEpochs);
      default:
        return this.initialLR;
    }
  }
}

// Example
const scheduler = new LRScheduler(0.1, 'cosine');
console.log("Learning rate schedule:");
for (let epoch = 0; epoch <= 100; epoch += 10) {
  const lr = scheduler.getLR(epoch, 100);
  console.log(`Epoch ${epoch}: LR = ${lr.toFixed(6)}`);
}

⚠️ Deep Learning Challenges

  • Vanishing/Exploding Gradients: Gradients become too small or too large as they are backpropagated through many layers (see the numeric sketch after this list)
  • Overfitting: Model memorizes training data instead of learning patterns
  • Computational Cost: Deep models require significant GPU power and time
  • Data Requirements: Need large labeled datasets for good performance
  • Hyperparameter Tuning: Many parameters to optimize (learning rate, batch size, etc.)
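
The vanishing gradient problem is easy to see numerically: backpropagation multiplies one local derivative per layer, and the sigmoid derivative is at most 0.25. A minimal sketch of the effect:

// Why gradients vanish: each layer multiplies in another local derivative
const sigmoidDerivative = x => {
  const s = 1 / (1 + Math.exp(-x));
  return s * (1 - s); // maximum value is 0.25, at x = 0
};

let gradient = 1.0;
for (let layer = 1; layer <= 30; layer++) {
  gradient *= sigmoidDerivative(0); // best case for the sigmoid
  if (layer % 10 === 0) {
    console.log(`After ${layer} layers: gradient ≈ ${gradient.toExponential(2)}`);
  }
}
// After 30 layers the gradient has dropped below 1e-18, so the earliest layers
// barely learn; this is one reason deep nets favor ReLU activations and normalization.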

📚 Key Takeaways

  • Deep learning uses many layers to learn hierarchical features
  • CNNs are best for images using convolution and pooling
  • RNNs handle sequences by maintaining hidden state
  • Advanced optimizers like Adam improve training speed
  • Regularization techniques (dropout, batch norm) prevent overfitting
  • Learning rate scheduling helps models converge better