Deep Learning
Advanced neural networks with multiple layers for complex pattern recognition
What is Deep Learning?
Deep Learning is a subset of machine learning that uses neural networks with multiple hidden layers (hence "deep") to learn hierarchical representations of data. While traditional neural networks might have 1-2 hidden layers, deep learning models can have dozens or even hundreds of layers.
🚀 Why "Deep" Matters:
Each layer learns increasingly abstract features. In image recognition: first layers detect edges, middle layers detect shapes, final layers recognize objects. This hierarchical learning is what makes deep learning so powerful!
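To make "multiple hidden layers" concrete, here is a minimal sketch of a deep feed-forward network: each fully connected layer transforms its input and hands the result to the next. (All sizes and names here are illustrative, not from any specific library.)
// A deep network is just many layers composed in sequence
class DenseLayer {
  constructor(inputSize, outputSize) {
    // Small random weights, zero biases
    this.weights = Array(outputSize).fill(0).map(() =>
      Array(inputSize).fill(0).map(() => Math.random() * 0.2 - 0.1)
    );
    this.biases = Array(outputSize).fill(0);
  }
  forward(input) {
    // output[i] = ReLU(sum_j weights[i][j] * input[j] + biases[i])
    return this.weights.map((row, i) =>
      Math.max(0, row.reduce((sum, w, j) => sum + w * input[j], this.biases[i]))
    );
  }
}
// Three hidden layers plus an output layer: 784 -> 256 -> 128 -> 64 -> 10
const layers = [
  new DenseLayer(784, 256),
  new DenseLayer(256, 128),
  new DenseLayer(128, 64),
  new DenseLayer(64, 10)
];
let activation = Array(784).fill(0).map(() => Math.random()); // dummy input
for (const layer of layers) {
  activation = layer.forward(activation); // each layer feeds the next
}
console.log("Output size:", activation.length); // 10
Adding more DenseLayer entries is all it takes to go "deeper"; the training difficulties that come with depth are covered later in this section.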
Shallow vs Deep Networks
Shallow Network
- ✓ Simple patterns
- ✓ Faster training
- ✓ Less data needed
- ✗ Limited complexity
Deep Network
- ✓ Complex patterns
- ✓ Hierarchical features
- ✓ Better accuracy
- ✗ More data & compute
Key Deep Learning Architectures
Convolutional Neural Networks (CNNs)
Specialized for processing grid-like data, especially images. Uses convolutional layers to detect spatial patterns.
Best for:
Image classification, object detection, face recognition, medical imaging, self-driving cars
Recurrent Neural Networks (RNNs)
Maintains a memory of previous inputs through recurrent (feedback) connections. Processes sequential data one step at a time.
Best for:
Text generation, speech recognition, time series prediction, language translation, video analysis
Long Short-Term Memory (LSTM)
Advanced RNN whose gating mechanism lets it remember long-term dependencies and mitigates the vanishing gradient problem (sketched below).
Best for:
Machine translation, text summarization, sentiment analysis, music generation, stock prediction
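To give the gating idea some shape, here is a simplified single-timestep LSTM cell (untrained random weights; sizes and names are illustrative). The forget gate decides what to erase from the cell state, the input gate what to write, and the output gate what to expose as the hidden state:
// Simplified LSTM cell: three gates control the cell state c
class LSTMCell {
  constructor(inputSize, hiddenSize) {
    this.hiddenSize = hiddenSize;
    const concat = inputSize + hiddenSize; // gates see [hPrev, x] concatenated
    const init = () => Array(hiddenSize).fill(0).map(() =>
      Array(concat).fill(0).map(() => Math.random() * 0.2 - 0.1)
    );
    this.Wf = init(); this.Wi = init(); this.Wc = init(); this.Wo = init();
    this.bf = Array(hiddenSize).fill(0); // forget gate bias
    this.bi = Array(hiddenSize).fill(0); // input gate bias
    this.bc = Array(hiddenSize).fill(0); // candidate bias
    this.bo = Array(hiddenSize).fill(0); // output gate bias
  }
  sigmoid(x) { return 1 / (1 + Math.exp(-x)); }
  gate(W, b, z, activation) {
    // One gate: activation(W * z + b)
    return W.map((row, i) =>
      activation(row.reduce((sum, wij, j) => sum + wij * z[j], b[i]))
    );
  }
  forward(x, hPrev, cPrev) {
    const z = hPrev.concat(x);
    const f = this.gate(this.Wf, this.bf, z, this.sigmoid); // what to erase
    const i = this.gate(this.Wi, this.bi, z, this.sigmoid); // what to write
    const cand = this.gate(this.Wc, this.bc, z, Math.tanh); // candidate values
    const o = this.gate(this.Wo, this.bo, z, this.sigmoid); // what to expose
    const c = cPrev.map((cv, k) => f[k] * cv + i[k] * cand[k]); // new cell state
    const h = c.map((cv, k) => o[k] * Math.tanh(cv));           // new hidden state
    return { h, c };
  }
}
// One timestep with illustrative sizes
const lstm = new LSTMCell(4, 8);
const step = lstm.forward([1, 0, 0, 0], Array(8).fill(0), Array(8).fill(0));
console.log("hidden:", step.h.length, "cell:", step.c.length); // 8 8
Because the cell state is updated additively (f[k] * cv + i[k] * cand[k]) rather than squashed through an activation at every step, gradients can flow across many timesteps far better than in a plain RNN.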
Generative Adversarial Networks (GANs)
Two networks compete: a generator creates fake data while a discriminator tries to detect the fakes (toy sketch below).
Best for:
Image generation, style transfer, data augmentation, deepfakes, art creation, super-resolution
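The adversarial loop itself fits in a few lines. Below is a deliberately tiny 1-D sketch (all parameters and learning rates are made up): the generator is a linear map of noise, the discriminator a logistic classifier, and hand-derived gradients alternate between the two. It illustrates the training dynamic, not a recipe for stable GAN training:
// Toy 1-D GAN: generator g(z) = a*z + b mimics data centered near 4;
// discriminator d(x) = sigmoid(w*x + c) tells real from fake
const sigmoid = x => 1 / (1 + Math.exp(-x));
let a = 1, b = 0;   // generator parameters
let w = 0.1, c = 0; // discriminator parameters
const lr = 0.01;
for (let step = 0; step < 10000; step++) {
  const real = 4 + (Math.random() - 0.5); // "real" sample near 4
  const z = Math.random() * 2 - 1;        // noise input
  const fake = a * z + b;                 // generator output
  // Discriminator step: ascend log d(real) + log(1 - d(fake))
  const dReal = sigmoid(w * real + c);
  let dFake = sigmoid(w * fake + c);
  w += lr * ((1 - dReal) * real - dFake * fake);
  c += lr * ((1 - dReal) - dFake);
  // Generator step: ascend log d(fake) (the non-saturating loss)
  dFake = sigmoid(w * fake + c);
  a += lr * (1 - dFake) * w * z;
  b += lr * (1 - dFake) * w;
}
// b is the generator's output at z = 0; it typically drifts toward the
// real mean (~4), though individual runs oscillate - GAN training is unstable
console.log("Generator output at z = 0:", b.toFixed(2));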
Convolutional Neural Network Implementation
Let's build a simple CNN for image classification:
// Simplified CNN Layer Implementation
class ConvLayer {
constructor(filterSize, numFilters, stride = 1) {
this.filterSize = filterSize;
this.numFilters = numFilters;
this.stride = stride;
// Initialize filters randomly
this.filters = Array(numFilters).fill(0).map(() =>
Array(filterSize).fill(0).map(() =>
Array(filterSize).fill(0).map(() => Math.random() * 2 - 1)
)
);
}
// Convolution operation
convolve(input, filter) {
const inputHeight = input.length;
const inputWidth = input[0].length;
const filterSize = filter.length;
const outputHeight = Math.floor((inputHeight - filterSize) / this.stride) + 1;
const outputWidth = Math.floor((inputWidth - filterSize) / this.stride) + 1;
const output = Array(outputHeight).fill(0).map(() => Array(outputWidth).fill(0));
for (let i = 0; i < outputHeight; i++) {
for (let j = 0; j < outputWidth; j++) {
let sum = 0;
for (let fi = 0; fi < filterSize; fi++) {
for (let fj = 0; fj < filterSize; fj++) {
const row = i * this.stride + fi;
const col = j * this.stride + fj;
sum += input[row][col] * filter[fi][fj];
}
}
output[i][j] = Math.max(0, sum); // ReLU activation
}
}
return output;
}
forward(input) {
// Apply each filter to input
return this.filters.map(filter => this.convolve(input, filter));
}
}
// Max Pooling Layer
class MaxPoolLayer {
constructor(poolSize = 2) {
this.poolSize = poolSize;
}
forward(input) {
const height = input.length;
const width = input[0].length;
const outputHeight = Math.floor(height / this.poolSize);
const outputWidth = Math.floor(width / this.poolSize);
const output = Array(outputHeight).fill(0).map(() => Array(outputWidth).fill(0));
for (let i = 0; i < outputHeight; i++) {
for (let j = 0; j < outputWidth; j++) {
let maxVal = -Infinity;
for (let pi = 0; pi < this.poolSize; pi++) {
for (let pj = 0; pj < this.poolSize; pj++) {
const row = i * this.poolSize + pi;
const col = j * this.poolSize + pj;
maxVal = Math.max(maxVal, input[row][col]);
}
}
output[i][j] = maxVal;
}
}
return output;
}
}
// Simple CNN for digit recognition
class SimpleCNN {
constructor() {
this.conv1 = new ConvLayer(3, 8); // 8 filters of size 3x3
this.pool1 = new MaxPoolLayer(2); // 2x2 pooling
this.conv2 = new ConvLayer(3, 16); // 16 filters of size 3x3
this.pool2 = new MaxPoolLayer(2);
}
forward(image) {
console.log("Input image size:", image.length, "x", image[0].length);
// First conv + pool
let features = this.conv1.forward(image);
console.log("After Conv1:", features.length, "feature maps");
features = features.map(fm => this.pool1.forward(fm));
console.log("After Pool1:", features[0].length, "x", features[0][0].length);
// Second conv + pool (simplified: conv2 runs on each map separately; real CNN filters span all input channels)
const finalFeatures = [];
for (const featureMap of features) {
const conv2Output = this.conv2.forward(featureMap);
const pooled = conv2Output.map(fm => this.pool2.forward(fm));
finalFeatures.push(...pooled);
}
console.log("Final features:", finalFeatures.length, "feature maps");
return finalFeatures;
}
}
// Example: Process a 28x28 image (like MNIST digits)
const image = Array(28).fill(0).map(() =>
Array(28).fill(0).map(() => Math.random())
);
const cnn = new SimpleCNN();
console.log("Processing image through CNN...");
const features = cnn.forward(image);
console.log("Extracted", features.length, "feature maps for classification");
🔍 CNN Key Concepts:
- Convolution: Slide filters over image to detect features (edges, patterns)
- Pooling: Reduce spatial dimensions while keeping important information
- Feature Maps: Each filter produces one feature map highlighting specific patterns
- Hierarchical Learning: Early layers: edges → Middle: shapes → Late: objects
Recurrent Neural Network (RNN) for Sequences
RNNs process sequences by maintaining a hidden state:
// Simple RNN Cell Implementation
class RNNCell {
constructor(inputSize, hiddenSize) {
this.hiddenSize = hiddenSize;
// Initialize weights
this.Wxh = this.initWeights(inputSize, hiddenSize); // Input to hidden
this.Whh = this.initWeights(hiddenSize, hiddenSize); // Hidden to hidden
this.Why = this.initWeights(hiddenSize, inputSize); // Hidden to output (output size = vocab size = inputSize)
this.bh = Array(hiddenSize).fill(0); // Hidden bias
this.by = Array(inputSize).fill(0); // Output bias
}
initWeights(rows, cols) {
return Array(rows).fill(0).map(() =>
Array(cols).fill(0).map(() => Math.random() * 0.1 - 0.05)
);
}
tanh(x) {
return Math.tanh(x);
}
softmax(arr) {
const max = Math.max(...arr);
const exps = arr.map(x => Math.exp(x - max));
const sum = exps.reduce((a, b) => a + b, 0);
return exps.map(x => x / sum);
}
// Forward pass for one timestep
forward(x, hPrev) {
// Calculate hidden state: h = tanh(Wxh * x + Whh * hPrev + bh)
const hidden = [];
for (let i = 0; i < this.hiddenSize; i++) {
let sum = this.bh[i];
// Input contribution
for (let j = 0; j < x.length; j++) {
sum += this.Wxh[j][i] * x[j];
}
// Previous hidden state contribution
for (let j = 0; j < hPrev.length; j++) {
sum += this.Whh[j][i] * hPrev[j];
}
hidden.push(this.tanh(sum));
}
// Calculate output: y = Why * h + by
const output = [];
for (let i = 0; i < this.Why[0].length; i++) {
let sum = this.by[i];
for (let j = 0; j < hidden.length; j++) {
sum += this.Why[j][i] * hidden[j];
}
output.push(sum);
}
return {
hidden: hidden,
output: this.softmax(output)
};
}
// Process entire sequence
processSequence(sequence) {
let hidden = Array(this.hiddenSize).fill(0);
const outputs = [];
for (const input of sequence) {
const result = this.forward(input, hidden);
hidden = result.hidden;
outputs.push(result.output);
}
return { hidden, outputs };
}
}
// Example: Character-level text prediction
const vocab = ['h', 'e', 'l', 'o', 'w', 'r', 'd'];
const vocabSize = vocab.length;
// One-hot encode characters
function encodeChar(char) {
const idx = vocab.indexOf(char);
const encoded = Array(vocabSize).fill(0);
if (idx !== -1) encoded[idx] = 1;
return encoded;
}
function decodeOutput(output) {
const maxIdx = output.indexOf(Math.max(...output));
return vocab[maxIdx];
}
// Create RNN
const rnn = new RNNCell(vocabSize, 64);
// Process sequence: "hello"
const sequence = ['h', 'e', 'l', 'l', 'o'].map(encodeChar);
console.log("Processing sequence through RNN...");
const result = rnn.processSequence(sequence);
console.log("\nFinal hidden state dimension:", result.hidden.length);
console.log("Number of outputs:", result.outputs.length);
// Predict next character (the network is untrained, so this guess is essentially random)
const lastOutput = result.outputs[result.outputs.length - 1];
const predicted = decodeOutput(lastOutput);
console.log("Predicted next character:", predicted);
🔄 RNN Key Features:
- Hidden State: Maintains memory of previous inputs
- Sequential Processing: Processes data one step at a time
- Weight Sharing: Same weights used across all time steps
- Backpropagation Through Time: Training accounts for sequence history (see the sketch below)
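To make the last point concrete: for a scalar linear RNN h_t = w * h_(t-1) + x_t, the derivative of the state with respect to w can be accumulated step by step (forward accumulation here, which gives the same value BPTT would compute in reverse). The recursive term is multiplied by w at every step, which is exactly why the influence of early timesteps decays like powers of w (vanishing gradients) when |w| < 1, and blows up (exploding gradients) when |w| > 1:
// BPTT intuition on a scalar linear RNN: h_t = w * h_{t-1} + x_t
function stateGradient(w, xs) {
  let h = 0;      // hidden state
  let dhdw = 0;   // dh_t/dw, accumulated step by step
  for (const x of xs) {
    dhdw = h + w * dhdw; // product rule: h_{t-1} + w * dh_{t-1}/dw
    h = w * h + x;
  }
  return { h, dhdw };
}
const xs = Array(50).fill(1); // a 50-step sequence of ones
console.log("w = 0.5:", stateGradient(0.5, xs).dhdw.toFixed(4));        // bounded (~4)
console.log("w = 1.5:", stateGradient(1.5, xs).dhdw.toExponential(2));  // explodes
// The influence of step 1 on step 50 scales like w^49:
console.log("0.5^49 =", Math.pow(0.5, 49)); // effectively zero: vanishing gradient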
Training Deep Networks: Advanced Techniques
Optimization Algorithms
// Adam Optimizer
class AdamOptimizer {
constructor(learningRate = 0.001) {
this.lr = learningRate;
this.beta1 = 0.9; // Momentum decay
this.beta2 = 0.999; // RMSprop decay
this.epsilon = 1e-8;
    this.m = {}; // First moment estimate, per parameter
    this.v = {}; // Second moment estimate, per parameter
    this.t = {}; // Timestep, per parameter (for bias correction)
}
  update(paramName, param, gradient) {
    // Initialize moments and timestep on first use of this parameter
    if (!(paramName in this.m)) {
      this.m[paramName] = 0;
      this.v[paramName] = 0;
      this.t[paramName] = 0;
    }
    const t = ++this.t[paramName]; // bias correction needs a per-parameter count
// Update biased first moment
this.m[paramName] = this.beta1 * this.m[paramName] +
(1 - this.beta1) * gradient;
// Update biased second moment
this.v[paramName] = this.beta2 * this.v[paramName] +
(1 - this.beta2) * gradient * gradient;
// Bias correction
    const mHat = this.m[paramName] / (1 - Math.pow(this.beta1, t));
    const vHat = this.v[paramName] / (1 - Math.pow(this.beta2, t));
// Update parameter
return param - this.lr * mHat / (Math.sqrt(vHat) + this.epsilon);
}
}
// Usage
const optimizer = new AdamOptimizer(0.001);
let weight = 0.5;
const gradient = 0.1;
weight = optimizer.update('w1', weight, gradient);
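To see the optimizer actually converge, here is a short demo (hypothetical, using the class above) that minimizes f(w) = (w - 3)^2, whose gradient is 2 * (w - 3):
// Minimize f(w) = (w - 3)^2 with Adam; the minimum is at w = 3
const adamDemo = new AdamOptimizer(0.1);
let wDemo = 0;
for (let step = 0; step < 200; step++) {
  const grad = 2 * (wDemo - 3); // analytic gradient of (w - 3)^2
  wDemo = adamDemo.update('w', wDemo, grad);
}
console.log("w after 200 steps:", wDemo.toFixed(4)); // approximately 3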
Regularization: Dropout
// Dropout Layer
class Dropout {
constructor(rate = 0.5) {
this.rate = rate;
this.mask = null;
}
forward(input, training = true) {
if (!training) {
return input; // No dropout during inference
}
// Create dropout mask
this.mask = input.map(() =>
Math.random() > this.rate ? 1 : 0
);
// Apply mask and scale
return input.map((val, i) =>
val * this.mask[i] / (1 - this.rate)
);
}
backward(gradient) {
// Only propagate gradients where mask is 1
return gradient.map((val, i) =>
val * this.mask[i] / (1 - this.rate)
);
}
}
// Example
const dropout = new Dropout(0.5);
const activations = [0.5, 0.8, 0.3, 0.9, 0.2];
console.log("Original:", activations);
const dropped = dropout.forward(activations, true);
console.log("With dropout:", dropped);
// Some values become 0, others scaled up
Batch Normalization
// Batch Normalization
class BatchNorm {
constructor(numFeatures, momentum = 0.9) {
this.momentum = momentum;
this.epsilon = 1e-5;
// Learnable parameters
this.gamma = Array(numFeatures).fill(1);
this.beta = Array(numFeatures).fill(0);
// Running statistics
this.runningMean = Array(numFeatures).fill(0);
this.runningVar = Array(numFeatures).fill(1);
}
forward(batch, training = true) {
if (training) {
// Calculate batch statistics
const batchMean = this.calculateMean(batch);
const batchVar = this.calculateVariance(batch, batchMean);
// Update running statistics
this.runningMean = this.runningMean.map((rm, i) =>
this.momentum * rm + (1 - this.momentum) * batchMean[i]
);
this.runningVar = this.runningVar.map((rv, i) =>
this.momentum * rv + (1 - this.momentum) * batchVar[i]
);
// Normalize and transform
return this.normalize(batch, batchMean, batchVar);
} else {
return this.normalize(batch, this.runningMean, this.runningVar);
}
}
calculateMean(batch) {
const numSamples = batch.length;
const numFeatures = batch[0].length;
const means = Array(numFeatures).fill(0);
for (const sample of batch) {
sample.forEach((val, i) => means[i] += val);
}
return means.map(m => m / numSamples);
}
calculateVariance(batch, mean) {
const numSamples = batch.length;
const numFeatures = batch[0].length;
const vars = Array(numFeatures).fill(0);
for (const sample of batch) {
sample.forEach((val, i) => {
vars[i] += Math.pow(val - mean[i], 2);
});
}
return vars.map(v => v / numSamples);
}
normalize(batch, mean, variance) {
return batch.map(sample =>
sample.map((val, i) => {
const normalized = (val - mean[i]) /
Math.sqrt(variance[i] + this.epsilon);
return this.gamma[i] * normalized + this.beta[i];
})
);
}
}
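A quick usage example for the class above (the batch values are made up): in training mode each feature column is normalized with the batch's own statistics; in inference mode the running averages are used instead:
// Normalize a small batch of 3-feature samples
const bn = new BatchNorm(3);
const batch = [
  [1.0, 100, 0.01],
  [2.0, 110, 0.02],
  [3.0, 120, 0.03]
];
console.log("Training mode:", bn.forward(batch, true));
// Each feature column now has roughly zero mean and unit variance
console.log("Inference mode:", bn.forward([[2.0, 110, 0.02]], false));
// Uses running statistics, which only settle after many training batches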
Learning Rate Scheduling
// Learning Rate Scheduler
class LRScheduler {
constructor(initialLR, strategy = 'step') {
this.initialLR = initialLR;
this.strategy = strategy;
this.currentLR = initialLR;
}
// Step decay: reduce LR every N epochs
stepDecay(epoch, stepSize = 10, gamma = 0.1) {
const factor = Math.pow(gamma, Math.floor(epoch / stepSize));
return this.initialLR * factor;
}
// Exponential decay
expDecay(epoch, gamma = 0.95) {
return this.initialLR * Math.pow(gamma, epoch);
}
// Cosine annealing
cosineAnnealing(epoch, totalEpochs) {
return this.initialLR * 0.5 *
(1 + Math.cos(Math.PI * epoch / totalEpochs));
}
getLR(epoch, totalEpochs = 100) {
switch(this.strategy) {
case 'step':
return this.stepDecay(epoch);
case 'exp':
return this.expDecay(epoch);
case 'cosine':
return this.cosineAnnealing(epoch, totalEpochs);
default:
return this.initialLR;
}
}
}
// Example
const scheduler = new LRScheduler(0.1, 'cosine');
console.log("Learning rate schedule:");
for (let epoch = 0; epoch <= 100; epoch += 10) {
const lr = scheduler.getLR(epoch, 100);
console.log(`Epoch ${epoch}: LR = ${lr.toFixed(6)}`);
}
⚠️ Deep Learning Challenges
- Vanishing/Exploding Gradients: Gradients become too small or too large in deep networks (a standard remedy, gradient clipping, is sketched below)
- Overfitting: Model memorizes training data instead of learning patterns
- Computational Cost: Deep models require significant GPU power and time
- Data Requirements: Need large labeled datasets for good performance
- Hyperparameter Tuning: Many parameters to optimize (learning rate, batch size, etc.)
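As referenced in the first bullet above, a standard remedy for exploding gradients is to clip the gradient vector by its norm before the update. A minimal sketch:
// Gradient clipping by global norm: rescale if the gradient is too large
function clipByNorm(gradients, maxNorm) {
  const norm = Math.sqrt(gradients.reduce((sum, g) => sum + g * g, 0));
  if (norm <= maxNorm) return gradients; // already small enough
  const scale = maxNorm / norm; // shrink every component by the same factor
  return gradients.map(g => g * scale);
}
console.log(clipByNorm([3, 4], 1));     // [0.6, 0.8] - norm 5 clipped to 1
console.log(clipByNorm([0.3, 0.4], 1)); // unchanged - norm 0.5 is fine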
📚 Key Takeaways
- ✓ Deep learning uses many layers to learn hierarchical features
- ✓ CNNs are best for images using convolution and pooling
- ✓ RNNs handle sequences by maintaining hidden state
- ✓ Advanced optimizers like Adam improve training speed
- ✓ Regularization techniques such as dropout and batch normalization help prevent overfitting
- ✓ Learning rate scheduling helps models converge better