Machine Learning Basics
Learn the fundamentals of machine learning, types of learning, and practical implementations
What is Machine Learning?
Machine Learning (ML) is a subset of AI that enables computers to learn and improve from experience without being explicitly programmed. Instead of writing specific rules for every task, we provide data and let the algorithm discover patterns and make decisions.
🎯 Core Concept:
ML algorithms learn from data, identify patterns, and make predictions or decisions with minimal human intervention.
Types of Machine Learning
Supervised Learning
Learning from labeled data with known outcomes.
Examples:
- Email spam detection
- House price prediction
- Image classification
- Medical diagnosis
Unsupervised Learning
Finding patterns in unlabeled data.
Examples:
- Customer segmentation
- Anomaly detection
- Recommendation systems
- Data compression
Reinforcement Learning
Learning through trial and error, guided by reward feedback (a minimal sketch follows the examples below).
Examples:
- Game playing (AlphaGo)
- Robot navigation
- Self-driving cars
- Trading strategies
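To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy agent learning which of three simulated slot machines pays off most often. The payout probabilities, exploration rate, and episode count are made-up values for illustration only.
// Epsilon-greedy bandit: learn action values purely from reward feedback
const payoutProbs = [0.2, 0.5, 0.8]; // hidden payout chance of each machine (assumed values)
const actionValues = [0, 0, 0];      // the agent's estimate of each machine's value
const pulls = [0, 0, 0];             // how many times each machine was tried
const epsilon = 0.1;                 // exploration rate
for (let episode = 0; episode < 5000; episode++) {
  // Explore with probability epsilon, otherwise exploit the current best estimate
  const action = Math.random() < epsilon
    ? Math.floor(Math.random() * actionValues.length)
    : actionValues.indexOf(Math.max(...actionValues));
  const reward = Math.random() < payoutProbs[action] ? 1 : 0;
  pulls[action]++;
  // Incremental average: nudge the estimate toward the observed reward
  actionValues[action] += (reward - actionValues[action]) / pulls[action];
}
console.log("Estimated values:", actionValues.map(v => v.toFixed(2)));
// The agent should end up favoring machine 2, the one with the highest payout rate
Full reinforcement learning adds states and long-term rewards on top of this idea, but the learn-from-feedback loop is the same.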
The Machine Learning Workflow
- Data Collection: Gather relevant data from various sources
- Data Preparation: Clean, normalize, and split the data into training and test sets (see the sketch after this list)
- Model Selection: Choose an appropriate algorithm for your problem
- Training: Feed the training data to the model so it can learn patterns
- Evaluation: Test the model on unseen data to measure its performance
- Deployment: Deploy the model to production for real-world use
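As a concrete illustration of the Data Preparation step, here is a minimal sketch of an 80/20 train/test split plus min-max scaling. The split ratio, scaling method, and tiny example arrays are illustrative choices, not part of the lesson's later examples.
// Shuffle indices (Fisher-Yates) and split the data into training and test sets
function trainTestSplit(X, y, testRatio = 0.2) {
  const indices = X.map((_, i) => i);
  for (let i = indices.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  const testSize = Math.round(X.length * testRatio);
  const testIdx = indices.slice(0, testSize);
  const trainIdx = indices.slice(testSize);
  return {
    XTrain: trainIdx.map(i => X[i]), yTrain: trainIdx.map(i => y[i]),
    XTest: testIdx.map(i => X[i]), yTest: testIdx.map(i => y[i])
  };
}
// Min-max scaling: map values into the [0, 1] range
function minMaxScale(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map(v => (v - min) / (max - min));
}
const sizes = [1000, 1500, 2000, 2500, 3000];
const prices = [150, 200, 250, 300, 350];
const split = trainTestSplit(sizes, prices);
// In a real pipeline, fit the scaler on the training data only to avoid leakage
const scaledTrainSizes = minMaxScale(split.XTrain);
console.log("Training examples:", split.XTrain.length, "Test examples:", split.XTest.length);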
Supervised Learning: Linear Regression
Let's implement a simple linear regression model to predict house prices based on size:
// Simple Linear Regression Implementation
class LinearRegression {
constructor(learningRate = 0.01, iterations = 1000) {
this.learningRate = learningRate;
this.iterations = iterations;
this.weight = 0;
this.bias = 0;
this.history = { loss: [], weight: [], bias: [] };
}
// Training the model using gradient descent
fit(X, y) {
const n = X.length;
for (let i = 0; i < this.iterations; i++) {
// Forward pass: make predictions
const predictions = X.map(x => this.weight * x + this.bias);
// Calculate loss (Mean Squared Error)
const loss = predictions.reduce((sum, pred, idx) => {
return sum + Math.pow(pred - y[idx], 2);
}, 0) / n;
// Calculate gradients
let dWeight = 0;
let dBias = 0;
for (let j = 0; j < n; j++) {
const error = predictions[j] - y[j];
dWeight += (2 / n) * error * X[j];
dBias += (2 / n) * error;
}
// Update parameters (gradient descent)
this.weight -= this.learningRate * dWeight;
this.bias -= this.learningRate * dBias;
// Store history for visualization
if (i % 100 === 0) {
this.history.loss.push(loss);
this.history.weight.push(this.weight);
this.history.bias.push(this.bias);
console.log(`Iteration ${i}: Loss = ${loss.toFixed(4)}`);
}
}
}
// Make predictions
predict(X) {
return Array.isArray(X)
? X.map(x => this.weight * x + this.bias)
: this.weight * X + this.bias;
}
}
// Example: Predicting house prices
// X = house size in thousands of square feet (scaled so gradient descent stays stable)
// y = price in thousands of dollars
const houseSizes = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0];
const housePrices = [150, 200, 250, 300, 350, 400, 450];
console.log("Training Linear Regression Model...");
// With raw square-foot inputs this learning rate would diverge; the scaled inputs keep the updates stable
const model = new LinearRegression(0.05, 1000);
model.fit(houseSizes, housePrices);
console.log(`\nModel trained!`);
console.log(`Weight (slope): ${model.weight.toFixed(4)}`);
console.log(`Bias (intercept): ${model.bias.toFixed(4)}`);
// Make predictions
console.log("\nPredictions:");
const testSizes = [1.2, 2.8, 3.8]; // thousands of square feet
const predictions = model.predict(testSizes);
testSizes.forEach((size, i) => {
  console.log(`House ${size * 1000} sqft → $${predictions[i].toFixed(2)}k`);
});
// Expected output (approximately):
// House 1200 sqft → $170.00k
// House 2800 sqft → $330.00k
// House 3800 sqft → $430.00k
🔍 How It Works:
- Forward Pass: Calculate predictions using current weight and bias
- Loss Calculation: Measure how wrong our predictions are (MSE)
- Gradient Descent: Adjust weight and bias in the direction that reduces the loss (the formulas are written out below)
- Iteration: Repeat until the model converges
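For reference, the quantities that fit() computes can be written out explicitly. With n training points, prediction ŷᵢ = w·xᵢ + b, and learning rate α, the loss, gradients, and update rule are:
\[
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2, \qquad
\frac{\partial \text{MSE}}{\partial w} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i, \qquad
\frac{\partial \text{MSE}}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)
\]
\[
w \leftarrow w - \alpha\,\frac{\partial \text{MSE}}{\partial w}, \qquad
b \leftarrow b - \alpha\,\frac{\partial \text{MSE}}{\partial b}
\]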
Unsupervised Learning: K-Means Clustering
Let's implement K-Means to group similar data points:
// K-Means Clustering Implementation
class KMeans {
constructor(k = 3, maxIterations = 100) {
this.k = k;
this.maxIterations = maxIterations;
this.centroids = [];
this.labels = [];
}
// Calculate Euclidean distance
distance(point1, point2) {
return Math.sqrt(
point1.reduce((sum, val, i) =>
sum + Math.pow(val - point2[i], 2), 0
)
);
}
// Initialize centroids randomly
initializeCentroids(data) {
const indices = new Set();
while (indices.size < this.k) {
indices.add(Math.floor(Math.random() * data.length));
}
this.centroids = Array.from(indices).map(i => [...data[i]]);
}
// Assign each point to nearest centroid
assignClusters(data) {
this.labels = data.map(point => {
const distances = this.centroids.map(centroid =>
this.distance(point, centroid)
);
return distances.indexOf(Math.min(...distances));
});
}
// Update centroids based on cluster means
updateCentroids(data) {
const newCentroids = Array(this.k).fill(null).map(() => []);
// Sum all points in each cluster
data.forEach((point, i) => {
const cluster = this.labels[i];
if (!newCentroids[cluster].length) {
newCentroids[cluster] = [...point];
} else {
point.forEach((val, j) => {
newCentroids[cluster][j] += val;
});
}
});
// Calculate means
const counts = Array(this.k).fill(0);
this.labels.forEach(label => counts[label]++);
this.centroids = newCentroids.map((centroid, i) =>
  // If a cluster ended up empty, keep its previous centroid (avoids dividing by zero)
  counts[i] === 0 ? this.centroids[i] : centroid.map(val => val / counts[i])
);
}
// Train the model
fit(data) {
this.initializeCentroids(data);
for (let i = 0; i < this.maxIterations; i++) {
const oldLabels = [...this.labels];
this.assignClusters(data);
this.updateCentroids(data);
// Check for convergence
if (JSON.stringify(oldLabels) === JSON.stringify(this.labels)) {
console.log(`Converged at iteration ${i}`);
break;
}
}
}
// Predict cluster for new data
predict(point) {
const distances = this.centroids.map(centroid =>
this.distance(point, centroid)
);
return distances.indexOf(Math.min(...distances));
}
}
// Example: Customer segmentation
const customers = [
[25, 50000], // [age, income]
[30, 60000],
[35, 70000],
[45, 90000],
[50, 100000],
[22, 40000],
[28, 55000],
[55, 120000],
[60, 150000],
[40, 80000]
];
console.log("Clustering customers into 3 segments...");
const kmeans = new KMeans(3);
kmeans.fit(customers);
console.log("\nCustomer Segments:");
customers.forEach((customer, i) => {
const segment = kmeans.labels[i];
console.log(`Customer ${i + 1} [Age: ${customer[0]}, Income: $${customer[1]}] → Segment ${segment}`);
});
console.log("\nCentroid positions:");
kmeans.centroids.forEach((centroid, i) => {
console.log(`Segment ${i}: Age ${centroid[0].toFixed(1)}, Income $${centroid[1].toFixed(0)}`);
});
🎯 Key Concepts:
- Centroids: Center points of each cluster
- Assignment: Each point is assigned to its nearest centroid (see the predict example below)
- Update: Move centroids to mean of assigned points
- Convergence: Stop when clusters no longer change
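Once the model has been fitted, the predict method assigns a new point to whichever centroid is closest. A quick usage sketch (the exact segment number varies between runs because the centroids start at random data points):
// Assign a new customer to the closest learned segment
const newCustomer = [33, 65000]; // [age, income]
console.log(`New customer → Segment ${kmeans.predict(newCustomer)}`);
Note that income spans a much larger numeric range than age, so it dominates the Euclidean distance here; standardizing the features first, as mentioned in the Data Preparation step, usually produces more balanced segments.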
Model Evaluation Metrics
Classification Metrics
// Confusion Matrix & Metrics
function evaluateClassifier(yTrue, yPred) {
let tp = 0, fp = 0, tn = 0, fn = 0;
for (let i = 0; i < yTrue.length; i++) {
if (yTrue[i] === 1 && yPred[i] === 1) tp++;
else if (yTrue[i] === 0 && yPred[i] === 1) fp++;
else if (yTrue[i] === 0 && yPred[i] === 0) tn++;
else fn++;
}
const accuracy = (tp + tn) / (tp + tn + fp + fn);
const precision = tp / (tp + fp);
const recall = tp / (tp + fn);
const f1Score = 2 * (precision * recall) / (precision + recall);
return { accuracy, precision, recall, f1Score };
}
// Example
const actual = [1, 0, 1, 1, 0, 1, 0, 0];
const predicted = [1, 0, 1, 0, 0, 1, 1, 0];
const metrics = evaluateClassifier(actual, predicted);
console.log("Accuracy:", metrics.accuracy.toFixed(3));
console.log("Precision:", metrics.precision.toFixed(3));
console.log("Recall:", metrics.recall.toFixed(3));
console.log("F1 Score:", metrics.f1Score.toFixed(3));
Regression Metrics
// Regression Evaluation
function evaluateRegression(yTrue, yPred) {
const n = yTrue.length;
// Mean Squared Error
const mse = yTrue.reduce((sum, val, i) =>
sum + Math.pow(val - yPred[i], 2), 0) / n;
// Root Mean Squared Error
const rmse = Math.sqrt(mse);
// Mean Absolute Error
const mae = yTrue.reduce((sum, val, i) =>
sum + Math.abs(val - yPred[i]), 0) / n;
// R² Score
const yMean = yTrue.reduce((a, b) => a + b) / n;
const ssTot = yTrue.reduce((sum, val) =>
sum + Math.pow(val - yMean, 2), 0);
const ssRes = yTrue.reduce((sum, val, i) =>
sum + Math.pow(val - yPred[i], 2), 0);
const r2 = 1 - (ssRes / ssTot);
return { mse, rmse, mae, r2 };
}
// Example
const actualPrices = [100, 200, 300, 400];
const predictedPrices = [110, 190, 310, 390];
// A distinct name so this snippet can run alongside the classification example above
const regressionMetrics = evaluateRegression(actualPrices, predictedPrices);
console.log("MSE:", regressionMetrics.mse.toFixed(2));
console.log("RMSE:", regressionMetrics.rmse.toFixed(2));
console.log("MAE:", regressionMetrics.mae.toFixed(2));
console.log("R²:", regressionMetrics.r2.toFixed(3));
⚠️ Common ML Pitfalls
- Overfitting: The model learns the training data too closely and performs poorly on new data (a quick check is sketched below)
- Underfitting: Model is too simple to capture patterns
- Data Leakage: Training data accidentally includes information about test data
- Bias in Data: Training data doesn't represent real-world distribution
- Feature Selection: Using irrelevant features or missing important ones
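One simple way to spot overfitting is to compare training and test error. Here is a minimal sketch using the evaluateRegression helper from above; the 1.5× gap threshold is an arbitrary rule of thumb, not a standard value.
// Compare training and test error; a large gap suggests overfitting
function checkOverfitting(model, XTrain, yTrain, XTest, yTest) {
  const trainRmse = evaluateRegression(yTrain, model.predict(XTrain)).rmse;
  const testRmse = evaluateRegression(yTest, model.predict(XTest)).rmse;
  console.log(`Train RMSE: ${trainRmse.toFixed(2)}, Test RMSE: ${testRmse.toFixed(2)}`);
  if (testRmse > trainRmse * 1.5) {
    console.log("Warning: the model may be overfitting the training data");
  }
}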
📚 Key Takeaways
- ✓ Machine Learning learns from data without explicit programming
- ✓ Three main types: Supervised, Unsupervised, and Reinforcement Learning
- ✓ Linear Regression is a fundamental supervised learning algorithm
- ✓ K-Means is a popular unsupervised clustering algorithm
- ✓ Model evaluation is crucial to measure performance
- ✓ Avoid overfitting by using proper train/test splits and validation