Beginner
30 min
Full Guide

Machine Learning Basics

Learn the fundamentals of machine learning, types of learning, and practical implementations

What is Machine Learning?

Machine Learning (ML) is a subset of artificial intelligence (AI) that enables computers to learn and improve from experience without being explicitly programmed. Instead of writing specific rules for every task, we provide data and let the algorithm discover the patterns it uses to make predictions and decisions.

🎯 Core Concept:

ML algorithms learn from data, identify patterns, and make predictions or decisions with minimal human intervention.
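The small example below learns a rule from data instead of having one written by hand. It picks a spam cutoff from labeled examples. The data, the single "number of links" feature, and the helper name `learnThreshold` are all hypothetical. This is a one-feature decision stump, not a production spam filter.

```javascript
// learnThreshold (hypothetical helper): try each observed value as a cutoff and keep
// the one that misclassifies the fewest labeled examples under "predict 1 if x >= cutoff"
function learnThreshold(xs, labels) {
  let best = { cutoff: xs[0], errors: Infinity };
  for (const cutoff of xs) {
    let errors = 0;
    xs.forEach((x, i) => {
      const prediction = x >= cutoff ? 1 : 0;
      if (prediction !== labels[i]) errors++;
    });
    if (errors < best.errors) best = { cutoff, errors };
  }
  return best;
}

// Hypothetical labeled data: links per email, and whether it was spam (1) or not (0)
const linkCounts = [0, 1, 2, 7, 3, 8, 9, 12];
const isSpam = [0, 0, 0, 0, 1, 1, 1, 1];

const rule = learnThreshold(linkCounts, isSpam);
console.log(`Learned rule: spam if links >= ${rule.cutoff} (${rule.errors} training error)`);
// Learned rule: spam if links >= 3 (1 training error)

// The learned rule is then applied to new, unlabeled emails
const classify = links => (links >= rule.cutoff ? "spam" : "not spam");
console.log(classify(10)); // spam
```

If you change the labeled data, the learned cutoff changes with it. No rule in the code has to be rewritten.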

Types of Machine Learning

📚 Supervised Learning

Learning from labeled data, where each training example comes with the correct output.

Examples:

  • Email spam detection
  • House price prediction
  • Image classification
  • Medical diagnosis

🔍 Unsupervised Learning

Finding patterns or structure in unlabeled data.

Examples:

  • Customer segmentation
  • Anomaly detection
  • Recommendation systems
  • Data compression

🎮 Reinforcement Learning

An agent learns through trial and error, guided by the rewards its actions earn.

Examples:

  • Game playing (AlphaGo)
  • Robot navigation
  • Self-driving cars
  • Trading strategies
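A minimal sketch of "trial and error with rewards" follows. It uses an epsilon-greedy agent on a hypothetical three-armed bandit, meaning three slot machines with fixed payout probabilities that the agent does not know. This is a deliberately simplified setting with no states or planning, not how systems like AlphaGo work. The payout probabilities, `epsilon`, and the seeded `mulberry32` generator are assumptions made for this illustration.

```javascript
// mulberry32: small seeded pseudo-random generator, used so runs are reproducible
function mulberry32(a) {
  return function () {
    let t = (a += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const rand = mulberry32(42);
const payoutProb = [0.2, 0.5, 0.8]; // hypothetical; arm 2 is best, but the agent doesn't know it
const epsilon = 0.1;                // fraction of steps spent exploring
const counts = [0, 0, 0];           // pulls per arm
const values = [0, 0, 0];           // running average reward per arm

for (let step = 0; step < 2000; step++) {
  // Explore a random arm with probability epsilon, otherwise exploit the best estimate
  const arm = rand() < epsilon
    ? Math.floor(rand() * payoutProb.length)
    : values.indexOf(Math.max(...values));

  // The environment returns reward 1 with the arm's payout probability, else 0
  const reward = rand() < payoutProb[arm] ? 1 : 0;

  // Incremental mean update of the arm's estimated value
  counts[arm]++;
  values[arm] += (reward - values[arm]) / counts[arm];
}

console.log("Estimated values:", values.map(v => v.toFixed(2)));
console.log("Pulls per arm:", counts);
```

Over time, the arm that earns the most reward ends up with both the highest estimated value and most of the pulls.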

The Machine Learning Workflow

  1. Data Collection: Gather relevant data from various sources
  2. Data Preparation: Clean and normalize the data, then split it into training and test sets
  3. Model Selection: Choose an algorithm suited to your problem (for example, regression to predict a number or clustering to group unlabeled data)
  4. Training: Feed the training data to the model so it can learn patterns
  5. Evaluation: Test the model on unseen data and measure its performance (accuracy, error, or another suitable metric)
  6. Deployment: Deploy the model to production for real-world use
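The train/test split from step 2 can be sketched as follows. The helper name `trainTestSplit` and the 20% test ratio are illustrative choices. The indices are shuffled first, using a Fisher-Yates shuffle, so that ordered data (for example, rows sorted by size) does not put all the large houses in the test set.

```javascript
// trainTestSplit (hypothetical helper): shuffle row indices, then slice off a test portion.
// `rng` defaults to Math.random; pass a seeded generator for reproducible splits.
function trainTestSplit(X, y, testRatio = 0.2, rng = Math.random) {
  const indices = X.map((_, i) => i);

  // Fisher-Yates shuffle of the indices
  for (let i = indices.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }

  const testSize = Math.round(X.length * testRatio);
  const testIdx = indices.slice(0, testSize);
  const trainIdx = indices.slice(testSize);

  // Inputs and targets are selected with the same indices, so pairs stay aligned
  return {
    XTrain: trainIdx.map(i => X[i]),
    yTrain: trainIdx.map(i => y[i]),
    XTest: testIdx.map(i => X[i]),
    yTest: testIdx.map(i => y[i]),
  };
}

// Hypothetical data: sizes in square feet, prices in thousands of dollars
const sizes = [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500];
const prices = sizes.map(s => 0.1 * s + 50);

const split = trainTestSplit(sizes, prices, 0.2);
console.log("Train:", split.XTrain.length, "Test:", split.XTest.length); // Train: 8 Test: 2
```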

Supervised Learning: Linear Regression

Let's implement a simple linear regression model to predict house prices based on size:

// Simple Linear Regression Implementation
class LinearRegression {
  constructor(learningRate = 0.01, iterations = 1000) {
    this.learningRate = learningRate;
    this.iterations = iterations;
    this.weight = 0; // slope, in original units
    this.bias = 0;   // intercept, in original units
    this.history = { loss: [], weight: [], bias: [] };
  }

  // Train the model using batch gradient descent
  fit(X, y) {
    const n = X.length;

    // Standardize the feature (zero mean, unit variance). Raw square footage is in
    // the thousands, which makes the gradients so large that plain gradient descent
    // diverges unless the learning rate is tiny.
    const mean = X.reduce((sum, x) => sum + x, 0) / n;
    const std = Math.sqrt(X.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n) || 1;
    const Xs = X.map(x => (x - mean) / std);

    // Parameters in the standardized space
    let w = 0;
    let b = 0;

    for (let i = 0; i < this.iterations; i++) {
      // Forward pass: make predictions
      const predictions = Xs.map(x => w * x + b);

      // Calculate loss (Mean Squared Error)
      const loss = predictions.reduce((sum, pred, idx) =>
        sum + (pred - y[idx]) ** 2, 0) / n;

      // Calculate gradients of the MSE with respect to w and b
      let dWeight = 0;
      let dBias = 0;

      for (let j = 0; j < n; j++) {
        const error = predictions[j] - y[j];
        dWeight += (2 / n) * error * Xs[j];
        dBias += (2 / n) * error;
      }

      // Update parameters (gradient descent step)
      w -= this.learningRate * dWeight;
      b -= this.learningRate * dBias;

      // Convert back to original units: y = (w / std) * x + (b - w * mean / std)
      this.weight = w / std;
      this.bias = b - (w * mean) / std;

      // Store history for visualization
      if (i % 100 === 0) {
        this.history.loss.push(loss);
        this.history.weight.push(this.weight);
        this.history.bias.push(this.bias);
        console.log(`Iteration ${i}: Loss = ${loss.toFixed(4)}`);
      }
    }
  }

  // Make predictions (accepts a single number or an array)
  predict(X) {
    return Array.isArray(X)
      ? X.map(x => this.weight * x + this.bias)
      : this.weight * X + this.bias;
  }
}

// Example: Predicting house prices
// X = house size in square feet
// y = price in thousands of dollars
const houseSizes = [1000, 1500, 2000, 2500, 3000, 3500, 4000];
const housePrices = [150, 200, 250, 300, 350, 400, 450];

console.log("Training Linear Regression Model...");
const model = new LinearRegression(0.01, 1000);
model.fit(houseSizes, housePrices);

console.log(`\nModel trained!`);
console.log(`Weight (slope): ${model.weight.toFixed(4)}`);     // 0.1000
console.log(`Bias (intercept): ${model.bias.toFixed(4)}`);     // 50.0000

// Make predictions
console.log("\nPredictions:");
const testSizes = [1200, 2800, 3800];
const predictions = model.predict(testSizes);
testSizes.forEach((size, i) => {
  console.log(`House ${size} sqft → $${predictions[i].toFixed(2)}k`);
});

// Output (the training data follow price = 0.1 × size + 50 exactly):
// House 1200 sqft → $170.00k
// House 2800 sqft → $330.00k
// House 3800 sqft → $430.00k
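Simple linear regression with one feature also has a closed-form ordinary least squares solution. This makes a useful cross-check on a gradient-descent result. The helper name `leastSquaresFit` is ours and is not part of the code above. For this data, the points lie exactly on a line, so least squares gives slope 0.1 and intercept 50. A gradient-descent fit that has converged should land on the same values.

```javascript
// Ordinary least squares for a single feature:
//   slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,   intercept = ȳ - slope · x̄
function leastSquaresFit(X, y) {
  const n = X.length;
  const xMean = X.reduce((s, x) => s + x, 0) / n;
  const yMean = y.reduce((s, v) => s + v, 0) / n;

  let num = 0; // covariance term
  let den = 0; // variance term
  for (let i = 0; i < n; i++) {
    num += (X[i] - xMean) * (y[i] - yMean);
    den += (X[i] - xMean) ** 2;
  }

  const slope = num / den;
  return { slope, intercept: yMean - slope * xMean };
}

const ols = leastSquaresFit(
  [1000, 1500, 2000, 2500, 3000, 3500, 4000],
  [150, 200, 250, 300, 350, 400, 450]
);
console.log(`OLS slope: ${ols.slope.toFixed(4)}, intercept: ${ols.intercept.toFixed(4)}`);
// OLS slope: 0.1000, intercept: 50.0000
```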

🔍 How It Works:

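  • Forward Pass: Calculate predictions using the current weight and bias
  • Loss Calculation: Measure how far the predictions are from the targets using the mean squared error (MSE)
  • Gradient Descent: Move the weight and bias a small step, scaled by the learning rate, in the direction that reduces the loss
  • Iteration: Repeat for a fixed number of iterations; with a suitable learning rate the loss keeps shrinking until the parameters converge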
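In symbols, one iteration of the loop computes the following quantities. Here $w$ is the weight, $b$ is the bias, $\eta$ is the learning rate, and $(x_j, y_j)$ for $j = 1, \dots, n$ are the training pairs as fed to the loop.

```latex
\hat{y}_j = w\,x_j + b
\qquad
L(w, b) = \frac{1}{n}\sum_{j=1}^{n} \left(\hat{y}_j - y_j\right)^2

\frac{\partial L}{\partial w} = \frac{2}{n}\sum_{j=1}^{n} \left(\hat{y}_j - y_j\right) x_j
\qquad
\frac{\partial L}{\partial b} = \frac{2}{n}\sum_{j=1}^{n} \left(\hat{y}_j - y_j\right)

w \leftarrow w - \eta\,\frac{\partial L}{\partial w}
\qquad
b \leftarrow b - \eta\,\frac{\partial L}{\partial b}
```

These match the `dWeight` and `dBias` accumulations in the loop term by term: each adds $(2/n)\cdot\text{error}\cdot x_j$ and $(2/n)\cdot\text{error}$ per example.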

Unsupervised Learning: K-Means Clustering

Let's implement K-Means to group similar data points into k clusters:

// K-Means Clustering Implementation
class KMeans {
  constructor(k = 3, maxIterations = 100) {
    this.k = k;
    this.maxIterations = maxIterations;
    this.centroids = [];
    this.labels = [];
  }

  // Calculate Euclidean distance
  distance(point1, point2) {
    return Math.sqrt(
      point1.reduce((sum, val, i) => sum + (val - point2[i]) ** 2, 0)
    );
  }

  // Index of the centroid closest to a point
  nearestCentroid(point) {
    const distances = this.centroids.map(centroid =>
      this.distance(point, centroid)
    );
    return distances.indexOf(Math.min(...distances));
  }

  // Initialize centroids by picking k distinct data points at random
  initializeCentroids(data) {
    if (this.k > data.length) {
      throw new Error("k cannot be larger than the number of data points");
    }
    const indices = new Set();
    while (indices.size < this.k) {
      indices.add(Math.floor(Math.random() * data.length));
    }
    this.centroids = Array.from(indices).map(i => [...data[i]]);
  }

  // Assign each point to its nearest centroid
  assignClusters(data) {
    this.labels = data.map(point => this.nearestCentroid(point));
  }

  // Move each centroid to the mean of the points assigned to it
  updateCentroids(data) {
    const dims = data[0].length;
    const sums = Array.from({ length: this.k }, () => Array(dims).fill(0));
    const counts = Array(this.k).fill(0);

    // Sum all points in each cluster
    data.forEach((point, i) => {
      const cluster = this.labels[i];
      counts[cluster]++;
      point.forEach((val, j) => {
        sums[cluster][j] += val;
      });
    });

    // Divide by cluster size; an empty cluster keeps its previous centroid
    // (dividing by a count of 0 would produce NaN and break later distances)
    this.centroids = sums.map((sum, c) =>
      counts[c] > 0 ? sum.map(val => val / counts[c]) : this.centroids[c]
    );
  }

  // Train the model
  fit(data) {
    this.labels = [];
    this.initializeCentroids(data);

    for (let i = 0; i < this.maxIterations; i++) {
      const oldLabels = [...this.labels];

      this.assignClusters(data);
      this.updateCentroids(data);

      // Converged: no point changed cluster since the previous iteration
      if (this.labels.every((label, idx) => label === oldLabels[idx])) {
        console.log(`Converged at iteration ${i}`);
        break;
      }
    }
  }

  // Predict cluster for new data
  predict(point) {
    return this.nearestCentroid(point);
  }
}

// Example: Customer segmentation
const customers = [
  [25, 50000],  // [age, income]
  [30, 60000],
  [35, 70000],
  [45, 90000],
  [50, 100000],
  [22, 40000],
  [28, 55000],
  [55, 120000],
  [60, 150000],
  [40, 80000]
];

console.log("Clustering customers into 3 segments...");
// Centroids are initialized with Math.random(), so segment numbers (and
// occasionally the clusters themselves) can differ from run to run.
const kmeans = new KMeans(3);
kmeans.fit(customers);

console.log("\nCustomer Segments:");
customers.forEach((customer, i) => {
  const segment = kmeans.labels[i];
  console.log(`Customer ${i + 1} [Age: ${customer[0]}, Income: $${customer[1]}] → Segment ${segment}`);
});

console.log("\nCentroid positions:");
kmeans.centroids.forEach((centroid, i) => {
  console.log(`Segment ${i}: Age ${centroid[0].toFixed(1)}, Income $${centroid[1].toFixed(0)}`);
});
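In the customer example, income is in tens of thousands of dollars while age is in years. Raw Euclidean distance is therefore driven almost entirely by income. A common remedy is to standardize each feature (z-score) before clustering. The sketch below assumes a hypothetical helper `standardize`. You would pass `scaled` to `kmeans.fit` and read the resulting centroids in standardized units, or map them back using `means` and `stds`.

```javascript
// standardize (hypothetical helper): z-score each column,
// i.e. (value - column mean) / column standard deviation
function standardize(rows) {
  const dims = rows[0].length;
  const means = [];
  const stds = [];
  for (let j = 0; j < dims; j++) {
    const col = rows.map(r => r[j]);
    const mean = col.reduce((s, v) => s + v, 0) / col.length;
    const variance = col.reduce((s, v) => s + (v - mean) ** 2, 0) / col.length;
    means.push(mean);
    stds.push(Math.sqrt(variance) || 1); // guard against constant columns
  }
  const scaled = rows.map(r => r.map((v, j) => (v - means[j]) / stds[j]));
  return { scaled, means, stds };
}

// Same [age, income] rows as the customer example
const rawCustomers = [
  [25, 50000], [30, 60000], [35, 70000], [45, 90000], [50, 100000],
  [22, 40000], [28, 55000], [55, 120000], [60, 150000], [40, 80000],
];

const { scaled, means, stds } = standardize(rawCustomers);
console.log("Column means:", means); // [ 39, 81500 ]
console.log("First customer (scaled):", scaled[0].map(v => v.toFixed(2)));
```

After scaling, a one-standard-deviation difference in age counts as much as a one-standard-deviation difference in income.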

🎯 Key Concepts:

  • Centroids: The center (mean position) of each cluster
  • Assignment: Each point is assigned to its nearest centroid
  • Update: Move each centroid to the mean of its assigned points
  • Convergence: Stop when no point changes cluster, or after maxIterations passes
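The assignment and update steps each lower, or leave unchanged, a single quantity: the within-cluster sum of squared distances, often called inertia. The loop stops once assignments stop changing. The sketch below computes inertia for given points, labels, and centroids. The function name `inertia` and the tiny dataset are illustrative.

```javascript
// inertia (hypothetical helper): sum over all points of the squared
// Euclidean distance to the centroid of the cluster the point is assigned to
function inertia(points, labels, centroids) {
  return points.reduce((total, point, i) => {
    const c = centroids[labels[i]];
    return total + point.reduce((s, v, j) => s + (v - c[j]) ** 2, 0);
  }, 0);
}

// Tiny hand-made example: two clusters in 2-D, each centroid at its cluster mean
const pts = [[0, 0], [0, 2], [10, 10], [10, 12]];
const lbls = [0, 0, 1, 1];
const cents = [[0, 1], [10, 11]];

console.log(inertia(pts, lbls, cents)); // 4 (each point is 1 unit from its centroid)
```

Moving a centroid away from its cluster mean raises this value. That is why the update step places centroids at the mean.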

Model Evaluation Metrics

Classification Metrics

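// Confusion Matrix & Metrics (binary labels: 1 = positive, 0 = negative)
function evaluateClassifier(yTrue, yPred) {
  let tp = 0, fp = 0, tn = 0, fn = 0;

  for (let i = 0; i < yTrue.length; i++) {
    if (yTrue[i] === 1 && yPred[i] === 1) tp++;       // true positive
    else if (yTrue[i] === 0 && yPred[i] === 1) fp++;  // false positive
    else if (yTrue[i] === 0 && yPred[i] === 0) tn++;  // true negative
    else fn++;                                        // false negative (actual 1, predicted 0)
  }

  // Return 0 instead of NaN when a denominator is empty (e.g. no positive predictions)
  const safeDivide = (a, b) => (b === 0 ? 0 : a / b);

  const accuracy = safeDivide(tp + tn, tp + tn + fp + fn);
  const precision = safeDivide(tp, tp + fp);
  const recall = safeDivide(tp, tp + fn);
  const f1Score = safeDivide(2 * precision * recall, precision + recall);

  return { confusionMatrix: { tp, fp, tn, fn }, accuracy, precision, recall, f1Score };
}

// Example
const actual = [1, 0, 1, 1, 0, 1, 0, 0];
const predicted = [1, 0, 1, 0, 0, 1, 1, 0];
const classMetrics = evaluateClassifier(actual, predicted);

console.log("Confusion matrix:", classMetrics.confusionMatrix); // { tp: 3, fp: 1, tn: 3, fn: 1 }
console.log("Accuracy:", classMetrics.accuracy.toFixed(3));      // 0.750
console.log("Precision:", classMetrics.precision.toFixed(3));    // 0.750
console.log("Recall:", classMetrics.recall.toFixed(3));          // 0.750
console.log("F1 Score:", classMetrics.f1Score.toFixed(3));       // 0.750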
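Many classifiers output a score, such as a probability of class 1, rather than a hard 0/1 label. The predictions passed to a metric function like `evaluateClassifier` then depend on the threshold you choose. The sketch below uses hypothetical scores and labels. It shows the typical trade-off: raising the threshold tends to increase precision and lower recall.

```javascript
// Hypothetical model scores (probability of class 1) and true labels
const scores = [0.95, 0.85, 0.7, 0.6, 0.45, 0.3, 0.2, 0.1];
const labels = [1, 1, 0, 1, 1, 0, 0, 0];

// Precision and recall when predicting 1 for every score >= threshold
function precisionRecallAt(threshold) {
  let tp = 0, fp = 0, fn = 0;
  scores.forEach((s, i) => {
    const pred = s >= threshold ? 1 : 0;
    if (pred === 1 && labels[i] === 1) tp++;
    else if (pred === 1 && labels[i] === 0) fp++;
    else if (pred === 0 && labels[i] === 1) fn++;
  });
  return {
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

for (const t of [0.25, 0.5, 0.8]) {
  const { precision, recall } = precisionRecallAt(t);
  console.log(`threshold ${t}: precision ${precision.toFixed(2)}, recall ${recall.toFixed(2)}`);
}
// threshold 0.25: precision 0.67, recall 1.00
// threshold 0.5: precision 0.75, recall 0.75
// threshold 0.8: precision 1.00, recall 0.50
```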

Regression Metrics

// Regression Evaluation
function evaluateRegression(yTrue, yPred) {
  const n = yTrue.length;

  // Residual sum of squares (reused for MSE and R²)
  const ssRes = yTrue.reduce((sum, val, i) =>
    sum + (val - yPred[i]) ** 2, 0);

  // Mean Squared Error
  const mse = ssRes / n;

  // Root Mean Squared Error (same units as the target)
  const rmse = Math.sqrt(mse);

  // Mean Absolute Error
  const mae = yTrue.reduce((sum, val, i) =>
    sum + Math.abs(val - yPred[i]), 0) / n;

  // R² Score: 1 - SS_res / SS_tot (undefined when all yTrue values are equal)
  const yMean = yTrue.reduce((a, b) => a + b, 0) / n;
  const ssTot = yTrue.reduce((sum, val) =>
    sum + (val - yMean) ** 2, 0);
  const r2 = 1 - (ssRes / ssTot);

  return { mse, rmse, mae, r2 };
}

// Example
const actualPrices = [100, 200, 300, 400];
const predictedPrices = [110, 190, 310, 390];
const regMetrics = evaluateRegression(actualPrices, predictedPrices);

console.log("MSE:", regMetrics.mse.toFixed(2));   // 100.00
console.log("RMSE:", regMetrics.rmse.toFixed(2)); // 10.00
console.log("MAE:", regMetrics.mae.toFixed(2));   // 10.00
console.log("R²:", regMetrics.r2.toFixed(3));     // 0.992
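MSE and MAE can disagree about which predictions are worse. MSE (and RMSE) squares each error, so a single large miss costs more than several small misses with the same total absolute error. MAE treats the two cases equally. The sketch below uses hypothetical predictions to show the difference.

```javascript
// Small helpers for this comparison only
const mean = arr => arr.reduce((s, v) => s + v, 0) / arr.length;
const mseOf = (t, p) => mean(t.map((v, i) => (v - p[i]) ** 2));
const maeOf = (t, p) => mean(t.map((v, i) => Math.abs(v - p[i])));

const truth = [100, 200, 300, 400];
const evenMisses = [110, 190, 310, 390]; // four errors of 10
const oneBigMiss = [100, 200, 300, 440]; // one error of 40, the rest exact

console.log("evenMisses: MSE", mseOf(truth, evenMisses), "MAE", maeOf(truth, evenMisses));
// evenMisses: MSE 100 MAE 10
console.log("oneBigMiss: MSE", mseOf(truth, oneBigMiss), "MAE", maeOf(truth, oneBigMiss));
// oneBigMiss: MSE 400 MAE 10
```

Both prediction sets have the same MAE, but MSE is four times higher when the error is concentrated in one point. Choose the metric that matches how costly large errors are in your application.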

⚠️ Common ML Pitfalls

  • Overfitting: Model learns training data too well, performs poorly on new data
  • Underfitting: Model is too simple to capture patterns
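  • Data Leakage: Information from the test set (or otherwise unavailable at prediction time) leaks into training, making evaluation results look better than real-world performance
  • Bias in Data: Training data doesn't represent the real-world distribution the model will face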
  • Feature Selection: Using irrelevant features or missing important ones
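A frequent source of data leakage is computing preprocessing statistics, such as the mean and standard deviation used for scaling, on the full dataset before splitting. A minimal sketch of the safer order follows: fit the statistics on the training split only, then reuse them unchanged on the test split. The helper name `fitScaler` and the data are hypothetical.

```javascript
// fitScaler (hypothetical helper): learn mean/std from the values it is given
// and return a transform that applies exactly those statistics
function fitScaler(values) {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const std = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length) || 1;
  return { mean, std, transform: xs => xs.map(x => (x - mean) / std) };
}

const trainSizes = [1000, 1500, 2000, 2500, 3000];
const heldOutSizes = [3500, 4000];

const scaler = fitScaler(trainSizes);              // statistics come from training data only
const trainScaled = scaler.transform(trainSizes);
const testScaled = scaler.transform(heldOutSizes); // test data never influences mean/std

console.log(scaler.mean, scaler.std.toFixed(2)); // 2000 707.11
```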

📚 Key Takeaways

  • Machine learning systems learn patterns from data instead of following explicitly programmed rules
  • Three main types: Supervised, Unsupervised, and Reinforcement Learning
  • Linear Regression is a fundamental supervised learning algorithm
  • K-Means is a popular unsupervised clustering algorithm
  • Model evaluation on unseen data is crucial to measure real performance
  • Use proper train/test splits and validation to detect and guard against overfitting
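Validation often means k-fold cross-validation. The data is divided into k folds, each fold serves once as the validation set, and the k validation scores are averaged. The sketch below only produces the index splits. Training and scoring a model on each fold are left out. The folds are contiguous, so shuffle the data first if it is ordered. The helper name `kFoldIndices` is hypothetical.

```javascript
// kFoldIndices (hypothetical helper): split indices 0..n-1 into k contiguous folds.
// Each entry holds the validation indices for that fold and the remaining training indices.
function kFoldIndices(n, k) {
  const folds = [];
  const base = Math.floor(n / k);
  let remainder = n % k; // leftover items go to the first folds, one each
  let start = 0;

  for (let f = 0; f < k; f++) {
    const size = base + (remainder > 0 ? 1 : 0);
    if (remainder > 0) remainder--;

    const validation = [];
    const train = [];
    for (let i = 0; i < n; i++) {
      if (i >= start && i < start + size) validation.push(i);
      else train.push(i);
    }
    folds.push({ train, validation });
    start += size;
  }
  return folds;
}

const folds = kFoldIndices(10, 3);
folds.forEach((fold, f) =>
  console.log(`Fold ${f}: validate on [${fold.validation}], train on ${fold.train.length} points`)
);
// Fold 0: validate on [0,1,2,3], train on 6 points
// Fold 1: validate on [4,5,6], train on 7 points
// Fold 2: validate on [7,8,9], train on 7 points
```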