Machine Learning Basics
Learn the fundamentals of machine learning, types of learning, and practical implementations
What is Machine Learning?
Machine Learning (ML) is a subset of AI that enables computers to learn and improve from experience without being explicitly programmed. Instead of writing specific rules for every task, we provide data and let the algorithm discover patterns and make decisions.
🎯 Core Concept:
ML algorithms learn from data, identify patterns, and make predictions or decisions with minimal human intervention.
Types of Machine Learning
Supervised Learning
Learning from labeled data with known outcomes.
Examples:
- Email spam detection
- House price prediction
- Image classification
- Medical diagnosis
Unsupervised Learning
Finding patterns in unlabeled data.
Examples:
- Customer segmentation
- Anomaly detection
- Recommendation systems
- Data compression
Reinforcement Learning
Learning through trial and error, guided by reward feedback (a minimal sketch follows the examples below).
Examples:
- Game playing (AlphaGo)
- Robot navigation
- Self-driving cars
- Trading strategies
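To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy agent learning which of three simulated slot machines pays off most often. The payout probabilities, exploration rate, and episode count are made-up values for illustration only.
// Epsilon-greedy bandit: learn action values purely from reward feedback
const payoutProbs = [0.2, 0.5, 0.8]; // hidden payout chance of each machine (assumed values)
const actionValues = [0, 0, 0];      // the agent's estimate of each machine's value
const pulls = [0, 0, 0];             // how many times each machine was tried
const epsilon = 0.1;                 // exploration rate
for (let episode = 0; episode < 5000; episode++) {
  // Explore with probability epsilon, otherwise exploit the current best estimate
  const action = Math.random() < epsilon
    ? Math.floor(Math.random() * actionValues.length)
    : actionValues.indexOf(Math.max(...actionValues));
  const reward = Math.random() < payoutProbs[action] ? 1 : 0;
  pulls[action]++;
  // Incremental average: nudge the estimate toward the observed reward
  actionValues[action] += (reward - actionValues[action]) / pulls[action];
}
console.log("Estimated values:", actionValues.map(v => v.toFixed(2)));
// The agent should end up favoring machine 2, the one with the highest payout rate
Full reinforcement learning adds states and long-term rewards on top of this idea, but the learn-from-feedback loop is the same.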
The Machine Learning Workflow
- Data Collection: Gather relevant data from various sources
- Data Preparation: Clean, normalize, and split the data into training and test sets (see the sketch after this list)
- Model Selection: Choose an appropriate algorithm for your problem
- Training: Feed the training data to the model so it can learn patterns
- Evaluation: Test the model on unseen data to measure its performance
- Deployment: Deploy the model to production for real-world use
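As a concrete illustration of the Data Preparation step, here is a minimal sketch of an 80/20 train/test split plus min-max scaling. The split ratio, scaling method, and tiny example arrays are illustrative choices, not part of the lesson's later examples.
// Shuffle indices (Fisher-Yates) and split the data into training and test sets
function trainTestSplit(X, y, testRatio = 0.2) {
  const indices = X.map((_, i) => i);
  for (let i = indices.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  const testSize = Math.round(X.length * testRatio);
  const testIdx = indices.slice(0, testSize);
  const trainIdx = indices.slice(testSize);
  return {
    XTrain: trainIdx.map(i => X[i]), yTrain: trainIdx.map(i => y[i]),
    XTest: testIdx.map(i => X[i]), yTest: testIdx.map(i => y[i])
  };
}
// Min-max scaling: map values into the [0, 1] range
function minMaxScale(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map(v => (v - min) / (max - min));
}
const sizes = [1000, 1500, 2000, 2500, 3000];
const prices = [150, 200, 250, 300, 350];
const split = trainTestSplit(sizes, prices);
// In a real pipeline, fit the scaler on the training data only to avoid leakage
const scaledTrainSizes = minMaxScale(split.XTrain);
console.log("Training examples:", split.XTrain.length, "Test examples:", split.XTest.length);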
Supervised Learning: Linear Regression
Let's implement a simple linear regression model to predict house prices based on size:
// Simple Linear Regression Implementation
class LinearRegression {
constructor(learningRate = 0.01, iterations = 1000) {
this.learningRate = learningRate;
this.iterations = iterations;
this.weight = 0;
this.bias = 0;
this.history = { loss: [], weight: [], bias: [] };
}
// Training the model using gradient descent
fit(X, y) {
const n = X.length;
for (let i = 0; i < this.iterations; i++) {
// Forward pass: make predictions
const predictions = X.map(x => this.weight * x + this.bias);
// Calculate loss (Mean Squared Error)
const loss = predictions.reduce((sum, pred, idx) => {
return sum + Math.pow(pred - y[idx], 2);
}, 0) / n;
// Calculate gradients
let dWeight = 0;
let dBias = 0;
for (let j = 0; j < n; j++) {
const error = predictions[j] - y[j];
dWeight += (2 / n) * error * X[j];
dBias += (2 / n) * error;
}
// Update parameters (gradient descent)
this.weight -= this.learningRate * dWeight;
this.bias -= this.learningRate * dBias;
// Store history for visualization
if (i % 100 === 0) {
this.history.loss.push(loss);
this.history.weight.push(this.weight);
this.history.bias.push(this.bias);
console.log(`Iteration ${i}: Loss = ${loss.toFixed(4)}`);
}
}
}
// Make predictions
predict(X) {
return Array.isArray(X)
? X.map(x => this.weight * x + this.bias)
: this.weight * X + this.bias;
}
}
// Example: Predicting house prices
// X = house size in thousands of square feet (scaled so gradient descent stays stable)
// y = price in thousands of dollars
const houseSizes = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0];
const housePrices = [150, 200, 250, 300, 350, 400, 450];
console.log("Training Linear Regression Model...");
// With raw square-foot inputs this learning rate would diverge; the scaled inputs keep the updates stable
const model = new LinearRegression(0.05, 1000);
model.fit(houseSizes, housePrices);
console.log(`\nModel trained!`);
console.log(`Weight (slope): ${model.weight.toFixed(4)}`);
console.log(`Bias (intercept): ${model.bias.toFixed(4)}`);
// Make predictions
console.log("\nPredictions:");
const testSizes = [1.2, 2.8, 3.8]; // thousands of square feet
const predictions = model.predict(testSizes);
testSizes.forEach((size, i) => {
  console.log(`House ${size * 1000} sqft → $${predictions[i].toFixed(2)}k`);
});
// Expected output (approximately):
// House 1200 sqft → $170.00k
// House 2800 sqft → $330.00k
// House 3800 sqft → $430.00k
🔍 How It Works:
- Forward Pass: Calculate predictions using current weight and bias
- Loss Calculation: Measure how wrong our predictions are (MSE)
- Gradient Descent: Adjust weight and bias in the direction that reduces the loss (the formulas are written out below)
- Iteration: Repeat until the model converges
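For reference, the quantities that fit() computes can be written out explicitly. With n training points, prediction ŷᵢ = w·xᵢ + b, and learning rate α, the loss, gradients, and update rule are:
\[
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2, \qquad
\frac{\partial \text{MSE}}{\partial w} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i, \qquad
\frac{\partial \text{MSE}}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)
\]
\[
w \leftarrow w - \alpha\,\frac{\partial \text{MSE}}{\partial w}, \qquad
b \leftarrow b - \alpha\,\frac{\partial \text{MSE}}{\partial b}
\]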
Unsupervised Learning: K-Means Clustering
Let's implement K-Means to group similar data points:
// K-Means Clustering Implementation
class KMeans {
constructor(k = 3, maxIterations = 100) {
this.k = k;
this.maxIterations = maxIterations;
this.centroids = [];
this.labels = [];
}
// Calculate Euclidean distance
distance(point1, point2) {
return Math.sqrt(
point1.reduce((sum, val, i) =>
sum + Math.pow(val - point2[i], 2), 0
)
);
}
// Initialize centroids randomly
initializeCentroids(data) {
const indices = new Set();
while (indices.size < this.k) {
indices.add(Math.floor(Math.random() * data.length));
}
this.centroids = Array.from(indices).map(i => [...data[i]]);
}
// Assign each point to nearest centroid
assignClusters(data) {
this.labels = data.map(point => {
const distances = this.centroids.map(centroid =>
this.distance(point, centroid)
);
return distances.indexOf(Math.min(...distances));
});
}
// Update centroids based on cluster means
updateCentroids(data) {
const newCentroids = Array(this.k).fill(null).map(() => []);
// Sum all points in each cluster
data.forEach((point, i) => {
const cluster = this.labels[i];
if (!newCentroids[cluster].length) {
newCentroids[cluster] = [...point];
} else {
point.forEach((val, j) => {
newCentroids[cluster][j] += val;
});
}
});
// Calculate means
const counts = Array(this.k).fill(0);
this.labels.forEach(label => counts[label]++);
this.centroids = newCentroids.map((centroid, i) =>
  // If a cluster ended up empty, keep its previous centroid (avoids dividing by zero)
  counts[i] === 0 ? this.centroids[i] : centroid.map(val => val / counts[i])
);
}
// Train the model
fit(data) {
this.initializeCentroids(data);
for (let i = 0; i < this.maxIterations; i++) {
const oldLabels = [...this.labels];
this.assignClusters(data);
this.updateCentroids(data);
// Check for convergence
if (JSON.stringify(oldLabels) === JSON.stringify(this.labels)) {
console.log(`Converged at iteration ${i}`);
break;
}
}
}
// Predict cluster for new data
predict(point) {
const distances = this.centroids.map(centroid =>
this.distance(point, centroid)
);
return distances.indexOf(Math.min(...distances));
}
}
// Example: Customer segmentation
const customers = [
[25, 50000], // [age, income]
[30, 60000],
[35, 70000],
[45, 90000],
[50, 100000],
[22, 40000],
[28, 55000],
[55, 120000],
[60, 150000],
[40, 80000]
];
console.log("Clustering customers into 3 segments...");
const kmeans = new KMeans(3);
kmeans.fit(customers);
console.log("\nCustomer Segments:");
customers.forEach((customer, i) => {
const segment = kmeans.labels[i];
console.log(`Customer ${i + 1} [Age: ${customer[0]}, Income: $${customer[1]}] → Segment ${segment}`);
});
console.log("\nCentroid positions:");
kmeans.centroids.forEach((centroid, i) => {
console.log(`Segment ${i}: Age ${centroid[0].toFixed(1)}, Income $${centroid[1].toFixed(0)}`);
});
🎯 Key Concepts:
- Centroids: Center points of each cluster
- Assignment: Each point is assigned to its nearest centroid (see the predict example below)
- Update: Move centroids to mean of assigned points
- Convergence: Stop when clusters no longer change
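Once the model has been fitted, the predict method assigns a new point to whichever centroid is closest. A quick usage sketch (the exact segment number varies between runs because the centroids start at random data points):
// Assign a new customer to the closest learned segment
const newCustomer = [33, 65000]; // [age, income]
console.log(`New customer → Segment ${kmeans.predict(newCustomer)}`);
Note that income spans a much larger numeric range than age, so it dominates the Euclidean distance here; standardizing the features first, as mentioned in the Data Preparation step, usually produces more balanced segments.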
Model Evaluation Metrics
Classification Metrics
// Confusion Matrix & Metrics
function evaluateClassifier(yTrue, yPred) {
let tp = 0, fp = 0, tn = 0, fn = 0;
for (let i = 0; i < yTrue.length; i++) {
if (yTrue[i] === 1 && yPred[i] === 1) tp++;
else if (yTrue[i] === 0 && yPred[i] === 1) fp++;
else if (yTrue[i] === 0 && yPred[i] === 0) tn++;
else fn++;
}
const accuracy = (tp + tn) / (tp + tn + fp + fn);
const precision = tp / (tp + fp);
const recall = tp / (tp + fn);
const f1Score = 2 * (precision * recall) / (precision + recall);
return { accuracy, precision, recall, f1Score };
}
// Example
const actual = [1, 0, 1, 1, 0, 1, 0, 0];
const predicted = [1, 0, 1, 0, 0, 1, 1, 0];
const metrics = evaluateClassifier(actual, predicted);
console.log("Accuracy:", metrics.accuracy.toFixed(3));
console.log("Precision:", metrics.precision.toFixed(3));
console.log("Recall:", metrics.recall.toFixed(3));
console.log("F1 Score:", metrics.f1Score.toFixed(3));
Regression Metrics
// Regression Evaluation
function evaluateRegression(yTrue, yPred) {
const n = yTrue.length;
// Mean Squared Error
const mse = yTrue.reduce((sum, val, i) =>
sum + Math.pow(val - yPred[i], 2), 0) / n;
// Root Mean Squared Error
const rmse = Math.sqrt(mse);
// Mean Absolute Error
const mae = yTrue.reduce((sum, val, i) =>
sum + Math.abs(val - yPred[i]), 0) / n;
// R² Score
const yMean = yTrue.reduce((a, b) => a + b) / n;
const ssTot = yTrue.reduce((sum, val) =>
sum + Math.pow(val - yMean, 2), 0);
const ssRes = yTrue.reduce((sum, val, i) =>
sum + Math.pow(val - yPred[i], 2), 0);
const r2 = 1 - (ssRes / ssTot);
return { mse, rmse, mae, r2 };
}
// Example
const actualPrices = [100, 200, 300, 400];
const predictedPrices = [110, 190, 310, 390];
// A distinct name so this snippet can run alongside the classification example above
const regressionMetrics = evaluateRegression(actualPrices, predictedPrices);
console.log("MSE:", regressionMetrics.mse.toFixed(2));
console.log("RMSE:", regressionMetrics.rmse.toFixed(2));
console.log("MAE:", regressionMetrics.mae.toFixed(2));
console.log("R²:", regressionMetrics.r2.toFixed(3));
⚠️ Common ML Pitfalls
- Overfitting: The model learns the training data too closely and performs poorly on new data (a quick check is sketched below)
- Underfitting: Model is too simple to capture patterns
- Data Leakage: Training data accidentally includes information about test data
- Bias in Data: Training data doesn't represent real-world distribution
- Feature Selection: Using irrelevant features or missing important ones
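One simple way to spot overfitting is to compare training and test error. Here is a minimal sketch using the evaluateRegression helper from above; the 1.5× gap threshold is an arbitrary rule of thumb, not a standard value.
// Compare training and test error; a large gap suggests overfitting
function checkOverfitting(model, XTrain, yTrain, XTest, yTest) {
  const trainRmse = evaluateRegression(yTrain, model.predict(XTrain)).rmse;
  const testRmse = evaluateRegression(yTest, model.predict(XTest)).rmse;
  console.log(`Train RMSE: ${trainRmse.toFixed(2)}, Test RMSE: ${testRmse.toFixed(2)}`);
  if (testRmse > trainRmse * 1.5) {
    console.log("Warning: the model may be overfitting the training data");
  }
}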
📚 Key Takeaways
- ✓ Machine Learning learns from data without explicit programming
- ✓ Three main types: Supervised, Unsupervised, and Reinforcement Learning
- ✓ Linear Regression is a fundamental supervised learning algorithm
- ✓ K-Means is a popular unsupervised clustering algorithm
- ✓ Model evaluation is crucial to measure performance
- ✓ Avoid overfitting by using proper train/test splits and validation