NumPy Basics

Learn NumPy for fast numerical computing, array operations, and mathematical functions

Introduction to NumPy

NumPy (Numerical Python) is the foundation of Python's scientific computing ecosystem. It provides the ndarray, a fast, memory-efficient multi-dimensional array that supports vectorized operations. For numerical work, NumPy arrays are often 10-100x faster than Python lists, because they store elements of a single type in contiguous memory and run their loops in optimized C code rather than in the Python interpreter.
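
The memory layout described above can be inspected directly. The following is a minimal sketch; the array and its explicit int64 dtype are chosen only for illustration, so that the byte counts are predictable on any platform.

```python
import numpy as np

# dtype is set explicitly so the sizes below do not depend on the platform
a = np.arange(6, dtype=np.int64)

print(a.flags["C_CONTIGUOUS"])  # True: the elements sit in one contiguous block
print(a.itemsize)               # 8 bytes per int64 element
print(a.nbytes)                 # 48 bytes of raw data (6 elements * 8 bytes)
print(a.strides)                # (8,): step 8 bytes to reach the next element
```

A Python list, by contrast, stores pointers to separately allocated objects, which is one reason element-wise loops over lists are slower.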

Creating Arrays

import numpy as np

# From Python lists
a = np.array([1, 2, 3, 4, 5])
print(a)          # [1 2 3 4 5]
print(a.dtype)    # int64 on most platforms (the default integer type is platform-dependent)
print(a.shape)    # (5,)

# 2D array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix.shape)  # (3, 3)

# Array creation functions
zeros = np.zeros((3, 4))        # 3x4 array of zeros
ones = np.ones((2, 3))          # 2x3 array of ones
full = np.full((2, 2), 7)       # 2x2 array of sevens
eye = np.eye(3)                  # 3x3 identity matrix
evens = np.arange(0, 10, 2)     # [0 2 4 6 8]
linspace = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1.]

# Random arrays
rng = np.random.default_rng(42)  # Seeded generator
rand_uniform = rng.random((3, 3))        # Uniform [0, 1)
rand_normal = rng.normal(0, 1, (3, 3))   # Normal distribution
rand_int = rng.integers(0, 10, (3, 3))   # Random integers

# Data types
float_arr = np.array([1, 2, 3], dtype=np.float32)
int_arr = np.array([1.5, 2.7, 3.9], dtype=np.int32)  # Truncates to [1, 2, 3]
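
Two points from the listing above are worth checking by hand: that a seeded generator is reproducible, and that converting floats to integers drops the fractional part rather than rounding. The sketch below uses illustrative variable names (sample_a, sample_b, floats, ints) that are not part of the lesson's code.

```python
import numpy as np

# Two generators built from the same seed yield identical streams
sample_a = np.random.default_rng(42).random(5)
sample_b = np.random.default_rng(42).random(5)
print(np.array_equal(sample_a, sample_b))  # True

# astype converts an existing array; float -> int drops the fractional part
floats = np.array([1.5, 2.7, -3.9])
ints = floats.astype(np.int32)
print(ints)        # [ 1  2 -3]
print(ints.dtype)  # int32
```

If rounding is what you want, apply np.round before the conversion.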

Array Operations (Vectorization)

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

# Element-wise operations (no loops needed!)
print(a + b)       # [11 22 33 44 55]
print(a * b)       # [10 40 90 160 250]
print(a ** 2)      # [1 4 9 16 25]
print(np.sqrt(a))  # [1. 1.414 1.732 2. 2.236] (rounded here; NumPy prints more digits)

# Comparison (returns boolean array)
print(a > 3)       # [False False False  True  True]
print(a[a > 3])    # [4 5] (boolean indexing)

# Aggregation functions
print(np.sum(a))        # 15
print(np.mean(a))       # 3.0
print(np.std(a))        # 1.414... (population standard deviation, ddof=0)
print(np.min(a))        # 1
print(np.max(a))        # 5
print(np.argmax(a))     # 4 (index of max)
print(np.cumsum(a))     # [ 1  3  6 10 15]

# Matrix operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

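print(A @ B)             # Matrix multiplication: [[19 22] [43 50]]
print(np.dot(A, B))      # Same result for 2D arrays
print(A.T)               # Transpose: [[1 3] [2 4]]
print(np.linalg.det(A))  # Determinant: approximately -2.0 (floating-point result)
print(np.linalg.inv(A))  # Inverse: [[-2. 1.] [1.5 -0.5]]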

# Speed comparison
import time

size = 1_000_000
py_list = list(range(size))
np_arr = np.arange(size)

start = time.perf_counter()
result = [x * 2 for x in py_list]
print(f"Python list: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
result = np_arr * 2
print(f"NumPy array: {time.perf_counter() - start:.4f}s")
# NumPy is often 50-100x faster here, though the ratio depends on hardware and array size
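
A single timed run, as in the comparison above, can be noisy. One possible refinement is to use the standard library's timeit module and take the best of several repeats. This is a sketch, not a rigorous benchmark; the array size and repeat counts are arbitrary choices, and no particular speedup is guaranteed.

```python
import timeit
import numpy as np

size = 100_000
py_list = list(range(size))
np_arr = np.arange(size)

# Taking the minimum over several repeats reduces noise from other processes
list_time = min(timeit.repeat(lambda: [x * 2 for x in py_list], number=10, repeat=5))
numpy_time = min(timeit.repeat(lambda: np_arr * 2, number=10, repeat=5))

print(f"Python list: {list_time:.4f}s for 10 runs")
print(f"NumPy array: {numpy_time:.4f}s for 10 runs")
print(f"Speedup: {list_time / numpy_time:.1f}x")

# Both approaches compute the same values
same_result = np.array_equal(np.array([x * 2 for x in py_list]), np_arr * 2)
print(same_result)  # True
```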

Indexing and Slicing

import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# Basic indexing
print(arr[0, 0])     # 1
print(arr[2, 3])     # 12
print(arr[0])        # [1 2 3 4] (first row)
print(arr[:, 0])     # [1 5 9] (first column)

# Slicing
print(arr[0:2, 1:3])  # [[2 3] [6 7]] (printed on two lines)
print(arr[:, ::2])     # Every other column: [[1 3] [5 7] [9 11]]

# Boolean indexing
print(arr[arr > 6])    # [ 7  8  9 10 11 12]

# Fancy indexing (index with arrays)
rows = np.array([0, 2])
cols = np.array([1, 3])
print(arr[rows, cols])  # [ 2 12] (elements at (0, 1) and (2, 3))

# Reshaping
a = np.arange(12)
b = a.reshape(3, 4)
c = a.reshape(2, 2, 3)  # 3D array
print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Flatten
print(c.flatten())  # Back to 1D (always returns a copy)
print(c.ravel())    # Also 1D, but returns a view when possible
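
The difference between a view and a copy matters as soon as you modify the result. The sketch below uses np.shares_memory to show which indexing styles share memory with the original array; the variable names (view, fancy, flat_copy, flat_view) are illustrative.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

view = arr[0:2, 1:3]          # basic slice: a view onto arr's memory
fancy = arr[[0, 2], [1, 3]]   # fancy indexing: always a copy

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, fancy))  # False

view[0, 0] = 99               # writes through to the original array
print(arr[0, 1])              # 99

flat_copy = arr.flatten()     # always a copy
flat_view = arr.ravel()       # a view here, because arr is contiguous
print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True
```

When an independent array is needed from a slice, call .copy() on it explicitly.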

Broadcasting

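Broadcasting is NumPy's mechanism for performing operations on arrays of different but compatible shapes. Shapes are compared from the last dimension backward, and two dimensions are compatible when they are equal or one of them is 1. NumPy then virtually stretches the smaller array to match the larger one without copying data.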

import numpy as np

# Scalar broadcast
a = np.array([1, 2, 3])
print(a * 10)  # [10 20 30] (the scalar acts like [10, 10, 10])

# Row/column broadcast
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

row = np.array([10, 20, 30])
print(matrix + row)
# [[11 22 33]
#  [14 25 36]
#  [17 28 39]]

col = np.array([[100], [200], [300]])
print(matrix + col)
# [[101 102 103]
#  [204 205 206]
#  [307 308 309]]

# Practical: normalize data (zero mean, unit variance)
rng = np.random.default_rng(42)       # Seeded generator, as in Creating Arrays
data = rng.standard_normal((100, 4))  # 100 samples, 4 features
mean = data.mean(axis=0)         # Mean of each column
std = data.std(axis=0)           # Std of each column
normalized = (data - mean) / std  # Broadcasting handles it!
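
The shape rule behind these examples can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available, and uses np.broadcast_to to show that the expanded array is a view rather than a copy.

```python
import numpy as np

# Shapes are compared from the trailing dimension; sizes must match or be 1
print(np.broadcast_shapes((3, 3), (3,)))     # (3, 3)
print(np.broadcast_shapes((3, 3), (3, 1)))   # (3, 3)
print(np.broadcast_shapes((100, 4), (4,)))   # (100, 4)

# Incompatible trailing dimensions (4 vs 3) raise a ValueError
try:
    np.ones((3, 4)) + np.ones(3)
    failed = False
except ValueError as exc:
    failed = True
    print("Cannot broadcast:", exc)

# broadcast_to exposes the expanded array as a read-only view, with no copy
row = np.array([10, 20, 30])
expanded = np.broadcast_to(row, (3, 3))
print(expanded.strides[0])               # 0: every row reuses the same memory
print(np.shares_memory(row, expanded))   # True
```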

Key Takeaways

  • Vectorize, do not loop: Use array operations instead of Python for loops
  • Broadcasting: NumPy automatically aligns arrays of different but compatible shapes
  • Boolean indexing: Filter arrays with conditions directly
  • Use proper dtypes: float32 uses half the memory of float64
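
The last takeaway is easy to verify with nbytes. This is a small sketch with an arbitrary one-million-element array; the names values and compact are illustrative.

```python
import numpy as np

values = np.linspace(0, 1, 1_000_000)   # float64 by default
compact = values.astype(np.float32)     # same values at lower precision

print(values.nbytes)    # 8000000
print(compact.nbytes)   # 4000000

# The trade-off is precision: machine epsilon is much larger for float32
print(np.finfo(np.float32).eps)
print(np.finfo(np.float64).eps)
```

Whether the memory saving is worth the reduced precision depends on the computation, so treat float32 as a deliberate choice rather than a default.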
