TechLead
Lesson 22 of 25
Python

NumPy Basics

Learn NumPy for fast numerical computing, array operations, and mathematical functions

Introduction to NumPy

NumPy (Numerical Python) is the foundation of Python's scientific computing ecosystem. It provides the ndarray, a fast, memory-efficient multidimensional array that supports vectorized operations. For numerical work, NumPy arrays are often 10 to 100 times faster than Python lists, because they store elements of a single type in contiguous memory and run their loops in optimized compiled C code instead of the Python interpreter. The exact speedup depends on the operation and the array size.
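A minimal sketch of what "contiguous memory" and "vectorized" mean in practice, assuming only that NumPy is installed: the array's flags and strides describe its memory layout, and one array expression replaces an element-by-element Python loop. The variable names are illustrative.

```python
import numpy as np

# A 2D array of 64-bit floats stored row by row (C order) in one memory block
m = np.arange(6, dtype=np.float64).reshape(2, 3)

print(m.flags["C_CONTIGUOUS"])  # True: elements sit next to each other in memory
print(m.itemsize)               # 8 bytes per float64 element
print(m.strides)                # (24, 8): bytes to step to the next row / next column

# Vectorized: one expression, the loop runs inside compiled code
squares_np = m ** 2

# Equivalent pure-Python version, looping element by element
squares_py = [[x ** 2 for x in row] for row in m.tolist()]

print(np.array_equal(squares_np, squares_py))  # True
```

Both versions compute the same values; the vectorized one simply moves the loop out of the interpreter.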

Creating Arrays

import numpy as np

# From Python lists
a = np.array([1, 2, 3, 4, 5])
print(a)          # [1 2 3 4 5]
print(a.dtype)    # int64 (platform-dependent: int32 on Windows with NumPy < 2)
print(a.shape)    # (5,)

# 2D array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix.shape)  # (3, 3)

# Array creation functions
zeros = np.zeros((3, 4))        # 3x4 array of zeros
ones = np.ones((2, 3))          # 2x3 array of ones
full = np.full((2, 2), 7)       # 2x2 array of sevens
eye = np.eye(3)                  # 3x3 identity matrix
evens = np.arange(0, 10, 2)     # [0 2 4 6 8] (named so it is not shadowed by rng below)
linspace = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1.]

# Random arrays
rng = np.random.default_rng(42)  # Seeded generator
rand_uniform = rng.random((3, 3))        # Uniform [0, 1)
rand_normal = rng.normal(0, 1, (3, 3))   # Normal distribution
rand_int = rng.integers(0, 10, (3, 3))   # Random integers

# Data types
float_arr = np.array([1, 2, 3], dtype=np.float32)
int_arr = np.array([1.5, 2.7, 3.9], dtype=np.int32)  # Truncates to [1, 2, 3]
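Two details from the block above are worth checking directly. A short sketch, with illustrative names: generators created with the same seed reproduce the same numbers, and converting floats to integers truncates toward zero rather than rounding.

```python
import numpy as np

# Two generators built from the same seed produce the same stream of numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
sample_a = rng_a.integers(0, 10, size=5)
sample_b = rng_b.integers(0, 10, size=5)
print(np.array_equal(sample_a, sample_b))  # True

# Float -> int conversion truncates toward zero; it does not round
floats = np.array([1.5, 2.7, -3.9])
print(floats.astype(np.int32))  # [ 1  2 -3]

# Round explicitly first if rounding is what you actually want
rounded = np.round(floats).astype(np.int32)
print(rounded)
```

Note that `np.round` rounds halves to the nearest even value, so 1.5 becomes 2 but 2.5 would also become 2.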

Array Operations (Vectorization)

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

# Element-wise operations (no loops needed!)
print(a + b)       # [11 22 33 44 55]
print(a * b)       # [10 40 90 160 250]
print(a ** 2)      # [1 4 9 16 25]
print(np.sqrt(a))  # [1. 1.414 1.732 2. 2.236]

# Comparison (returns boolean array)
print(a > 3)       # [False False False  True  True]
print(a[a > 3])    # [4, 5] - boolean indexing

# Aggregation functions
print(np.sum(a))        # 15
print(np.mean(a))       # 3.0
print(np.std(a))        # 1.414
print(np.min(a))        # 1
print(np.max(a))        # 5
print(np.argmax(a))     # 4 (index of max)
print(np.cumsum(a))     # [ 1  3  6 10 15]

# Matrix operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A @ B)             # Matrix multiplication
print(np.dot(A, B))      # Same as above
print(A.T)               # Transpose
print(np.linalg.det(A))  # Determinant: about -2.0 (may print -2.0000000000000004)
print(np.linalg.inv(A))  # Inverse

# Speed comparison
import time

size = 1_000_000
py_list = list(range(size))
np_arr = np.arange(size)

start = time.perf_counter()
result = [x * 2 for x in py_list]
print(f"Python list: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
result = np_arr * 2
print(f"NumPy array: {time.perf_counter() - start:.4f}s")
# NumPy is usually far faster here; the exact ratio depends on hardware and array size
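Because results such as the determinant and inverse come from floating-point arithmetic, they carry tiny rounding errors. A minimal sketch of checking them with a tolerance (`np.allclose`, `np.isclose`) instead of exact equality:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)

# A times its inverse should be the identity matrix,
# but rounding can leave very small errors in the off-diagonal entries
product = A @ A_inv
print(product)

# Compare floats within a tolerance rather than with ==
print(np.allclose(product, np.eye(2)))     # True
print(np.isclose(np.linalg.det(A), -2.0))  # True
```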

Indexing and Slicing

import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# Basic indexing
print(arr[0, 0])     # 1
print(arr[2, 3])     # 12
print(arr[0])        # [1, 2, 3, 4] (first row)
print(arr[:, 0])     # [1, 5, 9] (first column)

# Slicing
print(arr[0:2, 1:3])  # [[2, 3], [6, 7]]
print(arr[:, ::2])     # Every other column

# Boolean indexing
print(arr[arr > 6])    # [ 7  8  9 10 11 12]

# Fancy indexing (index with arrays)
rows = np.array([0, 2])
cols = np.array([1, 3])
print(arr[rows, cols])  # [2, 12]

# Reshaping
a = np.arange(12)
b = a.reshape(3, 4)
c = a.reshape(2, 2, 3)  # 3D array
print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Flatten
print(c.flatten())  # Back to 1D (always returns a copy)
print(c.ravel())    # Same but returns a view when possible
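The difference between a view and a copy matters as soon as you write to the result. A short sketch, with illustrative names: basic slices and `ravel()` (on a contiguous array) share memory with the original, while boolean indexing, fancy indexing, and `flatten()` produce independent copies.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Basic slicing returns a view: writing to it changes the original array
view = arr[0:2, 1:3]
view[0, 0] = 100
print(arr[0, 1])  # 100

# Boolean (and fancy) indexing returns a copy: the original is untouched
picked = arr[arr > 5]
picked[0] = -1
print(arr[0, 1])  # still 100

# flatten() always copies; ravel() returns a view when the data is contiguous
flat_copy = arr.flatten()
flat_view = arr.ravel()
print(np.shares_memory(flat_copy, arr))  # False
print(np.shares_memory(flat_view, arr))  # True
```

When you need an independent array from a slice, call `.copy()` on it explicitly.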

Broadcasting

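Broadcasting is NumPy's mechanism for performing operations on arrays of different shapes. It automatically stretches the smaller array to match the larger one without copying data. Two shapes are compatible when, comparing dimensions starting from the rightmost one, each pair is either equal or one of them is 1; otherwise NumPy raises a ValueError.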
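The compatibility rule can be checked directly. A minimal sketch, assuming NumPy 1.20 or later (where `np.broadcast_shapes` is available): a column of shape (3, 1) and a row of shape (4,) combine into a (3, 4) result, while trailing dimensions 3 and 4 cannot be reconciled.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                # shape (4,), treated as (1, 4)

# Both inputs are stretched virtually to (3, 4); neither input is copied
grid = col + row
print(grid.shape)                          # (3, 4)
print(np.broadcast_shapes((3, 1), (4,)))   # (3, 4)

# Trailing dimensions 3 and 4 differ and neither is 1, so this fails
try:
    np.ones((2, 3)) + np.ones(4)
except ValueError as err:
    print("cannot broadcast:", err)
```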

import numpy as np

# Scalar broadcast
a = np.array([1, 2, 3])
print(a * 10)  # [10, 20, 30] - scalar broadcasts to [10, 10, 10]

# Row/column broadcast
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

row = np.array([10, 20, 30])
print(matrix + row)
# [[11 22 33]
#  [14 25 36]
#  [17 28 39]]

col = np.array([[100], [200], [300]])
print(matrix + col)
# [[101 102 103]
#  [204 205 206]
#  [307 308 309]]

# Practical: normalize data (zero mean, unit variance)
rng = np.random.default_rng(0)         # Same generator API as above
data = rng.standard_normal((100, 4))   # 100 samples, 4 features
mean = data.mean(axis=0)         # Mean of each column
std = data.std(axis=0)           # Std of each column
normalized = (data - mean) / std  # Broadcasting handles it!
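To confirm that the normalization works, you can check the column statistics afterwards. A sketch under stated assumptions: the data here is hypothetical, generated with deliberately different means and scales per feature so the effect is visible. The last lines show `keepdims=True`, which keeps a (100, 1) shape so per-row statistics broadcast the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 4 features with different offsets and scales
data = rng.normal(loc=[0, 5, -3, 100], scale=[1, 2, 0.5, 10], size=(100, 4))

normalized = (data - data.mean(axis=0)) / data.std(axis=0)

# Each column now has mean ~0 and standard deviation ~1
print(np.allclose(normalized.mean(axis=0), 0))  # True
print(np.allclose(normalized.std(axis=0), 1))   # True

# keepdims=True gives shape (100, 1), which broadcasts across each row
row_centered = data - data.mean(axis=1, keepdims=True)
print(row_centered.shape)  # (100, 4)
```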

Key Takeaways

  • Vectorize, don't loop: use array operations instead of Python for loops
  • Broadcasting: NumPy automatically handles arrays of different but compatible shapes
  • Boolean indexing: filter arrays directly with conditions
  • Use appropriate dtypes: float32 uses half the memory of float64, at the cost of precision
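The dtype trade-off in the last point can be measured with `nbytes`. A short sketch, with illustrative array sizes: halving the memory comes with fewer significant digits, which `np.finfo` reports as a larger machine epsilon.

```python
import numpy as np

# Same number of elements, different dtypes
x64 = np.ones(1_000_000, dtype=np.float64)
x32 = np.ones(1_000_000, dtype=np.float32)
print(x64.nbytes)  # 8000000 (8 bytes per element)
print(x32.nbytes)  # 4000000 (4 bytes per element)

# The cost is precision: 0.1 stored as float32 is not the same value as float64
print(np.float32(0.1) == np.float64(0.1))  # False
print(np.finfo(np.float32).eps, np.finfo(np.float64).eps)
```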
