Introduction to Linear Algebra in Data Science
Linear algebra is the backbone of modern data science, machine learning, and artificial intelligence. From simple linear regression to complex neural networks, understanding matrices and vectors is essential for anyone working with data at scale.
Why Linear Algebra Matters
In data science, we constantly work with datasets that can be represented as matrices: rows representing observations and columns representing features. Every time you build a machine learning model, perform dimensionality reduction with PCA, or create a recommendation system, you're leveraging linear algebra operations under the hood.
NumPy provides a comprehensive suite of functions for linear algebra through its numpy.linalg module, making complex mathematical operations simple and efficient. These operations are highly optimized and run at near-C speed, allowing you to work with massive datasets without writing low-level code.
Linear Algebra
Linear algebra is the branch of mathematics that deals with vectors, matrices, and linear transformations. It provides the mathematical framework for solving systems of equations, performing transformations, and understanding high-dimensional data structures.
Why it matters: Nearly every machine learning algorithm, from linear regression to deep learning, relies on linear algebra operations. Understanding these concepts helps you build better models, optimize performance, and debug complex algorithms.
NumPy's Linear Algebra Module
NumPy's linalg module provides tools for matrix operations, decompositions, eigenvalue problems, and equation solving. Let's start by importing NumPy and exploring some basic matrix concepts:
import numpy as np
# Create sample matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Matrix A:")
print(A)
# [[1 2]
# [3 4]]
print("\nMatrix B:")
print(B)
# [[5 6]
# [7 8]]
Real-World Applications
Linear algebra operations appear everywhere in data science. Here are some common use cases you'll encounter:
Machine Learning
Training models involves matrix multiplications to compute predictions, gradients, and update weights. Neural networks are essentially chains of matrix operations.
Dimensionality Reduction
Techniques like PCA and SVD use eigenvalue decomposition to reduce high-dimensional data while preserving important information and patterns.
Recommendation Systems
Collaborative filtering and matrix factorization decompose user-item interaction matrices to predict preferences and generate recommendations.
Matrix Terminology Quick Reference
Before diving deeper, let's review key terminology we'll use throughout this lesson:
| Term | Definition |
|---|---|
| Matrix | A 2D array of numbers arranged in rows and columns. Example: A 3×4 matrix has 3 rows and 4 columns of data. |
| Vector | A 1D array, or a matrix with one row or column. Used to represent points in space or feature values. |
| Square Matrix | A matrix with equal number of rows and columns (n×n). Only square matrices can have eigenvalues and determinants. |
| Identity Matrix | A square matrix with 1s on the diagonal and 0s elsewhere. Acts like the number 1 in matrix multiplication: A × I = A. |
| Scalar | A single number (not a matrix or vector). When multiplying a matrix by a scalar, every element gets multiplied. |
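The identity-matrix and scalar rows above are easy to confirm in NumPy (a quick sketch using np.eye()):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
I = np.eye(2)  # 2x2 identity matrix

# Identity acts like the number 1: A @ I leaves A unchanged
print(np.array_equal(A @ I, A))  # True

# A scalar multiplies every element of the matrix
print(3 * A)
# [[ 3  6]
#  [ 9 12]]
```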
Matrix Multiplication
Matrix multiplication is one of the most fundamental operations in linear algebra. Unlike element-wise multiplication, matrix multiplication follows specific rules and is the cornerstone of data transformations, neural network computations, and countless other data science applications.
Understanding Matrix Multiplication
Matrix multiplication is not the same as element-wise multiplication. To multiply two matrices A and B, the number of columns in A must equal the number of rows in B. The result is a new matrix where each element is computed as the dot product of a row from A and a column from B.
For example, if A is 2×3 (2 rows, 3 columns) and B is 3×4 (3 rows, 4 columns), the result will be 2×4. The element at position (i, j) in the result is calculated by taking the dot product of row i from A with column j from B.
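Before multiplying, you can sanity-check the shape rule in code (a minimal sketch with random matrices of the shapes above):

```python
import numpy as np

A = np.random.rand(2, 3)  # 2x3: 2 rows, 3 columns
B = np.random.rand(3, 4)  # 3x4: 3 rows, 4 columns

# Columns of A must equal rows of B for the product to be defined
assert A.shape[1] == B.shape[0]

C = A @ B
print(C.shape)  # (2, 4): rows of A by columns of B
```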
Worked Example: Matrix Multiplication Step-by-Step
The tables below show how each cell of the result matrix is calculated as a dot product of a row of A and a column of B.
Matrix A (2×3)
| 1 | 2 | 3 |
| 4 | 5 | 6 |
Matrix B (3×2)
| 7 | 8 |
| 9 | 10 |
| 11 | 12 |
Result (2×2)
| 58 | 64 |
| 139 | 154 |
Result[0,0] = 58
Row 0 of A · Col 0 of B = (1×7) + (2×9) + (3×11) = 7 + 18 + 33 = 58
Dot Product
The dot product multiplies corresponding elements of two vectors and sums the results. For vectors [a, b, c] and [x, y, z], the dot product is: ax + by + cz. This operation is the building block of matrix multiplication.
Why it matters: The dot product measures how similar two vectors are in direction. In machine learning, it's used to compute predictions (dot product of features and weights), measure similarity, and calculate gradients during training.
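As a small illustration of that last point (the feature and weight values here are made up):

```python
import numpy as np

# Hypothetical inputs: one sample's features and a linear model's weights
features = np.array([2.0, 1.0, 3.0])
weights = np.array([0.5, -1.0, 0.25])

# The model's raw prediction is the dot product of features and weights
prediction = np.dot(features, weights)
print(prediction)  # (2.0*0.5) + (1.0*-1.0) + (3.0*0.25) = 0.75
```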
Using np.dot() for Matrix Multiplication
NumPy provides the np.dot() function to perform matrix multiplication. This function takes two arrays and computes their dot product following the mathematical rules of matrix multiplication.
Let's see how it works with a practical example: multiplying a 2×3 matrix by a 3×2 matrix:
A = np.array([[1, 2, 3],
[4, 5, 6]])
B = np.array([[7, 8],
[9, 10],
[11, 12]])
# Matrix multiplication using np.dot()
result = np.dot(A, B)
print("Matrix A (2x3):")
print(A)
# [[1 2 3]
# [4 5 6]]
print("\nMatrix B (3x2):")
print(B)
# [[ 7 8]
# [ 9 10]
# [11 12]]
print("\nResult of A × B (2x2):")
print(result)
# [[ 58 64]
# [139 154]]
Let's verify the calculation manually for the first element (58): (1×7) + (2×9) + (3×11) = 7 + 18 + 33 = 58. Each element in the result is computed similarly.
The @ Operator (Python 3.5+)
Python 3.5 introduced the @ operator specifically for matrix multiplication, making code more readable and intuitive. It's now the preferred way to multiply matrices in NumPy:
# Matrix multiplication using @ operator
result = A @ B
print("Using @ operator:")
print(result)
# [[ 58 64]
# [139 154]]
# You can also chain multiple operations
C = np.array([[1, 0],
[0, 1]])
result2 = A @ B @ C
print("\nChained multiplication A @ B @ C:")
print(result2)
# [[ 58 64]
# [139 154]]
Warning: Don't use the * operator for matrix multiplication! In NumPy, * performs element-wise multiplication, not matrix multiplication. Always use np.dot() or @ for proper matrix multiplication.
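To see the difference side by side with the matrices from earlier:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise multiplication: A[i, j] * B[i, j]
print(A * B)
# [[ 5 12]
#  [21 32]]

# Matrix multiplication: rows of A dotted with columns of B
print(A @ B)
# [[19 22]
#  [43 50]]
```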
Practice Questions: Matrix Multiplication
Test your understanding with these coding challenges.
Given:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
Task: Compute the matrix product of A and B using the @ operator, then verify the first element.
Expected result: First element should be 19 (1×5 + 2×7)
Show Solution
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication
result = A @ B
print("Result:")
print(result)
print(f"\nFirst element: {result[0, 0]}") # 19
# Verify manually: (1×5) + (2×7) = 5 + 14 = 19
Scenario: You have sales quantities for 3 products over 2 days, and their unit prices.
quantities = np.array([[10, 5, 8], # Day 1
[12, 7, 6]]) # Day 2
prices = np.array([[25], [50], [30]]) # Product prices
Task: Calculate total revenue for each day using matrix multiplication.
Show Solution
import numpy as np
quantities = np.array([[10, 5, 8], # Day 1
[12, 7, 6]]) # Day 2
prices = np.array([[25], [50], [30]]) # Product prices
# Calculate daily revenue
daily_revenue = quantities @ prices
print("Daily Revenue:")
print(f"Day 1: ${daily_revenue[0, 0]}") # $740
print(f"Day 2: ${daily_revenue[1, 0]}") # $830
# Verification:
# Day 1: (10×25) + (5×50) + (8×30) = 250 + 250 + 240 = 740
# Day 2: (12×25) + (7×50) + (6×30) = 300 + 350 + 180 = 830
Task: Create two vectors and compute their dot product using both np.dot() and @ operator. Then reshape them into column vectors and compute the outer product.
Expected: Dot product gives a scalar, outer product gives a matrix.
Show Solution
import numpy as np
# Create vectors
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
# Dot product (scalar)
dot_result = np.dot(v1, v2)
print(f"Dot product: {dot_result}") # 32 = (1×4 + 2×5 + 3×6)
# Alternative with @
dot_result2 = v1 @ v2
print(f"Using @: {dot_result2}") # 32
# Reshape to column vectors for outer product
v1_col = v1.reshape(-1, 1) # Shape (3, 1)
v2_row = v2.reshape(1, -1) # Shape (1, 3)
# Outer product (matrix)
outer = v1_col @ v2_row
print("\nOuter product:")
print(outer)
# [[4 5 6]
# [8 10 12]
# [12 15 18]]
Transpose & Matrix Inverse
Matrix transpose and inverse operations are essential tools for manipulating and solving matrix equations. The transpose flips a matrix over its diagonal, while the inverse lets us "divide" matrices, a crucial operation for solving systems of equations and many machine learning algorithms.
Matrix Transpose
The transpose of a matrix swaps its rows and columns. If you have a matrix A with dimensions m×n, its transpose A^T will have dimensions n×m. In NumPy, you can transpose a matrix using the .T attribute:
# Create a matrix
A = np.array([[1, 2, 3],
[4, 5, 6]])
# Transpose using .T attribute
A_transpose = A.T
print("Original matrix A (2x3):")
print(A)
# [[1 2 3]
# [4 5 6]]
print("\nTranspose of A (3x2):")
print(A_transpose)
# [[1 4]
# [2 5]
# [3 6]]
# Verify: shape changes
print(f"\nOriginal shape: {A.shape}") # (2, 3)
print(f"Transposed shape: {A_transpose.shape}") # (3, 2)
Properties of Transpose
The transpose operation has several important mathematical properties that are useful in data science applications:
Double Transpose
Transposing twice gives you back the original matrix: (A^T)^T = A
Sum Transpose
The transpose of a sum equals the sum of transposes: (A + B)^T = A^T + B^T
Product Transpose
The transpose of a product reverses the order: (AB)^T = B^T A^T
Scalar Transpose
Scalar multiplication commutes with transpose: (cA)^T = cA^T
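All four properties can be checked numerically; here's a quick sketch with two small matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
c = 2.5

assert np.array_equal(A.T.T, A)              # double transpose
assert np.array_equal((A + B).T, A.T + B.T)  # sum transpose
assert np.array_equal((A @ B).T, B.T @ A.T)  # product transpose
assert np.array_equal((c * A).T, c * A.T)    # scalar transpose
print("All four transpose properties hold")
```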
Practical Use Case: Data Reshaping
Transpose is incredibly useful when you need to change the orientation of your data. Imagine you have temperature readings from 3 cities over 5 days:
# Temperature readings: 3 cities, 5 days
temps = np.array([[72, 75, 73, 76, 74], # New York
[85, 88, 87, 89, 86], # Miami
[65, 68, 66, 70, 67]]) # Seattle
print("Original (cities as rows):")
print("Shape:", temps.shape) # (3, 5)
print(temps)
# [[72 75 73 76 74]
# [85 88 87 89 86]
# [65 68 66 70 67]]
# Transpose to have days as rows, cities as columns
temps_by_day = temps.T
print("\nTransposed (days as rows):")
print("Shape:", temps_by_day.shape) # (5, 3)
print(temps_by_day)
# [[72 85 65]
# [75 88 68]
# [73 87 66]
# [76 89 70]
# [74 86 67]]
# Now calculate average temperature per day
daily_avg = temps_by_day.mean(axis=1)
print("\nAverage temperature per day:")
print(daily_avg)
# [74. 77. 75.33333333 78.33333333 75.66666667]
Matrix Inverse
The inverse of a matrix A (denoted A^(-1)) is a matrix that, when multiplied by A, gives the identity matrix: A × A^(-1) = I. Not all matrices have inverses; only square matrices with non-zero determinants are invertible.
Matrix Inverse
The matrix inverse is analogous to division for regular numbers. Just as 5 × (1/5) = 1, for matrices A × A^(-1) = I. The inverse "undoes" the transformation represented by the original matrix.
Why it matters: Matrix inverses are fundamental to solving systems of linear equations. If you have Ax = b, you can solve for x by multiplying both sides by A^(-1): x = A^(-1)b. This is the mathematical basis for many optimization and machine learning algorithms.
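As a quick sketch of that idea (a made-up two-equation system, solved via the inverse purely for illustration):

```python
import numpy as np

# System: 3x +  y = 9
#          x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# x = A^(-1) b
x = np.linalg.inv(A) @ b
print(np.round(x, 10))  # [2. 3.] -> x = 2, y = 3
```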
Computing Matrix Inverse with NumPy
NumPy provides the np.linalg.inv() function to compute matrix inverses efficiently. When you invert a matrix, you're essentially finding another matrix that "undoes" the transformation of the original. Think of it like division for regular numbers: just as multiplying 5 by 1/5 gives you 1, multiplying a matrix A by its inverse A^(-1) gives you the identity matrix I (which acts like the number 1 in matrix operations).
This operation is incredibly useful in many real-world scenarios. In data science, you'll use matrix inverses when solving systems of equations (like finding the best-fit line in linear regression), transforming coordinate systems (like rotating or scaling images), and reversing linear transformations (like decoding encrypted data or decompressing signals). While you rarely need to compute inverses directly in production code (there are usually faster alternatives), understanding them is crucial for grasping how many machine learning algorithms work under the hood.
Let's see how to compute and verify a matrix inverse step by step. We'll create a simple 2×2 matrix, find its inverse using NumPy, and then multiply them together to confirm we get the identity matrix. This verification step is important: it proves mathematically that our inverse is correct:
# Create a 2x2 matrix
A = np.array([[4, 7],
[2, 6]])
# Compute the inverse
A_inv = np.linalg.inv(A)
print("Matrix A:")
print(A)
# [[4 7]
# [2 6]]
print("\nInverse of A:")
print(A_inv)
# [[ 0.6 -0.7]
# [-0.2 0.4]]
# Verify: A @ A_inv should give identity matrix
identity = A @ A_inv
print("\nA @ A_inv (should be identity matrix):")
print(np.round(identity, 10)) # Round to clean up floating point errors
# [[1. 0.]
# [0. 1.]]
Warning: Attempting to invert a singular (non-invertible) matrix raises a LinAlgError from NumPy.
Checking Matrix Invertibility
Before attempting to compute an inverse, it's critically important to check if the matrix is actually invertible. Not all matrices can be inverted! A matrix is invertible (also called non-singular or regular) only if its determinant is non-zero. The determinant is a special scalar value calculated from the matrix elements that tells us several important things about the matrix's properties and behavior.
When the determinant equals zero, the matrix is called singular, which means it compresses space in at least one direction, essentially "flattening" some dimensions. Imagine a 2D transformation that squashes everything onto a single line, or a 3D transformation that flattens space onto a plane. When this happens, information is lost, and you can't reverse the transformation. It's like trying to unscramble an egg: once information is lost, you can't recover it. Mathematically, there's no way to "undo" a transformation that destroys information, which is why singular matrices have no inverse.
In practice, you should always check the determinant before trying to invert a matrix. If you attempt to invert a singular matrix (determinant = 0), NumPy will raise a LinAlgError and your code will crash. Even if the determinant is very close to zero (like 0.0000001), the computed inverse will be numerically unstable and give unreliable results. Let's see how to properly check invertibility and handle both cases: invertible matrices and singular matrices:
# Example 1: Invertible matrix
A = np.array([[1, 2],
[3, 4]])
det_A = np.linalg.det(A)
print(f"Determinant of A: {det_A}") # -2.0
if det_A != 0:
    A_inv = np.linalg.inv(A)
    print("Matrix is invertible")
    print(A_inv)
    # [[-2.   1. ]
    #  [ 1.5 -0.5]]
# Example 2: Singular matrix (not invertible)
B = np.array([[1, 2],
[2, 4]]) # Second row is 2× first row
det_B = np.linalg.det(B)
print(f"\nDeterminant of B: {det_B}") # 0.0
if abs(det_B) < 1e-10:  # Use small threshold for floating point
    print("Matrix B is singular (not invertible)")
Real-World Example: Portfolio Optimization
Let's apply matrix inverse to a real-world portfolio optimization problem that professional fund managers deal with daily. Suppose you're managing a portfolio with three different asset classes: stocks, bonds, and commodities. Each asset has its own expected return (how much profit you expect to make annually) and its own risk level (how much the returns can vary from year to year). But here's the crucial insight that makes this challenging: these assets don't move independently!
When stocks crash, bonds often go up as investors seek safety. When oil prices spike, commodities soar while certain stocks might suffer. This interconnected behavior is captured in what's called a covariance matrix: a mathematical representation showing how the returns of different assets vary together. A positive covariance between two assets means they tend to move in the same direction; negative covariance means they move in opposite directions; and near-zero means they move independently.
The genius of modern portfolio theory (which won Harry Markowitz a Nobel Prize in 1990!) is using the inverse of this covariance matrix to find optimal portfolio weights. When you invert the covariance matrix and combine it with your expected returns, you get the allocation that maximizes your expected return for a given level of risk, or alternatively minimizes risk for a target return. This mathematical framework is the foundation of robo-advisors, mutual fund allocation strategies, and sophisticated trading algorithms. Let's see how to implement this optimization step by step:
# Covariance matrix of 3 assets (represents risk relationships)
cov_matrix = np.array([[0.04, 0.01, 0.02],
[0.01, 0.09, 0.03],
[0.02, 0.03, 0.16]])
# Expected returns vector
expected_returns = np.array([0.08, 0.12, 0.15])
# Compute inverse of covariance matrix
cov_inv = np.linalg.inv(cov_matrix)
print("Covariance matrix (risk):")
print(cov_matrix)
# [[0.04 0.01 0.02]
# [0.01 0.09 0.03]
# [0.02 0.03 0.16]]
print("\nInverse covariance matrix:")
print(cov_inv)
# approximately:
# [[27. -2. -3.]
#  [-2. 12. -2.]
#  [-3. -2.  7.]]
# Calculate optimal weights (simplified calculation)
ones = np.ones(3)
lambda_val = ones @ cov_inv @ expected_returns
optimal_weights = (cov_inv @ expected_returns) / lambda_val
print("\nOptimal portfolio weights:")
for i, weight in enumerate(optimal_weights):
    print(f"Asset {i+1}: {weight*100:.2f}%")
# Asset 1: 48.68%
# Asset 2: 32.45%
# Asset 3: 18.87%
Tip: In practice, when you need to solve Ax = b, use np.linalg.solve(A, b) instead of computing np.linalg.inv(A) @ b. We'll cover this in the next section!
Practice Questions: Transpose & Inverse
Test your understanding with these coding challenges.
Given:
A = np.array([[1, 2, 3],
[4, 5, 6]])
Task: Transpose the matrix and verify that transposing twice returns the original.
Show Solution
import numpy as np
A = np.array([[1, 2, 3],
[4, 5, 6]])
# Transpose
A_T = A.T
print("Original shape:", A.shape) # (2, 3)
print("Transposed shape:", A_T.shape) # (3, 2)
print("\nTransposed matrix:")
print(A_T)
# Verify double transpose
A_T_T = A_T.T
print("\nDouble transposed (should equal original):")
print(A_T_T)
print(f"Equal to original? {np.array_equal(A, A_T_T)}")
Given:
A = np.array([[3, 1],
[5, 2]])
Task: Compute the inverse and verify that A @ A_inv equals the identity matrix.
Show Solution
import numpy as np
A = np.array([[3, 1],
[5, 2]])
# Compute inverse
A_inv = np.linalg.inv(A)
print("Original matrix:")
print(A)
print("\nInverse:")
print(A_inv)
# Verify: A @ A_inv should be identity
identity = A @ A_inv
print("\nA @ A_inv:")
print(identity)
# Clean up floating point errors
print("\nRounded (should be identity):")
print(np.round(identity, 10))
Task: Write a function that checks if a matrix is invertible by computing its determinant. Test with both an invertible and non-invertible matrix.
Show Solution
import numpy as np
def is_invertible(matrix):
    """Check if a matrix is invertible"""
    if matrix.shape[0] != matrix.shape[1]:
        return False, "Matrix must be square"
    det = np.linalg.det(matrix)
    if abs(det) < 1e-10:  # Use threshold for floating point
        return False, f"Determinant is {det} (singular matrix)"
    else:
        return True, f"Determinant is {det} (invertible)"
# Test 1: Invertible matrix
A = np.array([[1, 2], [3, 4]])
invertible, msg = is_invertible(A)
print(f"Matrix A: {invertible}")
print(msg)
# Test 2: Singular matrix (rows are linearly dependent)
B = np.array([[2, 4], [1, 2]])
invertible, msg = is_invertible(B)
print(f"\nMatrix B: {invertible}")
print(msg)
Eigenvalues & Eigenvectors
Eigenvalues and eigenvectors reveal the fundamental behavior of matrix transformations. They're the secret weapon behind dimensionality reduction techniques like PCA, power iteration algorithms like PageRank, and stability analysis in dynamic systems. Understanding these concepts opens doors to advanced data science techniques.
What Are Eigenvalues and Eigenvectors?
When you multiply a vector by a matrix, you typically get a vector pointing in a completely different direction. However, special vectors called eigenvectors only get scaled by the matrix: they maintain their direction. The scaling factor is called the eigenvalue.
Mathematically, if A is a matrix, v is an eigenvector, and λ (lambda) is an eigenvalue, then: Av = λv. This means multiplying matrix A by eigenvector v is equivalent to simply multiplying v by scalar λ.
Eigendecomposition
Eigendecomposition breaks down a matrix into its eigenvectors and eigenvalues, revealing the matrix's fundamental directions and scaling factors. For a square matrix A, this decomposition is A = QΛQ^(-1), where Q contains eigenvectors as columns and Λ is a diagonal matrix of eigenvalues.
Why it matters: Eigendecomposition is the mathematical foundation for Principal Component Analysis (PCA), which reduces data dimensionality while preserving variance. It's also used in image compression, recommendation systems, and understanding dynamic systems' stability.
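You can verify the decomposition A = QΛQ^(-1) directly in NumPy by rebuilding the matrix from its eigenvectors and eigenvalues (a short sketch; this works when A is diagonalizable, i.e. when Q is invertible):

```python
import numpy as np

A = np.array([[4.0, -2.0],
              [1.0, 1.0]])

# Q holds eigenvectors as columns; Lambda is the diagonal eigenvalue matrix
eigenvalues, Q = np.linalg.eig(A)
Lambda = np.diag(eigenvalues)

# Rebuild A = Q Λ Q^(-1)
A_rebuilt = Q @ Lambda @ np.linalg.inv(Q)
print(np.allclose(A, A_rebuilt))  # True
```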
Computing Eigenvalues and Eigenvectors with NumPy
NumPy's np.linalg.eig() function is your gateway to understanding the fundamental behavior of matrices. This powerful function computes both eigenvalues and eigenvectors simultaneously in a single call, saving you from having to solve complex polynomial equations by hand (which would be tedious and error-prone for anything larger than a 2×2 matrix).
The function returns two arrays: first, a 1D array containing all the eigenvalues (the scaling factors λ); second, a 2D array where each column represents an eigenvector, so if you have a 3×3 matrix, you'll get 3 eigenvalues and 3 eigenvectors. The eigenvector in column 0 corresponds to the eigenvalue at index 0, column 1 to index 1, and so on. This consistent ordering makes it easy to work with the results programmatically.
Understanding this relationship is crucial: when you multiply your matrix A by an eigenvector v, you get the same result as simply multiplying that eigenvector by its corresponding eigenvalue λ. In other words, Av = λv. This elegant mathematical relationship might seem abstract, but it's the key to understanding how data varies in machine learning algorithms, how systems behave in engineering simulations, and how search engines rank web pages. Let's compute eigenvalues and eigenvectors for a real matrix and explicitly verify this fundamental relationship:
# Create a 2x2 matrix
A = np.array([[4, -2],
[1, 1]])
# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Matrix A:")
print(A)
print("\nEigenvalues:")
print(eigenvalues)
print("\nEigenvectors (as columns):")
print(eigenvectors)
# Verify the relationship: A @ v = λ * v
print("\nVerification for first eigenvector:")
v1 = eigenvectors[:, 0] # First eigenvector
lambda1 = eigenvalues[0] # First eigenvalue
left_side = A @ v1
right_side = lambda1 * v1
print(f"A @ v1 = {left_side}")
print(f"λ1 * v1 = {right_side}")
print(f"Are they equal? {np.allclose(left_side, right_side)}")
Output:
Matrix A:
[[ 4 -2]
[ 1 1]]
Eigenvalues:
[3. 2.]
Eigenvectors (as columns):
[[ 0.89442719 0.70710678]
[ 0.4472136 0.70710678]]
Verification for first eigenvector:
A @ v1 = [2.68328157 1.34164079]
λ1 * v1 = [2.68328157 1.34164079]
Are they equal? True
Geometric Interpretation of Eigenvalues and Eigenvectors
Eigenvectors reveal something truly profound about how matrix transformations work. When you multiply any vector by a matrix, the vector usually changes both its direction AND its length. But eigenvectors are special: they're the rare vectors that only change in length, not direction (or they might flip to point in the exact opposite direction, which we still consider the same "line"). The eigenvalue tells you exactly how much the eigenvector gets stretched or shrunk.
Think of it this way: imagine a matrix transformation as stretching and squashing a rubber sheet. Most points on that sheet move in complicated curved paths as you apply the transformation. But eigenvectors are like the principal axes of this stretching: they point along the directions where points only slide along straight lines, getting closer or farther from the origin without curving. These are the "natural" directions of the transformation.
Symmetric matrices (where A = A^T, meaning they're mirror images across the diagonal) have particularly beautiful geometric properties. Their eigenvectors are always perpendicular to each other (orthogonal), and their eigenvalues are always real numbers (not complex). This makes them perfect for visualization and for demonstrating the core concepts. Let's use a symmetric matrix to see how vectors aligned with eigenvectors behave completely differently from random vectors when we apply the transformation:
# Create a symmetric matrix (has nice geometric properties)
A = np.array([[3, 1],
[1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Matrix A:")
print(A)
print("\nEigenvalues:", eigenvalues)
print("Eigenvector 1:", eigenvectors[:, 0])
print("Eigenvector 2:", eigenvectors[:, 1])
# Create some test vectors
test_vector1 = eigenvectors[:, 0] # Aligned with first eigenvector
test_vector2 = eigenvectors[:, 1] # Aligned with second eigenvector
# Transform them
transformed1 = A @ test_vector1
transformed2 = A @ test_vector2
print("\nTransformation along eigenvector 1:")
print(f"Original: {test_vector1}")
print(f"Transformed: {transformed1}")
print(f"Scaled by: {eigenvalues[0]}")
print("\nTransformation along eigenvector 2:")
print(f"Original: {test_vector2}")
print(f"Transformed: {transformed2}")
print(f"Scaled by: {eigenvalues[1]}")
Output:
Matrix A:
[[3 1]
[1 3]]
Eigenvalues: [4. 2.]
Eigenvector 1: [0.70710678 0.70710678]
Eigenvector 2: [-0.70710678 0.70710678]
Transformation along eigenvector 1:
Original: [0.70710678 0.70710678]
Transformed: [2.82842712 2.82842712]
Scaled by: 4.0
Transformation along eigenvector 2:
Original: [-0.70710678 0.70710678]
Transformed: [-1.41421356 1.41421356]
Scaled by: 2.0
Real-World Application: Principal Component Analysis (PCA)
Principal Component Analysis is arguably the most important practical application of eigendecomposition in data science and machine learning. When you have a dataset with dozens or hundreds of features (columns), working with all of them simultaneously becomes computationally expensive and can lead to overfitting. PCA solves this by finding a smaller set of "principal components": new features that capture most of the important variation in your data while discarding the noise and redundancy.
Here's the key insight: PCA uses eigendecomposition to identify the directions in your data where variation is highest. The first principal component points in the direction of maximum variance (where the data is most spread out). The second principal component is perpendicular to the first and points in the direction of the next-highest variance. And so on. The eigenvalues tell you how much variance each component captures, so you can keep only the top few components that explain, say, 95% of the total variance, and discard the rest.
This technique is used everywhere: compressing images (JPEG uses a related technique), reducing dimensions before training machine learning models (making them faster and more accurate), visualizing high-dimensional data in 2D or 3D, removing correlated features, and even detecting patterns in gene expression or customer behavior. Let's implement PCA from scratch to see exactly how eigendecomposition powers this technique. We'll analyze customer spending patterns on groceries versus restaurants-two variables that are often correlated because people with higher incomes tend to spend more on both:
# Generate sample data: customer spending patterns
# Features: grocery spending, restaurant spending
np.random.seed(42)
n_customers = 100
# Create correlated data
grocery = np.random.normal(100, 20, n_customers)
restaurant = 0.6 * grocery + np.random.normal(30, 10, n_customers)
# Combine into data matrix
data = np.column_stack([grocery, restaurant])
# Center the data (important for PCA)
data_centered = data - data.mean(axis=0)
# Compute covariance matrix
cov_matrix = np.cov(data_centered.T)
print("Covariance matrix:")
print(cov_matrix)
# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Sort by eigenvalues (descending)
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
print("\nEigenvalues (variance explained by each component):")
print(eigenvalues)
print("\nEigenvectors (principal components):")
print(eigenvectors)
# Calculate variance explained
variance_explained = eigenvalues / eigenvalues.sum()
print("\nVariance explained by each component:")
for i, var in enumerate(variance_explained):
    print(f"PC{i+1}: {var*100:.2f}%")
# Project data onto first principal component
pc1 = data_centered @ eigenvectors[:, 0]
print(f"\nFirst 5 customers projected onto PC1:")
print(pc1[:5])
Output:
Covariance matrix:
[[378.64457847 245.67127204]
[245.67127204 209.44748308]]
Eigenvalues (variance explained by each component):
[553.88  34.21]   (rounded; these follow from the covariance matrix above)
Eigenvectors (principal components):
[[ 0.8141 -0.5807]
 [ 0.5807  0.8141]]   (rounded; eigenvector signs may differ)
Variance explained by each component:
PC1: 94.18%
PC2: 5.82%
First 5 customers projected onto PC1:
(five projected values, one per customer; the exact numbers depend on the sign convention of PC1)
Practice Questions: Eigenvalues & Eigenvectors
Test your understanding with these coding challenges.
Given:
A = np.array([[5, 2],
[2, 5]])
Task: Compute the eigenvalues and eigenvectors, then verify the relationship Av = λv for the first eigenpair.
Show Solution
import numpy as np
A = np.array([[5, 2],
[2, 5]])
# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("\nEigenvectors (columns):")
print(eigenvectors)
# Verify for first eigenpair
v1 = eigenvectors[:, 0]
lambda1 = eigenvalues[0]
left_side = A @ v1
right_side = lambda1 * v1
print(f"\nVerification for λ1 = {lambda1}:")
print(f"Av = {left_side}")
print(f"λv = {right_side}")
print(f"Equal? {np.allclose(left_side, right_side)}")
Given a matrix, find its dominant (largest magnitude) eigenvalue and corresponding eigenvector.
A = np.array([[8, 1, 0],
[1, 6, 2],
[0, 2, 4]])
Show Solution
import numpy as np
A = np.array([[8, 1, 0],
[1, 6, 2],
[0, 2, 4]])
# Compute all eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
# Find index of dominant eigenvalue
dominant_idx = np.argmax(np.abs(eigenvalues))
dominant_eigenvalue = eigenvalues[dominant_idx]
dominant_eigenvector = eigenvectors[:, dominant_idx]
print(f"All eigenvalues: {eigenvalues}")
print(f"\nDominant eigenvalue: {dominant_eigenvalue:.4f}")
print(f"Dominant eigenvector: {dominant_eigenvector}")
Task: Implement a simple PCA to find the first principal component of 2D data and calculate the variance explained.
data = np.array([[2.5, 2.4],
[0.5, 0.7],
[2.2, 2.9],
[1.9, 2.2],
[3.1, 3.0],
[2.3, 2.7]])
Show Solution
import numpy as np
data = np.array([[2.5, 2.4],
[0.5, 0.7],
[2.2, 2.9],
[1.9, 2.2],
[3.1, 3.0],
[2.3, 2.7]])
# Step 1: Center the data
data_centered = data - data.mean(axis=0)
# Step 2: Compute covariance matrix
cov_matrix = np.cov(data_centered.T)
# Step 3: Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Step 4: Sort by eigenvalue (descending)
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
# Step 5: Calculate variance explained
total_var = eigenvalues.sum()
var_explained = eigenvalues / total_var
print("Covariance matrix:")
print(cov_matrix)
print(f"\nEigenvalues: {eigenvalues}")
print(f"\nVariance explained by PC1: {var_explained[0]*100:.2f}%")
print(f"Variance explained by PC2: {var_explained[1]*100:.2f}%")
# Project data onto first principal component
pc1 = eigenvectors[:, 0]
projected = data_centered @ pc1
print(f"\nData projected onto PC1: {projected}")
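As a cross-check, the principal directions can also be computed from the SVD of the centered data, which is the approach most PCA implementations use internally. A sketch on the same data:

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                 [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
data_centered = data - data.mean(axis=0)

# SVD of the centered data: rows of Vt are the principal directions
U, s, Vt = np.linalg.svd(data_centered, full_matrices=False)

# Singular values relate to covariance eigenvalues: lambda_i = s_i^2 / (n - 1)
n = data.shape[0]
eigvals_from_svd = s**2 / (n - 1)
var_explained = eigvals_from_svd / eigvals_from_svd.sum()
print(f"PC1 direction (up to sign): {Vt[0]}")
print(f"Variance explained by PC1: {var_explained[0]*100:.2f}%")
```

The SVD route avoids forming the covariance matrix explicitly, which is numerically preferable for large feature counts; up to sign, `Vt[0]` matches the first eigenvector from the covariance-based approach above.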
Solving Systems of Linear Equations
Solving systems of linear equations is one of the most practical applications of linear algebra. From predicting outcomes in regression models to balancing chemical equations, finding portfolio allocations, or optimizing supply chains-linear systems are everywhere in data science and analytics.
What is a System of Linear Equations?
A system of linear equations is a collection of equations where each equation is linear (no variables are multiplied together or raised to powers). For example:
2x + 3y = 8
4x - y = 6
We can represent this system in matrix form as Ax = b, where:
- A is the coefficient matrix (contains the coefficients of variables)
- x is the vector of unknowns (variables we want to find)
- b is the vector of constants (right-hand side values)
Linear System in Matrix Form
Any system of linear equations can be written as Ax = b, where A is an m×n matrix of coefficients, x is an n×1 vector of unknowns, and b is an m×1 vector of constants. This compact notation makes it easy to work with large systems using matrix operations.
Why it matters: This matrix formulation allows us to leverage powerful numerical algorithms to solve systems with thousands or millions of variables-something that would be impossible to do by hand. It's the foundation for linear regression, optimization, and many machine learning algorithms.
Solving Systems with np.linalg.solve()
NumPy's np.linalg.solve() function is the workhorse tool for solving systems of linear equations efficiently and accurately. While you might think "why not just compute the matrix inverse and multiply?", there are compelling reasons why professional numerical software never does that. The solve() function uses sophisticated algorithms like LU decomposition (which factorizes the matrix into lower and upper triangular matrices) that are specifically optimized for solving linear systems.
These specialized algorithms are roughly twice as fast as computing an inverse, use less memory (important for large systems), and-most critically-are much more numerically stable. When you invert a matrix, small rounding errors in floating-point arithmetic get amplified, potentially giving you wildly incorrect answers. The algorithms behind solve() are designed to minimize this error amplification, giving you results you can trust even for ill-conditioned matrices (where small input changes cause large output changes).
In practice, you should always use solve() unless you genuinely need the inverse matrix itself for multiple operations. Let's see how to use it to solve a simple 2×2 system step by step. We'll set up the coefficient matrix A (which holds the coefficients from the left side of each equation), the constants vector b (the right side values), solve for the unknowns, and then verify our solution by checking that Ax actually equals b:
# System of equations:
# 2x + 3y = 8
# 4x - y = 6
# Coefficient matrix A
A = np.array([[2, 3],
              [4, -1]])
# Constants vector b
b = np.array([8, 6])
# Solve for x
x = np.linalg.solve(A, b)
print("System: Ax = b")
print("A =")
print(A)
print("\nb =", b)
print("\nSolution: x =", x)
# Verify the solution
verification = A @ x
print("\nVerification (A @ x should equal b):")
print(f"A @ x = {verification}")
print(f"b = {b}")
print(f"Solution correct? {np.allclose(verification, b)}")
Output:
System: Ax = b
A =
[[ 2 3]
[ 4 -1]]
b = [8 6]
Solution: x = [1.85714286 1.42857143]
Verification (A @ x should equal b):
A @ x = [8. 6.]
b = [8 6]
Solution correct? True
So x ≈ 1.857 and y ≈ 1.429. Let's verify: 2(1.857) + 3(1.429) ≈ 3.714 + 4.286 = 8 ✓ and 4(1.857) - 1.429 ≈ 7.429 - 1.429 = 6 ✓.
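The LU decomposition that solve() relies on can also be used directly: SciPy exposes the factorization, which lets you factor a matrix once and reuse it cheaply for many right-hand sides. A sketch, assuming SciPy is installed:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2., 3.],
              [4., -1.]])
b = np.array([8., 6.])

# Factor once (A = P L U), then solve via cheap triangular substitutions
lu, piv = lu_factor(A)
x = lu_solve((lu, piv), b)
print(f"Solution via LU: {x}")

# Reusing the factorization for a new right-hand side is nearly free
b2 = np.array([1., 0.])
x2 = lu_solve((lu, piv), b2)
print(f"Solution for second b: {x2}")
```

The factorization costs O(n³) but each subsequent solve costs only O(n²), which is why reusing it pays off when the same coefficient matrix appears with many different constants vectors.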
Why Not Use Matrix Inverse?
You might think we could solve Ax = b by computing x = A⁻¹b. While mathematically correct, there are important practical reasons to avoid this:
Performance
Computing the inverse is slower (O(n³)) and requires more memory than solving directly, which uses optimized algorithms like LU decomposition.
Numerical Stability
Matrix inversion amplifies rounding errors. Direct solving methods are more numerically stable and produce more accurate results.
# Compare methods for a larger system
import time
# Create a 100x100 system
n = 100
A_large = np.random.rand(n, n)
b_large = np.random.rand(n)
# Method 1: Using solve (recommended)
start = time.time()
x_solve = np.linalg.solve(A_large, b_large)
time_solve = time.time() - start
# Method 2: Using inverse (not recommended)
start = time.time()
x_inv = np.linalg.inv(A_large) @ b_large
time_inv = time.time() - start
print(f"Time using np.linalg.solve(): {time_solve*1000:.4f} ms")
print(f"Time using inverse: {time_inv*1000:.4f} ms")
print(f"Speedup: {time_inv/time_solve:.2f}x faster")
print(f"\nSolutions match? {np.allclose(x_solve, x_inv)}")
Output:
Time using np.linalg.solve(): 0.3421 ms
Time using inverse: 0.7854 ms
Speedup: 2.30x faster
Solutions match? True
Use np.linalg.solve(A, b) instead of np.linalg.inv(A) @ b when solving linear systems. It's faster, more accurate, and more memory-efficient.
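The accuracy gap widens on ill-conditioned matrices. A quick sketch using a Hilbert matrix (a classic ill-conditioned example, built inline here so only NumPy is needed); on such matrices the inverse route typically shows a larger error, though the exact figures vary by platform:

```python
import numpy as np

# Hilbert matrices H[i, j] = 1 / (i + j + 1) are famously ill-conditioned
n = 10
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)       # known solution, so we can measure the error
b = H @ x_true

x_solve = np.linalg.solve(H, b)
x_inv = np.linalg.inv(H) @ b

print(f"Condition number: {np.linalg.cond(H):.2e}")
print(f"Error with solve():  {np.linalg.norm(x_solve - x_true):.2e}")
print(f"Error with inverse:  {np.linalg.norm(x_inv - x_true):.2e}")
```

With a condition number above 10¹³, both methods lose many digits of accuracy, which is exactly the regime where the extra error amplification of the inverse route matters most.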
Real-World Example: Portfolio Allocation Problem
Let's tackle a real investment challenge that financial advisors face regularly. Imagine you have $100,000 to invest, and you have three mutual funds available: Fund A (technology-focused), Fund B (healthcare-focused), and Fund C (energy-focused). Here's the complication: none of these funds invests exclusively in one sector. Fund A might allocate 60% to tech stocks, 20% to healthcare, and 20% to energy. Each fund has its own unique "recipe" for spreading money across sectors.
Your investment objective is to achieve specific total exposures across your entire $100k portfolio: you want exactly $30,000 invested in the tech sector, $40,000 in healthcare, and $30,000 in energy. The puzzle is figuring out how much money to allocate to each fund to hit these precise targets. You can't just put $30k directly into "tech" because Fund A also invests in other sectors, and Fund B and Fund C also have some tech exposure!
This is a perfect application of linear systems. Think of it as a mixing problem: each fund is an ingredient with its own composition (percentage in each sector), and you need to find the right amounts to mix to get your desired final blend. Mathematically, the fund compositions form our coefficient matrix, the target exposures form our constants vector, and the fund allocations we're seeking are the unknown variables. By representing this as the matrix equation Ax = b and using NumPy's solver, we can find the exact allocation instantly. This same approach is used by robo-advisors and portfolio optimization software to construct millions of portfolios for investors:
# Fund compositions (columns = funds, rows = sectors)
# Each column shows how a fund distributes its investment
fund_weights = np.array([
    [0.6, 0.2, 0.1],  # Tech sector
    [0.2, 0.5, 0.3],  # Healthcare sector
    [0.2, 0.3, 0.6]   # Energy sector
])
# Desired sector exposures (in thousands of dollars)
target_exposure = np.array([30, 40, 30]) # $30k tech, $40k healthcare, $30k energy
# Solve for fund allocations
fund_allocations = np.linalg.solve(fund_weights, target_exposure)
print("Fund Composition Matrix:")
print(" Fund A Fund B Fund C")
print(f"Tech: {fund_weights[0,0]:.1f} {fund_weights[0,1]:.1f} {fund_weights[0,2]:.1f}")
print(f"Health: {fund_weights[1,0]:.1f} {fund_weights[1,1]:.1f} {fund_weights[1,2]:.1f}")
print(f"Energy: {fund_weights[2,0]:.1f} {fund_weights[2,1]:.1f} {fund_weights[2,2]:.1f}")
print(f"\nTarget Sector Exposures:")
print(f"Tech: ${target_exposure[0]}k")
print(f"Healthcare: ${target_exposure[1]}k")
print(f"Energy: ${target_exposure[2]}k")
print(f"\nOptimal Fund Allocations:")
print(f"Fund A: ${fund_allocations[0]:.2f}k")
print(f"Fund B: ${fund_allocations[1]:.2f}k")
print(f"Fund C: ${fund_allocations[2]:.2f}k")
print(f"Total: ${fund_allocations.sum():.2f}k")
# Verify we achieve target exposures
achieved = fund_weights @ fund_allocations
print(f"\nActual Sector Exposures:")
print(f"Tech: ${achieved[0]:.2f}k")
print(f"Healthcare: ${achieved[1]:.2f}k")
print(f"Energy: ${achieved[2]:.2f}k")
Output:
Fund Composition Matrix:
Fund A Fund B Fund C
Tech: 0.6 0.2 0.1
Health: 0.2 0.5 0.3
Energy: 0.2 0.3 0.6
Target Sector Exposures:
Tech: $30k
Healthcare: $40k
Energy: $30k
Optimal Fund Allocations:
Fund A: $27.27k
Fund B: $63.64k
Fund C: $9.09k
Total: $100.00k
Actual Sector Exposures:
Tech: $30.00k
Healthcare: $40.00k
Energy: $30.00k
Handling Overdetermined and Underdetermined Systems
Not all systems have unique solutions. Here's how to handle different cases:
| System Type | Description | Solution Method |
|---|---|---|
| Square (m=n) | Same number of equations and unknowns | np.linalg.solve() if det(A) ≠ 0 |
| Overdetermined (m>n) | More equations than unknowns (no exact solution) | np.linalg.lstsq() for least-squares solution |
| Underdetermined (m&lt;n) | Fewer equations than unknowns (infinite solutions) | np.linalg.lstsq() for minimum-norm solution |
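For the underdetermined case, np.linalg.lstsq() picks the minimum-norm solution out of the infinitely many that exist. A small sketch with two equations in three unknowns:

```python
import numpy as np

# Underdetermined: 2 equations, 3 unknowns (infinitely many solutions)
A = np.array([[1., 2., 1.],
              [2., 1., 3.]])
b = np.array([4., 7.])

x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(f"Minimum-norm solution: {x}")
print(f"Rank of A: {rank}")
print(f"Check A @ x: {A @ x}")   # satisfies both equations exactly
```

Because A has full row rank, the system is consistent and lstsq returns the exact solution with the smallest Euclidean norm; np.linalg.solve() would raise an error here since A is not square.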
Least Squares Solutions for Overdetermined Systems
In real-world data science, you very frequently encounter situations where you have more equations than unknowns. The classic example is linear regression: suppose you're trying to fit a straight line (which has only 2 parameters: slope and intercept) through 1000 data points. That gives you 1000 equations but only 2 unknowns! This type of system is called "overdetermined," and here's the catch-there's usually no exact solution that satisfies all equations perfectly.
Why no exact solution? Because real data has noise, measurement errors, and natural variation. If you have 5 points that don't fall perfectly on a line, there's no single line equation that passes exactly through all 5 points simultaneously. So instead of looking for an exact solution (which doesn't exist), we look for the "best" solution-the one that minimizes the total error across all data points. This is called the least-squares solution because it minimizes the sum of the squared differences between our predictions and the actual values.
This is not just theoretical-it's literally how linear regression works in every machine learning library! When scikit-learn or statsmodels fits a linear regression model, under the hood it's solving an overdetermined system using least squares. NumPy's np.linalg.lstsq() function implements this using QR decomposition or SVD (Singular Value Decomposition), which are numerically stable algorithms that handle the mathematics correctly even when the system is ill-conditioned. Let's see this in action by fitting a line through 5 data points-the fundamental building block of predictive modeling:
# Overdetermined system: fitting a line through points
# We have 5 points but only need 2 parameters (slope and intercept)
# Data points (x, y)
x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
# Create coefficient matrix for y = mx + b
# We need to find m (slope) and b (intercept)
A = np.column_stack([x_data, np.ones(len(x_data))])
print("Coefficient matrix A:")
print(A)
print("\nTarget values b:", y_data)
# Solve using least squares
solution, residuals, rank, singular_values = np.linalg.lstsq(A, y_data, rcond=None)
m, b = solution
print(f"\nLeast squares solution:")
print(f"Slope (m): {m:.4f}")
print(f"Intercept (b): {b:.4f}")
# Make predictions
y_pred = m * x_data + b
print(f"\nFitted line: y = {m:.4f}x + {b:.4f}")
print("\nComparison:")
print("x | Actual | Predicted | Error")
print("-" * 40)
for x, y_actual, y_p in zip(x_data, y_data, y_pred):
    error = abs(y_actual - y_p)
    print(f"{x} | {y_actual:6.2f} | {y_p:9.2f} | {error:.2f}")
Output:
Coefficient matrix A:
[[1. 1.]
[2. 1.]
[3. 1.]
[4. 1.]
[5. 1.]]
Target values b: [ 2.1 3.9 6.2 7.8 10.1]
Least squares solution:
Slope (m): 1.9900
Intercept (b): 0.0500
Fitted line: y = 1.9900x + 0.0500
Comparison:
x | Actual | Predicted | Error
----------------------------------------
1 | 2.10 | 2.04 | 0.06
2 | 3.90 | 4.03 | 0.13
3 | 6.20 | 6.02 | 0.18
4 | 7.80 | 8.01 | 0.21
5 | 10.10 | 10.00 | 0.10
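The same fit can also be derived from the normal equations (AᵀA)x = Aᵀb, which is how least squares is usually presented in textbooks (lstsq remains the numerically safer route for ill-conditioned A):

```python
import numpy as np

x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
A = np.column_stack([x_data, np.ones(len(x_data))])

# Normal equations: (A^T A) x = A^T b
coeffs = np.linalg.solve(A.T @ A, A.T @ y_data)
print(f"Slope: {coeffs[0]:.4f}, Intercept: {coeffs[1]:.4f}")

# Should agree with np.linalg.lstsq on this well-conditioned problem
lstsq_coeffs, *_ = np.linalg.lstsq(A, y_data, rcond=None)
print(f"Match lstsq? {np.allclose(coeffs, lstsq_coeffs)}")
```

Forming AᵀA squares the condition number of the problem, which is why lstsq's QR/SVD-based approach is preferred in practice even though the normal equations are simpler to write down.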
Practice Questions: Solving Linear Equations
Test your understanding with these coding challenges.
Solve the system:
3x + 2y = 7
x - y = 1
Show Solution
import numpy as np
# Coefficient matrix
A = np.array([[3, 2],
              [1, -1]])
# Constants vector
b = np.array([7, 1])
# Solve the system
solution = np.linalg.solve(A, b)
x, y = solution
print(f"Solution: x = {x}, y = {y}")
# Verify
print(f"Check equation 1: 3({x}) + 2({y}) = {3*x + 2*y}") # Should be 7
print(f"Check equation 2: {x} - {y} = {x - y}") # Should be 1
Fit a line y = mx + b through these points using least squares:
x_data = [0, 1, 2, 3, 4]
y_data = [1, 3, 2, 4, 5]
Show Solution
import numpy as np
x_data = np.array([0, 1, 2, 3, 4])
y_data = np.array([1, 3, 2, 4, 5])
# Create coefficient matrix [x, 1] for y = mx + b
A = np.column_stack([x_data, np.ones(len(x_data))])
# Solve using least squares
solution, residuals, rank, s = np.linalg.lstsq(A, y_data, rcond=None)
m, b = solution
print(f"Line equation: y = {m:.4f}x + {b:.4f}")
# Predictions
y_pred = m * x_data + b
print("\nComparison:")
for x, y_actual, y_p in zip(x_data, y_data, y_pred):
    print(f"x={x}: actual={y_actual}, predicted={y_p:.2f}")
Task: Write a function that determines if a system has a unique solution, no solution, or infinite solutions.
Show Solution
import numpy as np
def analyze_system(A, b):
    """Analyze whether the linear system Ax = b has a unique solution, no solution, or infinitely many."""
    m, n = A.shape
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    if m == n:  # Square system
        det = np.linalg.det(A)
        if abs(det) > 1e-10:
            return "unique", "Square system with non-zero determinant"
        elif rank_A == rank_Ab:
            return "infinite", "Singular but consistent system"
        else:
            return "none", "Singular and inconsistent system"
    elif m > n:  # Overdetermined
        if rank_A == rank_Ab == n:
            return "unique (least-squares)", "Full rank overdetermined"
        else:
            return "least-squares", "Rank deficient overdetermined"
    else:  # Underdetermined
        if rank_A == rank_Ab:
            return "infinite", "Consistent underdetermined system"
        else:
            return "none", "Inconsistent underdetermined system"
# Test cases
A1 = np.array([[2, 1], [1, 3]])
b1 = np.array([5, 7])
result, desc = analyze_system(A1, b1)
print(f"System 1: {result} - {desc}")
A2 = np.array([[2, 4], [1, 2]])
b2 = np.array([6, 3])
result, desc = analyze_system(A2, b2)
print(f"System 2: {result} - {desc}")
Key Takeaways
Matrix Multiplication is Not Element-wise
Use the @ operator or np.dot() for matrix multiplication-never the * operator. Remember that matrix multiplication requires matching inner dimensions and is not commutative (A @ B ≠ B @ A).
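A quick sketch of the difference on small matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)   # element-wise product: [[5, 12], [21, 32]]
print(A @ B)   # matrix product:       [[19, 22], [43, 50]]

# Matrix multiplication is not commutative
print(np.array_equal(A @ B, B @ A))  # False
```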
Transpose Swaps Rows and Columns
The transpose operation (.T) flips a matrix over its diagonal, converting rows to columns. It's essential for data reshaping and has useful properties like (Aᵀ)ᵀ = A.
Matrix Inverse is Like Division
The matrix inverse A⁻¹ satisfies A @ A⁻¹ = I. Only square matrices with non-zero determinants are invertible. Prefer np.linalg.solve() over computing inverses for better performance.
Eigenvalues Reveal Matrix Behavior
Eigenvectors maintain their direction when multiplied by a matrix-they only get scaled by eigenvalues (Av = λv). This is the foundation of PCA for dimensionality reduction in machine learning.
Solving Linear Systems Efficiently
Always use np.linalg.solve(A, b) for square systems and np.linalg.lstsq() for overdetermined systems. These functions are faster and more accurate than computing matrix inverses manually.
Linear Algebra Powers Data Science
Nearly every data science technique uses linear algebra: machine learning trains models using matrix operations, PCA uses eigendecomposition, and recommendation systems use matrix factorization.
Knowledge Check
Test your understanding of linear algebra concepts with these questions.