Introduction to NumPy
NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on them efficiently.
What is NumPy?
NumPy was created in 2005 by Travis Oliphant. It's the foundation upon which much of the Python data science ecosystem is built. Libraries such as pandas, scikit-learn, and matplotlib use NumPy under the hood, and frameworks like TensorFlow are designed to interoperate with NumPy arrays.
NumPy Array (ndarray)
The core of NumPy is the ndarray (n-dimensional array) object. It's a grid of values, all of the same type, indexed by a tuple of non-negative integers. Unlike Python lists, NumPy arrays are stored in contiguous memory blocks, making operations incredibly fast.
Why it matters: NumPy arrays can be tens of times faster than Python lists for numerical operations (about 50x in the benchmark below). This speed is essential when working with millions of data points.
Why Use NumPy?
Speed
NumPy operations are implemented in C, which often makes them 10-100x faster than equivalent Python loops. Vectorized operations avoid slow element-by-element Python iteration.
Memory Efficiency
Arrays use less memory than Python lists because they store raw values of a single fixed data type in one contiguous block, rather than a separate Python object for each element.
Clean Syntax
Express complex mathematical operations in a single line. No more nested loops for matrix operations - just clean, readable code.
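To make the memory claim concrete, here is a rough sketch comparing the bytes used by a list of integers with an equivalent array. The variable names are illustrative, and the figures reported by sys.getsizeof vary by Python version, so treat the list total as an estimate rather than an exact measurement.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # explicit dtype so the size is predictable

# A list stores pointers; each int is a separate Python object with its own overhead
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores raw 8-byte values back to back
array_bytes = np_arr.nbytes

print(f"List (approx.): {list_bytes} bytes")
print(f"Array data:     {array_bytes} bytes")  # 8000
```

The array total covers only the element data; the array object itself adds a small fixed overhead on top.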
Speed Comparison: NumPy vs Python Lists
One of the most compelling reasons to use NumPy is its incredible speed advantage over standard Python lists. But don't just take our word for it. Let's run a real benchmark to see the difference firsthand. We'll perform the same mathematical operation (doubling every number) on both a Python list and a NumPy array containing one million elements.
The performance difference comes down to how these data structures work internally. Python lists are flexible but slow because they store references to objects scattered across memory, and Python has to check the type of each element during operations. NumPy arrays, on the other hand, store homogeneous data (all the same type) in contiguous memory blocks, and operations are implemented in optimized C code that processes elements in batch. This allows NumPy to leverage CPU cache efficiently and apply vectorized operations, techniques that can make code 10-100x faster.
Let's measure this performance difference using Python's built-in time module. We'll create identical datasets, perform the same operation, and time how long each approach takes:
import numpy as np
import time
# Create a list and array with 1 million elements
size = 1_000_000
python_list = list(range(size))
numpy_array = np.arange(size)
# Time Python list operation
start = time.perf_counter()  # perf_counter is a high-resolution timer suited to benchmarks
python_result = [x * 2 for x in python_list]
python_time = time.perf_counter() - start
# Time NumPy array operation
start = time.perf_counter()
numpy_result = numpy_array * 2
numpy_time = time.perf_counter() - start
print(f"Python list: {python_time:.4f} seconds")
print(f"NumPy array: {numpy_time:.4f} seconds")
print(f"NumPy is {python_time/numpy_time:.1f}x faster!")
Output (exact timings vary by machine):
Python list: 0.0821 seconds
NumPy array: 0.0016 seconds
NumPy is 51.3x faster!
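A single timed run can be noisy, so the ratio above will shift from run to run. As a hedged alternative, the sketch below uses the standard library's timeit module to repeat each operation and keep the best time. The array size and repeat counts are arbitrary choices for illustration.

```python
import timeit
import numpy as np

size = 100_000
python_list = list(range(size))
numpy_array = np.arange(size)

runs = 10  # executions per measurement

# Repeat each measurement 3 times and keep the fastest, which is least affected by noise
list_time = min(timeit.repeat(lambda: [x * 2 for x in python_list], number=runs, repeat=3))
array_time = min(timeit.repeat(lambda: numpy_array * 2, number=runs, repeat=3))

print(f"Python list: {list_time / runs:.6f} seconds per run")
print(f"NumPy array: {array_time / runs:.6f} seconds per run")
```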
Installing NumPy
If you followed our environment setup in Module 1, NumPy should already be installed. If not, install it with pip:
# Install NumPy
pip install numpy
# Or with conda
conda install numpy
Once NumPy is installed, you'll import it at the beginning of your Python scripts or notebooks. By convention, the entire data science community uses the alias np to refer to NumPy. This is so universal that you'll see it in every tutorial, Stack Overflow answer, and professional codebase. Using this standard alias makes your code immediately recognizable to other data scientists.
Let's import NumPy and verify it's working correctly by checking its version number. Different versions may have slightly different features or performance characteristics, so it's good practice to know which version you're using:
import numpy as np
# Check version
print(np.__version__) # e.g., 1.24.3
Convention: import NumPy as np. This alias is the universal convention in the data science community, and you'll see np. in tutorials, books, and codebases everywhere.
Creating Arrays
NumPy provides many ways to create arrays. From converting Python lists to generating sequences of numbers, you'll use these array creation methods constantly in data science.
Creating Arrays from Python Lists
The simplest way to create a NumPy array is to convert a Python list using
np.array():
import numpy as np
# 1D array from a list
arr1d = np.array([1, 2, 3, 4, 5])
print(arr1d) # [1 2 3 4 5]
For 2D arrays (matrices), pass a nested list where each inner list becomes a row:
# 2D array from nested lists (matrix)
arr2d = np.array([[1, 2, 3],
[4, 5, 6]])
print(arr2d)
# [[1 2 3]
# [4 5 6]]
You can create 3D arrays (and higher dimensions) by further nesting lists:
# 3D array
arr3d = np.array([[[1, 2], [3, 4]],
[[5, 6], [7, 8]]])
print(arr3d.shape) # (2, 2, 2)
# Mixed types are upcast to a common type
mixed = np.array([1, 2.5, 3])
print(mixed) # [1. 2.5 3. ]
print(mixed.dtype) # float64 (integers became floats)
You can explicitly specify the data type with the dtype parameter:
# Specify dtype explicitly
integers = np.array([1, 2, 3], dtype=np.int32)
print(integers.dtype) # int32
Arrays of Zeros, Ones, and Empty
Often you need to create arrays filled with specific values. NumPy provides convenient functions for this:
Use np.zeros() to create arrays filled with zeros:
# 1D array of zeros
zeros_1d = np.zeros(5)
print(zeros_1d) # [0. 0. 0. 0. 0.]
For multi-dimensional arrays, pass a tuple with the shape:
# 2D array of zeros (3 rows, 4 columns)
zeros_2d = np.zeros((3, 4))
print(zeros_2d)
# [[0. 0. 0. 0.]
# [0. 0. 0. 0.]
# [0. 0. 0. 0.]]
Use np.ones() for arrays of ones:
# Array of ones
ones = np.ones((2, 3))
print(ones)
# [[1. 1. 1.]
# [1. 1. 1.]]
Use np.full() to create arrays with any specific value:
# Array filled with a specific value
sevens = np.full((2, 3), 7)
print(sevens)
# [[7 7 7]
# [7 7 7]]
np.empty() does NOT initialize values to zero. It contains whatever
garbage was in memory. Only use it when you're going to overwrite all values immediately.
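A minimal sketch of the intended pattern for np.empty(): allocate the array, then overwrite every slot before reading any of them. The variable name squares is illustrative only.

```python
import numpy as np

squares = np.empty(5)        # contents are arbitrary at this point
for i in range(5):
    squares[i] = i ** 2      # every slot is overwritten before it is read

print(squares)  # [ 0.  1.  4.  9. 16.]
```

If there is any chance some elements will not be overwritten, np.zeros() is the safer choice.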
Numeric Sequences: arange and linspace
Two of the most commonly used functions for creating sequences of numbers:
np.arange()
Like Python's range() but returns an array.
Specify start, stop, step.
np.linspace()
Returns evenly spaced numbers over a specified interval. Specify start, stop, num (number of points).
The np.arange() function works like Python's range().
Note that the stop value is excluded:
# np.arange(start, stop, step) - stop is EXCLUDED
arr = np.arange(0, 10, 2)
print(arr) # [0 2 4 6 8]
Unlike range(), it works with floats too:
# With floats
arr_float = np.arange(0, 1, 0.2)
print(arr_float) # [0. 0.2 0.4 0.6 0.8]
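One caveat with float steps: because of floating-point rounding, np.arange() can occasionally return one more element than you expect. The sketch below shows a commonly cited case and an alternative using np.linspace() with endpoint=False, which fixes the number of points up front.

```python
import numpy as np

# With a float step, rounding error can change how many elements arange returns
risky = np.arange(1, 1.3, 0.1)
print(len(risky))  # may be 4 rather than the expected 3

# linspace fixes the number of points; endpoint=False excludes the stop value
safe = np.linspace(0, 1, 5, endpoint=False)
print(safe)  # [0.  0.2 0.4 0.6 0.8]
```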
The np.linspace() function creates a specific number of evenly spaced
points. The stop value is included by default:
# np.linspace(start, stop, num) - stop is INCLUDED
arr = np.linspace(0, 10, 5)
print(arr) # [ 0. 2.5 5. 7.5 10. ]
This is commonly used for generating x-axis values for plotting:
# Common use: generate points for plotting
x = np.linspace(0, 2 * np.pi, 100) # 100 points from 0 to 2π
| Function | Parameters | Includes Stop? | Best For |
|---|---|---|---|
| arange() | start, stop, step | No | Integer sequences, known step size |
| linspace() | start, stop, num | Yes (by default) | Known number of points, plotting |
Identity and Diagonal Matrices
# Identity matrix (1s on diagonal, 0s elsewhere)
identity = np.eye(3)
print(identity)
# [[1. 0. 0.]
# [0. 1. 0.]
# [0. 0. 1.]]
# Create diagonal matrix from array
diag = np.diag([1, 2, 3, 4])
print(diag)
# [[1 0 0 0]
# [0 2 0 0]
# [0 0 3 0]
# [0 0 0 4]]
# Extract diagonal from matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.diag(matrix)) # [1 5 9]
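To show why the identity matrix gets its name, here is a short sketch that multiplies a matrix by np.eye() and gets the same values back. It also shows the two directions of np.diag() applied one after the other. The names A and I are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
I = np.eye(2)

# Matrix-multiplying by the identity returns the original values (as floats)
print(A @ I)
# [[1. 2.]
#  [3. 4.]]

# np.diag round trip: build a diagonal matrix, then extract its diagonal again
d = np.diag(np.diag([5, 6, 7]))
print(d)  # [5 6 7]
```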
Random Arrays
Random arrays are essential for data science: simulations, initializing neural networks, sampling, and more. Always set a seed for reproducible results:
# Set seed for reproducibility
np.random.seed(42)
Generate random floats between 0 and 1 with random():
# Random floats between 0 and 1
rand = np.random.random((2, 3))
print(rand)
# [[0.375 0.951 0.732]
# [0.599 0.156 0.156]] (rounded to 3 decimals)
For random floats in a custom range, use uniform():
# Random floats in a range [low, high)
rand_range = np.random.uniform(10, 20, size=(2, 3))
print(rand_range)
Generate random integers with randint():
# Random integers from 1 to 99
rand_int = np.random.randint(1, 100, size=(3, 3))
print(rand_int)
For data following a normal (Gaussian) distribution:
# Standard normal (mean=0, std=1)
normal = np.random.randn(1000)
print(f"Mean: {normal.mean():.3f}, Std: {normal.std():.3f}")
You can specify custom mean and standard deviation:
# Custom normal distribution
normal_custom = np.random.normal(loc=100, scale=15, size=1000)
print(f"Mean: {normal_custom.mean():.1f}") # ~100
Tip: Use np.random.seed() when you need reproducible results. This ensures you get the same "random" numbers each time you run your code - essential for debugging and sharing results.
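The examples above use the np.random.seed() style, which NumPy now treats as its legacy interface. For new code, NumPy's documentation recommends the Generator API created with np.random.default_rng(). The sketch below mirrors the earlier calls. Note that a Generator produces different numbers from np.random.seed() even with the same seed value.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded Generator

floats = rng.random((2, 3))                        # floats in [0, 1)
ints = rng.integers(1, 100, size=(3, 3))           # integers from 1 to 99
normal = rng.normal(loc=100, scale=15, size=1000)  # custom normal distribution

# A second generator with the same seed reproduces the same stream
rng2 = np.random.default_rng(42)
print(np.array_equal(floats, rng2.random((2, 3))))  # True
```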
Practice Questions: Creating Arrays
Test your understanding with these hands-on exercises.
Task: Create a 3×3 matrix filled with zeros.
Expected output:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
Solution:
import numpy as np
zeros = np.zeros((3, 3))
print(zeros)
Task: Create an array containing even numbers from 2 to 20 (inclusive).
Expected output: [ 2 4 6 8 10 12 14 16 18 20]
Solution:
import numpy as np
evens = np.arange(2, 21, 2)
print(evens)
Task: Create an array with 5 equally spaced values between 0 and 100.
Expected output: [ 0. 25. 50. 75. 100.]
Solution:
import numpy as np
points = np.linspace(0, 100, 5)
print(points)
Task: Create a 4×4 identity matrix (1s on diagonal, 0s elsewhere).
Expected output:
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
Solution:
import numpy as np
identity = np.eye(4)
print(identity)
Task: Set seed to 42, then generate a 2×3 array of random integers between 1 and 10.
Hint: Use np.random.seed() before np.random.randint()
Solution:
import numpy as np
np.random.seed(42)
random_arr = np.random.randint(1, 11, size=(2, 3))
print(random_arr)
# [[ 7  4  8]
# [ 5  7 10]]
Array Attributes
Every NumPy array has attributes that describe its structure. Understanding these attributes is essential for working with multi-dimensional data.
Core Array Attributes
import numpy as np
# Create a sample 2D array
arr = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
print("Array:")
print(arr)
| Attribute | Description | Example Value |
|---|---|---|
| shape | Dimensions of the array (rows, cols, ...) | (3, 4) |
| ndim | Number of dimensions (axes) | 2 |
| size | Total number of elements | 12 |
| dtype | Data type of elements | int64 |
| itemsize | Size (bytes) of each element | 8 |
| nbytes | Total bytes consumed by array | 96 |
# Exploring array attributes
print(f"Shape: {arr.shape}") # (3, 4) - 3 rows, 4 columns
print(f"Dimensions: {arr.ndim}") # 2 - it's a 2D array
print(f"Total elements: {arr.size}") # 12 elements
print(f"Data type: {arr.dtype}") # int64
print(f"Bytes per element: {arr.itemsize}") # 8 bytes
print(f"Total bytes: {arr.nbytes}") # 96 bytes (12 × 8)
Understanding Shape
The shape attribute is the most important one. It tells you the size
of each dimension. Think of it like describing a box:
1D Array
shape = (5,)
A line of 5 elements
[1 2 3 4 5]
2D Array
shape = (3, 4)
3 rows × 4 columns
[[. . . .]
[. . . .]
[. . . .]]
3D Array
shape = (2, 3, 4)
2 matrices of 3×4
2 "layers" each
with 3 rows × 4 cols
# 1D array
arr1d = np.array([1, 2, 3, 4, 5])
print(f"1D shape: {arr1d.shape}") # (5,)
print(f"1D ndim: {arr1d.ndim}") # 1
# 2D array (matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(f"2D shape: {arr2d.shape}") # (2, 3) - 2 rows, 3 columns
print(f"2D ndim: {arr2d.ndim}") # 2
# 3D array (tensor)
arr3d = np.zeros((2, 3, 4))
print(f"3D shape: {arr3d.shape}") # (2, 3, 4)
print(f"3D ndim: {arr3d.ndim}") # 3
Note: For a 2D array with shape (rows, cols), the first dimension is rows, and the second is columns. This matches NumPy's default "row-major" (C-style) memory order.
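Because shape is an ordinary Python tuple, you can unpack it directly. A small sketch, with illustrative variable names:

```python
import numpy as np

arr = np.zeros((3, 4))

rows, cols = arr.shape          # shape is a plain tuple, so it unpacks
print(rows, cols)               # 3 4
print(len(arr))                 # 3 - len() reports the first dimension only
print(arr.size == rows * cols)  # True
```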
NumPy Data Types (dtype)
NumPy has many data types, each with specific precision and memory requirements:
| Category | Data Types | Description |
|---|---|---|
| Integers | int8, int16, int32, int64 | Signed integers (8 to 64 bits) |
| Unsigned Int | uint8, uint16, uint32, uint64 | Unsigned (non-negative only) integers |
| Floats | float16, float32, float64 | Floating-point numbers |
| Complex | complex64, complex128 | Complex numbers |
| Boolean | bool | True/False values |
| String | str_ (unicode_ in NumPy versions before 2.0) | Fixed-length strings |
# Specifying data type
arr_int = np.array([1, 2, 3], dtype=np.int32)
print(f"Int32: {arr_int.dtype}, {arr_int.itemsize} bytes")
arr_float = np.array([1, 2, 3], dtype=np.float64)
print(f"Float64: {arr_float.dtype}, {arr_float.itemsize} bytes")
# Converting data types with astype()
arr = np.array([1.7, 2.3, 3.9])
arr_int = arr.astype(np.int32) # Truncates decimals
print(arr_int) # [1 2 3]
# Boolean arrays
arr_bool = np.array([True, False, True])
print(arr_bool.dtype) # bool
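Note that astype() truncates toward zero rather than rounding, which matters for negative values and for anything ending in .5. The sketch below contrasts plain conversion with rounding first. The sample values are arbitrary.

```python
import numpy as np

values = np.array([-1.7, -0.2, 1.7, 2.5])

truncated = values.astype(np.int32)          # drops the fractional part
rounded = np.round(values).astype(np.int32)  # rounds first (halves go to the even neighbor)

print(truncated)  # [-1  0  1  2]
print(rounded)    # [-2  0  2  2]
```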
Tip: Use float32 instead of float64 when memory is a concern (e.g., large datasets, GPUs). It uses half the memory, usually with sufficient precision.
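A quick sketch of the trade-off: the same million values take half the bytes as float32, at the cost of precision (roughly 7 significant decimal digits instead of about 16). The array contents are placeholders.

```python
import numpy as np

data64 = np.ones(1_000_000, dtype=np.float64)
data32 = data64.astype(np.float32)

print(data64.nbytes)  # 8000000
print(data32.nbytes)  # 4000000

# The float32 approximation of 0.1 differs from the float64 one
same = bool(np.float32(0.1) == np.float64(0.1))
print(same)  # False
```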
Practice Questions: Array Attributes
Test your understanding of array shapes and data types.
Given:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Task: What is the shape of this array? How many total elements?
Solution:
print(arr.shape) # (4, 3) - 4 rows, 3 columns
print(arr.size) # 12 total elements
Task: Create an array of integers [10, 20, 30] and convert it to float32.
Solution:
arr = np.array([10, 20, 30])
arr_float = arr.astype(np.float32)
print(arr_float.dtype) # float32
Indexing & Slicing
Accessing and selecting data from arrays is fundamental to data science. NumPy's powerful indexing system lets you select individual elements, rows, columns, or arbitrary subsets of your data.
1D Array Indexing
1D array indexing works exactly like Python lists - use square brackets with the index position. Remember: indexing starts at 0!
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
Use positive indexing to access elements from the start:
# Positive indexing (from start)
print(arr[0]) # 10 (first element)
print(arr[2]) # 30 (third element)
print(arr[4]) # 50 (last element)
Use negative indexing to access elements from the end:
# Negative indexing (from end)
print(arr[-1]) # 50 (last element)
print(arr[-2]) # 40 (second to last)
1D Array Slicing
Slicing extracts a portion of the array using the syntax
arr[start:stop:step]. The stop index is excluded.
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Basic slicing uses [start:stop]:
# Basic slicing [start:stop]
print(arr[2:5]) # [2 3 4] - indices 2, 3, 4
print(arr[:4]) # [0 1 2 3] - start to index 3
print(arr[6:]) # [6 7 8 9] - index 6 to end
Add a step value to skip elements:
# With step [start:stop:step]
print(arr[::2]) # [0 2 4 6 8] - every 2nd element
print(arr[1::2]) # [1 3 5 7 9] - odd indices
Use negative step to reverse the array:
# Reverse array
print(arr[::-1]) # [9 8 7 6 5 4 3 2 1 0]
Important: Slices are views of the original data, not copies. Use .copy() to create an independent copy.
When you modify a slice, the original array changes too:
# Slices are views (shared memory)
original = np.array([1, 2, 3, 4, 5])
slice_view = original[1:4]
slice_view[0] = 99
print(original) # [ 1 99 3 4 5] - original changed!
Use .copy() to create an independent copy:
# Use .copy() for independent copy
original = np.array([1, 2, 3, 4, 5])
slice_copy = original[1:4].copy()
slice_copy[0] = 99
print(original) # [1 2 3 4 5] - original unchanged
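If you are unsure whether two arrays share data, you can check rather than guess. A minimal sketch using np.shares_memory() and the .base attribute:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])
view = original[1:4]
copy = original[1:4].copy()

print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False
print(view.base is original)             # True - a view remembers its source array
```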
2D Array Indexing
For 2D arrays (matrices), use arr[row, col] syntax. Each dimension
is separated by a comma.
arr = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
# Single element access [row, col]
print(arr[0, 0]) # 1 (first row, first col)
print(arr[1, 2]) # 7 (second row, third col)
print(arr[2, -1]) # 12 (last row, last col)
print(arr[-1, -1]) # 12 (same as above)
2D Array Slicing
Slicing 2D arrays unlocks powerful data manipulation capabilities that you'll use constantly in data science. Unlike 1D arrays where you slice along a single dimension, 2D arrays let you slice along both rows and columns simultaneously. This is incredibly useful when working with tabular data: imagine selecting specific customers (rows) and certain features (columns) from a dataset with thousands of rows and dozens of columns.
The syntax for 2D slicing follows the pattern array[row_slice, column_slice], where each slice uses the familiar start:stop:step notation. You can mix and match: grab all rows but specific columns, select a range of rows with all columns, or extract a rectangular sub-region from anywhere in the array. The colon : means "all elements" in that dimension.
Let's create a simple 3×4 array (3 rows, 4 columns) and explore various slicing operations. Think of this as a small dataset with 3 observations and 4 features. We'll practice extracting different parts of it:
arr = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
To select an entire row, you can simply index by the row number, or use a colon to explicitly mean "all columns":
print(arr[0]) # [1 2 3 4] - first row
print(arr[0, :]) # [1 2 3 4] - same thing, explicit
To select an entire column, use a colon for "all rows" and specify the column index:
print(arr[:, 0]) # [1 5 9] - first column
print(arr[:, -1]) # [4 8 12] - last column
To extract a submatrix, slice both rows AND columns:
print(arr[0:2, 1:3])
# [[2 3]
# [6 7]]
You can use step values to select every Nth row or column:
print(arr[::2, :]) # Every other row: rows 0 and 2
# [[ 1 2 3 4]
# [ 9 10 11 12]]
print(arr[:, ::2]) # Every other column: columns 0 and 2
# [[ 1 3]
# [ 5 7]
# [ 9 11]]
arr[0] or arr[0, :]
Select row 0, all columns → returns 1D array (the row)
arr[:, 0]
Select all rows, column 0 → returns 1D array (the column)
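Since indexing with a single integer drops that dimension, a one-element slice is a common way to keep the result 2D. A short sketch comparing the resulting shapes:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

print(arr[:, 0].shape)    # (3,)   integer index drops the dimension
print(arr[:, 0:1].shape)  # (3, 1) slice keeps it as a column
print(arr[0:1, :].shape)  # (1, 4) slice keeps it as a row
```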
Boolean (Mask) Indexing
Boolean indexing is one of NumPy's most powerful features. You can select elements based on conditions - no loops needed!
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Create a boolean mask from a condition:
mask = arr > 5
print(mask) # [False False False False False True True True True True]
Use the mask to select matching elements:
print(arr[mask]) # [ 6 7 8 9 10]
print(arr[arr > 5]) # Same thing, inline
Combine multiple conditions using & (AND) and | (OR):
print(arr[(arr > 3) & (arr < 8)]) # [4 5 6 7]
print(arr[(arr < 3) | (arr > 8)]) # [ 1 2 9 10]
Use boolean indexing to modify elements matching a condition:
arr[arr > 5] = 0
print(arr) # [ 1 2 3 4 5 0 0 0 0 0]
Important: Use & (bitwise AND) and | (bitwise OR) for combining conditions, NOT the Python keywords and/or. Also, wrap each condition in parentheses!
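To see what goes wrong with the keyword form, the sketch below tries it and catches the error. Python's and asks each whole array for a single True/False value, which NumPy refuses to give. The example also shows ~ for negating a mask.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

raised = False
try:
    arr[(arr > 3) and (arr < 8)]   # keyword 'and' is not element-wise
except ValueError as err:
    raised = True
    print("ValueError:", err)

# ~ negates a mask element by element
print(arr[~(arr > 5)])  # [1 2 3 4 5]
```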
Fancy (Integer Array) Indexing
Use arrays of indices to select specific elements in any order:
arr = np.array([10, 20, 30, 40, 50])
indices = [0, 2, 4]
print(arr[indices]) # [10 30 50]
You can repeat indices to select the same element multiple times, or reorder elements:
print(arr[[0, 0, 1, 1]]) # [10 10 20 20] - repeat indices
print(arr[[4, 3, 2, 1, 0]]) # [50 40 30 20 10] - reverse order
For 2D arrays, provide arrays for both row and column indices to select specific elements:
arr2d = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
rows = [0, 1, 2]
cols = [0, 1, 2]
print(arr2d[rows, cols]) # [1 5 9] - diagonal elements
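Unlike slicing, selecting with an index array returns a new array rather than a view. A brief sketch of the difference between reading and assigning through fancy indexing:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

picked = arr[[0, 2, 4]]   # fancy indexing returns a new array
picked[0] = 99
print(arr)                # [10 20 30 40 50] - original unchanged

arr[[0, 2, 4]] = 0        # assigning through fancy indexing does modify the array
print(arr)                # [ 0 20  0 40  0]
```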
Practice Questions: Indexing & Slicing
Master array selection with these exercises.
Given:
arr = np.array([10, 20, 30, 40, 50, 60, 70])
Task: Extract the last 3 elements using slicing.
Expected output: [50 60 70]
Solution:
arr = np.array([10, 20, 30, 40, 50, 60, 70])
print(arr[-3:]) # [50 60 70]
Given:
matrix = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Task: Extract the second column (index 1).
Expected output: [2 5 8]
Solution:
print(matrix[:, 1]) # [2 5 8]
Given:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
Task: Select every other element, starting from the end (in reverse).
Expected output: [8 6 4 2]
Solution:
print(arr[::-2]) # [8 6 4 2]
Given:
data = np.array([23, 45, 12, 67, 34, 89, 11])
Task: Find all values greater than the array's mean.
Hint: Use boolean indexing with data.mean()
Solution:
data = np.array([23, 45, 12, 67, 34, 89, 11])
mean = data.mean() # 40.14
above_mean = data[data > mean]
print(above_mean) # [45 67 89]
Given:
temps = np.array([5, -2, 8, -1, 3, -5, 10])
Task: Replace all negative temperatures with 0.
Expected output: [ 5 0 8 0 3 0 10]
Solution:
temps = np.array([5, -2, 8, -1, 3, -5, 10])
temps[temps < 0] = 0
print(temps) # [ 5 0 8 0 3 0 10]
Reshaping & Stacking
Often you'll need to change the shape of arrays - converting 1D to 2D, flattening matrices, or combining multiple arrays. NumPy makes these transformations simple.
Reshaping Arrays
The reshape() method changes array dimensions without changing the data.
The total number of elements must stay the same!
import numpy as np
# Create 1D array with 12 elements
arr = np.arange(12)
print(arr) # [ 0 1 2 3 4 5 6 7 8 9 10 11]
# Reshape to 3 rows × 4 columns
arr_2d = arr.reshape(3, 4)
print(arr_2d)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
# Reshape to 4 rows × 3 columns
arr_2d = arr.reshape(4, 3)
print(arr_2d)
# [[ 0 1 2]
# [ 3 4 5]
# [ 6 7 8]
# [ 9 10 11]]
# Reshape to 3D: 2 matrices of 2×3
arr_3d = arr.reshape(2, 2, 3)
print(arr_3d)
# [[[ 0 1 2]
# [ 3 4 5]]
# [[ 6 7 8]
# [ 9 10 11]]]
Tip: Use -1 as a placeholder for one dimension and NumPy figures out the rest.
# Use -1 to auto-calculate dimension
arr = np.arange(12)
# "I want 3 rows, figure out columns"
print(arr.reshape(3, -1)) # Shape: (3, 4)
# "I want 4 columns, figure out rows"
print(arr.reshape(-1, 4)) # Shape: (3, 4)
# "I want 2 matrices with 3 columns each"
print(arr.reshape(2, -1, 3)) # Shape: (2, 2, 3)
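The -1 placeholder only works when the remaining dimensions divide the element count evenly. The sketch below shows the error you get otherwise, plus the common reshape(-1, 1) idiom for turning a 1D array into a single column.

```python
import numpy as np

arr = np.arange(12)

try:
    arr.reshape(5, -1)   # 12 is not divisible by 5
except ValueError as err:
    print("ValueError:", err)

column = arr.reshape(-1, 1)   # one column, as many rows as needed
print(column.shape)  # (12, 1)
```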
Flattening: flatten() vs ravel()
Both convert multi-dimensional arrays to 1D, but with an important difference:
flatten()
Returns a copy of the data.
Modifying it doesn't affect the original array.
ravel()
Returns a view when possible.
More memory efficient, but changes may affect original.
flatten() always returns a copy of the data, so modifying the
flattened array doesn't affect the original:
arr = np.array([[1, 2, 3],
[4, 5, 6]])
flat = arr.flatten()
flat[0] = 99
print(arr[0, 0]) # 1 - original unchanged
ravel() returns a view when possible, meaning changes to the
raveled array may affect the original:
arr = np.array([[1, 2, 3],
[4, 5, 6]])
raveled = arr.ravel()
raveled[0] = 99
print(arr[0, 0]) # 99 - original changed!
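The "when possible" qualifier matters: if the array is not laid out contiguously in memory, for example after a transpose, ravel() has to fall back to a copy. A small sketch that checks this with np.shares_memory():

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(np.shares_memory(arr, arr.ravel()))    # True  - contiguous, so a view
print(np.shares_memory(arr, arr.T.ravel()))  # False - transposed, so a copy
print(np.shares_memory(arr, arr.flatten()))  # False - flatten always copies
```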
Transposing Arrays
Transposing swaps rows and columns. For a 2D array, this means rows become columns and vice versa.
arr = np.array([[1, 2, 3],
[4, 5, 6]])
print(f"Original shape: {arr.shape}") # (2, 3)
There are three equivalent ways to transpose an array:
transposed = arr.T
transposed = arr.transpose()
transposed = np.transpose(arr)
All three give the same result - the rows and columns are swapped:
print(transposed)
# [[1 4]
# [2 5]
# [3 6]]
print(f"Transposed shape: {transposed.shape}") # (3, 2)
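Two details worth knowing, sketched below: a transpose is a view, so writing to it changes the original, and transposing a 1D array has no effect because there is only one axis to swap.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
t = arr.T
t[0, 1] = 99          # position (0, 1) in the transpose is (1, 0) in the original
print(arr[1, 0])      # 99

v = np.array([1, 2, 3])
print(v.T.shape)               # (3,)   unchanged
print(v.reshape(-1, 1).shape)  # (3, 1) use reshape to get a column instead
```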
Stacking Arrays
NumPy provides several ways to combine multiple arrays into one.
vstack() stacks arrays vertically (adds rows):
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
vstacked = np.vstack([a, b])
print(vstacked)
# [[1 2 3]
# [4 5 6]]
hstack() stacks arrays horizontally (adds columns for 2D, or
concatenates for 1D):
hstacked = np.hstack([a, b])
print(hstacked) # [1 2 3 4 5 6]
With 2D arrays, vstack() adds more rows:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
print(np.vstack([arr1, arr2]))
# [[1 2]
# [3 4]
# [5 6]
# [7 8]]
And hstack() adds more columns:
print(np.hstack([arr1, arr2]))
# [[1 2 5 6]
# [3 4 7 8]]
Concatenating Arrays
np.concatenate() is more general - you specify the axis along which
to join arrays:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
Use axis=0 to concatenate along rows (add more rows):
result = np.concatenate([arr1, arr2], axis=0)
print(result)
# [[1 2]
# [3 4]
# [5 6]]
Use axis=1 to concatenate along columns (add more columns):
arr3 = np.array([[7], [8]])
result = np.concatenate([arr1, arr3], axis=1)
print(result)
# [[1 2 7]
# [3 4 8]]
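When concatenating, the arrays must match in every dimension except the one being joined. A minimal sketch of a join that works and one that fails, reusing the shapes from above:

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])   # shape (2, 2)
arr2 = np.array([[5, 6]])           # shape (1, 2)

# Works along axis 0: both have 2 columns
joined = np.concatenate([arr1, arr2], axis=0)
print(joined.shape)  # (3, 2)

# Fails along axis 1: row counts differ (2 vs 1)
try:
    np.concatenate([arr1, arr2], axis=1)
except ValueError as err:
    print("ValueError:", err)
```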
Splitting Arrays
The inverse of stacking - split one array into multiple parts. Let's create an array to split:
arr = np.arange(12).reshape(3, 4)
print(arr)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
Use hsplit() to split horizontally (along columns):
parts = np.hsplit(arr, 4) # 4 parts of 1 column each
print(parts[0]) # [[0] [4] [8]]
Split at specific column indices by passing a list:
left, right = np.hsplit(arr, [2]) # Split at column 2
print(left) # [[ 0 1] [ 4 5] [ 8 9]]
print(right) # [[ 2 3] [ 6 7] [10 11]]
Use vsplit() for vertical splits (along rows):
top, bottom = np.vsplit(arr, [2]) # Split at row 2
print(top) # [[0 1 2 3] [4 5 6 7]]
print(bottom) # [[ 8 9 10 11]]
| Function | Description | Axis |
|---|---|---|
| vstack | Stack arrays vertically (add rows) | 0 |
| hstack | Stack arrays horizontally (add columns) | 1 |
| vsplit | Split array vertically (into row groups) | 0 |
| hsplit | Split array horizontally (into column groups) | 1 |
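Splitting into a number of parts requires the array to divide evenly. As a hedged aside, np.array_split() is a related function, not covered above, that allows uneven parts. The sketch below contrasts it with np.split().

```python
import numpy as np

arr = np.arange(10)

try:
    np.split(arr, 3)   # 10 elements cannot be divided into 3 equal parts
except ValueError as err:
    print("ValueError:", err)

# array_split allows uneven parts; earlier parts get the extra elements
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```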
Practice Questions: Reshaping & Manipulation
Test your array reshaping skills with these exercises.
Task: Create a 1D array from 1 to 12, then reshape it into a 3x4 matrix.
Solution:
arr = np.arange(1, 13)
matrix = arr.reshape(3, 4)
print(matrix)
# [[ 1 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]
Given:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
Task: Stack them vertically to get a 4x2 array.
Solution:
result = np.vstack([a, b])
print(result)
# [[1 2]
# [3 4]
# [5 6]
# [7 8]]
Given:
arr = np.arange(6).reshape(2, 3)
Task: Transpose it and then flatten to a 1D array.
Expected output: [0 3 1 4 2 5]
Solution:
arr = np.arange(6).reshape(2, 3)
result = arr.T.flatten()
print(result) # [0 3 1 4 2 5]
Key Takeaways
NumPy is Essential
NumPy is the foundation of data science in Python. It provides fast, memory-efficient arrays that power pandas, scikit-learn, and more.
Often 50x Faster
Arrays can be around 50x faster than Python lists for numerical operations, as in the benchmark above, because operations run in compiled C code over contiguous memory.
Create Arrays
Use np.array(), np.zeros(), np.ones(), np.arange(), and np.linspace() to create arrays
Key Attributes
shape (dimensions), ndim (axes count), dtype (data type), size (total elements)
Powerful Indexing
Use arr[row, col] for 2D arrays. Boolean indexing (arr[arr > 5]) selects elements matching conditions
Views vs Copies
Slices are views, not copies. Use .copy() when you need an independent copy of the data