Module 3.1

NumPy Arrays

Enter the world of numerical computing with NumPy! Learn to create, manipulate, and transform arrays - the foundation of all data science work in Python.

40 min read
Beginner
Hands-on Examples
What You'll Learn
  • Why NumPy is essential for Data Science
  • Creating arrays (zeros, ones, arange, linspace)
  • Array attributes (shape, dtype, ndim)
  • Indexing & slicing arrays
  • Reshaping & stacking arrays
01

Introduction to NumPy

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on them efficiently.

What is NumPy?

NumPy (short for "Numerical Python") was created in 2005 by Travis Oliphant. It is the foundation on which much of Python's data science stack is built: pandas, scikit-learn, and matplotlib all build on NumPy arrays, and frameworks such as TensorFlow interoperate closely with them.

Key Concept

NumPy Array (ndarray)

The core of NumPy is the ndarray (n-dimensional array) object. It's a grid of values, all of the same type, indexed by a tuple of non-negative integers. Unlike Python lists, NumPy arrays are stored in contiguous memory blocks, making operations incredibly fast.

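Why it matters: For numerical operations on large arrays, NumPy is often tens of times faster than looping over a Python list; the exact factor depends on the operation, the data size, and the machine. This speed is essential when working with millions of data points.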

Why Use NumPy?

Speed

NumPy operations are implemented in C, so they can run 10-100x faster than equivalent Python loops. Vectorized operations avoid slow element-by-element Python iteration.

Memory Efficiency

Arrays use less memory than Python lists because they store raw values of one fixed data type in a contiguous block, rather than references to separate Python objects.

Clean Syntax

Express complex mathematical operations in a single line. No more nested loops for matrix operations - just clean, readable code.
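
The memory claim can be checked directly. The following is a rough sketch, assuming CPython: sys.getsizeof counts the list's reference table and each integer object separately, and the exact byte counts vary by Python version and platform. The variable names are illustrative only.

```python
import sys
import numpy as np

n = 1_000
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int64)  # fixed 8-byte integers

# A list stores references; each int is a separate Python object.
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)

# An array stores the raw values back to back.
array_bytes = numpy_array.nbytes  # n * 8 bytes of element data

print(f"List:  about {list_bytes} bytes")
print(f"Array: {array_bytes} bytes of element data")
```

The array figure covers element data only; the array object itself adds a small fixed overhead on top.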

Speed Comparison: NumPy vs Python Lists

One of the most compelling reasons to use NumPy is its speed advantage over standard Python lists. But don't just take our word for it: let's run a real benchmark to see the difference firsthand. We'll perform the same mathematical operation (doubling every number) on both a Python list and a NumPy array containing one million elements.

The performance difference comes down to how these data structures work internally. Python lists are flexible but slow because they store references to objects scattered across memory, and Python has to check the type of each element during operations. NumPy arrays, on the other hand, store homogeneous data (all the same type) in contiguous memory blocks, and operations are implemented in optimized C code that processes elements in bulk. This allows NumPy to use the CPU cache efficiently and apply vectorized operations, techniques that can make code 10-100x faster.

Let's measure this performance difference using Python's built-in time module. We'll create identical datasets, perform the same operation, and time how long each approach takes:

import numpy as np
import time

# Create a list and array with 1 million elements
size = 1_000_000
python_list = list(range(size))
numpy_array = np.arange(size)

# Time the Python list operation (perf_counter is a high-resolution timer)
start = time.perf_counter()
python_result = [x * 2 for x in python_list]
python_time = time.perf_counter() - start

# Time the NumPy array operation
start = time.perf_counter()
numpy_result = numpy_array * 2
numpy_time = time.perf_counter() - start

print(f"Python list: {python_time:.4f} seconds")
print(f"NumPy array: {numpy_time:.4f} seconds")
print(f"NumPy is {python_time/numpy_time:.1f}x faster!")

Sample output (timings vary by machine and NumPy version):

Python list: 0.0821 seconds
NumPy array: 0.0016 seconds
NumPy is 51.3x faster!
Result: In this run, a simple multiplication on 1 million elements was about 50 times faster with NumPy. Your numbers will differ, and for complex operations on larger datasets the gap can be even larger.
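
A single timed run can be noisy. As a hedged alternative, this sketch uses the standard-library timeit module to repeat each measurement and keep the fastest trial. The array size and the repeat counts are arbitrary choices, and the speedup you see will depend on your machine.

```python
import timeit
import numpy as np

size = 100_000
python_list = list(range(size))
numpy_array = np.arange(size)

# Run each statement 10 times per trial, for 5 trials, and keep the fastest trial.
list_time = min(timeit.repeat(lambda: [x * 2 for x in python_list], number=10, repeat=5))
numpy_time = min(timeit.repeat(lambda: numpy_array * 2, number=10, repeat=5))

print(f"Python list: {list_time / 10:.6f} s per run")
print(f"NumPy array: {numpy_time / 10:.6f} s per run")
print(f"Speedup: {list_time / numpy_time:.1f}x")
```

Taking the minimum reduces the influence of background activity on the measurement.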

Installing NumPy

If you followed our environment setup in Module 1, NumPy should already be installed. If not, install it with pip:

# Install NumPy
pip install numpy

# Or with conda
conda install numpy

Once NumPy is installed, you'll import it at the beginning of your Python scripts or notebooks. By convention, the data science community uses the alias np to refer to NumPy. This is so widespread that you'll see it in nearly every tutorial, Stack Overflow answer, and professional codebase. Using this standard alias makes your code immediately recognizable to other data scientists.

Let's import NumPy and verify it's working correctly by checking its version number. Different versions may have slightly different features or performance characteristics, so it's good practice to know which version you're using:

import numpy as np

# Check version
print(np.__version__)  # e.g., 1.24.3
Convention: Always import NumPy as np. This is the universal convention in the data science community. You'll see np. in every tutorial, book, and codebase.
02

Creating Arrays

NumPy provides many ways to create arrays. From converting Python lists to generating sequences of numbers, you'll use these array creation methods constantly in data science.

Creating Arrays from Python Lists

The simplest way to create a NumPy array is to convert a Python list using np.array():

import numpy as np

# 1D array from a list
arr1d = np.array([1, 2, 3, 4, 5])
print(arr1d)  # [1 2 3 4 5]

For 2D arrays (matrices), pass a nested list where each inner list becomes a row:

# 2D array from nested lists (matrix)
arr2d = np.array([[1, 2, 3], 
                  [4, 5, 6]])
print(arr2d)
# [[1 2 3]
#  [4 5 6]]

You can create 3D arrays (and higher dimensions) by further nesting lists:

# 3D array
arr3d = np.array([[[1, 2], [3, 4]], 
                  [[5, 6], [7, 8]]])
print(arr3d.shape)  # (2, 2, 2)
Note: Unlike Python lists, NumPy arrays must contain elements of the same type. If you mix types, NumPy will "upcast" to the most general type.
# Mixed types get upcasted
mixed = np.array([1, 2.5, 3])
print(mixed)        # [1.  2.5 3. ]
print(mixed.dtype)  # float64 (integers became floats)
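
Upcasting applies to other type combinations as well. A small sketch, with illustrative variable names, showing what NumPy chooses when floats meet complex numbers and when numbers meet strings:

```python
import numpy as np

int_and_float = np.array([1, 2.5])           # int + float     -> float64
float_and_complex = np.array([1.5, 2 + 3j])  # float + complex -> complex128
number_and_text = np.array([1, "two"])       # number + str    -> Unicode string dtype

print(int_and_float.dtype)         # float64
print(float_and_complex.dtype)     # complex128
print(number_and_text.dtype.kind)  # U (a Unicode string type)
print(number_and_text)             # ['1' 'two']
```

Note that in the last case the number 1 silently becomes the string '1', so arithmetic on that array no longer works.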

You can explicitly specify the data type with the dtype parameter:

# Specify dtype explicitly
integers = np.array([1, 2, 3], dtype=np.int32)
print(integers.dtype)  # int32

Arrays of Zeros, Ones, and Empty

Often you need to create arrays filled with specific values. NumPy provides convenient functions for this:

Use np.zeros() to create arrays filled with zeros:

# 1D array of zeros
zeros_1d = np.zeros(5)
print(zeros_1d)  # [0. 0. 0. 0. 0.]

For multi-dimensional arrays, pass a tuple with the shape:

# 2D array of zeros (3 rows, 4 columns)
zeros_2d = np.zeros((3, 4))
print(zeros_2d)
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

Use np.ones() for arrays of ones:

# Array of ones
ones = np.ones((2, 3))
print(ones)
# [[1. 1. 1.]
#  [1. 1. 1.]]

Use np.full() to create arrays with any specific value:

# Array filled with a specific value
sevens = np.full((2, 3), 7)
print(sevens)
# [[7 7 7]
#  [7 7 7]]
Warning: np.empty() does NOT initialize values to zero. It contains whatever garbage was in memory. Only use it when you're going to overwrite all values immediately.
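
A minimal sketch of the intended np.empty() pattern: allocate the array, then overwrite every element before reading it. The fill value 0.5 is arbitrary and only for illustration.

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary at this point
buffer = np.empty((2, 3))

# Overwrite every element before using the array
buffer[:] = 0.5
print(buffer)
# [[0.5 0.5 0.5]
#  [0.5 0.5 0.5]]
```

If you need a known starting value and do not plan to overwrite everything, np.zeros() or np.full() is the safer choice.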

Numeric Sequences: arange and linspace

Two of the most commonly used functions for creating sequences of numbers:

np.arange()

Like Python's range() but returns an array. Specify start, stop, step.

np.linspace()

Returns evenly spaced numbers over a specified interval. Specify start, stop, num (number of points).

The np.arange() function works like Python's range(). Note that the stop value is excluded:

# np.arange(start, stop, step) - stop is EXCLUDED
arr = np.arange(0, 10, 2)
print(arr)  # [0 2 4 6 8]

Unlike range(), it works with floats too:

# With floats
arr_float = np.arange(0, 1, 0.2)
print(arr_float)  # [0.  0.2 0.4 0.6 0.8]

The np.linspace() function creates a specific number of evenly spaced points. The stop value is included by default:

# np.linspace(start, stop, num) - stop is INCLUDED
arr = np.linspace(0, 10, 5)
print(arr)  # [ 0.   2.5  5.   7.5 10. ]

This is commonly used for generating x-axis values for plotting:

# Common use: generate points for plotting
x = np.linspace(0, 2 * np.pi, 100)  # 100 points from 0 to 2π
Function | Parameters | Includes stop? | Best for
arange() | start, stop, step | No | Integer sequences, known step size
linspace() | start, stop, num | Yes (by default) | Known number of points, plotting
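
With a non-integer step, floating-point rounding can make the number of elements that np.arange() returns hard to predict, which is why the NumPy documentation suggests np.linspace() in those cases. A small sketch of two linspace options that help here; endpoint and retstep are real keyword arguments, and the values are illustrative:

```python
import numpy as np

# Exactly 5 points, with the stop value excluded
no_endpoint = np.linspace(0, 1, 5, endpoint=False)
print(no_endpoint)  # [0.  0.2 0.4 0.6 0.8]

# retstep=True also returns the spacing that linspace computed
points, step = np.linspace(0, 10, 5, retstep=True)
print(points)  # [ 0.   2.5  5.   7.5 10. ]
print(step)    # 2.5
```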

Identity and Diagonal Matrices

# Identity matrix (1s on diagonal, 0s elsewhere)
identity = np.eye(3)
print(identity)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Create diagonal matrix from array
diag = np.diag([1, 2, 3, 4])
print(diag)
# [[1 0 0 0]
#  [0 2 0 0]
#  [0 0 3 0]
#  [0 0 0 4]]

# Extract diagonal from matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.diag(matrix))  # [1 5 9]

Random Arrays

Random arrays are essential for data science: simulations, initializing neural networks, sampling, and more. Always set a seed for reproducible results:

# Set seed for reproducibility
np.random.seed(42)

Generate random floats between 0 and 1 with random():

# Random floats between 0 and 1
rand = np.random.random((2, 3))
print(rand)
# [[0.37454012 0.95071431 0.73199394]
#  [0.59865848 0.15601864 0.15599452]]

For random floats in a custom range, use uniform():

# Random floats in a range [low, high)
rand_range = np.random.uniform(10, 20, size=(2, 3))
print(rand_range)

Generate random integers with randint():

# Random integers from 1 to 99
rand_int = np.random.randint(1, 100, size=(3, 3))
print(rand_int)

For data following a normal (Gaussian) distribution:

# Standard normal (mean=0, std=1)
normal = np.random.randn(1000)
print(f"Mean: {normal.mean():.3f}, Std: {normal.std():.3f}")

You can specify custom mean and standard deviation:

# Custom normal distribution
normal_custom = np.random.normal(loc=100, scale=15, size=1000)
print(f"Mean: {normal_custom.mean():.1f}")  # ~100
Tip: Always set np.random.seed() when you need reproducible results. This ensures you get the same "random" numbers each time you run your code - essential for debugging and sharing results.
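
Recent NumPy documentation recommends the Generator API (np.random.default_rng) for new code instead of the global state set by np.random.seed(). A minimal sketch of the same operations with a seeded generator; the name rng is just a common convention, and the numbers produced differ from those of the legacy functions:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator, independent of global state

floats = rng.random((2, 3))                        # floats in [0, 1)
in_range = rng.uniform(10, 20, size=(2, 3))        # floats in [10, 20)
ints = rng.integers(1, 100, size=(3, 3))           # integers from 1 to 99
normal = rng.normal(loc=100, scale=15, size=1000)  # mean 100, std 15

# Two generators created with the same seed produce the same sequence
same = np.array_equal(np.random.default_rng(0).random(3),
                      np.random.default_rng(0).random(3))
print(same)  # True
```

Passing the generator object around explicitly keeps random state local to the code that uses it.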

Practice Questions: Creating Arrays

Test your understanding with these hands-on exercises.

Task: Create a 3×3 matrix filled with zeros.

Expected output:

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Show Solution
import numpy as np
zeros = np.zeros((3, 3))
print(zeros)

Task: Create an array containing even numbers from 2 to 20 (inclusive).

Expected output: [ 2  4  6  8 10 12 14 16 18 20]

Show Solution
import numpy as np
evens = np.arange(2, 21, 2)
print(evens)

Task: Create an array with 5 equally spaced values between 0 and 100.

Expected output: [  0.  25.  50.  75. 100.]

Show Solution
import numpy as np
points = np.linspace(0, 100, 5)
print(points)

Task: Create a 4×4 identity matrix (1s on diagonal, 0s elsewhere).

Expected output:

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
Show Solution
import numpy as np
identity = np.eye(4)
print(identity)

Task: Set seed to 42, then generate a 2×3 array of random integers between 1 and 10.

Hint: Use np.random.seed() before np.random.randint()

Show Solution
import numpy as np
np.random.seed(42)
random_arr = np.random.randint(1, 11, size=(2, 3))
print(random_arr)
# [[ 7  4  8]
#  [ 5  7 10]]
03

Array Attributes

Every NumPy array has attributes that describe its structure. Understanding these attributes is essential for working with multi-dimensional data.

Example: np.array([1, 2, 3, 4, 5]) has shape (5,), ndim 1, and size 5.

1D Array: A simple sequence of elements. Shape is (n,) where n is the number of elements. Think of it as a single row.

Core Array Attributes

import numpy as np

# Create a sample 2D array
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

print("Array:")
print(arr)
Attribute | Description | Example value
shape | Dimensions of the array (rows, cols, ...) | (3, 4)
ndim | Number of dimensions (axes) | 2
size | Total number of elements | 12
dtype | Data type of elements | int64
itemsize | Size (bytes) of each element | 8
nbytes | Total bytes consumed by array | 96
# Exploring array attributes
print(f"Shape: {arr.shape}")       # (3, 4) - 3 rows, 4 columns
print(f"Dimensions: {arr.ndim}")   # 2 - it's a 2D array
print(f"Total elements: {arr.size}")  # 12 elements
print(f"Data type: {arr.dtype}")   # int64
print(f"Bytes per element: {arr.itemsize}")  # 8 bytes
print(f"Total bytes: {arr.nbytes}")  # 96 bytes (12 × 8)

Understanding Shape

The shape attribute is the most important one. It tells you the size of each dimension. Think of it like describing a box:

1D Array: shape = (5,). A line of 5 elements, for example [1 2 3 4 5].
2D Array: shape = (3, 4). 3 rows × 4 columns.
3D Array: shape = (2, 3, 4). 2 "layers" (matrices), each with 3 rows × 4 columns.

# 1D array
arr1d = np.array([1, 2, 3, 4, 5])
print(f"1D shape: {arr1d.shape}")  # (5,)
print(f"1D ndim: {arr1d.ndim}")    # 1

# 2D array (matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(f"2D shape: {arr2d.shape}")  # (2, 3) - 2 rows, 3 columns
print(f"2D ndim: {arr2d.ndim}")    # 2

# 3D array (tensor)
arr3d = np.zeros((2, 3, 4))
print(f"3D shape: {arr3d.shape}")  # (2, 3, 4)
print(f"3D ndim: {arr3d.ndim}")    # 3
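Memory Layout: For a 2D array with shape (rows, cols), the first dimension indexes rows and the second indexes columns. By default NumPy stores arrays in "row-major" (C-style) order, meaning the elements of each row sit next to each other in memory.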
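
Row-major storage can be observed through the strides attribute, which reports how many bytes to step in memory to move one position along each axis. A short sketch; the dtype is fixed to int64 here so the byte counts are the same on every platform:

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)  # 8 bytes per element

print(arr.flags["C_CONTIGUOUS"])  # True: default row-major layout
print(arr.strides)  # (32, 8): 32 bytes to the next row, 8 to the next column
print(arr.ravel())  # [ 0  1  2  3  4  5  6  7  8  9 10 11] - memory order, row by row
```

Moving to the next column is a small step, while moving to the next row skips a whole row of 4 elements.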

NumPy Data Types (dtype)

NumPy has many data types, each with specific precision and memory requirements:

Category | Data types | Description
Integers | int8, int16, int32, int64 | Signed integers (8 to 64 bits)
Unsigned Int | uint8, uint16, uint32, uint64 | Unsigned (non-negative) integers
Floats | float16, float32, float64 | Floating-point numbers
Complex | complex64, complex128 | Complex numbers
Boolean | bool | True/False values
String | str_, bytes_ | Fixed-length Unicode and byte strings (the older unicode_ alias was removed in NumPy 2.0)
# Specifying data type
arr_int = np.array([1, 2, 3], dtype=np.int32)
print(f"Int32: {arr_int.dtype}, {arr_int.itemsize} bytes")

arr_float = np.array([1, 2, 3], dtype=np.float64)
print(f"Float64: {arr_float.dtype}, {arr_float.itemsize} bytes")

# Converting data types with astype()
arr = np.array([1.7, 2.3, 3.9])
arr_int = arr.astype(np.int32)  # Truncates decimals
print(arr_int)  # [1 2 3]

# Boolean arrays
arr_bool = np.array([True, False, True])
print(arr_bool.dtype)  # bool
Tip: Use float32 instead of float64 when memory is a concern (e.g., large datasets, GPUs). Half the memory with usually sufficient precision.
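
The trade-off in this tip can be made concrete. A small sketch comparing memory use and approximate decimal precision of the two float types; np.finfo is NumPy's lookup for floating-point type limits, and the array size is arbitrary:

```python
import numpy as np

data64 = np.ones(1_000_000, dtype=np.float64)
data32 = data64.astype(np.float32)

print(data64.nbytes)  # 8000000 bytes
print(data32.nbytes)  # 4000000 bytes

# Approximate number of reliable decimal digits for each type
print(np.finfo(np.float32).precision)  # 6
print(np.finfo(np.float64).precision)  # 15
```

Roughly 6 significant digits is enough for many datasets, but it may not be for long running sums or very large values.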

Practice Questions: Array Attributes

Test your understanding of array shapes and data types.

Given:

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

Task: What is the shape of this array? How many total elements?

Show Solution
print(arr.shape)  # (4, 3) - 4 rows, 3 columns
print(arr.size)   # 12 total elements

Task: Create an array of integers [10, 20, 30] and convert it to float32.

Show Solution
arr = np.array([10, 20, 30])
arr_float = arr.astype(np.float32)
print(arr_float.dtype)  # float32
04

Indexing & Slicing

Accessing and selecting data from arrays is fundamental to data science. NumPy's powerful indexing system lets you select individual elements, rows, columns, or arbitrary subsets of your data.

1D Array Indexing

1D array indexing works exactly like Python lists - use square brackets with the index position. Remember: indexing starts at 0!

import numpy as np
arr = np.array([10, 20, 30, 40, 50])

Use positive indexing to access elements from the start:

# Positive indexing (from start)
print(arr[0])   # 10 (first element)
print(arr[2])   # 30 (third element)
print(arr[4])   # 50 (last element)

Use negative indexing to access elements from the end:

# Negative indexing (from end)
print(arr[-1])  # 50 (last element)
print(arr[-2])  # 40 (second to last)

1D Array Slicing

Slicing extracts a portion of the array using the syntax arr[start:stop:step]. The stop index is excluded.

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Basic slicing uses [start:stop]:

# Basic slicing [start:stop]
print(arr[2:5])    # [2 3 4] - indices 2, 3, 4
print(arr[:4])     # [0 1 2 3] - start to index 3
print(arr[6:])     # [6 7 8 9] - index 6 to end

Add a step value to skip elements:

# With step [start:stop:step]
print(arr[::2])    # [0 2 4 6 8] - every 2nd element
print(arr[1::2])   # [1 3 5 7 9] - odd indices

Use negative step to reverse the array:

# Reverse array
print(arr[::-1])   # [9 8 7 6 5 4 3 2 1 0]
Memory View: Slices in NumPy are views, not copies! Modifying a slice modifies the original array. Use .copy() to create an independent copy.

When you modify a slice, the original array changes too:

# Slices are views (shared memory)
original = np.array([1, 2, 3, 4, 5])
slice_view = original[1:4]
slice_view[0] = 99
print(original)  # [ 1 99  3  4  5] - original changed!

Use .copy() to create an independent copy:

# Use .copy() for independent copy
original = np.array([1, 2, 3, 4, 5])
slice_copy = original[1:4].copy()
slice_copy[0] = 99
print(original)  # [1 2 3 4 5] - original unchanged
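
If you are unsure whether an array is a view or a copy, NumPy can tell you. A brief sketch using np.shares_memory() and the base attribute, with illustrative variable names:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])
view = original[1:4]
copy = original[1:4].copy()

print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False
print(view.base is original)  # True: the view borrows original's data
print(copy.base is None)      # True: the copy owns its data
```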

2D Array Indexing

For 2D arrays (matrices), use arr[row, col] syntax. Each dimension is separated by a comma.

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# Single element access [row, col]
print(arr[0, 0])   # 1 (first row, first col)
print(arr[1, 2])   # 7 (second row, third col)
print(arr[2, -1])  # 12 (last row, last col)
print(arr[-1, -1]) # 12 (same as above)

2D Array Slicing

Slicing 2D arrays is something you'll use constantly in data science. Unlike 1D arrays, where you slice along a single dimension, 2D arrays let you slice along both rows and columns simultaneously. This is especially useful when working with tabular data: imagine selecting specific customers (rows) and certain features (columns) from a dataset with thousands of rows and dozens of columns.

The syntax for 2D slicing follows the pattern array[row_slice, column_slice], where each slice uses the familiar start:stop:step notation. You can mix and match: grab all rows but specific columns, select a range of rows with all columns, or extract a rectangular sub-region from anywhere in the array. The colon : means "all elements" in that dimension.

Let's create a simple 3×4 array (3 rows, 4 columns) and explore various slicing operations. Think of this as a small dataset with 3 observations and 4 features. We'll practice extracting different parts of it:

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

To select an entire row, you can simply index by the row number, or use a colon to explicitly mean "all columns":

print(arr[0])      # [1 2 3 4] - first row
print(arr[0, :])   # [1 2 3 4] - same thing, explicit

To select an entire column, use a colon for "all rows" and specify the column index:

print(arr[:, 0])   # [1 5 9] - first column
print(arr[:, -1])  # [ 4  8 12] - last column

To extract a submatrix, slice both rows AND columns:

print(arr[0:2, 1:3])
# [[2 3]
#  [6 7]]

You can use step values to select every Nth row or column:

print(arr[::2, :])   # Every other row: rows 0 and 2
# [[ 1  2  3  4]
#  [ 9 10 11 12]]

print(arr[:, ::2])   # Every other column: columns 0 and 2
# [[ 1  3]
#  [ 5  7]
#  [ 9 11]]
arr[0] or arr[0, :] selects row 0, all columns, and returns a 1D array (the row).

arr[:, 0] selects all rows, column 0, and returns a 1D array (the column).
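
Because an integer index drops its dimension, a selected column comes back as a 1D array. When you need to keep the 2D shape, for example a column vector of shape (3, 1), one option is to use a slice of length one instead. A minimal sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

col_1d = arr[:, 0]    # integer index drops that dimension
col_2d = arr[:, 0:1]  # slice of length one keeps it

print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
print(col_2d)
# [[1]
#  [5]
#  [9]]
```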

Boolean (Mask) Indexing

Boolean indexing is one of NumPy's most powerful features. You can select elements based on conditions - no loops needed!

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Create a boolean mask from a condition:

mask = arr > 5
print(mask)  # [False False False False False  True  True  True  True  True]

Use the mask to select matching elements:

print(arr[mask])     # [ 6  7  8  9 10]
print(arr[arr > 5])  # Same thing, inline

Combine multiple conditions using & (AND) and | (OR):

print(arr[(arr > 3) & (arr < 8)])  # [4 5 6 7]
print(arr[(arr < 3) | (arr > 8)])  # [ 1  2  9 10]

Use boolean indexing to modify elements matching a condition:

arr[arr > 5] = 0
print(arr)  # [1 2 3 4 5 0 0 0 0 0]
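Important: Use & (element-wise AND) and | (element-wise OR) to combine conditions, not Python's and/or, which raise a ValueError on arrays with more than one element. Also wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and <.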
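
A short sketch of what goes wrong with Python's and, next to the element-wise operators that work. It also shows ~, the element-wise NOT operator, which the text above does not cover:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

raised = False
try:
    arr[(arr > 3) and (arr < 8)]  # Python's `and` needs a single True/False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

in_range = arr[(arr > 3) & (arr < 8)]    # element-wise AND
outside = arr[~((arr > 3) & (arr < 8))]  # ~ is element-wise NOT
print(in_range)  # [4 5 6 7]
print(outside)   # [ 1  2  3  8  9 10]
```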

Fancy (Integer Array) Indexing

Use arrays of indices to select specific elements in any order:

arr = np.array([10, 20, 30, 40, 50])

indices = [0, 2, 4]
print(arr[indices])  # [10 30 50]

You can repeat indices to select the same element multiple times, or reorder elements:

print(arr[[0, 0, 1, 1]])  # [10 10 20 20] - repeat indices
print(arr[[4, 3, 2, 1, 0]])  # [50 40 30 20 10] - reverse order

For 2D arrays, provide arrays for both row and column indices to select specific elements:

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

rows = [0, 1, 2]
cols = [0, 1, 2]
print(arr2d[rows, cols])  # [1 5 9] - diagonal elements
Key Difference: Boolean indexing returns elements where mask is True. Fancy indexing returns elements at specific indices you provide.
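
One more behavior worth knowing: unlike basic slices, fancy indexing (and boolean indexing) returns a copy of the selected data, not a view. A minimal sketch showing that editing the result leaves the source untouched, while assigning through the index does modify it:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

picked = arr[[0, 2, 4]]  # fancy indexing creates a new array
picked[0] = 99
print(arr)  # [10 20 30 40 50] - original unchanged

arr[[0, 2, 4]] = 0  # assigning through the index does modify arr
print(arr)  # [ 0 20  0 40  0]
```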

Practice Questions: Indexing & Slicing

Master array selection with these exercises.

Given:

arr = np.array([10, 20, 30, 40, 50, 60, 70])

Task: Extract the last 3 elements using slicing.

Expected output: [50 60 70]

Show Solution
arr = np.array([10, 20, 30, 40, 50, 60, 70])
print(arr[-3:])  # [50 60 70]

Given:

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

Task: Extract the second column (index 1).

Expected output: [2 5 8]

Show Solution
print(matrix[:, 1])  # [2 5 8]

Given:

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

Task: Select every other element, starting from the end (in reverse).

Expected output: [8 6 4 2]

Show Solution
print(arr[::-2])  # [8 6 4 2]

Given:

data = np.array([23, 45, 12, 67, 34, 89, 11])

Task: Find all values greater than the array's mean.

Hint: Use boolean indexing with data.mean()

Show Solution
data = np.array([23, 45, 12, 67, 34, 89, 11])
mean = data.mean()  # 40.14
above_mean = data[data > mean]
print(above_mean)  # [45 67 89]

Given:

temps = np.array([5, -2, 8, -1, 3, -5, 10])

Task: Replace all negative temperatures with 0.

Expected output: [ 5  0  8  0  3  0 10]

Show Solution
temps = np.array([5, -2, 8, -1, 3, -5, 10])
temps[temps < 0] = 0
print(temps)  # [ 5  0  8  0  3  0 10]
05

Reshaping & Stacking

Often you'll need to change the shape of arrays - converting 1D to 2D, flattening matrices, or combining multiple arrays. NumPy makes these transformations simple.

Reshaping Arrays

The reshape() method changes array dimensions without changing the data. The total number of elements must stay the same!

import numpy as np

# Create 1D array with 12 elements
arr = np.arange(12)
print(arr)  # [ 0  1  2  3  4  5  6  7  8  9 10 11]

# Reshape to 3 rows × 4 columns
arr_2d = arr.reshape(3, 4)
print(arr_2d)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Reshape to 4 rows × 3 columns
arr_2d = arr.reshape(4, 3)
print(arr_2d)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

# Reshape to 3D: 2 matrices of 2×3
arr_3d = arr.reshape(2, 2, 3)
print(arr_3d)
# [[[ 0  1  2]
#   [ 3  4  5]]
#  [[ 6  7  8]
#   [ 9 10 11]]]
Use -1 for automatic dimension: NumPy can calculate one dimension for you. Use -1 as a placeholder and NumPy figures out the rest.
# Use -1 to auto-calculate dimension
arr = np.arange(12)

# "I want 3 rows, figure out columns"
print(arr.reshape(3, -1).shape)  # (3, 4)

# "I want 4 columns, figure out rows"
print(arr.reshape(-1, 4).shape)  # (3, 4)

# "I want 2 matrices with 3 columns each"
print(arr.reshape(2, -1, 3).shape)  # (2, 2, 3)
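
When the requested shape cannot hold exactly the same number of elements, reshape() raises a ValueError. A small sketch of the two most common mistakes; the errors list is only there to collect the messages for display:

```python
import numpy as np

arr = np.arange(12)
errors = []

try:
    arr.reshape(5, -1)  # 12 is not divisible by 5
except ValueError as err:
    errors.append(str(err))

try:
    arr.reshape(-1, -1)  # only one dimension may be -1
except ValueError as err:
    errors.append(str(err))

for message in errors:
    print(f"ValueError: {message}")
```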

Flattening: flatten() vs ravel()

Both convert multi-dimensional arrays to 1D, but with an important difference:

flatten(): returns a copy of the data, so modifying it doesn't affect the original array.

ravel(): returns a view when possible. It is more memory efficient, but changes may affect the original.

flatten() always returns a copy of the data, so modifying the flattened array doesn't affect the original:

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

flat = arr.flatten()
flat[0] = 99
print(arr[0, 0])  # 1 - original unchanged

ravel() returns a view when possible, meaning changes to the raveled array may affect the original:

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

raveled = arr.ravel()
raveled[0] = 99
print(arr[0, 0])  # 99 - original changed!
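
The "when possible" part matters. If the array's memory is not already laid out in row order, as with a transposed array, ravel() has to build a copy. A short sketch that checks this with np.shares_memory():

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

contiguous = arr.ravel()    # memory is already in row order: view
transposed = arr.T.ravel()  # transposed data is not in row order: copy

print(np.shares_memory(arr, contiguous))  # True
print(np.shares_memory(arr, transposed))  # False
print(transposed)  # [1 4 2 5 3 6]
```

If your code relies on the result being independent of the original, flatten() or an explicit .copy() is the safer choice.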

Transposing Arrays

Transposing swaps rows and columns. For a 2D array, this means rows become columns and vice versa.

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print(f"Original shape: {arr.shape}")  # (2, 3)

There are three equivalent ways to transpose an array:

transposed = arr.T
transposed = arr.transpose()
transposed = np.transpose(arr)

All three give the same result - the rows and columns are swapped:

print(transposed)
# [[1 4]
#  [2 5]
#  [3 6]]
print(f"Transposed shape: {transposed.shape}")  # (3, 2)
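
Two details about transposing are easy to miss: the transpose is a view of the same data, and transposing a 1D array does nothing because it has only one axis. A minimal sketch, using reshape(-1, 1) as one way to get a column from a 1D array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print(np.shares_memory(arr, arr.T))  # True: .T is a view, not a copy

vec = np.array([1, 2, 3])
print(vec.T.shape)  # (3,) - transposing a 1D array changes nothing

column = vec.reshape(-1, 1)  # make it 2D first to get a column
print(column.shape)  # (3, 1)
```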

Stacking Arrays

NumPy provides several ways to combine multiple arrays into one. vstack() stacks arrays vertically (adds rows):

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

vstacked = np.vstack([a, b])
print(vstacked)
# [[1 2 3]
#  [4 5 6]]

hstack() stacks arrays horizontally (adds columns for 2D, or concatenates for 1D):

hstacked = np.hstack([a, b])
print(hstacked)  # [1 2 3 4 5 6]

With 2D arrays, vstack() adds more rows:

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

print(np.vstack([arr1, arr2]))
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

And hstack() adds more columns:

print(np.hstack([arr1, arr2]))
# [[1 2 5 6]
#  [3 4 7 8]]

Concatenating Arrays

np.concatenate() is more general - you specify the axis along which to join arrays:

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])

Use axis=0 to concatenate along rows (add more rows):

result = np.concatenate([arr1, arr2], axis=0)
print(result)
# [[1 2]
#  [3 4]
#  [5 6]]

Use axis=1 to concatenate along columns (add more columns):

arr3 = np.array([[7], [8]])
result = np.concatenate([arr1, arr3], axis=1)
print(result)
# [[1 2 7]
#  [3 4 8]]
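
For 2D inputs, vstack() gives the same result as concatenate() along axis 0, and both require the arrays to agree on every axis except the one being joined. A short sketch of both points; the mismatch flag is only there to record that the error occurred:

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])

# For 2D inputs, vstack matches concatenate along axis 0
equivalent = np.array_equal(np.vstack([arr1, arr2]),
                            np.concatenate([arr1, arr2], axis=0))
print(equivalent)  # True

# Shapes must agree on every axis except the one being joined
mismatch = False
try:
    np.concatenate([arr1, arr2], axis=1)  # 2 rows vs 1 row
except ValueError as err:
    mismatch = True
    print(f"ValueError: {err}")
```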

Splitting Arrays

The inverse of stacking - split one array into multiple parts. Let's create an array to split:

arr = np.arange(12).reshape(3, 4)
print(arr)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

Use hsplit() to split horizontally (along columns):

parts = np.hsplit(arr, 4)  # 4 parts of 1 column each
print(parts[0])  # [[0] [4] [8]]

Split at specific column indices by passing a list:

left, right = np.hsplit(arr, [2])  # Split at column 2
print(left)   # [[ 0  1] [ 4  5] [ 8  9]]
print(right)  # [[ 2  3] [ 6  7] [10 11]]

Use vsplit() for vertical splits (along rows):

top, bottom = np.vsplit(arr, [2])  # Split at row 2
print(top)     # [[0 1 2 3] [4 5 6 7]]
print(bottom)  # [[ 8  9 10 11]]
Function | Description | Axis (for 2D arrays)
vstack | Stack arrays vertically (add rows) | 0
hstack | Stack arrays horizontally (add columns) | 1
vsplit | Split array vertically (into row groups) | 0
hsplit | Split array horizontally (into column groups) | 1
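
Splitting and stacking undo each other, and hsplit() with an integer argument requires the array to divide evenly. A brief sketch of both, plus np.array_split(), a related NumPy function not covered above that allows uneven parts:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Splitting and re-stacking returns the original array
left, right = np.hsplit(arr, 2)
round_trip = np.array_equal(np.hstack([left, right]), arr)
print(round_trip)  # True

# hsplit requires an equal division; array_split allows uneven parts
uneven_failed = False
try:
    np.hsplit(arr, 3)  # 4 columns cannot be split into 3 equal parts
except ValueError as err:
    uneven_failed = True
    print(f"ValueError: {err}")

uneven = np.array_split(arr, 3, axis=1)
print([part.shape for part in uneven])  # [(3, 2), (3, 1), (3, 1)]
```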

Practice Questions: Reshaping & Manipulation

Test your array reshaping skills with these exercises.

Task: Create a 1D array from 1 to 12, then reshape it into a 3x4 matrix.

Show Solution
arr = np.arange(1, 13)
matrix = arr.reshape(3, 4)
print(matrix)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

Given:

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

Task: Stack them vertically to get a 4x2 array.

Show Solution
result = np.vstack([a, b])
print(result)
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

Given:

arr = np.arange(6).reshape(2, 3)

Task: Transpose it and then flatten to a 1D array.

Expected output: [0 3 1 4 2 5]

Show Solution
arr = np.arange(6).reshape(2, 3)
result = arr.T.flatten()
print(result)  # [0 3 1 4 2 5]

Key Takeaways

NumPy is Essential

NumPy is the foundation of data science in Python. It provides fast, memory-efficient arrays that power pandas, scikit-learn, and more.

Often 50x Faster

For numerical operations on large data, arrays are often tens of times faster than Python lists because the work runs in compiled C code over contiguous memory. The exact speedup depends on the operation and the machine.

Create Arrays

Use np.array(), np.zeros(), np.ones(), np.arange(), and np.linspace() to create arrays.

Key Attributes

shape (dimensions), ndim (axes count), dtype (data type), size (total elements).

Powerful Indexing

Use arr[row, col] for 2D arrays. Boolean indexing (arr[arr > 5]) selects elements matching conditions.

Views vs Copies

Slices are views, not copies. Use .copy() when you need an independent copy of the data.

Knowledge Check

Test your understanding of NumPy arrays:

1 What is the shape of np.zeros((3, 4))?
2 Which function creates an array with 5 evenly spaced values from 0 to 10, including both endpoints?
3 Given arr = np.array([[1,2,3],[4,5,6]]), what does arr[:, 1] return?
4 What happens when you slice a NumPy array (e.g., slice = arr[1:4])?
5 What does np.arange(12).reshape(3, -1) produce?
6 Given arr = np.array([1, 5, 3, 8, 2]), what does arr[arr > 3] return?