Python NumPy Tutorial: Basics to Advanced Guide

Before reading this post, make sure you have basic knowledge of Python programming. You should be comfortable with variables, data types, loops, and functions. If you are new to Python, go through a Python basics tutorial first and then come back to this post.

You should also have Python installed on your system. This tutorial works on both VS Code and Google Colab.

If you are working with data in Python, sooner or later you will come across NumPy. Pandas and Scikit-learn are built on top of NumPy, and TensorFlow and PyTorch are designed to work closely with NumPy arrays. So understanding NumPy properly gives you a very strong foundation for everything else that comes after it.

In this post, we are going to cover NumPy from the basics all the way to advanced concepts - with real code and real output at every step. By the end of this post you will know how to create arrays, perform operations, reshape data, and use advanced techniques like broadcasting and boolean masking.

You can also watch this tutorial on YouTube.

What is NumPy?

NumPy stands for Numerical Python. It is an open source library built specifically for numerical computation in Python. The core feature of NumPy is the ndarray - an N-dimensional array - which allows you to store and operate on large amounts of numerical data very efficiently.

The biggest difference between a Python list and a NumPy array is speed and efficiency. A Python list can hold mixed data types - strings, numbers, objects - all in one list. Every element is a separate Python object, so Python has to check the type of each element before performing any operation. That takes time.

A NumPy array holds only one data type, stored in a single block of memory. Operations run in compiled code over the whole array, without per-element type checks. This makes NumPy significantly faster than regular Python lists for numerical work - sometimes around 50 times faster on large datasets, although the exact speedup depends on the operation and the machine.
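The speed difference can be measured directly. The following is a minimal sketch, assuming an array of one million integers and a simple "double every element" task; the variable names are illustrative, and the timings it prints will vary from machine to machine.

```python
import timeit

import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n)

# Double every element: the list needs an explicit loop, the array does not.
list_result = [x * 2 for x in py_list]
array_result = np_array * 2

# Both approaches produce the same values.
print(list_result == array_result.tolist())

# Time 5 repetitions of each approach (results depend on your machine).
list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=5)
array_time = timeit.timeit(lambda: np_array * 2, number=5)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```

On most systems the array version should finish noticeably faster, but treat the ratio as a rough indication rather than a fixed number.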

Installing NumPy

If you are using VS Code, open your terminal and run the following command:

pip install numpy

If you are using Google Colab, NumPy is already installed. You can directly import it.

To import NumPy in your Python file or notebook, always use this line:

import numpy as np

The alias np is a near-universal convention. You will see it in the official NumPy documentation and in almost every data science and machine learning codebase. So follow this same convention in all your projects.

Creating NumPy Arrays

The most basic thing you can do in NumPy is create an array. Let us start with a simple one-dimensional array:

import numpy as np

a = np.array([10, 20, 30, 40, 50])
print(a)

Output:

[10 20 30 40 50]

A two-dimensional array - which is like a table with rows and columns - is created like this:

b = np.array([[1, 2, 3], [4, 5, 6]])
print(b)

Output:

[[1 2 3]
 [4 5 6]]

NumPy also provides shortcut functions to create arrays without typing every value manually:

# Array of all zeros
print(np.zeros((3, 3)))

# Array of all ones
print(np.ones((2, 4)))

# Range of numbers with a step
print(np.arange(0, 10, 2))

# Evenly spaced numbers between two values
print(np.linspace(0, 1, 5))

# Random values between 0 and 1
print(np.random.rand(3, 3))

Output (the random values will differ on every run and are shown here rounded to two decimals):

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]

[0 2 4 6 8]

[0.   0.25 0.5  0.75 1.  ]

[[0.37 0.95 0.73]
 [0.60 0.15 0.86]
 [0.71 0.02 0.97]]
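If you need the same random values on every run, for example to make results reproducible, one option is a seeded generator. This is a small sketch using np.random.default_rng, NumPy's newer random interface; the seed 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Create a generator with a fixed seed (42 is arbitrary).
rng = np.random.default_rng(42)
first = rng.random((3, 3))     # floats in the half-open interval [0, 1)

# Re-creating the generator with the same seed reproduces the same values.
rng = np.random.default_rng(42)
second = rng.random((3, 3))

print(np.array_equal(first, second))   # True
print(first.round(2))                  # rounded only for display
```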

Array Properties

Every NumPy array has properties that describe its structure. These are checked constantly when working with real data.

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)   # Size along each dimension (rows, columns)
print(a.ndim)    # Number of dimensions
print(a.size)    # Total number of elements
print(a.dtype)   # Data type of elements

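Output (the dtype may show int32 instead of int64 on some platforms, such as Windows with NumPy versions before 2.0):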

(2, 3)
2
6
int64

The shape property is especially important. Shape mismatches are one of the most common sources of errors in machine learning code. Check .shape on your arrays before and after any operation that might change it.
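To see what a shape mismatch looks like in practice, here is a short sketch. The arrays and the mismatch_caught flag are made up for illustration; the point is that NumPy raises a ValueError when two shapes cannot be combined.

```python
import numpy as np

a = np.ones((2, 3))
b = np.ones(2)        # shape (2,) does not line up with shape (2, 3)

try:
    a + b
except ValueError as err:
    mismatch_caught = True
    print("Shape mismatch:", err)
else:
    mismatch_caught = False

c = np.ones(3)        # shape (3,) matches the last dimension of a
print((a + c).shape)  # (2, 3)
```

Printing .shape for both operands is usually the quickest way to find out why such an error occurred.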

Indexing and Slicing

Accessing specific elements or ranges from a NumPy array works as follows:

a = np.array([10, 20, 30, 40, 50])

print(a[0])     # First element
print(a[-1])    # Last element
print(a[1:4])   # Elements at index 1, 2, 3

Output:

10
50
[20 30 40]

For two-dimensional arrays:

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(b[1, 2])    # Row 1, Column 2
print(b[:, 1])    # All rows, Column 1
print(b[0:2, :])  # First 2 rows, all columns

Output:

6
[2 5 8]
[[1 2 3]
 [4 5 6]]
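One detail worth knowing about slicing: a basic slice is a view onto the original data, not a copy. The sketch below (with illustrative variable names) shows that writing through a slice changes the original array, and that .copy() gives you independent data.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

view = a[1:4]                  # basic slice: shares memory with a
view[0] = 99                   # this also changes a[1]
print(a)                       # [10 99 30 40 50]

independent = a[1:4].copy()    # explicit copy: separate memory
independent[0] = -1            # a is not affected this time
print(a)                       # [10 99 30 40 50]
```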

Math Operations

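The basic arithmetic operators (+, -, *, /) in NumPy are element-wise by default. This means the operation is applied to each pair of matching elements automatically - no loop required. np.dot() is different: it multiplies matching elements and then adds up the results.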

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)       # Addition
print(a * b)       # Multiplication
print(a - b)       # Subtraction
print(a / b)       # Division
print(np.dot(a, b))  # Dot product

Output:

[5 7 9]
[ 4 10 18]
[-3 -3 -3]
[0.25 0.4  0.5 ]
32
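With two-dimensional arrays it is easy to confuse element-wise multiplication with matrix multiplication. As a brief illustration, assuming two small 2x2 arrays chosen for this example, the * operator multiplies matching positions while the @ operator performs a matrix product:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])

elementwise = m * n       # multiplies matching positions
matrix_product = m @ n    # rows of m combined with columns of n

print(elementwise)
# [[ 5 12]
#  [21 32]]

print(matrix_product)
# [[19 22]
#  [43 50]]
```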

Broadcasting

Broadcasting is one of the most powerful features of NumPy. It allows operations to be performed between arrays of different shapes without writing any extra code.

For example, adding a single number to every element of a two-dimensional array:

a = np.array([[1, 2, 3], [4, 5, 6]])
print(a + 10)

Output:

[[11 12 13]
 [14 15 16]]

NumPy automatically stretches the value 10 across every element in the array. This is called broadcasting. It is used in normalisation, feature scaling, and image processing.

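Broadcasting also works between arrays of compatible shapes. Shapes are compared starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. Here b has shape (3,), which matches the last dimension of a's shape (2, 3), so b is added to every row: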

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])
print(a + b)

Output:

[[11 22 33]
 [14 25 36]]
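Two more cases may help make the idea concrete. The sketch below uses made-up arrays: first a column of shape (2, 1) combined with a row of shape (3,), then a simple centering step of the kind used in normalisation, where each column's mean is subtracted from that column.

```python
import numpy as np

col = np.array([[0], [10]])    # shape (2, 1)
row = np.array([1, 2, 3])      # shape (3,)

grid = col + row               # both are stretched to shape (2, 3)
print(grid)
# [[ 1  2  3]
#  [11 12 13]]

data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
col_means = data.mean(axis=0)  # shape (3,): one mean per column
centered = data - col_means    # the means are broadcast across both rows
print(centered)
# [[-1.5 -1.5 -1.5]
#  [ 1.5  1.5  1.5]]
```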

Reshaping and Transposing

The shape of an array can be changed using reshape(). The total number of elements must remain the same.

a = np.arange(12)
print(a)

b = a.reshape(3, 4)
print(b)

Output:

[ 0  1  2  3  4  5  6  7  8  9 10 11]

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Transposing - swapping rows and columns - is done with .T:

print(b.T)

Output:

[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
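Two reshape() details come up often: passing -1 lets NumPy work out one dimension for you, and asking for a shape with the wrong number of elements raises an error. A short sketch, with the reshape_failed flag added only for illustration:

```python
import numpy as np

a = np.arange(12)

c = a.reshape(2, -1)     # -1 means "work this out": 12 / 2 = 6 columns
print(c.shape)           # (2, 6)

try:
    a.reshape(5, 3)      # 5 * 3 = 15 slots, but there are only 12 elements
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
else:
    reshape_failed = False

flat = c.reshape(-1)     # back to one dimension
print(flat.shape)        # (12,)
```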

Advanced NumPy

Fancy Indexing

Multiple specific elements can be extracted at once by passing a list of indices:

a = np.array([10, 20, 30, 40, 50])
print(a[[0, 2, 4]])

Output:

[10 30 50]
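Fancy indexing works on two-dimensional arrays as well. The following sketch, using a small example array, picks whole rows, then individual elements by pairing row and column indices. Unlike a basic slice, the result of fancy indexing is a copy.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(b[[0, 2]])           # rows 0 and 2
# [[1 2 3]
#  [7 8 9]]

print(b[[0, 2], [1, 2]])   # elements at (0, 1) and (2, 2)
# [2 9]

picked = b[[0, 2]]
picked[0, 0] = 100         # changes the copy only
print(b[0, 0])             # 1
```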

Boolean Masking

Elements that satisfy a condition can be filtered in a single line:

a = np.array([3, 7, 1, 9, 4, 6])
print(a[a > 5])

Output:

[7 9 6]
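The expression a > 5 is itself an array of True and False values, which is why it can be used as an index. Here is a short sketch showing the mask on its own, two conditions combined with & (each condition needs its own parentheses), and assignment through a mask. The variable names are illustrative.

```python
import numpy as np

a = np.array([3, 7, 1, 9, 4, 6])

mask = a > 5
print(mask)                      # [False  True False  True False  True]

between = a[(a > 3) & (a < 9)]   # both conditions must hold
print(between)                   # [7 4 6]

capped = a.copy()
capped[capped > 5] = 5           # overwrite only the selected elements
print(capped)                    # [3 5 1 5 4 5]
```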

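Stacking Arrays

Several arrays can be joined into one. np.vstack() stacks them on top of each other as rows, while np.hstack() joins them side by side:
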
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.vstack([a, b]))   # Vertical stack
print(np.hstack([a, b]))   # Horizontal stack

Output:

[[1 2 3]
 [4 5 6]]

[1 2 3 4 5 6]
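For two-dimensional inputs, the same results can be obtained with np.concatenate() and an explicit axis, which some people find easier to read. A minimal sketch with two example 2x2 arrays:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

rows = np.concatenate([x, y], axis=0)   # join along rows, like vstack
cols = np.concatenate([x, y], axis=1)   # join along columns, like hstack

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)
```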

Statistical Functions

a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.mean(a))           # Overall mean
print(np.sum(a, axis=0))    # Sum down columns
print(np.sum(a, axis=1))    # Sum across rows
print(np.max(a))            # Maximum value
print(np.min(a))            # Minimum value
print(np.std(a))            # Standard deviation

Output:

3.5
[5 7 9]
[ 6 15]
6
1
1.707825127659933

The axis parameter controls which dimension is collapsed by the operation. axis=0 moves down the rows, so you get one result per column. axis=1 moves across the columns, so you get one result per row.
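A quick way to check which axis you need is to look at the shape of the result. The sketch below reuses the same 2x3 array; keepdims=True is an optional argument that keeps the collapsed dimension with size 1, which makes the result easy to broadcast back against the original.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])          # shape (2, 3)

col_means = a.mean(axis=0)                    # one value per column
row_means = a.mean(axis=1)                    # one value per row
print(col_means.shape, row_means.shape)       # (3,) (2,)

row_means_2d = a.mean(axis=1, keepdims=True)  # shape (2, 1)
centered_rows = a - row_means_2d              # subtract each row's own mean
print(centered_rows.shape)                    # (2, 3)
```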

np.where()

A conditional replacement can be applied across an entire array at once:

a = np.array([3, 7, 1, 9, 4, 6])
result = np.where(a > 5, 'high', 'low')
print(result)

Output:

['low' 'high' 'low' 'high' 'low' 'high']
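np.where() is not limited to string labels. As a further sketch on the same array, it can choose between a constant and the original values, and when called with only a condition it returns the positions where the condition is true:

```python
import numpy as np

a = np.array([3, 7, 1, 9, 4, 6])

# Replace values above 5 with 5, keep everything else as it is.
clipped = np.where(a > 5, 5, a)
print(clipped)                 # [3 5 1 5 4 5]

# With a single argument, np.where returns a tuple of index arrays.
indices = np.where(a > 5)[0]
print(indices)                 # [1 3 5]
```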

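np.unique() and np.sort()

np.unique() returns the distinct values of an array in sorted order, and np.sort() returns a sorted copy without changing the original:
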
a = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])

print(np.unique(a))   # Unique values
print(np.sort(a))     # Sorted array

Output:

[1 2 3 4 5 6 9]
[1 1 2 3 4 5 5 6 9]
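A few related calls are handy in practice. The sketch below, on the same example array, counts how often each value occurs, sorts in descending order by reversing the sorted result, and uses np.argsort() to get the indices that would sort the array.

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])

values, counts = np.unique(a, return_counts=True)
print(values)                    # [1 2 3 4 5 6 9]
print(counts)                    # [2 1 1 1 2 1 1]

descending = np.sort(a)[::-1]    # sort ascending, then reverse
print(descending)                # [9 6 5 5 4 3 2 1 1]

order = np.argsort(a)            # indices that would sort a
print(a[order])                  # [1 1 2 3 4 5 5 6 9]
```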

Quick Command Reference

  • np.array() - Create an array from a list
  • np.zeros() - Array of zeros
  • np.ones() - Array of ones
  • np.arange() - Range of values with step
  • np.linspace() - Evenly spaced values
  • np.random.rand() - Random values
  • .shape - Dimensions of the array
  • .dtype - Data type of elements
  • .reshape() - Change array shape
  • .T - Transpose the array
  • np.dot() - Dot product
  • np.vstack() - Vertical stack
  • np.hstack() - Horizontal stack
  • np.where() - Conditional replacement
  • np.unique() - Unique values
  • np.sort() - Sort array
  • np.mean(), np.sum(), np.std() - Statistical functions

Key Takeaways

  • NumPy arrays are faster and more efficient than Python lists for numerical operations
  • Always check .shape when debugging - shape mismatches are one of the most common errors
  • Broadcasting removes the need for loops when performing operations on arrays of different sizes
  • Use boolean masking to filter data without writing any loop
  • NumPy is the foundation of Pandas and Scikit-learn, and its array concepts carry over to TensorFlow and PyTorch

All the code examples in this post are also covered in detail in the YouTube video. The link is available at the top of this page. If you have any questions, drop them in the comments section below.
