Before reading this post, make sure you have basic knowledge of Python programming. You should be comfortable with variables, data types, loops, and functions. If you are new to Python, go through a Python basics tutorial first and then come back to this post.
You should also have Python installed on your system. This tutorial works on both VS Code and Google Colab.
If you are working with data in Python, sooner or later you will come across NumPy. Almost every data science and machine learning library in Python - Pandas, TensorFlow, PyTorch, Scikit-learn - is either built on top of NumPy or designed to work closely with NumPy arrays. So understanding NumPy properly gives you a very strong foundation for everything else that comes after it.
In this post, we are going to cover NumPy from the basics all the way to advanced concepts - with real code and real output at every step. By the end of this post you will know how to create arrays, perform operations, reshape data, and use advanced techniques like broadcasting and boolean masking.
You can also watch this tutorial on YouTube.
What is NumPy?
NumPy stands for Numerical Python. It is an open source library built specifically for numerical computation in Python. The core feature of NumPy is the ndarray - an N-dimensional array - which allows you to store and operate on large amounts of numerical data very efficiently.
The biggest difference between a Python list and a NumPy array is speed and efficiency. A Python list can hold mixed data types - strings, numbers, objects - all in one list. Because of this, Python has to check the type of every element before performing any operation. That takes time.
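A NumPy array holds only one data type and stores its values in a contiguous block of memory. Operations therefore run in compiled code over the whole array, without a type check on each element. This makes NumPy significantly faster than regular Python lists for numerical work - sometimes around 50 times faster on large datasets, depending on the operation and the hardware.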
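The difference can be seen in a small sketch. It assumes an array of one million elements (an arbitrary size chosen for illustration) and uses the standard-library timeit module. The measured times depend on your machine and NumPy version, so no specific ratio is promised here.

```python
import timeit

import numpy as np

n = 1_000_000  # assumed size, chosen only for illustration
py_list = list(range(n))
np_array = np.arange(n)

# A list can mix types; an array converts everything to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Double every element: a Python-level loop versus one vectorised operation.
list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=5)
array_time = timeit.timeit(lambda: np_array * 2, number=5)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```

Running this a few times on your own system gives a more honest picture than any single headline number.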
Installing NumPy
If you are using VS Code, open your terminal and run the following command:
pip install numpy
If you are using Google Colab, NumPy is already installed, so you can import it directly.
To import NumPy in your Python file or notebook, always use this line:
import numpy as np
The alias np is a near-universal convention: the official NumPy documentation and most data science and machine learning code use it. So follow this same convention in all your projects.
Creating NumPy Arrays
The most basic thing you can do in NumPy is create an array. Let us start with a simple one-dimensional array:
import numpy as np
a = np.array([10, 20, 30, 40, 50])
print(a)
Output:
[10 20 30 40 50]
A two-dimensional array - which is like a table with rows and columns - is created like this:
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b)
Output:
[[1 2 3]
 [4 5 6]]
NumPy also provides shortcut functions to create arrays without typing every value manually:
# Array of all zeros
print(np.zeros((3, 3)))
# Array of all ones
print(np.ones((2, 4)))
# Range of numbers with a step
print(np.arange(0, 10, 2))
# Evenly spaced numbers between two values
print(np.linspace(0, 1, 5))
# Random values between 0 and 1
print(np.random.rand(3, 3))
Output (the random values in the last array will differ on every run and are shown rounded here):
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[0 2 4 6 8]
[0.   0.25 0.5  0.75 1.  ]
[[0.37 0.95 0.73]
 [0.60 0.15 0.86]
 [0.71 0.02 0.97]]
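Because random output changes between runs, it can help to fix a seed when you want repeatable results. The sketch below uses NumPy's np.random.default_rng generator, which is not used elsewhere in this post; the seed value 42 is arbitrary. It also shows that creation functions accept a dtype argument.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # 42 is an arbitrary seed
first = rng.random((3, 3))

rng = np.random.default_rng(seed=42)  # same seed, same sequence
second = rng.random((3, 3))

print(np.array_equal(first, second))  # True

# The dtype can be set explicitly when creating an array.
print(np.zeros((2, 2), dtype=int))
```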
Array Properties
Every NumPy array has properties that describe its structure. These are checked constantly when working with real data.
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)  # Dimensions
print(a.ndim)   # Number of dimensions
print(a.size)   # Total number of elements
print(a.dtype)  # Data type of elements
Output:
(2, 3)
2
6
int64
The shape property is especially important. Shape mismatches are among the most common sources of errors in machine learning code, so check .shape on your arrays before and after any operation that changes their layout.
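As a rough illustration of why shapes matter, the sketch below multiplies two matrices with the @ operator. The names features and weights are hypothetical and stand in for whatever data you are working with.

```python
import numpy as np

features = np.ones((4, 3))  # hypothetical data: 4 samples, 3 features
weights = np.ones((3, 2))   # hypothetical weights: 3 inputs, 2 outputs

print(features.shape, weights.shape)  # (4, 3) (3, 2)

# Matrix multiplication needs the inner dimensions to match (3 and 3 here).
result = features @ weights
print(result.shape)  # (4, 2)

# Swapping the operands gives inner dimensions 2 and 4, which do not match.
try:
    weights @ features
except ValueError as err:
    print("Shape mismatch:", err)
```

Printing the shapes first usually makes this kind of error obvious before the operation is even run.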
Indexing and Slicing
Accessing specific elements or ranges from a NumPy array works as follows:
a = np.array([10, 20, 30, 40, 50])
print(a[0])    # First element
print(a[-1])   # Last element
print(a[1:4])  # Elements at index 1, 2, 3
Output:
10
50
[20 30 40]
For two-dimensional arrays:
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(b[1, 2])    # Row 1, Column 2
print(b[:, 1])    # All rows, Column 1
print(b[0:2, :])  # First 2 rows, all columns
Output:
6
[2 5 8]
[[1 2 3]
 [4 5 6]]
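One detail worth knowing about slicing: a basic slice is a view that shares memory with the original array, not a copy. The short sketch below shows the effect and how .copy() avoids it. The variable names row_view and row_copy are only illustrative.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_view = b[0, :]  # basic slice: a view into b
row_view[0] = 100
print(b[0, 0])      # 100, because the view shares memory with b

row_copy = b[1, :].copy()  # explicit, independent copy
row_copy[0] = -1
print(b[1, 0])      # 4, the original is unchanged
```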
Math Operations
All basic math operations in NumPy are element-wise by default. This means the operation is applied to each pair of elements automatically - no loop required.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)         # Addition
print(a * b)         # Multiplication
print(a - b)         # Subtraction
print(a / b)         # Division
print(np.dot(a, b))  # Dot product
Output:
[5 7 9]
[ 4 10 18]
[-3 -3 -3]
[0.25 0.4  0.5 ]
32
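To make "element-wise" concrete, here is a small sketch that writes the same multiplication as an explicit loop and as a single vectorised expression, then checks that the dot product is the sum of the element-wise products.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise product written as an explicit Python loop...
looped = np.array([x * y for x, y in zip(a, b)])
# ...and as a single vectorised expression.
vectorised = a * b
print(np.array_equal(looped, vectorised))  # True

# The dot product is the sum of the element-wise products.
print((a * b).sum() == np.dot(a, b))  # True
```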
Broadcasting
Broadcasting is one of the most powerful features of NumPy. It allows operations to be performed between arrays of different shapes without writing any extra code.
For example, adding a single number to every element of a two-dimensional array:
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a + 10)
Output:
[[11 12 13]
 [14 15 16]]
NumPy automatically stretches the value 10 across every element in the array. This is called broadcasting. It is used in normalisation, feature scaling, and image processing.
Broadcasting also works between arrays of compatible shapes:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])
print(a + b)
Output:
[[11 22 33]
 [14 25 36]]
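What counts as "compatible" follows a simple rule: NumPy compares the shapes from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below tries one row-shaped array, one column-shaped array, and one array that does not fit. The names row, col, and bad are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,): matches the last dimension
col = np.array([[100], [200]])        # shape (2, 1): the 1 is stretched to 3

print((a + row).shape)  # (2, 3)
print(a + col)
# [[101 102 103]
#  [204 205 206]]

bad = np.array([10, 20])              # shape (2,): 2 does not match 3
try:
    a + bad
except ValueError as err:
    print("Cannot broadcast:", err)
```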
Reshaping and Transposing
The shape of an array can be changed using reshape(). The total number of elements must remain the same.
a = np.arange(12)
print(a)
b = a.reshape(3, 4)
print(b)
Output:
[ 0  1  2  3  4  5  6  7  8  9 10 11]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Transposing - swapping rows and columns - is done with .T:
print(b.T)
Output:
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
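Two related points about reshape() are worth a quick sketch: passing -1 lets NumPy work out one dimension for you, and asking for a shape whose total size does not match raises an error.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to work out that dimension from the total size.
c = a.reshape(2, -1)
print(c.shape)    # (2, 6)
print(c.T.shape)  # (6, 2)

# 12 elements cannot fill a 5 x 3 grid, so this raises an error.
try:
    a.reshape(5, 3)
except ValueError as err:
    print("Reshape failed:", err)
```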
Advanced NumPy
Fancy Indexing
Multiple specific elements can be extracted at once by passing a list of indices:
a = np.array([10, 20, 30, 40, 50])
print(a[[0, 2, 4]])
Output:
[10 30 50]
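The same idea extends to two-dimensional arrays. The sketch below picks whole rows, then individual elements by pairing row and column indices. It also shows that, unlike a slice, fancy indexing returns a copy. The variable name picked is only illustrative.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Pick rows 0 and 2 in one step.
print(b[[0, 2]])
# [[1 2 3]
#  [7 8 9]]

# Pair up row and column indices: elements (0, 1) and (2, 0).
print(b[[0, 2], [1, 0]])  # [2 7]

# Fancy indexing returns a copy, so editing it leaves b untouched.
picked = b[[0, 2]]
picked[0, 0] = 99
print(b[0, 0])  # 1
```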
Boolean Masking
Elements that satisfy a condition can be filtered in a single line:
a = np.array([3, 7, 1, 9, 4, 6])
print(a[a > 5])
Output:
[7 9 6]
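Under the hood, the condition a > 5 produces an array of True and False values, and that array is what selects the elements. The sketch below prints the mask, combines two conditions, and uses a mask to assign values in place.

```python
import numpy as np

a = np.array([3, 7, 1, 9, 4, 6])

mask = a > 5
print(mask)  # [False  True False  True False  True]

# Combine conditions with & (and) or | (or); the parentheses are required.
print(a[(a > 3) & (a < 9)])  # [7 4 6]

# A mask can also be used to assign values in place.
a[a < 4] = 0
print(a)  # [0 7 0 9 4 6]
```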
Stacking Arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.vstack([a, b]))  # Vertical stack
print(np.hstack([a, b]))  # Horizontal stack
Output:
[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]
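For two-dimensional arrays, the same kind of joining can be sketched with np.concatenate, a related function not used elsewhere in this post. The axis argument chooses the direction, and the sizes along the other axis must match.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# axis=0 adds rows; the column counts (2 and 2) must match.
print(np.concatenate([a, b], axis=0))
# [[1 2]
#  [3 4]
#  [5 6]]

# axis=1 adds columns; b.T has shape (2, 1), matching a's 2 rows.
print(np.concatenate([a, b.T], axis=1))
# [[1 2 5]
#  [3 4 6]]
```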
Statistical Functions
a = np.array([[1, 2, 3], [4, 5, 6]])
print(np.mean(a))         # Overall mean
print(np.sum(a, axis=0))  # Sum down columns
print(np.sum(a, axis=1))  # Sum across rows
print(np.max(a))          # Maximum value
print(np.min(a))          # Minimum value
print(np.std(a))          # Standard deviation
Output:
3.5
[5 7 9]
[ 6 15]
6
1
1.707825127659933
The axis parameter controls which dimension is collapsed. axis=0 works down the rows and gives one result per column (column-wise). axis=1 works across the columns and gives one result per row (row-wise).
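A quick way to build intuition for axis is to look at the shape of the result. The sketch below also uses the keepdims argument, which keeps the collapsed axis with length 1 so the result can broadcast back against the original array. The variable names are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

col_mean = a.mean(axis=0)  # collapses the 2 rows: shape (3,)
row_mean = a.mean(axis=1)  # collapses the 3 columns: shape (2,)
print(col_mean)  # [2.5 3.5 4.5]
print(row_mean)  # [2. 5.]

# keepdims=True keeps the collapsed axis with length 1, which lets the
# result broadcast back against the original array.
centred = a - a.mean(axis=1, keepdims=True)
print(centred)
# [[-1.  0.  1.]
#  [-1.  0.  1.]]
```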
np.where()
A conditional replacement can be applied across an entire array at once:
a = np.array([3, 7, 1, 9, 4, 6])
result = np.where(a > 5, 'high', 'low')
print(result)
Output:
['low' 'high' 'low' 'high' 'low' 'high']
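np.where() is not limited to strings. As a brief sketch, it can keep the original values where the condition holds and, when called with only a condition, return the positions of the matching elements.

```python
import numpy as np

a = np.array([3, 7, 1, 9, 4, 6])

# Keep values above 5 and replace the rest with 0.
print(np.where(a > 5, a, 0))  # [0 7 0 9 0 6]

# With only a condition, np.where returns the matching indices.
indices = np.where(a > 5)[0]
print(indices)  # [1 3 5]
```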
np.unique() and np.sort()
a = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])
print(np.unique(a))  # Unique values
print(np.sort(a))    # Sorted array
Output:
[1 2 3 4 5 6 9]
[1 1 2 3 4 5 5 6 9]
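Both functions have a few commonly used variations. The sketch below counts how often each value appears with return_counts, sorts in descending order by reversing the result, and uses np.argsort (not covered above) to get the sorting order as indices.

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])

values, counts = np.unique(a, return_counts=True)
print(values)  # [1 2 3 4 5 6 9]
print(counts)  # [2 1 1 1 2 1 1]

# np.sort returns a new array; a itself is unchanged.
print(np.sort(a)[::-1])  # [9 6 5 5 4 3 2 1 1]

# np.argsort gives the indices that would sort the array.
order = np.argsort(a)
print(a[order])  # [1 1 2 3 4 5 5 6 9]
```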
Quick Command Reference
- np.array() - Create an array from a list
- np.zeros() - Array of zeros
- np.ones() - Array of ones
- np.arange() - Range of values with step
- np.linspace() - Evenly spaced values
- np.random.rand() - Random values
- .shape - Dimensions of the array
- .dtype - Data type of elements
- .reshape() - Change array shape
- .T - Transpose the array
- np.dot() - Dot product
- np.vstack() - Vertical stack
- np.hstack() - Horizontal stack
- np.where() - Conditional replacement
- np.unique() - Unique values
- np.sort() - Sort array
- np.mean(), np.sum(), np.std() - Statistical functions
Key Takeaways
- NumPy arrays are faster and more efficient than Python lists for numerical operations
- Always check .shape when debugging - shape mismatches are among the most common errors
- Broadcasting removes the need for loops when performing operations on arrays of different sizes
- Use boolean masking to filter data without writing any loop
- NumPy is the foundation of Pandas and Scikit-learn, and TensorFlow and PyTorch are designed to work closely with NumPy arrays
All the code examples in this post are also covered in detail in the YouTube video. The link is available at the top of this page. If you have any questions, drop them in the comments section below.
