Introducing NumPy

Caio Warwar

Published Jan 7, 2019

At a glance

NumPy is a popular package among people in data science and related fields. It lets you work with arrays of data easily by providing a rich set of functions that are also efficient.

How to install

Assuming that you have Python and pip installed, open your terminal and type:

pip install numpy
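
To confirm that the installation worked, one simple check (a minimal sketch) is to import the package and print its version. The exact version string depends on your environment.

```python
import numpy as np

# The version string varies by environment; any output means the import succeeded
print(np.__version__)
```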

How to use

The example below shows a simple script that imports the NumPy module, creates an array, prints it, and checks its type:

>>> import numpy as np
>>> x = np.array([1,2,3,4,5])
>>> print(x)
[1 2 3 4 5]
>>> type(x)
<class 'numpy.ndarray'>

You can also use NumPy to create multi-dimensional arrays:

>>> x = np.array([(0,1,2,3,4), (5,6,7,8,9)])
>>> print(x)
[[0 1 2 3 4]
 [5 6 7 8 9]]
>>> print(x.shape)
(2, 5)
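
Besides shape, an array exposes a few other attributes that describe its layout. The sketch below reuses the same 2x5 array; the exact integer dtype it reports depends on the platform.

```python
import numpy as np

x = np.array([(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)])

print(x.ndim)   # number of dimensions: 2
print(x.shape)  # (rows, columns): (2, 5)
print(x.size)   # total number of elements: 10
print(x.dtype)  # an integer type; the exact width is platform dependent
```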

Functions to create many sorts of arrays

#Create a 3x3 array of zeros
>>> zeros = np.zeros((3,3))
>>> print(zeros)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

#Create a 3x3 array of ones
>>> ones = np.ones((3,3))
>>> print(ones)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

#Create a 3x3 identity matrix
>>> identity = np.eye(3)
>>> print(identity)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

#Create an array of 3 random values in [0, 1)
>>> random = np.random.random(3)
>>> print(random)
[0.09799449 0.48100461 0.25790119]
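
A few other creation functions follow the same pattern. The sketch below is illustrative only: the variable names and the seed value 42 are arbitrary choices, not part of the examples above.

```python
import numpy as np

# 2x3 array filled with a constant value
sevens = np.full((2, 3), 7)

# 5 evenly spaced values from 0 to 1, both endpoints included
steps = np.linspace(0.0, 1.0, 5)

# 3x3 array of random floats in [0, 1); seeding makes the run repeatable
rng = np.random.default_rng(42)
random_matrix = rng.random((3, 3))

print(sevens)
print(steps)
print(random_matrix.shape)
```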

Reshaping

#Creates an array with 6 positions
>>> x = np.arange(6)
>>> print(x)
[0 1 2 3 4 5]

#Creates an array with 25 positions and reshapes it to 5x5
>>> y = np.arange(25).reshape(5,5)
>>> print(y)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
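
Two details about reshape are worth knowing: one dimension can be given as -1 so that NumPy infers it, and the total number of elements must stay the same. A small sketch:

```python
import numpy as np

x = np.arange(12)

# -1 asks NumPy to infer that dimension from the array's size
a = x.reshape(3, -1)
print(a.shape)  # (3, 4)

# 12 elements cannot fill a 5x5 array, so reshape raises ValueError
try:
    x.reshape(5, 5)
except ValueError as err:
    print("reshape failed:", err)
```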

Performing arithmetic operations

>>> x = np.array([10,20,30,40])
>>> y = np.array([1,2,3,4])
>>> z = x - y
>>> print(z)
[ 9 18 27 36]
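
Subtraction is only one case: the other arithmetic and comparison operators also work element by element. A short sketch with the same two arrays:

```python
import numpy as np

x = np.array([10, 20, 30, 40])
y = np.array([1, 2, 3, 4])

print(x + y)   # [11 22 33 44]
print(x * y)   # [ 10  40  90 160]
print(x / y)   # [10. 10. 10. 10.]
print(y ** 2)  # [ 1  4  9 16]
print(x > 25)  # [False False  True  True]
```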

Slicing

#From the third to the fourth element (index 2 up to, but not including, index 4)
>>> x = np.arange(10)
>>> print(x)
[0 1 2 3 4 5 6 7 8 9]
>>> print(x[2:4])
[2 3]

#From the third to the last element
>>> print(x[2:])
[2 3 4 5 6 7 8 9]

#From the first to the third element (index 3 is excluded)
>>> print(x[:3])
[0 1 2]

#From the first element up to, but not including, the last
>>> print(x[:-1])
[0 1 2 3 4 5 6 7 8]

#Assign a new value to every element of a slice
>>> x[:4] = 1
>>> print(x)
[1 1 1 1 4 5 6 7 8 9]
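
Slicing extends to multi-dimensional arrays, with one slice per axis. It also helps to know that a slice is a view on the original data, not a copy. The sketch below uses a hypothetical 3x4 array named grid to show both points.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

print(grid[1, :])     # second row: [4 5 6 7]
print(grid[:, 2])     # third column: [ 2  6 10]
print(grid[:2, 1:3])  # first two rows, columns 1 and 2

# A slice is a view on the same data, so writing through it changes grid
view = grid[0, :2]
view[:] = -1
print(grid[0])        # [-1 -1  2  3]

# Use copy() when an independent array is needed
independent = grid[1, :].copy()
independent[:] = 0
print(grid[1])        # still [4 5 6 7]
```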

Broadcasting

Broadcasting is a technique that makes arithmetic between arrays of different shapes easier.

"NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape..."

Example:

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([2., 4., 6.])

Now with broadcasting:

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([2., 4., 6.])

More about broadcasting can be found at docs.scipy.org, but this is the general idea.
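
Broadcasting is not limited to scalars. As a rough sketch, a one-dimensional row or a single column can also be stretched to match a two-dimensional array, while shapes that cannot be aligned raise an error. The array names below are illustrative.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

# The row is stretched across both rows of the matrix
print(matrix + row)
# [[10 21 32]
#  [13 24 35]]

# The column is stretched across all three columns
print(matrix + column)
# [[100 101 102]
#  [203 204 205]]

# Shapes (2, 3) and (2,) cannot be aligned, so this raises ValueError
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("broadcast failed:", err)
```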

Vectorize

There is a lot of controversy around vectorize. Some people say that under the hood this technique can make the most of the processor's cores (even the GPU); others say that there are more efficient ways to work with arrays.

Its own documentation says:

"The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."

If you google it, you'll find a wide range of results when it comes to performance. I tried it on my machine with a variety of scenarios and could recreate some in which vectorization really outperformed the alternatives, but with simple cases I did not manage to achieve meaningful results. Regardless of performance, let's see an example:

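import numpy as np

numpy_array = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

def check(number):
    # Print whether a single number is even or odd
    if number % 2 == 0:
        print("Number %s is even" % number)
    else:
        print("Number %s is odd" % number)

# otypes is set explicitly so that vectorize does not call check() an extra
# time on the first element just to infer the output type
check_vectorized = np.vectorize(check, otypes=[object])
check_vectorized(numpy_array)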

It will produce the following result:

Number 0 is even
Number 1 is odd
Number 2 is even
Number 3 is odd
Number 4 is even
Number 5 is odd
Number 6 is even
Number 7 is odd
Number 8 is even
Number 9 is odd

As we can see, vectorized functions simplify the handling of parameters: in this case we don't need a for loop to iterate over the values. We just vectorize the function, and it is applied to each element of the array.
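
For comparison, here is a rough sketch of the same even/odd check written in two ways: with np.vectorize, which calls a Python function once per element, and with np.where, which evaluates the condition on the whole array at once. The function and variable names are illustrative, and no performance claim is made here.

```python
import numpy as np

numbers = np.arange(10)

def parity(number):
    # Plain Python function that handles one value at a time
    return "even" if number % 2 == 0 else "odd"

# np.vectorize calls parity once per element (a Python-level loop)
parity_vectorized = np.vectorize(parity, otypes=[object])
labels_loop = parity_vectorized(numbers)

# np.where evaluates the condition on the whole array at once
labels_native = np.where(numbers % 2 == 0, "even", "odd")

print(labels_loop)
print(labels_loop.tolist() == labels_native.tolist())  # True
```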

If you are interested in vectorization and performance, check out this link.

Why use NumPy?

One of NumPy's biggest advantages is the speed of its arithmetic operations. In the example below we add two sequences of 1,000,000 numbers each. In my test, NumPy was more than 100 times faster than a regular Python list.

import numpy as np
import time

size = 1000000

def python_raw_edition():
    # Element-wise addition of two regular Python lists
    t1 = time.perf_counter()
    X = list(range(size))
    Y = list(range(size))
    Z = [X[i] + Y[i] for i in range(len(X))]
    return time.perf_counter() - t1

def numpy_edition():
    # The same addition performed on NumPy arrays
    t1 = time.perf_counter()
    X = np.arange(size)
    Y = np.arange(size)
    Z = X + Y
    return time.perf_counter() - t1


t1 = python_raw_edition()
t2 = numpy_edition()

print("Without NumPy: " + str(t1) + "\nWith NumPy: " + str(t2))

print("In this case, NumPy is " + str(t1 / t2) + " times faster")

The result:

Even though it is a basic example, we can already see how fast NumPy handles data. In a real case, with really huge sets of elements, we can save hours of work by using NumPy.
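
A single timed run can be noisy, so one possible variation is to build the data first and take the best of several runs with the standard timeit module. This is only a sketch under those assumptions: the function names are illustrative, and the ratio it prints will differ from machine to machine.

```python
import timeit
import numpy as np

size = 1_000_000

# Build the inputs up front so that only the addition is timed
x_list = list(range(size))
y_list = list(range(size))
x_arr = np.arange(size)
y_arr = np.arange(size)

def add_lists():
    # Element-wise addition with a list comprehension
    return [a + b for a, b in zip(x_list, y_list)]

def add_arrays():
    # Element-wise addition with NumPy
    return x_arr + y_arr

# Best of 5 runs for each version
t_list = min(timeit.repeat(add_lists, number=1, repeat=5))
t_arr = min(timeit.repeat(add_arrays, number=1, repeat=5))

print(f"list: {t_list:.4f}s, NumPy: {t_arr:.4f}s, ratio: {t_list / t_arr:.1f}x")
```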
