Introducing NumPy
At a glance
NumPy is a popular package in data science and related fields. It makes arrays of data easy to handle by providing a rich set of functions that run efficiently.
How to install
Assuming that you have Python and pip installed, open your terminal and type:
pip install numpy
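To check that the installation worked, one option is to import the package and print its version. This is a minimal sketch; the version string you see depends on what pip installed.

```python
import numpy as np

# If the import succeeds, NumPy is installed; the version is just informative.
version = np.__version__
print(version)
```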
How to use
The example below shows a simple script that imports the NumPy module, creates an array, prints it, and checks its type:
>>> import numpy as np
>>> x = np.array([1,2,3,4,5])
>>> print(x)
[1 2 3 4 5]
>>> type(x)
<class 'numpy.ndarray'>
You can also use NumPy to create multi-dimensional arrays:
>>> x = np.array([(0,1,2,3,4), (5,6,7,8,9)])
>>> print(x)
[[0 1 2 3 4]
 [5 6 7 8 9]]
>>> print(x.shape)
(2, 5)
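Besides shape, a few other attributes and indexing forms are handy with multi-dimensional arrays. The sketch below reuses the same 2x5 array; the names ndim and size are standard ndarray attributes, but they are not mentioned in the text above.

```python
import numpy as np

x = np.array([(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)])

print(x.ndim)   # number of dimensions: 2
print(x.size)   # total number of elements: 10
print(x[1, 3])  # element at row 1, column 3: 8
print(x[0])     # the whole first row: [0 1 2 3 4]
```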
Functions to create many sorts of arrays
#Create a 3x3 array of zeros
>>> zeros = np.zeros((3,3))
>>> print(zeros)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
#Create a 3x3 array of ones
>>> ones = np.ones((3,3))
>>> print(ones)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
#Create a 3x3 identity matrix
>>> identity = np.eye(3)
>>> print(identity)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
#Create an array of 3 random values in [0, 1)
>>> random = np.random.random(3)
>>> print(random)
[0.09799449 0.48100461 0.25790119]
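A few more creation functions follow the same pattern as the ones above. This sketch assumes you want a constant-filled array, evenly spaced values, and a reproducible random matrix; np.full, np.linspace and np.random.default_rng are not used in the original examples, and the seed 42 is an arbitrary choice.

```python
import numpy as np

# A 2x3 array filled with a constant value.
sevens = np.full((2, 3), 7)

# Five evenly spaced values from 0 to 1, both endpoints included.
steps = np.linspace(0, 1, 5)

# A 3x3 matrix of random floats in [0, 1); seeding makes the run repeatable.
rng = np.random.default_rng(42)
random_matrix = rng.random((3, 3))

print(sevens)
print(steps)
print(random_matrix.shape)  # (3, 3)
```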
Reshaping
#Creates an array with 6 positions
>>> x = np.arange(6)
>>> print(x)
[0 1 2 3 4 5]
#Creates an array with 25 positions and makes it 5x5
>>> y = np.arange(25).reshape(5,5)
>>> print(y)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
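Reshaping only works when the new shape holds exactly the same number of elements. The sketch below shows that rule, plus the -1 placeholder that lets NumPy work out one dimension for you; the 12-element array is just an illustrative choice.

```python
import numpy as np

x = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
grid = x.reshape(3, -1)
print(grid.shape)  # (3, 4)

# The element count must match: 12 values cannot fill a 5x5 array.
try:
    x.reshape(5, 5)
except ValueError as err:
    print("reshape failed:", err)

# flatten() returns a 1-D copy of the data.
flat = grid.flatten()
print(flat.shape)  # (12,)
```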
Performing arithmetic operations
>>> x = np.array([10,20,30,40])
>>> y = np.array([1,2,3,4])
>>> z = x - y
>>> print(z)
[ 9 18 27 36]
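Subtraction is only one case: the other arithmetic and comparison operators also work element by element. A short sketch with the same two arrays (note that * multiplies matching elements; it is not a matrix product):

```python
import numpy as np

x = np.array([10, 20, 30, 40])
y = np.array([1, 2, 3, 4])

print(x + y)   # [11 22 33 44]
print(x * y)   # [ 10  40  90 160]
print(x / y)   # [10. 10. 10. 10.]
print(y ** 2)  # [ 1  4  9 16]
print(x > 25)  # [False False  True  True]
```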
Slicing
#From index 2 up to index 4 (exclusive): the third and fourth elements
>>> x = np.arange(10)
>>> print(x)
[0 1 2 3 4 5 6 7 8 9]
>>> print(x[2:4])
[2 3]
#From index 2 to the last element
>>> print(x[2:])
[2 3 4 5 6 7 8 9]
#From the first element up to index 3 (exclusive)
>>> print(x[:3])
[0 1 2]
#From the first element up to the last (exclusive)
>>> print(x[:-1])
[0 1 2 3 4 5 6 7 8]
#Assign new values to a slice
>>> x[:4] = 1
>>> print(x)
[1 1 1 1 4 5 6 7 8 9]
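Two details about slices are worth a quick sketch: a slice is a view on the original data rather than a copy, and slicing extends to more than one dimension. The variable names here are only illustrative.

```python
import numpy as np

x = np.arange(10)

# Every second element, using a step of 2.
print(x[::2])  # [0 2 4 6 8]

# A slice is a view: writing through it changes the original array.
view = x[2:5]
view[0] = 99
print(x[2])  # 99

# Use copy() when an independent array is needed.
independent = x[2:5].copy()
independent[0] = -1
print(x[2])  # still 99

# In 2-D, rows and columns are sliced separately.
grid = np.arange(12).reshape(3, 4)
print(grid[:2, 1:3])
# [[1 2]
#  [5 6]]
```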
Broadcasting
Broadcasting is how NumPy performs arithmetic on arrays whose shapes differ, which makes such operations easier to write.
"NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape..."
Example:
>>> a = np.array([1.0, 2.0, 3.0])
>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([ 2., 4., 6.])
Now with broadcasting:
>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2., 4., 6.])
More about broadcasting can be found at docs.scipy.org, but this is the general idea.
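Broadcasting is not limited to scalars. As a rough sketch, a 1-D row or a single-column array can be combined with a 2-D matrix as long as the shapes line up; the arrays below are made up for illustration.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)  # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
column = np.array([[100], [200]])    # shape (2, 1)

# The row is stretched across both rows of the matrix.
print(matrix + row)
# [[10 21 32]
#  [13 24 35]]

# The column is stretched across all three columns.
print(matrix + column)
# [[100 101 102]
#  [203 204 205]]

# Shapes that cannot be aligned raise an error.
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```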
Vectorize
There is a lot of controversy around vectorize. Some people say that under the hood this technique can get the most out of the processor's cores (even the GPU), while others say that there are more efficient ways of working with arrays.
Its own documentation says:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
If you google it, you'll find a wide range of results when it comes to performance. I tried a variety of scenarios on my machine and could recreate some in which vectorization really outperformed the alternatives, but in simple cases I did not manage to achieve meaningful results. Regardless of performance, let's see an example:
import numpy as np

numpyArray = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

def check(number):
    # Print whether a single number is even or odd
    if number % 2 == 0:
        print("Number %s is even" % number)
    else:
        print("Number %s is odd" % number)

# otypes is given so that vectorize does not call check() an extra time
# on the first element just to infer the output type
checkVectorized = np.vectorize(check, otypes=[object])
checkVectorized(numpyArray)
It will produce the following result:
Number 0 is even
Number 1 is odd
Number 2 is even
Number 3 is odd
Number 4 is even
Number 5 is odd
Number 6 is even
Number 7 is odd
Number 8 is even
Number 9 is odd
As we can see, a vectorized function simplifies parameter handling. In this case we don't need a 'for in' loop to iterate over the values: just vectorize the function and it will be called with each element of the array.
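Since the documentation describes np.vectorize as essentially a for loop, it may help to compare it with an approach that works on the whole array at once. The sketch below is one possible alternative, not the author's method: it builds the even/odd labels with a boolean mask and np.where, then prints them.

```python
import numpy as np

numbers = np.arange(10)

# The comparison runs on the whole array at once and yields a boolean mask.
is_even = numbers % 2 == 0

# np.where picks a label for each element based on the mask.
labels = np.where(is_even, "even", "odd")

for number, label in zip(numbers, labels):
    print(f"Number {number} is {label}")
```

The printing still needs a loop, but the even/odd computation itself no longer calls a Python function once per element.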
If you are interested in vectorization and performance, check out this link.
Why use NumPy?
One of NumPy's biggest advantages is the speed of its arithmetic operations. In the example below we add two sets of 1,000,000 numbers. As you can see, NumPy performed more than 100 times faster than regular Python lists.
import numpy as np
import time

size = 1000000

def python_raw_edition():
    # Add two sequences element by element in pure Python
    t1 = time.perf_counter()
    X = range(size)
    Y = range(size)
    Z = [X[i] + Y[i] for i in range(len(X))]
    return time.perf_counter() - t1

def numpy_edition():
    # Add two NumPy arrays in a single operation
    t1 = time.perf_counter()
    X = np.arange(size)
    Y = np.arange(size)
    Z = X + Y
    return time.perf_counter() - t1

t1 = python_raw_edition()
t2 = numpy_edition()
print("Without Numpy: "+str(t1)+"\nWith Numpy: "+str(t2))
print("In this case, Numpy is "+str(t1/t2)+" times faster")
The result:
Even though it is a basic example, we can already see how fast NumPy handles data. In a real case, with really huge sets of elements, we can save hours of work by using NumPy.
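A single timed run can be noisy, so a slightly more careful comparison might use the standard timeit module and keep the best of a few runs. This is only a sketch: the function names and the choice of three repetitions are assumptions, and the ratio you get will depend on your machine and NumPy version.

```python
import timeit

import numpy as np

size = 1_000_000

def add_lists():
    # Pure-Python addition of two lists, element by element.
    x = list(range(size))
    y = list(range(size))
    return [a + b for a, b in zip(x, y)]

def add_arrays():
    # The same addition done by NumPy in one operation.
    x = np.arange(size)
    y = np.arange(size)
    return x + y

# Take the best of several runs to reduce noise from other processes.
list_time = min(timeit.repeat(add_lists, number=1, repeat=3))
array_time = min(timeit.repeat(add_arrays, number=1, repeat=3))

print(f"Lists: {list_time:.4f} s")
print(f"NumPy: {array_time:.4f} s")
print(f"Ratio: {list_time / array_time:.1f}x")
```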