NumPy Arrays
NumPy arrays are at the core of nearly the entire ecosystem of data science tools in Python. Effective data-driven science and computation requires understanding how data is stored and manipulated.
The Basics of NumPy Arrays
Each array has the attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total number of elements in the array):
In[2]: print("x3 ndim: ", x3.ndim)
       print("x3 shape:", x3.shape)
       print("x3 size: ", x3.size)
Another useful attribute is dtype, the data type of the array:
In[3]: print("dtype:", x3.dtype)
dtype: int64
Other attributes include itemsize, which lists the size (in bytes) of each array element, and nbytes, which lists the total size (in bytes) of the array:
In[4]: print("itemsize:", x3.itemsize, "bytes")
       print("nbytes:", x3.nbytes, "bytes")
itemsize: 8 bytes
nbytes: 480 bytes
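The listings above use x3 without showing how it was built. The following is a minimal sketch, assuming a three-dimensional array of shape (3, 4, 5) holding 64-bit integers. That shape is one choice consistent with the 480 bytes reported above; the seed and values are purely illustrative.

```python
import numpy as np

# Assumption: a 3 x 4 x 5 array of 64-bit integers; the values are arbitrary.
rng = np.random.default_rng(0)
x3 = rng.integers(0, 10, size=(3, 4, 5), dtype=np.int64)

print("ndim:    ", x3.ndim)      # number of dimensions
print("shape:   ", x3.shape)     # size of each dimension
print("size:    ", x3.size)      # total number of elements
print("dtype:   ", x3.dtype)     # data type of each element
print("itemsize:", x3.itemsize)  # bytes per element
print("nbytes:  ", x3.nbytes)    # total bytes

# The attributes are related: nbytes equals itemsize times size.
assert x3.nbytes == x3.itemsize * x3.size
```

The dtype is passed explicitly because the default integer type can differ between platforms and NumPy versions.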
Array Indexing: Accessing Single Elements
If you know Python’s standard list indexing, indexing in NumPy will feel familiar. In a one-dimensional array, you access the ith value (counting from zero) by putting its index in square brackets:
In[5]: x1
Out[5]: array([5, 0, 3, 3, 7, 9])
In[7]: x1[4]
Out[7]: 7
To index from the end of the array, you can use negative indices:
In[8]: x1[-1]
Out[8]: 9
In a multidimensional array, you access items using a comma-separated tuple of indices:
In[10]: x2
Out[10]: array([[3, 5, 2, 4],
                [7, 6, 8, 8],
                [1, 6, 7, 7]])
In[11]: x2[0, 0]
Out[11]: 3
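The indexing rules above can be tried together in one short script. This sketch rebuilds x1 and x2 with the values shown in the listings; the variable names first, last, corner, and row are illustrative only.

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])
x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

first = x1[0]        # indices start at zero
last = x1[-1]        # -1 refers to the last element
corner = x2[2, -1]   # row 2, last column
row = x2[1]          # a single index on a 2-D array selects a whole row

print(first, last, corner, row)
```

Negative indices work in each position of the tuple, so x2[2, -1] combines a forward row index with a backward column index.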
Array Slicing: Accessing Sub-arrays
Just as we can use square brackets to access individual array elements, we can also use them to access sub-arrays with the slice notation, marked by the colon (:) character.
In[16]: x = np.arange(10)
        x
Out[16]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In[17]: x[:5] # first five elements
Out[17]: array([0, 1, 2, 3, 4])
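Slice notation follows the same start:stop:step pattern as Python lists, with each part optional. A brief sketch of a few common forms (the variable names are illustrative):

```python
import numpy as np

x = np.arange(10)

head = x[:5]       # elements before index 5
tail = x[5:]       # elements from index 5 onward
middle = x[4:7]    # indices 4, 5, and 6 (the stop index is excluded)
evens = x[::2]     # every other element, starting at index 0
rev = x[::-1]      # a negative step walks the array backward

print(head, tail, middle, evens, rev)
```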
Reshaping of Arrays
Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape() method. For example, if you want to put the numbers 1 through 9 in a 3×3 grid, you can do the following:
In[38]: grid = np.arange(1, 10).reshape((3, 3))
        print(grid)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
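For reshape() to succeed, the new shape must hold exactly as many elements as the original array. The sketch below illustrates that constraint and also shows -1, which asks NumPy to infer one dimension from the array size; the names flat, column, and mismatch_rejected are illustrative.

```python
import numpy as np

flat = np.arange(1, 10)        # nine elements: 1 through 9
grid = flat.reshape((3, 3))    # 3 * 3 == 9, so the sizes match

# A target shape with a different total size raises ValueError.
try:
    flat.reshape((2, 5))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# Passing -1 for one dimension lets NumPy work out its length.
column = flat.reshape((-1, 1))  # shape (9, 1)

print(grid.shape, column.shape, mismatch_rejected)
```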
Concatenation of Arrays
Concatenation, or joining of two arrays in NumPy, is primarily accomplished through the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument, as we can see here:
In[43]: x = np.array([1, 2, 3])
        y = np.array([3, 2, 1])
        np.concatenate([x, y])
Out[43]: array([1, 2, 3, 3, 2, 1])
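The one-dimensional case above extends to two dimensions, where the axis matters. The following sketch compares np.concatenate with np.vstack and np.hstack; the arrays grid and extra are made up for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# np.concatenate joins along an existing axis (axis=0 by default).
rows = np.concatenate([grid, grid])           # shape (4, 3)
cols = np.concatenate([grid, grid], axis=1)   # shape (2, 6)

# np.vstack stacks vertically; a 1-D array is treated as a single row.
stacked = np.vstack([x, grid])                # shape (3, 3)

# np.hstack stacks horizontally; here a (2, 1) column is appended.
extra = np.array([[99],
                  [99]])
widened = np.hstack([grid, extra])            # shape (2, 4)

print(rows.shape, cols.shape, stacked.shape, widened.shape)
```

np.vstack and np.hstack are convenient when the inputs have different numbers of dimensions, since np.concatenate requires all inputs to have the same number of dimensions.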
Splitting of Arrays
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:
In[50]: x = [1, 2, 3, 99, 99, 3, 2, 1]
        x1, x2, x3 = np.split(x, [3, 5])
        print(x1, x2, x3)
[1 2 3] [99 99] [3 2 1]
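In the example above, two split points produce three sub-arrays; in general, N split points give N + 1 pieces. The sketch below applies the same idea to a two-dimensional array with np.vsplit and np.hsplit, using an illustrative 4×4 grid.

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# np.vsplit cuts along rows: one split point gives two pieces.
upper, lower = np.vsplit(grid, [2])   # each has shape (2, 4)

# np.hsplit cuts along columns.
left, right = np.hsplit(grid, [2])    # each has shape (4, 2)

# Two split points on a 1-D array give three sub-arrays.
pieces = np.split(np.arange(8), [3, 5])

print(upper.shape, left.shape, len(pieces))
```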