Introduction to Numpy

Numpy is the main numerical computing library for Python. It provides excellent support for creating and manipulating arrays.

import numpy as np

Introduction to Numpy Arrays

Let’s start with creating an array.

x = np.array([1, 2, 3, 4, 5])
x
array([1, 2, 3, 4, 5])

Numpy allows vector operations on arrays. These are operations that work on every element of the array at once.

x + 10
array([11, 12, 13, 14, 15])
x*x
array([ 1,  4,  9, 16, 25])
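To see what vector operations save us, here is a small sketch that compares them with the equivalent loop over a plain Python list. The names values, looped and vectorized are made up for this illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# The same computation on a plain Python list needs an explicit loop
values = [1, 2, 3, 4, 5]
looped = [v + 10 for v in values]

# With a numpy array, the operation is applied to every element at once
vectorized = x + 10

print(looped)               # [11, 12, 13, 14, 15]
print(vectorized.tolist())  # [11, 12, 13, 14, 15]
```

Note that writing values + 10 with a plain list would raise a TypeError, because lists do not support element-wise arithmetic.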

Multi-dimensional arrays

Numpy supports n-dimensional arrays.

d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
d
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
d + 10
array([[11, 12, 13, 14],
       [15, 16, 17, 18]])
d*d
array([[ 1,  4,  9, 16],
       [25, 36, 49, 64]])

Numpy provides many mathematical functions for working with 2-d arrays, also known as matrices.

np.transpose(d)
array([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8]])
d1 = np.transpose(d)
np.dot(d, d1)
array([[ 30,  70],
       [ 70, 174]])
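The transpose and the matrix product also have shorter spellings. The sketch below assumes a reasonably recent Python and Numpy, where the .T attribute and the @ operator are available. It also shows that d*d and the matrix product are different operations.

```python
import numpy as np

d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# d.T is shorthand for np.transpose(d)
d1 = d.T

# For 2-d arrays, the @ operator gives the same matrix product as np.dot
product = d @ d1
print(product.tolist())   # [[30, 70], [70, 174]]

# In contrast, * multiplies element by element and keeps the shape of d
print((d * d).shape)      # (2, 4)
```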

It also supports arrays with even more dimensions, though we may not use them in this course.

d3 = np.array([
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]])

d3
array([[[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]],

       [[13, 14, 15, 16],
        [17, 18, 19, 20],
        [21, 22, 23, 24]]])
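One way to read the nested output is as two blocks, each with three rows of four numbers. The following sketch picks elements out of d3 to make that concrete. Indexing is not covered in the text above, so treat this as a side illustration.

```python
import numpy as np

d3 = np.array([
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]])

# Number of dimensions (levels of nesting)
print(d3.ndim)         # 3

# A single index picks one of the two outer blocks
print(d3[1].tolist())  # [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]

# One index per dimension picks a single element: block 1, row 2, column 3
print(d3[1, 2, 3])     # 24
```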

The shape and dtype

Every array has a shape, which gives the size of the array in each dimension, and a dtype, which indicates the datatype of the elements in the array.

Please note that all elements of an array will be of the same datatype.

x = np.array([1, 2, 3, 4, 5])
d = np.array([
    [1, 2, 3, 4], 
    [5, 6, 7, 8]])
x.shape
(5,)
d.shape
(2, 4)

The array x is a one-dimensional array and d is a two-dimensional array.

You may wonder why x.shape is shown as (5,) instead of (5). In Python, parentheses are used both for grouping and for writing tuples (a kind of read-only list). The value of (5) is 5 because the parentheses are treated as grouping, just like in (2 + 3). To write a tuple with a single element, we have to add a trailing comma so that Python treats it as a tuple.
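The difference between grouping and a one-element tuple is easy to check in plain Python. The names a and b below are only for illustration.

```python
import numpy as np

# Parentheses alone only group an expression
a = (5)
print(type(a).__name__)   # int

# A trailing comma makes a tuple with a single element
b = (5,)
print(type(b).__name__)   # tuple
print(len(b))             # 1

# The shape of a one-dimensional array is such a one-element tuple
x = np.array([1, 2, 3, 4, 5])
print(x.shape == (5,))    # True
```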

x.dtype
dtype('int64')

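The elements of x are 64-bit integers. The default integer type depends on the platform, so some systems (for example, Windows with older Numpy versions) may show int32 instead.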

x2 = np.array([0.1, 0.2, 0.3])
x2.dtype
dtype('float64')

When we use decimal numbers, Numpy uses a dtype of float64.
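Because all elements share one datatype, mixing integers and decimal numbers gives an array of floats. The sketch below also shows two ways to control the dtype, the dtype argument and the astype method. The variable names are made up for this example.

```python
import numpy as np

# Mixing integers and decimal numbers: every element is stored as a float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)          # float64

# The dtype can be requested explicitly when creating the array
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float.dtype)       # float64

# astype returns a converted copy; converting to int drops the fractional part
truncated = mixed.astype(np.int64)
print(truncated.tolist())   # [1, 2, 3]
```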

Creating Arrays

While we can create arrays by listing all the elements, as we did in the examples above, that is not practical for large arrays. Numpy has utilities to create arrays.

# create 10 zeros
np.zeros(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
# create 10 ones
np.ones(10)
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
# range of numbers from 0 to 1 in steps of 0.1
# please note that the end is not included
np.arange(0, 1, 0.1)
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
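The zeros and ones functions shown above are not limited to one dimension or to floats. As a small sketch, passing a tuple as the shape creates a multi-dimensional array, and the dtype argument changes the element type. The names grid and counts are illustrative only.

```python
import numpy as np

# Passing a tuple as the shape creates a multi-dimensional array
grid = np.zeros((2, 3))
print(grid.shape)       # (2, 3)
print(grid.dtype)       # float64

# zeros and ones produce floats by default; dtype changes that
counts = np.ones(4, dtype=int)
print(counts.tolist())  # [1, 1, 1, 1]
```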

The linspace function takes a start, an end, and a number of points, and returns that many evenly spaced values. Unlike arange, the result of linspace includes the end.

np.linspace(0, 1, 11)
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
np.linspace(0, 1, 5)
array([0.  , 0.25, 0.5 , 0.75, 1.  ])
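To make the difference between the two functions concrete, the following sketch builds the same grid both ways and compares the results.

```python
import numpy as np

# arange: fixed step, end excluded -> 10 points
a = np.arange(0, 1, 0.1)

# linspace: fixed number of points, end included -> 11 points
b = np.linspace(0, 1, 11)

print(len(a), len(b))          # 10 11
print(b[-1])                   # 1.0

# The first 10 points agree up to floating-point rounding
print(np.allclose(a, b[:10]))  # True
```

As a rule of thumb, arange is convenient when the step size matters and linspace when the number of points matters.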

Utilities

Numpy has many utilities and mathematical functions.

np.pi
3.141592653589793
# convert degrees to radians
np.radians(90)
1.5707963267948966
angles = np.linspace(0, 360, 13)
angles
array([  0.,  30.,  60.,  90., 120., 150., 180., 210., 240., 270., 300.,
       330., 360.])
angles_in_radians = np.radians(angles)
angles_in_radians
array([0.        , 0.52359878, 1.04719755, 1.57079633, 2.0943951 ,
       2.61799388, 3.14159265, 3.66519143, 4.1887902 , 4.71238898,
       5.23598776, 5.75958653, 6.28318531])
# we can also create the angles in radians
np.linspace(0, 2*np.pi, 13)
array([0.        , 0.52359878, 1.04719755, 1.57079633, 2.0943951 ,
       2.61799388, 3.14159265, 3.66519143, 4.1887902 , 4.71238898,
       5.23598776, 5.75958653, 6.28318531])

Numpy supports trigonometric functions as well.

np.sin(angles_in_radians)
array([ 0.00000000e+00,  5.00000000e-01,  8.66025404e-01,  1.00000000e+00,
        8.66025404e-01,  5.00000000e-01,  1.22464680e-16, -5.00000000e-01,
       -8.66025404e-01, -1.00000000e+00, -8.66025404e-01, -5.00000000e-01,
       -2.44929360e-16])
np.cos(angles_in_radians)
array([ 1.00000000e+00,  8.66025404e-01,  5.00000000e-01,  6.12323400e-17,
       -5.00000000e-01, -8.66025404e-01, -1.00000000e+00, -8.66025404e-01,
       -5.00000000e-01, -1.83697020e-16,  5.00000000e-01,  8.66025404e-01,
        1.00000000e+00])
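Some entries in these results, such as 1.22464680e-16, are mathematically zero but come out as tiny numbers because of floating-point rounding. A minimal sketch of how to deal with that, using np.isclose for comparisons and np.round for display:

```python
import numpy as np

angles_in_radians = np.radians(np.linspace(0, 360, 13))
sines = np.sin(angles_in_radians)

# sin(180 degrees) is mathematically 0, but comes out as a tiny number
print(sines[6] == 0)            # False
print(np.isclose(sines[6], 0))  # True

# Rounding makes the table easier to read
print(np.round(sines, 3))
```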

The common mathematical functions like sqrt and abs are available too.

x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.sqrt(x)
array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])
x-5
array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])
x1 = np.abs(x-5)
x1
array([5, 4, 3, 2, 1, 0, 1, 2, 3, 4])
np.sum(x1)
25

Example: Euclidean Distance

The Euclidean distance between two vectors is defined as:

\(E(p, q) = \sqrt{\sum_{i=1}^{n}{(p_i-q_i)^2}}\)

Write a function euclidean_distance to compute the Euclidean distance between two vectors specified as numpy arrays.

def euclidean_distance(p, q):
    # element-wise difference between the two vectors
    d = p-q
    # sum of the squared differences
    total = np.sum(d*d)
    return np.sqrt(total)
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 5.0, 6.0])
euclidean_distance(p, q)
5.196152422706632

You can verify that step-by-step.

d = p-q
d
array([-3., -3., -3.])
d*d
array([9., 9., 9.])
total = np.sum(d*d)
total
27.0
np.sqrt(total)
5.196152422706632
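As a cross-check, Numpy also ships np.linalg.norm, which computes the length of a vector. Applied to the difference p - q, it should agree with the step-by-step computation. The names by_hand and with_norm are made up for this sketch.

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 5.0, 6.0])

# Step-by-step version: square root of the sum of squared differences
by_hand = np.sqrt(np.sum((p - q) ** 2))

# np.linalg.norm gives the length of the difference vector
with_norm = np.linalg.norm(p - q)

print(np.isclose(by_hand, with_norm))  # True
```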

Problem: Manhattan Distance

Write a function manhattan_distance to compute the Manhattan distance between two vectors.

The Manhattan distance is defined as:

\(M(p, q) = \sum_{i=1}^{n}{| p_i - q_i |}\)

For more info see: https://en.wikipedia.org/wiki/Taxicab_geometry

>>> manhattan_distance(np.array([0, 0]), np.array([3, 4]))
7