Let us consider four two-dimensional data points x1 = (1, 1), x2 = (1, -1), x3 = (-1, -1), and x4 = (-1, 1). Use PCA to project these four data points into a one-dimensional space.

In [1]:
import numpy

Let's represent the data points as columns of a matrix X.

In [2]:
X = numpy.zeros((2, 4), dtype=float)
X[:, 0] = numpy.array([1, 1])
X[:, 1] = numpy.array([1, -1])
X[:, 2] = numpy.array([-1, -1])
X[:, 3] = numpy.array([-1, 1])
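As an aside, the same matrix can be built in a single call rather than column by column; a minimal sketch:

X = numpy.array([[1,  1, -1, -1],
                 [1, -1, -1,  1]], dtype=float)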
In [3]:
print(X)
[[ 1.  1. -1. -1.]
 [ 1. -1. -1.  1.]]

Now let's compute the covariance matrix S. First we find the mean of the data points and subtract it from every column to centre the data; since the points are symmetric about the origin, the mean turns out to be the zero vector, so centring leaves the data unchanged.

In [19]:
mean = numpy.mean(X, axis=1)   # mean of the data points (average over the columns)
In [20]:
mean
Out[20]:
array([ 0.,  0.])
In [23]:
# centre the data by subtracting the mean from each column
Y = numpy.zeros((2, 4), dtype=float)
for i in range(4):
    Y[:, i] = X[:, i] - mean
    
In [24]:
print(Y)
[[ 1.  1. -1. -1.]
 [ 1. -1. -1.  1.]]
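The centring loop can also be replaced by a single broadcast subtraction; a sketch of the equivalent vectorized form:

Y = X - mean[:, numpy.newaxis]   # subtract the (2,) mean vector from every column of the (2, 4) matrix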
In [44]:
S = numpy.zeros((2, 2), dtype=float)
for i in range(4):
    # accumulate the outer products y_i y_i^T
    S = S + numpy.outer(Y[:, i], Y[:, i])
In [45]:
S
Out[45]:
array([[ 4.,  0.],
       [ 0.,  4.]])
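The loop of outer products amounts to a single matrix product, since the sum of y_i y_i^T over all i equals Y Y^T; a quick sanity check:

print(numpy.dot(Y, Y.T))
# [[ 4.  0.]
#  [ 0.  4.]]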
In [47]:
S = 0.25 * S   # divide by the number of data points, N = 4
In [48]:
S
Out[48]:
array([[ 1.,  0.],
       [ 0.,  1.]])
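We can also cross-check S against NumPy's built-in covariance routine; note that bias=True makes numpy.cov divide by N = 4 instead of the default N - 1:

print(numpy.cov(X, bias=True))
# [[ 1.  0.]
#  [ 0.  1.]]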
In [50]:
numpy.linalg.eig(S)
Out[50]:
(array([ 1.,  1.]), array([[ 1.,  0.],
        [ 0.,  1.]]))
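As an aside, since S is symmetric, numpy.linalg.eigh is the more appropriate routine; it is specialised for symmetric matrices and returns the eigenvalues in ascending order:

evals, evecs = numpy.linalg.eigh(S)
print(evals)    # [ 1.  1.]
print(evecs)    # columns are orthonormal eigenvectors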

We have two eigenvectors, both with eigenvalue 1. Because there is no unique maximum eigenvalue, either eigenvector captures the same amount of variance, so we can select either one. Let's select the first eigenvector, p = (1, 0). The projected points are given by the inner product of p with each of x1, x2, x3, and x4, as follows.

In [52]:
p = numpy.array([1, 0])
Z = numpy.dot(X.T, p)   # inner product of p with each data point
In [53]:
print(Z)
[ 1.  1. -1. -1.]
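Since the two eigenvalues are equal, any unit vector in fact captures the same variance; as a quick check, projecting onto a different unit direction, say q = (1, 1)/sqrt(2), gives the same variance of 1:

q = numpy.array([1, 1]) / numpy.sqrt(2)
Zq = numpy.dot(X.T, q)
print(numpy.var(Zq))    # 1.0, the same variance as along p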

Therefore, x1 and x2 are projected to 1, and x3 and x4 are projected to -1. Mapped back into the original space, these correspond to the points (1, 0) and (-1, 0), respectively, as the sketch below confirms.
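To see the projected points back in the original two-dimensional space, we can reconstruct them from the one-dimensional coordinates; a minimal sketch:

X_hat = numpy.outer(p, Z)   # each column is z_i * p
print(X_hat)
# [[ 1.  1. -1. -1.]
#  [ 0.  0.  0.  0.]]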
