TÁMOP-4.1.2-08/2/A/KMR-2009-0006, 2011.10.05.

Development of Complex Curricula for Molecular Bionics and Infobionics Programs within a Consortial Framework

Consortium leader: PETER PAZMANY CATHOLIC UNIVERSITY
Consortium members: SEMMELWEIS UNIVERSITY, DIALOG CAMPUS PUBLISHER

The project has been realised with the support of the European Union and has been co-financed by the European Social Fund.

Peter Pazmany Catholic University, Faculty of Information Technology
Digital and Neural Based Signal Processing & Kiloprocessor Arrays
Digital Signal Processing: Principal Component Analysis
András Oláh, Gergely Treplán, Dávid Tisza
www.itk.ppke.hu

Contents
• Data compression
• Principal component analysis (PCA or KLT)
• Notation
• Summary of the algorithm
• Problems, extensions
• Hebbian PCA (the Oja algorithm)
• Experiments
• Problems

Data compression
Examples of well-known transformations:
• DCT, the Discrete Cosine Transform (JPEG, MPEG)
• KLT, the Karhunen-Loève Transform
(Diagram: transformation, prioritized representation, information discarding; only the important segment remains.)

Notation
• Scalars are denoted by normal letters.
• Vectors are denoted by bold lower-case letters and are defined as column vectors.
• Matrices are denoted by bold upper-case letters.
• Random data vector (e.g. a video frame): $\mathbf{x}$
• Correlation matrix: $\mathbf{R} = E[\mathbf{x}\mathbf{x}^T]$
• $i$th element of a vector (a scalar): $x_i$
• $(i,j)$th element of a matrix: $R_{ij}$
• $i$th row of a matrix: $\mathbf{R}_{i,:}$
• $j$th column of a matrix: $\mathbf{R}_{:,j}$
• Submatrix (e.g. the first $K$ rows and all columns): $\mathbf{R}_{1:K,:}$
• $i$th element of a set (e.g. a set of eigenvectors): $\mathbf{e}_i$
• Expectation operator for a discrete random variable $X$: $E[X] = \sum_i x_i P(X = x_i)$
• Dot product: $\mathbf{a}^T\mathbf{b} = \sum_i a_i b_i$
• Diagonal extension operator:

Eigenvectors and eigenvalues of $\mathbf{R}$
• Set of eigen equations: $\mathbf{R}\mathbf{e}_i = \lambda_i\mathbf{e}_i$, $i = 1, \dots, N$
• Properties of the eigenvectors: since $\mathbf{R}$ is symmetric and positive semidefinite, the eigenvalues are real and non-negative, and the eigenvectors form an orthonormal basis ($\mathbf{e}_i^T\mathbf{e}_j = \delta_{ij}$).

Basis transformations to and from the eigenvector basis:
• Coder, transforming to the eigenvector basis: $\mathbf{y} = \mathbf{W}\mathbf{x}$, where the rows of $\mathbf{W}$ are the eigenvectors $\mathbf{e}_i^T$.
• Decoder, transforming back from the eigenvector basis: $\mathbf{x} = \mathbf{W}^T\mathbf{y}$, since orthonormality gives $\mathbf{W}^{-1} = \mathbf{W}^T$.
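The slides leave the coder and decoder matrices implicit; the following minimal numpy sketch (not from the original material; the names W, R, E and the toy data are our own) shows how the eigendecomposition of the correlation matrix yields the lossless coder/decoder pair described above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy zero-mean data with very different variances per axis
X = rng.standard_normal((1000, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])

R = X.T @ X / len(X)             # empirical correlation matrix E[x x^T]
lam, E = np.linalg.eigh(R)       # eigh, since R is symmetric
order = np.argsort(lam)[::-1]    # sort by decreasing eigenvalue (importance)
lam, E = lam[order], E[:, order]

W = E.T                          # coder: rows are the eigenvectors e_i^T
x = X[0]
y = W @ x                        # transform into the eigenvector basis
x_back = W.T @ y                 # decoder: W^-1 = W^T by orthonormality
assert np.allclose(x, x_back)    # lossless when all eigenvectors are kept
```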
PCA
• Suppose an element of our data set is a grayscale image map: the data frame at time instant $k$.
• Generate the data vector $\mathbf{x}[k]$ from it by reading the image column-wise. (Figure: a heart image unrolled column-wise into a vector.)
• The data can usually be modeled as a random vector variable $\mathbf{x}$, because we deal with processes assumed to be quasi-stationary and ergodic; e.g. consecutive frames of a video stream are usually "similar".
• Thus, within a time window, we examine the correlation matrix of the process: $\mathbf{R} = E[\mathbf{x}\mathbf{x}^T]$.

• The random data vector is assumed to have zero mean.
• Compute the eigenvectors and eigenvalues of the correlation matrix $\mathbf{R}$ and sort them in decreasing order of eigenvalue (by importance): $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$.

• Eigenvectors point in mutually orthogonal directions that capture the independent variance directions of the distribution.
• Eigenvalues are proportional to the magnitudes of those variances.
(Figure: eigenvalue-scaled eigenvectors as blue arrows, isocurves of the distribution as black dashed contours, sample points as red dots.)
• Eigenvectors correspond to the directions of the standard deviations of the data, understood as a sample set of a multidimensional random process.
• Eigenvalues equal the variances along those directions; their square roots give the standard deviations.
• If we adopt the hypothesis that this linear quantity (correlation) captures real-world "importance", then we can sort the dimensions by importance and
• compress by discarding the "not really important" dimensions.

• We can transform the data frame to the new basis losslessly: $\mathbf{y}[k] = \mathbf{W}\mathbf{x}[k]$, $\mathbf{x}[k] = \mathbf{W}^T\mathbf{y}[k]$.
• We want to discard the "unimportant" components to achieve compression:
• instead of using all eigenvectors, use only the first $M$ to get an approximation of $\mathbf{y}$.
• The approximation of the transformed data: $\hat{\mathbf{y}}[k] = \mathbf{W}_{1:M,:}\,\mathbf{x}[k]$.
(Diagram: transformation, prioritized representation, information discarding; only the important segment remains.)

PCA compression
Compression algorithm: $\mathbf{y}[k] = \mathbf{W}_{1:M,:}\,\mathbf{x}[k]$
Decompression algorithm: $\hat{\mathbf{x}}[k] = \mathbf{W}_{1:M,:}^T\,\mathbf{y}[k]$
(Diagram: compression algorithm, access line, decompression.)
PCA is the optimal lossy linear representation in terms of mean square error: $E\big[\|\mathbf{x} - \hat{\mathbf{x}}\|^2\big] = \sum_{i=M+1}^{N} \lambda_i$ is minimal.
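A short numeric illustration of the compression/decompression pair and of the error claim; this sketch (our own, with assumed toy data) checks that the empirical mean square error equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])

R = X.T @ X / len(X)             # empirical correlation matrix
lam, E = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

M = 2                            # keep only the two most important components
W_M = E[:, :M].T                 # coder: first M eigenvector rows
Y = X @ W_M.T                    # compression: M numbers per frame instead of 4
X_hat = Y @ W_M                  # decompression back to the original basis

mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse, lam[M:].sum())        # empirical MSE vs. sum of discarded eigenvalues
```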
The minimum is achieved because the eigenvalues are ordered in a monotonically decreasing sequence:
• the components we discard in the course of compression contribute to the error exactly as much as their corresponding eigenvalues;
• the mean square error is the sum of the eigenvalues corresponding to the discarded components, and due to the monotonically decreasing order of the eigenvalues we drop the minimum possible contribution.

PCA architecture
(Figure: PCA coder/decoder architecture with the access line.)

PCA compression overview
We assume that $\mathbf{R}$ is given for the process we are dealing with and that it remains the same over time (stationarity).
1. Solve the eigen problem for $\mathbf{R}$ to get the $M$ largest eigenvalues and the corresponding eigenvectors, with $M$ chosen from the quality requirement.
2. Assemble the coding/decoding matrices.
3. Use the coder to encode the data (a matrix-vector multiplication) and send the result through the access line.
4. Decode the data with the decoding matrix.

Difficulties of directly implementing PCA for video compression
• Nobody tells us the $\mathbf{R}$ of an underlying video stream.
• $\mathbf{R}$ does not stay "still" over time.
• Computing eigenvalues and eigenvectors is computationally expensive.
• In practice, normalizing the variances of the original data along each dimension before the PCA transformation is useful to get meaningful relative importance values after the transformation.

• Storage problems:
• $N$: $\dim(\mathbf{x}) = 640 \times 480 = 307200$ per color channel
• $N \times N$: $\dim(\mathbf{R}) = \dim(E[\mathbf{x}\mathbf{x}^T]) = 307200 \times 307200$
• If a 32-bit float is used to hold each element of $\mathbf{R}$, then sizeof($\mathbf{R}$) ≈ 350 GB per color channel is needed in memory to hold $\mathbf{R}$ or an estimate of it.

Estimation of the correlation matrix
• Using the assumed ergodicity of the process, we estimate $\mathbf{R}$ in a time window of length $K$: $\hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}[k]\mathbf{x}[k]^T$.
• From time to time the empirical correlation matrix has to be re-estimated.
• Note that we still assume quasi-stationarity within the $K$-length time window.

Overcoming the storage problem: introducing kernel-based PCA
• We use the fact that $\hat{\mathbf{R}}$ was constructed from $K$ outer products, so every eigenvector with a nonzero eigenvalue lies in the span of the data: $\mathbf{e} = \sum_{k} \alpha_k \mathbf{x}[k]$.
• Using this basis transform, we can rewrite the $N \times N$ eigen problem as a $K \times K$ eigen problem on the Gram matrix $G_{ij} = \mathbf{x}[i]^T\mathbf{x}[j]$.

Kernel-based PCA
• Storage comparison: $N$: $\dim(\mathbf{x}) = 640 \times 480 = 307200$ per color channel, so sizeof($\mathbf{R}$) ≈ 350 GB per color channel.
• With a window length of $K = 1000$ frames, sizeof($\mathbf{G}$) = 4 MB per color channel.
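A minimal sketch of the Gram-matrix route (our own illustration, assuming a linear kernel and zero-mean frames): the leading eigenvectors of the huge $\mathbf{R}$ are recovered from the small $K \times K$ Gram matrix without ever forming $\mathbf{R}$.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 50, 3000                      # K frames of dimension N, with K << N
X = rng.standard_normal((K, N))      # one (zero-mean) frame per row

G = X @ X.T / K                      # K x K Gram matrix: fits in memory
mu, A = np.linalg.eigh(G)
order = np.argsort(mu)[::-1]
mu, A = mu[order], A[:, order]

# Each eigenvector of R with a nonzero eigenvalue is a combination of frames:
# e = X^T a / ||X^T a||, and its eigenvalue equals the Gram eigenvalue mu.
E = X.T @ A
E /= np.linalg.norm(E, axis=0)       # normalize the columns to unit norm

Re = (X.T @ (X @ E[:, 0])) / K       # apply R = X^T X / K without forming it
assert np.allclose(Re, mu[0] * E[:, 0])
```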
The Oja algorithm
• The Oja algorithm is another solution to the storage problem and an alternative to the existing efficient algorithms for eigenvalue decomposition (EVD), which would otherwise have to be re-run from time window to time window.
• If we assume that we know the current weight estimate, can we compute the next one recursively, assuming the process changes slowly enough?

• The Oja algorithm is a nonlinear stochastic difference equation: $\mathbf{w}[k+1] = \mathbf{w}[k] + \mu\, y[k]\,(\mathbf{x}[k] - y[k]\,\mathbf{w}[k])$, where $y[k] = \mathbf{w}[k]^T\mathbf{x}[k]$; convergence is proven with the Kushner-Clark theorem.

• Lemma:
• Proof outline:

• We use the properties of the Rayleigh quotient $r(\mathbf{w}) = \frac{\mathbf{w}^T\mathbf{R}\mathbf{w}}{\mathbf{w}^T\mathbf{w}}$: its maximum is the largest eigenvalue of $\mathbf{R}$, attained at the corresponding eigenvector.
• Starting from the Rayleigh quotient and using Newton's method to find the maximum:

• We still need knowledge of $\mathbf{R}$, so we substitute the instantaneous estimate $\mathbf{x}[k]\mathbf{x}[k]^T$ for it (the same simplification discussed at the LMS and the Robbins-Monro algorithm).
• Stability can be proven by the Kushner-Clark theorem.

• Extending the algorithm with the Generalized Hebbian Algorithm, one can design a feed-forward neural network that captures all the eigenvectors of the correlation matrix.
• Generalized Hebbian rule:

The Oja algorithm with the Generalized Hebbian Algorithm (GHA)
• No knowledge of $\mathbf{R}$ is needed for the iteration.
• It can be implemented in a feed-forward neural network structure.
• For the $j$th neuron's $i$th weight: $\Delta w_{ji}[k] = \mu\, y_j[k]\left(x_i[k] - \sum_{l=1}^{j} w_{li}[k]\, y_l[k]\right)$.

Hebbian PCA
• Hebbian learning: $\mathbf{w}[k+1] = \mathbf{w}[k] + \mu\, y[k]\,\mathbf{x}[k]$
• Normalization: $\mathbf{w}[k+1] = \dfrac{\mathbf{w}[k] + \mu\, y[k]\,\mathbf{x}[k]}{\|\mathbf{w}[k] + \mu\, y[k]\,\mathbf{x}[k]\|}$
• Expanding the normalized update in a Taylor series in $\mu$ and keeping the first-order term yields the Oja rule above. (A GHA training sketch is given after the experiment figures below.)

Experiments
• We examine the eigenvalue distribution of a small video stream.
• We experimented on two small video streams: the first illustrates a calm aquarium; the second is an action scene with quick changes.
• We present these sequences at different compression ratios using the kernel and the Hebbian PCA.

• The video snippet consisted of 57 frames; a window length of 10 was used.
(Figure: eigenvalue distribution of the video stream.)

Experiments
(Figure: original frame, reconstruction with all eigenvectors, with 5 eigenvectors, and with 2 eigenvectors.)
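As promised above, a hedged sketch of GHA training (our own illustration; the learning rate, iteration count, and data are assumptions, not values from the slides). With a diagonal correlation matrix the rows of the weight matrix should converge to signed unit vectors along the highest-variance axes.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, mu = 8, 3, 0.005               # input dim, neurons, learning rate
std = np.array([3.0, 2.0, 1.5, 1.0, 0.5, 0.3, 0.2, 0.1])  # per-axis std devs

W = rng.standard_normal((M, N)) * 0.1    # row j holds neuron j's weights

for _ in range(50000):
    x = std * rng.standard_normal(N)     # zero-mean sample, R = diag(std**2)
    y = W @ x                            # forward pass of the FFNN layer
    # GHA update: dw_j = mu * y_j * (x - sum_{l<=j} y_l * w_l)
    W += mu * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

print(np.round(W, 2))   # rows approach +/- e_1, e_2, e_3 of R = E[x x^T]
```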
Experiments
(Figure: original frame, reconstruction with all eigenvectors, with 20 eigenvectors, and with 15 eigenvectors.)

Error function:
(Figure: reconstruction error versus the number of eigenvectors for the Aquarium and the Star Wars sequences.)

• Difference between the approximated and the true eigenvectors:
(Figure: eigenvector error versus the number of eigenvectors after 100 and 500 iterations.)

Problems with solutions
1. A simple eigenvalue problem:

2. An information flow, compressed from time window to time window (e.g. a video flow), has the following eigenvalues of its correlation matrix: If we want the mean square error of the PCA compression to be less than or equal to 0.01, how many neurons should we implement in the coder?

Solution:
• The mean square error is defined as the sum of the eigenvalues of the discarded components: $MSE(M) = \sum_{i=M+1}^{N}\lambda_i$.
• So at least 11 neurons must be used to achieve the desired error level.

3. Train the neuron below with the Oja algorithm with the following parameters:
a) Compute the weights after the 2nd iteration.
b) Can the following state be stable? Explain your answer!

Solution:
a) Compute the weights after the 2nd iteration:

b) Can the following state be stable? Explain your answer!
Let us check whether it can be an eigenvector: each and every eigenvector must have unit norm and form an orthonormal basis with the others. Since this vector has unit norm, it passes this check. Let us solve the eigen problem to be sure:

4. Assume that the correlation matrix of a weakly stationary process has the eigenvalues and eigenvectors shown in the table below. We want to compress the process with PCA such that the mean square error stays below the following threshold:
a) Give the compressed vector and the restored data, if a realization of the process is:
b) Give the mean square error.

Solution:
• First, rearrange the table in descending eigenvalue order.
• Next, compute how many eigenvectors are needed for the coder and decoder, and construct them (a sketch of this step follows).
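Since the slide's eigenvalue table did not survive extraction, the sketch below uses hypothetical stand-in values (lams and threshold are assumptions, not the slide's data) just to show how this step is computed: sort, then keep the smallest number of components whose discarded tail sum stays under the MSE threshold.

```python
import numpy as np

lams = np.array([0.02, 4.10, 0.90, 0.005])   # hypothetical, NOT the slide's table
threshold = 0.03                             # hypothetical MSE threshold

lams_sorted = np.sort(lams)[::-1]            # descending eigenvalue order

# tail[i] = sum of lams_sorted[i:], i.e. the MSE if the first i components are kept
tail = np.cumsum(lams_sorted[::-1])[::-1]
M = int(np.argmax(tail < threshold))         # smallest M with MSE below threshold
print(M, tail[M])                            # here: M = 2, MSE = 0.025
```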
• Let us check that the eigenvectors are proper eigenvectors:
• Using the coder, we transform the data to the new basis:
• Let us compute the restored data:
• Let us compute the empirical mean square error:

Problems
5. Design an FFNN for the compression layer and use the GHA to train its neurons for two iterations on Problem 4.
6. Show that if the random process is not centralized (not zero-mean), then the first eigenvalue corresponds to the mean and its eigenvector points in the direction of the mean.
7. Write a computer program that performs PCA (a sketch is given after the summary).

Summary
• Data compression
• Principal component analysis (PCA or KLT)
• Notation
• Summary of the algorithm
• Problems, extensions
• Hebbian PCA (the Oja algorithm)
• Experiments
• Problems

www.itk.ppke.hu
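For Problem 7, a minimal self-contained sketch of a PCA program (synthetic data; the frame dimension, window length, and function names are our own assumptions, not part of the original material):

```python
import numpy as np

def pca_fit(X):
    """Eigenvalues/eigenvectors of the empirical correlation matrix,
    sorted in decreasing eigenvalue order. X holds one frame per row."""
    R = X.T @ X / len(X)
    lam, E = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]
    return lam[order], E[:, order]

def pca_compress(X, E, M):
    return X @ E[:, :M]          # coder: project onto the first M eigenvectors

def pca_decompress(Y, E, M):
    return Y @ E[:, :M].T        # decoder: back to the original basis

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.standard_normal((500, 16)) @ np.diag(np.linspace(4.0, 0.1, 16))
    lam, E = pca_fit(X)
    Y = pca_compress(X, E, M=4)
    X_hat = pca_decompress(Y, E, M=4)
    # the empirical MSE should equal the sum of the discarded eigenvalues
    print(np.mean(np.sum((X - X_hat) ** 2, axis=1)), lam[4:].sum())
```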