TÁMOP-4.1.2-08/2/A/KMR-2009-0006, 2011.10.05.

Development of Complex Curricula for Molecular Bionics and Infobionics Programs within a Consortial Framework

Consortium leader: PETER PAZMANY CATHOLIC UNIVERSITY
Consortium members: SEMMELWEIS UNIVERSITY, DIALOG CAMPUS PUBLISHER

The project has been realised with the support of the European Union and has been co-financed by the European Social Fund.

Peter Pazmany Catholic University, Faculty of Information Technology
Digital and Neural Based Signal Processing & Kiloprocessor Arrays
Digital Signal Processing: Principal Component Analysis
András Oláh, Gergely Treplán, Dávid Tisza
www.itk.ppke.hu

Contents
• Data compression
• Principal component analysis (PCA or KLT)
• Notation
• Summary of the algorithm
• Problems, extensions
• Hebbian PCA (the Oja algorithm)
• Experiments
• Problems

Data compression
Examples of well-known transformations:
• DCT, the Discrete Cosine Transform (JPEG, MPEG)
• KLT, the Karhunen-Loève Transform
(Diagram: transformation, prioritized representation, information discarding; only the important segment remains.)

Notation
• Scalars are denoted by normal letters.
• Vectors are denoted by bold lower-case letters and are defined as column vectors.
• Matrices are denoted by bold upper-case letters.
• Random data vector (e.g. a video frame): $\mathbf{x}$
• Correlation matrix: $\mathbf{R} = E[\mathbf{x}\mathbf{x}^T]$
• $i$th element of a vector (a scalar): $x_i$
• $(i,j)$th element of a matrix: $R_{ij}$
• $i$th row of a matrix: $\mathbf{R}_{i,:}$
• $j$th column of a matrix: $\mathbf{R}_{:,j}$
• Submatrix (e.g. the first $K$ rows and all columns): $\mathbf{R}_{1:K,:}$
• $i$th element of a set (e.g. a set of eigenvectors): $\mathbf{e}_i$
• Expectation operator for a discrete random variable $X$: $E[X] = \sum_i x_i P(X = x_i)$
• Dot product: $\mathbf{a}^T\mathbf{b} = \sum_i a_i b_i$
• Diagonal extension operator:

Eigenvectors and eigenvalues of $\mathbf{R}$
• Set of eigen equations: $\mathbf{R}\mathbf{e}_i = \lambda_i\mathbf{e}_i$, $i = 1, \dots, N$
• Properties of the eigenvectors: since $\mathbf{R}$ is symmetric and positive semidefinite, the eigenvalues are real and non-negative, and the eigenvectors form an orthonormal basis ($\mathbf{e}_i^T\mathbf{e}_j = \delta_{ij}$).

Basis transformations to and from the eigenvector basis:
• Coder, transforming to the eigenvector basis: $\mathbf{y} = \mathbf{W}\mathbf{x}$, where the rows of $\mathbf{W}$ are the eigenvectors $\mathbf{e}_i^T$.
• Decoder, transforming back from the eigenvector basis: $\mathbf{x} = \mathbf{W}^T\mathbf{y}$, since orthonormality gives $\mathbf{W}^{-1} = \mathbf{W}^T$.
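The slides leave the coder and decoder matrices implicit; the following minimal numpy sketch (not from the original material; the names W, R, E and the toy data are our own) shows how the eigendecomposition of the correlation matrix yields the lossless coder/decoder pair described above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy zero-mean data with very different variances per axis
X = rng.standard_normal((1000, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])

R = X.T @ X / len(X)             # empirical correlation matrix E[x x^T]
lam, E = np.linalg.eigh(R)       # eigh, since R is symmetric
order = np.argsort(lam)[::-1]    # sort by decreasing eigenvalue (importance)
lam, E = lam[order], E[:, order]

W = E.T                          # coder: rows are the eigenvectors e_i^T
x = X[0]
y = W @ x                        # transform into the eigenvector basis
x_back = W.T @ y                 # decoder: W^-1 = W^T by orthonormality
assert np.allclose(x, x_back)    # lossless when all eigenvectors are kept
```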
PCA
• Suppose an element of our data set is a grayscale image map: the data frame at time instant $k$.
• Generate the data vector $\mathbf{x}[k]$ from it by reading the image column-wise. (Figure: a heart image unrolled column-wise into a vector.)
• The data can usually be modeled as a random vector variable $\mathbf{x}$, because we deal with processes assumed to be quasi-stationary and ergodic; e.g. consecutive frames of a video stream are usually "similar".
• Thus, within a time window, we examine the correlation matrix of the process: $\mathbf{R} = E[\mathbf{x}\mathbf{x}^T]$.

• The random data vector is assumed to have zero mean.
• Compute the eigenvectors and eigenvalues of the correlation matrix $\mathbf{R}$ and sort them in decreasing order of eigenvalue (by importance): $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$.

• Eigenvectors point in mutually orthogonal directions that capture the independent variance directions of the distribution.
• Eigenvalues are proportional to the magnitudes of those variances.
(Figure: eigenvalue-scaled eigenvectors as blue arrows, isocurves of the distribution as black dashed contours, sample points as red dots.)
• Eigenvectors correspond to the directions of the standard deviations of the data, understood as a sample set of a multidimensional random process.
• Eigenvalues equal the variances along those directions; their square roots give the standard deviations.
• If we adopt the hypothesis that this linear quantity (correlation) captures real-world "importance", then we can sort the dimensions by importance and
• compress by discarding the "not really important" dimensions.

• We can transform the data frame to the new basis losslessly: $\mathbf{y}[k] = \mathbf{W}\mathbf{x}[k]$, $\mathbf{x}[k] = \mathbf{W}^T\mathbf{y}[k]$.
• We want to discard the "unimportant" components to achieve compression:
• instead of using all eigenvectors, use only the first $M$ to get an approximation of $\mathbf{y}$.
• The approximation of the transformed data: $\hat{\mathbf{y}}[k] = \mathbf{W}_{1:M,:}\,\mathbf{x}[k]$.
(Diagram: transformation, prioritized representation, information discarding; only the important segment remains.)

PCA compression
Compression algorithm: $\mathbf{y}[k] = \mathbf{W}_{1:M,:}\,\mathbf{x}[k]$
Decompression algorithm: $\hat{\mathbf{x}}[k] = \mathbf{W}_{1:M,:}^T\,\mathbf{y}[k]$
(Diagram: compression algorithm, access line, decompression.)
PCA is the optimal lossy linear representation in terms of mean square error: $E\big[\|\mathbf{x} - \hat{\mathbf{x}}\|^2\big] = \sum_{i=M+1}^{N} \lambda_i$ is minimal.
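A short numeric illustration of the compression/decompression pair and of the error claim; this sketch (our own, with assumed toy data) checks that the empirical mean square error equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])

R = X.T @ X / len(X)             # empirical correlation matrix
lam, E = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

M = 2                            # keep only the two most important components
W_M = E[:, :M].T                 # coder: first M eigenvector rows
Y = X @ W_M.T                    # compression: M numbers per frame instead of 4
X_hat = Y @ W_M                  # decompression back to the original basis

mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse, lam[M:].sum())        # empirical MSE vs. sum of discarded eigenvalues
```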
The minimum is achieved because the eigenvalues are ordered in a monotonically decreasing sequence:
• the components we discard in the course of compression contribute to the error exactly as much as their corresponding eigenvalues;
• the mean square error is the sum of the eigenvalues corresponding to the discarded components, and due to the monotonically decreasing order of the eigenvalues we drop the minimum possible contribution.

PCA architecture
(Figure: PCA coder/decoder architecture with the access line.)

PCA compression overview
We assume that $\mathbf{R}$ is given for the process we are dealing with and that it remains the same over time (stationarity).
1. Solve the eigen problem for $\mathbf{R}$ to get the $M$ largest eigenvalues and the corresponding eigenvectors, with $M$ chosen from the quality requirement.
2. Assemble the coding/decoding matrices.
3. Use the coder to encode the data (a matrix-vector multiplication) and send the result through the access line.
4. Decode the data with the decoding matrix.

Difficulties of directly implementing PCA for video compression
• Nobody tells us the $\mathbf{R}$ of an underlying video stream.
• $\mathbf{R}$ does not stay "still" over time.
• Computing eigenvalues and eigenvectors is computationally expensive.
• In practice, normalizing the variances of the original data along each dimension before the PCA transformation is useful to get meaningful relative importance values after the transformation.

• Storage problems:
• $N$: $\dim(\mathbf{x}) = 640 \times 480 = 307200$ per color channel
• $N \times N$: $\dim(\mathbf{R}) = \dim(E[\mathbf{x}\mathbf{x}^T]) = 307200 \times 307200$
• If a 32-bit float is used to hold each element of $\mathbf{R}$, then sizeof($\mathbf{R}$) ≈ 350 GB per color channel is needed in memory to hold $\mathbf{R}$ or an estimate of it.

Estimation of the correlation matrix
• Using the assumed ergodicity of the process, we estimate $\mathbf{R}$ in a time window of length $K$: $\hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}[k]\mathbf{x}[k]^T$.
• From time to time the empirical correlation matrix has to be re-estimated.
• Note that we still assume quasi-stationarity within the $K$-length time window.

Overcoming the storage problem: introducing kernel-based PCA
• We use the fact that $\hat{\mathbf{R}}$ was constructed from $K$ outer products, so every eigenvector with a nonzero eigenvalue lies in the span of the data: $\mathbf{e} = \sum_{k} \alpha_k \mathbf{x}[k]$.
• Using this basis transform, we can rewrite the $N \times N$ eigen problem as a $K \times K$ eigen problem on the Gram matrix $G_{ij} = \mathbf{x}[i]^T\mathbf{x}[j]$.

Kernel-based PCA
• Storage comparison: $N$: $\dim(\mathbf{x}) = 640 \times 480 = 307200$ per color channel, so sizeof($\mathbf{R}$) ≈ 350 GB per color channel.
• With a window length of $K = 1000$ frames, sizeof($\mathbf{G}$) = 4 MB per color channel.
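A minimal sketch of the Gram-matrix route (our own illustration, assuming a linear kernel and zero-mean frames): the leading eigenvectors of the huge $\mathbf{R}$ are recovered from the small $K \times K$ Gram matrix without ever forming $\mathbf{R}$.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 50, 3000                      # K frames of dimension N, with K << N
X = rng.standard_normal((K, N))      # one (zero-mean) frame per row

G = X @ X.T / K                      # K x K Gram matrix: fits in memory
mu, A = np.linalg.eigh(G)
order = np.argsort(mu)[::-1]
mu, A = mu[order], A[:, order]

# Each eigenvector of R with a nonzero eigenvalue is a combination of frames:
# e = X^T a / ||X^T a||, and its eigenvalue equals the Gram eigenvalue mu.
E = X.T @ A
E /= np.linalg.norm(E, axis=0)       # normalize the columns to unit norm

Re = (X.T @ (X @ E[:, 0])) / K       # apply R = X^T X / K without forming it
assert np.allclose(Re, mu[0] * E[:, 0])
```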
The Oja algorithm
• The Oja algorithm is another solution to the storage problem and an alternative to the existing efficient algorithms for eigenvalue decomposition (EVD), which would otherwise have to be re-run from time window to time window.
• If we assume that we know the current weight estimate, can we compute the next one recursively, assuming the process changes slowly enough?

• The Oja algorithm is a nonlinear stochastic difference equation: $\mathbf{w}[k+1] = \mathbf{w}[k] + \mu\, y[k]\,(\mathbf{x}[k] - y[k]\,\mathbf{w}[k])$, where $y[k] = \mathbf{w}[k]^T\mathbf{x}[k]$; convergence is proven with the Kushner-Clark theorem.

• Lemma:
• Proof outline:

• We use the properties of the Rayleigh quotient $r(\mathbf{w}) = \frac{\mathbf{w}^T\mathbf{R}\mathbf{w}}{\mathbf{w}^T\mathbf{w}}$: its maximum is the largest eigenvalue of $\mathbf{R}$, attained at the corresponding eigenvector.
• Starting from the Rayleigh quotient and using Newton's method to find the maximum:

• We still need knowledge of $\mathbf{R}$, so we substitute the instantaneous estimate $\mathbf{x}[k]\mathbf{x}[k]^T$ for it (the same simplification discussed at the LMS and the Robbins-Monro algorithm).
• Stability can be proven by the Kushner-Clark theorem.

• Extending the algorithm with the Generalized Hebbian Algorithm, one can design a feed-forward neural network that captures all the eigenvectors of the correlation matrix.
• Generalized Hebbian rule:

The Oja algorithm with the Generalized Hebbian Algorithm (GHA)
• No knowledge of $\mathbf{R}$ is needed for the iteration.
• It can be implemented in a feed-forward neural network structure.
• For the $j$th neuron's $i$th weight: $\Delta w_{ji}[k] = \mu\, y_j[k]\left(x_i[k] - \sum_{l=1}^{j} w_{li}[k]\, y_l[k]\right)$.

Hebbian PCA
• Hebbian learning: $\mathbf{w}[k+1] = \mathbf{w}[k] + \mu\, y[k]\,\mathbf{x}[k]$
• Normalization: $\mathbf{w}[k+1] = \dfrac{\mathbf{w}[k] + \mu\, y[k]\,\mathbf{x}[k]}{\|\mathbf{w}[k] + \mu\, y[k]\,\mathbf{x}[k]\|}$
• Expanding the normalized update in a Taylor series in $\mu$ and keeping the first-order term yields the Oja rule above. (A GHA training sketch is given after the experiment figures below.)

Experiments
• We examine the eigenvalue distribution of a small video stream.
• We experimented on two small video streams: the first illustrates a calm aquarium; the second is an action scene with quick changes.
• We present these sequences at different compression ratios using the kernel and the Hebbian PCA.

• The video snippet consisted of 57 frames; a window length of 10 was used.
(Figure: eigenvalue distribution of the video stream.)

Experiments
(Figure: original frame, reconstruction with all eigenvectors, with 5 eigenvectors, and with 2 eigenvectors.)
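As promised above, a hedged sketch of GHA training (our own illustration; the learning rate, iteration count, and data are assumptions, not values from the slides). With a diagonal correlation matrix the rows of the weight matrix should converge to signed unit vectors along the highest-variance axes.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, mu = 8, 3, 0.005               # input dim, neurons, learning rate
std = np.array([3.0, 2.0, 1.5, 1.0, 0.5, 0.3, 0.2, 0.1])  # per-axis std devs

W = rng.standard_normal((M, N)) * 0.1    # row j holds neuron j's weights

for _ in range(50000):
    x = std * rng.standard_normal(N)     # zero-mean sample, R = diag(std**2)
    y = W @ x                            # forward pass of the FFNN layer
    # GHA update: dw_j = mu * y_j * (x - sum_{l<=j} y_l * w_l)
    W += mu * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

print(np.round(W, 2))   # rows approach +/- e_1, e_2, e_3 of R = E[x x^T]
```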
Experiments
(Figure: original frame, reconstruction with all eigenvectors, with 20 eigenvectors, and with 15 eigenvectors.)

Error function:
(Figure: reconstruction error versus the number of eigenvectors for the Aquarium and the Star Wars sequences.)

• Difference between the approximated and the true eigenvectors:
(Figure: eigenvector error versus the number of eigenvectors after 100 and 500 iterations.)

Problems with solutions
1. A simple eigenvalue problem:

2. An information flow, compressed from time window to time window (e.g. a video flow), has the following eigenvalues of its correlation matrix: If we want the mean square error of the PCA compression to be less than or equal to 0.01, how many neurons should we implement in the coder?

Solution:
• The mean square error is defined as the sum of the eigenvalues of the discarded components: $MSE(M) = \sum_{i=M+1}^{N}\lambda_i$.
• So at least 11 neurons must be used to achieve the desired error level.

3. Train the neuron below with the Oja algorithm with the following parameters:
a) Compute the weights after the 2nd iteration.
b) Can the following state be stable? Explain your answer!

Solution:
a) Compute the weights after the 2nd iteration:

b) Can the following state be stable? Explain your answer!
Let us check whether it can be an eigenvector: each and every eigenvector must have unit norm and form an orthonormal basis with the others. Since this vector has unit norm, it passes this check. Let us solve the eigen problem to be sure:

4. Assume that the correlation matrix of a weakly stationary process has the eigenvalues and eigenvectors shown in the table below. We want to compress the process with PCA such that the mean square error stays below the following threshold:
a) Give the compressed vector and the restored data, if a realization of the process is:
b) Give the mean square error.

Solution:
• First, rearrange the table in descending eigenvalue order.
• Next, compute how many eigenvectors are needed for the coder and decoder, and construct them (a sketch of this step follows).
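Since the slide's eigenvalue table did not survive extraction, the sketch below uses hypothetical stand-in values (lams and threshold are assumptions, not the slide's data) just to show how this step is computed: sort, then keep the smallest number of components whose discarded tail sum stays under the MSE threshold.

```python
import numpy as np

lams = np.array([0.02, 4.10, 0.90, 0.005])   # hypothetical, NOT the slide's table
threshold = 0.03                             # hypothetical MSE threshold

lams_sorted = np.sort(lams)[::-1]            # descending eigenvalue order

# tail[i] = sum of lams_sorted[i:], i.e. the MSE if the first i components are kept
tail = np.cumsum(lams_sorted[::-1])[::-1]
M = int(np.argmax(tail < threshold))         # smallest M with MSE below threshold
print(M, tail[M])                            # here: M = 2, MSE = 0.025
```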
• Let us check that the eigenvectors are proper eigenvectors:
• Using the coder, we transform the data to the new basis:
• Let us compute the restored data:
• Let us compute the empirical mean square error:

Problems
5. Design an FFNN for the compression layer and use the GHA to train its neurons for two iterations on Problem 4.
6. Show that if the random process is not centralized (not zero-mean), then the first eigenvalue corresponds to the mean and its eigenvector points in the direction of the mean.
7. Write a computer program that performs PCA (a sketch is given after the summary).

Summary
• Data compression
• Principal component analysis (PCA or KLT)
• Notation
• Summary of the algorithm
• Problems, extensions
• Hebbian PCA (the Oja algorithm)
• Experiments
• Problems

www.itk.ppke.hu
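For Problem 7, a minimal self-contained sketch of a PCA program (synthetic data; the frame dimension, window length, and function names are our own assumptions, not part of the original material):

```python
import numpy as np

def pca_fit(X):
    """Eigenvalues/eigenvectors of the empirical correlation matrix,
    sorted in decreasing eigenvalue order. X holds one frame per row."""
    R = X.T @ X / len(X)
    lam, E = np.linalg.eigh(R)
    order = np.argsort(lam)[::-1]
    return lam[order], E[:, order]

def pca_compress(X, E, M):
    return X @ E[:, :M]          # coder: project onto the first M eigenvectors

def pca_decompress(Y, E, M):
    return Y @ E[:, :M].T        # decoder: back to the original basis

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.standard_normal((500, 16)) @ np.diag(np.linspace(4.0, 0.1, 16))
    lam, E = pca_fit(X)
    Y = pca_compress(X, E, M=4)
    X_hat = pca_decompress(Y, E, M=4)
    # the empirical MSE should equal the sum of the discarded eigenvalues
    print(np.mean(np.sum((X - X_hat) ** 2, axis=1)), lam[4:].sum())
```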