1. Independent Component Analysis

2. The Origin of ICA: Factor Analysis

• Multivariate data are often thought to be indirect measurements
arising from some underlying sources, which cannot be directly
measured.
• Examples
– Educational and psychological tests use the answers to questionnaires
to measure the underlying intelligence and other mental abilities of
subjects.
– EEG brain scans measure the neuronal activity in various parts of the
brain indirectly via electromagnetic signals recorded at sensors placed
at various positions on the head.
• Factor analysis is a classical technique developed in the statistical
literature that aims to identify these latent sources.
• Independent component analysis (ICA) is a kind of factor analysis
that can uniquely identify the latent variables.

3. Latent Variables and Factor Analysis

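Latent variable model:

$$
\begin{aligned}
X_1 &= a_{11} S_1 + a_{12} S_2 + \cdots + a_{1p} S_p \\
X_2 &= a_{21} S_1 + a_{22} S_2 + \cdots + a_{2p} S_p \\
&\;\;\vdots \\
X_p &= a_{p1} S_1 + a_{p2} S_2 + \cdots + a_{pp} S_p
\end{aligned}
$$

In matrix form, $X = AS$, where $X$ holds the observed variables, $S$ the latent components, and $A$ is the mixing matrix.
Factor analysis attempts to find both the mixing coefficients and the
latent components, given some instances of the observed variables.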

4. Latent Variables and Factor Analysis…

Typically we require the latent variables to have unit variance and to be uncorrelated.
Thus, in the model $X = AS$, $\mathrm{cov}(S) = I$.
This representation has an ambiguity. Consider, for example, an orthogonal matrix $R$, and set $S^* = RS$ and $A^* = AR^T$, so that $X = AS = AR^T RS = A^* S^*$. Then

$$\mathrm{cov}(S^*) = R\,\mathrm{cov}(S)\,R^T = R I R^T = R R^T = I$$

So $X = A^* S^*$ is also a factor model with unit-variance, uncorrelated latent variables.
Classical factor analysis cannot remove this ambiguity; ICA can remove this ambiguity.
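The rotation ambiguity can be checked numerically. The following is a minimal sketch, assuming NumPy; the mixing matrix, the uniform sources, and the random orthogonal matrix R are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 200_000

# Hypothetical mixing matrix A and unit-variance, uncorrelated sources S (p x n).
A = rng.normal(size=(p, p))
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(p, n))  # each row has variance 1

# Any orthogonal R gives an equivalent factor model: A* = A R^T, S* = R S.
R, _ = np.linalg.qr(rng.normal(size=(p, p)))
A_star = A @ R.T
S_star = R @ S

X = A @ S
X_star = A_star @ S_star

same_data = np.allclose(X, X_star)   # both models produce the same observations
cov_S_star = np.cov(S_star)          # sample covariance, still close to I
```

Both (A, S) and (A*, S*) explain the data equally well under second-order constraints alone, which is why an extra criterion such as independence is needed.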

5. Classical Factor Analysis

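$$
\begin{aligned}
X_1 &= a_{11} S_1 + a_{12} S_2 + \cdots + a_{1q} S_q + \varepsilon_1 \\
X_2 &= a_{21} S_1 + a_{22} S_2 + \cdots + a_{2q} S_q + \varepsilon_2 \\
&\;\;\vdots \\
X_p &= a_{p1} S_1 + a_{p2} S_2 + \cdots + a_{pq} S_q + \varepsilon_p
\end{aligned}
$$

The $\varepsilon_j$'s are zero-mean, uncorrelated Gaussian noise terms.
q < p, i.e., the number of underlying latent factors is assumed to be less than
the number of observed components.
The covariance matrix takes this form:

$$\Sigma = A A^T + D_\varepsilon$$

where $D_\varepsilon$ is a diagonal matrix holding the noise variances.
Maximum likelihood estimation is used to estimate A.
However, the previous ambiguity remains here too: replacing $A$ by $AR^T$ for any orthogonal $q \times q$ matrix $R$ leaves $AA^T$ unchanged.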
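As a rough numerical check of the covariance structure of the classical factor model, the sketch below simulates data and compares the sample covariance with A Aᵀ plus a diagonal noise matrix. It assumes NumPy; the loading matrix, noise variances, and sample size are hypothetical values, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 5, 2, 400_000

A = rng.normal(size=(p, q))                 # hypothetical p x q loading matrix
noise_var = rng.uniform(0.2, 1.0, size=p)   # hypothetical Var(eps_j), j = 1..p

S = rng.normal(size=(q, n))                 # latent factors with cov(S) = I
eps = rng.normal(size=(p, n)) * np.sqrt(noise_var)[:, None]
X = A @ S + eps                             # observed variables, one column per sample

Sigma_model = A @ A.T + np.diag(noise_var)  # A A^T + diagonal matrix
Sigma_sample = np.cov(X)
rel_err = np.max(np.abs(Sigma_sample - Sigma_model)) / np.max(np.abs(Sigma_model))

# The ambiguity: rotating the loadings leaves the model covariance unchanged.
R, _ = np.linalg.qr(rng.normal(size=(q, q)))
A_rot = A @ R.T
same_cov = np.allclose(A_rot @ A_rot.T, A @ A.T)
```

Since the likelihood of a Gaussian model depends on the data only through the covariance, A and A Rᵀ cannot be told apart by maximum likelihood here.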

6. Independent Component Analysis

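• Step 1: Center data:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i; \qquad x_i \leftarrow (x_i - \bar{x})$$

• Step 2: Whiten data: compute the SVD of the centered data

$$[x_1 \cdots x_N] = U D V^T; \qquad x_i \leftarrow U \Lambda^{-1/2} U^T x_i, \quad \Lambda = D^2 / N$$

– Here $\Lambda$ holds the eigenvalues of the sample covariance $\frac{1}{N}\sum_i x_i x_i^T = U \Lambda U^T$
– After whitening, in the factor model $X = AS$ we have cov(x) = I, so $AA^T = I$ and A becomes orthogonal
• Step 3: Find an orthogonal A and unit-variance, non-Gaussian, independent S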
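Steps 1 and 2 can be sketched as follows. This is a minimal illustration assuming NumPy and data stored as a p x N array with one observation per column; the function name `center_and_whiten` and the toy mixing matrix are hypothetical. The sample covariance uses the 1/N convention.

```python
import numpy as np

def center_and_whiten(X):
    """Center and whiten a p x N data matrix (one observation per column)."""
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)             # Step 1: subtract the mean
    U, d, _ = np.linalg.svd(Xc, full_matrices=False)   # Xc = U diag(d) V^T
    # Sample covariance is U diag(d^2 / N) U^T, so its inverse square root is
    # U diag(sqrt(N) / d) U^T.
    W = U @ np.diag(np.sqrt(N) / d) @ U.T
    return W @ Xc, W                                   # Step 2: whitened data

rng = np.random.default_rng(2)
S = rng.uniform(-1, 1, size=(2, 5000))                 # toy non-Gaussian sources
A = np.array([[2.0, 1.0], [1.0, 1.5]])                 # hypothetical mixing matrix
X = A @ S + np.array([[3.0], [-1.0]])                  # mixed data with an offset

Z, W = center_and_whiten(X)
cov_Z = Z @ Z.T / Z.shape[1]                           # identity up to rounding
```

After this step only an orthogonal transformation remains to be found, which is what Step 3 searches for.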

7. Example: PCA and ICA

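$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} s_1 \\ s_2 \end{bmatrix}
$$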
Blind source separation (cocktail party problem)

8. PCA vs. ICA

PCA
1. Find projections to minimize reconstruction error
– Variance of projected data is as large as possible
2. 2nd-order statistics needed (cov(x))

ICA
1. Find “interesting” projections
– Projected data look as independent as possible
2. Higher-order statistics needed to measure degree of independence

9. Computing ICA

Step 3: Find an orthogonal A and unit-variance, non-Gaussian, independent S.
The computational approaches are mostly based on information-theoretic criteria:
• Kullback-Leibler (KL) divergence
• Negentropy
A different approach that emerged more recently is called the “Product Density Approach”.

10. ICA: KL Divergence Criterion

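• x is zero-mean and whitened
• KL divergence measures “distance” between
two probability densities
– Find A such that KL(.) is minimized, with $y = A^T x$:

$$\mathrm{KL}\Big(f(y)\,\Big\|\,\prod_i f_i(y_i)\Big) = \sum_i H(y_i) - H(y)$$

where $f(y)$ is the joint density and $\prod_i f_i(y_i)$ is the independent density (the product of the marginals).
H is differential entropy:

$$H(y) = -\int f(y)\log f(y)\,dy = -E[\log f(y)]$$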

11. ICA: KL Divergence Criterion…

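• Theorem for random variable transformation says: for $y = A^T x$,

$$H(y) = H(x) + \log\lvert\det A^T\rvert = H(x)$$

since $\lvert\det A^T\rvert = 1$ for orthogonal A.
• $H(x)$ does not depend on A, so the criterion reduces to: minimize $\sum_i H(y_i) = \sum_i H(a_i^T x)$ with respect to orthogonal A, where $a_i$ is the i-th column of A.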

12. ICA: Negentropy Criterion

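• Differential entropy H(.) is not invariant to scaling
of the variable
• Negentropy is a scale-normalized version of H(.):

$$J(s) = H(s_{\mathrm{gauss}}) - H(s)$$

where $s_{\mathrm{gauss}}$ is a Gaussian r.v. with the same variance as s
• Negentropy measures the departure of a r.v. s
from a Gaussian r.v. with the same variance
• Optimization criterion:

$$\max_{A\ \mathrm{orthogonal}} \sum_i J(y_i), \qquad y = A^T x$$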
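A small worked example may help here. The sketch below uses the closed-form differential entropies of a Gaussian, 0.5 log(2πe σ²), and of a uniform density of width w, log(w), to compute the negentropy of a uniform variable. The helper names are hypothetical; only the Python standard library is assumed.

```python
import math

def gaussian_entropy(var):
    """Differential entropy of a Gaussian with variance `var`."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def uniform_entropy(width):
    """Differential entropy of a uniform density on an interval of this width."""
    return math.log(width)

def uniform_negentropy(var):
    """J = H(Gaussian with same variance) - H(uniform with variance `var`)."""
    width = math.sqrt(12 * var)   # a uniform density of this width has variance `var`
    return gaussian_entropy(var) - uniform_entropy(width)

J_uniform = uniform_negentropy(1.0)          # approximately 0.176
J_uniform_scaled = uniform_negentropy(4.0)   # same value: negentropy ignores scale
J_gaussian = gaussian_entropy(1.0) - gaussian_entropy(1.0)   # exactly 0

# Entropy itself does change with scale: doubling the standard deviation adds log(2).
entropy_shift = gaussian_entropy(4.0) - gaussian_entropy(1.0)
```

The Gaussian has zero negentropy and any other density with the same variance has a positive value, so maximizing negentropy pushes each projection away from Gaussianity.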

13. ICA: Negentropy Criterion…

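• Approximate the negentropy from data by:

$$J(y) \approx \big(E[G(y)] - E[G(\nu)]\big)^2, \qquad \nu \sim N(0, 1)$$

where G is a smooth non-quadratic function, e.g. $G(u) = \log\cosh(u)$
• FastICA is based on negentropy. Free software is available in Matlab,
C++, Python…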
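To make the idea concrete, here is a minimal NumPy sketch of a symmetric fixed-point iteration in the style of FastICA, using the log cosh contrast (whose derivative is tanh). It is an illustrative sketch under stated assumptions, not the reference FastICA implementation: the function names, the toy sources, and the mixing matrix `A_mix` are all hypothetical. The matrix `W` below plays the role of Aᵀ in the slides' notation.

```python
import numpy as np

def whiten(X):
    """Center and whiten a p x N data matrix (1/N covariance convention)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, d, _ = np.linalg.svd(Xc, full_matrices=False)
    return (U @ np.diag(np.sqrt(X.shape[1]) / d) @ U.T) @ Xc

def sym_orth(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

def fastica_sketch(Z, n_iter=200, tol=1e-8, seed=0):
    """Estimate an orthogonal unmixing matrix W for whitened data Z (p x N)."""
    p, N = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_orth(rng.normal(size=(p, p)))
    for _ in range(n_iter):
        Y = W @ Z
        g = np.tanh(Y)              # derivative of G(u) = log cosh(u)
        g_prime = 1.0 - g ** 2
        # Row-wise fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = sym_orth(g @ Z.T / N - np.diag(g_prime.mean(axis=1)) @ W)
        # Converged when every row is unchanged up to sign.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W

rng = np.random.default_rng(4)
N = 20_000
S = np.vstack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), size=N),   # sub-Gaussian source
    rng.laplace(scale=1 / np.sqrt(2), size=N),      # super-Gaussian source
])
A_mix = np.array([[1.0, 0.5], [0.7, 1.2]])          # hypothetical mixing matrix
Z = whiten(A_mix @ S)

W = fastica_sketch(Z)
S_hat = W @ Z                                        # estimated sources

# Sources are recovered only up to order and sign, so compare by |correlation|.
corr = np.corrcoef(np.vstack([S_hat, S]))[:2, 2:]
match = np.abs(corr).max(axis=1)
```

For real work, a maintained implementation is preferable to this sketch; the point here is only to show how whitening, the orthogonality constraint, and the non-Gaussianity contrast fit together.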

14. ICA Filter Bank for Image Processing

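An image patch is modeled as a weighted sum of basis images (basis functions):

$$x = [a_1\ a_2\ \cdots\ a_N]\, s = As$$

where $x$ is the image patch and the columns $a_1, \ldots, a_N$ of $A$ are the basis functions (a.k.a. ICA filter bank).
The filter responses are

$$s = A^{-1} x = A^T x$$

Rows of $A^T$ (equivalently, columns of $A$) are the filters.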
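The relation between basis functions and filter responses can be sketched in a few lines. The example assumes NumPy and uses a random orthogonal matrix as a stand-in for a learned ICA basis; the 4x4 patch size and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
patch_side = 4
N = patch_side * patch_side

# Stand-in for a learned ICA basis: a random orthogonal N x N matrix A
# (orthogonality is what makes A^{-1} equal to A^T).
A, _ = np.linalg.qr(rng.normal(size=(N, N)))

patch = rng.normal(size=(patch_side, patch_side))   # toy (whitened) image patch
x = patch.reshape(-1)                               # patch as a length-N vector

s = A.T @ x          # filter responses: one inner product per filter
x_rec = A @ s        # weighted sum of the basis functions

# The same reconstruction written as a sum of basis images.
basis_images = A.T.reshape(N, patch_side, patch_side)   # image k is column a_k of A
patch_rec = np.tensordot(s, basis_images, axes=1)
```

Each response `s[k]` is the inner product of the patch with filter `a_k`, and the same `a_k` serves as the basis image weighted by `s[k]` in the reconstruction.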

15. Texture and ICA Filter Bank

[Figure: training textures and the 12x12 ICA basis functions (ICA filters)]

16. Segmentation By ICA FB

[Figure: a texture image I is passed through an ICA filter bank with n filters, giving I1, I2, …, In, which are the filter responses; from these the segmented image C is obtained]
Above is an unsupervised setting.
Segmentation (i.e., classification in this context) can also be performed by
a supervised method on the output feature images I1, I2, …, In.
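The unsupervised pipeline can be sketched as follows. Several pieces here are assumptions that the slide does not specify: the two hand-made difference filters stand in for learned ICA filters, squared responses are used as features, and a small k-means routine does the grouping. Only NumPy is assumed, and all names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter_bank_responses(image, filters):
    """Correlate the image with each filter (valid region only): I -> I1..In."""
    windows = sliding_window_view(image, filters.shape[1:])
    return np.einsum("ijkl,nkl->nij", windows, filters)

def kmeans(F, k, n_iter=20):
    """Plain Lloyd iterations on feature rows F; deterministic initialization."""
    centers = F[np.linspace(0, len(F) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        d2 = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    return labels

# Toy texture image: vertical stripes on the left, horizontal stripes on the right.
size = 32
r, c = np.mgrid[0:size, 0:size]
image = np.where(c < size // 2, np.sin(2 * np.pi * c / 4), np.sin(2 * np.pi * r / 4))

# Hypothetical 2x2 filters (NOT learned ICA filters): horizontal and vertical differences.
filters = np.array([
    [[1.0, -1.0], [0.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
])

responses = filter_bank_responses(image, filters)        # shape (n, 31, 31)
features = (responses ** 2).reshape(len(filters), -1).T  # one feature row per pixel
C = kmeans(features, k=2).reshape(responses.shape[1:])   # segmented image
```

In a supervised variant, the same feature rows would be passed to a classifier trained on labeled pixels instead of to k-means.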

17. On PCA and ICA

• PCA and ICA differ in how they choose projections
– Different principles: least squares (PCA),
independence (ICA)
• For data compression, PCA would be a
good choice
• For discovering structure in data, ICA
would be a reasonable choice
