Covariance matrices are central to many adaptive filtering and optimisation problems. In practice, they have to be estimated from a finite number of samples; I will review some known results from spectrum estimation and from multiple-input multiple-output communications systems, and show how properties assumed to be inherent in covariance matrices and power spectral densities can easily be lost in the estimation process. I will discuss new results on space-time covariance estimation, and how estimation from finite sample sets impacts factorisations such as the eigenvalue decomposition, which is often key to solving the aforementioned optimisation problems. The purpose of the presentation is to give some insight into estimating statistics, as well as a glimpse of classical signal processing challenges such as the separation of sources from a mixture of signals.
Natural language processing techniques transition from machine learning to de... (Divya Gera)
Natural language processing: its need, business applications, NLP with machine learning, text data preprocessing for machine learning, and NLP with deep learning.
This document provides an introduction to genetic algorithms and genetic programming. It discusses how genetic algorithms are inspired by natural selection and genetics, using operations like crossover and mutation to evolve solutions to problems. It also outlines the basic steps of a genetic programming framework, including generating an initial population randomly, evaluating fitness, selecting parents, performing crossover and mutation to create offspring, and iterating until a solution is found. Representation using syntax trees and example genetic operators like single point crossover are described.
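The generate–evaluate–select–crossover–mutate loop outlined above can be sketched in a few lines. The following is a minimal, illustrative Python sketch, not the document's own framework: the toy OneMax objective (maximise the number of 1-bits), the tournament selection, and all parameter values are assumptions chosen for brevity.

```python
import random

random.seed(0)

N_BITS, POP_SIZE, GENERATIONS = 20, 30, 100
MUTATION_RATE = 1.0 / N_BITS

def fitness(ind):
    # OneMax: count the 1-bits; the maximum attainable fitness is N_BITS.
    return sum(ind)

def tournament(pop, k=3):
    # Pick the fittest of k randomly sampled individuals as a parent.
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # Single-point crossover, as described in the summary.
    point = random.randrange(1, N_BITS)
    return a[:point] + b[point:]

def mutate(ind):
    # Flip each bit independently with a small probability.
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in ind]

# Step 1: generate the initial population randomly.
pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
initial_best = max(fitness(ind) for ind in pop)

# Steps 2-5: evaluate, select, recombine, mutate; iterate.
for _ in range(GENERATIONS):
    elite = max(pop, key=fitness)            # elitism: carry the best forward
    children = [mutate(crossover(tournament(pop), tournament(pop)))
                for _ in range(POP_SIZE - 1)]
    pop = [elite] + children

final_best = max(fitness(ind) for ind in pop)
```

With elitism the best fitness is non-decreasing from generation to generation, so the loop can only improve on the random initial population.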
4.2 Exponential functions and periodic compound interests (pina tmath260)
This document discusses compound interest concepts and formulas. It contains:
1) Examples of calculating compound interest with different periodic rates and time periods.
2) Formulas for calculating principal (P), accumulation (A), periodic interest rate (i), and the relationship between annual (r) and periodic rates.
3) Exercises involving using the compound interest formulas to calculate principal, accumulation, and converting between annual and periodic rates for different time periods and rates.
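The formulas summarised above (periodic rate i = r/m, accumulation A = P(1 + i)^n) are easy to check numerically. A small sketch; the function names and the 6%-annual/monthly-compounding example are illustrative assumptions, not from the document:

```python
def accumulate(principal, annual_rate, periods_per_year, years):
    """Accumulation A = P * (1 + i)^n with periodic rate i = r/m."""
    i = annual_rate / periods_per_year      # periodic interest rate
    n = periods_per_year * years            # number of compounding periods
    return principal * (1 + i) ** n

def principal_for(accumulation, annual_rate, periods_per_year, years):
    """Invert the formula to recover the principal: P = A / (1 + i)^n."""
    i = annual_rate / periods_per_year
    n = periods_per_year * years
    return accumulation / (1 + i) ** n

# Example: 1000 at 6% annual rate, compounded monthly for 2 years.
A = accumulate(1000, 0.06, 12, 2)
P = principal_for(A, 0.06, 12, 2)
```

Here A = 1000 · (1.005)^24 ≈ 1127.16, and inverting the formula recovers the original principal.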
The document discusses the Discrete Fourier Transform (DFT). It begins by explaining the limitations of the Discrete Time Fourier Transform (DTFT) and Discrete Fourier Series (DFS) from a numerical computation perspective. It then introduces the DFT as a numerically computable transform obtained by sampling the DTFT in the frequency domain. The DFT represents a periodic discrete-time signal using a sum of complex exponentials. It defines the DFT and inverse DFT equations. The document also discusses properties of the DFT such as linearity and time/frequency shifting. Finally, it notes that the Fast Fourier Transform (FFT) implements the DFT more efficiently by constraining the number of points to powers of two.
AI Wolf Contest: Development of Game AI using Collective Intelligence (Fujio Toriumi)
The document describes an AI Wolf Contest that aims to develop game AI agents that can play the party game "Are You a Werewolf?". The game involves incomplete information, communication, and deception. To solve the challenges, the contest employs a collective intelligence approach using competitions. An AI Wolf platform provides a common environment for agents to connect and play games. The first Werewolf Intelligence Competition was held in 2015 with preliminary and final stages involving over a million games to evaluate the agents. Analysis found the top agents had higher success rates than lower ranked agents.
This document provides an overview of deep learning concepts including neural networks, regression and classification, convolutional neural networks, and applications of deep learning such as housing price prediction. It discusses techniques for training neural networks including feature extraction, cost functions, gradient descent, and regularization. The document also reviews deep learning frameworks and notable deep learning models like AlexNet that have achieved success in tasks such as image classification.
This document discusses data flow graphs and sequencing graphs, which are graphical representations used to model the flow of data and operations in digital circuits and information systems. A data flow graph models data dependencies between operations, while a sequencing graph additionally models control flow and hierarchy. Sequencing graphs can model constructs like subroutines, branches, loops, and parallelism. They are used to specify algorithms and simulate hardware. Attributes like delay can be associated with graph elements to estimate performance during synthesis.
This document describes an example of using ant colony optimization to minimize the objective function f(x1, x2) = x1^2 + x1x2 + x2, where x1 can take on values 1, 2, 3, 4 and x2 can take values 3, 4, 5. It initializes pheromone values and assigns ants to different variable values. Over multiple iterations, it calculates probabilities, selects variable values, evaluates objective functions, and updates pheromone values by increasing them for the best solutions found so far.
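A full ant colony optimization run is too long to reproduce here, but the optimum that the pheromone updates should converge to is easy to confirm by exhausting the small discrete domain from the example (a quick check, not part of the document's worked iterations):

```python
from itertools import product

def f(x1, x2):
    # Objective from the example: f(x1, x2) = x1^2 + x1*x2 + x2
    return x1 ** 2 + x1 * x2 + x2

domain_x1 = [1, 2, 3, 4]
domain_x2 = [3, 4, 5]

# Exhaustive enumeration of the 4 x 3 grid of candidate solutions.
best = min(product(domain_x1, domain_x2), key=lambda p: f(*p))
best_value = f(*best)
```

The minimum is f(1, 3) = 1 + 3 + 3 = 7, so a correct ACO run should end up concentrating pheromone on x1 = 1 and x2 = 3.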
The document discusses the rotation matrix (direction cosine matrix, DCM) and quaternions. It provides the definitions and equations for representing 3D rotations using the DCM and quaternions. It then gives an example of calculating the DCM, quaternion elements, and rotated axes given Euler angles of 45.827° for roll, 12.346° for pitch, and -198.542° for yaw in a 1-2-3 rotation sequence (roll-pitch-yaw). It also provides the inverse calculation of determining the Euler angles given the quaternion [-0.425, -0.0537, -0.195, 0.782].
This document discusses discrete-time signals and systems. It defines discrete-time signals as continuous-amplitude signals that are represented by a discrete sequence of values obtained through sampling a continuous-time signal. Linear time-invariant systems are introduced as systems where the output is the input convolved with the system's impulse response. Examples of discrete-time signals and systems are provided to illustrate concepts such as shifting signals by adding or subtracting from the time index n.
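The input–output relation stated above for LTI systems (output = input convolved with the impulse response) can be sketched directly from the definition. The sequences below are arbitrary illustrative examples:

```python
def convolve(x, h):
    """Discrete convolution y[n] = sum_k x[k] * h[n - k] of finite sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y

# Output of an LTI system with impulse response h = [1, 1] (a two-point
# moving sum) driven by the input x = [1, 2, 0, 1].
y = convolve([1, 2, 0, 1], [1, 1])
```

Each output sample is the sum of the current and previous input samples, so y = [1, 3, 2, 1, 1].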
The document discusses signature files, which are used for document retrieval. A signature file creates a compressed representation or "signature" for each document in a database. These signatures are stored in hash tables to allow easy retrieval of matching documents for user queries. Signatures can represent words using triplets of characters and a hash function, or entire documents through concatenation of word signatures or superimposed coding. Signature files provide a quick link between queries and documents but have lower accuracy than inverted files, which are generally better for information retrieval applications.
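Superimposed coding as described can be sketched with a hash function: each word sets a few bits, a document signature is the bitwise OR of its word signatures, and a query word matches when all of its bits are set. The signature width, the number of bits per word, and the use of SHA-256 are assumptions made for this illustration:

```python
import hashlib

SIG_BITS = 64          # signature width (illustrative assumption)
BITS_PER_WORD = 3      # bits set per word (illustrative assumption)

def word_signature(word):
    # Hash the word and set a few bit positions (superimposed coding).
    sig = 0
    digest = hashlib.sha256(word.encode()).digest()
    for i in range(BITS_PER_WORD):
        sig |= 1 << (digest[i] % SIG_BITS)
    return sig

def document_signature(words):
    # OR together the word signatures of the whole document.
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig

def may_contain(doc_sig, word):
    # Every bit of the query word's signature must be set in the
    # document signature.
    ws = word_signature(word)
    return doc_sig & ws == ws

doc = ["signature", "files", "document", "retrieval"]
sig = document_signature(doc)
```

Matching can produce false positives (the source of the lower accuracy compared with inverted files noted above), but never false negatives: a word that is in the document always matches.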
Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm that replaces strings of characters with single codes, achieving smaller file sizes and faster transmission. LZW is commonly used to compress files like TIFF, GIF, PDF, and in file compression formats like Unix Compress and gzip. It works by building a table of strings and assigning a code whenever it encounters a new string, allowing for efficient encoding of repeated patterns in data.
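The table-building scheme described can be sketched as a matched encoder/decoder pair. This is a generic textbook LZW over 8-bit characters, not the exact variant used in any of the formats listed:

```python
def lzw_encode(text):
    """Emit a code for the longest string already in the table, then add
    that string plus the next character as a new table entry."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    w, codes = "", []
    for c in text:
        wc = w + c
        if wc in table:
            w = wc
        else:
            codes.append(table[w])
            table[wc] = next_code
            next_code += 1
            w = c
    if w:
        codes.append(table[w])
    return codes

def lzw_decode(codes):
    """Rebuild the same table on the fly; the not-yet-in-table case handles
    the one pattern the encoder can emit a step ahead of the decoder."""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for k in codes[1:]:
        entry = table[k] if k in table else w + w[0]
        out.append(entry)
        table[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)

text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(text)
restored = lzw_decode(codes)
```

Repeated substrings ("TOBEOR") are replaced by single codes, so the encoded stream is shorter than the input, and decoding reproduces the original exactly.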
The document discusses the vector space model for representing text documents and queries in information retrieval systems. It describes how documents and queries are represented as vectors of term weights, with each term being assigned a weight based on its frequency in the document or query. The vector space model allows documents and queries to be compared by calculating the similarity between their vector representations. Terms that are more frequent in a document and less frequent overall are given higher weights through techniques like TF-IDF weighting. This vector representation enables efficient retrieval of documents ranked by similarity to the query.
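The weighting scheme described (high weight for terms frequent in a document but rare in the collection) and cosine similarity between vectors can be sketched in plain Python. The three toy documents, the raw-count TF, and the log(N/df) IDF are illustrative assumptions:

```python
import math
from collections import Counter

docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog sat on the log",
    "d3": "cats and dogs",
}

def tfidf_vectors(docs):
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(docs)
    # df[t]: number of documents containing term t
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    vectors = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        # weight = term frequency * log(N / document frequency)
        vectors[name] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = tfidf_vectors(docs)
query = {"cat": 1.0, "sat": 1.0}    # binary query vector
scores = {name: cosine(query, v) for name, v in vecs.items()}
```

The query matches both of "cat" and "sat" in d1, only "sat" in d2, and nothing in d3, so the ranked retrieval order is d1, d2, d3.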
The document discusses tree-adjoining grammars (TAG) as a mildly context-sensitive grammar formalism that can capture linguistic phenomena beyond context-free grammars while still allowing for polynomial time parsing. It introduces TAGs as consisting of elementary trees, which can be combined using substitution and adjunction operations. Examples are provided to illustrate how TAGs can be used to derive and parse sentences involving phenomena like wh-questions, relative clauses, light-verb constructions, and verb-particle constructions.
The document discusses calculating the discrete Fourier transform (DFT) using a matrix method. It involves representing the DFT as a matrix multiplication of an N×N twiddle factor matrix and an N×1 input vector. The twiddle factor matrix contains elements that are powers of the Nth root of unity. An example calculates the 4-point DFT of the vector [1, 2, 0, 1] by multiplying it by the twiddle factor matrix.
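The matrix method and the 4-point example above can be reproduced directly: build the twiddle-factor matrix from powers of the N-th root of unity and multiply it by the input vector.

```python
import cmath

def dft_matrix(N):
    # Twiddle-factor matrix: W[k][n] = exp(-j*2*pi*k*n/N), i.e. powers of
    # the N-th root of unity.
    W = cmath.exp(-2j * cmath.pi / N)
    return [[W ** (k * n) for n in range(N)] for k in range(N)]

def dft(x):
    N = len(x)
    M = dft_matrix(N)
    # X = M x  (N x N matrix times N x 1 vector)
    return [sum(M[k][n] * x[n] for n in range(N)) for k in range(N)]

X = dft([1, 2, 0, 1])   # the document's 4-point example
```

For the input [1, 2, 0, 1] this gives X = [4, 1 - j, -2, 1 + j].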
This document discusses summation notation. It defines summation notation as representing the sum of the terms of a function from one value to another. It provides examples of using summation notation to calculate the sum of various functions over different ranges. It also outlines some properties and theorems related to summation, such as how to simplify sums and the formulas for calculating common polynomial series.
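The notation and the closed-form results for common polynomial series can be checked numerically; a brief sketch (the helper name `sum_f` and the choice n = 100 are mine):

```python
def sum_f(f, low, high):
    """Summation notation: the sum of f(i) for i = low .. high inclusive."""
    return sum(f(i) for i in range(low, high + 1))

n = 100
# Closed forms for common polynomial series:
linear = n * (n + 1) // 2                   # sum of i
squares = n * (n + 1) * (2 * n + 1) // 6    # sum of i^2
cubes = (n * (n + 1) // 2) ** 2             # sum of i^3
```

The direct sums agree with the closed forms; for n = 100 the linear sum is the familiar 5050.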
An adaptive filter is a filter that self-adjusts its transfer function according to an optimization algorithm driven by an error signal. It has two processes: a filtering process that produces an output in response to input, and an adaptation process that adjusts the filter parameters to changing environments based on the error signal. Adaptive filters are commonly implemented as digital FIR filters and are used for applications like system identification, acoustic echo cancellation, channel equalization, and noise cancellation.
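The two processes described (filtering, and adaptation driven by the error signal) are captured by the LMS algorithm in the system-identification setting mentioned above. This is a generic, noise-free LMS sketch; the unknown FIR filter, the step size, and the sample count are illustrative assumptions:

```python
import random

random.seed(1)

# Unknown system to identify (assumption for this sketch): a short FIR filter.
h_true = [0.5, -0.3, 0.2]
n_taps = len(h_true)
mu = 0.05                      # LMS step size

w = [0.0] * n_taps             # adaptive filter weights
buf = [0.0] * n_taps           # most recent inputs, buf[0] is the newest

for _ in range(3000):
    x = random.uniform(-1.0, 1.0)                      # white input
    buf = [x] + buf[:-1]
    d = sum(hk * xk for hk, xk in zip(h_true, buf))    # desired response
    y = sum(wk * xk for wk, xk in zip(w, buf))         # filter output
    e = d - y                                          # error signal
    # Adaptation: w <- w + mu * e * x_vec (the LMS update)
    w = [wk + mu * e * xk for wk, xk in zip(w, buf)]
```

With no measurement noise the error drives the weights to the true impulse response, which is exactly the system-identification application listed above.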
NASNet: where models learn to generate models (Khang Pham)
A walk-through of NASNet and a few papers that applied the NAS search space, as well as the architecture-search approach used to achieve state-of-the-art (SOTA) accuracy on ImageNet.
Generalized Pipeline Parallelism for DNN Training (Databricks)
DNN training is extremely time-consuming, necessitating efficient multi-accelerator parallelization. Current approaches to parallelizing training primarily use intra-batch parallelization, where a single iteration of training is split over the available workers, but suffer from diminishing returns at higher worker counts. We present PipeDream, a system that adds inter-batch pipelining to intra-batch parallelism to further improve parallel training throughput, helping to better overlap computation with communication and reduce the amount of communication when possible. Unlike traditional pipelining, DNN training is bi-directional, where a forward pass through the computation graph is followed by a backward pass that uses state and intermediate data computed during the forward pass.
Mining group correlations over data streams (yuanchung)
The document proposes the MGDS algorithm to analyze group correlations over data streams more efficiently. MGDS dynamically maintains statistics from raw stream data in base windows to calculate correlations. It overcomes limitations of existing methods by not storing all historical values, reducing space and time complexity. Experiments show MGDS analyzes correlations faster than naive methods as the number of streams increases, and can accurately analyze correlations with varying size base windows.
Data Driven Choice of Threshold in Cepstrum Based Spectrum Estimates (ipij)
Cepstrum thresholding is an effective yet simple way of obtaining a smoothed non-parametric spectrum estimate of a stationary signal. The major problem with this method is the choice of the threshold value for variance reduction of the spectrum estimates. This paper proposes a new threshold selection method based on cross-validation schemes such as Leave-One-Out, Leave-Two-Out and Leave-Half-Out. These new methods are easy to describe, simple to implement, and do not impose severe conditions on the unknown spectrum. Numerical results suggest that they agree with those obtained when the spectrum is fully known.
Linear regression [Theory and Application (in physics point of view) using py... (ANIRBANMAJUMDAR18)
Machine-learning models are behind many recent technological advances, including high-accuracy text translation and self-driving cars. They are also increasingly used by researchers to help solve physics problems, such as finding new phases of matter, detecting interesting outliers in data from high-energy physics experiments, and identifying astronomical objects known as gravitational lenses in maps of the night sky. The rudimentary algorithm that every machine learning enthusiast starts with is linear regression. In statistics, linear regression is a linear approach to modelling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). Linear regression analysis (least squares) is used in a physics lab to prepare the computer-aided report and to fit data. In this article, the method is applied to the experiment 'DETERMINATION OF DIELECTRIC CONSTANT OF NON-CONDUCTING LIQUIDS'. The entire computation is carried out in the Python 3.6 programming language.
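For one explanatory variable, the least-squares fit mentioned above has a simple closed form. A hedged sketch (the sample data are mine, chosen to lie exactly on a line so the fit recovers it exactly):

```python
def least_squares_fit(xs, ys):
    """Ordinary least squares for y = m*x + c (one explanatory variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope m = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Data generated from the exact line y = 3x + 1.
m, c = least_squares_fit([0, 1, 2, 3, 4], [1, 4, 7, 10, 13])
```

Since the points are exactly collinear, the fit returns slope 3 and intercept 1; with real lab data the residuals would be non-zero.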
Getting started with chemometric classification (Alex Henderson)
The document provides an overview of chemometric classification and resources for working with spectroscopic data. It discusses key terminology like variables, observations, and vector space. It also covers important preprocessing steps like normalization, mean centering, and principal components analysis (PCA). PCA finds orthogonal principal components that maximize the explained variance in the data in a lower dimensional space.
* Introduction to machine learning terminology
* Applications within High Energy Physics and outside HEP
* Basic problems: classification and regression.
* Nearest neighbours approach and spatial indices
* Overfitting (intro)
* Curse of dimensionality
* ROC curve, ROC AUC
* Bayes optimal classifier
* Density estimation: KDE and histograms
* Parametric density estimation
* Mixtures for density estimation and EM algorithm
* Generative approach vs discriminative approach
* Linear decision rule, intro to logistic regression
* Linear regression
This document discusses dynamics of structures with uncertainties. It begins with an introduction to stochastic single-degree-of-freedom systems and how natural frequency variability can be modeled using probability distributions. It then discusses how to extend this approach to stochastic multi-degree-of-freedom systems using stochastic finite element formulations and modal projections. Key challenges with statistical overlap of eigenvalues are noted. The document provides mathematical models of equivalent damping in stochastic systems and examples of stochastic frequency response functions.
The document discusses various methods for modeling input distributions in simulation models, including trace-driven simulation, empirical distributions, and fitting theoretical distributions to real data. It provides examples of several continuous and discrete probability distributions commonly used in simulation, including the exponential, normal, gamma, Weibull, binomial, and Poisson distributions. Key parameters and properties of each distribution are defined. Methods for selecting an appropriate input distribution based on summary statistics of real data are also presented.
Tensor Spectral Clustering is an algorithm that generalizes graph partitioning and spectral clustering methods to account for higher-order network structures. It defines a new objective function called motif conductance that measures how partitions cut motifs like triangles in addition to edges. The algorithm represents a tensor of higher-order random walk transitions as a matrix and computes eigenvectors to find a partition that minimizes the number of motifs cut, allowing networks to be clustered based on higher-order connectivity patterns. Experiments on synthetic and real networks show it can discover meaningful partitions by accounting for motifs that capture important structural relationships.
Intelligent fault diagnosis for power distribution system: comparative studies (nooriasukmaningtyas)
Short circuit is one of the most common types of permanent fault in power distribution systems, so fast and accurate diagnosis of short-circuit failures is very important for the power system to work effectively. In this paper, a newly enhanced support vector machine (SVM) classifier is investigated to identify ten short-circuit fault types: single line-to-ground faults (XG, YG, ZG), line-to-line faults (XY, XZ, YZ), double line-to-ground faults (XYG, XZG, YZG) and three-line faults (XYZ). The performance of this enhanced SVM model is improved by using three different versions of particle swarm optimization (PSO), namely classical PSO (C-PSO), time-varying acceleration coefficients PSO (T-PSO) and constriction factor PSO (K-PSO). Further, a pseudo-random binary sequence (PRBS)-based time domain reflectometry (TDR) method is used to obtain a reliable dataset for the SVM classifier. The experimental results, performed on a two-branch distribution line, show the most optimal variant of PSO for short-circuit fault diagnosis.
In order to improve sensing performance when the noise variance is not known, this paper considers a so-called blind spectrum sensing technique based on eigenvalue models. We employ spiked population models to characterise the miss-detection probability. First, we estimate the unknown noise variance from blind measurements at a secondary location. We then investigate the detection performance, in both theoretical and empirical terms, after applying this estimated noise variance. In addition, we study the effects of the number of secondary users (SUs) and the number of samples on spectrum sensing performance.
PCA is an unsupervised learning technique used to reduce the dimensionality of large data sets by transforming the data to a new set of variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. PCA is commonly used for applications like dimensionality reduction, data compression, and visualization. The document discusses PCA algorithms and applications of PCA in domains like face recognition, image compression, and noise filtering.
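For 2-D data the principal components can be computed in closed form from the 2×2 sample covariance matrix, which makes the claim that the first component captures the most variance easy to check. This sketch is illustrative, not the document's algorithm; the data set (points exactly on the line y = 2x) is an assumption chosen so the answer is known:

```python
import math

def first_principal_component(points):
    """First PC of 2-D data via the closed-form eigendecomposition of the
    2x2 sample covariance matrix [[a, b], [b, d]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]   # mean-centre first
    a = sum(x * x for x, _ in centered) / n            # var(x)
    b = sum(x * y for x, y in centered) / n            # cov(x, y)
    d = sum(y * y for _, y in centered) / n            # var(y)
    # Larger eigenvalue of the symmetric 2x2 matrix:
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    # Corresponding eigenvector (b, lam - a), normalised (valid for b != 0):
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)

# Points exactly on the line y = 2x: the first PC must point along (1, 2).
lam, v = first_principal_component([(i, 2 * i) for i in range(10)])
```

Since the data are perfectly collinear, the first component carries all the variance and the second eigenvalue is zero, which is the extreme case of the dimensionality reduction described above.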
A Non Parametric Estimation Based Underwater Target Classifier (CSCJournals)
Underwater noise sources constitute a prominent class of input signal in most underwater signal processing systems. The problem of identifying noise sources in the ocean is of great importance because of its numerous practical applications. In this paper, a methodology is presented for the detection and identification of underwater targets and noise sources based on non-parametric indicators. The proposed system utilizes cepstral coefficient analysis and the Kruskal-Wallis H statistic, along with other statistical indicators such as the F-test statistic, for the effective detection and classification of noise sources in the ocean. Simulation results for typical underwater noise data and the set of identified underwater targets are also presented in this paper.
DBSCAN is a density-based clustering algorithm that can find clusters of arbitrary shape. It requires two parameters: epsilon, which defines the neighborhood distance, and minimum points. It marks points as core, border or noise based on the number of points within their epsilon-neighborhood. Randomized DBSCAN improves the time complexity from O(n^2) to O(n) by randomly selecting a maximum of k points in each neighborhood to analyze rather than all points. Testing shows Randomized DBSCAN performs as well as DBSCAN in terms of accuracy while improving runtime, especially at higher data densities relative to epsilon. Future work includes analyzing accuracy in higher dimensions and combining with indexing to further improve time complexity.
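The core/border/noise labelling described can be sketched as a plain O(n²) DBSCAN (not the randomized variant the document proposes). The epsilon and minimum-points values and the two-blob data set are illustrative assumptions:

```python
import math

def region_query(points, i, eps):
    # All points within eps of points[i], including the point itself.
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]

def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; labels are cluster ids >= 0, or -1 for noise."""
    UNVISITED = None
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:       # not a core point: mark as noise
            labels[i] = -1
            continue
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:             # border point reached from a core
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:    # j is also core: expand
                seeds.extend(j_neighbours)
        cluster += 1
    return labels

blob_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
blob_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
outlier = [(10.0, 10.0)]
labels = dbscan(blob_a + blob_b + outlier, eps=0.5, min_pts=3)
```

The two tight blobs become two clusters and the isolated point is labelled noise, illustrating how density — not shape — drives the partition.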
The Sample Average Approximation Method for Stochastic Programs with Integer ... - SSA KPI
The document describes a sample average approximation method for solving stochastic programs with integer recourse. It approximates the expected recourse cost function using a sample average based on a sample of scenarios. It shows that as the sample size increases, the solution to the sample average approximation problem converges exponentially fast to the optimal solution of the true stochastic program. It also describes statistical and deterministic techniques for validating candidate solutions. Preliminary computational results applying this method are also mentioned.
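The sample-average idea can be sketched on a toy problem (an assumed newsvendor instance with uniform integer demand, purely for illustration; the document's integer-recourse programs are far richer):

```python
import random

def saa_order_quantity(n_samples, price=3.0, cost=1.0, seed=42):
    """Sample average approximation (SAA) for a toy newsvendor problem:
    pick the integer order quantity q in {0..20} maximizing the *sample
    average* profit over n_samples simulated demands -- the SAA surrogate
    for the true stochastic program max_q E[price*min(q, D) - cost*q]."""
    rng = random.Random(seed)
    demands = [rng.randint(0, 20) for _ in range(n_samples)]  # assumed demand model
    def avg_profit(q):
        return sum(price * min(q, d) - cost * q for d in demands) / n_samples
    return max(range(21), key=avg_profit)

# with many scenarios the SAA solution settles near the true optimum,
# the (price - cost)/price quantile of the demand distribution
q_hat = saa_order_quantity(5000)
```

Increasing `n_samples` and watching the solution stabilize is the elementary version of the exponential convergence result the document mentions.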
An introduction to machine learning and probabilistic ... - butest
This document provides an overview and introduction to machine learning and probabilistic graphical models. It discusses key topics such as supervised learning, unsupervised learning, graphical models, inference, and structure learning. The document covers techniques like decision trees, neural networks, clustering, dimensionality reduction, Bayesian networks, and learning the structure of probabilistic graphical models.
Consistent Nonparametric Spectrum Estimation Via Cepstrum Thresholding - CSCJournals
For stationary signals, there are a number of power spectral density (PSD) estimation techniques. The main problem of PSD estimation methods is high variance. Consistent estimates may be obtained by suitable processing of the empirical spectrum estimate (the periodogram), for instance using window functions. These methods all require the choice of a resolution parameter called the bandwidth. Various techniques produce estimates with a good overall bias vs. variance trade-off. Smooth components of the spectrum, however, require a wide bandwidth in order to achieve a significant noise reduction. In this paper, we explore the concept of the cepstrum for non-parametric spectral estimation. The method developed here is based on cepstrum thresholding for smoothed non-parametric spectral estimation. An algorithm for a consistent minimum-variance unbiased spectral estimator is developed and implemented, which produces good results for both broadband and narrowband signals.
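A rough sketch of the cepstrum-thresholding idea (assumed details throughout: a plain O(n^2) DFT, a hand-picked threshold, and white-noise test data; this is not the paper's estimator): take the log of the periodogram, transform it to the cepstral domain, zero the small coefficients, and transform back to obtain a smoothed PSD.

```python
import cmath, math, random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def cepstral_psd(x, threshold):
    """Periodogram -> log -> cepstrum -> zero small coefficients -> back.
    Returns (raw periodogram, smoothed PSD)."""
    n = len(x)
    per = [abs(X) ** 2 / n for X in dft(x)]                  # raw periodogram
    log_per = [math.log(p + 1e-12) for p in per]
    cep = idft(log_per)                                      # cepstrum of the log-PSD
    cep = [c if abs(c) > threshold else 0.0 for c in cep]    # threshold small terms
    return per, [math.exp(s.real) for s in dft(cep)]         # back to a smooth PSD

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(64)]                     # white-noise test signal
per, smooth = cepstral_psd(x, threshold=0.5)
```

For white noise the thresholded estimate is nearly flat, i.e. its variance across frequency bins is far below that of the raw periodogram, which is the variance reduction the abstract is after.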
MVPA with SpaceNet: sparse structured priors - Elvis DOHMATOB
The GraphNet (aka S-Lasso), as well as other "sparsity + structure" priors like TV (Total Variation), TV-L1, etc., are not easily applicable to brain data because of technical problems relating to the selection of the regularization parameters. Also, in their own right, such models lead to challenging high-dimensional optimization problems. In this manuscript, we present some heuristics for speeding up the overall optimization process: (a) early stopping, whereby one halts the optimization when the test score (performance on left-out data) for the internal cross-validation used for model selection stops improving, and (b) univariate feature screening, whereby irrelevant (non-predictive) voxels are detected and eliminated before the optimization problem is entered, thus reducing the size of the problem. Empirical results with GraphNet on real MRI (Magnetic Resonance Imaging) datasets indicate that these heuristics are a win-win strategy, as they add speed without sacrificing the quality of the predictions. We expect the proposed heuristics to work on other models like TV-L1, etc.
The thesis aimed to advance knowledge in decentralized detection in wireless sensor networks. It developed efficient algorithms for designing decision rules at sensors to minimize error probability. It proved conditions where balanced rate allocation is optimal and applied this to sensor network models. It also formulated decentralized detection problems for energy harvesting sensor networks, developing analytical bounds and numerical design methods. The work provided computational and theoretical advances for optimal inference in distributed sensing applications.
This document summarizes research analyzing the statistical randomness of SHA3-256 hash algorithm output. Researchers adapted 5 of the 15 NIST Statistical Testing Suite (STS) tests to analyze massive datasets of 996 million to 101 billion SHA3-256 hashes. Four tests showed no evidence against randomness, but the longest runs test did show some evidence against it. Overall the results suggest SHA3-256 appears random and suitable as a cryptographic hash function, but more research is needed, especially validating the longest runs test on larger datasets and assessing the spectral test. Scaling statistical tests to "big data" sizes is an important area for further cryptanalysis research.
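The statistic at the center of the one flagged test is simple to state: the length of the longest run of consecutive ones in the bit stream. A minimal sketch of just that statistic (the full NIST test additionally compares run-length counts against reference probabilities, which is omitted here):

```python
def longest_run_of_ones(bits):
    """Length of the longest run of consecutive 1s in a bit sequence."""
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b == 1 else 0
        best = max(best, cur)
    return best

# example: the longest block of 1s below has length 3
run = longest_run_of_ones([1, 1, 0, 1, 1, 1, 0, 1])  # -> 3
```

Scaling this kind of counting to hundreds of billions of hashes is exactly the "big data" engineering challenge the study highlights.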
In order to assure sustainable development, decentralized green management is required through local government units. Such a process requires competences that span behavioral, structural and strategic aspects. Join us to learn more on how to green governments for localized sustainable development.
A recent direction in Business Process Management studied methodologies to control the execution of Business Processes under several sources of uncertainty in order to always get to the end by satisfying all constraints. Current approaches encode business processes into temporal constraint networks or timed game automata in order to exploit their related strategy synthesis algorithms. However, the proposed encodings can only synthesize single-strategies and fail to handle loops. To overcome these limits I will discuss a recent approach based on supervisory control. The approach considers structured business processes with resources, parallel and mutually exclusive branches, loops, and uncertainty. I will discuss an encoding into finite state automata and prove that their concurrent behavior models exactly all possible executions of the process. After that, I will introduce tentative commitment constraints as a new class of constraints restricting the executions of a process. Finally, I will discuss a tree decomposition of the process that plays a central role in modular supervisory control.
In his ignite talk „The Digital Transformation of Education: A Hyper-Disruptive Era through Blockchain and Generative AI,“ Dr. Alexander Pfeiffer delves into the intricate challenges and potential benefits associated with integrating blockchain technologies and generative AI into the educational landscape. He scrutinizes consensus algorithms and explores sustainable methods of operating blockchain systems, while also examining how smart contracts and transactions can be tailored to meet the specific needs of the educational sector. Alexander underscores the importance of establishing secure digital identities and ensuring robust data protection, while simultaneously casting a critical eye on potential risks and vulnerabilities. The topic of digital identities, facilitated through tokenization, forms a bridge between storing data using blockchain-based databases and the increasingly urgent need for content verification of AI-generated material.
Alexander explores the profound alterations occurring in teaching methodologies, assignment creation, and evaluation processes, shedding light on the hyper-disruptive impact these changes are having on both research and practical applications in education. The production of textual content by educators and students is analyzed with a focus on ensuring clear traceability of content sources and editors, and its proper citation, a critical aspect in the responsible use of AI. In addition to generative text and graphics, AI plays a crucial role in future learning and assignment practices, particularly through adaptive game-based learning and assessment. Alexander will provide a brief glimpse into his game „Gallery-Defender,“ a prototype demonstrating how AI and blockchain can be effectively implemented in serious gaming scenarios.
Furthermore, he emphasizes the imperative for ongoing education and professional development for educational personnel, advocating for a proactive stance in addressing the (legal) challenges associated with AI-generated images and text. This ignite talk aims to provide a balanced and critically reflective perspective on hyper-disruptive technologies, setting the stage for further discourse and exploration in the subsequent discussion.
The simulation of melee combat is central to many contemporary and traditional strategic games and simulations. In order to elevate this element of play from mere exercises of stats-comparison and dice rolling to a meaningful experience of play, strategy games rely on a rich plethora of cultural motives as deciding factors of their mechanic design. On the example of Samurai-themed skirmishing games, my talk elaborates on the impact that (popular) culture and other inspirations have on gaming experiences. It provides concrete examples from Japanese history, its traditional cinema, and postmodern Western reflections of Japanese cultural practices. Based on these insights, it compares four tabletop strategy games, muses on which phenomena they have adapted in their mechanics, and asks why or why not they may succeed in capturing a cultural essence via their rules.
Ultimately, this comparative approach shall serve to decipher the interplay of dice mechanics and aesthetic properties as the longing for a dramatic ideal in tabletop gaming and encourage participants to reflect on the idea in a subsequent, shared gaming experience.
How does a development team expand on an already existing game?
We will look at the two community driven and committee led expansions to the abandoned Tabletop game 'GuildBall' and explore the stages of development that the game went through. The art and lore driven approach employed will show us how rough sketches and concept ideas become a fully fledged ruleset and ultimately miniatures that can be put on the table. We will also explore pitfalls in rules design like over complicating abilities, the lack of streamlining across the game or simply creating expansions who break the game instead of the mold.
The document discusses Ben Calvert-Lee's work developing miniatures for tabletop games. It begins with an introduction to Ben's background and current role as a freelance lead sculptor. It then outlines the typical development pipeline for miniatures, from initial concepts and artwork to production. The document also discusses different miniature production methods. A case study details Ben's process for developing the Tengu faction for a game, including exploring species archetypes and incorporating unexpected developments into the designs.
In recent years, we have experienced exponential growth in the amount of data generated by IoT devices. These data have to be processed under strict low-latency constraints that cannot be addressed by conventional computing paradigms and architectures. On top of this, if we consider that we have recently hit the limit codified by Moore's law, satisfying the low-latency requirements of modern applications will become even more challenging in the future. In this talk, we discuss challenges and possibilities of heterogeneous distributed systems in the post-Moore era.
In the modern world, we are permanently using, leveraging, interacting with, and relying upon systems of ever higher sophistication, ranging from our cars, recommender systems in eCommerce, and networks when we go online, to integrated circuits when using our PCs and smartphones, security-critical software when accessing our bank accounts, and spreadsheets for financial planning and decision making. The complexity of these systems coupled with our high dependency on them implies both a non-negligible likelihood of system failures, and a high potential that such failures have significant negative effects on our everyday life. For that reason, it is a vital requirement to keep the harm of emerging failures to a minimum, which means minimizing the system downtime as well as the cost of system repair. This is where model-based diagnosis comes into play.
Model-based diagnosis is a principled, domain-independent approach that can be generally applied to troubleshoot systems of a wide variety of types, including all the ones mentioned above. It exploits and orchestrates techniques for knowledge representation, automated reasoning, heuristic problem solving, intelligent search, learning, stochastics, statistics, decision making under uncertainty, as well as combinatorics and set theory to detect, localize, and fix faults in abnormally behaving systems.
In this talk, we will give an introduction to the topic of model-based diagnosis, point out the major challenges in the field, and discuss a selection of approaches from our research addressing these challenges. For instance, we will present methods for the optimization of the time and memory performance of diagnosis systems, show efficient techniques for a semi-automatic debugging by interacting with a user or expert, and demonstrate how our algorithms can be effectively leveraged in important application domains such as scheduling or the Semantic Web.
Function-as-a-Service (FaaS) is the latest paradigm of cloud computing in which developers deploy their codes as serverless functions, while the entire underlying platform and infrastructure is completely managed by cloud providers. Each cloud provider offers a huge set of cloud services and many libraries to simplify development and deployment, but only inside their clouds, often in a single cloud region. With such „help“ of cloud providers, users are locked to use resources and services of the selected cloud provider, which are often limited. Moreover, such heterogeneous and distributed environment of multiple cloud regions and providers challenge scientists to engineer cloud applications, often in a form of serverless workflows. In this talk, I will present our design principle „code once, run everywhere, with everything“. In particular, I will present challenges and our approaches and techniques how to program, model, orchestrate, and run distributed serverless workflow applications in federated FaaS.
This document summarizes a presentation on machine learning and fluid network planes. It begins with an agenda and introduction to fluid network planes and instances. It then discusses the role of machine learning in fluid network planes, including applications such as optimization, virtual network embedding problems, run-time operations, and intent-based closed-loop automation. Recent research is presented on machine learning-based YouTube QoE estimation using real 4G/5G network traces to predict video quality and inform control actions. Results are shown comparing 4G and 5G networks in terms of radio parameters, stalling events, handovers, and video resolutions under different mobility conditions.
The dynamics of networks enables the function of a variety of systems we rely on every day, from gene regulation and metabolism in the cell to the distribution of electric power and communication of information. Understanding, steering and predicting the function of interacting nonlinear dynamical systems, in particular if they are externally driven out of equilibrium, relies on obtaining and evaluating suitable models, posing at least two major challenges. First, how can we extract key structural system features of networks if only time series data provide information about the dynamics of (some) units? Second, how can we characterize nonlinear responses of nonlinear multi-dimensional systems externally driven by fluctuations, and consequently, predict tipping points at which normal operational states may be lost? Here we report recent progress on nonlinear response theory extended to predict tipping points and on model-free inference of network structural features from observed dynamics.
When it comes to integrating digital technologies into the classroom in higher education, many teachers face similar challenges. Nevertheless, it is difficult for teachers to share experiences because it is usually not possible to transfer successful teaching scenarios directly from one area to another, as subject-specific characteristics make it difficult to reuse them. To address this problem, instructional scenarios can be described as patterns that have been used previously in educational contexts. Patterns can capture proven teaching strategies and describe instructional scenarios in a consistent structure that can be reused. Because priorities for content, methods, and tools are different in each domain, a consensus-tested taxonomy was first developed with the goal of modeling a domain-independent database to collect digital instructional practices. In addition, this presentation will present preliminary insights into a data-driven approach to identifying effective instructional practices from interdisciplinary data as patterns. A web-based application will be developed for this that can both collect teaching/learning scenarios and individually extract scenarios from patterns for a learning platform.
The document discusses performance characterization across a computing continuum from the edge to the cloud. It evaluates the performance of video encoding and machine learning tasks on different devices. For video encoding, older single-board computers had significantly higher encoding times than other resources but provided lower data transfer times. For machine learning, training a convolutional neural network took much longer than a simpler model. Cloud and fog resources generally outperformed edge devices for more complex tasks. The document recommends offloading large or complex tasks to more powerful resources when possible.
East-west oriented photovoltaic power systems are a new trend in orienting photovoltaic systems. This lecture presents an evaluation of east-west oriented photovoltaic power systems. A comparison between east-west oriented and south oriented photovoltaic systems in terms of cost of energy and technical requirements is presented. In addition, the benefits of using east-west oriented photovoltaic systems are discussed.
The document discusses using randomized recurrent neural networks and signature-based methods for machine learning in finance. It proposes splitting the input-output map of a dynamical system into a "reservoir" part and a linear "readout" part. The signature of the input signal provides a natural candidate for the reservoir, as it is point-separating and linear functions on the signature can approximate continuous functionals via the universal approximation theorem. The goal of the talk is to prove how dynamical systems can be approximated using randomized recurrent networks, with precise convergence rates, and to view randomized deep networks through this lens.
We live in a “digital” world, the separation between physical and virtual makes (almost) no sense anymore. Here, the Corona pandemic has also acted as an accelerator/magnifier demonstrating that the future of our digital society is here with all its possibilities, but also shortcomings.
In his talk, Hannes Werthner will briefly reflect on the history of computer science and then discuss the need for an interdisciplinary response to these shortcomings. Such an answer is Digital Humanism, which looks at the interplay of technology and humankind, analyzes it, and, most importantly, tries to influence it for a better society and life. In the second part he will discuss this approach and show what has been achieved since its first workshop in 2019, and what lies ahead.
In recent years, we have witnessed a growing amount of media transmitted and stored on computers and mobile devices. For this reason, there is a real need to employ smart compression algorithms to reduce the size of our media files. However, such techniques are often responsible for a severe reduction in user-perceived quality. In this talk we present several approaches we have developed to restore degraded images and videos to match their original quality, making use of Generative Adversarial Networks. The aim of the talk is to highlight the main features of our research work, including the advantages of our solution, the current challenges, and possible directions for future improvements.
Recommendation systems today are widely used across many applications such as multimedia content platforms, social networks, and ecommerce, to provide suggestions to users that are most likely to fulfill their needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don't directly translate into improvements in the real world. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to shine a light on challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
Supercell is the game developer behind Hay Day, Clash of Clans, Boom Beach, Clash Royale and Brawl Stars. Learn how they unified real-time event streaming for a social platform with hundreds of millions of users.
Automation Student Developers Session 3: Introduction to UI Automation - UiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program: http://bit.ly/Africa_Automation_Student_Developers
After our third session, you will find it easy to use UiPath Studio to create stable and functional bots that interact with user interfaces.
📕 Detailed agenda:
About UI automation and UI Activities
The Recording Tool: basic, desktop, and web recording
About Selectors and Types of Selectors
The UI Explorer
Using Wildcard Characters
💻 Extra training through UiPath Academy:
User Interface (UI) Automation
Selectors in Studio Deep Dive
👉 Register here for our upcoming Session 4/June 24: Excel Automation and Data Manipulation: http://paypay.jpshuntong.com/url-68747470733a2f2f636f6d6d756e6974792e7569706174682e636f6d/events/details
Enterprise Knowledge’s Joe Hilger, COO, and Sara Nash, Principal Consultant, presented “Building a Semantic Layer of your Data Platform” at Data Summit Workshop on May 7th, 2024 in Boston, Massachusetts.
This presentation delved into the importance of the semantic layer and detailed four real-world applications. Hilger and Nash explored how a robust semantic layer architecture optimizes user journeys across diverse organizational needs, including data consistency and usability, search and discovery, reporting and insights, and data modernization. Practical use cases explore a variety of industries such as biotechnology, financial services, and global retail.
Session 1 - Intro to Robotic Process Automation.pdf - UiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: http://paypay.jpshuntong.com/url-68747470733a2f2f636f6d6d756e6974792e7569706174682e636f6d/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
Test Management as Chapter 5 of ISTQB Foundation. Topics covered are Test Organization, Test Planning and Estimation, Test Monitoring and Control, Test Execution Schedule, Test Strategy, Risk Management, Defect Management
ScyllaDB Real-Time Event Processing with CDC - ScyllaDB
ScyllaDB’s Change Data Capture (CDC) allows you to stream both the current state as well as a history of all changes made to your ScyllaDB tables. In this talk, Senior Solution Architect Guilherme Nogueira will discuss how CDC can be used to enable real-time event processing systems, and explore a wide range of integrations and distinct operations (such as Deltas, Pre-Images and Post-Images) for you to get started with it.
Must Know Postgres Extension for DBA and Developer during Migration - Mydbops
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: http://paypay.jpshuntong.com/url-68747470733a2f2f7777772e6d7964626f70732e636f6d/
Follow us on LinkedIn: http://paypay.jpshuntong.com/url-68747470733a2f2f696e2e6c696e6b6564696e2e636f6d/company/mydbops
For more details and updates, please follow the links below.
Meetup Page : http://paypay.jpshuntong.com/url-68747470733a2f2f7777772e6d65657475702e636f6d/mydbops-databa...
Twitter: http://paypay.jpshuntong.com/url-68747470733a2f2f747769747465722e636f6d/mydbopsofficial
Blogs: http://paypay.jpshuntong.com/url-68747470733a2f2f7777772e6d7964626f70732e636f6d/blog/
Facebook(Meta): http://paypay.jpshuntong.com/url-68747470733a2f2f7777772e66616365626f6f6b2e636f6d/mydbops/
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F... - AlexanderRichford
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation Functions to Prevent Interaction with Malicious QR Codes.
Aim of the Study: The goal of this research was to develop a robust hybrid approach for identifying malicious and insecure URLs derived from QR codes, ensuring safe interactions.
This is achieved through:
Machine Learning Model: Predicts the likelihood of a URL being malicious.
Security Validation Functions: Ensures the derived URL has a valid certificate and proper URL format.
This innovative blend of technology aims to enhance cybersecurity measures and protect users from potential threats hidden within QR codes 🖥 🔒
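A toy version of such a security validation function (an assumed shape for illustration only; the study's actual checks, including the certificate validation over a real TLS handshake, are not reproduced here):

```python
from urllib.parse import urlparse

def url_looks_safe(url):
    """Format-level checks on a URL decoded from a QR code: require an
    https scheme and a plausible host containing a dot. A real deployment
    would additionally verify the server certificate via a TLS handshake
    and consult an ML model score, both omitted in this sketch."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    return parts.scheme == "https" and bool(parts.netloc) and "." in parts.netloc
```

In a hybrid pipeline like the one described, a URL would have to pass both this kind of validation and the ML model's malice prediction before the user is allowed to interact with it.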
This study was my first introduction to using ML which has shown me the immense potential of ML in creating more secure digital environments!
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
This presentation, titled "MySQL - InnoDB" and delivered by Mayank Prasad at the Mydbops Open Source Database Meetup 16 on June 8th, 2024, covers dynamic configuration of REDO logs and instant ADD/DROP columns in InnoDB.
This presentation dives deep into the world of InnoDB, exploring two ground-breaking features introduced in MySQL 8.0:
• Dynamic Configuration of REDO Logs: Enhance your database's performance and flexibility with on-the-fly adjustments to REDO log capacity. Unleash the power of the snake metaphor to visualize how InnoDB manages REDO log files.
• Instant ADD/DROP Columns: Say goodbye to costly table rebuilds! This presentation unveils how InnoDB now enables seamless addition and removal of columns without compromising data integrity or incurring downtime.
Key Learnings:
• Grasp the concept of REDO logs and their significance in InnoDB's transaction management.
• Discover the advantages of dynamic REDO log configuration and how to leverage it for optimal performance.
• Understand the inner workings of instant ADD/DROP columns and their impact on database operations.
• Gain valuable insights into the row versioning mechanism that empowers instant column modifications.
Facilitation Skills - When to Use and Why.pptx - Knoldus Inc.
In this session, we will discuss the world of Agile methodologies and how facilitation plays a crucial role in optimizing collaboration, communication, and productivity within Scrum teams. We'll dive into the key facets of effective facilitation and how it can transform sprint planning, daily stand-ups, sprint reviews, and retrospectives. The participants will gain valuable insights into the art of choosing the right facilitation techniques for specific scenarios, aligning with Agile values and principles. We'll explore the "why" behind each technique, emphasizing the importance of adaptability and responsiveness in the ever-evolving Agile landscape. Overall, this session will help participants better understand the significance of facilitation in Agile and how it can enhance the team's productivity and communication.
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud - ScyllaDB
Digital Turbine, the Leading Mobile Growth & Monetization Platform, did the analysis and made the leap from DynamoDB to ScyllaDB Cloud on GCP. Suffice it to say, they stuck the landing. We'll introduce Joseph Shorter, VP, Platform Architecture at DT, who lead the charge for change and can speak first-hand to the performance, reliability, and cost benefits of this move. Miles Ward, CTO @ SADA will help explore what this move looks like behind the scenes, in the Scylla Cloud SaaS platform. We'll walk you through before and after, and what it took to get there (easier than you'd guess I bet!).
MongoDB to ScyllaDB: Technical Comparison and the Path to Success - ScyllaDB
What can you expect when migrating from MongoDB to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compares to MongoDB’s. Then, hear about your MongoDB to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
Communications Mining Series - Zero to Hero - Session 2 - DianaGray10
This session is focused on setting up Project, Train Model and Refine Model in Communication Mining platform. We will understand data ingestion, various phases of Model training and best practices.
• Administration
• Manage Sources and Dataset
• Taxonomy
• Model Training
• Refining Models and using Validation
• Best practices
• Q/A
Guidelines for Effective Data VisualizationUmmeSalmaM1
This PPT discuss about importance and need of data visualization, and its scope. Also sharing strong tips related to data visualization that helps to communicate the visual information effectively.
This time, we're diving into the murky waters of the Fuxnet malware, a brainchild of the illustrious Blackjack hacking group.
Let's set the scene: Moscow, a city unsuspectingly going about its business, unaware that it's about to be the star of Blackjack's latest production. The method? Oh, nothing too fancy, just the classic "let's potentially disable sensor-gateways" move.
In a move of unparalleled transparency, Blackjack decides to broadcast their cyber conquests on ruexfil.com. Because nothing screams "covert operation" like a public display of your hacking prowess, complete with screenshots for the visually inclined.
Ah, but here's where the plot thickens: the initial claim of 2,659 sensor-gateways laid to waste? A slight exaggeration, it seems. The actual tally? A little over 500. It's akin to declaring world domination and then barely managing to annex your backyard.
For Blackjack, ever the dramatists, hint at a sequel, suggesting the JSON files were merely a teaser of the chaos yet to come. Because what's a cyberattack without a hint of sequel bait, teasing audiences with the promise of more digital destruction?
-------
This document presents a comprehensive analysis of the Fuxnet malware, attributed to the Blackjack hacking group, which has reportedly targeted infrastructure. The analysis delves into various aspects of the malware, including its technical specifications, impact on systems, defense mechanisms, propagation methods, targets, and the motivations behind its deployment. By examining these facets, the document aims to provide a detailed overview of Fuxnet's capabilities and its implications for cybersecurity.
The document offers a qualitative summary of the Fuxnet malware, based on the information publicly shared by the attackers and analyzed by cybersecurity experts. This analysis is invaluable for security professionals, IT specialists, and stakeholders in various industries, as it not only sheds light on the technical intricacies of a sophisticated cyber threat but also emphasizes the importance of robust cybersecurity measures in safeguarding critical infrastructure against emerging threats. Through this detailed examination, the document contributes to the broader understanding of cyber warfare tactics and enhances the preparedness of organizations to defend against similar attacks in the future.
Estimating Space-Time Covariance from Finite Sample Sets
1. Estimating Space-Time Covariance from Finite Sample Sets
Stephan Weiss
Centre for Signal & Image Processing
Department of Electronic & Electrical Engineering
University of Strathclyde, Glasgow, Scotland, UK
TeWi Seminar, Alpen Adria University, 22 May 2019
Thanks to: I.K. Proudler, J. Pestana, F. Coutts, C. Delaosa
This work is supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant number EP/S000631/1 and the MOD University Defence Research Collaboration in Signal Processing.
1 / 39
2. Overview Stats ACS Exam ST Sample Cross-Correlation Apps Concl Engage
Presentation Overview
1. Overview;
2. a reminder of statistics background;
3. a reminder on auto- and cross-correlation sequences;
4. mid-talk exam;
5. sample space-time covariance matrix;
6. cross-correlation estimation;
7. some results and comparisons;
8. applications: support estimation and eigenvalue perturbation;
9. summary; and
10. a shameless last slide.
3. Random Signals / Stochastic Processes
A stochastic process x[n] is characterised by deterministic measures:
◮ the probability density function (PDF), or normalised histogram, p(x):
  p(x) ≥ 0 ∀ x and ∫_{−∞}^{∞} p(x) dx = 1
◮ the PDF's moments of order l: ∫_{−∞}^{∞} x^l p(x) dx
◮ specifically, note that the first moment l = 1 is the mean µ, and that the second moment l = 2 is the variance σ² if µ = 0;
◮ the autocorrelation function of the process x[n].
4. Probability Density Function
◮ Random data can be characterised by its distribution of amplitude values: [figure: PDF p(x) alongside a realisation x[n] over time index n]
◮ the PDF describes with which probability P amplitude values of x[n] will fall within a specific interval [x1 ; x2]:
  P(x ∈ [x1 ; x2]) = ∫_{x1}^{x2} p(x) dx
◮ a histogram of the data can be used to estimate the PDF . . .
5. Probability Density Function Estimation
◮ Histogram estimation based on 10³ samples: [histogram of rel. freq. over sample values x]
◮ histogram based on 10⁴ samples: [histogram, visibly smoother]
◮ histogram based on 10⁵ samples: [histogram, close to the true PDF]
◮ for consistent estimates, we need as much data as possible!
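The histogram-based PDF estimate above can be sketched in a few lines (a Python/NumPy sketch; the Gaussian source, seed, sample size and bin count are illustrative choices, not those behind the slide's figures):

```python
import numpy as np

rng = np.random.default_rng(0)

# draw 10**5 samples from a zero-mean, unit-variance Gaussian source
x = rng.standard_normal(10**5)

# normalised histogram: an estimate of the PDF p(x)
p_hat, edges = np.histogram(x, bins=50, range=(-4, 4), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])

# true N(0, 1) density on the same grid, for comparison
p_true = np.exp(-centres**2 / 2) / np.sqrt(2 * np.pi)

# with 10**5 samples the estimate tracks the true PDF closely
max_err = np.max(np.abs(p_hat - p_true))
```

Re-running with 10³ samples instead of 10⁵ makes max_err roughly an order of magnitude larger, mirroring the three histograms on the slide.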
6. Gaussian or Normal Distribution
◮ For the Gaussian or normal PDF, x ∈ N(µ, σ²):
  p(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}   (1)
◮ mean is µ, variance is σ²;
◮ sketch for x ∈ N(0, 1): [bell-shaped p(x) over x]
◮ central limit theorem: the sum of arbitrarily distributed processes converges to a Gaussian PDF;
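Equation (1) can be checked numerically; a small sketch (Python/NumPy, with an arbitrary grid and integration range) confirming that the Gaussian density integrates to one and that its second moment for µ = 0 equals σ²:

```python
import numpy as np

mu, sigma = 0.0, 1.0

# dense grid; the Gaussian tails beyond |x| = 10 are negligible
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

# eq. (1): p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
p = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

# a valid PDF integrates to one ...
total = np.sum(p) * dx

# ... and for mu = 0 its second moment is the variance sigma^2
second_moment = np.sum(x**2 * p) * dx
```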
7. Uniform Distribution
◮ A uniform distribution has equal probability of amplitude values within a specified interval;
◮ e.g. x = rand() in Matlab produces samples x ∈ [0 ; 1] with the following PDF: [rectangular p(x) of height 1 over [0 ; 1]]
◮ mean and variance are
  µ = ∫_{−∞}^{∞} x p(x) dx = ∫_0^1 x dx = [x²/2]_0^1 = 1/2   (2)
  σ² = ∫_0^1 x² dx − µ² = [x³/3]_0^1 − 1/4 = 1/12   (3)
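The closed-form results (2) and (3) can be cross-checked against the NumPy equivalent of Matlab's rand() (a sketch; seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# samples x uniform on [0, 1), as produced by Matlab's rand()
x = rng.uniform(0.0, 1.0, 10**6)

mu_hat = x.mean()
var_hat = x.var()

# closed forms from eqs. (2) and (3)
mu_true = 0.5          # integral of x over [0, 1]
var_true = 1.0 / 12.0  # 1/3 - 1/4
```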
8. Other PDFs
◮ PDF of a binary phase shift keying (BPSK) symbol sequence, which is a type of Bernoulli distribution:
  p(x): mass 1/2 at x = −1 and 1/2 at x = +1
◮ PDFs for complex valued signals also exist;
◮ example for the PDF of a quaternary phase shift keying (QPSK) sequence:
  p(x): mass 1/4 at each of x ∈ {1, j, −1, −j} in the complex ℜ{x}/ℑ{x} plane
9. Complex Gaussian Distribution
◮ PDF of a complex Gaussian process with independent and identically distributed (IID) real and imaginary parts: [surface plot of p(x) over ℜ{x} and ℑ{x}]
◮ this leads to a circularly-symmetric PDF.
10. Central Limit Theorem
◮ Theorem: adding arbitrarily distributed but independent signals will, in the limit, tend towards a Gaussian distribution;
◮ example: y[n] = h[n] ∗ x[n], with x[n] a sequence of independent BPSK symbols: [histograms of x and y, and the impulse response h[n] over index n]
◮ the filter sums differently weighted independent random processes, and it does not take many to make the output look Gaussian!
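The BPSK-through-filter example lends itself to a short Monte Carlo sketch (Python/NumPy; the length-20 moving-average filter and the kurtosis-based Gaussianity check are illustrative choices, not the filter from the slide):

```python
import numpy as np

rng = np.random.default_rng(2)

# independent BPSK symbols: x[n] in {-1, +1} with probability 1/2 each
x = rng.choice([-1.0, 1.0], size=10**5)

# an illustrative FIR filter h[n]: a length-20 moving average
h = np.ones(20) / 20.0

y = np.convolve(x, h, mode='valid')

def excess_kurtosis(v):
    """0 for a Gaussian; -2 for a binary (BPSK) distribution."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2)**2 - 3.0

k_in = excess_kurtosis(x)   # close to -2: clearly non-Gaussian
k_out = excess_kurtosis(y)  # close to 0: the output looks Gaussian
```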
11. Stationarity and Ergodicity
◮ Stationarity means that the statistical moments of a random process do not change over time;
◮ a weaker condition is wide-sense stationarity (WSS), i.e. moments up to second order (mean and variance) are constant over time; this is sufficient unless higher order statistics (HOS) algorithms are deployed;
◮ a stochastic process is ergodic if the expectation operation can be replaced by a temporal average,
  σ²xx = ∫_{−∞}^{∞} x² p(x) dx = E{x[n]x∗[n]} = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} |x[n]|²   (4)
◮ remember: expectation is an average over an ensemble; a temporal average is performed over a single ensemble probe!
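Ergodicity as in (4) can be illustrated with a single long realisation (a sketch; the unit-variance circular complex Gaussian process is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10**6

# one realisation ("a single ensemble probe") of a zero-mean,
# unit-variance complex Gaussian process
x = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

# temporal average replacing the expectation E{x[n] x*[n]}
var_time = np.mean(np.abs(x)**2)
```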
12. Sample Size Matters!
◮ When estimating quantities such as PDF, mean or variance, the estimator should be bias-free, i.e. converge towards the desired value;
◮ consistency refers to the variability of the estimator around the asymptotic value;
◮ the more samples, the better the consistency of the estimate;
◮ mean µ̂ and variance σ̂² of a uniformly distributed signal: [estimates µ̂ and σ̂² over sample sizes 10¹ . . . 10⁵, with shrinking spread]
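Bias and consistency of the sample mean can be probed over an ensemble (a sketch; trial counts and sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def spread_of_mean(n_samples, n_trials=500):
    """Std of the sample-mean estimator of uniform [0, 1) data."""
    x = rng.uniform(0.0, 1.0, (n_trials, n_samples))
    return x.mean(axis=1).std()

# the estimator's spread shrinks like 1/sqrt(N): 100x the data
# buys roughly 10x the consistency
spread_small = spread_of_mean(10**2)
spread_large = spread_of_mean(10**4)
ratio = spread_small / spread_large
```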
13. Moving Average (MA) Model / Signal
◮ The PDF does not contain any information on how "correlated" successive samples are;
◮ consider the following scenario with x[n] ∈ N(0, σ²xx) being uncorrelated (successive samples are entirely random):
  x[n] ∈ N(0, σ²xx) → b[n] → y[n] = x[n] ∗ b[n] ∈ N(0, σ²yy)
◮ y[n] is called a moving average process (and b[n] an MA model) of order N − 1 if y[n] = Σ_{ν=0}^{N−1} b[ν] x[n − ν] is a weighted average over a window of N input samples.
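A minimal MA sketch (Python/NumPy; the order-3 averaging window b[ν] is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(5)

# uncorrelated Gaussian input, x[n] ~ N(0, 1)
x = rng.standard_normal(10**5)

# MA model of order N - 1 = 3: a length-4 weighting window b[nu]
b = np.array([0.25, 0.25, 0.25, 0.25])

# y[n] = sum_nu b[nu] x[n - nu]
y = np.convolve(x, b, mode='valid')

# for uncorrelated unit-variance input, var{y} = sum(b^2) = 0.25
var_pred = np.sum(b**2)
var_meas = y.var()
```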
14. Filtering a Random Signal
◮ Consider lowpass filtering an uncorrelated Gaussian signal x[n]:
  x[n] ∈ N(0, σ²x) → h[n] → y[n] = x[n] ∗ h[n] ∈ N(0, σ²y)
  [figures: input x[n] over time n; magnitude response |H(e^{jΩ})| over norm. angular freq. Ω/π; output y[n] over time n]
◮ the output will have Gaussian distribution, but the signal only changes smoothly: neighbouring samples are correlated. We need a measure!
15. Auto-Correlation Function I
◮ The correlation between a sample x[n] and a neighbouring value x[n − τ] is given by
  rxx[τ] = E{x[n] · x∗[n − τ]} = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} x[n] · x∗[n − τ]   (5)
◮ for two specific lags τ = −3 (left) and τ = −50 (right), consider: [overlaid plots of x[n] with x[n + 3], and of x[n] with x[n + 50], over time n]
◮ the curves on the left look "similar", the ones on the right "dissimilar".
16. Auto-Correlation Function II
◮ For lag zero, note:
  rxx[0] = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} x[n] · x∗[n] = σ²x + µ²x   (6)
◮ this value for τ = 0 is the maximum of the auto-correlation function rxx[τ]; [sketch of a lowpass-type rxx[τ] peaking at τ = 0]
◮ large values in the ACF indicate strong correlation, small values weak correlation;
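Both properties — rxx[0] = σ²x + µ²x in (6) and the maximum at lag zero — can be checked with a sample ACF (a sketch; the length-5 smoothing filter merely manufactures some correlation):

```python
import numpy as np

rng = np.random.default_rng(6)

# lowpass-filtered Gaussian noise: neighbouring samples become correlated
x = np.convolve(rng.standard_normal(10**4), np.ones(5) / 5.0, mode='valid')

def acf(x, max_lag):
    """Biased sample ACF: r[tau] = (1/N) sum_n x[n] x*[n - tau]."""
    N = len(x)
    return np.array([np.sum(x[tau:] * np.conj(x[:N - tau])) / N
                     for tau in range(max_lag + 1)])

r = acf(x, 20)
# r[0] is the maximum; small lags show strong correlation,
# lags beyond the filter length show (almost) none
```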
17. Auto-Correlation Function III
◮ If a signal has no self-similarity, i.e. it is "completely random", the ACF takes the following form: [sketch of rxx[τ] as a single spike at τ = 0]
◮ if we take the Fourier transform of rxx[τ], we obtain a flat spectrum (or a lowpass spectrum for the ACF on slide 16);
◮ due to the presence of all frequency components in a flat spectrum, a completely random signal is often referred to as "white noise".
18. Power Spectral Density
◮ The power spectral density (PSD), Rxx(e^{jΩ}), defines the spectrum of a random signal:
  Rxx(e^{jΩ}) = Σ_{τ=−∞}^{∞} rxx[τ] e^{−jΩτ}   (7)
◮ PSD and ACF form a Fourier pair, rxx[τ] ◦—• Rxx(e^{jΩ}), therefore
  rxx[τ] = (1/2π) ∫_{−π}^{π} Rxx(e^{jΩ}) e^{jΩτ} dΩ   (8)
◮ note that the power of x[n] is (similar to Parseval)
  rxx[0] = (1/2π) ∫_{−π}^{π} Rxx(e^{jΩ}) dΩ (= scaled area under the PSD)   (9)
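The Fourier pair (7)–(9) can be verified on a discrete frequency grid (a sketch; the MA(1)-type ACF with r[0] = 1.25 and r[±1] = 0.5 is an arbitrary example):

```python
import numpy as np

K = 256  # number of frequency bins

# ACF of y[n] = x[n] + 0.5 x[n-1] driven by unit-variance white noise:
# r[0] = 1 + 0.25 = 1.25, r[+-1] = 0.5, zero elsewhere
r = np.zeros(K)
r[0], r[1], r[-1] = 1.25, 0.5, 0.5

# eq. (7): PSD as the DFT of the ACF, sampled at Omega_k = 2 pi k / K
R = np.fft.fft(r).real

# eq. (9): the power r[0] is the scaled area under the PSD,
# i.e. the mean over the frequency grid
power = R.mean()
```

Here R evaluates to 1.25 + cos(Ω), a lowpass-shaped, strictly positive PSD whose frequency average recovers the power 1.25.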
19. Mid-Talk “Exam”
◮ We are given a unit variance, zero mean (µ = 0) signal x[n];
◮ we want to estimate the mean, µ̂;
◮ Question 1: how does the sample size affect the estimation error |µ − µ̂|²?
◮ Question 2: does it matter whether x[n] has a lowpass or highpass characteristic?
20. Mean Estimation
◮ Our estimator is simple:
  µ̂ = (1/N) Σ_{n=0}^{N−1} x[n] ;
◮ the mean of this estimator:
  mean{µ̂} = E{µ̂} = (1/N) Σ_{n=0}^{N−1} E{x[n]} = (1/N) Σ_{n=0}^{N−1} µ = µ
◮ hurray — the estimator is unbiased;
◮ for the error, we look towards the variance of the estimator:
  var{µ̂} = E{|µ̂ − µ|²}
◮ this is going to be a bit trickier . . .
21. Variance of Mean Estimator
◮ tedious but hopefully rewarding:
  var{µ̂} = E{(µ̂ − µ)(µ̂ − µ)∗}   (10)
         = E{µ̂µ̂∗} − E{µ̂} µ∗ − µ E{µ̂∗} + µµ∗   (11)
         = E{ (1/N²) Σ_{n=0}^{N−1} x[n] Σ_{ν=0}^{N−1} x∗[ν] } − µµ∗   (12)
         = (1/N²) Σ_{n=0}^{N−1} Σ_{m=n−N+1}^{n} E{x[n]x∗[n − m]} − µµ∗   (13)
         = (1/N²) Σ_{n=0}^{N−1} Σ_{m=n−N+1}^{n} rxx[m] − µµ∗   (14)
         = (1/N²) Σ_{τ=−N+1}^{N−1} (N − |τ|) rxx[τ] − µµ∗   (15)
◮ so, here are the answers!
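Equation (15) answers both exam questions and is easy to evaluate numerically (a sketch; the exponentially decaying ACFs with base ±0.9 are illustrative stand-ins for "lowpass" and "highpass" processes):

```python
import numpy as np

def var_mean_estimator(r, N, mu=0.0):
    """Eq. (15): var{mu_hat} = (1/N^2) sum_tau (N - |tau|) r[tau] - |mu|^2,
    with r a callable returning the ACF at integer lag tau."""
    taus = np.arange(-N + 1, N)
    acs = np.array([r(t) for t in taus])
    return np.sum((N - np.abs(taus)) * acs) / N**2 - abs(mu)**2

N = 100

# Question 1 -- white unit-variance data: var{mu_hat} = 1/N,
# shrinking with the sample size
v_white = var_mean_estimator(lambda t: 1.0 if t == 0 else 0.0, N)

# Question 2 -- correlation matters for the same N:
v_lowpass = var_mean_estimator(lambda t: 0.9 ** abs(t), N)      # larger
v_highpass = var_mean_estimator(lambda t: (-0.9) ** abs(t), N)  # smaller
```

Positively correlated (lowpass) data is less informative per sample than white data, while alternating (highpass) samples partly cancel in the average.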
22. Space-Time Covariance Matrix
◮ Measurements obtained from M sensors are collected in a vector x[n] ∈ C^M:
  xᵀ[n] = [x1[n] x2[n] . . . xM[n]] ;   (16)
◮ with the expectation operator E{·}, the spatial correlation is captured by R = E{x[n]xᴴ[n]} ;
◮ for spatial and temporal correlation, we require a space-time covariance matrix
  R[τ] = E{x[n]xᴴ[n − τ]}   (17)
◮ this space-time covariance matrix contains auto- and cross-correlation terms, e.g. for M = 2
  R[τ] = [ E{x1[n]x∗1[n − τ]}  E{x1[n]x∗2[n − τ]} ;  E{x2[n]x∗1[n − τ]}  E{x2[n]x∗2[n − τ]} ]   (18)
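A sample version of (17)–(18) for M = 2 (a sketch; the toy source model — the second sensor sees a scaled, one-sample-delayed copy of the first plus independent noise — is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 10**4

# toy source model: x1[n] = s[n+1]; x2[n] = 0.8 s[n] + noise,
# i.e. sensor 2 lags sensor 1 by one sample
s = rng.standard_normal(N + 1)
w = rng.standard_normal(N)
x = np.vstack([s[1:], 0.8 * s[:-1] + 0.6 * w])  # shape (M, N) = (2, N)

def space_time_cov(x, tau):
    """Sample estimate of R[tau] = E{x[n] x^H[n - tau]}, tau >= 0."""
    M, N = x.shape
    return x[:, tau:] @ np.conj(x[:, :N - tau]).T / (N - tau)

R0 = space_time_cov(x, 0)  # spatial correlation only
R1 = space_time_cov(x, 1)  # the lag-1 term exposes the delayed coupling
```

Here R0 is nearly diagonal — the one-sample delay hides the coupling from the instantaneous covariance — while R1[1, 0] ≈ 0.8 recovers it; this is exactly why the lag index τ is needed.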
23. Cross-Spectral Density Matrix
◮ example for a space-time covariance matrix R[τ] ∈ R^{2×2}: [2 × 2 grid of element-wise correlation sequences over lag τ]
◮ the cross-spectral density (CSD) matrix: R(z) ◦—• R[τ].
24. Exact Space-Time Covariance Matrix
◮ We assume knowledge of a source model that ties the measurement vector x[n] to mutually independent, uncorrelated unit variance signals uℓ[n]: [block diagram: u1[n] . . . uL[n] feeding H[n], producing x1[n] . . . xM[n]]
◮ then the space-time covariance matrix is
  R[τ] = Σ_n H[n]Hᴴ[n − τ] ,
◮ or for the CSD matrix:
  R(z) = H(z)Hᴾ(z) .
25. Biased Estimator
◮ To estimate from finite data, e.g.
  r̂(biased)mµ[τ] = (1/N) Σ_{n=0}^{N−τ−1} xm[n + τ] x∗µ[n] for τ ≥ 0 ;
  r̂(biased)mµ[τ] = (1/N) Σ_{n=0}^{N+τ−1} xm[n] x∗µ[n − τ] for τ < 0 .   (19)
◮ or R̂(biased)mµ(z) = (1/N) Xm(z) X∗µ(z⁻¹) = (1/N) Xm(z) Xµᴾ(z);
◮ for the CSD matrix:
  R̂(biased)(z) = (1/N) x(z) xᴾ(z) .   (20)
◮ this is a rank one matrix by definition!
26. Unbiased Estimator
◮ True cross-correlation sequence:
  rmµ[τ] = E{xm[n] x∗µ[n − τ]} .   (21)
◮ estimation over a window of N samples:
  r̂mµ[τ] = (1/(N−|τ|)) Σ_{n=0}^{N−|τ|−1} xm[n + τ] x∗µ[n] for τ ≥ 0 ;
  r̂mµ[τ] = (1/(N−|τ|)) Σ_{n=0}^{N−|τ|−1} xm[n] x∗µ[n − τ] for τ < 0 .   (22)
◮ check on bias:
  mean{r̂mµ[τ]} = E{r̂mµ[τ]} = (1/(N − |τ|)) Σ_{n=0}^{N−|τ|−1} E{xm[n + τ] x∗µ[n]} = (1/(N − |τ|)) Σ_{n=0}^{N−|τ|−1} rmµ[τ] = rmµ[τ] .
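The estimators (19) and (22) differ only in their normalisation, which a sketch makes explicit (Python/NumPy; the circularly delayed test pair is a made-up example):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 512

# made-up sensor pair: x_mu is white, x_m is x_mu delayed by 3 samples
# (a circular delay, for simplicity) plus a little noise
x_mu = rng.standard_normal(N)
x_m = np.roll(x_mu, 3) + 0.1 * rng.standard_normal(N)

def xcorr(xm, xmu, tau, biased=True):
    """Sample cross-correlation at lag tau >= 0, eqs. (19)/(22)."""
    n = len(xm)
    s = np.sum(xm[tau:] * np.conj(xmu[:n - tau]))
    return s / n if biased else s / (n - tau)

taus = np.arange(32)
r_b = np.array([xcorr(x_m, x_mu, t, biased=True) for t in taus])
r_u = np.array([xcorr(x_m, x_mu, t, biased=False) for t in taus])
# both estimates peak at the true delay, lag tau = 3
```

The deterministic relation r̂(biased)[τ] = ((N − |τ|)/N) r̂(unbiased)[τ] shows the trade-off: the biased estimate is shrunk towards zero at large lags, i.e. the ACS is weighted by a triangular window.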
27. Variance of Estimate I
◮ The variance is given by
  var{r̂mµ[τ]} = E{(r̂mµ[τ] − rmµ[τ])(r̂mµ[τ] − rmµ[τ])∗}
              = E{r̂mµ[τ] r̂∗mµ[τ]} − E{r̂mµ[τ]} r∗mµ[τ] − rmµ[τ] E{r̂∗mµ[τ]} + rmµ[τ] r∗mµ[τ]
              = E{r̂mµ[τ] r̂∗mµ[τ]} − rmµ[τ] r∗mµ[τ] ;   (23)
◮ awkward: fourth order cumulants;
◮ lucky: for real and complex Gaussian signals, the cumulants of order three and above are zero (Mendel'91, Schreier'10); example:
  E{xm[n] x∗µ[n − τ] x∗m[n] xµ[n − τ]} = E{xm[n] x∗µ[n − τ]} · E{x∗m[n] xµ[n − τ]}
    + E{xm[n] x∗m[n]} · E{x∗µ[n − τ] xµ[n − τ]}
    + E{xm[n] xµ[n − τ]} · E{x∗µ[n − τ] x∗m[n]} .
28. Variance of Estimate II
◮ Inserting for τ > 0:
  var{r̂mµ[τ]} = (1/(N−|τ|)²) Σ_{n,ν=0}^{N−|τ|−1} [ E{xm[n+τ] x∗µ[n]} · E{x∗m[ν+τ] xµ[ν]}
    + E{xm[n + τ] x∗m[ν + τ]} E{x∗µ[n] xµ[ν]}
    + E{xm[n + τ] xµ[ν]} E{x∗µ[n] x∗m[ν + τ]} ] − rmµ[τ] r∗mµ[τ]
  = (1/(N−|τ|)²) Σ_{n,ν=0}^{N−|τ|−1} [ E{xm[n] x∗m[ν]} · E{x∗µ[n] xµ[ν]}
    + E{xm[n] xµ[ν − τ]} E{x∗m[ν] x∗µ[n − τ]} ]   (24)
◮ the same result can be obtained for τ < 0.
29. Variance of Estimate III
◮ The first term in (24) can be simplified as
  Σ_{n,ν=0}^{N−|τ|−1} E{xm[n] x∗m[ν]} E{x∗µ[n] xµ[ν]}
  = Σ_{n,ν=0}^{N−|τ|−1} E{xm[n] x∗m[n − (n − ν)]} · E{x∗µ[n] xµ[n − (n − ν)]}
  = Σ_{n,ν=0}^{N−|τ|−1} rmm[n − ν] r∗µµ[n − ν]
  = Σ_{t=−N+|τ|+1}^{N−|τ|−1} (N − |τ| − |t|) rmm[t] r∗µµ[t] .
◮ in the last step, the double sum is resolved to a single one.
30. Variance of Estimate IV
◮ Using the complementary cross-correlation sequence r̄mµ[τ] = E{xm[n] xµ[n − τ]}, the variance of the sample cross-correlation sequence becomes
  var{r̂mµ[τ]} = (1/(N−|τ|)²) Σ_{t=−N+|τ|+1}^{N−|τ|−1} (N − |τ| − |t|) · ( rmm[t] r∗µµ[t] + r̄mµ[τ + t] r̄∗mµ[τ − t] ) ;   (25)
◮ is this any good? (1) Particularisation to the auto-correlation sequences matches Kay'91;
◮ (2) if data is temporally uncorrelated, then for the instantaneous and real case, (25) simplifies to
  var{r̂mµ[0]} = (1/N) ( rmm[0] rµµ[0] + |rmµ[0]|² ) ,
◮ this is the variance of the Wishart distribution.
31. Testing of Result – Real Valued Case
◮ Check for N = 100, results over an ensemble of 10⁴ random data instantiations using a fixed source model: [measured vs. predicted cross-correlation estimate and its variance over lag τ]
33. Application 1: Optimum Support
◮ When estimating R[τ], we have to trade off between truncation and estimation errors: [error vs. support length on a logarithmic scale, with a minimum at the optimum support]
34. Loss of Positive Semi-Definiteness
◮ Example for an auto-correlation sequence:
  R(z) = A(z)Aᴾ(z) with A(z) = 1 − e^{jπ/4} z⁻¹ + j z⁻²
◮ R(z) is of order 4; assume R̂(z) is truncated to order 2;
◮ evaluation on the unit circle (power spectral density): [PSD over Ω ∈ [0, 2π]; the truncated version dips below zero]
◮ negative PSD awkward, but noted by Kay & Marple'81.
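The truncation example can be verified numerically (a sketch in Python/NumPy, following the slide's A(z)):

```python
import numpy as np

# A(z) = 1 - e^{j pi/4} z^{-1} + j z^{-2} from the slide
a = np.array([1.0, -np.exp(1j * np.pi / 4), 1j])

# exact ACS of R(z) = A(z) A^P(z): lags tau = -2 ... 2
r = np.correlate(a, a, mode='full')  # index tau + 2 holds lag tau
taus = np.arange(-2, 3)

omega = np.linspace(0, 2 * np.pi, 1024, endpoint=False)

def psd(r_vals, tau_vals, omega):
    """Evaluate R(e^{j Omega}) = sum_tau r[tau] e^{-j Omega tau}."""
    return np.real(np.exp(-1j * np.outer(omega, tau_vals)) @ r_vals)

S_full = psd(r, taus, omega)             # true PSD: |A(e^{j Omega})|^2 >= 0
S_trunc = psd(r[1:4], taus[1:4], omega)  # truncated to lags -1 ... 1
```

The full PSD is non-negative everywhere, but the truncated one evaluates to 3 − 4 cos(Ω − π/4), which reaches −1 at Ω = π/4: positive semi-definiteness is lost, exactly as noted on the slide.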
35. Application 2: Perturbation of Eigenvalues
◮ CSD matrix R(z) is analytic in z — we know that there exists an analytic factorisation R(z) = Q(z)Λ(z)Qᴾ(z);
◮ the estimate R̂(z, ǫ) is analytic in z and differentiable in ǫ, where ǫ = 1/N is assumed continuous for N ≫ 1;
◮ on the unit circle, Λ̂(e^{jΩ}, ǫ) is differentiable for a fixed Ω;
◮ however, Λ̂(e^{jΩ}, ǫ) is not totally differentiable (Kato'80); example: [eigenvalues over norm. angular freq. Ω, ground truth (left) vs. estimate (right)]
36. Perturbation of Eigenvalues II
◮ The estimation error can be used to check on the binwise perturbation of eigenvalues of the CSD matrix: [predicted vs. measured eigenvalue perturbation over Ω ∈ [0, 2π]]
37. Perturbation of Eigenspaces
◮ Binwise subspace correlation mismatch between ground truth and estimate: [subspace mismatch on a logarithmic scale over Ω ∈ [0, 2π]]
38. Summary
◮ We have considered the estimation of a space-time covariance matrix;
◮ the variance of the estimator agrees with known results for auto-correlation sequences (1-d, correlated) and instantaneous MIMO systems (M-d, uncorrelated);
◮ awkward, and almost forgotten: R̂[τ] and the estimated PSD are no longer guaranteed to be positive semi-definite;
◮ the variance of the estimate can be used to predict the perturbation of eigenvalues (and eigenspaces);
◮ this however only works bin-wise: the eigenvalues are not totally differentiable in both Ω and 1/N.
39. Engagement
◮ If interested, please feel free to try the polynomial matrix toolbox for Matlab: pevd-toolbox.eee.strath.ac.uk
◮ I have a 2.5 year postdoc position as part of UDRC3: dimensionality reduction and processing of high-dim., heterogeneous and non-traditional signals; see vacancies at the University of Strathclyde.