Xi Zhang presented their Ph.D. dissertation which analyzed functional regression models and their application to high-frequency financial data. The presentation included:
1. An introduction to functional data analysis and the use of intraday cumulative return curves from stock price data.
2. A simulation study comparing predictive methods in functional autoregressive models, finding the estimated kernel method performed well.
3. An application of functional extensions of the Capital Asset Pricing Model to predict intraday return curves, finding simpler models with intercepts had better predictive performance than more complex models.
Handling missing data with expectation maximization algorithm (Loc Nguyen)
The expectation maximization (EM) algorithm is a powerful mathematical tool for estimating the parameters of statistical models from incomplete or hidden data. EM assumes a relationship between hidden data and observed data, which can be a joint distribution or a mapping function; this in turn implies an implicit relationship between parameter estimation and data imputation. If data containing missing values is treated as hidden data, it is very natural to handle missing data with the EM algorithm. Handling missing data is not a new research topic, but this report focuses on the theoretical basis, with detailed mathematical proofs, for filling in missing values with EM. The multinormal distribution and the multinomial distribution are the two sample statistical models considered for handling missing values.
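The multinormal case the abstract mentions can be sketched concretely. The following is an illustrative implementation (not the report's own code) of EM for the mean and covariance of a multivariate normal with missing entries: the E-step imputes each missing block with its conditional mean given the observed block, and the M-step re-estimates the parameters with the conditional-covariance correction added.

```python
import numpy as np

def em_mvn(X, n_iter=50):
    """EM for mean/covariance of a multivariate normal with missing (NaN) entries."""
    X = X.copy()
    n, d = X.shape
    miss = np.isnan(X)
    # initialize: column means for missing cells, covariance of the filled data
    mu = np.nanmean(X, axis=0)
    X[miss] = np.take(mu, np.where(miss)[1])
    sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        corr = np.zeros((d, d))          # accumulated conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            Soo_inv = np.linalg.inv(sigma[np.ix_(o, o)])
            # E-step: conditional mean of the missing block given the observed block
            X[i, m] = mu[m] + sigma[np.ix_(m, o)] @ Soo_inv @ (X[i, o] - mu[o])
            C = sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ Soo_inv @ sigma[np.ix_(o, m)]
            corr[np.ix_(m, m)] += C
        # M-step: parameters from the completed data plus the correction term
        mu = X.mean(axis=0)
        diff = X - mu
        sigma = (diff.T @ diff + corr) / n
    return mu, sigma, X
```

The correction term `corr` is what distinguishes genuine EM from naive iterated regression imputation: without it, the estimated covariance is biased toward zero.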
This document discusses various machine learning techniques including:
1. Tree pruning involves first growing a large tree and then pruning back branches that do not improve the objective function. This avoids the shortsightedness of stopping tree growth too early.
2. Boosting uses multiple weak learners sequentially to get an additive model that approximates the regression function. It combines many simple models to create a powerful ensemble model.
3. Unsupervised learning techniques like principal component analysis and clustering are used to find patterns in data without an outcome variable. These include reducing dimensions and partitioning data into subgroups.
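The boosting step described in point 2 can be sketched as least-squares gradient boosting with stump learners. This is a minimal illustration of the idea, not code from the document:

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump (threshold + two leaf means) for targets r."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        sse = ((r - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # (threshold, left value, right value)

def boost(x, y, n_rounds=50, lr=0.1):
    """L2 gradient boosting: each weak learner fits the current residuals."""
    pred = np.zeros_like(y)
    stumps = []
    for _ in range(n_rounds):
        t, lv, rv = fit_stump(x, y - pred)       # weak learner on residuals
        pred += lr * np.where(x <= t, lv, rv)    # additive update with shrinkage
        stumps.append((t, lv, rv))
    return pred, stumps
```

Each round adds one simple model to the ensemble, so the final predictor is an additive combination of many weak learners, exactly as the summary describes.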
An introduction to machine learning and probabilistic ... (butest)
This document provides an overview and introduction to machine learning and probabilistic graphical models. It discusses key topics such as supervised learning, unsupervised learning, graphical models, inference, and structure learning. The document covers techniques like decision trees, neural networks, clustering, dimensionality reduction, Bayesian networks, and learning the structure of probabilistic graphical models.
Big Data analysis involves building predictive models from high-dimensional data using techniques like variable selection, cross-validation, and regularization to avoid overfitting. The document discusses an example analyzing web browsing data to predict online spending, highlighting challenges with large numbers of variables. It also covers summarizing high-dimensional data through dimension reduction and model building for prediction versus causal inference.
Nature-Inspired Metaheuristic Algorithms (Xin-She Yang)
This chapter introduces optimization problems and nature-inspired metaheuristics. Optimization problems involve minimizing or maximizing objective functions subject to constraints. Nature-inspired metaheuristics are computational algorithms inspired by natural phenomena, such as simulated annealing, genetic algorithms, particle swarm optimization, and ant colony optimization. They provide near-optimal solutions to complex optimization problems.
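Of the metaheuristics listed, simulated annealing is the simplest to sketch. This is a generic illustrative version for a one-dimensional continuous objective (not from the chapter); the step size, cooling rate, and iteration count are arbitrary choices:

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, n_iter=5000, seed=42):
    """Minimize f by random perturbation, accepting worse moves with prob e^(-delta/T)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)       # random neighbor
        fc = f(cand)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc                      # accept (possibly uphill) move
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling                              # geometric cooling schedule
    return best, fbest
```

The occasional acceptance of worse moves at high temperature is what lets the algorithm escape local minima, which is why such methods yield near-optimal rather than provably optimal solutions.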
International Journal of Engineering Research and Applications (IJERA) is an open access online peer-reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
In this paper, the assignment problem with crisp, fuzzy, and intuitionistic fuzzy numbers as cost coefficients is investigated. In the conventional assignment problem, cost is always certain. This paper develops an approach to solve a mixed intuitionistic fuzzy assignment problem in which costs may be real numbers, fuzzy numbers, or intuitionistic fuzzy numbers. The ranking procedure of Annie Varghese and Sunny Kuriakose [4] is used to transform the mixed intuitionistic fuzzy assignment problem into a crisp one, so that the conventional method can be applied. The method is illustrated by a numerical example and is simple and easy to understand. Numerical examples show that the intuitionistic fuzzy ranking method offers an effective tool for handling intuitionistic fuzzy assignment problems.
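Once the ranking procedure has reduced the fuzzy costs to crisp numbers, what remains is the classical assignment problem. As background (this is not the paper's method, and for larger instances the Hungarian algorithm would be used instead), a small crisp instance can be solved by exhaustive search:

```python
from itertools import permutations

def solve_assignment(cost):
    """Minimum-cost assignment by exhaustive search (fine for small crisp instances).
    cost[i][j] is the cost of assigning worker i to job j."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost
```

For example, with `cost = [[9, 2, 7], [6, 4, 3], [5, 8, 1]]` the optimal assignment is worker 0 to job 1, worker 1 to job 0, worker 2 to job 2, with total cost 9.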
A NEW METHOD OF CENTRAL DIFFERENCE INTERPOLATION (mathsjournal)
In numerical analysis, interpolation is a way of calculating unknown values of a function for any given value of the argument within the limits of the known arguments; it is essentially a means of estimating unknown data from related known data. The main goal of this research is to construct a central difference interpolation method derived by combining Gauss's third formula, Gauss's backward formula, and Gauss's forward formula. We also present graphical comparisons of all the existing interpolation formulas against the proposed central difference method. By these comparisons and graphical presentations, the new method gives the best result, with the lowest error among the existing interpolation formulas.
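The paper's own Gauss-combination formula is not reproduced in this summary. As background for readers, the classical baseline such methods are compared against can be sketched with Newton divided differences, which interpolate through any set of distinct nodes:

```python
def divided_differences(xs, ys):
    """Build the Newton divided-difference coefficients in place."""
    coefs = list(ys)
    n = len(xs)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coefs[i] = (coefs[i] - coefs[i - 1]) / (xs[i] - xs[i - j])
    return coefs

def newton_eval(xs, coefs, x):
    """Evaluate the Newton-form interpolating polynomial at x (Horner-style)."""
    result = coefs[-1]
    for i in range(len(coefs) - 2, -1, -1):
        result = result * (x - xs[i]) + coefs[i]
    return result
```

Since the interpolating polynomial through n + 1 nodes reproduces any polynomial of degree at most n exactly, interpolating y = x^3 at the nodes 0, 1, 2, 3 gives the exact value 3.375 at x = 1.5.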
The document provides an overview of the EM algorithm and its application to outlier detection. It begins with introducing the EM algorithm and explaining its iterative process of estimating parameters via E-step and M-step. It then proves properties of the EM algorithm such as non-decreasing log-likelihood and convergence. An example of using EM for Gaussian mixture modeling is provided. Finally, the document discusses directly and indirectly applying EM to outlier detection.
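The Gaussian mixture example and the non-decreasing log-likelihood property can be illustrated together. This is a hypothetical minimal 1-D implementation, not the document's code; it records the log-likelihood at each iteration so the monotonicity the document proves can be observed directly:

```python
import numpy as np

def gmm_em_1d(x, k=2, n_iter=30, seed=0):
    """EM for a 1-D Gaussian mixture; returns params and per-iteration log-likelihoods."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)   # init means at random data points
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    lls = []
    for _ in range(n_iter):
        # E-step: responsibilities (posterior component probabilities)
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        lls.append(np.log(dens.sum(axis=1)).sum())
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi, lls
```

Running this on any dataset, the recorded log-likelihoods form a non-decreasing sequence, which is exactly the convergence property the document proves.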
There is now a huge literature on Bayesian methods for variable selection that use spike-and-slab priors. Such methods, in particular, have been quite successful for applications in a variety of different fields. High-throughput genomics and neuroimaging are two of such examples. There, novel methodological questions are being generated, requiring the integration of different concepts, methods, tools and data types. These have in particular motivated the development of variable selection priors that go beyond the independence assumptions of a simple Bernoulli prior on the variable inclusion indicators. In this talk I will describe various prior constructions that incorporate information about structural dependencies among the variables. I will also address extensions of the models to the analysis of count data. I will motivate the development of the models using specific applications from neuroimaging and from studies that use microbiome data.
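For readers unfamiliar with the terminology, a standard spike-and-slab construction (not necessarily the exact priors used in the talk) and one common way to encode structural dependencies among the inclusion indicators are:

```latex
% Spike-and-slab prior on coefficient beta_j with inclusion indicator gamma_j:
% a "slab" (normal) when the variable is in, a point mass at zero when it is out
\pi(\beta_j \mid \gamma_j)
  = \gamma_j \, \mathcal{N}(\beta_j;\, 0, \tau^2)
  + (1 - \gamma_j) \, \delta_0(\beta_j)

% Independent Bernoulli prior on the indicators (the baseline the talk moves beyond)
\gamma_j \sim \mathrm{Bernoulli}(\theta)

% One structured alternative: a Markov random field (Ising) prior that links
% indicators of related variables, e.g. neighboring brain regions or related genes
p(\gamma) \propto \exp\!\Big( a \sum_j \gamma_j
  + b \sum_{j \sim k} \gamma_j \gamma_k \Big)
```

In the Ising prior, the interaction parameter b rewards jointly including variables that are neighbors in a known structure, which is one way "information about structural dependencies" enters the prior.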
The document outlines a tutorial on conducting laboratory experiments properly with statistical tools. The tutorial covers topics such as paired and two-sample t-tests, analysis of variance (ANOVA), confidence intervals, effect sizes, and statistical power. It provides examples and explanations of key statistical concepts and tests used in information retrieval experimentation, including how to interpret p-values, type I and type II errors, and assumptions of parametric tests. Hands-on exercises are performed in R to demonstrate calculating statistics and running statistical tests on sample data comparing search engine performance.
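The tutorial's hands-on exercises are in R; for illustration only, the same two-sample (Welch) t statistic and its degrees of freedom can be computed by hand with the Python standard library. The data below is made up for the example:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom (unequal variances)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)      # sample variances (n - 1 divisor)
    se2_a, se2_b = va / na, vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

For `a = [10, 12, 11, 13]` and `b = [8, 9, 7, 10]` this gives t of about 3.29 with 6 degrees of freedom; the p-value would then come from the t distribution with that df, which is where type I error and power enter the tutorial's discussion.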
Welcome to International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
1) The document proposes a new robust estimator called INAPSAC (Improved N Adjacent Points Sample Consensus) for image analysis tasks like corner detection.
2) An experiment applies INAPSAC, RANSAC, and NAPSAC to corner detection on different image types and compares processing times and number of corners detected.
3) The results show that INAPSAC has faster processing times and detects more corners than RANSAC and NAPSAC, demonstrating that it is more accurate for corner detection than existing methods.
The Sample Average Approximation Method for Stochastic Programs with Integer ... (SSA KPI)
The document describes a sample average approximation method for solving stochastic programs with integer recourse. It approximates the expected recourse cost function using a sample average based on a sample of scenarios. It shows that as the sample size increases, the solution to the sample average approximation problem converges exponentially fast to the optimal solution of the true stochastic program. It also describes statistical and deterministic techniques for validating candidate solutions. Preliminary computational results applying this method are also mentioned.
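The core idea can be sketched on a toy problem. This is not the document's model; it is a hypothetical newsvendor-style example with an integer decision, where the expected recourse cost is replaced by its average over a list of sampled demand scenarios:

```python
def saa_newsvendor(scenarios, q_max, hold=1.0, short=3.0):
    """Pick the integer order q minimizing the sample-average holding+shortage cost.
    scenarios: sampled demand realizations; hold/short: per-unit penalty costs."""
    def avg_cost(q):
        # sample average approximation of E[cost(q, demand)]
        return sum(hold * max(q - d, 0) + short * max(d - q, 0)
                   for d in scenarios) / len(scenarios)
    return min(range(q_max + 1), key=avg_cost)
```

With scenarios [4, 6, 7, 9, 10, 10, 12, 15] and these costs, the minimizer is q = 10, the 0.75-quantile of the empirical demand (the critical fractile short/(short+hold)). As the document notes, as the number of scenarios grows the SAA minimizer converges to the true stochastic program's optimum.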
The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i... (IRJET Journal)
This document discusses using fuzzy TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) as an analytical tool for decision making in data mining. Fuzzy TOPSIS extends the traditional TOPSIS method to handle uncertainties by using fuzzy set theory. It involves defining ratings and weights as linguistic variables represented by fuzzy numbers. The key steps are normalizing the fuzzy decision matrix, determining fuzzy positive and negative ideal solutions, calculating distances from the ideal solutions, and determining a closeness coefficient to rank the alternatives. The literature review discusses previous research applying fuzzy set concepts to TOPSIS to address limitations of crisp data in modeling real-world decision problems.
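The key steps listed above can be sketched in their crisp form; the fuzzy variant replaces matrix entries with fuzzy numbers and Euclidean distances with fuzzy distance measures. The decision matrix below is illustrative, not from the document:

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Closeness coefficients of alternatives (rows) over criteria (columns).
    benefit[j] is True for benefit criteria, False for cost criteria."""
    m = np.asarray(matrix, dtype=float)
    # step 1: vector-normalize each criterion column, then apply weights
    v = m / np.sqrt((m ** 2).sum(axis=0)) * weights
    # step 2: ideal best/worst per criterion
    best = np.where(benefit, v.max(axis=0), v.min(axis=0))
    worst = np.where(benefit, v.min(axis=0), v.max(axis=0))
    # step 3: distances to the positive and negative ideal solutions
    d_best = np.sqrt(((v - best) ** 2).sum(axis=1))
    d_worst = np.sqrt(((v - worst) ** 2).sum(axis=1))
    # step 4: closeness coefficient in [0, 1]; higher ranks better
    return d_worst / (d_best + d_worst)
```

An alternative that attains the best value on every criterion coincides with the positive ideal solution and gets closeness coefficient 1.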
Approximation in Stochastic Integer Programming (SSA KPI)
This document discusses approximation algorithms for stochastic integer programming problems. It begins by introducing stochastic programming models, including recourse models and hierarchical planning models. It describes the mathematical properties of continuous and mixed-integer recourse models, noting that mixed-integer recourse problems are harder than continuous recourse and most combinatorial optimization problems. The document focuses on studying approximation algorithms for stochastic integer programming that are similar in nature to approximations for combinatorial optimization problems.
The document discusses linear operators on probabilistic Hilbert spaces. It begins with definitions of key concepts like distribution functions, probabilistic inner products, and probabilistic Hilbert spaces. It then defines and proves properties of various types of linear operators in this context, such as bounded operators, adjoint operators, self-adjoint operators, and continuous operators. A key result is that every operator in a probabilistic Hilbert space is a self-adjoint operator. It also establishes the relationship between F-bounded operators and bounded operators in norm. The document provides foundations for understanding linear operators in probabilistic Hilbert spaces in a rigorous mathematical way.
The document proposes a methodology to improve evolutionary multi-objective algorithms (EMOAs) by incorporating achievement scalarizing functions (ASFs) to provide convergence to the Pareto optimal front while maintaining diversity. The methodology executes in serial stages: running an EMOA to get a non-dominated set, clustering this set to extract a representative set, calculating pseudo-weights for the representative set, and perturbing the extreme points to generate reference points to drive the ASF towards the Pareto front over iterations until no improvements are found. Initial studies on test problems ZDT1, ZDT2 and ZDT3 show promising results, with the proposed approach finding a representative set of clustered Pareto points in fewer generations compared to NSGA.
Solving Multidimensional Multiple Choice Knapsack Problem By Genetic Algorith... (Shubhashis Shil)
This document summarizes a study that used a genetic algorithm to solve the multidimensional multiple choice knapsack problem (MMKP) and measured its performance against traditional approaches. The genetic algorithm was able to obtain near-optimal revenue solutions for large-scale MMKP problems in less time than traditional methods like Branch and Bound with Linear Programming (BBLP), Modified Heuristic (M-HEU), and Multiple Upgrade of Heuristic (MU-HEU). While the revenue obtained was nearly the same across all methods, the genetic algorithm had significantly better timing complexity and its effectiveness increased as the problem constraints grew larger.
This document summarizes kernel methods in machine learning. It begins with an introductory example of using a kernel function to perform binary classification in a reproducing kernel Hilbert space. It then defines positive definite kernels and shows how they allow representing algorithms as operating in linear dot product spaces while using nonlinear kernel functions. The document covers fundamental properties of kernels, provides examples, and discusses how kernels define reproducing kernel Hilbert spaces for regularization. It overviews various kernel-based machine learning approaches and modeling structured responses using statistical models in reproducing kernel Hilbert spaces.
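The introductory example mentioned above, in the spirit of standard kernel-methods texts, classifies a point by comparing its mean kernel similarity to each class. This sketch (illustrative, not the document's code) uses the Gaussian RBF kernel, a canonical positive definite kernel:

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2), positive definite."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_mean_classify(X_pos, X_neg, x, kernel=rbf):
    """Assign x to the class whose mean in feature space is more similar:
    sign of mean_k(x, X_pos) - mean_k(x, X_neg)."""
    s_pos = np.mean([kernel(x, p) for p in X_pos])
    s_neg = np.mean([kernel(x, n) for n in X_neg])
    return 1 if s_pos > s_neg else -1
```

Although the decision rule is linear in the reproducing kernel Hilbert space (a comparison against two class means), the induced boundary in input space is nonlinear, which is the central point of the kernel trick.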
A HYBRID COA/ε-CONSTRAINT METHOD FOR SOLVING MULTI-OBJECTIVE PROBLEMS (ijfcstjournal)
In this paper, a hybrid method for solving multi-objective problems is presented. The proposed method combines the ε-constraint technique with the Cuckoo optimization algorithm. First, the multi-objective problem is transformed into a single-objective problem using the ε-constraint technique; then the Cuckoo optimization algorithm optimizes the problem in each task. Finally, the optimized Pareto frontier is drawn. The advantages of this method are its high accuracy and the dispersion of its Pareto frontier. To test the efficiency of the suggested method, many test problems were solved with it. Comparing its results with those of other similar methods shows that the Cuckoo algorithm is well suited to solving multi-objective problems.
Asynchronous parallel algorithms are developed to solve massive optimization problems in distributed data systems; they can run in parallel on multiple nodes with little or no synchronization. Recently they have been successfully used to solve a range of difficult problems in practice. However, existing theory is mostly based on fairly restrictive assumptions on the delays and cannot explain the convergence and speedup properties of such algorithms. This talk gives an overview of distributed optimization and discusses new theoretical results on the convergence of the asynchronous parallel stochastic gradient algorithm with unbounded delays. Simulated and real data are used to demonstrate the practical implications of these theoretical results.
- The document presents a method for efficiently evaluating counterfactual policies using bandit feedback data.
- It proposes an efficient estimator that achieves the semiparametric efficiency bound, minimizing asymptotic variance among consistent estimators.
- The method involves first estimating choice probabilities from logged bandit data, then using these estimates in a two-step procedure to evaluate counterfactual policies while achieving optimal statistical efficiency.
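The document's efficient two-step estimator is not reproduced here, but the simplest consistent baseline it improves upon, inverse-propensity weighting (IPW) over logged bandit feedback, can be sketched with made-up data:

```python
def ipw_value(logs, policy):
    """Inverse-propensity-weighted estimate of a target policy's mean reward.
    logs: iterable of (context, action, reward, logging_propensity) records;
    policy: maps a context to the action the counterfactual policy would take."""
    total = 0.0
    for context, action, reward, prop in logs:
        if policy(context) == action:        # weight is 1{a = pi(x)} / propensity
            total += reward / prop
    return total / len(logs)
```

Reweighting by the inverse logging propensity corrects for the fact that the target policy's actions are over- or under-represented in the logs; the document's estimator additionally achieves the semiparametric efficiency bound, which plain IPW does not.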
Open Access: Enabling Broadband Connectivity in Kenya (Njiraini Mwende)
This presentation is based on a dissertation submitted by Mwende Njiraini in partial fulfilment of the requirements of the Masters in Communications Management at the University of Strathclyde, 2006. The dissertation sought to establish various perspectives on open access, including its principles and benefits, and to establish an appropriate regulatory framework to foster the development of open access networks (OAN). Through an exploration of various open access network initiatives, it sought to identify key success factors and challenges, and to test the applicability of the open access concept in the Kenyan context.
This document summarizes a dissertation presentation on designing a branding strategy for UKTV channels for the new digital age. The presentation covers researching television as a brand, key elements for a successful channel identity, and assessing television channel brands. It outlines the dissertation's research objectives of identifying UKTV's brand strategy, evaluating competitors' identities, exploring new media trends, and formulating a strategic framework. Methodologies included interviews, case studies, and literature reviews. The framework proposes evolving the UKTV brand on new platforms through social media, customization for each channel, and responsive communication.
This document provides guidance for students on completing a dissertation project. It outlines the requirements and assessment breakdown, which includes a 5-10 minute presentation worth 30% and a 6000 word dissertation worth 70%. It describes the purpose and structure of the dissertation proposal, including developing a research question, aim, objectives, literature review plan, methodology, and bibliography. The document offers tips for utilizing support resources like lectures, seminars, tutorials, and meetings with supervisors. It warns against plagiarism and outlines the scheme of work and timeline for completing tasks.
This document outlines an integral psychology of Islam that draws from various psychological frameworks. It discusses 10 key concepts of self psychology in Islam and maps them onto Wilber's four quadrant model. Archetypes and complexes are analyzed through different Islamic concepts like the opening Sura of the Quran (Al-Fatiha), the Prophet's ascension to heaven, and the four rivers in the Gardens of Paradise. An eco-archetypal image is presented to integrate transpersonal, self, feminine and cultural/social/political psychologies towards a holistic understanding.
BA (Hons) Business Dissertation. 'Does online service quality, in the supermarket industry, influence consumer engagement: a comparison of Morrisons and Tesco.'
In this paper, Assignment problem with crisp, fuzzy and intuitionistic fuzzy numbers as cost coefficients is investigated. In conventional assignment problem, cost is always certain. This paper develops an approach to solve a mixed intuitionistic fuzzy assignment problem where cost is considered real, fuzzy and an intuitionistic fuzzy numbers. Ranking procedure of Annie Varghese and Sunny Kuriakose [4] is used to transform the mixed intuitionistic fuzzy assignment problem into a crisp one so that the conventional method may be applied to solve the assignment problem. The method is illustrated by a numerical example. The proposed method is very simple and easy to understand. Numerical examples show that an intuitionistic fuzzy ranking method offers an effective tool for handling an intuitionistic fuzzy assignment problem.
A NEW METHOD OF CENTRAL DIFFERENCE INTERPOLATIONmathsjournal
In Numerical analysis, interpolation is a manner of calculating the unknown values of a function for any conferred value of argument within the limit of the arguments. It provides basically a concept of estimating unknown data with the aid of relating acquainted data. The main goal of this research is to constitute a central difference interpolation method which is derived from the combination of Gauss’s third formula, Gauss’s Backward formula and Gauss’s forward formula. We have also demonstrated the graphical presentations as well as comparison through all the existing interpolation formulas with our propound method of central difference interpolation. By the comparison and graphical presentation, the new method gives the best result with the lowest error from another existing interpolationformula.
The document provides an overview of the EM algorithm and its application to outlier detection. It begins with introducing the EM algorithm and explaining its iterative process of estimating parameters via E-step and M-step. It then proves properties of the EM algorithm such as non-decreasing log-likelihood and convergence. An example of using EM for Gaussian mixture modeling is provided. Finally, the document discusses directly and indirectly applying EM to outlier detection.
There is now a huge literature on Bayesian methods for variable selection that use spike-and-slab priors. Such methods, in particular, have been quite successful for applications in a variety of different fields. High-throughput genomics and neuroimaging are two of such examples. There, novel methodological questions are being generated, requiring the integration of different concepts, methods, tools and data types. These have in particular motivated the development of variable selection priors that go beyond the independence assumptions of a simple Bernoulli prior on the variable inclusion indicators. In this talk I will describe various prior constructions that incorporate information about structural dependencies among the variables. I will also address extensions of the models to the analysis of count data. I will motivate the development of the models using specific applications from neuroimaging and from studies that use microbiome data.
The document outlines a tutorial on conducting laboratory experiments properly with statistical tools. The tutorial covers topics such as paired and two-sample t-tests, analysis of variance (ANOVA), confidence intervals, effect sizes, and statistical power. It provides examples and explanations of key statistical concepts and tests used in information retrieval experimentation, including how to interpret p-values, type I and type II errors, and assumptions of parametric tests. Hands-on exercises are performed in R to demonstrate calculating statistics and running statistical tests on sample data comparing search engine performance.
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
1) The document proposes a new robust estimator called INAPSAC (Improved N Adjacent Points Sample Consensus) for image analysis tasks like corner detection.
2) An experiment applies INAPSAC, RANSAC, and NAPSAC to corner detection on different image types and compares processing times and number of corners detected.
3) The results show that INAPSAC has faster processing times and detects more corners than RANSAC and NAPSAC, demonstrating that it is more accurate for corner detection than existing methods.
The Sample Average Approximation Method for Stochastic Programs with Integer ...SSA KPI
The document describes a sample average approximation method for solving stochastic programs with integer recourse. It approximates the expected recourse cost function using a sample average based on a sample of scenarios. It shows that as the sample size increases, the solution to the sample average approximation problem converges exponentially fast to the optimal solution of the true stochastic program. It also describes statistical and deterministic techniques for validating candidate solutions. Preliminary computational results applying this method are also mentioned.
The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i...IRJET Journal
This document discusses using fuzzy TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) as an analytical tool for decision making in data mining. Fuzzy TOPSIS extends the traditional TOPSIS method to handle uncertainties by using fuzzy set theory. It involves defining ratings and weights as linguistic variables represented by fuzzy numbers. The key steps are normalizing the fuzzy decision matrix, determining fuzzy positive and negative ideal solutions, calculating distances from the ideal solutions, and determining a closeness coefficient to rank the alternatives. The literature review discusses previous research applying fuzzy set concepts to TOPSIS to address limitations of crisp data in modeling real-world decision problems.
Approximation in Stochastic Integer ProgrammingSSA KPI
This document discusses approximation algorithms for stochastic integer programming problems. It begins by introducing stochastic programming models, including recourse models and hierarchical planning models. It describes the mathematical properties of continuous and mixed-integer recourse models, noting that mixed-integer recourse problems are harder than continuous recourse and most combinatorial optimization problems. The document focuses on studying approximation algorithms for stochastic integer programming that are similar in nature to approximations for combinatorial optimization problems.
The document discusses linear operators on probabilistic Hilbert spaces. It begins with definitions of key concepts like distribution functions, probabilistic inner products, and probabilistic Hilbert spaces. It then defines and proves properties of various types of linear operators in this context, such as bounded operators, adjoint operators, self-adjoint operators, and continuous operators. A key result is that every operator in a probabilistic Hilbert space is a self-adjoint operator. It also establishes the relationship between F-bounded operators and bounded operators in norm. The document provides foundations for understanding linear operators in probabilistic Hilbert spaces in a rigorous mathematical way.
The document proposes a methodology to improve evolutionary multi-objective algorithms (EMOAs) by incorporating achievement scalarizing functions (ASFs) to provide convergence to the Pareto optimal front while maintaining diversity. The methodology executes in serial stages: running an EMOA to get a non-dominated set, clustering this set to extract a representative set, calculating pseudo-weights for the representative set, and perturbing the extreme points to generate reference points to drive the ASF towards the Pareto front over iterations until no improvements are found. Initial studies on test problems ZDT1, ZDT2 and ZDT3 show promising results, with the proposed approach finding a representative set of clustered Pareto points in fewer generations compared to NSGA
Solving Multidimensional Multiple Choice Knapsack Problem By Genetic Algorith...Shubhashis Shil
This document summarizes a study that used a genetic algorithm to solve the multidimensional multiple choice knapsack problem (MMKP) and measured its performance against traditional approaches. The genetic algorithm was able to obtain near-optimal revenue solutions for large-scale MMKP problems in less time than traditional methods like Branch and Bound with Linear Programming (BBLP), Modified Heuristic (M-HEU), and Multiple Upgrade of Heuristic (MU-HEU). While the revenue obtained was nearly the same across all methods, the genetic algorithm had significantly better timing complexity and its effectiveness increased as the problem constraints grew larger.
This document summarizes kernel methods in machine learning. It begins with an introductory example of using a kernel function to perform binary classification in a reproducing kernel Hilbert space. It then defines positive definite kernels and shows how they allow representing algorithms as operating in linear dot product spaces while using nonlinear kernel functions. The document covers fundamental properties of kernels, provides examples, and discusses how kernels define reproducing kernel Hilbert spaces for regularization. It overviews various kernel-based machine learning approaches and modeling structured responses using statistical models in reproducing kernel Hilbert spaces.
A HYBRID COA/ε-CONSTRAINT METHOD FOR SOLVING MULTI-OBJECTIVE PROBLEMSijfcstjournal
In this paper, a hybrid method for solving multi-objective problems is presented. The proposed method combines the ε-constraint method with the Cuckoo optimization algorithm. First the multi-objective problem is transformed into a single-objective problem using the ε-constraint method; then the Cuckoo optimization algorithm optimizes the problem in each task. Finally the optimized Pareto frontier is drawn. The advantage of this method is the high accuracy and the dispersion of its Pareto frontier. To test the efficiency of the suggested method, many test problems were solved with it. Comparing the results of this method with the results of other similar methods shows that the Cuckoo algorithm is well suited for solving multi-objective problems.
The asynchronous parallel algorithms are developed to solve massive optimization problems in a distributed data system, which can be run in parallel on multiple nodes with little or no synchronization. Recently they have been successfully implemented to solve a range of difficult problems in practice. However, the existing theories are mostly based on fairly restrictive assumptions on the delays, and cannot explain the convergence and speedup properties of such algorithms. In this talk we will give an overview on distributed optimization, and discuss some new theoretical results on the convergence of asynchronous parallel stochastic gradient algorithm with unbounded delays. Simulated and real data will be used to demonstrate the practical implication of these theoretical results.
- The document presents a method for efficiently evaluating counterfactual policies using bandit feedback data.
- It proposes an efficient estimator that achieves the semiparametric efficiency bound, minimizing asymptotic variance among consistent estimators.
- The method involves first estimating choice probabilities from logged bandit data, then using these estimates in a two-step procedure to evaluate counterfactual policies while achieving optimal statistical efficiency.
Open Access: Enabling Broadband Connectivity in Kenya – Njiraini Mwende
This presentation is based on a dissertation submitted by Mwende Njiraini in partial fulfilment of the requirements of the Masters in Communications Management of the University of Strathclyde, 2006. The dissertation sought to establish various perspectives on open access, including its principles and benefits, and to identify an appropriate regulatory framework that will foster the development of open access networks (OAN). Through an exploration of various open access network initiatives, the dissertation sought to establish the key success factors and challenges, and to test the applicability of the open access concept in the Kenyan context.
This document summarizes a dissertation presentation on designing a branding strategy for UKTV channels for the new digital age. The presentation covers researching television as a brand, key elements for a successful channel identity, and assessing television channel brands. It outlines the dissertation's research objectives of identifying UKTV's brand strategy, evaluating competitors' identities, exploring new media trends, and formulating a strategic framework. Methodologies included interviews, case studies, and literature reviews. The framework proposes evolving the UKTV brand on new platforms through social media, customization for each channel, and responsive communication.
This document provides guidance for students on completing a dissertation project. It outlines the requirements and assessment breakdown, which includes a 5-10 minute presentation worth 30% and a 6000 word dissertation worth 70%. It describes the purpose and structure of the dissertation proposal, including developing a research question, aim, objectives, literature review plan, methodology, and bibliography. The document offers tips for utilizing support resources like lectures, seminars, tutorials, and meetings with supervisors. It warns against plagiarism and outlines the scheme of work and timeline for completing tasks.
This document outlines an integral psychology of Islam that draws from various psychological frameworks. It discusses 10 key concepts of self psychology in Islam and maps them onto Wilber's four quadrant model. Archetypes and complexes are analyzed through different Islamic concepts like the opening Sura of the Quran (Al-Fatiha), the Prophet's ascension to heaven, and the four rivers in the Gardens of Paradise. An eco-archetypal image is presented to integrate transpersonal, self, feminine and cultural/social/political psychologies towards a holistic understanding.
BA (Hons) Business Dissertation: 'Does online service quality, in the supermarket industry, influence consumer engagement? A comparison of Morrisons and Tesco.'
This document provides an analysis of the use of social media as a marketing tool for startups in Greece. It examines 317 Greek startups and their use of popular social media platforms like Facebook, Twitter, LinkedIn, and blogs. Key findings include that around 70% of startups were founded in the last four years, with most operating in the software and internet/e-commerce industries. Facebook and Twitter were the most commonly used platforms, adopted by around 79% and 67% of startups respectively. Performance was evaluated using metrics from StartupRanking.com, finding that a small group of 10-15 startups significantly outperformed others based on their web presence and social media engagement.
Presentation for Dissertation Proposal Defense – andrearoofe
This document provides an overview of Andrea Roofe's dissertation proposal on examining the effect of business cycles on the financial performance of socially responsible investments. The summary includes:
1) The proposal outlines research questions on whether SRI delivers financial value and whether performance varies across economic expansions and contractions.
2) The theoretical framework discusses stakeholder theory, social and financial performance links, and how SRI investor attitudes toward risk may impact performance across economic conditions.
3) The methodology proposes examining SRI index and fund returns against benchmarks in both bull and bear markets using Fama-French and Carhart models with Markov switching regimes.
This thesis examines how luxury fashion brands can sustain a successful brand identity and image through advertising and public relations events. It analyzes two luxury brands, Louis Vuitton and Ralph Lauren. For events, it looks at Louis Vuitton's Fall/Winter 2011-2012 fashion show and Ralph Lauren's Spring/Summer 2011 fashion show through online videos. For advertising, it analyzes a printed ad for each brand. The goal is to understand what brand identity each communicates and how, using a theoretical framework of methodological hermeneutics to interpret the "texts" and understand the intentions behind them. By comparing the brands' strategies, the thesis aims to determine how luxury brands can maintain tradition while adapting to changes in communicating
Paper Summary of Disentangling by Factorising (Factor-VAE) – 준식 최
The paper proposes Factor-VAE, which aims to learn disentangled representations in an unsupervised manner. Factor-VAE enhances disentanglement over the β-VAE by encouraging the latent distribution to be factorial (independent across dimensions) using a total correlation penalty. This penalty is optimized using a discriminator network. Experiments on various datasets show that Factor-VAE achieves better disentanglement than β-VAE, as measured by a proposed disentanglement metric, while maintaining good reconstruction quality. Latent traversals qualitatively demonstrate disentangled factors of variation.
Covariance matrices are central to many adaptive filtering and optimisation problems. In practice, they have to be estimated from a finite number of samples; on this, I will review some known results from spectrum estimation and multiple-input multiple-output communications systems, and how properties that are assumed to be inherent in covariance and power spectral densities can easily be lost in the estimation process. I will discuss new results on space-time covariance estimation, and how the estimation from finite sample sets will impact on factorisations such as the eigenvalue decomposition, which is often key to solving the introductory optimisation problems. The purpose of the presentation is to give you some insight into estimating statistics as well as to provide a glimpse on classical signal processing challenges such as the separation of sources from a mixture of signals.
Transportation Problem with Pentagonal Intuitionistic Fuzzy Numbers Solved Us... – IJERA Editor
This paper presents a solution methodology for the transportation problem in an intuitionistic fuzzy environment in which costs are represented by pentagonal intuitionistic fuzzy numbers. The transportation problem is a particular class of linear programming, associated with day-to-day activities in our real life. It helps in solving problems on distribution and transportation of resources from one place to another. The objective is to satisfy the demand at the destination from the supply constraints at the minimum possible transportation cost. The problem is solved using a ranking technique called the Accuracy function for pentagonal intuitionistic fuzzy numbers and Russell's Method.
Sequential Monte Carlo algorithms for agent-based models of disease transmission – JeremyHeng10
This document discusses agent-based models for disease transmission and sequential Monte Carlo algorithms for statistical inference of these models. It begins with an overview of agent-based models and their use in epidemiology. It then describes an agent-based SIS model where each agent can be susceptible or infected. Observations are the number of reported infections over time. The likelihood of the model involves a sum over all possible state sequences, which is intractable for large populations. The document proposes using sequential Monte Carlo methods to approximate the likelihood, including the bootstrap particle filter and auxiliary particle filter.
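The bootstrap particle filter referred to above can be sketched generically; this is a minimal illustrative version — the function names, the interface, and the toy test model are ours, not the document's:

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles, init, transition, log_lik, rng):
    """Bootstrap particle filter returning an estimate of the log-likelihood.

    init(rng, P) -> initial particles; transition(x, rng) -> propagated
    particles; log_lik(y_t, x) -> per-particle observation log-density.
    """
    x = init(rng, n_particles)
    ll = 0.0
    for yt in y:
        x = transition(x, rng)            # propagate particles from the model
        logw = log_lik(yt, x)             # weight by the observation density
        c = logw.max()
        w = np.exp(logw - c)              # stabilized weights
        ll += c + np.log(w.mean())        # accumulate the log-likelihood estimate
        idx = rng.choice(n_particles, n_particles, p=w / w.sum())
        x = x[idx]                        # multinomial resampling
    return ll
```

With a degenerate model (all particles identical), the estimate is exact, which gives a cheap sanity check of the implementation.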
1. The document discusses implicit shape representations for liver segmentation from CT scans, comparing heat, signed distance, and Poisson transforms.
2. It evaluates these representations using principal component analysis to build a linear shape space model from training data.
3. Results show the Poisson transform provides the most stable and effective implicit representation for segmentation, outperforming other methods in experiments projecting new shapes into the learned shape space.
Stochastic reaction networks (SRNs) are a particular class of continuous-time Markov chains used to model a wide range of phenomena, including biological/chemical reactions, epidemics, risk theory, queuing, and supply chain/social/multi-agents networks. In this context, we explore the efficient estimation of statistical quantities, particularly rare event probabilities, and propose two alternative importance sampling (IS) approaches [1,2] to improve the Monte Carlo (MC) estimator efficiency. The key challenge in the IS framework is to choose an appropriate change of probability measure to achieve substantial variance reduction, which often requires insights into the underlying problem. Therefore, we propose an automated approach to obtain a highly efficient path-dependent measure change based on an original connection between finding optimal IS parameters and solving a variance minimization problem via a stochastic optimal control formulation. We pursue two alternative approaches to mitigate the curse of dimensionality when solving the resulting dynamic programming problem. In the first approach [1], we propose a learning-based method to approximate the value function using a neural network, where the parameters are determined via a stochastic optimization algorithm. As an alternative, we present in [2] a dimension reduction method, based on mapping the problem to a significantly lower dimensional space via the Markovian projection (MP) idea. The output of this model reduction technique is a low dimensional SRN (potentially one dimension) that preserves the marginal distribution of the original high-dimensional SRN system. The dynamics of the projected process are obtained via a discrete $L^2$ regression. 
By solving a resulting projected Hamilton-Jacobi-Bellman (HJB) equation for the reduced-dimensional SRN, we get projected IS parameters, which are then mapped back to the original full-dimensional SRN system, and result in an efficient IS-MC estimator of the full-dimensional SRN. Our analysis and numerical experiments verify that both proposed IS (learning based and MP-HJB-IS) approaches substantially reduce the MC estimator’s variance, resulting in a lower computational complexity in the rare event regime than standard MC estimators. [1] Ben Hammouda, C., Ben Rached, N., and Tempone, R., and Wiechert, S. Learning-based importance sampling via stochastic optimal control for stochastic reaction net-works. Statistics and Computing 33, no. 3 (2023): 58. [2] Ben Hammouda, C., Ben Rached, N., and Tempone, R., and Wiechert, S. (2023). Automated Importance Sampling via Optimal Control for Stochastic Reaction Networks: A Markovian Projection-based Approach. To appear soon.
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI – Jack Clark
This document discusses deep reinforcement learning through policy optimization. It begins with an introduction to reinforcement learning and how deep neural networks can be used to approximate policies, value functions, and models. It then discusses how deep reinforcement learning can be applied to problems in robotics, business operations, and other machine learning domains. The document reviews how reinforcement learning relates to other machine learning problems like supervised learning and contextual bandits. It provides an overview of policy gradient methods and the cross-entropy method for policy optimization before discussing Markov decision processes, parameterized policies, and specific policy gradient algorithms like the vanilla policy gradient algorithm and trust region policy optimization.
Data fusion is the process of combining data from different sources to enhance the utility of the combined product. In remote sensing, input data sources are typically massive, noisy, and have different spatial supports and sampling characteristics. We take an inferential approach to this data fusion problem: we seek to infer a true but not directly observed spatial (or spatio-temporal) field from heterogeneous inputs. We use a statistical model to make these inferences, but like all models it is at least somewhat uncertain. In this talk, we will discuss our experiences with the impacts of these uncertainties and some potential ways addressing them.
Sequential Monte Carlo algorithms for agent-based models of disease transmission – JeremyHeng10
This document discusses sequential Monte Carlo algorithms for statistical inference in agent-based models of disease transmission. It begins with an overview of agent-based models and their use in epidemiology. It then describes an agent-based SIS model where each agent's state and transitions depend on covariates. The likelihood involves marginalizing over the latent states of all agents. Sequential Monte Carlo methods like particle filters are proposed to approximate this intractable likelihood. The document outlines the bootstrap particle filter and auxiliary particle filter approaches.
A Study on Performance Analysis of Different Prediction Techniques in Predict... – IJRES Journal
Time series data is a series of statistical data that is related to a specific instant or a specific time period. Here, the measurements are recorded on a regular basis such as monthly, quarterly and yearly. Most of the researchers have used one of the prediction techniques in prediction of time series data. But, they have not tested all prediction techniques on same data set. They have not even compared the performance of different prediction techniques on the same data set. In this research work, some well known prediction techniques have been applied in the same time series data set. The average error and residual analysis have been done for each and every applied technique. One technique has been selected based on the minimum average error and residual analysis among the all applied techniques. The residual analysis comprises of absolute residual, maximum residual, median of absolute residual, mean of absolute residual and standard deviation. To finalize the algorithm, same procedure has been applied on different time series data sets. Finally, one technique has been selected which has been given minimum error and minimum value of residual analysis in most cases.
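The residual measures listed above are straightforward to collect in one helper; a small sketch (the function and key names are ours):

```python
import numpy as np

def residual_summary(actual, predicted):
    """Residual analysis: max, median and mean of absolute residuals, and std."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    resid = actual - predicted
    abs_resid = np.abs(resid)
    return {
        "max_abs_residual": abs_resid.max(),
        "median_abs_residual": np.median(abs_resid),
        "mean_abs_residual": abs_resid.mean(),
        "std_residual": resid.std(),
    }

# toy example: residuals are [1, 0, -2]
summary = residual_summary([10, 12, 11], [9, 12, 13])
assert summary["max_abs_residual"] == 2
```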
A Study on Youth Violence and Aggression using DEMATEL with FCM Methods – ijdmtaiir
The DEMATEL method is a good technique for decision making. In this paper we analyze the risk factors of youth violence and what makes youths more aggressive. Because the many risk factors of youth violence relate to one another in complex ways, we construct an FCM to analyze them. Moreover, the data are unsupervised, obtained from surveys as well as interviews; hence fuzzy methods alone have the capacity to analyze these concepts.
Knowledge of cause-effect relationships is central to the field of climate science, supporting mechanistic understanding, observational sampling strategies, experimental design, model development and model prediction. While the major causal connections in our planet's climate system are already known, there is still potential for new discoveries in some areas. The purpose of this talk is to make this community familiar with a variety of available tools to discover potential cause-effect relationships from observed or simulation data. Some of these tools are already in use in climate science, others are just emerging in recent years. None of them are miracle solutions, but many can provide important pieces of information to climate scientists. An important way to use such methods is to generate cause-effect hypotheses that climate experts can then study further. In this talk we will (1) introduce key concepts important for causal analysis; (2) discuss some methods based on the concepts of Granger causality and Pearl causality; (3) point out some strengths and limitations of these approaches; and (4) illustrate such methods using a few real-world examples from climate science.
RuleML2015: Input-Output STIT Logic for Normative Systems – RuleML
This document presents input/output STIT logic, which is a logic of norms that uses STIT logic as its base. It defines input/output STIT logic formally and provides semantics and proof theory. It also discusses applications to normative multi-agent systems, including defining legal, moral and illegal strategies and normative Nash equilibria. The document aims to increase the expressiveness of input/output logic by building it on top of STIT logic to represent concepts like agents and abilities.
A walk through the intersection between machine learning and mechanistic mode... – JuanPabloCarbajal3
Talk at EURECOM, France.
It overviews regression in several of its forms: regularized, constrained, and mixed. It builds the bridge between machine learning and dynamical models.
Tree models with Scikit-Learn: Great models with little assumptions – Gilles Louppe
This talk gives an introduction to tree-based methods, both from a theoretical and practical point of view. It covers decision trees, random forests and boosting estimators, along with concrete examples based on Scikit-Learn about how they work, when they work and why they work.
The document provides an overview of correlation and regression analysis, time series models, and cost indexes. It defines correlation and regression analysis and discusses their importance and applications. It covers simple linear regression equations, assumptions, and hypothesis testing, as well as multiple linear regression, moving averages, exponential smoothing, and quantitative measures for evaluating time series models. The document serves as the agenda for the Advanced Economics for Engineers course taught by Leemary Berrios, Irving Rivera, and Wilfredo Robles.
Bayesian inference for mixed-effects models driven by SDEs and other stochast... – Umberto Picchini
An important, and well studied, class of stochastic models is given by stochastic differential equations (SDEs). In this talk, we consider Bayesian inference based on measurements from several individuals, to provide inference at the "population level" using mixed-effects modelling. We consider the case where dynamics are expressed via SDEs or other stochastic (Markovian) models. Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that account for (i) the intrinsic random variability in the latent states dynamics, as well as (ii) the variability between individuals, and also (iii) account for measurement error. This flexibility gives rise to methodological and computational difficulties.
Fully Bayesian inference for nonlinear SDEMEMs is complicated by the typical intractability of the observed data likelihood, which motivates the use of sampling-based approaches such as Markov chain Monte Carlo. A Gibbs sampler is proposed to target the marginal posterior of all parameters of interest. The algorithm is made computationally efficient through careful use of blocking strategies, particle filters (sequential Monte Carlo) and correlated pseudo-marginal approaches. The resulting methodology is flexible and general, and is able to deal with a large class of nonlinear SDEMEMs [1]. In a more recent work [2], we also explored ways to make inference even more scalable to an increasing number of individuals, while also dealing with state-space models driven by stochastic dynamic models other than SDEs, e.g. Markov jump processes and nonlinear solvers typically used in systems biology.
[1] S. Wiqvist, A. Golightly, AT McLean, U. Picchini (2020). Efficient inference for stochastic differential mixed-effects models using correlated particle pseudo-marginal algorithms, CSDA, https://doi.org/10.1016/j.csda.2020.107151
[2] S. Persson, N. Welkenhuysen, S. Shashkova, S. Wiqvist, P. Reith, G. W. Schmidt, U. Picchini, M. Cvijovic (2021). PEPSDI: Scalable and flexible inference framework for stochastic dynamic single-cell models, bioRxiv doi:10.1101/2021.07.01.450748.
The document proposes a multi-stream recurrent neural network (MRNN) to perform multimodal gesture recognition. The MRNN extracts sequential features from multiple modalities and fuses them while considering dynamics. It achieves state-of-the-art accuracy on a gesture recognition dataset. Experiments show the MRNN outperforms alternatives that do not model both single-modal and multimodal sequential dynamics. The MRNN is also robust to noise and benefits from using multiple modalities over single modalities. Future work includes further analysis and applying the approach to other tasks.
MCQMC 2020 talk: Importance Sampling for a Robust and Efficient Multilevel Mo...
1. Ph.D. dissertation presentation
Empirical properties of functional regression models and
application to high-frequency financial data
Xi Zhang
Department of Mathematics and Statistics
Utah State University
March 20, 2013
1 Xi Zhang | March 20, 2013 1 / 48
2. Ph.D. dissertation presentation | Introduction
Outline
1 Introduction
Functional data analysis
High-frequency financial data sets
2 Empirical properties of forecasts with the functional autoregressive model
3 Functional prediction of intraday cumulative returns
4 Functional multifactor regression for intraday price curves
5 Summary and Conclusions
3. Ph.D. dissertation presentation | Introduction | Functional data analysis
Functional Data Analysis (FDA)
It analyzes data providing information about curves, surfaces or anything else
varying over a continuum (time, spatial location, wavelength, probability, etc).
The core idea is that curves should be treated as individual and complete
statistical objects, rather than as collections of individual observations.
Statistical tools of FDA typically rely on some form of smoothing to transform
high dimensional or incomplete data building up a curve into a smoother curve
that can be described by a smaller number of parameters.
The inherent complexity of functional data makes it impossible to estimate the
"distribution" of a random function in a meaningful way, or to find estimates that
converge at a reasonable rate, which indicates that the properties of functional
principal component analysis (FPCA) are of great importance in FDA.
4. Ph.D. dissertation presentation | Introduction | High-frequency financial data sets
8 years of the price process
5. Ph.D. dissertation presentation | Introduction | High-frequency financial data sets
Cumulative Intraday returns
Definition
Suppose P_n(t_j), n = 1, . . . , N, j = 1, . . . , m, is the price of a financial asset at time t_j on day n. The functions
r_n(t_j) = 100[ln P_n(t_j) − ln P_n(t_1)], j = 2, . . . , m, n = 1, . . . , N,
are defined as the intraday cumulative returns (CIDR's/IDCR's).
The above definition implicitly assumes that t_{j+1} > t_j. We work with one-minute averages, so t_{j+1} − t_j = 1 min, and P(t_j) is the average of the maximum and minimum price within the j-th minute.
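The CIDR transformation is simple to compute from a day-by-minute price matrix; a minimal numpy sketch (the array layout and function name are ours, not from the slides):

```python
import numpy as np

def cidr(prices):
    """Intraday cumulative returns r_n(t_j) = 100 * (ln P_n(t_j) - ln P_n(t_1)).

    prices: array of shape (N, m) -- one row per trading day, one column
    per intraday time point (e.g. one-minute price averages).
    """
    logp = np.log(prices)
    return 100.0 * (logp - logp[:, :1])

# toy example: two days, four intraday points
P = np.array([[100.0, 101.0, 100.5, 102.0],
              [102.0, 101.0, 103.0, 104.0]])
R = cidr(P)
assert np.allclose(R[:, 0], 0.0)  # every CIDR curve starts at zero by construction
```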
6. Ph.D. dissertation presentation | Introduction | High-frequency financial data sets
Cumulative Intraday returns
7. Ph.D. dissertation presentation | Introduction | High-frequency financial data sets
Five days closer look
8. Ph.D. dissertation presentation | Introduction | High-frequency financial data sets
Why CIDR’s/IDCR’s?
Similar to the curves of the price P_n(t_j) for a trading day n, which are of high interest to stock investors
Give more relevant information by showing how the return changes during a
trading day
Can be treated as continuous curves, one curve per day, adapted to functional data
9. Ph.D. dissertation presentation | Introduction | High-frequency financial data sets
High frequency returns
10. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model
Outline
1 Introduction
2 Empirical properties of forecasts with the functional autoregressive model
Introduction
Simulation study
Results
3 Functional prediction of intraday cumulative returns
4 Functional multifactor regression for intraday price curves
5 Summary and Conclusions
11. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model | Introduction
Functional Autoregressive Model (FAR)
FAR(1) model:
X_{n+1} = Ψ(X_n) + ε_{n+1},
where the errors ε_n and the observations X_n are curves, and the operator Ψ acting on a function X is defined as
Ψ(X)(t) = ∫ ψ(t, s) X(s) ds,
where ψ(t, s) is a bivariate kernel assumed to satisfy ||Ψ|| < 1, where
||Ψ||² = ∫∫ ψ²(t, s) dt ds. (1)
The condition ||Ψ|| < 1 ensures the existence of a stationary causal solution to the FAR(1) equations.
12. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model | Introduction
Methods
Bosq (2000) advocated a standard method: estimate the operator Ψ and forecast X_{n+1} by Ψ̂(X_n) (Estimated Kernel (EK)).
The empirical version of the bivariate kernel ψ:
ψ̂_p(t, s) = Σ_{k,ℓ=1}^{p} ψ̂_{kℓ} v̂_k(t) v̂_ℓ(s), (2)
where
ψ̂_{ji} = λ̂_i^{−1} (N − 1)^{−1} Σ_{n=1}^{N−1} ⟨X_n, v̂_i⟩ ⟨X_{n+1}, v̂_j⟩, (3)
and v̂_k, k = 1, 2, . . . , p, are the estimated (or empirical) FPC's (EFPC's); p is the number of EFPC's.
Kargin and Onatski (2008) proposed a sophisticated method: one-step-ahead prediction in the FAR(1) model based on predictive factors (Predictive Factors (PF)).
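On a discretized grid, the EK estimator (2)-(3) can be sketched numerically; this is a rough illustration with Riemann-sum integrals and is our own simplification, not the dissertation's code:

```python
import numpy as np

def estimated_kernel(X, p):
    """EK estimate of the FAR(1) kernel from curves X (rows = days, columns = grid).

    Empirical FPC's come from the sample covariance operator; all integrals
    over [0, 1] are approximated by Riemann sums on a uniform grid.
    """
    N, m = X.shape
    dt = 1.0 / m
    C = (X.T @ X) / N * dt                  # discretized covariance operator
    lam, vec = np.linalg.eigh(C)            # eigenvalues in ascending order
    lam = lam[::-1][:p]                     # top-p eigenvalues
    v = vec[:, ::-1][:, :p] / np.sqrt(dt)   # L2-normalized EFPC's v_1..v_p
    scores = X @ v * dt                     # inner products <X_n, v_i>
    # psi_hat_{ji} = lam_i^{-1} (N-1)^{-1} sum_n <X_n, v_i> <X_{n+1}, v_j>
    Psi = (scores[1:].T @ scores[:-1]) / (N - 1) / lam
    return v @ Psi @ v.T                    # psi_hat(t, s) on the grid

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 25))
K = estimated_kernel(X, p=3)
forecast = K @ X[-1] / 25                   # X_hat_{N+1}(t) = ∫ psi_hat(t, s) X_N(s) ds
```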
13. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model | Introduction
Objective
Is the method of Predictive Factors (PF) superior in finite samples to the Estimated
Kernel (EK)?
14. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model | Simulation study
Data generating process
FAR(1) model:
X_{n+1}(t) = ∫_0^1 ψ(t, s) X_n(s) ds + ε_{n+1}(t), n = 1, 2, . . . , N.
Three error processes:
Brownian bridge: ε^(1)(t) = BB(t)
ε^(2)(t) = ξ_1 √2 sin(2πt) + √λ √2 ξ_2 cos(2πt),
where ξ_1 and ξ_2 are independent standard normals, and λ can be any constant (in the simulations we use λ = 0.5).
ε^(3)(t) = ε^(2)(t) + a ε^(1)(t), where a is a constant.
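These three innovation processes are easy to simulate on a grid; a sketch (the grid size, the seed, and the value a = 1 are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
t = np.linspace(0.0, 1.0, m)

def brownian_bridge(rng, t):
    # BB(t) = W(t) - t * W(1), with W a Gaussian random walk started at W(0) = 0
    dW = rng.standard_normal(len(t)) * np.sqrt(1.0 / (len(t) - 1))
    dW[0] = 0.0
    W = np.cumsum(dW)
    return W - t * W[-1]

lam, a = 0.5, 1.0                  # lambda = 0.5 as in the simulation study; a is illustrative
xi1, xi2 = rng.standard_normal(2)  # independent standard normals

eps1 = brownian_bridge(rng, t)
eps2 = (xi1 * np.sqrt(2) * np.sin(2 * np.pi * t)
        + np.sqrt(lam) * np.sqrt(2) * xi2 * np.cos(2 * np.pi * t))
eps3 = eps2 + a * eps1
```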
15. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model | Simulation study
Kernels
Four kernels (defined for (t, s) ∈ [0, 1]²):
Gaussian: ψ(t, s) = C exp(−(t² + s²)/2),
Identity: ψ(t, s) = C,
Sloping plane (t): ψ(t, s) = Ct,
Sloping plane (s): ψ(t, s) = Cs.
C is chosen such that ||Ψ|| = 0.5 or ||Ψ|| = 0.8.
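The constant C can be found numerically from the norm definition in (1); a quick sketch (the function name and grid resolution are ours):

```python
import numpy as np

def scale_for_norm(psi, target=0.5, grid=400):
    """Find C so that ||Psi|| = sqrt(integral of (C * psi(t, s))^2 over [0,1]^2) = target.

    The double integral is approximated by the mean of psi^2 on a uniform grid;
    psi is the unscaled kernel, vectorized over arrays.
    """
    t = np.linspace(0.0, 1.0, grid)
    T, S = np.meshgrid(t, t, indexing="ij")
    unscaled_norm = np.sqrt(np.mean(psi(T, S) ** 2))
    return target / unscaled_norm

# identity kernel psi = 1: then ||Psi|| = C, so C must equal the target itself
C_id = scale_for_norm(lambda t, s: np.ones_like(t))
assert np.isclose(C_id, 0.5)
```

For the identity kernel the answer is exact; for the other kernels the grid approximation is accurate to a few decimal places.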
16. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model | Simulation study
Measures of quality of prediction
The quantities
E_n = ∫_0^1 (X_n(t) − X̂_n(t))² dt and R_n = ∫_0^1 |X_n(t) − X̂_n(t)| dt
are used to measure the prediction error at time n.
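On a uniform grid, E_n and R_n reduce to averages of squared and absolute differences; a small sketch (the discretization and function name are ours):

```python
import numpy as np

def prediction_errors(X, Xhat):
    """E_n = integral of (X_n - Xhat_n)^2 and R_n = integral of |X_n - Xhat_n|.

    X, Xhat: arrays of shape (N, m); each row is a curve on a uniform grid
    of [0, 1], so the integrals become Riemann sums with step 1/m.
    """
    diff = X - Xhat
    m = diff.shape[-1]
    E = np.sum(diff ** 2, axis=-1) / m
    R = np.sum(np.abs(diff), axis=-1) / m
    return E, R

# toy check: a constant error of 2 gives E_n = 4 and R_n = 2
E, R = prediction_errors(np.full((2, 10), 3.0), np.ones((2, 10)))
assert np.allclose(E, 4.0) and np.allclose(R, 2.0)
```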
17. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model | Results
Comparison of five prediction methods
MP Mean Prediction: X̂_{n+1}(t) = 0.
NP Naive Prediction: X̂_{n+1} = X_n.
EX Exact: X̂_{n+1} = Ψ(X_n).
EK Estimated Kernel.
EKI Estimated Kernel Improved, using λ̂_i + b̂ instead of λ̂_i.
PF Predictive Factors.
18. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model | Results
Boxplots of the prediction errors, ||Ψ|| = 0.5: E_n (left) and R_n (right); innovations: ε^(1), kernel: sloping plane (t), N = 100, p = 3.
19. Ph.D. dissertation presentation | Empirical properties of forecasts with the functional autoregressive model | Results
Conclusions
Based on all 32 sets of boxplots and 32 sets of tables, we report:
Taking the autoregressive structure into account reduces prediction errors.
None of the methods EX, EK, EKI uniformly dominates the others. In most cases
method EK is the best, or at least as good as the others.
In some cases, method PF performs visibly worse than the other methods, but
always better than NP.
Using the improved estimation does not generally reduce prediction errors.
20. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns
Outline
1 Introduction
2 Empirical properties of forecasts with the functional autoregressive model
3 Functional prediction of intraday cumulative returns
Introduction
Methods and models
Application to US stocks
Results
4 Functional multifactor regression for intraday price curves
5 Summary and Conclusions
21. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Introduction
Capital Asset Pricing Model (CAPM)
The simplest form of the celebrated Capital Asset Pricing Model (CAPM):
r_n = α + β r_{m,n} + ε_n, (4)
where
r_n = 100(ln P_n − ln P_{n−1}) ≈ 100 (P_n − P_{n−1}) / P_{n−1} (5)
is the return, in percent, over a unit of time on a specific asset, e.g. a stock, and r_{m,n} is the analogously defined return on a relevant market index.
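The approximation in (5) is the usual ln(1 + x) ≈ x for small x; a quick numerical check on toy prices (not taken from the data set):

```python
import numpy as np

# toy daily closing prices
P = np.array([100.0, 101.0, 99.5, 100.2])

log_ret = 100.0 * np.diff(np.log(P))      # r_n = 100 * (ln P_n - ln P_{n-1})
simple_ret = 100.0 * np.diff(P) / P[:-1]  # 100 * (P_n - P_{n-1}) / P_{n-1}

# for small price moves the two definitions nearly coincide
assert np.allclose(log_ret, simple_ret, atol=0.05)
```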
22. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Introduction
Objective
Model the relationship between the IDCR’s curves for a single asset and those for
a market index
Evaluate their relevance by comparing their predictive power
23. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Methods and models
Simple Functional CAPM (SF)
A simple functional CAPM is defined as
Yn(t) = α + ψXn(t) + εn(t), t ∈ [0, 1]. (6)
A model without the intercept (α ≡ 0), denoted SF*, is also considered.
24. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Methods and models
Fully Functional CAPM (FF)
This model is defined by the relation
Yn(t) = α(t) + ∫ ψ(t, s)Xn(s) ds + εn(t), t ∈ [0, 1]. (7)
If α ≡ 0, this model is denoted FF*.
25. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Methods and models
Functional CAPM with dependent errors
This model is defined by (6), but the errors are assumed to follow a functional
autoregressive process of order 1, the FAR(1) process:
εn(t) = ∫ ϕ(t, s)εn−1(s) ds + wn(t), (8)
where the wn are iid mean-zero random functions.
Fully Functional CAPM with dependent errors (FFDE). This model is defined by (7)
with errors that follow the FAR(1) process. When used for prediction this model
fails, because the kernel operators ϕ(t, s) and ψ(t, s) do not commute.
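The FAR(1) error process in equation (8) can be simulated on a grid. The sketch below uses an assumed Gaussian-type kernel ϕ and numpy (the dissertation's computations used R), so it is only a rough illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 50                               # grid points discretizing [0, 1]
t = np.linspace(0.0, 1.0, m)
ds = t[1] - t[0]

# An assumed Gaussian-type kernel, rescaled so the integral operator has
# norm 0.5 < 1, which keeps the FAR(1) process stationary.
K = np.exp(-((t[:, None] - t[None, :]) ** 2))
K *= 0.5 / (np.linalg.norm(K, 2) * ds)

eps = np.zeros(m)
curves = []
for _ in range(200):
    w = rng.normal(0.0, 0.1, size=m)   # iid mean-zero innovation curve w_n
    eps = (K @ eps) * ds + w           # eps_n(t) = ∫ phi(t,s) eps_{n-1}(s) ds + w_n(t)
    curves.append(eps.copy())
curves = np.asarray(curves)
print(curves.shape)
```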
26. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Methods and models
Problems we seek to solve
Can a simpler model with a scalar coefficient give predictions as good as a model
with a kernel coefficient?
Does including an intercept improve predictions, or does this extra parameter
actually make them worse?
Does modeling error correlation lead to improved predictions?
27. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Methods and models
Estimation of regression parameters
All calculations have been performed using the R package fda.
The cumulative returns at one-minute resolution are converted to functional
objects.
99 Fourier basis functions are used.
Empirical functional principal components (EFPC’s) ˆv1, . . . , ˆvp of the data are
computed.
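A minimal numpy sketch of how EFPCs can be computed from discretized curves; the actual analysis used the R package fda with a Fourier basis, so the grid-based eigendecomposition and the simulated stand-in curves below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical discretized curves: N curves on an m-point grid over [0, 1];
# cumulative sums of small shocks serve as rough stand-ins for IDCR curves.
N, m = 100, 60
t = np.linspace(0.0, 1.0, m)
ds = t[1] - t[0]
X = np.cumsum(rng.normal(0.0, 0.1, size=(N, m)), axis=1)

# Sample covariance of the centered curves, evaluated on the grid.
Xc = X - X.mean(axis=0)
C = (Xc.T @ Xc) / N

# EFPCs v1, ..., vp are the leading eigenfunctions of the covariance
# operator; on a grid this reduces to an eigendecomposition of C * ds.
eigvals, eigvecs = np.linalg.eigh(C * ds)
order = np.argsort(eigvals)[::-1]
p = 3
v = eigvecs[:, order[:p]] / np.sqrt(ds)   # so that ∫ v_j(t)^2 dt = 1
print(v.shape)
```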
28. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Methods and models
Evaluate the quality of prediction
The integrated mean squared error is defined as
MSEP(N) = N−1 Σn=1..N ∫ (Yn(t) − ˆYn(t))2 dt. (9)
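Equation (9) can be approximated on a grid by replacing the integral with a Riemann sum; a small Python sketch with made-up curves:

```python
import numpy as np

def msep(Y, Yhat, ds):
    """MSEP(N) = N^{-1} * sum_n ∫ (Y_n(t) - Yhat_n(t))^2 dt, with the
    integral replaced by a Riemann sum on a grid of spacing ds.

    Y, Yhat: arrays of shape (N, m) holding N curves on an m-point grid."""
    return float(np.mean(np.sum((Y - Yhat) ** 2, axis=1) * ds))

# Tiny illustration: two made-up curves and a trivial "predict zero" benchmark.
t = np.linspace(0.0, 1.0, 101)
ds = t[1] - t[0]
Y = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
Yhat = np.zeros_like(Y)
print(round(msep(Y, Yhat, ds), 3))   # close to 0.5: ∫ sin^2 = ∫ cos^2 = 1/2 on [0, 1]
```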
29. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Application to US stocks
Data preparation
10 large U.S. corporations in five sectors
Standard & Poor's 100 index representing the market index
1000-day-long periods between 01/03/2000 and 02/22/2006, chosen to avoid
obvious outliers
30. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Application to US stocks
Description of 10 Stocks representing five sectors
Sector                   Stock  Full Name         1000-day period
Energy                   XOM    Exxon Mobil       05/25/2000-05/19/2004
                         CVX    Chevron           10/10/2001-07/23/2004,
                                                  12/13/2004-02/22/2006
Information Technology   MSFT   Microsoft         05/25/2000-05/19/2004
                         IBM    IBM               01/03/2000-12/24/2003
Financial                CITI   Citi Bank         10/17/2000-03/07/2005
                         BOA    Bank of America   03/13/2001-12/19/2005
Consumer Staples         KO     Coca-Cola         05/25/2000-05/19/2004
                         WMT    Wal-Mart Stores   05/25/2000-05/19/2004
Consumer Discretionary   MCD    McDonald's        10/17/2000-03/07/2005
                         DIS    The Walt Disney   05/25/2000-05/19/2004
31. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Results
Prediction results (1)
32. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Results
Prediction results (2)
33. Ph.D. dissertation presentation | Functional prediction of intraday cumulative returns | Results
Conclusions
Models with an intercept, i.e. SF and FF, make better predictions than models
without an intercept, i.e. SF* and FF*. The latter should not be used.
Modeling error dependence with a functional AR(1) model does not improve the
MSEP's.
The two models with an intercept, SF and FF, do NOT dominate each other.
They have almost the same MSEP's.
The SF model is recommended if minimizing the MSEP is the only concern. It is
intuitive, its estimation is straightforward, and the prediction equation is very
simple.
34. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves
Outline
1 Introduction
2 Empirical properties of forecasts with the functional autoregressive model
3 Functional prediction of intraday cumulative returns
4 Functional multifactor regression for intraday price curves
Motivation
Methods and models
Application to U.S. stocks
Results
5 Summary and Conclusions
35. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves | Motivation
Objective
Determine whether additional factors, beyond the IDCR's/CIDR's on a market
index, are statistically significant and whether they lead to improved predictions.
36. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves | Methods and models
A general factor model
Factor model
Rn(t) = β0(t) + Σj=1..p βj Fnj(t) + εn(t). (10)
The parameters of the model are the mean function β0(·) and the vector of
coefficients β = [β1, . . . , βp]T.
37. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves | Methods and models
Parameter Estimation
The mean function is estimated by
ˆβ0(t) = ¯R(t) − Σj=1..p ˆβj ¯Fj(t). (11)
The method of moments estimator of β is
ˆβ = ˆF−1 ˆR, (12)
where
ˆF = [N−1 Σn=1..N ⟨Fc_nj, Fc_nk⟩], j, k = 1, 2, . . . , p (p × p), (13)
ˆR = [N−1 Σn=1..N ⟨Rc_n, Fc_nj⟩], j = 1, 2, . . . , p (p × 1), (14)
with Fc_nj = Fnj − ¯Fj and Rc_n = Rn − ¯R denoting the centered curves.
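The estimators (11)-(14) can be sketched on discretized curves, with inner products approximated by grid sums; the factor curves, coefficients, and noise level below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: N response curves R_n and p factor curves F_nj on a grid;
# the true coefficients and intercept curve below are made up.
N, m, p = 200, 50, 2
t = np.linspace(0.0, 1.0, m)
ds = t[1] - t[0]
F = rng.normal(size=(N, p, m))                      # factor curves F_nj(t)
beta_true = np.array([1.5, -0.7])
beta0_true = np.sin(2 * np.pi * t)                  # intercept curve beta_0(t)
R = beta0_true + np.tensordot(beta_true, F, axes=([0], [1])) \
    + 0.1 * rng.normal(size=(N, m))

# Centered curves; inner products become grid sums times ds.
Fc = F - F.mean(axis=0)
Rc = R - R.mean(axis=0)
F_hat = np.einsum("njm,nkm->jk", Fc, Fc) * ds / N   # (13): p x p matrix
R_hat = np.einsum("nm,njm->j", Rc, Fc) * ds / N     # (14): p x 1 vector

beta_hat = np.linalg.solve(F_hat, R_hat)            # (12)
beta0_hat = R.mean(axis=0) - beta_hat @ F.mean(axis=0)  # (11)
print(np.round(beta_hat, 2))
```

With 200 curves the recovered coefficients land close to the invented true values, which is the method-of-moments logic of (12)-(14) in miniature.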
38. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves | Methods and models
Predictive efficiency
Relative predictive efficiency gains (in percent) are defined as
E = 100 (MSEPM / MSEPF − 1),
where MSEPM is the MSEP computed using only Mn from model SF, and MSEPF is
the MSEP computed using all factors in the model.
39. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves | Methods and models
Confidence Intervals
Asymptotic
ˆβ is asymptotically distributed with mean β and covariance matrix
N−1 F−1 Γ F−1.
The matrix Γ is estimated as the long-run covariance matrix of the sequence
ˆξn = [⟨ˆεn, Fn1 − ¯F1⟩, . . . , ⟨ˆεn, Fnp − ¯Fp⟩]T,
where
ˆεn(t) = Rn(t) − ˆβ0(t) − Σj=1..p ˆβj Fnj(t).
The R function lrvar, with default kernel and bandwidth values, is used to
estimate ˆΓ.
The variance of ˆβj is the jth diagonal element of N−1 ˆF−1 ˆΓ ˆF−1.
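A rough numpy stand-in for this construction: a Bartlett-kernel long-run covariance estimate (a simplified substitute for R's lrvar) applied to a simulated weakly dependent score sequence ˆξn, followed by the variance formula for ˆβj. The identity ˆF, the AR(1) scores, and the zero point estimates are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated stand-in for the score vectors xi_n: a weakly dependent
# p-dimensional sequence with mild AR(1) serial dependence.
N, p = 500, 2
xi = np.empty((N, p))
xi[0] = rng.normal(size=p)
for n in range(1, N):
    xi[n] = 0.3 * xi[n - 1] + rng.normal(size=p)

def long_run_cov(x, q):
    """Bartlett-kernel long-run covariance of an (N, p) sequence;
    a simplified substitute for R's lrvar with default settings."""
    n_obs = x.shape[0]
    xc = x - x.mean(axis=0)
    gamma = (xc.T @ xc) / n_obs                      # lag-0 autocovariance
    for h in range(1, q + 1):
        c = (xc[h:].T @ xc[:-h]) / n_obs
        gamma += (1.0 - h / (q + 1.0)) * (c + c.T)   # Bartlett weights
    return gamma

Gamma_hat = long_run_cov(xi, q=5)

# Variance of beta_hat_j: j-th diagonal entry of N^{-1} Fhat^{-1} Gamma_hat
# Fhat^{-1}; the identity Fhat and zero estimates are placeholders.
F_hat_inv = np.linalg.inv(np.eye(p))
cov_beta = F_hat_inv @ Gamma_hat @ F_hat_inv / N
se = np.sqrt(np.diag(cov_beta))
beta_hat = np.zeros(p)
ci = np.stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se], axis=1)
print(np.round(se, 3))
```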
Subsampling
40. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves | Application to U.S. stocks
Sector                   Symbol  Full Name
Energy                   XOM     Exxon Mobil Corporation
                         CVX     Chevron Corporation
                         COP     ConocoPhillips
Information Technology   MSFT    Microsoft Corporation
                         IBM     IBM Corporation
                         ORCL    Oracle Corporation
Financial                CITI    Citi Bank
                         BOA     Bank of America Corporation
                         JPM     JPMorgan Chase & Co.
Consumer Staples         KO      Coca-Cola
                         WMT     Wal-Mart Stores
                         PG      Procter & Gamble Co.
Consumer Discretionary   MCD     McDonald's Corporation
                         DIS     The Walt Disney Corporation
                         CMCSA   Comcast Corporation
Transportation           FDX     FedEx Corporation
                         JBLU    JetBlue Airways Corporation
                         UPS     United Parcel Service, Inc.
41. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves | Application to U.S. stocks
Models to test
A simpler model
Rn(t) = β0(t) + β1Mn(t) + β2Ln−1 + εn(t), (15)
PA model with Ln−1 representing the asset daily return;
PI model with Ln−1 representing the index daily return;
FF Fama–French model:
Rn(t) = β0(t) + β1Mn(t) + β2Sn + β3Hn + εn(t), (16)
where Sn and Hn are the Fama–French factors (scalars).
OF model with oil futures as the extra factor:
Rn(t) = β0(t) + β1Mn(t) + β2Cn(t) + εn(t), (17)
42. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves | Results
Table: Summary of conclusions for the OF model for the stocks

Sector                   Subsampling   Asymptotic
Energy                   0/+           +
Information Technology   0             −
Financial                0             −/0
Consumer Staples         0             −/0
Consumer Discretionary   0             0/−
Transportation           0             −
43. Ph.D. dissertation presentation | Functional multifactor regression for intraday price curves | Results
Table: Monte Carlo study results based on bootstrapping.

                      Size                        Power
Bootstrapped data     asymptotic   subsampling    asymptotic   subsampling
MSFT1                 7            0              74           0
WMT1                  5            0              98           3
UPS1                  6            0              56           0
44. Ph.D. dissertation presentation | Summary and Conclusions
Outline
1 Introduction
2 Empirical properties of forecasts with the functional autoregressive model
3 Functional prediction of intraday cumulative returns
4 Functional multifactor regression for intraday price curves
5 Summary and Conclusions
45. Ph.D. dissertation presentation | Summary and Conclusions
Main results
The sophisticated method of prediction recently proposed by Kargin and
Onatski (2008) does not dominate a simpler method based on functional
principal components. Limits on the quality of predictions are established,
showing that no other method can exceed them.
Complex functional regression models do not perform better than a simple model.
A functional regression framework is proposed that allows us to evaluate
quantitatively how the shapes of intraday price curves depend on the shapes of
other curve-valued factors or on scalar factors.
Scalar factors have no significant impact on the shape of the price curves.
Oil factors significantly affect the intraday price evolution of oil companies,
but their effect on most other stocks is negative.
Asymptotic theory leads to practically useful confidence intervals for the
regression coefficients.
46. Ph.D. dissertation presentation | Summary and Conclusions
Publications
Kokoszka, P., Miao, H., and Zhang, X. Functional multifactor regression for
intraday price curves. Submitted to Journal of Econometrics.
Kokoszka, P. and Zhang, X. Functional prediction of intra-day cumulative returns.
Statistical Modelling. 12(4):377-398, 2012.
Didericksen, D., Kokoszka, P., and Zhang, X. Empirical properties of forecasts
with the functional autoregressive model. Computational Statistics.
27(2):285-298, 2012.
Kokoszka, P. and Zhang X. Estimation of the autoregressive kernel in the
functional AR(1) process. Utah State University, Utah, USA. 2011.
47. Ph.D. dissertation presentation
Acknowledgement
Special thanks to: Dr. Piotr S. Kokoszka, and my PhD committee members: Dr. Daniel
Coster, Dr. Richard Cutler, Dr. John Stevens, and Dr. Lie Zhu.