CHAPTER 16
Regression
Regression
The statistical technique for finding the best-fitting straight
line for a set of data
• Allows us to make
predictions based on
correlations
• A linear relationship
between two variables
allows the computation
of an equation that
provides a precise,
mathematical description
of the relationship: $Y = bX + a$
[Figure: the regression line]
The Relationship Between
Correlation and Regression
Both examine the relationship/association
between two variables
Both involve an X and Y variable for each
individual (one pair of scores)
Differences in practice
Correlation
Used to determine the
relationship between
two variables
Regression
Used to make
predictions about one
variable based on the
value of another
The Linear Equation:
Expresses a linear relationship between variables X and Y
• X: represents any given score on X
• Y: represents the corresponding score for Y based on X
• a: the Y-intercept
• Determines what the
value of Y equals when X = 0
• Where the line crosses the
Y-axis
• b: the slope constant
• How much the Y variable
will change when X is
increased by one point
• The direction and degree of the line’s tilt
$Y = bX + a$
Prediction using Regression
A local video store charges a
$5/month membership fee
which allows video rentals at
$2 each
• How much will I spend per
month?
• If you never rent a video (X = 0)
• If you rent 3 videos/mo (X = 3)
• If you rent 8 videos/mo (X = 8)
$Y = bX + a$
$Y = 2X + 5$
• X = 0: $Y = 2(0) + 5 = 5$
• X = 3: $Y = 2(3) + 5 = 11$
• X = 8: $Y = 2(8) + 5 = 21$
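The video-store predictions can be reproduced with a short Python sketch. The function name `monthly_cost` and its parameter names are illustrative assumptions rather than anything from the slides; the $5 fee and $2 per rental come from the example.

```python
def monthly_cost(videos_rented, fee=5, price_per_video=2):
    """Linear equation Y = bX + a, with b = price per rental and a = monthly fee."""
    return price_per_video * videos_rented + fee


# Evaluate the line at the three X values asked about in the example.
for x in (0, 3, 8):
    print(f"X = {x}: Y = {monthly_cost(x)}")
```

Changing `fee` moves the intercept (a) and changing `price_per_video` changes the slope (b).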
Graphing linear equations
• X = 0: $Y = 5(0) + 60 = 60$
• X = 3: $Y = 5(3) + 60 = 75$
The intercept (a) is 60
(when X = 0, Y = 60)
The slope (b) is 5
(each one-point increase in X
increases Y by 5 points)
[Figure: graph of the line, with X from 0 to 4 and Y from 0 to 80]
• To graph the line below,
we only need to find two
pairs of scores for X and Y,
and then draw the straight
line that connects them
$Y = 5X + 60$
The Regression Line
The line through the data points that ‘best fits’ the data
(assuming a linear relationship)
1. Makes the relationship
between two variables
easier to see (and
describe)
2. Identifies the ‘central
tendency’ of the relationship
between the variables
3. Can be used for prediction
• Best fit: the line that minimizes the distance of each
point to the line
[Figure: the ‘best fit’ regression line]
Correlation and the regression line
[Figure: scatterplot with its regression line, X from 0 to 5 and Y from 0 to 8]
• The magnitude of the
correlation coefficient (r ) is
an indicator of how well
the points aggregate
around the regression line
• What would a perfect
correlation look like?
The Distance Between a Point and the Line
Each data point will have its own distance from the
regression line (a.k.a. error)
• $Y$: the actual value of Y shown in the data for a given X
• $\hat{Y}$: the value of Y predicted for a given X from your linear
equation
$\text{Distance} = Y - \hat{Y}$
How well does the line fit the data?
• How well a set of data points fits a straight line
can be measured by calculating the distance
(error) between the line and each data point
$\text{Error} = Y - \hat{Y}$
($\hat{Y}$ is read "Y hat")
How well does the line fit the data?
• Some of the distances will be positive and some
negative, so to find a total value we must square
each distance (remember SS)
Total squared error
(SS residual): $SS_{residual} = \sum (Y - \hat{Y})^2$
Remember, this is the sum of the squared distances
The Regression Line
The line through the data points that ‘best fits’ the data
(assuming a linear relationship)
• The “best fit” regression line
• Minimizes the distance of each point from the line
• Gives the best prediction of Y
• A.k.a. the Least-Squared-Error Solution
• Results in the smallest possible value for the total squared error
$\hat{Y} = bX + a$
Solving the regression equation
abXY ˆ
Remember:
n
YX
XYSP


x
y
x s
s
r
SS
SP
b 
XY bMMa 
meanM
I interrupt our regularly scheduled
program for a brief announcement….
‘Memba these?
We have spent the semester
utilizing the Computational
Formulas for all Sum of Squares
For sanity’s sake, we will now be
utilizing the definitional formulas
for all
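Computational formulas:
$SS_X = \sum X^2 - \frac{(\sum X)^2}{n}$
$SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}$
$SP = \sum XY - \frac{\sum X \sum Y}{n}$
Definitional formulas:
$SS_X = \sum (X - M_X)^2$
$SS_Y = \sum (Y - M_Y)^2$
$SP = \sum (X - M_X)(Y - M_Y)$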
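As a quick check that the two sets of formulas agree, here is a minimal Python sketch. It assumes the eight (X, Y) pairs from Example 16.1, hard-coded so the block runs on its own; the variable names are illustrative, not from the slides.

```python
import math

# Scores from Example 16.1 (hard-coded so this sketch is self-contained).
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]
n = len(X)
m_x, m_y = sum(X) / n, sum(Y) / n

# Computational formulas: work from raw sums.
ss_x_comp = sum(x * x for x in X) - sum(X) ** 2 / n
ss_y_comp = sum(y * y for y in Y) - sum(Y) ** 2 / n
sp_comp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

# Definitional formulas: work from deviations around each mean.
ss_x_def = sum((x - m_x) ** 2 for x in X)
ss_y_def = sum((y - m_y) ** 2 for y in Y)
sp_def = sum((x - m_x) * (y - m_y) for x, y in zip(X, Y))

print(ss_x_def, ss_y_def, sp_def)  # 36.0 64.0 36.0
print(math.isclose(ss_x_comp, ss_x_def),
      math.isclose(ss_y_comp, ss_y_def),
      math.isclose(sp_comp, sp_def))
```

Both routes give the same SS and SP; the definitional versions simply make the deviations visible.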
And now back to our regularly
scheduled programming…..
Solving the regression equation
abXY ˆ
Remember:
x
y
x s
s
r
SS
SP
b 
XY bMMa 
meanM
  YX MYMXSP 
Let’s Try One!
(Example 16.1, p.563, using the definitional formula)
Scores, Error (deviations), Products, and Squared Error:

 X    Y    X - MX   Y - MY   (X - MX)(Y - MY)   (X - MX)²   (Y - MY)²
 2    3     -2       -5            10               4          25
 6   11      2        3             6               4           9
 0    6     -4       -2             8              16           4
 4    6      0       -2             0               0           4
 7   12      3        4            12               9          16
 5    7      1       -1            -1               1           1
 5   10      1        2             2               1           4
 3    9     -1        1            -1               1           1

∑X = 32, MX = 4; ∑Y = 64, MY = 8
SP = 36; SSX = 36; SSY = 64
Find b and a in the regression equation
Given: $M_X = 4$; $SS_X = 36$; $M_Y = 8$; $SS_Y = 64$; $SP = 36$
$b = \frac{SP}{SS_X} = \frac{36}{36} = 1$
$a = M_Y - bM_X = 8 - 1(4) = 8 - 4 = 4$
$\hat{Y} = bX + a = 1X + 4 = X + 4$
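The same slope and intercept can be computed in a few lines of Python. This is only a sketch using the definitional formulas; the helper name `fit_line` is an assumption for illustration, and the scores are those of Example 16.1.

```python
def fit_line(xs, ys):
    """Return (b, a) for Y-hat = bX + a using b = SP / SS_X and a = M_Y - b * M_X."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    ss_x = sum((x - m_x) ** 2 for x in xs)
    b = sp / ss_x
    a = m_y - b * m_x
    return b, a


X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]
b, a = fit_line(X, Y)
print(b, a)  # 1.0 4.0
```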
Making Predictions
We use the regression to make predictions.
• For the previous example: $\hat{Y} = X + 4$
• Thus, an individual with a score of X = 3 would be
predicted to have a Y score of: $\hat{Y} = 3 + 4 = 7$
However, keep in mind:
1. The predicted value will not be perfect unless the correlation is
perfect (the data points are not perfectly in line)
• Least error is NOT the absence of error
2. The regression equation should not be used to make predictions for
X values outside the range of the original data
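One way to keep the second caution in view is to build the range check into the prediction itself. The sketch below assumes the Example 16.1 line (b = 1, a = 4) and the observed X range of 0 to 7; the function name `predict` and the `x_range` parameter are hypothetical.

```python
def predict(x, b=1.0, a=4.0, x_range=(0, 7)):
    """Predict Y-hat = bX + a, refusing X values outside the observed range."""
    low, high = x_range
    if not low <= x <= high:
        raise ValueError(f"X = {x} is outside the observed range {low} to {high}")
    return b * x + a


print(predict(3))  # 7.0
```

Raising an error is just one design choice; a warning would work as well if extrapolation is sometimes acceptable.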
Standardizing the Regression Equation
The standardized form of the regression equation
utilizes z-scores (standardized scores) in place of raw
scores:
$\hat{z}_Y = \beta z_X$
Note:
1. We are now using the z-score for each X value (zx) to predict the
z-score for the corresponding Y value (zy)
2. The slope constant that was b is now identified as β (“beta”)
• The slope for standardized variables: a one-standard-deviation increase
in X changes the predicted Y by β standard deviations
• For an equation with two variables, β = Pearson r
3. There is no longer a constant (a) in the equation
because z-scores have a mean of 0, so $a = M_Y - bM_X = 0$
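The claim that β equals Pearson r can be checked numerically. This sketch assumes the Example 16.1 scores and uses the sample standard deviation (n - 1) for the z-scores; the helper `z_scores` is an illustrative name, not from the slides.

```python
import math
import statistics

X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]


def z_scores(values):
    """Standardize scores using the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


z_x, z_y = z_scores(X), z_scores(Y)

# z-scores have a mean of 0, so the deviations are the z-scores themselves.
sp_z = sum(zx * zy for zx, zy in zip(z_x, z_y))
ss_zx = sum(zx ** 2 for zx in z_x)
beta = sp_z / ss_zx

# Pearson r from the raw scores, for comparison.
m_x, m_y = statistics.mean(X), statistics.mean(Y)
sp = sum((x - m_x) * (y - m_y) for x, y in zip(X, Y))
ss_x = sum((x - m_x) ** 2 for x in X)
ss_y = sum((y - m_y) ** 2 for y in Y)
r = sp / math.sqrt(ss_x * ss_y)

print(round(beta, 4), round(r, 4))
```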
The Accuracy of the Predictions
• These plots of two different sets of data have the same
regression equation
The regression equation does not
provide any information about the
accuracy of the predictions!
The Standard Error of the Estimate
Provides a measure of the standard distance between a
regression line (the predicted Y values) and the actual data
points (the actual Y values)
• Very similar to the standard deviation
• Answers the question:
How accurately does the regression equation predict the
observed Y values?
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$
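A minimal sketch of this formula in Python, assuming the Example 16.1 scores and the fitted line Ŷ = X + 4 (b = 1, a = 4). Note that SSresidual is already a sum of squared residuals, so it is not squared again before dividing by df.

```python
import math

X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
b, a = 1, 4  # regression line for Example 16.1

predicted = [b * x + a for x in X]
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]

ss_residual = sum(e ** 2 for e in residuals)   # sum of squared residuals
df = len(X) - 2                                # n - 2 for one predictor
s_yx = math.sqrt(ss_residual / df)             # standard error of estimate

print(sum(residuals), ss_residual, round(s_yx, 2))  # 0 28 2.16
```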
Let’s Compute the Standard Error of
Estimate (Example 16.1, p.563, using the definitional formula)
Using $\hat{Y} = X + 4$:

 X    Y    Predicted Y (Ŷ)   Residual (Y - Ŷ)   Squared residual (Y - Ŷ)²
 2    3          6                 -3                     9
 6   11         10                  1                     1
 0    6          4                  2                     4
 4    6          8                 -2                     4
 5    7          9                 -2                     4
 7   12         11                  1                     1
 5   10          9                  1                     1
 3    9          7                  2                     4

Sum of residuals = 0; SSresidual = 28
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{28}{8 - 2}} = \sqrt{\frac{28}{6}} = \sqrt{4.67} = 2.16$
Relationship Between the Standard
Error of the Estimate and Correlation
• r² = proportion of predicted variability
• Variability in Y that is predicted by its relationship with X
• (1 - r²) = proportion of unpredicted variability
So, if r = 0.80, then the predicted variability is r² = 0.64
• 64% of the total variability for Y scores can be predicted by X
• And the unpredicted variability is the remaining 36% (1 - r²)
predicted variability = $SS_{regression} = r^2 SS_Y$
unpredicted variability = $SS_{residual} = (1 - r^2) SS_Y$
An Easier Way to Compute SSresidual
Instead of computing individual error values:
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$
It is easier to simply use the formula for unpredicted
variability for the SSresidual:
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{(1 - r^2) SS_Y}{n - 2}}$
These are the steps we just went through to
compute the Standard Error of Estimate
Using $\hat{Y} = X + 4$:

 X    Y    Predicted Y (Ŷ)   Residual (Y - Ŷ)   Squared residual (Y - Ŷ)²
 2    3          6                 -3                     9
 6   11         10                  1                     1
 0    6          4                  2                     4
 4    6          8                 -2                     4
 5    7          9                 -2                     4
 7   12         11                  1                     1
 5   10          9                  1                     1
 3    9          7                  2                     4

Sum of residuals = 0; SSresidual = 28
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{28}{8 - 2}} = \sqrt{\frac{28}{6}} = \sqrt{4.67} = 2.16$
Now let’s do it using the easier formula
• We know SSX = 36, SSY = 64, and SP = 36 because we
calculated it a few slides back:
Scores, Error (deviations), Products, and Squared Error:

 X    Y    X - MX   Y - MY   (X - MX)(Y - MY)   (X - MX)²   (Y - MY)²
 2    3     -2       -5            10               4          25
 6   11      2        3             6               4           9
 0    6     -4       -2             8              16           4
 4    6      0       -2             0               0           4
 7   12      3        4            12               9          16
 5    7      1       -1            -1               1           1
 5   10      1        2             2               1           4
 3    9     -1        1            -1               1           1

∑X = 32, MX = 4; ∑Y = 64, MY = 8
SP = 36; SSX = 36; SSY = 64
Using those figures, we can compute:
$r = \frac{SP}{\sqrt{SS_X SS_Y}} = \frac{36}{\sqrt{36(64)}} = \frac{36}{\sqrt{2304}} = \frac{36}{48} = 0.75$
• With SSY = 64 and a correlation of 0.75, the predicted
variability from the regression equation is:
$SS_{regression} = r^2 SS_Y = 0.75^2(64) = 0.5625(64) = 36$
• And the unpredicted variability is:
$SS_{residual} = (1 - r^2) SS_Y = (1 - 0.75^2)64 = (1 - 0.5625)64 = (0.4375)64 = 28$
• This is the same value we found working with our table!
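The shortcut can be sketched in Python starting only from the summary values SSX = 36, SSY = 64, SP = 36, and n = 8. The variable names are illustrative assumptions.

```python
import math

ss_x, ss_y, sp, n = 36, 64, 36, 8

r = sp / math.sqrt(ss_x * ss_y)          # Pearson correlation
ss_regression = r ** 2 * ss_y            # predicted variability
ss_residual = (1 - r ** 2) * ss_y        # unpredicted variability
s_yx = math.sqrt(ss_residual / (n - 2))  # standard error of estimate

print(r, ss_regression, ss_residual)  # 0.75 36.0 28.0
print(round(s_yx, 2))                 # 2.16
```

No individual residuals are needed; the two SS values add back up to SSY = 64.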
CHAPTER 16.2
Analysis of Regression:
Testing the Significance of the Regression Equation
Analysis of Regression
• Uses an F-ratio to determine whether the variance
predicted by the regression equation is significantly
greater than would be expected if there was no
relationship between X and Y.
$F = \frac{\text{variance in Y predicted by the regression equation}}{\text{unpredicted variance in the Y scores}}$
$F = \frac{\text{systematic changes in Y resulting from changes in X}}{\text{changes in Y that are independent from changes in X}}$
Significance testing
$H_0$: The regression equation does not account for a
significant proportion of variance in the Y scores
$H_1$: The equation does account for a significant
proportion of variance in the Y scores
$MS_{regression} = \frac{SS_{regression}}{df_{regression}}$; $df = 1$
$MS_{residual} = \frac{SS_{residual}}{df_{residual}}$; $df = n - 2$
$F = \frac{MS_{regression}}{MS_{residual}}$
Find and evaluate the critical F-value the same as for
ANOVA (df = # of predictors, n - 2)
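Applied to the running example, the F-ratio works out as follows. This is a sketch that assumes SSregression = 36, SSresidual = 28, and n = 8 from Example 16.1; it does not look up the critical value, which still has to come from an F table.

```python
ss_regression, ss_residual, n = 36, 28, 8

df_regression = 1       # one predictor
df_residual = n - 2     # 6

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print(f"F({df_regression}, {df_residual}) = {f_ratio:.2f}")  # F(1, 6) = 7.71
```

Standard F tables list a critical value of roughly 5.99 for df = (1, 6) at α = .05; confirm this against your own table before drawing a conclusion.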
Coming up next…
• Wednesday lab
• Lab #9: Using SPSS for correlation and regression
• HW #9 is due at the beginning of class
• Read the second half of Chapter 16 (pp.572-581)
CHAPTER 16.3
Introduction to Multiple Regression with Two Predictor
Variables
Multiple Regression with Two Predictor Variables
• 40% of the variance in Academic Performance can be
predicted by IQ scores
• 30% of the variance in academic performance can be
predicted from SAT scores
• IQ and SAT also overlap: SAT contributes only an additional
10% beyond what is already predicted by IQ
[Figure: Predicting the variance in academic performance from IQ and SAT scores]
Multiple Regression
When you have more than one predictor variable
Considering the two-predictor model:
$\hat{Y} = b_1X_1 + b_2X_2 + a$
For standardized scores:
$\hat{z}_Y = \beta_1 z_{X_1} + \beta_2 z_{X_2}$
Calculations for two-predictor
regression coefficients:
$b_1 = \frac{(SP_{X_1Y})(SS_{X_2}) - (SP_{X_1X_2})(SP_{X_2Y})}{(SS_{X_1})(SS_{X_2}) - (SP_{X_1X_2})^2}$
$b_2 = \frac{(SP_{X_2Y})(SS_{X_1}) - (SP_{X_1X_2})(SP_{X_1Y})}{(SS_{X_1})(SS_{X_2}) - (SP_{X_1X_2})^2}$
$a = M_Y - b_1M_{X_1} - b_2M_{X_2}$
Where:
• SSX1 = sum of squared deviations for X1
• SSX2 = sum of squared deviations for X2
• SPX1Y = sum of products of deviations for X1 and Y
• SPX2Y = sum of products of deviations for X2 and Y
• SPX1X2 = sum of products of deviations for X1 and X2
R²
Percentage of variance accounted for by a
multiple-regression equation
$R^2 = \frac{SS_{regression}}{SS_Y} = \frac{b_1 SP_{X_1Y} + b_2 SP_{X_2Y}}{SS_Y}$
• Proportion of unpredicted variability:
$(1 - R^2) = \frac{SS_{residual}}{SS_Y}$
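The two-predictor formulas for b1, b2, a, and R² can be sketched in Python as below. The scores are hypothetical (the slides give no two-predictor data set), and the function name `fit_two_predictors` is an assumption for illustration.

```python
def fit_two_predictors(x1, x2, y):
    """Return (b1, b2, a, r_squared) from the SS and SP formulas for two predictors."""
    n = len(y)
    m1, m2, m_y = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - m_y for v in y]

    ss_1 = sum(d * d for d in d1)
    ss_2 = sum(d * d for d in d2)
    ss_y = sum(d * d for d in dy)
    sp_1y = sum(p * q for p, q in zip(d1, dy))
    sp_2y = sum(p * q for p, q in zip(d2, dy))
    sp_12 = sum(p * q for p, q in zip(d1, d2))

    # Shared denominator; it is zero if X1 and X2 are perfectly correlated.
    denom = ss_1 * ss_2 - sp_12 ** 2
    b1 = (sp_1y * ss_2 - sp_12 * sp_2y) / denom
    b2 = (sp_2y * ss_1 - sp_12 * sp_1y) / denom
    a = m_y - b1 * m1 - b2 * m2
    r_squared = (b1 * sp_1y + b2 * sp_2y) / ss_y
    return b1, b2, a, r_squared


# Hypothetical scores, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 7, 12, 11]

b1, b2, a, r_squared = fit_two_predictors(x1, x2, y)
print(round(b1, 3), round(b2, 3), round(a, 3), round(r_squared, 3))
```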
Standard error of the estimate
$s_{Y \cdot X_1X_2} = \sqrt{MS_{residual}}$
$MS_{residual} = \frac{SS_{residual}}{df}$, $df = n - 3$
Significance testing (2-predictors)
$MS_{regression} = \frac{SS_{regression}}{2}$
$MS_{residual} = \frac{SS_{residual}}{n - 3}$
$F = \frac{MS_{regression}}{MS_{residual}}$, $df = (2, df_{residual})$
** With 3+ predictors, df regression = # predictors
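A small sketch of the two-predictor standard error and F-ratio, given the sums of squares. The input values (SSregression = 60, SSresidual = 40, n = 23) are hypothetical numbers chosen for easy arithmetic, and the function name `two_predictor_test` is an assumption.

```python
import math


def two_predictor_test(ss_regression, ss_residual, n):
    """Return the standard error of estimate, F-ratio, and df for two predictors."""
    df_regression = 2
    df_residual = n - 3
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "s": math.sqrt(ms_residual),
        "F": ms_regression / ms_residual,
        "df": (df_regression, df_residual),
    }


# Hypothetical summary values, for illustration only.
result = two_predictor_test(ss_regression=60, ss_residual=40, n=23)
print(result["df"], result["F"], round(result["s"], 3))
```

With these inputs, MSregression = 30 and MSresidual = 2, so F = 15 with df = (2, 20).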
Evaluating the Contribution of Each
Predictor Variable
• With a multiple regression, we can evaluate the
contribution of each predictor variable
• Does variable X1 make a significant contribution
beyond what is already predicted by variable X2?
• Does variable X2 make a significant contribution
beyond what is already predicted by variable X1?
• This is useful if we want to control for a third variable and
any confounding effects

More Related Content

What's hot

Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Attaullah Khan
 
Applications to Central Limit Theorem and Law of Large Numbers
Applications to Central Limit Theorem and Law of Large NumbersApplications to Central Limit Theorem and Law of Large Numbers
Applications to Central Limit Theorem and Law of Large Numbers
University of Salerno
 
Presentation on regression analysis
Presentation on regression analysisPresentation on regression analysis
Presentation on regression analysis
Sujeet Singh
 
Chap11 simple regression
Chap11 simple regressionChap11 simple regression
Chap11 simple regression
Judianto Nugroho
 
Estimación
EstimaciónEstimación
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Regent University
 
Non parametric-tests
Non parametric-testsNon parametric-tests
Non parametric-tests
Asmita Bhagdikar
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
Khaled Abd Elaziz
 
Correlation analysis ppt
Correlation analysis pptCorrelation analysis ppt
Correlation analysis ppt
Anil Mishra
 
Distribuciones Muestrales
Distribuciones MuestralesDistribuciones Muestrales
Distribuciones Muestrales
Hector Funes
 
Correlation and Simple Regression
Correlation  and Simple RegressionCorrelation  and Simple Regression
Correlation and Simple Regression
Venkata Reddy Konasani
 
Simple interest
Simple interestSimple interest
Simple interest
walter9chambliss
 
Kwoledge of calculation of mean,median and mode
Kwoledge of calculation of mean,median and modeKwoledge of calculation of mean,median and mode
Kwoledge of calculation of mean,median and mode
Aarti Vijaykumar
 
Inferencia estadistica
Inferencia estadisticaInferencia estadistica
Inferencia estadistica
natorabet
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
Mohit Asija
 
Presentación 11 prueba t y varianza
Presentación 11 prueba t y varianzaPresentación 11 prueba t y varianza
Presentación 11 prueba t y varianza
Dr. Orville M. Disdier
 
Spc methods
Spc methods Spc methods
Spc methods
Sudarshana26
 
Chisquare
ChisquareChisquare
Chisquare
keerthi samuel
 
Linear regression
Linear regressionLinear regression
Linear regression
SreerajVA
 
The Chi-Square Statistic: Tests for Goodness of Fit and Independence
The Chi-Square Statistic: Tests for Goodness of Fit and IndependenceThe Chi-Square Statistic: Tests for Goodness of Fit and Independence
The Chi-Square Statistic: Tests for Goodness of Fit and Independence
jasondroesch
 

What's hot (20)

Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Applications to Central Limit Theorem and Law of Large Numbers
Applications to Central Limit Theorem and Law of Large NumbersApplications to Central Limit Theorem and Law of Large Numbers
Applications to Central Limit Theorem and Law of Large Numbers
 
Presentation on regression analysis
Presentation on regression analysisPresentation on regression analysis
Presentation on regression analysis
 
Chap11 simple regression
Chap11 simple regressionChap11 simple regression
Chap11 simple regression
 
Estimación
EstimaciónEstimación
Estimación
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Non parametric-tests
Non parametric-testsNon parametric-tests
Non parametric-tests
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Correlation analysis ppt
Correlation analysis pptCorrelation analysis ppt
Correlation analysis ppt
 
Distribuciones Muestrales
Distribuciones MuestralesDistribuciones Muestrales
Distribuciones Muestrales
 
Correlation and Simple Regression
Correlation  and Simple RegressionCorrelation  and Simple Regression
Correlation and Simple Regression
 
Simple interest
Simple interestSimple interest
Simple interest
 
Kwoledge of calculation of mean,median and mode
Kwoledge of calculation of mean,median and modeKwoledge of calculation of mean,median and mode
Kwoledge of calculation of mean,median and mode
 
Inferencia estadistica
Inferencia estadisticaInferencia estadistica
Inferencia estadistica
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Presentación 11 prueba t y varianza
Presentación 11 prueba t y varianzaPresentación 11 prueba t y varianza
Presentación 11 prueba t y varianza
 
Spc methods
Spc methods Spc methods
Spc methods
 
Chisquare
ChisquareChisquare
Chisquare
 
Linear regression
Linear regressionLinear regression
Linear regression
 
The Chi-Square Statistic: Tests for Goodness of Fit and Independence
The Chi-Square Statistic: Tests for Goodness of Fit and IndependenceThe Chi-Square Statistic: Tests for Goodness of Fit and Independence
The Chi-Square Statistic: Tests for Goodness of Fit and Independence
 

Similar to regression

Corr-and-Regress (1).ppt
Corr-and-Regress (1).pptCorr-and-Regress (1).ppt
Corr-and-Regress (1).ppt
MuhammadAftab89
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
BAGARAGAZAROMUALD2
 
Cr-and-Regress.ppt
Cr-and-Regress.pptCr-and-Regress.ppt
Cr-and-Regress.ppt
RidaIrfan10
 
Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Science
ssuser71ac73
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
HarunorRashid74
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
krunal soni
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
MoinPasha12
 
Corr And Regress
Corr And RegressCorr And Regress
Corr And Regress
rishi.indian
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relation
nuwan udugampala
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
마이캠퍼스
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
Rashi Agarwal
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
Anusuya123
 
Correlations
CorrelationsCorrelations
Regression analysis
Regression analysisRegression analysis
Regression analysis
Awais Salman
 
Regression
Regression  Regression
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
Nimrita Koul
 
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Neeraj Bhandari
 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptx
ShivankAggatwal
 
Linear regression
Linear regressionLinear regression
Linear regression
Regent University
 
CORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptxCORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptx
Rohit77460
 

Similar to regression (20)

Corr-and-Regress (1).ppt
Corr-and-Regress (1).pptCorr-and-Regress (1).ppt
Corr-and-Regress (1).ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Cr-and-Regress.ppt
Cr-and-Regress.pptCr-and-Regress.ppt
Cr-and-Regress.ppt
 
Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Science
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr And Regress
Corr And RegressCorr And Regress
Corr And Regress
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relation
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
 
Correlations
CorrelationsCorrelations
Correlations
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression
Regression  Regression
Regression
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
 
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptx
 
Linear regression
Linear regressionLinear regression
Linear regression
 
CORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptxCORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptx
 

More from Kaori Kubo Germano, PhD

Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
Kaori Kubo Germano, PhD
 
Probablity
ProbablityProbablity
Probability & Samples
Probability & SamplesProbability & Samples
Probability & Samples
Kaori Kubo Germano, PhD
 
z-scores
z-scoresz-scores
Choosing the right statistics
Choosing the right statisticsChoosing the right statistics
Choosing the right statistics
Kaori Kubo Germano, PhD
 
Chi square
Chi squareChi square
Factorial ANOVA
Factorial ANOVAFactorial ANOVA
Factorial ANOVA
Kaori Kubo Germano, PhD
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
Kaori Kubo Germano, PhD
 
Repeated Measures ANOVA
Repeated Measures ANOVARepeated Measures ANOVA
Repeated Measures ANOVA
Kaori Kubo Germano, PhD
 
Repeated Measures t-test
Repeated Measures t-testRepeated Measures t-test
Repeated Measures t-test
Kaori Kubo Germano, PhD
 
Independent samples t-test
Independent samples t-testIndependent samples t-test
Independent samples t-test
Kaori Kubo Germano, PhD
 
Introduction to the t-test
Introduction to the t-testIntroduction to the t-test
Introduction to the t-test
Kaori Kubo Germano, PhD
 
Central Tendency
Central TendencyCentral Tendency
Central Tendency
Kaori Kubo Germano, PhD
 
Variability
VariabilityVariability
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
Kaori Kubo Germano, PhD
 
Behavioral Statistics Intro lecture
Behavioral Statistics Intro lectureBehavioral Statistics Intro lecture
Behavioral Statistics Intro lecture
Kaori Kubo Germano, PhD
 

More from Kaori Kubo Germano, PhD (16)

Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Probablity
ProbablityProbablity
Probablity
 
Probability & Samples
Probability & SamplesProbability & Samples
Probability & Samples
 
z-scores
z-scoresz-scores
z-scores
 
Choosing the right statistics
Choosing the right statisticsChoosing the right statistics
Choosing the right statistics
 
Chi square
Chi squareChi square
Chi square
 
Factorial ANOVA
Factorial ANOVAFactorial ANOVA
Factorial ANOVA
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
 
Repeated Measures ANOVA
Repeated Measures ANOVARepeated Measures ANOVA
Repeated Measures ANOVA
 
Repeated Measures t-test
Repeated Measures t-testRepeated Measures t-test
Repeated Measures t-test
 
Independent samples t-test
Independent samples t-testIndependent samples t-test
Independent samples t-test
 
Introduction to the t-test
Introduction to the t-testIntroduction to the t-test
Introduction to the t-test
 
Central Tendency
Central TendencyCentral Tendency
Central Tendency
 
Variability
VariabilityVariability
Variability
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
 
Behavioral Statistics Intro lecture
Behavioral Statistics Intro lectureBehavioral Statistics Intro lecture
Behavioral Statistics Intro lecture
 

Recently uploaded

Contiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptxContiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptx
Kalna College
 
How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...
Infosec
 
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
220711130083 SUBHASHREE RAKSHIT  Internet resources for social science220711130083 SUBHASHREE RAKSHIT  Internet resources for social science
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
Kalna College
 
Information and Communication Technology in Education
Information and Communication Technology in EducationInformation and Communication Technology in Education
Information and Communication Technology in Education
MJDuyan
 
How to Create a Stage or a Pipeline in Odoo 17 CRM
How to Create a Stage or a Pipeline in Odoo 17 CRMHow to Create a Stage or a Pipeline in Odoo 17 CRM
How to Create a Stage or a Pipeline in Odoo 17 CRM
Celine George
 
Diversity Quiz Finals by Quiz Club, IIT Kanpur
Diversity Quiz Finals by Quiz Club, IIT KanpurDiversity Quiz Finals by Quiz Club, IIT Kanpur
Diversity Quiz Finals by Quiz Club, IIT Kanpur
Quiz Club IIT Kanpur
 
IoT (Internet of Things) introduction Notes.pdf
IoT (Internet of Things) introduction Notes.pdfIoT (Internet of Things) introduction Notes.pdf
IoT (Internet of Things) introduction Notes.pdf
roshanranjit222
 
Talking Tech through Compelling Visual Aids
Talking Tech through Compelling Visual AidsTalking Tech through Compelling Visual Aids
Talking Tech through Compelling Visual Aids
MattVassar1
 
220711130088 Sumi Basak Virtual University EPC 3.pptx
220711130088 Sumi Basak Virtual University EPC 3.pptx220711130088 Sumi Basak Virtual University EPC 3.pptx
220711130088 Sumi Basak Virtual University EPC 3.pptx
Kalna College
 
Non-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech ProfessionalsNon-Verbal Communication for Tech Professionals
Non-Verbal Communication for Tech Professionals
MattVassar1
 
What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17
Celine George
 
220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science
Kalna College
 
Interprofessional Education Platform Introduction.pdf
Interprofessional Education Platform Introduction.pdfInterprofessional Education Platform Introduction.pdf
Interprofessional Education Platform Introduction.pdf
Ben Aldrich
 
220711130095 Tanu Pandey message currency, communication speed & control EPC ...
220711130095 Tanu Pandey message currency, communication speed & control EPC ...220711130095 Tanu Pandey message currency, communication speed & control EPC ...
220711130095 Tanu Pandey message currency, communication speed & control EPC ...
Kalna College
 
Library news letter Kitengesa Uganda June 2024
Library news letter Kitengesa Uganda June 2024Library news letter Kitengesa Uganda June 2024
Library news letter Kitengesa Uganda June 2024
Friends of African Village Libraries
 
How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17How to Create User Notification in Odoo 17
How to Create User Notification in Odoo 17
Celine George
 
A Quiz on Drug Abuse Awareness by Quizzito
A Quiz on Drug Abuse Awareness by QuizzitoA Quiz on Drug Abuse Awareness by Quizzito
A Quiz on Drug Abuse Awareness by Quizzito
Quizzito The Quiz Society of Gargi College
 
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapitolTechU
 
Science-9-Lesson-1-The Bohr Model-NLC.pptx pptx
Science-9-Lesson-1-The Bohr Model-NLC.pptx pptxScience-9-Lesson-1-The Bohr Model-NLC.pptx pptx
Science-9-Lesson-1-The Bohr Model-NLC.pptx pptx
Catherine Dela Cruz
 
(T.L.E.) Agriculture: "Ornamental Plants"
(T.L.E.) Agriculture: "Ornamental Plants"(T.L.E.) Agriculture: "Ornamental Plants"
(T.L.E.) Agriculture: "Ornamental Plants"
MJDuyan
 

Recently uploaded (20)

Contiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptxContiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptx
 
How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...
 
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
220711130083 SUBHASHREE RAKSHIT  Internet resources for social science220711130083 SUBHASHREE RAKSHIT  Internet resources for social science
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
 
Information and Communication Technology in Education
Information and Communication Technology in EducationInformation and Communication Technology in Education
Information and Communication Technology in Education
 
How to Create a Stage or a Pipeline in Odoo 17 CRM
regression

  • 2. Regression The statistical technique for finding the best-fitting straight line for a set of data • Allows us to make predictions based on correlations • A linear relationship between two variables allows the computation of an equation that provides a precise, mathematical description of the relationship: \(Y = bX + a\) [Figure: regression line]
  • 3. The Relationship Between Correlation and Regression Both examine the relationship/association between two variables Both involve an X and Y variable for each individual (one pair of scores) Differences in practice Correlation Used to determine the relationship between two variables Regression Used to make predictions about one variable based on the value of another
  • 4. The Linear Equation: \(Y = bX + a\) Expresses a linear relationship between variables X and Y • X: represents any given score on X • Y: represents the corresponding score for Y based on X • a: the Y-intercept • Determines what the value of Y equals when X = 0 • Where the line crosses the Y-axis • b: the slope constant • How much the Y variable will change when X is increased by one point • The direction and degree of the line’s tilt
  • 5. Prediction using Regression A local video store charges a $5/month membership fee which allows video rentals at $2 each • How much will I spend per month? \(Y = bX + a\), so \(Y = 2X + 5\) • If you never rent a video (X = 0): \(Y = 2(0) + 5 = 5\) • If you rent 3 videos/mo (X = 3): \(Y = 2(3) + 5 = 11\) • If you rent 8 videos/mo (X = 8): \(Y = 2(8) + 5 = 21\)
  • 6. Graphing linear equations • To graph the line \(Y = 5X + 60\), we only need to find two pairs of scores for X and Y, and then draw the straight line that connects them • X = 0: \(Y = 5(0) + 60 = 60\) • X = 3: \(Y = 5(3) + 60 = 75\) • The intercept (a) is 60 (when X = 0, Y = 60) • The slope (b) is 5 (as we increase one value in X, Y increases 5 points) [Figure: graph of the line; X-axis 0 to 4, Y-axis 0 to 80]
  • 7. The Regression Line The line through the data points that ‘best fit’ the data (assuming a linear relationship) 1. Makes the relationship between two variables easier to see (and describe) 2. Identifies the ‘central tendency’ of the relationship between the variables 3. Can be used for prediction • Best fit: the line that minimizes the distance of each point to the line ‘Best fit’ Regression Line
  • 8. Correlation and the regression line • The magnitude of the correlation coefficient (r) is an indicator of how well the points aggregate around the regression line • What would a perfect correlation look like? [Figure: data points around a regression line; X-axis 0 to 5, Y-axis 0 to 8]
  • 9. The Distance Between a Point and the Line Each data point will have its own distance from the regression line (a.k.a. error) • \(Y\): the actual value of Y shown in the data for a given X • \(\hat{Y}\): the value of Y predicted for a given X from your linear equation • Distance \(= Y - \hat{Y}\)
  • 10. How well does the line fit the data? • How well a set of data points fits a straight line can be measured by calculating the distance (error) between the line and each data point: Error \(= Y - \hat{Y}\), where \(\hat{Y}\) is read “Y hat”
  • 11. How well does the line fit the data? • Some of the distances will be positive and some negative, so to find a total value we must square each distance (remember SS) • Total squared error: \(SS_{residual} = \sum (Y - \hat{Y})^2\) • Remember, this is the sum of the squared distances
  • 12. The Regression Line (a.k.a. The Least-Squared-Error Solution) The line through the data points that ‘best fit’ the data (assuming a linear relationship): \(\hat{Y} = bX + a\) • The “best fit” regression line • minimizes the distance of each point from the line • Gives the best prediction of Y • The Least-Squared-Error Solution • Results in the smallest possible value for the total squared error
  • 13. Solving the regression equation \(\hat{Y} = bX + a\) • \(b = \frac{SP}{SS_X} = r\frac{s_Y}{s_X}\) • \(a = M_Y - bM_X\) (M = mean) • Remember: \(SP = \sum XY - \frac{\sum X \sum Y}{n}\)
  • 14. I interrupt our regularly scheduled program for a brief announcement….
  • 15. ‘Memba these? We have spent the semester utilizing the Computational Formulas for all Sum of Squares: \(SS_X = \sum X^2 - \frac{(\sum X)^2}{n}\), \(SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}\), \(SP = \sum XY - \frac{\sum X \sum Y}{n}\) For sanity’s sake, we will now be utilizing the definitional formulas for all: \(SS_X = \sum (X - M_X)^2\), \(SS_Y = \sum (Y - M_Y)^2\), \(SP = \sum (X - M_X)(Y - M_Y)\)
  • 16. And now back to our regularly scheduled programming…..
  • 17. Solving the regression equation \(\hat{Y} = bX + a\) • \(b = \frac{SP}{SS_X} = r\frac{s_Y}{s_X}\) • \(a = M_Y - bM_X\) (M = mean) • Remember: \(SP = \sum (X - M_X)(Y - M_Y)\)
  • 18. Let’s Try One! (Example 16.1, p.563, using the definitional formula) Columns: scores, errors (deviations from the mean), products, and squared errors
      X    Y    X − MX   Y − MY   (X − MX)(Y − MY)   (X − MX)²   (Y − MY)²
      2    3     -2       -5            10               4          25
      6   11      2        3             6               4           9
      0    6     -4       -2             8              16           4
      4    6      0       -2             0               0           4
      7   12      3        4            12               9          16
      5    7      1       -1            -1               1           1
      5   10      1        2             2               1           4
      3    9     -1        1            -1               1           1
    ∑X = 32, MX = 4; ∑Y = 64, MY = 8; SP = 36; SSX = 36; SSY = 64
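  • 19. Find b and a in the regression equation • Given: \(M_X = 4\), \(SS_X = 36\); \(M_Y = 8\), \(SS_Y = 64\); \(SP = 36\) • \(b = \frac{SP}{SS_X} = \frac{36}{36} = 1\) • \(a = M_Y - bM_X = 8 - 1(4) = 8 - 4 = 4\) • \(\hat{Y} = bX + a = 1X + 4 = X + 4\)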
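The calculation on slides 18 and 19 can be sketched in a few lines of Python. This is only an illustration using the definitional formulas and the Example 16.1 scores as listed on the slides; the helper names `mean` and `fit_line` are hypothetical and not part of the original material.

```python
# Scores from Example 16.1 as listed on slide 18.
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]


def mean(values):
    """Arithmetic mean (M on the slides)."""
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Return (b, a) for Y-hat = bX + a using the definitional formulas."""
    m_x, m_y = mean(xs), mean(ys)
    # SP: sum of products of deviations; SS_X: sum of squared deviations of X.
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    ss_x = sum((x - m_x) ** 2 for x in xs)
    b = sp / ss_x          # slope
    a = m_y - b * m_x      # Y-intercept
    return b, a


b, a = fit_line(X, Y)
print(f"Y-hat = {b}X + {a}")
```

With these scores the function gives b = 1 and a = 4, matching the hand calculation.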
  • 20. Making Predictions We use the regression to make predictions. • For the previous example: \(\hat{Y} = X + 4\) • Thus, an individual with a score of X = 3 would be predicted to have a Y score of: \(\hat{Y} = 3 + 4 = 7\) However, keep in mind: 1. The predicted value will not be perfect unless the correlation is perfect (the data points are not perfectly in line) • Least error is NOT the absence of error 2. The regression equation should not be used to make predictions for X values outside the range of the original data
  • 21. Standardizing the Regression Equation The standardized form of the regression equation utilizes z-scores (standardized scores) in place of raw scores: \(\hat{z}_Y = \beta z_X\) Note: 1. We are now using the z-score for each X value (\(z_X\)) to predict the z-score for the corresponding Y value (\(z_Y\)) 2. The slope constant that was b is now identified as β (“beta”) • The slope for standardized variables: a change of one standard deviation in X produces a predicted change of β standard deviations in Y • For an equation with two variables, β = Pearson r 3. There is no longer a constant (a) in the equation because z-scores have a mean of 0, so \(a = M_Y - bM_X = 0\)
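As a rough check of the claim that β equals Pearson r when there is one predictor, the sketch below standardizes the Example 16.1 scores and refits the line on the z-scores. The helper `z_scores` is hypothetical, and using the sample standard deviation (n − 1) is an assumption; the population standard deviation would give the same β.

```python
import math
import statistics

# Scores from Example 16.1 as listed on the slides.
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]


def z_scores(values):
    """Convert raw scores to z-scores using the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


z_x, z_y = z_scores(X), z_scores(Y)

# z-scores have a mean of 0, so each z-score is already a deviation score.
beta = sum(zx * zy for zx, zy in zip(z_x, z_y)) / sum(zx ** 2 for zx in z_x)
intercept = statistics.mean(z_y) - beta * statistics.mean(z_x)

# Pearson r from the raw scores, for comparison.
m_x, m_y = statistics.mean(X), statistics.mean(Y)
sp = sum((x - m_x) * (y - m_y) for x, y in zip(X, Y))
ss_x = sum((x - m_x) ** 2 for x in X)
ss_y = sum((y - m_y) ** 2 for y in Y)
r = sp / math.sqrt(ss_x * ss_y)

print(round(beta, 4), round(r, 4), round(intercept, 4))
```

Up to floating-point rounding, `beta` and `r` agree and the intercept is zero.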
  • 22. The Accuracy of the Predictions • These plots of two different sets of data have the same regression equation The regression equation does not provide any information about the accuracy of the predictions!
  • 23. The Standard Error of the Estimate Provides a measure of the standard distance between a regression line (the predicted Y values) and the actual data points (the actual Y values) • Very similar to the standard deviation • Answers the question: How accurately does the regression equation predict the observed Y values? \(s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}\)
  • 24. Let’s Compute the Standard Error of Estimate (Example 16.1, p.563, using the definitional formula), with predicted Y values from \(\hat{Y} = X + 4\)
      X    Y    Ŷ    Y − Ŷ   (Y − Ŷ)²
      2    3    6     -3        9
      6   11   10      1        1
      0    6    4      2        4
      4    6    8     -2        4
      5    7    9     -2        4
      7   12   11      1        1
      5   10    9      1        1
      3    9    7      2        4
    Sum of residuals = 0; SSresidual = 28
    \(s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{28}{8 - 2}} = \sqrt{4.67} \approx 2.16\)
  • 25. Relationship Between the Standard Error of the Estimate and Correlation • r² = proportion of predicted variability • Variability in Y that is predicted by its relationship with X • (1 − r²) = proportion of unpredicted variability So, if r = 0.80, then the predicted variability is r² = 0.64 • 64% of the total variability for Y scores can be predicted by X • And the unpredicted variability is the remaining 36% (1 − r²) • predicted variability = \(SS_{regression} = r^2 SS_Y\) • unpredicted variability = \(SS_{residual} = (1 - r^2) SS_Y\)
  • 26. An Easier Way to Compute SSresidual Instead of computing individual error values: \(s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}\) It is easier to simply use the formula for unpredicted variability for the SSresidual: \(s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{(1 - r^2) SS_Y}{n - 2}}\)
  • 27. These are the steps we just went through to compute the Standard Error of Estimate, with predicted Y values from \(\hat{Y} = X + 4\)
      X    Y    Ŷ    Y − Ŷ   (Y − Ŷ)²
      2    3    6     -3        9
      6   11   10      1        1
      0    6    4      2        4
      4    6    8     -2        4
      5    7    9     -2        4
      7   12   11      1        1
      5   10    9      1        1
      3    9    7      2        4
    Sum of residuals = 0; SSresidual = 28
    \(s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{28}{8 - 2}} = \sqrt{4.67} \approx 2.16\)
  • 28. Now let’s do it using the easier formula • We know SSX = 36, SSY = 64, and SP = 36 because we calculated them a few slides back:
      X    Y    X − MX   Y − MY   (X − MX)(Y − MY)   (X − MX)²   (Y − MY)²
      2    3     -2       -5            10               4          25
      6   11      2        3             6               4           9
      0    6     -4       -2             8              16           4
      4    6      0       -2             0               0           4
      7   12      3        4            12               9          16
      5    7      1       -1            -1               1           1
      5   10      1        2             2               1           4
      3    9     -1        1            -1               1           1
    ∑X = 32, MX = 4; ∑Y = 64, MY = 8; SP = 36; SSX = 36; SSY = 64
  • 29. Using those figures, we can compute: \(r = \frac{SP}{\sqrt{SS_X SS_Y}} = \frac{36}{\sqrt{36(64)}} = \frac{36}{\sqrt{2304}} = \frac{36}{48} = 0.75\) • With SSY = 64 and a correlation of 0.75, the predicted variability from the regression equation is: \(SS_{regression} = r^2 SS_Y = 0.75^2(64) = 0.5625(64) = 36\) • And the unpredicted variability is: \(SS_{residual} = (1 - r^2) SS_Y = (1 - 0.75^2)64 = (1 - 0.5625)64 = (0.4375)64 = 28\) • This is the same value we found working with our table!
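The two routes to SSresidual (summing squared residuals one by one, or using \((1 - r^2) SS_Y\)) can be compared with a short Python sketch. It assumes the Example 16.1 scores and the fitted line \(\hat{Y} = X + 4\) from the slides; the variable names are illustrative only.

```python
import math

# Scores from Example 16.1 and the fitted line Y-hat = 1X + 4.
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]
b, a = 1, 4
n = len(X)

# Route 1: sum the squared residuals (Y - Y-hat) for every data point.
ss_residual_direct = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))

# Route 2: use the unpredicted variability, (1 - r^2) * SS_Y.
m_x, m_y = sum(X) / n, sum(Y) / n
sp = sum((x - m_x) * (y - m_y) for x, y in zip(X, Y))
ss_x = sum((x - m_x) ** 2 for x in X)
ss_y = sum((y - m_y) ** 2 for y in Y)
r = sp / math.sqrt(ss_x * ss_y)
ss_residual_shortcut = (1 - r ** 2) * ss_y

# Standard error of estimate: square root of SS_residual / df, with df = n - 2.
standard_error = math.sqrt(ss_residual_direct / (n - 2))

print(ss_residual_direct, ss_residual_shortcut, round(standard_error, 2))
```

Both routes give the same SSresidual, so the shortcut saves building the residual table whenever r and SSY are already known.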
  • 30. CHAPTER 16.2 Analysis of Regression: Testing the Significance of the Regression Equation
  • 31. Analysis of Regression • Uses an F-ratio to determine whether the variance predicted by the regression equation is significantly greater than would be expected if there were no relationship between X and Y. \(F = \frac{\text{variance in Y predicted by the regression equation}}{\text{unpredicted variance in the Y scores}}\) \(F = \frac{\text{systematic changes in Y resulting from changes in X}}{\text{changes in Y that are independent from changes in X}}\)
  • 32. Significance testing • H0: The regression equation does not account for a significant proportion of variance in the Y scores • H1: The equation does account for a significant proportion of variance in the Y scores • \(MS_{regression} = \frac{SS_{regression}}{df_{regression}}\); df = 1 • \(MS_{residual} = \frac{SS_{residual}}{df_{residual}}\); df = n − 2 • \(F = \frac{MS_{regression}}{MS_{residual}}\) • Find and evaluate the critical F-value the same as for ANOVA (df = # of predictors, n − 2)
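To make the F-ratio concrete, here is a minimal sketch that applies the formulas above to the Example 16.1 values from the earlier slides (SSregression = 36, SSresidual = 28, n = 8). The function name `analysis_of_regression` is hypothetical. Writing the residual df as n − (number of predictors) − 1 is an assumption chosen to reproduce the n − 2 stated here for one predictor.

```python
def analysis_of_regression(ss_regression, ss_residual, n, n_predictors=1):
    """Return the F-ratio and its (df_regression, df_residual) pair."""
    df_regression = n_predictors
    df_residual = n - n_predictors - 1   # n - 2 when there is one predictor
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual, (df_regression, df_residual)


# Values from Example 16.1: SS_regression = 36, SS_residual = 28, n = 8.
f_ratio, dfs = analysis_of_regression(36, 28, n=8)
print(round(f_ratio, 2), dfs)
```

The resulting F would then be compared with the critical value from an F table for the returned degrees of freedom, as in ANOVA.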
  • 33. Coming up next… • Wednesday lab • Lab #9: Using SPSS for correlation and regression • HW #9 is due in the beginning of class • Read the second half of Chapter 16 (pp.572-581)
  • 34. CHAPTER 16.3 Introduction to Multiple Regression with Two Predictor Variables
  • 35. Multiple Regression with Two Predictor Variables • 40% of the variance in Academic Performance can be predicted by IQ scores • 30% of the variance in academic performance can be predicted from SAT scores • IQ and SAT also overlap: SAT contributes only an additional 10% beyond what is already predicted by IQ Predicting the variance in academic performance from IQ and SAT scores
  • 36. Multiple Regression When you have more than one predictor variable Considering the two-predictor model: \(\hat{Y} = b_1X_1 + b_2X_2 + a\) For standardized scores: \(\hat{z}_Y = \beta_1 z_{X_1} + \beta_2 z_{X_2}\)
  • 37. Calculations for two-predictor regression coefficients: \(b_1 = \frac{(SP_{X_1Y})(SS_{X_2}) - (SP_{X_1X_2})(SP_{X_2Y})}{(SS_{X_1})(SS_{X_2}) - (SP_{X_1X_2})^2}\) \(b_2 = \frac{(SP_{X_2Y})(SS_{X_1}) - (SP_{X_1X_2})(SP_{X_1Y})}{(SS_{X_1})(SS_{X_2}) - (SP_{X_1X_2})^2}\) \(a = M_Y - b_1M_{X_1} - b_2M_{X_2}\) Where: • SSX1 = sum of squared deviations for X1 • SSX2 = sum of squared deviations for X2 • SPX1Y = sum of products of deviations for X1 and Y • SPX2Y = sum of products of deviations for X2 and Y • SPX1X2 = sum of products of deviations for X1 and X2
  • 38. R² Percentage of variance accounted for by a multiple-regression equation: \(R^2 = \frac{SS_{regression}}{SS_Y} = \frac{b_1 SP_{X_1Y} + b_2 SP_{X_2Y}}{SS_Y}\) • Proportion of unpredicted variability: \((1 - R^2) = \frac{SS_{residual}}{SS_Y}\)
  • 39. Standard error of the estimate: \(s_{Y \cdot X_1X_2} = \sqrt{MS_{residual}}\), where \(MS_{residual} = \frac{SS_{residual}}{df}\) and \(df = n - 3\) Significance testing (2-predictors): \(MS_{regression} = \frac{SS_{regression}}{2}\), \(MS_{residual} = \frac{SS_{residual}}{n - 3}\), \(F = \frac{MS_{regression}}{MS_{residual}}\) with \(df = (2, df_{residual})\) ** With 3+ predictors, df regression = # predictors
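The two-predictor formulas for b1, b2, a, and R² can be tried out with a small Python sketch. The data below are hypothetical, not from the slides: they were constructed so that Y = 2·X1 + 3·X2 + 1 exactly, which means the formulas should recover those coefficients and give R² = 1. The helper names `deviations`, `sum_products`, and `two_predictor_fit` are likewise illustrative.

```python
# Hypothetical data built so that Y = 2*X1 + 3*X2 + 1 exactly (not from the slides).
X1 = [1, 2, 3, 4, 5]
X2 = [2, 1, 4, 3, 5]
Y = [9, 8, 19, 18, 26]


def deviations(values):
    """Deviation of each score from its mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def sum_products(d1, d2):
    """Sum of products of two deviation lists (SS when d1 is d2)."""
    return sum(p * q for p, q in zip(d1, d2))


def two_predictor_fit(x1, x2, y):
    """Return (b1, b2, a, r_squared) for Y-hat = b1*X1 + b2*X2 + a."""
    d1, d2, dy = deviations(x1), deviations(x2), deviations(y)
    ss_x1, ss_x2, ss_y = sum_products(d1, d1), sum_products(d2, d2), sum_products(dy, dy)
    sp_x1y, sp_x2y, sp_x1x2 = sum_products(d1, dy), sum_products(d2, dy), sum_products(d1, d2)
    denom = ss_x1 * ss_x2 - sp_x1x2 ** 2
    b1 = (sp_x1y * ss_x2 - sp_x1x2 * sp_x2y) / denom
    b2 = (sp_x2y * ss_x1 - sp_x1x2 * sp_x1y) / denom
    n = len(y)
    a = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    r_squared = (b1 * sp_x1y + b2 * sp_x2y) / ss_y
    return b1, b2, a, r_squared


b1, b2, a, r_squared = two_predictor_fit(X1, X2, Y)
print(b1, b2, a, r_squared)
```

With real data R² would fall below 1, and SSresidual = (1 − R²)·SSY would feed the standard error and F-ratio.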
  • 40. Evaluating the Contribution of Each Predictor Variable • With a multiple regression, we can evaluate the contribution of each predictor variable • Does variable X1 make a significant contribution beyond what is already predicted by variable X2? • Does variable X2 make a significant contribution beyond what is already predicted by variable X1? • This is useful if we want to control for a third variable and any confounding effects