GENEVIEVE BRIAND
Washington State University
R. CARTER HILL
Louisiana State University
Copyright© 2010, 2011 John Wiley & Sons, Inc. All rights reserved. No part of this
publication may be reproduced, stored in a retrieval system or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act,
without either the prior written permission of the Publisher, or authorization through
payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc. 222
Rosewood Drive, Danvers, MA 01923, website www.copyright.com. Requests to the
Publisher for permission should be addressed to the Permissions Department, John Wiley
& Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, (201)748-6011, fax (201)748-6008,
website http://www.wiley.com/go/permissions.
ISBN-13 978-111-803210-7
Preface
This book is a supplement to Principles of Econometrics, 4th Edition by R. Carter Hill, William E.
Griffiths and Guay C. Lim (Wiley, 2011). This book is not a substitute for the textbook, nor is it a
stand-alone computer manual. It is a companion to the textbook, showing how to perform the
examples in the textbook using Excel 2007. This book will be useful to students taking
econometrics, as well as their instructors, and others who wish to use Excel for econometric
analysis.
In addition to this computer manual for Excel, there are similar manuals and support for the
software packages EViews, Gretl, Shazam, and Stata. In addition, all the data for Principles of
Econometrics, 4e, in various formats, including Excel, are available at
http://www.wiley.com/college/hill. Individual data files, as well as errata for this manual and the
textbook, can also be found at http://principlesofeconometrics.com.
The chapters in this book parallel the chapters in Principles of Econometrics, 4e. Thus, if you
seek help for the examples in Chapter 11 of the textbook, check Chapter 11 in this book.
However, within a chapter the section numbers in Principles of Econometrics, 4e do not
necessarily correspond to the Excel manual sections.
This work is a revision of Using Excel 2007 for Principles of Econometrics, 3rd Edition by
Genevieve Briand and R. Carter Hill (Wiley, 2010). Genevieve Briand is the corresponding
author.
Genevieve Briand
School of Economic Sciences
Washington State University
Pullman, WA 99164
gbriand@wsu.edu
R. Carter Hill
Economics Department
Louisiana State University
Baton Rouge, LA 70803
eohill@lsu.edu
Microsoft product screen shot(s) reprinted with permission from Microsoft Corporation. Our use does not directly or indirectly imply
Microsoft sponsorship, affiliation, or endorsement.
BRIEF CONTENTS
1. Introduction to Excel
8. Heteroskedasticity
Index
CONTENTS

CHAPTER 1 Introduction to Excel
1.1 Starting Excel
1.2 Entering Data
1.3 Using Excel for Calculations
1.3.1 Arithmetic Operations
1.3.2 Mathematical Functions
1.4 Editing your Data
1.5 Saving and Printing your Data
1.6 Importing Data into Excel
1.6.1 Resources for Economists on the Internet
1.6.2 Data Files for Principles of Econometrics
1.6.2a John Wiley & Sons Website
1.6.2b Principles of Econometrics Website
1.6.3 Importing ASCII Files

CHAPTER 2 The Simple Linear Regression Model
2.1 Plotting the Food Expenditure Data
2.1.1 Using Chart Tools
2.1.2 Editing the Graph
2.1.2a Editing the Vertical Axis
2.1.2b Axis Titles
2.1.2c Gridlines and Markers
2.1.2d Moving the Chart
2.2 Estimating a Simple Regression
2.2.1 Using Least Squares Estimators' Formulas
2.2.2 Using Excel Regression Analysis Routine
2.3 Plotting a Simple Regression
2.3.1 Using Two Points
2.3.2 Using Excel Built-in Feature
2.3.3 Using a Regression Option
2.3.4 Editing the Chart
2.4 Expected Values of b1 and b2
2.4.1 Model Assumptions
2.4.2 Random Number Generation
2.4.3 The LINEST Function
2.4.4 Repeated Sampling
2.5 Variance and Covariance of b1 and b2
2.6 Nonlinear Relationships
2.6.1 A Quadratic Model
2.6.1a Estimating the Model
2.6.1b Scatter Plot of Data with Fitted Quadratic Relationship
2.6.2 A Log-Linear Model
2.6.2a Histograms of PRICE and ln(PRICE)
2.6.2b Estimating the Model
2.6.2c Scatter Plot of Data with Fitted Log-Linear Relationship
2.7 Regression with Indicator Variables
2.7.1 Histograms of House Prices
2.7.2 Estimating the Model

CHAPTER 3 Interval Estimation and Hypothesis Testing
3.1 Interval Estimation
3.1.1 The t-Distribution
3.1.1a The t-Distribution versus Normal Distribution
3.1.1b t-Critical Values and Interval Estimates
3.1.1c Percentile Values
3.1.1d TINV Function
3.1.1e Appendix E: Table 2 in POE
3.1.2 Obtaining Interval Estimates
3.1.3 An Illustration
3.1.3a Using the Interval Estimator Formula
3.1.3b Excel Regression Default Output
3.1.3c Excel Regression Confidence Level Option
3.1.4 The Repeated Sampling Context (Advanced Material)
3.1.4a Model Assumptions
3.1.4b Repeated Random Sampling
3.1.4c The LINEST Function Revisited
3.1.4d The Simulation Template
3.1.4e The IF Function
3.1.4f The OR Function
3.1.4g The COUNTIF Function
3.2 Hypothesis Tests
3.2.1 One-Tail Tests with Alternative "Greater Than" (>)
3.2.2 One-Tail Tests with Alternative "Less Than" (<)
3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠)
3.3 Examples of Hypothesis Tests
3.3.1 Right-Tail Tests
3.3.1a One-Tail Test of Significance
3.3.1b One-Tail Test of an Economic Hypothesis
3.3.2 Left-Tail Tests
3.3.3 Two-Tail Tests
3.3.3a Two-Tail Test of an Economic Hypothesis
3.3.3b Two-Tail Test of Significance
3.4 The p-Value
3.4.1 The p-Value Rule
3.4.1a Definition of p-value
3.4.1b Justification for the p-Value Rule
3.4.2 The TDIST Function
3.4.3 Examples of Hypothesis Tests Revisited
3.4.3a Right-Tail Test from Section 3.3.1b
3.4.3b Left-Tail Test from Section 3.3.2
3.4.3c Two-Tail Test from Section 3.3.3a
3.4.3d Two-Tail Test from Section 3.3.3b

CHAPTER 4 Prediction, Goodness-of-Fit and Modeling Issues
4.1 Least Squares Prediction
4.2 Measuring Goodness-of-Fit
4.2.1 Coefficient of Determination or R²
4.2.2 Correlation Analysis and R²
4.2.3 The Food Expenditure Example and the CORREL Function
4.3 The Effects of Scaling the Data
4.3.1 Changing the Scale of x
4.3.2 Changing the Scale of y
4.3.3 Changing the Scale of x and y
4.4 A Linear-Log Food Expenditure Model
4.4.1 Estimating the Model
4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship
4.5 Using Diagnostic Residual Plots
4.5.1 Random Residual Pattern
4.5.2 Heteroskedastic Residual Pattern
4.5.3 Detecting Model Specification Errors
4.6 Are the Regression Errors Normally Distributed?
4.6.1 Histogram of the Residuals
4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food Expenditure Model
4.7 Polynomial Models: An Empirical Example
4.7.1 Scatter Plot of Wheat Yield over Time
4.7.2 The Linear Equation Model
4.7.2a Estimating the Model
4.7.2b Residuals Plot
4.7.3 The Cubic Equation Model
4.7.3a Estimating the Model
4.7.3b Residuals Plot
4.8 Log-Linear Models
4.8.1 A Growth Model
4.8.2 A Wage Equation
4.8.3 Prediction
4.8.4 A Generalized R² Measure
4.8.5 Prediction Intervals
4.9 A Log-Log Model: Poultry Demand Equation
4.9.1 Estimating the Model
4.9.2 A Generalized R² Measure
4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship

CHAPTER 5 The Multiple Linear Regression
5.1 Least Squares Estimates Using the Hamburger Chain Data
5.2 Interval Estimation
5.3 Hypothesis Tests for a Single Coefficient
5.3.1 Tests of Significance
5.3.2 One-Tail Tests
5.3.2a Left-Tail Test of Elastic Demand
5.3.2b Right-Tail Test of Advertising Effectiveness
5.4 Polynomial Equations: Extending the Model for Burger Barn Sales
5.5 Interaction Variables
5.5.1 Linear Models
5.5.2 Log-Linear Models
5.6 Measuring Goodness-of-Fit

CHAPTER 6 Further Inference in the Multiple Regression Model
6.1 Testing the Effect of Advertising: the F-test
6.1.1 The Logic of the Test
6.1.2 The Unrestricted and Restricted Models
6.1.3 Test Template
6.2 Testing the Significance of the Model
6.2.1 Null and Alternative Hypotheses
6.2.2 Test Template
6.2.3 Excel Regression Output
6.3 The Relationship between t- and F-Tests
6.4 Testing Some Economic Hypotheses
6.4.1 The Optimal Level of Advertising
6.4.2 The Optimal Level of Advertising and Price
6.5 The Use of Nonsample Information
6.6 Model Specification
6.6.1 Omitted Variables
6.6.2 Irrelevant Variables
6.6.3 The RESET Test
6.7 Poor Data, Collinearity and Insignificance
6.7.1 Correlation Matrix
6.7.2 The Car Mileage Model Example
CHAPTER 7 Using Indicator Variables
7.1 Indicator Variables: The University Effect on House Prices Example
7.2 Applying Indicator Variables
7.2.1 Interactions Between Qualitative Factors
7.2.2 Qualitative Factors with Several Categories
7.2.3 Testing the Equivalence of Two Regressions
7.3 Log-Linear Models: a Wage Equation Example
7.4 The Linear Probability Model: A Marketing Example
7.5 The Difference Estimator: The Project STAR Example
7.6 The Differences-in-Differences Estimator: The Effect of Minimum Wage Change Example

CHAPTER 8 Heteroskedasticity
8.1 The Nature of Heteroskedasticity
8.2 Detecting Heteroskedasticity
8.2.1 Residual Plots
8.2.2 Lagrange Multiplier Tests
8.2.2a Using the Lagrange Multiplier or Breusch-Pagan Test
8.2.2b Using the White Test
8.2.3 The Goldfeld-Quandt Test
8.2.3a The Logic of the Test
8.2.3b Test Template
8.2.3c Wage Equation Example
8.2.3d Food Expenditure Example
8.3 Heteroskedasticity-Consistent Standard Errors or the White Standard Errors
8.4 Generalized Least Squares: Known Form of Variance
8.4.1 Variance Proportional to x: Food Expenditure Example
8.4.2 Grouped Data: Wage Equation Example
8.4.2a Separate Wage Equations for Metropolitan and Rural Areas
8.4.2b GLS Wage Equation
8.5 Generalized Least Squares: Unknown Form of Variance

CHAPTER 9 Regressions with Time Series Data: Stationary Variables
9.1 Finite Distributed Lags
9.1.1 US Economic Time Series
9.1.2 An Example: The Okun's Law
9.2 Serial Correlation
9.2.1 Serial Correlation in Output Growth
9.2.1a Scatter Diagram for Gt and Gt-1
9.2.1b Correlogram for G
9.2.2 Serially Correlated Errors
9.2.2a Australian Economic Time Series
9.2.2b A Phillips Curve
9.2.2c Correlogram for Residuals
9.3 Lagrange Multiplier Tests for Serially Correlated Errors
9.3.1 t-Test Version
9.3.2 T × R² Version
9.4 Estimation with Serially Correlated Errors
9.4.1 Generalized Least Squares Estimation of an AR(1) Error Model
9.4.1a The Prais-Winsten Estimator
9.4.1b The Cochrane-Orcutt Estimator
9.4.2 Autoregressive Distributed Lag (ARDL) Model
9.5 Forecasting
9.5.1 Using an Autoregressive (AR) Model
9.5.2 Using an Exponential Smoothing Model
9.6 Multiplier Analysis

CHAPTER 10 Random Regressors and Moment-Based Estimation
10.1 OLS Estimation of a Wage Equation
10.2 Instrumental Variables Estimation of the Wage Equation
10.2.1 With a Single Instrument
10.2.1a First Stage Equation for EDUC
10.2.1b Stage 2 Least Squares Estimates
10.2.2 With a Surplus Instrument
10.2.2a First Stage Equation for EDUC
10.2.2b Stage 2 Least Squares Estimates
10.3 Specification Tests for the Wage Equation
10.3.1 The Hausman Test
10.3.2 Testing Surplus Moment Conditions

11.1.2a 2SLS Estimates for Truffle Demand
11.1.2b 2SLS Estimates for Truffle Supply
11.2 Supply and Demand Model for the Fulton Fish Market
11.2.1 The Reduced Form Equations
11.2.1a Reduced Form Equation for lnQ
11.2.1b Reduced Form Equation for lnP
11.2.2 The Structural Equations or Stage 2 Least Squares Estimates
11.2.2a 2SLS Estimates for Fulton Fish Demand

CHAPTER 12 Nonstationary Time-Series Data and Cointegration
12.1 Stationary and Nonstationary Variables
12.1.1 US Economic Time Series
12.1.2 Simulated Data
12.2 Spurious Regressions
12.3 Unit Root Tests for Stationarity
12.4 Cointegration
CHAPTER 14 Time-Varying Volatility and ARCH Models
14.1 Time-Varying Volatility
14.1.1 Returns Data
14.1.2 Simulated Data
14.2 Testing and Forecasting
14.2.1 Testing for ARCH Effects
14.2.1a Time Series and Histogram
14.2.1b Lagrange Multiplier Test
14.2.2 Forecasting Volatility
14.3 Extensions
14.3.1 The GARCH Model
14.3.2 The T-GARCH Model
14.3.3 The GARCH-In-Mean Model

CHAPTER 15 Panel Data Models
15.1 Pooled Least Squares Estimates of Wage Equation
15.2 The Fixed Effects Model
15.2.1 Estimates of Wage Equation for Small N
15.2.1a The Least Squares Dummy Variable Estimator for Small
15.4.3 Estimation: Different Coefficients, Different Error Variances
15.4.4 Seemingly Unrelated Regressions: Testing for Contemporaneous Correlation

CHAPTER 16 Qualitative and Limited Dependent Variable Models
16.1 Least Squares Fitted Linear Probability Model
16.2 Limited Dependent Variables
16.2.1 Censored Data
16.2.2 Simulated Data

APPENDIX A Mathematical Tools
A.1 Mathematical Operations
A.1.1 Exponents
A.1.2 Scientific Notation
A.1.3 Logarithm and the Number e
A.2 Percentages

APPENDIX B Review of Probability Concepts
B.1 Binomial Probabilities
B.3 Distributions Related to the Normal
B.3.1 The Chi-Square Distribution
B.3.2 The t-Distribution
B.3.3 The F-Distribution
unknown
C.4.2 Interval Estimation with the Hip Data
C.5 Hypothesis Tests About a Population Mean
C.5.1 An Example
C.5.2 The p-value
C.5.3 A Template for Hypothesis Tests
C.6 Other Useful Tests
C.6.1 Simulating Data
C.6.2 Testing a Population Variance
C.6.3 Testing Two Population Means
C.6.4 Testing Two Population Variances
C.7 Testing Population Normality
C.7.1 A Histogram
C.7.2 The Jarque-Bera Test
Index
CHAPTER 1
Introduction to Excel
CHAPTER OUTLINE
1.1 Starting Excel
1.2 Entering Data
1.3 Using Excel for Calculations
1.3.1 Arithmetic Operations
1.3.2 Mathematical Functions
1.4 Editing your Data
1.5 Saving and Printing your Data
1.6 Importing Data into Excel
1.6.1 Resources for Economists on the Internet
1.6.2 Data Files for Principles of Econometrics
1.6.2a John Wiley & Sons Website
1.6.2b Principles of Econometrics Website
1.6.3 Importing ASCII Files
1.1 Starting Excel
Find the Excel shortcut on your desktop and double-click it (with the left mouse button) to start Excel.
Alternatively, left-click the Start menu at the bottom left corner of your computer screen.
Slide your mouse over All programs, Microsoft Office, and finally Microsoft Office Excel
2007. Left-click on this last one to start Excel. Better yet, if you would like to create a
shortcut, right-click on it, slide your mouse over Send to, and then select (i.e., drag your mouse
over and left-click on) Desktop (create shortcut). An Excel 2007 shortcut is created on your
desktop. If you right-click on your shortcut and select Rename, you can also type in a shorter
name like Excel.
Excel opens to a new file, titled Book1. You can find the name of the open file at the very top of
the Excel window, on the Title bar. An Excel file like Book1 contains several sheets. By default,
Excel opens to Sheet1 of Book1. You can figure out which sheet is open by looking at the Sheet
tabs found in the lower left corner of your Excel window.
[Screenshot: the Excel 2007 window, with callouts marking the cell reference, the Styles group, and the groups of commands.]
There are lots of little bits that you will become more familiar with as we go along. The Active
cell is surrounded by a border and is in Column A and Row 1; its Cell reference is A1.
Below the title bar is a Tab list. The Home tab is the one Excel opens to. Under each tab you
will find groups of commands. Under the Home tab, the first one is the Clipboard group of
commands, named after the tasks it relates to. The wide bar including the tab list and the groups
of commands is referred to as the Ribbon. The content of the Active cell shows up in the
Formula bar (right now, there is nothing in it). Perhaps most important of all is to
locate the Help button in the upper right corner of the Excel window. Finally, you can use the
Scroll bars and the arrows around them to navigate up-down and right-left in your worksheet.
And you have a long way to go: each worksheet in Microsoft Excel 2007 contains 1,048,576
rows and 16,384 columns!
Note that your Ribbon might look slightly different from the one shown above. If your screen is
bigger, Excel will automatically display more of its available options. For example, in the Styles
group of commands, instead of the Cell styles button, you might have a colorful display of cell
styles.
1.2 Entering Data
We will use Excel to analyze data. To enter labels and data into an Excel worksheet, move the
cursor to a cell and type. First type X in cell A1. Press the Enter key on your keyboard to get to
cell A2, or navigate by moving the cursor with the mouse or with the Arrow keys (to move right,
left, up or down). Fill in the rest as shown below:
     A
1    X
2    1
3    2
4    3
5    4
6    5
1.3 Using Excel for Calculations
What is Excel good for? Its primary usefulness is to carry out repeated calculations. We can add,
subtract, multiply and divide; and we can apply mathematical and statistical functions to the data
in our worksheet. To illustrate, we are going to compute the squares of the numbers we just
entered and then add them up. There are two main ways to perform calculations in Excel. One is
to write formulas using arithmetic operators; the other is to write formulas using mathematical
functions.
1.3.1 Arithmetic Operations
Select the Excel Help button in the upper right corner of your screen. In the window of the Excel
Help dialog box that pops up, type arithmetic operators and select Search. In the list of results,
select Calculation operators and precedence.
Standard arithmetic operators are defined as shown below. To close the Excel help dialog box,
select the X button found in its upper right corner.
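Operator            Meaning          Example
+ (plus sign)       Addition         3+3
- (minus sign)      Subtraction      3-1
                    Negation         -1
* (asterisk)        Multiplication   3*3
/ (forward slash)   Division         3/3
% (percent sign)    Percent          20%
^ (caret)           Exponentiation   3^2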
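For readers who also work in a scripting language, Excel's arithmetic operators can be mirrored in a short Python sketch. Python is not part of this manual and is used here only as an illustration. One difference is worth knowing: Excel applies negation before exponentiation, so the formula =-2^2 returns 4, whereas Python applies exponentiation first.

```python
# Python counterparts of Excel's arithmetic operators (illustration only).
addition = 3 + 3          # Excel: =3+3
subtraction = 3 - 1       # Excel: =3-1
multiplication = 3 * 3    # Excel: =3*3
division = 3 / 3          # Excel: =3/3
percent = 20 / 100        # Excel: =20%
exponentiation = 3 ** 2   # Excel: =3^2 (Python writes ** instead of ^)

# Negation combined with exponentiation is evaluated in a different order.
excel_style = (-2) ** 2   # what Excel computes for =-2^2
python_style = -2 ** 2    # what Python computes when no parentheses are used

print(addition, subtraction, multiplication, division, percent, exponentiation)
print(excel_style, python_style)
```

When in doubt about precedence in either tool, adding parentheses makes the intended order explicit.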
Place your cursor in cell B1, and type X-squared. In cells B2 through B6 below (henceforth
referred to as B2:B6), we are going to compute the squares of the corresponding values from cells
A2:A6. Let us emphasize that the trick to using Excel efficiently is NOT to re-type values already
stored in the worksheet, but instead to use references to the cells where the values are stored. So, to
compute the square of 1, which is the value stored in cell A2, instead of using the formula =1*1,
you should use the formula =A2*A2 or =A2^2. Place your cursor in cell B2 and type the formula.
Then press Enter. Note that: (1) a formula always starts with an equal sign; this is how Excel
recognizes it is a formula, and (2) formulas are not case sensitive, so you could also have typed
=a2^2 instead. Now, we want to copy this formula to cells B3:B6. To do that, place your cursor
back into cell B2, and move it to the south-east corner of the cell, until the fat cross turns into a
skinny one.
Left-click, hold it, drag it down to the next four cells below, and release!
Excel has copied the formula you typed in cell B2 into the cells below. The way Excel
understands the instructions you gave in cell B2 is "square the value found at the address A2".
Now, it is important to understand how Excel interprets "address A2". To Excel, "address A2"
means "from where you are, go left by one cell", because this is where A2 is located relative to
B2. In other words, an address gives directions (left-right, up-down) and distances (number of
cells away), all in reference to the cell where the formula is entered. So, when we copied the
formula we entered in cell B2, which instructed Excel to collect the value stored one cell
to its left and then square it, those exact same instructions were given in cells B3:B6. If you
place your cursor back into B3, and look at the Formula bar, you can see that, in this cell, these
same instructions translate into "=A3^2".
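The idea that a copied formula keeps the same directions and distances can be sketched in a few lines of Python. This is only an illustration of the principle, not how Excel is implemented: the function names are hypothetical, and the sketch assumes plain references such as A2 (it does not handle references containing $, or function names that end in digits).

```python
import re

def col_to_num(col):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Convert a column number back to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def shift_formula(formula, d_rows, d_cols):
    """Rewrite every cell reference as if the formula had been copied
    d_rows rows down and d_cols columns to the right."""
    def shift(match):
        col, row = match.group(1), int(match.group(2))
        return num_to_col(col_to_num(col) + d_cols) + str(row + d_rows)
    # Formulas are not case sensitive, so normalize to upper case first.
    return re.sub(r"([A-Z]+)([0-9]+)", shift, formula.upper())

# Copying the formula typed in B2 down to B3:B6.
for k in range(5):
    print(f"B{2 + k}: {shift_formula('=a2^2', k, 0)}")
```

Copying one row down turns =A2^2 into =A3^2, which matches what the Formula bar shows for cell B3.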
1.3.2 Mathematical Functions
There are a large number of mathematical functions. Again, the list of functions available in
Excel can be found by calling upon our good friend the Help button and typing Mathematical
functions. If you try it, you will see that the list is long. We will not copy it here.
We have computed the squares of the numbers we had. Now we will add them up: the numbers
and the squares of the numbers, separately. For that, we will be using the SUM function.
We first need to select or highlight all the numbers from our table. There are several ways to
highlight cells. For this small area the easiest way is to place your cursor in A2, hold down the
left mouse button and drag it across the area you wish to highlight, i.e., all the way to cell B6.
Here is how your worksheet should look:
     A    B
1    X    X-squared
2    1    1
3    2    4
4    3    9
5    4    16
6    5    25
Next, go to the Editing group of commands, which is found at the extreme right of the Home tab,
and select Σ AutoSum.
Excel sums the numbers from each column and places the sum in the bottom cell of each column.
The result is:
     A    B
1    X    X-squared
2    1    1
3    2    4
4    3    9
5    4    16
6    5    25
7    15   55
Notice that if you select the arrow found to the right of Σ AutoSum you can find a list of
additional calculations that Excel can automatically perform for you.
Alternatively, you could have placed your cursor in cell A7, typed =SUM(A2:A6), and pressed
the Enter key (and then copied this formula to cell B7).
Note that: (1) as soon as you type the first letter of your function, a list of all the other available
functions that start with the same letter pops up. This can be very useful: if you left-click on any
of them, Excel gives you its definition; if you double left-click on any of them, it automatically
finishes typing the function name for you, and (2) once the function name and the opening
parenthesis are typed, Excel reminds you of what the needed Arguments are, i.e., what else you
need to specify in your function to use it properly.
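As a cross-check on the worksheet results, the same two columns and their totals can be reproduced in a few lines of Python. This is an illustrative sketch outside Excel; the variable names are hypothetical.

```python
# Column A (cells A2:A6) and column B (cells B2:B6) of the worksheet.
x = [1, 2, 3, 4, 5]
x_squared = [value ** 2 for value in x]   # same as =A2^2 copied down to B6

# Equivalent of =SUM(A2:A6) in cell A7 and =SUM(B2:B6) in cell B7.
sum_x = sum(x)
sum_x_squared = sum(x_squared)

print("X-squared:", x_squared)
print("Sums:", sum_x, sum_x_squared)
```

The two totals should agree with row 7 of the worksheet.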
Now, you could also have used the Insert function button, which you can find on the left side of
the Formula bar.
Once your cursor is placed in A7, select the Insert function button. An Insert function dialog
box pops up. You can Select a function you need (highlight it, and select OK), or Search for a
function first (follow the instructions given in that window).
In the Function Arguments dialog box that pops up, you need to specify the cell references of
the values you want to add. If they are not already properly specified, you can type A2:A6 in the
Number1 window, or place your cursor in the window, delete whatever is in it, and then select
A2:A6. Select OK. Now that you have the formula in A7, copy it into B7.
1.4 Editing your Data
Before wrapping up, you want to polish the presentation of your data. It actually has less to do
with appearance than with organization and communication. You want to make sure that anyone
can easily make sense of your table (your instructor, for example, or yourself for that
matter, when you come back to it after you let it sit for a while).
We are going to add labels and color/shade to our table. Hold your cursor over the column A header until it turns
into a down arrow; left-click to select the whole column; and select Insert in the Cells group of
commands, found to the left of the Editing group of commands.
Excel adds a new column to the left of the one you selected. That's where we are going to write
our labels. In the new A1 cell, type Variables; in cell A2, type Values; in cell A7 type Sum.
     A           B    C
1    Variables   X    X-squared
2    Values      1    1
3                2    4
4                3    9
5                4    16
6                5    25
7    Sum         15   55
Select column A again, make it Bold (Font group of commands, to the right of the Clipboard one), and
align it Left (Alignment group of commands, to the right of the Font one).
Select cells B1 and C1, and make them Bold. Repeat with cells B7 and C7. Better, but not there
yet. Select row 7, make it Italic (next to Bold). Select column B, hold your left-click and drag
your mouse over the column C header to select column C too; select Center alignment (next to Left). Next,
select A2:A6; left-click the arrow next to Merge & Center (in the Alignment group of
commands), and select Merge cells.
Immediately after, select Middle Align, which is found right above the Center alignment button.
Select A1:C7, left-click the arrow next to the Bottom Border button and select All Borders.
Select A7:C7 (A7:C7, not A1:C7 this time), left-click the arrow next to the Fill Color button,
and select a grey color to fill the cells with. Choose a different color for A1:C1.
Finally, put your cursor on the boundary between the column C and column D headers until it turns into a
left-and-right arrow.
Hold it there and double left-click so that the width of column C gets resized to better
accommodate the length of the label "X-squared". The result is:
     A           B    C
1    Variables   X    X-squared
2    Values      1    1
3                2    4
4                3    9
5                4    16
6                5    25
7    Sum         15   55
Next, move your cursor over the Sheet1 tab, right-click, select Rename and type in a descriptive
name for your worksheet like Excel for POE 1.2-1.4, for Using Excel for Principles of
Econometrics, 4e, sections 1.2 through 1.4. Press the Enter key on your keyboard or left-click
anywhere on your worksheet.
1.5 Saving and Printing your Data
All you need to do now is to save your Excel file. Select the Save button in the upper left corner
of the Excel window.
A Save As dialog box pops up. Locate the folder you want to save your file in by using the
arrow-down located at the extreme right of the Save in window or browsing through the list of
folders displayed below it.
In the File name window, at the bottom of the Save As dialog box, the generic name Book1
should be outlined. Type the descriptive name you would like to give to your Excel file, like POE
Chapter 1. Finally, select Save.
If you need to create a new folder, use the Create New Folder button found to the right of the
Save in window.
A New Folder dialog box pops up; it is prompting you for the name you want to give to your new
folder, Excel for POE for example. Type it in the Name window and select OK. Finally, select
Save.
If you would like to print your table, select the Office Button, next to the Save button; go to
Print, and select one of the print options.
Print: Select a printer, number of copies, and other printing options before printing.
Quick Print: Send the workbook directly to the default printer without making changes.
For more print options, you might want to check out the Page Layout tab, on the upper left of
your screen, as well as the Page Layout button on the bottom right of your screen.
To close your file, select the X button in the upper right corner of your screen.
1.6 Importing Data into Excel
In the next section, we show you how to import data into an Excel spreadsheet. Getting data for
economic research is much easier today than it was years ago. Before the Internet, hours would be
spent in libraries, looking for and copying data by hand. Now we have access to rich data sources
which are a few clicks away.
First we will illustrate how convenient sites that make data available in Excel format can be. Then
we illustrate how to import ASCII, or text, files into Excel.
1.6.1 Resources for Economists on the Internet
Suppose you are interested in analyzing the GDP of the United States. The website Resources for
Economists contains a wide variety of data, and in particular the macro data we seek. Websites
are continually updated and improved. We guide you through an example, but be prepared for
differences from what we show here.
[Screenshot: the Resources for Economists home page (May 2010), listing sections including Introduction; Data; Dictionaries, Glossaries & Encyclopedias; Economists, Departments, & Universities; Forecasting & Consulting; and Jobs, Grants, Grad School, & Advice.]
Select the Data link and then select U.S. Macro and Regional Data.
This will open up a range of sub-data categories. For the example discussed here, select the
Bureau of Economic Analysis (BEA).
The result shows the point we are making. Many government and other web sites make data
available in Excel format. Select Current-dollar and "real" GDP.
You have the option of saving the resulting Excel file to your computer or storage device, or
opening it right away, which we proceed to do next.
What opens is a workbook with headers explaining the variables it contains. We see that there is
a series of annual data and a quarterly series.
[Screenshot: worksheet titled Current-Dollar and "Real" Gross Domestic Product, showing an annual series beginning in 1929 and a quarterly series beginning in 1947q1.]
The opened file is "Read Only" so you must save it under another name to work with it, graph,
run regressions and so on.
1.6.2 Data Files for Principles of Econometrics
The book Principles of Econometrics, 4e, uses many examples with data. These data files have
been saved as workbooks and are available for you to download to your computer. There are
about 150 such files. The data files and other supplementary materials can be downloaded from
two web locations: the publisher website or the book website maintained by the authors.
1.6.2a John Wiley & Sons Website
Using your web browser, enter the address www.wiley.com/college/hill. Find, among the authors
named "Hill", the book Principles of Econometrics, 4e.
TEXTBOOK
Principles of Econometrics, 4th Edition
R. Carter Hill (Louisiana State University), William E. Griffiths (University of Melbourne, Australia), Guay C. Lim (University of Melbourne, Australia)
January 2011, ©2012
Follow the link to Resources for Students, and then Student Companion Site. There, you will
find links to supplementary materials, including a link to Data Files that will allow you to download
all the data definition files and data files at once.
1.6.2b Principles of Econometrics Website
The address for the book website is www.principlesofeconometrics.com. There, you will find
links to the Data definitions files, Excel spreadsheets, as well as an Errata list. You can download
the data definition files and the Excel files all at once or select individual files. The data definition
files contain variable names, variable definitions, and summary statistics. The Excel spreadsheets
contain data only; those files were created using Excel 2003.
Instructor Resources from John Wiley & Sons: Data files, PowerPoint Slides, Instructor's Manual
Student Resources from John Wiley & Sons: Data files and Using Excel for Principles of Econometrics
Data files: POE includes 148 data files in various formats. Using the links below you can download all files in a ZIP format,
or download individual files. The data definition files should be downloaded by all users.
Data definition files (*.def) are text files containing variable names, definitions and summary statistics.
ASCII data files (*.dat) are text files containing only data. Variable names are in *.def files.
Download all the *.dat files in (a) ZIP format or (b) a self-extracting EXE file (download and double-click)
Right-click on the file name. Select Save Target As. A Save As dialog box pops up. Locate the
folder you want to save your file in by using the arrow-down located at the extreme right of the
Save in window or browsing through the list of folders displayed below it. Finally, select Save.
Once the download of the file is completed, a Download complete window pops up. Choose
Close.
Start Excel. Select the Office Button on the upper left corner of the Excel window, then Open.
Navigate to the location of the data file. Make sure you have selected All Files in the Files of
Type window. Select your food.dat file and then select Open.
[Screenshot: Text Import Wizard, Step 1 of 3. Under Original data type, the choices are Delimited (characters such as commas or tabs separate each field) and Fixed width (fields are aligned in columns with spaces between each field). The preview of file C:\data\econ4630\food.dat shows the first five rows of data.]
In the next step the data are previewed. By clicking on the vertical black line you could adjust the
column width, but there is no need most of the time. For neatly arrayed data like ours, Excel can
determine where the columns end and begin. Select Next again.
[Screenshot: Text Import Wizard, Step 2 of 3, showing the Data preview with the column break line and the Cancel, Back, Next, and Finish buttons.]
In the third and final step Excel permits you to format each column, or in fact to skip a column. In
our case you can simply select Finish.
[Screenshot: Text Import Wizard, Step 3 of 3. The Column data format options are General, Text, Date, and Do not import column (skip). General converts numeric values to numbers, date values to dates, and all remaining values to text.]
This step concludes the process and now the data is in a worksheet named food.
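For readers who want to cross-check the import outside Excel, the same space-aligned layout can be parsed in a few lines of Python. This is only a sketch under stated assumptions: the function name parse_food_data is hypothetical, the sample text holds just the first five rows of food.dat shown in this chapter, and the column order (food expenditure first, income second) is taken from the worksheet.

```python
def parse_food_data(text):
    """Split space-aligned text into (food_exp, income) pairs of floats."""
    rows = []
    for line in text.splitlines():
        fields = line.split()          # any run of spaces separates the columns
        if not fields:
            continue                   # skip blank lines
        food_exp, income = (float(value) for value in fields)
        rows.append((food_exp, income))
    return rows


# First five rows of food.dat as shown in this chapter (illustrative sample only).
SAMPLE = """\
  115.22   3.69
  135.98   4.39
  119.34   4.75
  114.96   6.03
  187.05  12.47
"""

rows = parse_food_data(SAMPLE)
print(len(rows), rows[0], rows[-1])
```

Splitting on runs of spaces plays the role of the wizard's column break lines; it works here because no field contains embedded spaces.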
       A         B
1   115.22     3.69
2   135.98     4.39
3   119.34     4.75
4   114.96     6.03
5   187.05    12.47
(worksheet tab: food)
Next, you need to save your food data in an Excel File format. To do that, select the Office
Button, Save As, and finally Excel Workbook.
A Save As dialog box pops up. Locate the folder you want to save your file in by using the
arrow-down located at the extreme right of the Save in window or browsing through the list of
folders displayed below it.
Excel has automatically given a File name, food.xlsx, and specified the file format in the Save as
type window, Excel Workbook (*.xlsx). All you need to do is select Save.
This completes our introductory chapter. The rest of this manual is designed to supplement your
readings of Principles of Econometrics, 4e. We will walk you through the analysis of examples
found in the text, using Excel 2007. We would like to be able to replicate most of the plots of data
and tables of results found in your text.
CHAPTER 2
The Simple Linear Regression Model
CHAPTER OUTLINE
2.1 Plotting the Food Expenditure Data
2.1.1 Using Chart Tools
2.1.2 Editing the Graph
2.1.2a Editing the Vertical Axis
2.1.2b Axis Titles
2.1.2c Gridlines and Markers
2.1.2d Moving the Chart
2.2 Estimating a Simple Regression
2.2.1 Using Least Squares Estimators' Formulas
2.2.2 Using Excel Regression Analysis Routine
2.3 Plotting a Simple Regression
2.3.1 Using Two Points
2.3.2 Using Excel Built-in Feature
2.3.3 Using a Regression Option
2.3.4 Editing the Chart
2.4 Expected Values of b1 and b2
2.4.1 Model Assumptions
2.4.2 Random Number Generation
2.4.3 The LINEST Function
2.4.4 Repeated Sampling
2.5 Variance and Covariance of b1 and b2
2.6 Nonlinear Relationships
2.6.1 A Quadratic Model
2.6.1a Estimating the Model
2.6.1b Scatter Plot of Data with Fitted Quadratic Relationship
2.6.2 A Log-Linear Model
2.6.2a Histograms of PRICE and ln(PRICE)
2.6.2b Estimating the Model
2.6.2c Scatter Plot of Data with Fitted Log-Linear Relationship
2.7 Regression with Indicator Variables
2.7.1 Histograms of House Prices
2.7.2 Estimating the Model
In this chapter we estimate a simple linear regression model of weekly food expenditure. We also
illustrate the concept of unbiased estimation. In the first section, we start by plotting the food
expenditure data.
Compare the values you have in your worksheet to the ones found in Table 2.1, p. 49 of
Principles of Econometrics, 4e. The second part of Table 2.1 shows summary statistics. You can
compute and check those by using the Excel mathematical functions introduced in Chapter 1, if
you would like.
Select the Insert tab located next to the Home tab. Select A2:B41. In the Charts group of
commands select Scatter, and then Scatter with only Markers.
[Chart: the default scatter plot of Series1.]
Each point on this Scatter chart illustrates one household for which we have recorded a pair of
values: weekly food expenditure and weekly income. This is very important. We chose Scatter
chart because we wanted to keep track of those pairs of values. For example, the point
highlighted below illustrates the pair of values (187.05, 12.47) found in row 6 of your table.
[Chart: the same scatter plot with one point highlighted; the pop-up label reads Series1 Point "187.050003" (187.050003, 12.47).]
When we select two columns of values to plot on a Scatter chart, Excel, by default, represents
values from the first column on the horizontal axis and values from the second column on the
vertical axis. So, in this case, the expenditure values are illustrated on the horizontal axis and
income values on the vertical axis. Indeed, you can see that the scale of the values on the
horizontal axis corresponds to that of the food expenditure values in column A, and the scale
of the values on the vertical axis corresponds to that of the income values in column B.
We actually would like to illustrate the food expenditure values on the vertical axis and the
income values on the horizontal axis-opposite of what it is now. By convention, across
disciplines, the variable we monitor the level of (the dependent variable) is illustrated on the
vertical axis (Y-variable ). And by convention, across disciplines, the variable that we think might
explain the level of the dependent variable is illustrated on the horizontal axis (X-variable).
In our case, we think that the variation of levels of income across households might explain the
variation of levels of food expenditure across those same households. That is why we would like
to illustrate the food expenditure values on the vertical axis and the income values on the
horizontal axis.
X= Income
If you look up on your screen, to the right end of your tab list, you should notice that Chart Tools
are now displayed, adding the Design, Layout, and Format tabs to the list. The Design tab is
open. (If, at any time, the Chart Tools and its tabs seem to disappear, all you need to do is to put
your cursor anywhere in your Chart area, left-click, and they will be made available again.)
Go to the Data group of commands, to the left, and select the Select Data button.
[Screenshot: Select Data Source dialog box, showing the Chart data range, the Switch Row/Column button, the Legend Entries (Series) list with its Add, Edit, and Remove buttons, and the Horizontal (Category) Axis Labels list.]
In the Edit Series dialog box, highlight the text from the Series X values window. Press the
Delete key on your keyboard. Select B2:B41. Highlight and delete the text from the Series Y
values window. Select A2:A41. Select OK.
[Screenshot: Edit Series dialog box before and after the change; after the change, Series X values refers to $B$2:$B$41 and Series Y values to $A$2:$A$41.]
The Select Data Source dialog box reappears. Select OK again. You have just told Excel that
the income values are the X-values and the food expenditure values are the Y-values, not the other
way around.
[Chart: scatter plot with food expenditure (0 to 600) on the vertical axis and income (0 to 40) on the horizontal axis.]
Now, we would like to do some editing. We do not need a Legend, since we have only one data
series. Our expenditure values do not go over 600, so we can restrict our vertical axis scale to
that. We definitely would like to label our axes. We might want to get rid of our Gridlines, and
change the Format of our data series. Finally, we would like to move our chart to a new
worksheet.
Select the Layout tab. On the Labels group of commands, select Legend and None to delete the
legend.
Select the Axes button on the Axes group of commands. Go to Primary Vertical Axis, and select
More Primary Vertical Axis Options.
A Format Axis dialog box pops up. Change the Maximum value illustrated on the axis from
Auto to Fixed, and specify 600.
Next select Alignment, and use the arrow-down in the Text direction window to select Rotate
all text 270°.
Place your cursor on the upper blue border of your Format Axis dialog box.
Left-click, hold it, and drag the box over so you can see your chart; release. Look at the vertical
axis of your chart.
The numbers are now displayed vertically instead of horizontally, but fewer of them are displayed
as well.
Select Axis Options again. Change Major unit from Auto to Fixed, and specify 100. Select
Close.
Go back to the Labels group of commands; select Axis Titles, go to Primary Horizontal Axis
Title, and select Title Below Axis.
Select the generic Axis Title at the bottom of your chart and type in x = weekly income in $100.
Go back to Axis Titles, then to Primary Vertical Axis Title this time. Select Rotated Title.
Select the generic Axis Title on the left of your chart and press Delete, or put your cursor on top
of the Axis Title box, left-click, and press the Backspace key to delete the generic Axis Title.
Type in y = weekly food expenditure in $.
Now go back to the Axes group of commands. Select Gridlines. Go to Primary Horizontal
Gridlines, and select None.
Change the Current Selection (group of commands to the far left) to Series 1 (use the arrow
down button to the right of the window to make that selection). Select Format Selection.
A Format Data Series dialog box pops up. Select Marker Options. Change the Marker Type
from Automatic to Built-in. Change the Type and the Size as shown below:
[Screenshot: Marker Options pane with Marker Type set to Built-in, followed by the Type and Size boxes.]
Next, select Marker Fill. Change it from Automatic to Solid fill. Color options pop up. Change
the Color to black. Select Marker Line Color, and change it from Automatic to No line. Select
Close.
The result is a replica of Figure 2.6 p. 50 in Principles of Econometrics, 4e: (if it looks like some
of your dots are little flowers, left-click your cursor anywhere on your screen first)
[Figure 2.6: scatter plot with y = weekly food expenditure in $ (0 to 600) on the vertical axis and x = weekly income in $100 (0 to 40) on the horizontal axis.]
2.1.2d Moving the Chart
Go back to the Design tab. (Remember if you don't see your Chart Tools tabs, what you need to
do is place your cursor in your chart area and left-click). Select the Move Chart button on the
Location group of commands to the far right of your screen.
A Move Chart dialog box pops up. Select New sheet and give it a name like Figure 2.6. Select
OK.
Rename Sheet 1 as Data (if needed, see Section 1.4 of this manual on how to do that).
We have plotted our data, and edited our chart. Next, we want to estimate the regression line that
best fits the data, and add this line to the chart.
In this section, we are going to use two different methods to obtain the least squares estimates of
the intercept and slope parameters $\beta_1$ and $\beta_2$. Method 1 consists of plugging values into the
b1 and b2 least squares estimators' formulas. Method 2 consists of making use of Excel's built-in
regression analysis routine.
$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad (2.1)$$
$$b_1 = \bar{y} - b_2 \bar{x} \qquad (2.2)$$
These formulas are telling us two things: (1) which values we need, and (2) how we need to
combine them to compute b1 and b2.
We need the $(x_i, y_i)$ pairs of values; they appear explicitly in equation (2.1). We also need $\bar{x}$
and $\bar{y}$, which are the sample means, or simple arithmetic averages, of the $x_i$ values and $y_i$
values; those averages appear in both equation (2.1) and equation (2.2). Note that the subscript i
in $x_i$ and $y_i$ keeps count of the x and y values. In other words, i denotes the ith value or ith pair
of values. Also, $\bar{x}$ and $\bar{y}$ are referred to as "x-bar" and "y-bar".
The numerator of equation (2.1) is a sum of products; $\sum$ is the Greek capital letter "sigma", which denotes sum.
The first term of each product is the deviation of an x value from its mean, $(x_i - \bar{x})$. The second
term of each product is the deviation of the corresponding y value from its mean, $(y_i - \bar{y})$. The
products are computed for each $(x_i, y_i)$ pair of values before they are added together.
The denominator is the sum of the squared deviations from the mean, for the x values only. In
other words, each x value's deviation from its mean is first squared, and then all those squared
deviations are summed.
Equation (2.2) tells us to multiply b2 by $\bar{x}$, and then subtract this product from $\bar{y}$. Note that b2
must be computed first, before b1 can be computed.
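The arithmetic described by equations (2.1) and (2.2) can be sketched in a few lines of Python. This is an illustration only, not part of the Excel workflow: the function name least_squares is hypothetical, and the inputs are just the first five households of the food data, so the resulting estimates will differ from the full-sample estimates computed in the worksheet.

```python
def least_squares(x, y):
    """Return (b1, b2) using the sums-of-deviations formulas (2.1) and (2.2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator of (2.1): sum of products of x and y deviations from their means.
    sum_xy_dev = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator of (2.1): sum of squared x deviations.
    sum_x_dev_sq = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum_xy_dev / sum_x_dev_sq      # slope, computed first
    b1 = y_bar - b2 * x_bar             # intercept, equation (2.2)
    return b1, b2


# First five observations only (illustrative subset of the 40 households).
income = [3.69, 4.39, 4.75, 6.03, 12.47]
food_exp = [115.22, 135.98, 119.34, 114.96, 187.05]

b1, b2 = least_squares(income, food_exp)
print(b1, b2)
```

The order of the last two assignments mirrors the remark above: the slope has to exist before the intercept can be computed.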
There is actually no magic to this. We use the food expenditure and income values we have
collected from our random sample of 40 households, and perform simple arithmetic operations to
compute the estimates of the intercept and slope coefficient of our regression line.
As for the computation of b1 and b2 itself, there is only one trick. We need to make sure we
know which values are the x's and which ones are the y's. So, we are going to start by adding
labels to our columns of data.
You should be in your Data worksheet. If not, you can go back to it by selecting its tab on the
bottom of your screen.
Select row 2 and insert a new row (see Section 1.4 of this manual if you need help on that). In the
new cell A2, type y; and in the new cell B2, type x. Right-align A1:B2.
          A         B
1  food_exp    income
2         y         x
Next, we need to lay out the frame of the table where we are going to store our intermediate and
final computations. Type x_bar = in cell D2, y_bar = in cell D3, b2 = in cell D6, and b1 = in cell
D7. In cells G2:J2, type x_deviation, y_deviation, (x_dev)(y_dev), and (x_deviation)2,
respectively. (Note that you can use your Tab key, instead of moving your cursor or using the
Arrow key, to move to the next cell to your right.)
[Worksheet: the labels x_bar =, y_bar =, b2 =, and b1 = in cells D2, D3, D6, and D7, and the labels x_deviation, y_deviation, (x_dev)(y_dev), and (x_deviation)2 in cells G2:J2.]
Below x_deviation we are going to compute and store the deviations of the x values from their
mean. Below y_deviation, we are going to compute and store the deviations of the y values from
their mean. Below (x_dev)(y_dev), we are going to compute and store the products of the x
deviation and the y deviation for each pair of values. Finally, below (x_deviation)2 we are going
to compute and store the x deviations squared.
To show the 2 of (x_deviation)2 as a square, place your cursor in J2, if it is not already in it.
Move to the Formula bar to select the 2, and select the arrow in the right corner of the Font
group of commands.
A Format Cells dialog box pops up. Select Superscript and then OK.
In cells D6 and D7 proceed to format the 2 and 1 of b2 and b1 as Subscripts instead. Bold all
the labels you just typed, and Align Right the ones from G2:J2. Finally, resize the width of
columns G:J to accommodate the width of their labels (see Section 1.4 of this manual if you need
help on that).
We have computed averages before. The formula you should have in cell E2 is
=AVERAGE(B3:B42), and the one in cell E3 is =AVERAGE(A3:A42). Compare the averages
you get to the sample means of Table 2.1 in Principles of Econometrics, 4e (p. 49); they should
be the same.
[Worksheet: cell E2 shows x_bar = 19.60475 and cell E3 shows y_bar = 283.5735.]
Next, we want to compute the deviations. Think about what you are trying to compute. And then
type the needed formulas in G3:J3.
You should type =B3 - E2 in cell G3, =A3 - E3 in cell H3, =G3*H3 in cell I3, and =G3^2 in
cell J3. Here are the values you should get:
G3, x_deviation: -15.9147501
H3, y_deviation: -168.353498
I3, (x_dev)(y_dev): 2679.303845
J3, (x_deviation)2: 253.2792692
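As a hand check on the first row, the four values can be reproduced from the first observation (y = 115.22, x = 3.69) and the two sample means. The worked arithmetic below uses the rounded means shown in the worksheet, so the last digits may differ slightly from what Excel displays.

```latex
\begin{align*}
x_1 - \bar{x} &= 3.69 - 19.60475 = -15.91475 \\
y_1 - \bar{y} &= 115.22 - 283.5735 = -168.3535 \\
(x_1 - \bar{x})(y_1 - \bar{y}) &= (-15.91475)(-168.3535) \approx 2679.30 \\
(x_1 - \bar{x})^2 &= (-15.91475)^2 \approx 253.28
\end{align*}
```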
Now, in cells G3 and H3, we gave cell references E2 and E3, where the averages are stored. Note
that we will need to use those averages again, and get those averages from these same exact
locations, to compute the deviations of the next 39 observations.
So, what we actually need to do is to transform these Relative cell references (E2 and E3) into
Absolute cell references ($E$2 and $E$3). This will allow us to copy the formula from G3:H3
down below without losing track of the fact that the values for the averages are stored in cells E2
and E3.
A Relative cell reference is made into an Absolute cell reference by preceding both the row and
column references by a dollar sign. Place your cursor back in cell G3 (i.e. move your mouse over
and left-click); in the Formula bar, place your cursor before the E and insert a dollar sign (press
the Shift-key and the $ key at the same time); move your cursor before the 2 and insert another
dollar sign; place your cursor at the end of the formula and press Enter.
Go to cell H3, and add the needed dollar signs there too. Now, you can select G3:J3. Select
Copy on the Clipboard group of commands. Select G4:J42, and select Paste (next to Copy). You
have just copied the formulas to compute the needed deviations for the rest of the $(x_i, y_i)$ pairs.
[Worksheet: columns G:J now filled down to row 42 with the deviations, their products, and the squared x deviations.]
Place your cursor in cell E6, and again think about what you need to compute b2. Recall that the
least squares estimators are:
$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad (2.1)$$
$$b_1 = \bar{y} - b_2 \bar{x} \qquad (2.2)$$
If you refer back to equation (2.1), you can see that =SUM(I3:I42)/SUM(J3:J42) is the formula
you need in cell E6. The one you need in cell E7 is =E3 - E6*E2 for equation (2.2).
Worksheet after these steps (first five observations, deviations rounded to four decimal places):
        A        B   x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)2
2       y        x
3  115.22     3.69      -15.9148     -168.3535        2679.3038         253.2793
4  135.98     4.39      -15.2148     -147.5935        2245.5983         231.4886
5  119.34     4.75      -14.8548     -164.2335        2439.6476         220.6636
6  114.96     6.03      -13.5748     -168.6135        2288.8861         184.2738
7  187.05    12.47       -7.1348      -96.5235         688.6710          50.9047
E2: x_bar = 19.60475    E3: y_bar = 283.5735    E6: b2 = 10.20964    E7: b1 = 83.41601
In the table above we obtain the same exact least squares estimates as those reported on p. 53 of
Principles of Econometrics, 4e.
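As a quick check on cell E7, equation (2.2) can be evaluated by hand with the rounded worksheet values for the sample means and the slope. The result agrees with the worksheet up to rounding.

```latex
\begin{align*}
b_1 &= \bar{y} - b_2 \bar{x} \\
    &= 283.5735 - (10.20964)(19.60475) \\
    &\approx 283.5735 - 200.1574 \\
    &\approx 83.416
\end{align*}
```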
That was Method 1 of obtaining the least squares estimates of the intercept and slope parameters
$\beta_1$ and $\beta_2$. For Method 2, we are going to use the Excel built-in regression analysis routine.
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
If the Data Analysis tool does not appear on the ribbon, you need to load it first.
Select the Office Button in the upper left corner of your screen, Excel Options on the bottom of
the Office Button tasks panel, Add-Ins in the Excel Options dialog box, Excel Add-ins in the
Manage window at the bottom of the Excel Options dialog box, and then Go.
In the Add-Ins dialog box, check the box in front of Analysis ToolPak. Select OK.
Now Data Analysis should be available on the Analysis group of commands. Select it.
A Data Analysis dialog box pops up. In it, select Regression (you might need to use the scroll up
and down bar to the right of the Analysis Tools window to find it), then select OK.
The Regression dialog box that pops up next is very similar to the Edit Series box we
encountered before (see Section 2.1.1). Place your cursor in the Input Y Range window, and
select A3:A42 to specify the y-values you are working with. Similarly, place your cursor in the
Input X Range window, and select B3:B42 to specify the x-values you are working with. Next,
place your cursor in the New Worksheet Ply window and type Regression; this is going to be
the name of the new worksheet where the Excel regression analysis results are going to be stored.
Select OK.
[Screenshot: Regression dialog box with Input Y Range $A$3:$A$42, Input X Range $B$3:$B$42, and New Worksheet Ply set to Regression.]
The Summary Output that Excel just generated should be highlighted as shown below:
[Screenshot: the Summary Output as first generated, with labels truncated by the default column width.]
Select the Home tab. In the Cells group of commands, select Format, and AutoFit Column
Width; this is an alternative way to adjust the width of the selected columns to fit their contents.
The Summary Output then reads as follows (some values are shown rounded; Excel also repeats the
interval bounds in Lower 95.0% and Upper 95.0% columns):
     A                    B              C                D             E           F
 1   SUMMARY OUTPUT
 3   Regression Statistics
 4   Multiple R           0.620485
 5   R Square             0.385002
 6   Adjusted R Square    0.368818
 7   Standard Error       89.517
 8   Observations         40
10   ANOVA
11                        df             SS               MS            F           Significance F
12   Regression           1              190626.9788      190626.9788   23.78884    1.94586E-05
13   Residual             38             304505.1742      8013.294058
14   Total                39             495132.153
16                        Coefficients   Standard Error   t Stat        P-value     Lower 95%    Upper 95%
17   Intercept            83.41601       43.41016         1.921578      0.062182    -4.46327     171.2953
18   X Variable 1         10.20964       2.093263         4.877381      1.94586E-05 5.972052     14.44723
The least squares estimates are given under the Coefficients column in the last table of the
Summary Output. The estimate for the Intercept coefficient, or b1, is the first one, followed by
the estimate of the slope coefficient (X Variable 1 coefficient), or b2. The summary output
contains many other items that we will learn about shortly. For now, notice that the number of
observations or pairs of values, 40, is given in cell B8.
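Several entries of the Summary Output are tied together by simple arithmetic, which gives a handy way to confirm that the output was read correctly. The following Python sketch recomputes a few of them; the numbers are typed in from the Summary Output, the variable names are hypothetical, and the identities used (R Square as the ratio of the regression to the total sum of squares, Standard Error as the square root of the residual mean square, F as the ratio of mean squares, and t Stat as coefficient over standard error) are standard regression results rather than anything specific to Excel.

```python
import math

# Values typed in from the Summary Output.
ss_regression = 190626.9788
ss_residual = 304505.1742
n_obs = 40
n_coefficients = 2              # intercept and slope

df_residual = n_obs - n_coefficients
ss_total = ss_regression + ss_residual

r_square = ss_regression / ss_total
ms_residual = ss_residual / df_residual
standard_error = math.sqrt(ms_residual)
f_stat = ss_regression / ms_residual    # regression df is 1, so MS equals SS

slope, slope_se = 10.20964, 2.093263
t_slope = slope / slope_se

print(round(r_square, 6), round(standard_error, 3),
      round(f_stat, 3), round(t_slope, 3))
```

In a simple regression with one explanatory variable, the square of the slope's t Stat equals F, so the last two printed numbers can also be checked against each other.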
A convenient way to report the values for b1 and b2 is to write out the equation of the estimated
regression line:
$$\hat{y}_i = 83.42 + 10.21 x_i \qquad (2.3)$$
Now that we have the equation of our straight line, we would like to graph it. This is what we are
doing in the next section.
There are different ways to draw a regression line. One way is to plot two points and draw the
line that passes through those two points; this is the method we are going to use first. Another
way is to plot many points, and then draw the line that passes through all those points; this is the
method that Excel uses in its built-in features, which we are going to look at next.
When we draw a line by hand, on a piece of paper, using a pen and a ruler, we can use any two
points. We can extend our line between the points, as well as beyond the points, up and down, or
right and left. Excel does not use a ruler. Instead, it uses the coordinates of two points to draw a
line, and it draws the line only between them. So, to have Excel draw a line that spans over the
whole range of data we have, we need to choose those two points a little bit more strategically
than usual.
If you look back at your scatter chart (Figure 2.6 worksheet) or back in your table (Data
worksheet), you can see that our x values range from about 0 to 35 (from 3.69 to 33.4 exactly).
So, we choose our first point to have an x value equal to 0, and our second point an x value of
35.
The point with an x value of zero is our y intercept. It is the point where the line crosses the
vertical axis. Its coordinates are x = 0 and y = b1 or (0, 83.42). This is our first point.
For our second point, we let x = 35; plug this x value in equation (2.3), and compute its
corresponding or predicted y value. We obtain:
$$\hat{y} = 83.42 + 10.21(35) = 440.77 \qquad (2.4)$$
Go back to your Data worksheet (if you are not already there). In cell L1, type Points to graph
regression line. In columns L and M we are going to record the coordinates of the two points we
are using to draw our regression line. In cell L2, type y; in cell M2, type x. In cell M3, type 0; in
cell M4, type 35. In cell L3, we actually want to record the value for our y intercept or b1, which
we already have in cell E7. So, we are going to get it from there: in cell L3, type =E7, and press
Enter. In cell L4, we want to have the computed predicted y value from (2.4). So we type
=E7+E6*M4, and press Enter. Note that instead of typing all those cell references, you can just
move your cursor to the cells of interest as if you were actually getting the needed values; this is
a very good way to avoid typing errors. So, you would type the equal sign, move your cursor to
E7 and left-click to select it, type the plus sign, move your cursor to cell E6 and left-click to
select it, type the asterisk, move your cursor to cell M4 and left-click to select it, and finally press
Enter. Once you have done all of that, your worksheet should look like this:
          L     M
1  Points to graph regression line
2         y     x
3  83.41601     0
4  440.7535    35
Note that the predicted y value we obtain in the worksheet for x = 35 is slightly different than
the one we just computed in equation (2.4) due to rounding number differences.
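The two-point construction, and the small rounding gap just mentioned, can be reproduced with a short Python sketch. The function name regression_line_endpoints is hypothetical; the coefficients are the worksheet values (83.41601 and 10.20964) and their two-decimal roundings from equation (2.3), and the x range of 0 to 35 is the one chosen above.

```python
def regression_line_endpoints(b1, b2, x_min=0.0, x_max=35.0):
    """Return the two (x, y) points that span the fitted line over the data range."""
    return [(x, b1 + b2 * x) for x in (x_min, x_max)]


# Unrounded worksheet estimates, as stored in cells E7 and E6.
worksheet_points = regression_line_endpoints(83.41601, 10.20964)

# Rounded estimates, as written in equation (2.3).
rounded_points = regression_line_endpoints(83.42, 10.21)

print(worksheet_points)
print(rounded_points)
```

The first point is simply the y intercept, and the second is the predicted value at the right edge of the data, which is why the plotted line covers the whole scatter of observations.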
Now, go back to your Figure 2.6 worksheet. The data we have plotted on the chart represent one
set or series of data. The two new pairs of values we want to add to this chart represent a second
set or series of data.
Select the Design tab, then the Select data button from the Data group of commands.
In the Legend Entries (Series) window of the Select data source dialog box, select the Add
button.
Place your cursor in the Series X values window of the Edit series dialog box, and select
M3:M4 in the Data worksheet. Place your cursor in the Series Y values window (delete
whatever is in there), and select L3:L4 in the Data worksheet. Select OK.
The Select data source dialog box reappears. A second data series, Series2, was created from the
selection you just specified. Select OK.
The two points from your new series are plotted on your chart (squares below):
Now, we need to draw a line through those two points. Go to the Layout tab. Change the Current
Selection (the group of commands at the far left) to Series 2 (use the down arrow to the right of
the window to make that selection). Select Format Selection.
A Format Data Series dialog box pops up. Select Line Color and change its selection from No
line to Solid line. Select Close.
Note that while you need only two points to draw a straight line, you can use more than
two points. So we could have computed a predicted level of food expenditure for every level of
income we have in our original data set, and used the 40 (\(x_i\), \(\hat{y}_i\)) pairs of values as our data Series
2. This is actually what Excel does when it adds a Linear Trend Line to a Scatter chart or a
Line of best Fit to Plots of data as part of the Regression Analysis routine.
We are going to delete the line and two points we just added to our graph and successively look at
these other two ways to plot our regression line.
In the Design tab, go back to the Data group of commands, and select the Select Data button. In
the Select Data Source dialog box, select Series2 and Remove. Finally select OK.
To add a Linear Trend Line, select the Layout tab. Go to the Analysis group of commands,
select Trendline, and then Linear Trendline.
Your chart should look like this (see also Figure 2.8 p. 54 in Principles of Econometrics, 4e):
[Chart: scatter plot with the linear trend line; horizontal axis x = weekly income in $100]
You can also have Excel add the line that best fits your data by choosing that option in the
Regression dialog box.
Select the Data tab, located in the middle of your tab list. Select Data Analysis on the Analysis
group of commands to the far right of the ribbon. Select Regression in the Data Analysis dialog
box, and then OK.
In the Regression dialog box, proceed as you did before, except this time, name your worksheet
Regression and Line, and check the box in front of Line Fit Plots. Select OK.
In addition to the Summary Output you now have a Residual Output table and a Chart in your
new worksheet. The Residual Output table is only partially shown below, and shown after
AutoFitting the Column Width (see Section 2.2.2 for more details on that).
The Predicted Y or \(\hat{y}_i\) values have been computed for all the original observed \(x_i\) values,
similarly to the way we computed \(\hat{y}\) for x = 35 (see Section 2.3.1).
\(\hat{y}_i = b_1 + b_2 x_i\)  (2.5)
You can compare the Predicted Y and Residuals values reported in the Excel Residual Output
to the ones reported in Table 2.3 of Principles of Econometrics, 4e (p. 66). They should be the
same.
Now, the chart needs a little bit of editing. For one it looks like it is a Column chart as opposed
to a Scatter one. The scales could be changed. Finally, Chart and Axis titles are not currently
very helpful.
Place your cursor anywhere in the Chart area, and left-click, so that Chart Tools are made
available to you again. Select the Design tab. Go to the far left group of commands, Type, and
select Change Chart Type. In the Change Chart Type dialog box, select X Y (Scatter) chart,
and then Scatter with only Markers. Finally, select OK.
Now that we have the correct chart type, we would like to draw a line through all the Predicted Y
points. Actually, since we are using those points to draw our regression line, what we want to
show is only the line. So, we will use the points to draw the line, and then get rid of those big
square points. This way our chart won't be as busy.
On your chart, select the Predicted Y points with your cursor. Your cursor should turn into a fat
cross as shown below:
Right-click and select Format Data Series. A Format Data Series dialog box pops up. Select
Line Color and Solid line. Change the line color to something different from the Y points.
Select Marker Options, and change the Marker Type from Automatic to None. Select Close.
On your chart, select the Legend with your cursor, right-click and select Delete.
Change the Chart and Axis titles as you see fit. Below, we show you how you can change the
Chart title. You can follow a similar process to change the Axis titles.
You can select any of the titles and change the Font size by going back to the Home tab. Select
what you need on the Font group of commands.
You can reformat the y-axis (and/or the x-axis) by selecting it with your cursor, right-clicking and
selecting Format Axis.
If you proceed as you did before to edit your vertical axis (see Section 2.1.2a), you should obtain
the following:
Figure 2.8 The fitted regression
To resize the whole Chart area, put your cursor over its lower border until it turns into a double
cross arrow as shown below.
Hold it, and drag it down until you are satisfied with the way your chart looks.
You can delete the Gridlines by first selecting them, right-clicking and then selecting Delete.
You can also reformat the Data Series Y by selecting the points, right-clicking and selecting
Format Data Series. Then proceed as you did before to change your markers' options (see
Section 2.1.2c).
Your result might be (see also Figure 2.8 p. 54 in Principles of Econometrics, 4e):
To show that under the assumptions of the simple linear regression model, \(E(b_1) = \beta_1\) and
\(E(b_2) = \beta_2\), we first put ourselves in a situation where we know our population and regression
parameters (i.e. we know the truth). We then use the least squares regression technique to unveil
the truth (which we already know). This allows us to check on the validity of the least squares
regression technique, and specifically to check on the unbiasedness of the least squares
estimators.
First, let us restate the assumptions of the simple linear regression model (see p. 45 of Principles
of Econometrics, 4e):
• The mean value of y, for each value of x, is given by the linear regression function:
\(E(y|x) = \beta_1 + \beta_2 x\)  (2.6)
• For each value of x, the values of y are distributed about their mean value, following
probability distributions that all have the same variance:
\(\mathrm{var}(y|x) = \sigma^2\)  (2.7)
• The sample values of y are all uncorrelated and have zero covariance, implying that there
is no linear association among them:
\(\mathrm{cov}(y_i, y_j) = 0\)  (2.8)
• The variable x is not random and must take at least two different values.
• (optional) The values of y are normally distributed about their mean for each value of x:
\(y \sim N(\beta_1 + \beta_2 x, \sigma^2)\)  (2.9)
In the specific and simplified case we are considering in this section, half of our hypothetical
population of three-person households has a weekly income of $1000 (x = 10), and half of it has
a weekly income of $2000 (x = 20). Because we are all-mighty, we know the values of our
population parameters, and consequently the values of our regression parameters. Let \(\mu_{y|x=10} = 200\),
\(\mu_{y|x=20} = 300\), and \(\mathrm{var}(y|x=10) = \mathrm{var}(y|x=20) = \sigma^2 = 2500\). This implies
\(\beta_1 = 100\) and \(\beta_2 = 10\).
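To see why the two conditional means pin down the regression parameters, note that the regression function passes through both (10, 200) and (20, 300). The short Python sketch below only illustrates that arithmetic; the variable names are ours.

```python
# Population values stated in the text.
mu_10, mu_20 = 200.0, 300.0   # mean food expenditure at x = 10 and x = 20
x_low, x_high = 10.0, 20.0    # the two income levels
variance = 2500.0             # common variance of y at each income level

# The line E(y|x) = beta1 + beta2 * x goes through both conditional means.
beta2 = (mu_20 - mu_10) / (x_high - x_low)   # slope
beta1 = mu_10 - beta2 * x_low                # intercept
sigma = variance ** 0.5                      # standard deviation of y given x

print(f"beta1 = {beta1}, beta2 = {beta2}, sigma = {sigma}")
```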
The probability distribution functions of weekly food expenditure, y, given an income level
x = 10 and an income level x = 20, are assumed to be Normal. They look like this:
[Chart: the two normal probability density functions, f(y|x=10) and f(y|x=20)]
The linear relationship between weekly food expenditure and weekly income looks like the
following:
Let us emphasize the difference between this section and Chapter 2 in Principles of
Econometrics, 4e. In this section, we do know the truth. In other words, we have information
regarding weekly food expenditure and weekly income on all three-person households that
constitute our population. In Chapter 2 of Principles of Econometrics, 4e, as is the case in
real life, you do not have that population information. You must thus rely solely on your random
sample information to make inferences about your population.
Now, as an exercise, and as a way to illustrate the unbiasedness of the least squares estimators, we
are going to use the least squares regression technique to unveil the truth.
Insert a new worksheet in your workbook by selecting the Insert Worksheet tab at the bottom of
your screen (or press the Shift and F11 keys). Name it Simulation.
We are going to draw a random sample of 40 households from our population. Half of the sample
is drawn from the first type of households, with weekly income x = 10; and half of the sample is
drawn from the second type of households, with weekly income x = 20.
Let us keep records of the level of weekly income for our 40 households in column A of our
Simulation worksheet: in cell A1, type x and Right-Align it; in cells A2:A21, record the value
10; in cells A22:A41, record the value 20.
We use the Random Number Generation analysis tool to draw our random sample of
households. We keep record of their weekly food expenditure in column B of our Simulation
worksheet: type y in B1, and Right-Align it.
      A    B
1     x    y
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
A Random Number Generation dialog box pops up. Since we are drawing one random sample,
we specify 1 in the Number of Variables window. We first draw a random sample of 20 from
households with weekly income of x = 10, so we specify the Number of Random Numbers to
be 20. For simplicity we assumed that our population of households has weekly food expenditure
that is normally distributed, so this is the distribution we choose. Once you have selected Normal
in the Distribution window, you will be able to specify its Parameters: for x = 10, its Mean is
\(\mu_{y|x=10} = 200\) and its Standard deviation is \(\sqrt{\mathrm{var}(y|x=10)} = \sigma = 50\). Select the Output
Range in the Output options section, and specify it to be B2:B21 in your Simulation worksheet.
Finally, select OK.
Repeat to draw a random sample of 20 from households with weekly income of x = 20. Change
the Mean to \(\mu_{y|x=20} = 300\) and the Output Range to B22:B41.
Here is the random sample that we obtained. NOTE: you will obtain a different random sample,
due to the nature of random sampling.
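For readers who want to reproduce the sampling step outside Excel, here is a minimal Python sketch of the same design: 20 normal draws with mean 200 and 20 with mean 300, both with standard deviation 50. The function name draw_sample and the seed are our own choices, and the numbers it produces will differ from those drawn by Excel's Random Number Generation tool.

```python
import random


def draw_sample(rng, n_per_group=20, sigma=50.0):
    """Return (x, y) lists: households with x = 10 first, then those with x = 20."""
    means = {10: 200.0, 20: 300.0}   # conditional means of y given in the text
    x, y = [], []
    for income, mu in means.items():
        for _ in range(n_per_group):
            x.append(income)
            y.append(rng.gauss(mu, sigma))   # one normal draw per household
    return x, y


rng = random.Random(12345)   # arbitrary seed, only to make the run repeatable
x, y = draw_sample(rng)
print(x[:3], [round(v, 4) for v in y[:3]])
```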
      A    B               A    B
1     x    y          22   20   214.6751
2     10   122.4908   23   20   336.5785
3     10   163.1711   24   20   303.5467
4     10   211.0102   25   20   216.4365
5     10   294.1295   26   20   358.9562
6     10   192.9407   27   20   278.1513
18    10   116.1414   39   20   273.6785
Next, we use the LINEST function to obtain the least squares estimates for the intercept and
slope parameters, based on the random sample we just drew. The LINEST function is an
alternative to using the Least Squares Estimators' Formulas (see Section 2.2.1) or the Excel
Regression Analysis Routine (see Section 2.2.2). It allows us to quickly get the least squares
estimates for the intercept and slope parameters. For this purpose, the general syntax of the
LINEST function is as follows:
= LINEST(y's, x's)
The first argument of the LINEST function specifies the y values, and the second argument
specifies the x values, on which the least squares estimates are based. In our case, we thus need to
specify:
= LINEST(B2:B41,A2:A41)
The LINEST function creates a table where it stores the least squares estimates in Excel memory.
It first reports the slope coefficient estimate, and then the intercept coefficient estimate. So, if we
were to look into Excel memory, the estimates would be reported as shown below:
         column 1    column 2
row 1    b2          b1
We nest the LINEST function in the INDEX function to get the estimated coefficients, one at a
time. The INDEX function returns values from within a table. In the case of a table with only one
row, the INDEX function general syntax is as follows:
= INDEX(table of results, column_num)
The first argument of the INDEX function specifies which table to get the results from. In our
case, this is the table of results generated by the LINEST function above. So, we replace "table of
results" by "LINEST(B2:B41,A2:A41)". The second argument indicates from which column of
the table to retrieve the result of interest to us. So, if we want to retrieve the estimate of the
intercept coefficient, b1, from the table above, we would indicate that it can be found in column 2
by replacing "column_num" by "2".
We are going to report our estimated coefficients at the bottom of our table. In cell A43, type b1
=; in cell A44, type b2 =. Bold those labels. In cells B43 and B44, type the following formulas,
respectively:
      A       B
43    b1 =    =INDEX(LINEST(B2:B41,A2:A41),2)
44    b2 =    =INDEX(LINEST(B2:B41,A2:A41),1)
44    b2 =    11.47325
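The LINEST and INDEX combination can be mimicked in a few lines of Python. The sketch below is only an illustration: the helper linest is our own function (it is not Excel's implementation), it returns the slope first and the intercept second to mirror the order described above, and the four data points are hypothetical.

```python
def linest(ys, xs):
    """Return (slope, intercept), in the order LINEST reports them."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: two households at each income level.
xs = [10, 10, 20, 20]
ys = [190.0, 210.0, 280.0, 320.0]

estimates = linest(ys, xs)
b1 = estimates[1]   # second entry, like INDEX(LINEST(...), 2)
b2 = estimates[0]   # first entry, like INDEX(LINEST(...), 1)
print(f"b1 = {b1}, b2 = {b2}")
```

Note that Python counts positions from 0 while INDEX counts columns from 1, which is why the intercept is element 1 here and column 2 in Excel.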
The estimates of the intercept and slope coefficients are based on one random sample. Our
random sample is different than yours, and each random sample yields different estimates, which
may or may not be close to the true parameter values. The property of unbiasedness is about the
average values of b1 and b2 if many samples of the same size are drawn from the same
population. In the next section, we are thus going to repeat our sampling and least squares
estimation exercise.
Go back to the Random Number Generation dialog box. We would like to draw 9 additional
random samples, so we specify 9 in the Number of Variables window. Again, we first draw
random samples of 20 from households with weekly income of x = 10, so we specify the
Number of Random Numbers to be 20. We also select Normal in the Distribution window,
and specify its Parameters. For x = 10, its Mean is \(\mu_{y|x=10} = 200\) and its Standard Deviation
is \(\sqrt{\mathrm{var}(y|x=10)} = \sigma = 50\). Specify the Output Range to be C2:K21. Finally, select OK.
Repeat to draw 9 random samples of 20 from households with weekly income of x = 20. Change
the Mean to \(\mu_{y|x=20} = 300\) and the Output Range to C22:K41.
Next, before we copy the formulas to get our coefficient estimates, we need to change the
Relative cell references A2:A41 in cells B43:B44 into Absolute cell references $A$2:$A$41, since we will be
using the same x-values for our next 9 rounds of least squares estimates.
Copy the formulas from B43:B44 into C43:K44. In cells L43:L44 compute the AVERAGEs of
your estimates from your 10 samples. In cell L43, you should have =AVERAGE(B43:K43); in
cell L44, you should have =AVERAGE(B44:K44). The estimates and average values that we get
for our 10 samples are:
A I B I c I D I E I F I G I H I I I I I I<'. I l
43 bl: 67.64114 65.92893 110.0?45 50.41892. 102.9383 12.7. 2p �6 68.025{)8 30.43498 132..2953 75.4688 89.14425
44 . b2: 11.4732.6 12.2687 S:.813-088 11.73885 10.11185 8.61•69 11.5.521 10.8758 8.048971 11.33003 10.48296
If we took the averages of estimates from many samples, these averages would approach the true
parameter values \(\beta_1\) and \(\beta_2\). To show you that this is the case, we repeated the exercise again.
Here are the average values of b1 and b2 that we obtained as we increased the number of samples
from 10, to 100, and finally to 1000:
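The repeated-sampling exercise is easy to imitate in code. The following Python sketch is an illustration under the population values of this section (true intercept 100, true slope 10, standard deviation 50); the function names, the seed, and the random number generator are our own, so the averages it prints will not match the ones we obtained in Excel.

```python
import random


def ols(xs, ys):
    """Return the least squares estimates (b1, b2) for a simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def average_estimates(n_samples, rng, sigma=50.0):
    """Average b1 and b2 over n_samples simulated samples of 40 households."""
    xs = [10] * 20 + [20] * 20
    sum_b1 = sum_b2 = 0.0
    for _ in range(n_samples):
        # Population model of this section: E(y|x) = 100 + 10x.
        ys = [rng.gauss(100.0 + 10.0 * x, sigma) for x in xs]
        b1, b2 = ols(xs, ys)
        sum_b1 += b1
        sum_b2 += b2
    return sum_b1 / n_samples, sum_b2 / n_samples


rng = random.Random(2011)   # arbitrary seed
for n_samples in (10, 100, 1000):
    avg_b1, avg_b2 = average_estimates(n_samples, rng)
    print(f"{n_samples:5d} samples: average b1 = {avg_b1:8.3f}, average b2 = {avg_b2:7.3f}")
```

As the number of samples grows, the averages should settle near 100 and 10, which is what unbiasedness predicts.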
The next section of this chapter is very short. It points out how you can compute an estimate of
the variances and covariance of the least squares estimators b1 and b2 using Excel. It also outlines
other numbers you can recognize in the Excel summary output. Note that for this section we are
getting back to our food expenditure and income data of Sections 2.1-2.3, i.e. data from one
sample of 40 households that was drawn from a population with unknown parameters.
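You can compute an estimate of the variances and covariance of the least squares estimators
b1 and b2, the same way you computed b1 and b2. Consider their algebraic expressions (see
below or p. 65 of Principles of Econometrics, 4e), and perform the simple arithmetic operations
needed. You might want to do that as an exercise; you will be able to check on your work by
comparing your estimates to the ones reported on pp. 66-67 of Principles of Econometrics, 4e.
Estimates of the variances and covariance of the least squares estimators b1 and b2 are given by:
\(\widehat{\mathrm{var}}(b_1) = \hat{\sigma}^2 \left[ \dfrac{\sum x_i^2}{N \sum (x_i - \bar{x})^2} \right]\)  (2.10)
\(\widehat{\mathrm{var}}(b_2) = \dfrac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2}\)  (2.11)
\(\widehat{\mathrm{cov}}(b_1, b_2) = \hat{\sigma}^2 \left[ \dfrac{-\bar{x}}{\sum (x_i - \bar{x})^2} \right]\)  (2.12)
and \(\hat{\sigma}^2 = \dfrac{\sum \hat{e}_i^2}{N - 2}\) is an estimate of the error variance.  (2.13)
The square roots of the estimated variances are the standard errors of b1 and b2. They are denoted
as se(b1) and se(b2).
\(\mathrm{se}(b_1) = \sqrt{\widehat{\mathrm{var}}(b_1)}\) and \(\mathrm{se}(b_2) = \sqrt{\widehat{\mathrm{var}}(b_2)}\)  (2.14)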
Excel regression routine does not automatically generate estimates of the variances and
covariance of the least squares estimators b1 and b2, but it does compute the standard errors of b1
and b2, as well as other intermediary results.
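If you want to check your spreadsheet arithmetic, the usual textbook formulas for the estimated variances and covariance can be sketched in Python as follows. This is only an illustration: the function name ls_variances and the five data points are hypothetical, and the error variance is estimated with N - 2 degrees of freedom, as in the simple regression model.

```python
import math


def ls_variances(xs, ys):
    """Estimate the error variance and the variances/covariance of b1 and b2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b1 = y_bar - b2 * x_bar
    # Sum of squared residuals and estimated error variance.
    sse = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = sse / (n - 2)
    var_b1 = sigma2_hat * sum(x * x for x in xs) / (n * sxx)
    var_b2 = sigma2_hat / sxx
    cov_b1_b2 = -sigma2_hat * x_bar / sxx
    return {
        "sigma2_hat": sigma2_hat,
        "var_b1": var_b1,
        "var_b2": var_b2,
        "cov_b1_b2": cov_b1_b2,
        "se_b1": math.sqrt(var_b1),
        "se_b2": math.sqrt(var_b2),
    }


# Hypothetical data, small enough to verify by hand.
result = ls_variances([1, 2, 3, 4, 5], [2.0, 4.0, 5.0, 4.0, 5.0])
for name, value in result.items():
    print(f"{name:>10} = {value:.4f}")
```

The standard errors returned here correspond to the Standard Error column that Excel reports next to each coefficient.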
Specifically, the following estimates can be found in the Excel Summary Output you generated
earlier:
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.620485472
R Square             0.385002221
Adjusted R Square    0.368818059
Standard Error       89.51700429
Observations         40
ANOVA
              df    SS             MS             F              Significance F
Regression    1     190626.9788    190626.9788    23.78884107    1.94585E-05
Note that \(\sum \hat{e}_i^2\), the Sum of Squared Residuals (SS Residual), is also referred to as the Sum of
Squared Errors, hence the abbreviation SSE used on p. 51 of Principles of Econometrics, 4e.
Open the Excel file br. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 2 in one file, create a new worksheet in your POE
Chapter 2 Excel file, name it br data, and in it, copy the data set you just opened.
This data set contains data on 1080 houses sold in Baton Rouge, LA during mid-2005, which we
are using to estimate the following quadratic model for house prices:
\(PRICE = \alpha_1 + \alpha_2 SQFT^2 + e\)  (2.15)
In your br data worksheet, insert a column to the right of the sqft column B (see Section 1.4 for
more details on how to do that). In your new cells C1:C2, enter the following column label and
formula.
      C
1     sqft2
2     =B2^2
Copy the content of cell C2 to cells C3:C1081. Here is how your table should look (only the
first five values are shown below):
      A         B       C
1     price     sqft    sqft2
2     66500     741     549081
3     66000     741     549081
4     68500     790     624100
5     102000    2783    7745089
6     54000     1165    1357225
In the Regression dialog box, the Input Y Range should be A2:A1081, and the Input X Range
should be C2:C1081. Select New Worksheet Ply and name it Quadratic Model. Finally select
OK.
A B I c D I E I F I G I H I I
1
_,_
SUMMARY O'UTPUT
,_
2
3 Rec:ire-ssion Stab-.Slics
4. Multiple R U32075415
5 R .S�uare 0.&92349497
� Adju�.!e<:I R. Sq':J_ar.e
--
OJj.920£4107
.1__1
Standard Error 68205: 74032
8 Observations 1080
9
10 AN OVA
111 (jf SS MS F Stg_nif.lcar1cr;: F
12 Regression 1 1.1286Et13 1.12B6Et13 2425.976064 3.3748E-278
13 Residual -- 1078 5.0150JE+12 465�21594.26 �
Q_OOQ31"3095 49.2'5419844
Go back to your br data worksheet and select A2:B1081. Select the Insert tab located next to the
Home tab. In the Charts group of commands select Scatter, and then Scatter with only
Markers.
You can see that our house price values are on the horizontal axis and square footage values are
on the vertical axis; we would like to change that around and edit our chart as we did in Section
2.1 with our plot of food expenditure data. The result is (see also Figure 2.14 on p. 70 in
Principles of Econometrics, 4e):
Finally, we add the fitted quadratic relationship to our scatter plot. In cells N1:N2 and O1:O3 of
your br data worksheet, enter the following column labels, formula and data.
      N                                                                   O
1     quadratic price-hat                                                 sqft
2     ='Quadratic Model'!$B$17+'Quadratic Model'!$B$18*'br data'!O2^2    0
3                                                                         400
Select cells O2:O3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross as shown below; left-click, hold it and drag it down to cell O22: Excel recognizes
the series and automatically completes it for you. Next, copy the content of cell N2 to cells
N3:N22. Here is how your table should look (only the first five values are shown below):
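The fill-down step simply evaluates the fitted quadratic equation on an evenly spaced grid of square footage values. The Python sketch below illustrates the idea; the function name is ours, and the two coefficient values are placeholders that stand in for whatever your Quadratic Model worksheet reports in cells B17 and B18.

```python
def fitted_quadratic(alpha1_hat, alpha2_hat, sqft_values):
    """Return predicted prices alpha1_hat + alpha2_hat * sqft^2 for each sqft."""
    return [alpha1_hat + alpha2_hat * sqft ** 2 for sqft in sqft_values]


# Grid of 21 values, 0, 400, ..., 8000, like cells O2:O22.
sqft_grid = list(range(0, 8001, 400))

# Placeholder estimates (assumed for illustration; read yours from B17 and B18).
alpha1_hat = 55776.56
alpha2_hat = 0.0154

price_hat = fitted_quadratic(alpha1_hat, alpha2_hat, sqft_grid)
for sqft, p in list(zip(sqft_grid, price_hat))[:5]:
    print(f"sqft = {sqft:4d}, quadratic price-hat = {p:.2f}")
```

Because square footage enters squared, equal steps of 400 square feet produce increasingly large steps in predicted price, which is what gives the fitted line its curvature.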
Go back to your scatter plot and right-click in the middle of your chart area. Select Select Data.
In the Legend Entries (Series) window of the Select Data Source dialog box, select the Add
button. In the Series name window, type Fitted Quadratic Relationship. Select O2:O22 for the
Series X values and select N2:N22 for the Series Y values. Finally, select OK. The Fitted
Quadratic Relationship series has been added to your graph.
Before you close the Select Data Source dialog box, select Series1 and Edit. Type the name
Actual in the Series name window. Select OK. In the Select Data Source window that
reappears, select OK again.
Make sure your chart is selected so that the Chart Tools are visible. In the Layout tab, go to the
Labels group of commands. Select the Legend button and choose either one of the Overlay
Legend options. Grab your legend with your cursor and move it to the upper left corner of your
chart area.
Finally, we want to reformat our Fitted Quadratic Relationship values series. Select the plotted
series in your chart area, right-click and select Format Data Series. A Format Data Series
dialog box pops up. Select Line Color and Solid line. Change the line color to something
different from the Actual series points. Select Marker Options, and change the Marker Type
from Automatic to None. Select Close.
      D
1     ln(price)
2     =ln(A2)
Copy the content of cell D2 to cells D3:D1081. Here is how your table should look (only the
first five values are shown below):
      A         B       C          D
1     price     sqft    sqft2      ln(price)
2     66500     741     549081     11.10496
3     66000     741     549081     11.09741
4     68500     790     624100     11.13459
5     102000    2783    7745089    11.53273
6     54000     1165    1357225    10.89674
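As a quick check on the new column, the natural logarithm of the first five prices can be reproduced with a few lines of Python (math.log plays the role of Excel's LN function). The list below simply restates the five prices shown above.

```python
import math

# First five house prices from the br data worksheet.
prices = [66500, 66000, 68500, 102000, 54000]

# Natural logarithm of each price, rounded as displayed in the worksheet.
ln_prices = [round(math.log(p), 5) for p in prices]
for p, lp in zip(prices, ln_prices):
    print(f"price = {p:6d}, ln(price) = {lp:.5f}")
```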
Next, we specify BIN values. These values will determine the range of PRICE and ln(PRICE)
values for each column of the histogram. The bin values have to be given in ascending order.
Starting with the lowest bin value, a PRICE or ln(PRICE) value will be counted in a particular
bin if it is equal to or less than the bin value.
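The counting rule just described (a value goes into the first bin whose value it does not exceed) can be sketched in Python as follows. This is an illustration, not Excel's own code: the function name is ours, the extra last slot plays the role of the More category that Excel's Histogram tool reports for values above the largest bin, and the six values are made up for the example.

```python
from bisect import bisect_left


def excel_style_histogram(values, bins):
    """Count values per bin: bin i holds v with bins[i-1] < v <= bins[i].

    The returned list has one extra slot at the end for values larger than
    the last bin value (the "More" category).
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left finds the first bin value that is >= v.
        counts[bisect_left(bins, v)] += 1
    return counts


bins = [0, 50000, 100000, 150000]                     # ascending order required
values = [66500, 66000, 68500, 102000, 54000, 50000]  # example prices
counts = excel_style_histogram(values, bins)
print(counts)
```

Note that the value 50000 is counted in the 50000 bin, not in the 100000 bin, because a value equal to a bin value belongs to that bin.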
In cells S1:T3 of your br data worksheet, enter the following column labels and data.
      S            T
1     price bin    lnprice bin
2     0            9
3     50000        9.2
Select cells S2:S3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross as shown below; left-click, hold it and drag it down to cell S34: Excel recognizes
the series and automatically completes it for you. Similarly, select cells T2:T3, move your cursor
to the lower right corner of your selection until it turns into a skinny cross; left-click, hold it and
drag it down to cell T29. Here is how your table should look (only the first five values are shown
below):
      S            T
1     price bin    lnprice bin
2     0            9
3     50000        9.2
4     100000       9.4
5     150000       9.6
6     200000       9.8
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.
A Histogram dialog box pops up. For the Input Range, specify A2:A1081; for the Bin Range,
specify S2:S34. The Input Range indicates the data set Excel will look at to determine how
many values are counted in each bin of the Bin Range. Check the New Worksheet Ply option
and name it Price Histogram; check the box next to Chart Output. Finally, select OK.
Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap.
After editing our chart as we did in Section 2.1 with our plot of food expenditure data, the result
is (see also Figure 2.16(a) on p. 72 in Principles of Econometrics, 4e):
Note that the frequencies given in the graph above are absolute ones, while the frequencies given
in Figure 2.16(a) of Principles of Econometrics, 4e are relative ones.
Go back to your br data worksheet. In the Histogram dialog box, specify D2:D1081 for the
Input Range and T2:T29 for the Bin Range. Check the New Worksheet Ply option and name it
lnPrice Histogram; check the box next to Chart Output. Finally, select OK.
The final result is (see also Figure 2.16(b) on p. 72 in Principles of Econometrics, 4e):
[Chart: histogram of lnPrice]
Again, note that the frequencies given in the graph above are absolute ones, while the frequencies
given in Figure 2.16(b) of Principles of Econometrics, 4e are relative ones.
In the Regression dialog box, the Input Y Range should be D2:D1081, and the Input X Range
should be B2:B1081. Select New Worksheet Ply and name it Log-Linear Model. Finally select
OK.
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.790413619
R Square             0.624753689
Adjusted R Square    0.624405594
Standard Error       0.321465013
Observations         1080
ANOVA
              df      SS             MS             F              Significance F
Regression    1       185.4720974    185.4720974    1794.779738    1.1066E-231
Residual      1078    111.4002553    0.103339754
In cells Q1:Q2 of your br data worksheet, enter the following column label and formula.
      Q
1     log-linear price-hat
2     =EXP('Log-Linear Model'!$B$17+'Log-Linear Model'!$B$18*'br data'!P2)
Next, copy the content of cell Q2 to cells Q3:Q22. Here is how your table should look (only the
first five values are shown below):
      Q
1     log-linear price-hat
2     50949.81045
3     60060.27135
4     70799.79617
5     83459.68183
6     98383.31279
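The formula in column Q first predicts ln(price) and then applies the exponential function to get back to a price. A minimal Python sketch of that back-transformation follows; the function name is ours, and the two coefficient values are placeholders standing in for cells B17 and B18 of your Log-Linear Model worksheet.

```python
import math


def fitted_log_linear(gamma1_hat, gamma2_hat, sqft_values):
    """Return exp(gamma1_hat + gamma2_hat * sqft) for each sqft value."""
    return [math.exp(gamma1_hat + gamma2_hat * sqft) for sqft in sqft_values]


# Same grid of square footage values as for the quadratic fit: 0, 400, ..., 8000.
sqft_grid = list(range(0, 8001, 400))

# Placeholder estimates (assumed for illustration; read yours from B17 and B18).
gamma1_hat = 10.8386
gamma2_hat = 0.000411

price_hat = fitted_log_linear(gamma1_hat, gamma2_hat, sqft_grid)
for sqft, p in list(zip(sqft_grid, price_hat))[:5]:
    print(f"sqft = {sqft:4d}, log-linear price-hat = {p:.2f}")
```

In a log-linear model each additional 400 square feet multiplies the predicted price by the same factor, so the ratio between consecutive fitted values is constant.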
Select your scatter plot of actual data points and fitted quadratic relationship and make a copy of
it. Right-click in the middle of the copy of your chart. Select Select Data. In the Legend Entries
(Series) window of the Select Data Source dialog box, select the Fitted Quadratic
Relationship series, and then the Edit button. In the Series name window, replace the old name
by Fitted Log-Linear Relationship. Select P2:P22 for the Series X values and select Q2:Q22
for the Series Y values. Finally, select OK, twice. The Fitted Log-Linear Relationship series
has been added to your graph.
[Figure: scatter plot of the actual data points with the fitted quadratic relationship and the fitted log-linear relationship]
Open the Excel file utown. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 2 in one file, create a new worksheet in your POE
Chapter 2 Excel file, name it utown data, and in it, copy the data set you just opened.
This data file contains a sample of 1000 observations on house prices in two neighborhoods. One
neighborhood is near a major university and called University Town. Another similar
neighborhood, called Golden Oaks, is a few miles away from the university.
In cells H1:H3 of your utown data worksheet, enter the following column label and data.
H
1 bin
2 125
3 137.5
Select cells H2:H3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross; left-click, hold it and drag it down to cell H20. Here is how your
table should look (only the first five values are shown below):
     H
1    bin
2    125
3    137.5
4    150
5    162.5
6    175
In the Histogram dialog box, specify A2:A482 for the Input Range and H2:H20 for the Bin
Range. Check the New Worksheet Ply option and name it Golden Oaks Prices Histogram;
check the box next to Chart Output. Finally, select OK.
The final result is (see also Figure 2.18 on p. 74 in Principles of Econometrics, 4e):
[Figure: histogram of Golden Oaks house prices, with frequency on the vertical axis and price bins from 125 to 350 on the horizontal axis]
Note that the frequencies given in the graph above are absolute ones, while the frequencies given
in Figure 2.18 of Principles of Econometrics, 4e are relative ones.
Go back to your utown data worksheet. In the Histogram dialog box, specify A483:A1001 for
the Input Range and H2:H20 for the Bin Range. Check the New Worksheet Ply option and
name it U Town Prices Histogram; check the box next to Chart Output. Finally, select OK.
The final result is (see also Figure 2.18 on p. 74 in Principles of Econometrics, 4e):
[Figure: histogram of University Town house prices, with frequency on the vertical axis and price bins from 125 to 350 on the horizontal axis]
In the Regression dialog box, the Input Y Range should be A2:A1001, and the Input X Range
should be D2:D1001. Select New Worksheet Ply and name it Indicator Variable Model.
Finally select OK.
The result is (matching the one reported on p. 75 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.728744479
R Square             0.531068516
Adjusted R Square    0.530598645
Standard Error       28.90745008
Observations         1000

ANOVA
              df     SS             MS             F              Significance F
Regression    1      944476.7536    944476.7536    1130.242684    2.6479E-166
Residual      998    833969.3888    835.6406701
Total         999    1778446.143

               Coefficients   Standard Error   t Stat      P-value        Lower 95%   Upper 95%
Intercept      215.7325       1.3181           163.6735    0              213.1460    218.3190
X Variable 1   61.5091        1.8296           33.6191     2.6479E-166    57.9188     65.0994

This ends Chapter 2 of this manual. You might want to save your work before you close shop.
CHAPTER 3
Interval Estimation and Hypothesis Testing
CHAPTER OUTLINE
3.1 Interval Estimation
  3.1.1 The t-Distribution
    3.1.1a The t-Distribution versus Normal Distribution
    3.1.1b t-Critical Values and Interval Estimates
    3.1.1c Percentile Values
    3.1.1d TINV Function
    3.1.1e Appendix E: Table 2 in POE
  3.1.2 Obtaining Interval Estimates
  3.1.3 An Illustration
    3.1.3a Using the Interval Estimator Formula
    3.1.3b Excel Regression Default Output
    3.1.3c Excel Regression Confidence Level Option
  3.1.4 The Repeated Sampling Context (Advanced Material)
    3.1.4a Model Assumptions
    3.1.4b Repeated Random Sampling
    3.1.4c The LINEST Function Revisited
    3.1.4d The Simulation Template
    3.1.4e The IF Function
    3.1.4f The OR Function
    3.1.4g The COUNTIF Function
3.2 Hypothesis Tests
  3.2.1 One-Tail Tests with Alternative "Greater Than" (>)
  3.2.2 One-Tail Tests with Alternative "Less Than" (<)
  3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠)
3.3 Examples of Hypothesis Tests
  3.3.1 Right-Tail Tests
    3.3.1a One-Tail Test of Significance
    3.3.1b One-Tail Test of an Economic Hypothesis
  3.3.2 Left-Tail Tests
  3.3.3 Two-Tail Tests
    3.3.3a Two-Tail Test of an Economic Hypothesis
    3.3.3b Two-Tail Test of Significance
3.4 The p-Value
  3.4.1 The p-Value Rule
    3.4.1a Definition of p-Value
    3.4.1b Justification for the p-Value Rule
  3.4.2 The TDIST Function
  3.4.3 Examples of Hypothesis Tests Revisited
    3.4.3a Right-Tail Test from Section 3.3.1b
    3.4.3b Left-Tail Test from Section 3.3.2
    3.4.3c Two-Tail Test from Section 3.3.3a
    3.4.3d Two-Tail Test from Section 3.3.3b
In this chapter we will use the t-distribution to construct interval estimates and perform
hypothesis tests. We continue to work with the simple linear regression model of weekly food
expenditure.
Rename Sheet 1 Data. Quickly re-estimate the regression parameters using Excel's regression
analysis routine, as in Section 2.2.2. In the Regression dialog box, the Input Y Range should be
A2:A41, and the Input X Range should be B2:B41. Select New Worksheet Ply and name it
Regression; you do not need to check the box next to Line Fit Plots.
The t-distribution is a bell-shaped curve centered and symmetric around its mean, equal to zero. It
looks like the standard normal distribution, except it is more spread out, with a larger variance
and thicker tails. The exact shape of the t-distribution is controlled by a single parameter called
the degrees of freedom, often abbreviated as df. The notation t(m) is used to specify a
t-distribution with m degrees of freedom.
Below is a graph of the t-distribution with m = 3 degrees of freedom and the standard normal
distribution.
[Figure: density of the t-distribution with 3 degrees of freedom, t(3), plotted against the standard normal density, N(0,1)]
In order to construct interval estimates, we will need critical values of t-distributions with various
degrees of freedom. The abbreviation used for a critical value is tc. The values -tc and tc are the
endpoints of a closed interval around zero such that the probability of drawing a t-value in this
interval is (1 - α), and the probability is α that a value is either less than -tc or greater than tc.
Since the distribution is symmetric, the probability that a t-value is less than -tc is (α/2), and
the probability that a t-value is greater than tc is (α/2).
We are usually interested in the critical value tc such that the probability that a randomly drawn
t-value is within the closed interval [-tc, tc] is 0.95 or 0.99, which means that the probability of a
value outside the interval, in the tails of the distribution, is only 0.05 or 0.01.
Let α = 0.05. This leads to a closed interval [-tc, tc] such that the probability is (1 - α) = 0.95
that a randomly drawn t-value falls within it.
[Figure: t(m) density with the critical values -tc and tc marked and probability α/2 in each tail]
Since the probability is (α/2) that a t-value is greater than tc, this also means that the probability
of drawing a t-value less than or equal to tc is (1 - α/2). The critical value tc is the 100(1 -
α/2) percentile of the t-distribution, denoted t(1-α/2,m).
We will use the TINV function to compute t-critical values. First, we create a new worksheet and
table where we will store our computations.
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the Data tab. Name it t-critical value.
Select cell A1. Select the Insert tab located next to the Home tab. In the Text group of commands
select Symbol. In the Symbol dialog box, the Symbols tab should be open. Select α (you might
need to use the scroll bar to move up and down the window and find this symbol). Finally, select
Insert.
t-critical values are obtained in Excel by using the TINV function. The syntax of the TINV
function is as follows:
=TINV(α, m)
To find the t-critical value for α = 0.05 (the combined probability in two-tails) and m = 38,
given the way we organized our table above, we need to write the following formula in B3:
=TINV(B1,B2)
     A       B
1    α =     0.05
2    m =     38
3    tc =    2.024394
In cell B1, change α from 0.05 to 0.01. Here is how your table should look:
     A       B
1    α =     0.01
2    m =     38
3    tc =    2.711556
Alternatively, we could have gotten those t-critical values from Table 2 at the end of Principles of
Econometrics, 4e. Recall that the critical value tc is also the 100(1 - α/2)th percentile of the
t-distribution, denoted t(1-α/2,m). For α = 0.05 and m = 38, the critical value tc is the 100(1 -
α/2) = 100(1 - 0.05/2) = 100(1 - 0.025) = 97.5 or 97.5th percentile of the t-distribution,
denoted t(.975,38). At the intersection of the column labeled "t(.975,df)" and the row "38" degrees
of freedom (df), tc = 2.024.
For α = 0.01, holding m constant, the critical value tc is the 100(1 - α/2) = 100(1 -
0.01/2) = 100(1 - 0.005) = 99.5 or 99.5th percentile of the t-distribution, t(.995,38). Its value
is found at the intersection of the column labeled "t(.995,df)" and the row "38" degrees of
freedom (df): tc = 2.712. Those t-critical values are slightly different from the ones we obtained
in Excel due to rounding in Table 2.
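For readers who want to check these percentiles outside Excel, the definition above can be sketched in Python. This is only an illustration using the standard library: the helper names t_pdf, t_cdf and t_critical are hypothetical (they are not Excel or textbook functions), the cumulative probability is approximated with Simpson's rule, and the percentile is located by bisection.

```python
import math

def t_pdf(x, m):
    """Density of the t-distribution with m degrees of freedom."""
    c = math.gamma((m + 1) / 2) / (math.sqrt(m * math.pi) * math.gamma(m / 2))
    return c * (1 + x * x / m) ** (-(m + 1) / 2)

def t_cdf(x, m, steps=2000):
    """P(t <= x), using Simpson's rule on [0, |x|] and the symmetry around zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, m) + t_pdf(a, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, m):
    """Two-tail critical value tc with P(|t| > tc) = alpha (same convention as TINV)."""
    target = 1 - alpha / 2          # tc is the 100(1 - alpha/2) percentile
    lo, hi = 0.0, 100.0
    for _ in range(60):             # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, m) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 38), 3))   # 97.5th percentile, m = 38
print(round(t_critical(0.01, 38), 3))   # 99.5th percentile, m = 38
```

The two printed values should agree with the Table 2 entries quoted above to three decimal places.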
$$P\left[b_k - t_c\,\mathrm{se}(b_k) \le \beta_k \le b_k + t_c\,\mathrm{se}(b_k)\right] = 1 - \alpha \tag{3.1}$$
The interval bk ± tc se(bk) has probability (1 - α) of containing the true but unknown parameter
βk. When using data, we say that we have a 100(1 - α)% interval estimate or 100(1 - α)%
confidence interval.
We are usually interested in constructing either a 95% or a 99% confidence interval, so the
corresponding α values that we would use to get our t-critical values are α = 0.05 and α = 0.01.
To obtain the interval estimates, we use equation (3.1) and replace the least squares estimators bk,
the critical t-value tc, and the standard errors se(bk) by their estimated values. The
lower limit (LL) and the upper limit (UL) of the interval will be:
$$LL = b_k - t_c\,\mathrm{se}(b_k) \tag{3.2}$$
$$UL = b_k + t_c\,\mathrm{se}(b_k) \tag{3.3}$$
3.1.3 An Illustration
In this section, we will first illustrate how to obtain an interval estimate by plugging values into
the interval estimator's formula. Next, we will go back to the Excel regression analysis tool and
look at the output we already have generated, as well as look at the built-in option available to
generate additional interval estimates.
We create a template to compute the interval estimates for the least squares regression parameters
of the food expenditure model.
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the t-critical value tab. Name it Interval Estimate.
      A                    B                         C
1     Data Input           Sample Size =             =Regression!B8
2                          Confidence Level =
3                          Estimated bk =            =Regression!B18
4                          Standard Error of bk =    =Regression!C18
5
6     Computed Values      α =                       =1-C2
7                          df or m =                 =C1-2
8                          tc =                      =TINV(C6,C7)
9
10    Interval Estimate    Lower Limit =             =C3-C8*C4
11                         Upper Limit =             =C3+C8*C4
Note that we get the sample size, estimated coefficient and standard error from our Regression
worksheet. All you have to do in cells C1 and C3:C4 is, first, type the equal sign, and then
select the needed value in the Regression worksheet with your cursor. Finally, press Enter. We
are computing the interval estimate for β2, the slope parameter. Cell C2 is left blank for now.
Later, you will enter either 95 or 99 depending on whether you are constructing a 95% or a 99%
confidence interval, but you could also enter any other confidence level. In cell C6, the α level
will be computed based on the level of confidence entered in C2. In cell C7, the degrees of
freedom are set equal to N - 2, where N is the sample size, which we record in cell C1. Cell C8
is where the critical t-value is computed, as shown in Section 3.1.1d. Cells C10:C11 are where
the limits of the interval estimate are computed, using equations (3.2) and (3.3).
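The arithmetic behind cells C10:C11 can be sketched in a few lines of Python. The function name interval_estimate is hypothetical, and the inputs are assumptions for illustration: the slope estimate and standard error are rounded values from the food expenditure regression, and tc is the 97.5th percentile of the t-distribution with 38 degrees of freedom.

```python
def interval_estimate(b, se, tc):
    """Return (lower limit, upper limit) = (b - tc*se, b + tc*se)."""
    return b - tc * se, b + tc * se

b2 = 10.20964        # estimated slope (rounded)
se_b2 = 2.093263     # its standard error (rounded)
tc_95 = 2.024394     # t(.975, 38)

ll, ul = interval_estimate(b2, se_b2, tc_95)
print(round(ll, 3), round(ul, 3))   # 5.972 14.447
```

Because the inputs are rounded, the limits agree with Excel's regression output only to about three decimal places.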
Before we specify our level of confidence, we would like to reformat C2 so that the level of
confidence can be displayed as a percentage. In cell C2, right-click, and select Format Cells on
the tasks panel that opens up. In the Format Cells dialog box, select Percentage in the Category
window and choose 0 decimal places (use the up and down arrows to the right of the Decimal
places window). Finally, select OK.
Here are the results you should get for a 95% confidence interval estimate for β2 (make sure you
type 95, and not 0.95, in C2):
      A                    B                         C
1     Data Input           Sample Size =             40
2                          Confidence Level =        95%
3                          Estimated bk =            10.2096
4                          Standard Error of bk =    2.0933
5
6     Computed Values      α =                       0.05
7                          df or m =                 38
8                          tc =                      2.0244
9
10    Interval Estimate    Lower Limit =             5.9721
11                         Upper Limit =             14.4472
The lower limit and upper limit of the interval estimate above should be the same as those
reported on p. 98 of Principles of Econometrics, 4e.
We plugged values in equation (3.1), and built a template, to obtain interval estimates. Next, we
will go to our Regression worksheet and look at the interval estimates Excel has already
generated in the regression summary output.
Go to your Regression worksheet, and look at the last table of the summary output. Columns F
and G of that table present the lower limits and upper limits of the interval estimates for the
intercept and slope parameters, β1 and β2 (see the table below). Excel's regression analysis routine
automatically generates the 95% confidence interval estimates.
In cell F18, you can find the lower limit of the interval estimate for β2. In cell G18, you can find
the upper limit of the interval estimate for β2. Those values are identical to the ones you
computed in your Interval Estimate worksheet.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.620485472
R Square             0.385002221
Adjusted R Square    0.368818069
Standard Error       89.51700429
Observations         40

ANOVA
              df    SS             MS             F              Significance F
Regression    1     190626.9788    190626.9788    23.78884107    1.94586E-05
Residual      38    304505.1742    8013.294058
Total         39    495132.153

               Coefficients   Standard Error   t Stat    P-value        Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      83.4160        43.4102          1.9216    0.0622         -4.4633     171.2953    -4.4633       171.2953
X Variable 1   10.2096        2.0933           4.8774    1.94586E-05    5.9721      14.4472     5.9721        14.4472
Excel actually reported the interval estimate for β2 twice: in cells F18:G18, and again in cells
H18:I18. The table is set up so that, if you choose to, Excel can report confidence interval
estimates other than the 95% one.
Go back to your Data worksheet. From there, select the Data tab, the Data Analysis button in
the Analysis group of commands, and Regression in the Analysis Tools window. In the
Regression dialog box, check the box next to Confidence Level and type in 99. Select New
Worksheet Ply and name it Regression and 99% CI (for Confidence Interval). Select OK.
Alongside the 95% interval estimates, Excel now has also generated 99% interval estimates for
β1 and β2 (cells H16:I18 below):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.620485472
R Square             0.385002221
Adjusted R Square    0.368818069
Standard Error       89.51700429
Observations         40

ANOVA
              df    SS             MS             F              Significance F
Regression    1     190626.9788    190626.9788    23.78884107    1.94586E-05
Residual      38    304505.1742    8013.294058
Total         39    495132.153

               Coefficients   Standard Error   t Stat    P-value        Lower 95%   Upper 95%   Lower 99.0%   Upper 99.0%
Intercept      83.4160        43.4102          1.9216    0.0622         -4.4633     171.2953    -34.2931      201.1252
X Variable 1   10.2096        2.0933           4.8774    1.94586E-05    5.9721      14.4472     4.5336        15.8856
The interpretation of confidence intervals requires a great deal of care. The true meaning of being
95% or 99% confident about our interval estimates is that, if we were to repeat many more times
this exercise of drawing a sample of size N = 40, estimating the least squares regression
parameters, and constructing interval estimates for those parameters, then 95% or 99% of all the
interval estimates constructed this way would contain the true parameter values. To illustrate this
concept we return to our simulation exercise of Section 2.4.4.
In Section 2.4.4 we drew many random samples of size N = 40, and, based on each, estimated
the corresponding least squares regression parameters. We can repeat this exercise and extend it
to compute, for each sample, not only least squares estimates, but interval estimates as well.
Note that in Section 3.1.4 of Principles of Econometrics, 4e, 10 samples were randomly drawn
from a population with unknown parameters, while in this section we will draw 100 samples from
a population with known parameters.
In the simulation exercise we are considering in this section, half of our hypothetical population
of three person households has a weekly income of $1000 (x = 10), and half of it has a weekly
income of $2000 (x = 20). Because we know the data generation process, we know the values of
population parameters for the normal distribution, and consequently the values of our regression
parameters. Let μ(y|x=10) = 200, μ(y|x=20) = 300, and var(y|x = 10) = var(y|x = 20) = σ² =
2500. This implies β1 = 100 and β2 = 10.
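The step from the two conditional means to the regression parameters can be written out explicitly. The short derivation below assumes the simple linear regression model E(y|x) = β1 + β2x used throughout this chapter:

```latex
\begin{aligned}
E(y \mid x = 10) &= \beta_1 + 10\,\beta_2 = 200 \\
E(y \mid x = 20) &= \beta_1 + 20\,\beta_2 = 300 \\
\Rightarrow\quad \beta_2 &= \frac{300 - 200}{20 - 10} = 10,
\qquad \beta_1 = 200 - 10 \times 10 = 100
\end{aligned}
```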
We will draw random samples of 40 households from our population. Half of each sample will be
drawn from the first type of households, with weekly income x = 10; and half of each sample
will be drawn from the second type of households, with weekly income x = 20.
First, insert a new worksheet in your workbook by selecting the Insert Worksheet tab at the
bottom of your screen, next to the Interval Estimate tab. Name it Simulation.
Let us keep records of the level of weekly income for our 40 households in column A of our
Simulation worksheet: in cell A1, type x and Right-Align it; in cells A2:A21, record the value
10; in cells A22:A41, record the value 20.
Next, use the Random Number Generation analysis tool to draw 100 random samples of
households.
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
A Random Number Generation dialog box pops up. Since we are drawing 100 random
samples, we specify 100 in the Number of Variables window. We first draw random samples of
20 from households with weekly income of x = 10, so we specify the Number of Random
Numbers to be 20. For simplicity we assumed that our population of households is normally
distributed, so this is the distribution we choose. Once you have selected Normal in the
Distribution window, you will be able to specify its Parameters: for x = 10, its Mean is
μ(y|x=10) = 200 and its Standard Deviation is √var(y|x = 10) = σ = 50. Select the Output
Range in the Output options section, and specify it to be B2:CW21. Finally, select OK.
Repeat to draw a random sample of 20 from households with weekly income of x = 20. Change
the Mean to μ(y|x=20) = 300 and the Output Range to B22:CW41.
This time we use the LINEST function to obtain the least squares estimates and their standard
errors. The LINEST function can compute the latter, if you ask it to return additional regression
statistics. For this purpose, the general syntax of the LINEST function is as follows:
=LINEST(y-values,x-values,,TRUE)
The first argument of the LINEST function specifies the y values; the second argument specifies the
x values; we ignore the third argument by leaving it blank between the second and third commas;
and the fourth argument, TRUE, indicates that we would like LINEST to return additional
regression statistics.
The LINEST function creates a table in Excel memory where it stores the least squares estimates
and their standard errors. The following illustration shows the order in which they are reported:
          column 1    column 2
row 1     b2          b1
row 2     se(b2)      se(b1)
We nest the LINEST function in the INDEX function to get the estimated coefficients, one at a
time. The INDEX function returns values from within a table. The INDEX function general
syntax is as follows:
= INDEX(table of results, row_num, column_num)
The first argument of the INDEX function specifies which table to get the results from. The
second argument and third argument indicate the intersection of a row and a column at which the
result of interest can be found.
b1: =INDEX(LINEST(y-values,x-values,,TRUE),1,2)
se (b1): =INDEX(LINEST(y-values,x-values,,TRUE),2,2)
b2: =INDEX(LINEST(y-values,x-values,,TRUE),1,1)
se (b2): =INDEX(LINEST(y-values,x-values,,TRUE),2,1)
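To make the layout of the LINEST table and the role of INDEX concrete, here is a small Python sketch. The functions linest_simple and index are hypothetical stand-ins written for this illustration (they are not Excel's implementation), and the four data points are made up; the sketch uses the usual least squares formulas for the simple regression model and mimics only the first two rows of the LINEST table.

```python
import math

def linest_simple(y, x):
    """Return [[b2, b1], [se(b2), se(b1)]], the same order as rows 1-2 of LINEST."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx                       # slope
    b1 = y_bar - b2 * x_bar              # intercept
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - 2)               # estimated error variance
    se_b2 = math.sqrt(sigma2 / sxx)
    se_b1 = math.sqrt(sigma2 * sum(xi ** 2 for xi in x) / (n * sxx))
    return [[b2, b1], [se_b2, se_b1]]

def index(table, row_num, column_num):
    """Pick one value from a table using 1-based row and column numbers, like INDEX."""
    return table[row_num - 1][column_num - 1]

table = linest_simple([2, 4, 5, 8], [1, 2, 3, 4])   # made-up data
b1 = index(table, 1, 2)       # row 1, column 2 -> intercept
se_b1 = index(table, 2, 2)    # row 2, column 2 -> se of intercept
b2 = index(table, 1, 1)       # row 1, column 1 -> slope
se_b2 = index(table, 2, 1)    # row 2, column 1 -> se of slope
print(b1, b2, se_b1, se_b2)
```

Note that the slope comes first in row 1, which is why the intercept is retrieved from column 2.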
We will report our estimated coefficients and standard errors at the bottom of our table of random
samples. We will also compute our t-critical value and limits of our interval estimates (Lower
Limit: LL and Upper Limit: UL). Finally, we would like to count how many of our 100 interval
estimates contain the true parameters' values.
We will specify cells A42:B57 as shown below (we outlined some cells in different shades of
gray only to distinguish groups of similar or related cells which we comment on shortly):
      A            B
42    N =          40
43    α =          0.05
44    m =          =B42-2
45    tc =         =TINV(B43,B44)
46    b1 =         =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),1,2)
47    se(b1) =     =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),2,2)
48    LL =         =B46-$B$45*B47
49    UL =         =B46+$B$45*B47
50    β1 in CI     =IF(OR(100<B48,100>B49),"No","Yes")
51    Yes'         =COUNTIF(B50:CW50,"Yes")
52    b2 =         =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),1,1)
53    se(b2) =     =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),2,1)
54    LL =         =B52-$B$45*B53
55    UL =         =B52+$B$45*B53
56    β2 in CI     =IF(OR(10<B54,10>B55),"No","Yes")
57    Yes'         =COUNTIF(B56:CW56,"Yes")
In cells A42:B43, the N (sample size) and α values are specified so that m (degrees of freedom)
and tc (t-critical value) can be computed and reported in cells A44:B45. tc is computed as shown
in Section 3.1.1d.
Cells A46:B47 and A52:B53 are used to report and compute coefficient and standard error
estimates, as explained in Section 3.1.4c. The cell references to the x values are in Absolute
format, $A$2:$A$41, as opposed to Relative format, as we will be using the same x values for
all 100 repetitions.
Cells A48:B49 and A54:B55 are used to report and compute interval estimates, as explained in
Section 3.1.2. The value for tc will be the same over all repetitions; its cell reference is thus in
Absolute format, $B$45, in the formulas of the interval limits.
We make use of the IF and OR logical functions to indicate, for each interval estimate, whether
or not it contains the true parameter value. The general syntax for the IF function is as follows:
IF(logical_test,value_if_true,value_if_false)
Logical_test is any value or expression that can be evaluated to be TRUE or FALSE. In this
exercise we want to determine whether or not the true parameter value, βk, is within the estimated
interval [LL, UL], where LL = bk - tc se(bk) and UL = bk + tc se(bk). The logical expression
we use is: βk < LL or βk > UL. If βk is outside [LL, UL], then this expression is TRUE.
Otherwise, the expression is FALSE.
Value_if_true is the value that is returned if logical_test is TRUE. For example, if this argument
is the text string "No" and the logical_test argument is TRUE, then the IF function displays the
text "No".
Value_if_false is the value that is returned if logical_test is FALSE. For example, if this
argument is the text string "Yes" and the logical_test argument is FALSE, then the IF function
displays the text "Yes".
We use the OR function to write our logical test. The general syntax of the OR function is as
follows:
OR(argument_1,argument_2)
If the first logical expression, argument_1, or the second logical expression, argument_2, is
TRUE, then the OR function returns TRUE. It returns FALSE only if both arguments are
FALSE.
The general syntax for the OR function, nested in the IF function, is:
IF(OR(argument_1,argument_2),value_if_true,value_if_false)
Applied to our exercise, the nested function looks like this (which is what we have in cell B56):
=IF(OR(10<B54,10>B55),"No","Yes")
If βk is outside [LL, UL], then the logical test βk < LL or βk > UL is TRUE, and "No" is
returned to indicate that βk is not in the estimated confidence interval. Otherwise, the logical
expression is FALSE, and "Yes" is returned to indicate that βk is in the estimated confidence
interval.
Finally, we use the COUNTIF function to count the number of times βk is found within the
estimated interval [LL, UL].
The COUNTIF function is a statistical function that counts the number of cells within a range
that meet a given criterion. Its general syntax is:
COUNTIF(cell_range,criteria)
Cell_range is one or more cells to count. Criteria is the number, expression, cell reference, or
text that defines which cells will be counted. Since we are interested in counting how many
interval estimates, among all the ones we will construct, actually contain the true parameter value,
we will count the "Yes" that are generated following the application of our logical test (this is
what we do in cell B57):
COUNTIF(cell_range,"Yes")
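The same logic can be sketched in Python, which may help when checking the nested formula. The function name in_ci and the five pairs of interval limits below are assumptions made up for illustration; the true intercept value of 100 is the one used in this simulation.

```python
def in_ci(beta, ll, ul):
    """Mirror of IF(OR(beta<LL, beta>UL), "No", "Yes")."""
    return "No" if (beta < ll or beta > ul) else "Yes"

# Illustrative lower and upper limits for five interval estimates of the intercept
lower = [5.86, 83.33, -1.78, 62.76, 80.06]
upper = [121.39, 172.98, 95.43, 159.19, 191.07]

flags = [in_ci(100, ll, ul) for ll, ul in zip(lower, upper)]
yes_count = flags.count("Yes")     # mirror of COUNTIF(cell_range, "Yes")
print(flags, yes_count)
```

Only the third interval, whose upper limit is below 100, is flagged "No", so four of the five intervals are counted.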
Once you have reviewed and understood the formulas and values from B42:B57, you can copy
the content of B46:B50 to C46:CW50 and copy the content of B52:B56 to C52:CW56.
Here is how our worksheet looks (only 10 out of 100 simulation results are shown below, rounded
to two decimal places):
      A           B        C        D        E        F        G        H        I        J        K
42    N =         40
43    α =         0.05
44    m =         38
45    tc =        2.024394
46    b1 =        63.63    128.16   46.83    110.97   135.56   85.48    93.69    89.25    117.05   119.48
47    se(b1) =    28.53    22.14    24.01    23.82    27.42    26.52    19.24    19.19    27.80    22.42
48    LL =        5.86     83.33    -1.78    62.76    80.06    31.79    54.74    50.40    60.77    74.10
49    UL =        121.39   172.98   95.43    159.19   191.07   139.18   132.65   128.10   173.32   164.87
50    β1 in CI    Yes      Yes      No       Yes      Yes      Yes      Yes      Yes      Yes      Yes
51    Yes'        98
52    b2 =        12.32    7.22     13.32    9.30     8.06     11.07    10.90    10.74    9.01     8.55
53    se(b2) =    1.80     1.40     1.52     1.51     1.73     1.68     1.22     1.21     1.76     1.42
54    LL =        8.67     4.38     10.24    6.25     4.55     7.67     8.44     8.29     5.45     5.68
55    UL =        15.97    10.05    16.39    12.35    11.57    14.47    13.37    13.20    12.57    11.42
56    β2 in CI    Yes      Yes      No       Yes      Yes      Yes      Yes      Yes      Yes      Yes
57    Yes'        98
We find that 98 out of our 100 confidence intervals contain the true parameter value, both for our
intercept and slope coefficient confidence intervals. Note that you will draw different random
samples, obtain different interval estimates and thus obtain a different number of intervals that
contain the true parameter values.
We first extended our repetitions to 1,000 samples, and found that 959 out of 1,000 interval
estimates contained β1, and 962 out of 1,000 interval estimates contained β2. Finally, we
extended the repetitions to 10,000 samples and found that 95.08% of both the intercept and slope
coefficient interval estimates contained the true parameter values.
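The whole repeated-sampling exercise can also be sketched in Python for readers who want to try larger numbers of repetitions. This is a minimal sketch under stated assumptions, not the worksheet itself: the function names ols and coverage and the seed are hypothetical, random draws come from Python's random module rather than Excel's Random Number Generation tool, and tc = 2.024394 is the 95% critical value with 38 degrees of freedom.

```python
import math
import random

def ols(y, x):
    """Least squares estimates and standard errors: (b1, b2, se_b1, se_b2)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    sigma2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b2 = math.sqrt(sigma2 / sxx)
    se_b1 = math.sqrt(sigma2 * sum(xi ** 2 for xi in x) / (n * sxx))
    return b1, b2, se_b1, se_b2

def coverage(n_samples=100, tc=2.024394, seed=12345):
    """Count how many interval estimates contain beta1 = 100 and beta2 = 10."""
    rng = random.Random(seed)
    x = [10] * 20 + [20] * 20                      # 20 households of each type
    hits_b1 = hits_b2 = 0
    for _ in range(n_samples):
        y = ([rng.gauss(200, 50) for _ in range(20)] +   # mean 200 when x = 10
             [rng.gauss(300, 50) for _ in range(20)])    # mean 300 when x = 20
        b1, b2, se_b1, se_b2 = ols(y, x)
        if b1 - tc * se_b1 <= 100 <= b1 + tc * se_b1:
            hits_b1 += 1
        if b2 - tc * se_b2 <= 10 <= b2 + tc * se_b2:
            hits_b2 += 1
    return hits_b1, hits_b2

print(coverage(100))
print(coverage(1000))
```

As in the worksheet, the counts depend on the random draws, but with many repetitions the share of intervals containing the true values should settle near 95%.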
In the next section of this chapter, we will perform hypothesis tests. To go over examples of
hypothesis tests, we return to our simple linear regression model of weekly food
expenditure.
If the null hypothesis H0: βk = c is true, then the test statistic t = (bk - c)/se(bk) follows a
t-distribution with m = N - 2 degrees of freedom:
$$t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-2)} \tag{3.4}$$
When we reject H0, we accept a logical alternative hypothesis H1. There are three possible
alternative hypotheses to H0:
$$H_1: \beta_k > c \tag{3.5}$$
$$H_1: \beta_k < c \tag{3.6}$$
$$H_1: \beta_k \ne c \tag{3.7}$$
[Figure: t(m) density showing the right-tail rejection region (reject H0: βk = c) and the region where H0: βk = c is not rejected]
Note that in this case the probability is α that a randomly drawn t-value is equal to or greater than
tc, where tc is defined as the lower limit of the right-tail of the distribution shown in the graph
above.
If the alternative hypothesis (3.6) is true, then the value of the computed test statistic will tend to
be unusually small. We will reject H0 if the test statistic is in the left-tail of the distribution.
[Figure: t(m) density showing the left-tail rejection region]
Note that in this case the probability is α that a randomly drawn t-value is equal to or less than tc,
where tc is defined as the upper limit of the left-tail of the distribution shown in the graph above.
Note that in this case the probability is α that a randomly drawn t-value will fall in the tails of the
distribution, either equal to or less than t(α/2,N-2) or equal to or greater than t(1-α/2,N-2). Those
limits are shown in the graph above. (Note that those limits correspond to values -tc and tc first
defined in Section 3.1.1b.)
We illustrate the mechanics of hypothesis testing using the food expenditure model. We give
examples of right-tail, left-tail, and two-tail tests. Note that when the null hypothesis of a test is
that the parameter is zero, the test is called a test of significance. We can have one-tail tests of
significance or two-tail tests of significance.
Recall our estimated regression model; below the estimated values for b1 and b2, we report their
estimated standard errors, se(b1) and se(b2):
ŷ = 83.42 + 10.21x
(se) (43.41) (2.09)
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Simulation tab. Name it Right-Tail Tests.
     A                 B               C
1    Data Input        N =             =Regression!B8
2                      bk =            =Regression!B18
3                      se(bk) =        =Regression!C18
4                      Ho: βk =
5                      α =
6
7    Computed Values   df or m =       =C1-2
8                      tc =            =TINV(C5*2,C7)
9
10   Right-Tail Test   t-statistic =   =(C2-C4)/C3
11                     Conclusion:     =IF(C10>=C8,"Reject Ho","Do Not Reject Ho")
We get the sample size N, the estimated coefficient b2, and its standard error se(b2) from our Regression worksheet. In each of cells C1:C3, first type the equal sign, then select the needed value in the Regression worksheet with your cursor, and press Enter.
We are performing hypothesis tests on the slope parameter, β2. Cells C4:C5 are left blank for now. Later, you will specify the value you hypothesize β2 takes, as well as the level of significance of your test (α). In cell C7, the degrees of freedom are set equal to N - 2, where N is the sample size.
Cell C8 is where the critical value for the right-tail rejection region is computed. Recall that all the probability α of rejecting H0 is in the right tail of the distribution, at values greater than or equal to tc. The TINV function, on the other hand, is two-tailed: TINV(α, m) gives us a tc value such that P(t(m) > tc) = α/2. So, to get the correct critical value for the right-tail rejection region, we multiply the specified α value by 2 in the TINV function (half of α × 2 is α, which is what we want).
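The effect of doubling α inside TINV can be checked outside Excel. The following is a minimal Python sketch, not Excel's own algorithm: it assumes the two-tailed inverse can be approximated by integrating the t density with Simpson's rule and bisecting on the result. The function names `t_upper_tail` and `tinv` are our own.

```python
import math


def t_upper_tail(x, m, steps=2000):
    """Approximate P(T > x) for x >= 0, where T ~ t(m), with Simpson's rule."""
    coef = math.gamma((m + 1) / 2) / (math.sqrt(m * math.pi) * math.gamma(m / 2))

    def pdf(u):
        return coef * (1 + u * u / m) ** (-(m + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # Area to the right of x is one half minus the area on [0, x].
    return 0.5 - total * h / 3


def tinv(p, m):
    """Two-tailed inverse in the spirit of TINV: returns tc with P(|T| > tc) = p."""
    lo, hi = 0.0, 50.0  # assumes the answer lies below 50
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if 2 * t_upper_tail(mid, m) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha, m = 0.05, 38
print(tinv(alpha * 2, m))  # one-tail critical value, as in cell C8
print(tinv(alpha, m))      # two-tail critical value
```

With α = 0.05 and 38 degrees of freedom, the two printed values should agree to several decimals with what Excel returns for =TINV(0.1,38) and =TINV(0.05,38).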
Cell C10 is where the test statistic t is computed, by plugging the least squares estimate and its standard error into the equation for t in (3.4).
Finally, in cell C11, we use the IF function to determine whether or not our t-statistic falls into the rejection region. If it does, we reject our null hypothesis; if it does not, we do not reject it (see Section 3.1.4e for details on how the IF logical function works).
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        0
5                      α =             0.05
6
7    Computed Values   df or m =       38
8                      tc =            1.685954
9
10   Right-Tail Test   t-statistic =   4.877381
11                     Conclusion:     Reject Ho
Let α = 0.01; H0: β2 ≤ 5.5 and H1: β2 > 5.5.
Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≤ 5.5 against the alternative hypothesis H1: β2 > 5.5 is exactly the same as testing H0: β2 = 5.5 against the alternative hypothesis H1: β2 > 5.5.
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        5.5
5                      α =             0.01
6
7    Computed Values   df or m =       38
8                      tc =            2.428568
9
10   Right-Tail Test   t-statistic =   2.249904
11                     Conclusion:     Do Not Reject Ho
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Right-Tail Tests tab. Name it Left-Tail Tests.
The left-tail test template will be very similar to the right-tail test template. You can copy cells A1:C11 from the Right-Tail Tests worksheet to cells A1:C11 in the Left-Tail Tests worksheet. Alternatively, you can select the whole Right-Tail Tests worksheet by left-clicking on the upper left corner of the worksheet; your cursor should turn into a fat cross.
Select Copy. Left-click in cell A1 of the Left-Tail Tests worksheet, and select Paste.
You will need to make just a few modifications to create the following left-tail test template:
     A                 B               C
1    Data Input        N =             =Regression!B8
2                      bk =            =Regression!B18
3                      se(bk) =        =Regression!C18
4                      Ho: βk =
5                      α =
6
7    Computed Values   df or m =       =C1-2
8                      tc =            =-TINV(C5*2,C7)
9
10   Left-Tail Test    t-statistic =   =(C2-C4)/C3
11                     Conclusion:     =IF(C10<=C8,"Reject Ho","Do Not Reject Ho")
The rejection region for a left-tail test is the mirror image of the rejection region for a right-tail test; it is in the left tail instead of the right tail of the distribution. The critical value for a left-tail test is thus the negative of the critical value for a right-tail test: in cell C8, we precede the TINV function by a minus sign to reflect that.
In a left-tail test, we reject our null hypothesis if our t-statistic is less than or equal to our critical value, not greater than or equal to our critical value as is the case in a right-tail test; we adjust the equation in C11 accordingly.
Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≥ 15 against the alternative hypothesis H1: β2 < 15 is exactly the same as testing H0: β2 = 15 against the alternative hypothesis H1: β2 < 15.
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        15
5                      α =             0.05
6
7    Computed Values   df or m =       38
8                      tc =            -1.685954
9
10   Left-Tail Test    t-statistic =   -2.288464
11                     Conclusion:     Reject Ho
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Left-Tail Tests tab. Name it Two-Tail Tests.
The two-tail test template will also be very similar to the right-tail test template. You can copy cells A1:C11 from the Right-Tail Tests worksheet to cells A1:C11 in the Two-Tail Tests worksheet. Alternatively, you can select the whole Right-Tail Tests worksheet and copy it into the Two-Tail Tests worksheet.
You will need to make just a few modifications to create the following two-tail test template:
     A                 B               C
1    Data Input        N =             =Regression!B8
2                      bk =            =Regression!B18
3                      se(bk) =        =Regression!C18
4                      Ho: βk =
5                      α =
6
7    Computed Values   df or m =       =C1-2
8                      tc =            =TINV(C5,C7)
9
10   Two-Tail Test     t-statistic =   =(C2-C4)/C3
11                     Conclusion:     =IF(OR(C10<=-C8,C10>=C8),"Reject Ho","Do Not Reject Ho")
The rejection region for a two-tail test is split in half between the left tail and the right tail of the distribution: only α/2 of the probability is in each tail of the distribution. So, we no longer need to multiply α by 2 in the TINV function: delete *2 in cell C8.
In a two-tail test, we reject our null hypothesis if our t-statistic is less than or equal to the left-tail critical value, or greater than or equal to the right-tail critical value: we adjust the equation in C11 to reflect that (see Section 3.1.4f for details on how the OR logical function works).
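The three decision rules entered in cell C11 of the right-tail, left-tail, and two-tail templates can be summarized in a few lines. Here is a minimal Python sketch; the function names are our own, and the positive critical value tc is taken as given (for example, read from Excel) rather than computed.

```python
def t_statistic(b, c, se):
    """Test statistic t = (bk - c) / se(bk), as in cell C10."""
    return (b - c) / se


def decide(t, tc, tail):
    """Apply the rejection rule; tc is the positive critical value.

    tail is 'right', 'left' or 'two', matching the three templates.
    """
    if tail == "right":
        reject = t >= tc
    elif tail == "left":
        reject = t <= -tc
    elif tail == "two":
        reject = t <= -tc or t >= tc
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return "Reject Ho" if reject else "Do Not Reject Ho"


b2, se_b2 = 10.20964, 2.093263  # food expenditure estimates

print(decide(t_statistic(b2, 0, se_b2), 1.685954, "right"))
print(decide(t_statistic(b2, 15, se_b2), 1.685954, "left"))
print(decide(t_statistic(b2, 7.5, se_b2), 2.024394, "two"))
```

Note that only the comparison changes from one template to the next; the t-statistic itself is computed the same way in all three.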
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        7.5
5                      α =             0.05
6
7    Computed Values   df or m =       38
8                      tc =            2.024394
9
10   Two-Tail Test     t-statistic =   1.294458
11                     Conclusion:     Do Not Reject Ho
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        0
5                      α =             0.05
6
7    Computed Values   df or m =       38
8                      tc =            2.024394
9
10   Two-Tail Test     t-statistic =   4.877381
11                     Conclusion:     Reject Ho
Note that the t-statistic in a two-tail test of significance is equal to the t-statistic in a one-tail test of significance (compare the t-statistic value above to the one obtained in Section 3.3.1a). Also note that this t-statistic value for tests of significance is reported in the regression summary output generated by Excel.
Go back to your Regression worksheet. If you do not see your Regression tab, it is because it is hidden. Use either one of the left-arrows at the left corner of your screen so that the first worksheets you were working with can be seen again.
Column D of the last table of the summary output presents the t-statistic values for tests of significance of the intercept and slope parameters, β1 and β2 (shaded cells below).
[Screenshot: SUMMARY OUTPUT of the Regression worksheet, with the t Stat values in column D shaded]
When reporting the outcome of statistical hypothesis tests, it has become standard practice to
report the p-value (an abbreviation for probability value) of the test. If we have the p-value of a
test, we can determine the outcome of the test by comparing p to the chosen level of significance,
a. This is an alternative to comparing the test-statistic value to the critical value(s) or limit(s) of
the rejection region for a test.
In order to explain the p-value decision rule for hypothesis tests, we first give a definition of the
p-value.
How the p-value is computed depends on the alternative hypothesis of our test. If H1: βk > c, p is the probability that a t-value is equal to or greater than the test statistic value t (the area to the right of t).
If H1: βk < c, p is the probability that a t-value is equal to or less than the test statistic value t (the area to the left of t).
If H1: βk ≠ c, p is the probability that a t-value is equal to or less than -|t| or equal to or greater than |t|, where t is the test statistic value (an area of p/2 in each tail).
We can see that when the test statistic value t falls into the rejection region, its p-value is less than, or equal to, the level of significance α.
For H1: βk > c: if t ≥ tc, where tc = t(1-α, N-2), t is in the rejection region and p ≤ α. The case illustrated below is where t > tc, and p < α. H0 is rejected.
[Figure: right-tail rejection region beyond tc = t(1-α, N-2), with the test statistic t inside it]
For H1: βk < c: if t ≤ tc, where tc = t(α, N-2), t is in the rejection region and p ≤ α. The case illustrated below is where t < tc, and p < α. H0 is rejected.
[Figure: left-tail rejection region below tc = t(α, N-2), with the test statistic t inside it]
For H1: βk ≠ c: if t ≤ t(α/2, N-2) in the left tail of the distribution or t ≥ t(1-α/2, N-2) in the right tail of the distribution, t is in the rejection region and p ≤ α.
The case illustrated below is where t > tc in the right tail of the distribution, and p < α. H0 is rejected.
[Figure: two-tail rejection regions of area α/2 each, with the test statistic t in the right tail]
The case illustrated below is where t < tc in the left tail of the distribution, and p < α. H0 is rejected.
[Figure: two-tail rejection regions, with the test statistic t in the left tail and an area of p/2 beyond it]
We can thus compare the p-value of a test, p, to the chosen level of significance, α, and determine the outcome of our hypothesis test: if p ≤ α, we reject H0 and accept H1; if p > α, we do not reject H0. This is the p-value rule.
p-values are obtained in Excel by using the TDIST function. For hypothesis testing purposes, the syntax of the TDIST function is as follows:
=TDIST(ABS(t),m,tails)
t is the value of the computed test statistic, ABS is a mathematical function that returns the absolute value of t, m is the degrees of freedom, and tails specifies whether we are seeking the p-value for a one-tail test or a two-tail test. Set tails to 1 for a one-tail test, and set tails to 2 for a two-tail test.
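To see what TDIST returns and how the p-value rule uses it, here is a minimal Python sketch. It is an approximation under our own assumptions (Simpson's rule applied to the t density), not Excel's implementation, and the names `tdist` and `p_value_rule` are hypothetical.

```python
import math


def tdist(x, m, tails, steps=2000):
    """Approximate TDIST(x, m, tails) for x >= 0 using Simpson's rule."""
    if x < 0:
        raise ValueError("x must be non-negative; pass abs(t) as in the templates")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    coef = math.gamma((m + 1) / 2) / (math.sqrt(m * math.pi) * math.gamma(m / 2))

    def pdf(u):
        return coef * (1 + u * u / m) ** (-(m + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    one_tail = 0.5 - total * h / 3  # area beyond x in one tail
    return tails * one_tail


def p_value_rule(p, alpha):
    """Reject H0 when the p-value does not exceed the level of significance."""
    return "Reject Ho" if p <= alpha else "Do Not Reject Ho"


# Right-tail test of H0: beta2 = 5.5 in the food expenditure model
t = (10.20964 - 5.5) / 2.093263
p = tdist(abs(t), 38, 1)
print(p, p_value_rule(p, 0.01), p_value_rule(p, 0.05))
```

The same p-value leads to different conclusions at α = 0.01 and α = 0.05, which is why the level of significance must be chosen before looking at the result.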
Go back to your Right-Tail Tests and Left-Tail Tests worksheets and add the following at the
bottom of each template:
     A                 B               C
12                     p-value =       =TDIST(ABS(C10),C7,1)
13                     Conclusion:     =IF(C12<=C5,"Reject Ho","Do Not Reject Ho")
Go back to your Two-Tail Tests worksheet and add the following at the bottom of its template:
     A                 B               C
12                     p-value =       =TDIST(ABS(C10),C7,2)
13                     Conclusion:     =IF(C12<=C5,"Reject Ho","Do Not Reject Ho")
Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≤ 5.5 against the alternative hypothesis H1: β2 > 5.5 is exactly the same as testing H0: β2 = 5.5 against the alternative hypothesis H1: β2 > 5.5.
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        5.5
5                      α =             0.01
6
7    Computed Values   df or m =       38
8                      tc =            2.428568
9
10   Right-Tail Test   t-statistic =   2.249904
11                     Conclusion:     Do Not Reject Ho
12                     p-value =       0.015163
13                     Conclusion:     Do Not Reject Ho
Let α = 0.05.
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        5.5
5                      α =             0.05
6
7    Computed Values   df or m =       38
8                      tc =            1.685954
9
10   Right-Tail Test   t-statistic =   2.249904
11                     Conclusion:     Reject Ho
12                     p-value =       0.015163
13                     Conclusion:     Reject Ho
Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≥ 15 against the alternative hypothesis H1: β2 < 15 is exactly the same as testing H0: β2 = 15 against the alternative hypothesis H1: β2 < 15.
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        15
5                      α =             0.01
6
7    Computed Values   df or m =       38
8                      tc =            -2.428568
9
10   Left-Tail Test    t-statistic =   -2.288464
11                     Conclusion:     Do Not Reject Ho
12                     p-value =       0.013881
13                     Conclusion:     Do Not Reject Ho
Let α = 0.05.
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        15
5                      α =             0.05
6
7    Computed Values   df or m =       38
8                      tc =            -1.685954
9
10   Left-Tail Test    t-statistic =   -2.288464
11                     Conclusion:     Reject Ho
12                     p-value =       0.013881
13                     Conclusion:     Reject Ho
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        7.5
5                      α =             0.05
6
7    Computed Values   df or m =       38
8                      tc =            2.024394
9
10   Two-Tail Test     t-statistic =   1.294458
11                     Conclusion:     Do Not Reject Ho
12                     p-value =       0.203318
13                     Conclusion:     Do Not Reject Ho
     A                 B               C
1    Data Input        N =             40
2                      bk =            10.20964
3                      se(bk) =        2.093263
4                      Ho: βk =        0
5                      α =             0.05
6
7    Computed Values   df or m =       38
8                      tc =            2.024394
9
10   Two-Tail Test     t-statistic =   4.877381
11                     Conclusion:     Reject Ho
12                     p-value =       1.95E-05
13                     Conclusion:     Reject Ho
Note that the p-value for this test is tiny. "1.95E-05" is standard scientific notation for "1.95 times 10 to the power -5":
$$\text{1.95E-05} = 1.95 \times 10^{-5} = 1.95 \times \frac{1}{10^{5}} = 1.95 \times \frac{1}{100{,}000} = 0.0000195$$
Also note that this p-value for the two-tail test of significance is reported in the regression summary output generated by Excel.
Go back to your Regression worksheet. If you do not see your Regression tab, it is because it is hidden. Use either one of the left-arrows at the left corner of your screen so that the first worksheets you were working with can be seen again.
Column E of the last table of the summary output presents the p-values for the two-tail test of significance for the intercept and slope parameters, β1 and β2 (shaded cells below).
SUMMARY OUTPUT

Regression Statistics
Multiple R      0.620485472
R Square        0.385002221
Observations    40

ANOVA
               df    SS            MS            F             Significance F
Regression     1     190626.9788   190626.9788   23.78884107   1.94586E-05
Residual       38    304505.1742   8013.294058
Total          39    495132.153

               Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept      83.41600997    43.41016192      1.921577951   0.062182379   -4.463267721   171.2952877
X Variable 1   10.2096425     2.093263451      4.877380554   1.94586E-05   5.972052202    14.4472328
CHAPTER 4
CHAPTER OUTLINE
4.1 Least Squares Prediction
4.2 Measuring Goodness-of-Fit
  4.2.1 Coefficient of Determination or R2
  4.2.2 Correlation Analysis and R2
  4.2.3 The Food Expenditure Example and the CORREL Function
4.3 The Effects of Scaling the Data
  4.3.1 Changing the Scale of x
  4.3.2 Changing the Scale of y
  4.3.3 Changing the Scale of x and y
4.4 A Linear-Log Food Expenditure Model
  4.4.1 Estimating the Model
  4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship
4.5 Using Diagnostic Residual Plots
  4.5.1 Random Residual Pattern
  4.5.2 Heteroskedastic Residual Pattern
  4.5.3 Detecting Model Specification Errors
4.6 Are the Regression Errors Normally Distributed?
  4.6.1 Histogram of the Residuals
  4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
  4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food Expenditure Model
4.7 Polynomial Models: An Empirical Example
  4.7.1 Scatter Plot of Wheat Yield over Time
  4.7.2 The Linear Equation Model
    4.7.2a Estimating the Model
    4.7.2b Residuals Plot
  4.7.3 The Cubic Equation Model
    4.7.3a Estimating the Model
    4.7.3b Residuals Plot
4.8 Log-Linear Models
  4.8.1 A Growth Model
  4.8.2 A Wage Equation
  4.8.3 Prediction
  4.8.4 A Generalized R2 Measure
  4.8.5 Prediction Intervals
4.9 A Log-Log Model: Poultry Demand Equation
  4.9.1 Estimating the Model
  4.9.2 A Generalized R2 Measure
  4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship
In this chapter we continue to work with the simple linear regression model of weekly food
expenditure to make predictions, compute goodness-of-fit measures, and address modeling issues.
We also work with additional examples.
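A 100(1 - α)% prediction interval at value x0 of the explanatory variable is defined as:
$$\hat{y}_0 \pm t_c\,\mathrm{se}(f) \qquad (4.1)$$
where:
$$\hat{y}_0 = b_1 + b_2 x_0 \qquad (4.2)$$
$$\mathrm{se}(f) = \sqrt{\hat{\sigma}^2 + \frac{\hat{\sigma}^2}{N} + (x_0 - \bar{x})^2\,\widehat{\mathrm{var}}(b_2)} \qquad (4.3)$$
and $\hat{\sigma}^2$ is the estimate of the error variance or mean square residual (MS residual), and $\widehat{\mathrm{var}}(b_2) = [\mathrm{se}(b_2)]^2$.
The lower limit (LL) and upper limit (UL) of the prediction interval are:
$$LL = \hat{y}_0 - t_c\,\mathrm{se}(f) \qquad (4.4)$$
$$UL = \hat{y}_0 + t_c\,\mathrm{se}(f) \qquad (4.5)$$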
Before we create a template to compute prediction intervals, we quickly re-estimate the food
expenditure model; note that this time we also want to generate the residual output. We are
interested in the Predicted Y values generated in this output. Also, since we will use more than
one data set and run more than one regression in this chapter, we will choose to give our data and
regression worksheets more explicit names.
Rename Sheet 1 food data. Re-estimate the regression parameters using Excel Regression
analysis routine as in Section 2.2.2. In the Regression dialog box, the Input Y Range should be
A2:A41, and the Input X Range should be B2:B41. Select New Worksheet Ply and name it
Food Regression; and do check the box next to Residuals.
Next, insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it Prediction Interval.
Create the following template to construct interval estimates. In the last column you will find the
numbers of the equations and the formatting options used, if any, in the template.
     A                     B                         C                                     D
1    Data Input            Sample Size =             ='Food Regression'!B8
2                          Confidence percentage =
3                          x0 =
4                          b1 =                      ='Food Regression'!B17
5                          b2 =                      ='Food Regression'!B18
6                          se(b2) =                  ='Food Regression'!C18
7                          MS residual =             ='Food Regression'!D13
8
9    Computed Values       α =                       =1-C2
10                         df or m =                 =C1-2
11                         tc =                      =TINV(C9,C10)
12                         predicted y0 =            =C4+C5*C3                             (4.2)
13                         x-bar =                   =AVERAGE('food data'!B2:B41)
14                         se(f) =                   =SQRT(C7+C7/C1+((C3-C13)^2)*C6^2)     (4.3)
15
16   Prediction Interval   Lower Limit =             =C12-C11*C14                          (4.4)
17                         Upper Limit =             =C12+C11*C14                          (4.5)
Note that cell C14 squares se(b2), because equation (4.3) uses the estimated variance of b2. With x0 = 20 and a confidence percentage of 0.95, the template returns:
     A                     B                         C
1    Data Input            Sample Size =             40
2                          Confidence percentage =   0.95
3                          x0 =                      20
4                          b1 =                      83.41601
5                          b2 =                      10.20964
6                          se(b2) =                  2.093263
7                          MS residual =             8013.294
8
9    Computed Values       α =                       0.05
10                         df or m =                 38
11                         tc =                      2.024394
12                         predicted y0 =            287.6089
13                         x-bar =                   19.60475
14                         se(f) =                   90.63284
15
16   Prediction Interval   Lower Limit =             104.1323
17                         Upper Limit =             471.0854
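The prediction interval calculation can also be traced step by step outside the worksheet. Below is a minimal Python sketch of equations (4.2) to (4.5); the function name is our own, the critical value tc is taken as given, and x0 = 20 is an assumption inferred from the predicted value shown above. The variance of b2 is obtained by squaring se(b2).

```python
import math


def prediction_interval(b1, b2, se_b2, ms_residual, n, x0, x_bar, tc):
    """Return (predicted y0, se(f), lower limit, upper limit)."""
    y0_hat = b1 + b2 * x0                                   # (4.2)
    var_f = (ms_residual + ms_residual / n
             + (x0 - x_bar) ** 2 * se_b2 ** 2)              # inside the root in (4.3)
    se_f = math.sqrt(var_f)
    return y0_hat, se_f, y0_hat - tc * se_f, y0_hat + tc * se_f  # (4.4), (4.5)


y0_hat, se_f, lower, upper = prediction_interval(
    b1=83.416, b2=10.20964, se_b2=2.093263,
    ms_residual=8013.294, n=40,
    x0=20,             # assumed value of weekly income (in $100)
    x_bar=19.60475, tc=2.024394,
)
print(y0_hat, se_f, lower, upper)
```

Because se(f) is dominated by the MS residual term, the interval stays wide even when x0 is very close to the sample mean of x.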
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \qquad (4.6)$$
where: SSR is the sum of squares due to the regression (SS Regression), SST is the total sum of squares (SS Total),
and SSE is the sum of squared errors or sum of squared residuals (SS Residual).
$$R^2 = r_{xy}^2 \qquad (4.7)$$
R2 can also be computed as the square of the sample correlation coefficient between $y_i$ and $\hat{y}_i = b_1 + b_2 x_i$. This result is valid not only in simple regression models but also in multiple regression models that will be introduced in Chapter 5.
$$R^2 = r_{y\hat{y}}^2 \qquad (4.8)$$
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it Correlation Analysis and R2.
Create the following template (in the last column, you will find the numbers of the equations used
in the template):
     A                 B               C                                                        D
1    Data Input        SS Residual =   ='Food Regression'!C13
2                      SS Total =      ='Food Regression'!C14
3
4    Computed Values   R2 =            =1-C1/C2                                                 (4.6)
5                      rxy =           =CORREL('food data'!B2:B41,'food data'!A2:A41)
6                      r2xy =          =C5^2                                                    (4.7)
7                      ryy-hat =       =CORREL('food data'!A2:A41,'Food Regression'!B25:B64)
8                      r2yy-hat =      =C7^2                                                    (4.8)
The sample correlation coefficients in cells C5 and C7 are computed using the CORREL statistical function. CORREL returns the correlation coefficient between two data sets. The general syntax of this function is:
=CORREL(cell_range1, cell_range2)
In cell C5, we compute the correlation coefficient between the x and y values, which we find in the food data worksheet. In cell C7, we compute the correlation coefficient between the y and y-hat values; the latter are found in the Food Regression worksheet, under the column labeled "Predicted Y" from the residual output.
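The equality of the three measures in (4.6) to (4.8) can be verified numerically. The following minimal Python sketch uses only the first five observations of the food data (so its numbers differ from the full-sample results), and the helper names `correl` and `ols` are our own.

```python
import math


def correl(a, b):
    """Sample correlation coefficient, as CORREL computes it."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def ols(x, y):
    """Least squares intercept b1 and slope b2 for y = b1 + b2*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = (sum((u - mx) * (v - my) for u, v in zip(x, y))
          / sum((u - mx) ** 2 for u in x))
    return my - b2 * mx, b2


income = [3.69, 4.39, 4.75, 6.03, 12.47]             # first five x values
food_exp = [115.22, 135.98, 119.34, 114.96, 187.05]  # first five y values

b1, b2 = ols(income, food_exp)
fitted = [b1 + b2 * x for x in income]
y_bar = sum(food_exp) / len(food_exp)
sse = sum((y - f) ** 2 for y, f in zip(food_exp, fitted))
sst = sum((y - y_bar) ** 2 for y in food_exp)

r2 = 1 - sse / sst                    # (4.6)
r_xy = correl(income, food_exp)       # squared in (4.7)
r_yyhat = correl(food_exp, fitted)    # squared in (4.8)
print(r2, r_xy ** 2, r_yyhat ** 2)

# Full-sample R2 from the SS Residual and SS Total values in the summary output
r2_full = 1 - 304505.1742 / 495132.153
print(r2_full)
```

All three printed values on the first line coincide up to rounding error, whichever sample is used.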
Here are the results you should get (see also p. 138 of Principles of Econometrics, 4e):
     A                 B               C
1    Data Input        SS Residual =   304505.2
2                      SS Total =      495132.2
3
4    Computed Values   R2 =            0.385002
5                      rxy =           0.620485
6                      r2xy =          0.385002
7                      ryy-hat =       0.620485
8                      r2yy-hat =      0.385002
Note that ryy-hat and R2 are actually reported in the summary output of your regression analysis: cells B4:B5, shaded below (ryy-hat is labeled "Multiple R" and R2 is called by its familiar name "R Square").
     A                    B
1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R           0.620485472
5    R Square             0.385002221
6    Adjusted R Square    0.368818069
7    Standard Error       89.51700429
8    Observations         40
In our food data worksheet, weekly food expenditure (y values) is recorded in dollars while weekly income (x values) is recorded in units of $100.
Recall our estimated regression model. Below the estimated values for b1 and b2, we report their estimated standard errors, se(b1) and se(b2):
$$\begin{array}{rcl} \hat{y}_i &=& 83.42 + 10.21\,x_i \\ (\mathrm{se}) && (43.41)\quad\ (2.09) \end{array} \qquad (4.9)$$
Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $100, weekly food expenditure is expected
to increase by 10.21 units, i.e. $10.21. The interpretation of the estimated intercept coefficient is
as follows: weekly food expenditure for a household with zero income is estimated at $83.42.
Let x* = 100x. We change the scale of measurement of our x values so that weekly income is
now recorded in dollars.
Go back to your food data worksheet. In D1, enter the column label x*=100x. In cell D2, enter the formula =100*B2; copy it to cells D3:D41. Here is how your table should look (only the first five values are shown below):
     A          B        C    D
1    food_exp   income        x*=100x
2    115.22     3.69          369
3    135.98     4.39          439
4    119.34     4.75          475
5    114.96     6.03          603
6    187.05     12.47         1247
We want to re-estimate the food expenditure model using our original y values and our re-scaled
x* values.
In the Regression dialog box, the Input Y Range should be A2:A41, and the Input X Range should be D2:D41. Select New Worksheet Ply and name it Food Regression 100x (you do not need to select Residuals).
The estimated regression model is now:
$$\begin{array}{rcl} \hat{y}_i &=& 83.42 + 0.1021\,x_i^* \\ (\mathrm{se}) && (43.41)\quad\ (0.0209) \end{array} \qquad (4.10)$$
Given the units of measurement of the data, the interpretation of the estimated slope coefficient is as follows: as weekly income increases by 1 unit, i.e. $1, weekly food expenditure is expected to increase by 0.1021 units, i.e. $0.1021 or 10.21 cents. Note that this is equivalent to saying that as weekly income increases by $100, weekly food expenditure is expected to increase by $10.21; rescaling the data does not affect the measurement of the underlying relationship.
Go back to your food data worksheet. In E1, enter the column label y*=y/100. In cell E2, enter the formula =A2/100; copy it to cells E3:E41. Here is how your table should look (only the first five values are shown below):
     A          B        C    D         E
1    food_exp   income        x*=100x   y*=y/100
2    115.22     3.69          369       1.1522
3    135.98     4.39          439       1.3598
4    119.34     4.75          475       1.1934
5    114.96     6.03          603       1.1496
6    187.05     12.47         1247      1.8705
We want to re-estimate the food expenditure model using our original x values and our re-scaled
y* values.
In the Regression dialog box, the Input Y Range should be E2:E41, and the Input X Range
should be B2:B41. Select New Worksheet Ply and name it Food Regression divided by 100.
The estimated regression model is now:
$$\hat{y}_i^* = 0.8342 + 0.1021\,x_i \qquad (4.11)$$
Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $100, weekly food expenditure is expected
to increase by 0.1021 of a $100 unit, i.e. $10.21. The interpretation of the estimated intercept
coefficient is as follows: weekly food expenditure for a household with zero income is estimated
at 0.8342 of a $100 unit, i.e. $83.42. Again, note that rescaling the data does not affect the
measurement of the underlying relationship.
Go back to your food data worksheet. In F1, enter the column label x*=4x. In G1, enter the column label y*=4y. In cell F2, enter the formula =4*B2. In cell G2, enter the formula =4*A2. Copy the content of cells F2:G2 to cells F3:G41. Here is how your table should look (only the first five values are shown below):
     A          B        C    D         E          F        G
1    food_exp   income        x*=100x   y*=y/100   x*=4x    y*=4y
2    115.22     3.69          369       1.1522     14.76    460.88
3    135.98     4.39          439       1.3598     17.56    543.92
4    119.34     4.75          475       1.1934     19       477.36
5    114.96     6.03          603       1.1496     24.12    459.84
6    187.05     12.47         1247      1.8705     49.88    748.2
We want to re-estimate the food expenditure model using our newly rescaled x* and y* values.
In the Regression dialog box, the Input Y Range should be G2:G41, and the Input X Range
should be F2:F41. Select New Worksheet Ply and name it Regression 4x and 4y.
The estimated regression model is now:
$$\hat{y}_i^* = 333.66 + 10.21\,x_i^* \qquad (4.12)$$
Given the units of measurement of the data, the interpretation of the estimated slope coefficient is as follows: as monthly income increases by 1 unit, i.e. $100, monthly food expenditure is expected to increase by 10.21 units, i.e. $10.21. The estimated monthly food expenditure for a household with zero income is $333.66; this is 4 times the estimated weekly food expenditure for a household with zero income (see Section 4.3.1). Again, rescaling the data did not affect the measurement of the underlying relationship.
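The three rescaling results of this section can be reproduced with a few lines of code. Here is a minimal Python sketch that uses only the first five observations of the food data for illustration (so the estimates differ from the full-sample ones); the helper `ols` is our own.

```python
def ols(x, y):
    """Least squares intercept b1 and slope b2 for y = b1 + b2*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = (sum((u - mx) * (v - my) for u, v in zip(x, y))
          / sum((u - mx) ** 2 for u in x))
    return my - b2 * mx, b2


income = [3.69, 4.39, 4.75, 6.03, 12.47]             # x, in units of $100
food_exp = [115.22, 135.98, 119.34, 114.96, 187.05]  # y, in dollars

b1, b2 = ols(income, food_exp)

# x* = 100x: the slope is divided by 100, the intercept is unchanged
b1_x, b2_x = ols([100 * x for x in income], food_exp)

# y* = y/100: both the intercept and the slope are divided by 100
b1_y, b2_y = ols(income, [y / 100 for y in food_exp])

# x* = 4x and y* = 4y: the slope is unchanged, the intercept is multiplied by 4
b1_xy, b2_xy = ols([4 * x for x in income], [4 * y for y in food_exp])

print(b1, b2)
print(b1_x, b2_x)
print(b1_y, b2_y)
print(b1_xy, b2_xy)
```

In every case the fitted relationship between income and food expenditure is the same once the units are converted back.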
In your food data worksheet, insert a column to the right of the income column B (see Section
1.4 for more details on how to do that). In cells Cl:C2, enter the following column label and
formula.
c
1 ln(income)
2 =ln(B2)
Copy the content of cell C2 to cells C3:C41.
[Screenshot: food data worksheet with the new ln(income) column C; only the first five values are shown]
In the Regression dialog box, the Input Y Range should be A2:A41, and the Input X Range
should be C2:C41. Select New Worksheet Ply, name it Log-Linear Food Model and do check
the box next to Residuals. Finally select OK.
The result is (matching the one reported on p. 144 of Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R      0.597084978
R Square        0.356510471

ANOVA
               df    SS            MS
Residual       38    318612.3559   8384.535682
Total          39    495132.153

               Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      -97.18641517    84.23744235      -1.153719919   0.255620028   -267.7162004   73.34337005
X Variable 1   132.1658424     28.80461184      4.588357       4.75993E-05   73.85395477    190.47773
Note that your ANOVA table should be followed by a RESIDUAL OUTPUT table. This last
table contains a column of Predicted Y or fitted values and a column of Residuals values. We use
the fitted values in the next section.
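The Predicted Y and Residuals columns follow directly from the estimated coefficients. As a minimal Python sketch (the function name is our own; the coefficient values are the ones reported in the summary output), the fitted value for a household is the intercept plus the slope times the natural log of income:

```python
import math

B1 = -97.18641517  # estimated intercept
B2 = 132.1658424   # estimated coefficient on ln(income)


def fitted_food_exp(income):
    """Fitted weekly food expenditure from the linear-log model."""
    return B1 + B2 * math.log(income)


# First two observations of the food data: (income in $100, food_exp in $)
for income, food_exp in [(3.69, 115.22), (4.39, 135.98)]:
    y_hat = fitted_food_exp(income)
    print(income, y_hat, food_exp - y_hat)  # x, Predicted Y, Residual
```

The printed pairs should agree with the first two rows of the RESIDUAL OUTPUT table.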
     A             B             C
22   RESIDUAL OUTPUT
23
24   Observation   Predicted Y   Residuals
25   1             75.37280548   39.84719452
26   2             98.33048227   37.64951773
27   3             108.7470848   10.59291519
28   4             140.282168    -25.32216803
29   5             236.3110564   -49.26105644
Go back to your food data worksheet and select A2:B41. Select the Insert tab located next to the
Home tab. In the Charts group of commands select Scatter, and then Scatter with only
Markers.
You can see that our food expenditure values are on the horizontal axis and income values are on the vertical axis; we would like to switch them and edit our chart as we did in Section 2.1.
The result is (see also Figure 4.6 on p. 144 of Principles of Econometrics, 4e):
[Figure: scatter plot of weekly food expenditure in $ (vertical axis) against weekly income in $100 (horizontal axis)]
Finally, we add the fitted linear-log relationship to our scatter plot. Right-click in the middle of the chart area of your scatter plot and select Select Data. In the Legend Entries (Series) window of the Select Data Source dialog box, select the Add button. In the Series name window, type Fitted Linear-Log Relationship. Select B2:B41, from the food data worksheet, for the Series X values; select B25:B64, from the Log-Linear Food Model worksheet (Predicted Y values), for the Series Y values. Finally, select OK. The Fitted Linear-Log Relationship series has been added to your graph.
Before you close the Select Data Source dialog box, select Series1 and Edit. Type the name Actual in the Series name window. Select OK. In the Select Data Source window that reappears, select OK again.
Make sure your chart is selected so that the Chart Tools are visible. In the Layout tab, go to the Labels group of commands. Select the Legend button and choose either one of the Overlay Legend options. Grab your legend with your cursor and move it to the upper left corner of your chart area.
Finally, we want to reformat our Fitted Linear-Log Relationship values series. Select the
plotted series in your chart area, right-click and select Format Data Series. A Format Data
Series dialog box pops up. Select Line Color and Solid line. Change the line color to something
different from the Actual series points. Select Marker Options, and change the Marker Type
from Automatic to None. Select Close.
The result is (see also Figure 4.6 on p. 144 of Principles of Econometrics, 4e):
[Figure: scatter plot of weekly food expenditure in $ against weekly income in $100, showing the Actual points and the Fitted Linear-Log Relationship line]
y= 1+x+e (4.14)
First, 300 pairs of xi and ei values are created using random number generators, similarly to the
way we artificially generated variables in Sections 2.4 and 3.1.4. The variable x is simulated,
using a random number generator, to be evenly, or uniformly, distributed between 0 and 10. The
error term e is simulated to be uncorrelated, homoskedastic, and from a standard normal
distribution, or e ~ N(0,1). We generate these simulated observations next.
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it Random Residual.
In cells A1:B1 of your Random Residual worksheet, enter the following column labels.
A B
x e
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
A Random Number Generation dialog box pops up. The Number of Variables simulated is 1,
and the Number of Random Numbers generated is 300. The variable x is simulated to be
Uniformly distributed between 0 and 10. Select the Output Range in the Output options
section, and specify it to be A2:A301 in your Random Residual worksheet. Finally, select OK.
We repeat these steps to draw a random sample of 300 error terms from a standard normal distribution: choose the Normal distribution, with Mean 0 and Standard deviation 1. Select
the Output Range in the Output options section, and specify it to be B2:B301 in your Random
Residual worksheet. Finally, select OK.
In cells C1:C2 of your Random Residual worksheet, enter the following column label and
formula.
      C
1     y
2     =1+A2+B2
Select cell C2 and copy it to cells C3:C301. Here is how our worksheet looks (only the first five
values are shown below):
      A           B           C
1     x           e           y
2     4.405957    0.998193    6.40415
3     9.518723    1.011883    11.53061
4     3.821223    -0.0063     4.812922
5     2.649922    -0.43201    3.217908
6     3.976562    0.25586     5.232422
Note that you will have drawn different random samples and thus obtained different
sample values for y.
Next, we apply the least squares estimator to these simulated observations and compute the least
squares residuals.
In the Regression dialog box, the Input Y Range should be C2:C301, and the Input X Range
should be A2:A301. Select New Worksheet Ply, name it Simulated Model 1 and do check the
box next to Residual Plots. Finally select OK.
In addition to the Summary Output you now have a Residual Output table and a Residual Plot
in your new worksheet.
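The simulation and the least squares fit described above can also be sketched outside Excel. The Python sketch below is illustrative only: it uses Python's standard random module in place of Excel's Random Number Generation tool, an arbitrary seed, and a hypothetical helper named ols_fit, none of which are part of the Excel procedure.

```python
import random


def ols_fit(x, y):
    """Return (b1, b2), the least squares intercept and slope of y = b1 + b2*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


rng = random.Random(12345)                     # arbitrary seed (assumption)
x = [rng.uniform(0, 10) for _ in range(300)]   # x uniform between 0 and 10
e = [rng.gauss(0, 1) for _ in range(300)]      # e ~ N(0,1)
y = [1 + xi + ei for xi, ei in zip(x, e)]      # y = 1 + x + e, as in =1+A2+B2

b1, b2 = ols_fit(x, y)
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

# With a correctly specified model the estimates should be close to 1 and 1,
# and the residuals sum to zero because the model includes an intercept.
print(round(b1, 3), round(b2, 3))
```

As in Excel, a different seed gives a different sample, so the estimates will vary slightly from run to run.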
After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.7
on p. 146 of Principles of Econometrics, 4e):
[Figure: plot of the least squares residuals against x, for x between 0 and 10]
Go back to your food data worksheet, select your scatter plot of food expenditure-income data
points and fitted linear-log relationship and make a copy of it. Right-click in the middle of the
copy of your chart. Select Select Data. In the Legend Entries (Series) window of the Select
Data Source dialog box, select the Fitted Linear-Log Relationship series, and then the Remove
button.
Next, select the Actual series, and then the Edit button. In the Edit Series window, delete
the old Series name and re-specify the Series Y values to be C25:C64, from the Log-Linear
Food Model worksheet. Finally, select OK, twice.
The result is (see also Figure 4.8 on p. 146 of Principles of Econometrics, 4e):
[Figure: plot of the residuals against income in $100]
y = 15 - 4x^2 + e (4.15)
First, 50 pairs of xi and ei values are created using random number generators, similarly to the
way we artificially generated variables in Sections 2.4, 3.1.4 and 4.5.1. The variable x is
simulated, using a random number generator, to be evenly, or uniformly, distributed between 0
and 10. The error term e is simulated to be uncorrelated, homoskedastic, and from a normal
distribution with mean 0 and variance 4, or e ~ N(0,4). We generate these simulated observations
next.
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it Specification Error Residual.
In cells A1:C2 of your Specification Error Residual worksheet, enter the following column labels and
formula.
      A     B                     C
1           x                     e
2     1     =2.5-((A2-1)/10)
3     2
Select cells A2:A3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross, left-click, hold it and drag it down to cell A52.
Copy cell B2 to cells B3:B52. Your table should look like the one below (only the first five values
are shown).
      A     B      C
1           x      e
2     1     2.5
3     2     2.4
4     3     2.3
5     4     2.2
6     5     2.1
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
We draw 1 random sample of 51 error terms from a normal distribution with Mean 0 and
Standard Deviation 2. Select the Output Range in the Output options section, and specify it to
be C2:C52 in your Specification Error Residual worksheet. Finally, select OK.
In cells D1:D2 of your Specification Error Residual worksheet, enter the following column
label and formula.
      D
1     y
2     =15-4*(B2^2)+C2
Select cell D2 and copy it to cells D3:D52. Here is how our worksheet looks (only the first five
values are shown below):
      A     B      C           D
1           x      e           y
2     1     2.5    2.723068    -7.27593
3     2     2.4    -0.50477    -8.54477
4     3     2.3    1.115236    -5.04476
5     4     2.2    2.916886    -1.44311
6     5     2.1    2.982706    0.342706
Note that you will have drawn different random samples and thus obtained different
sample values for y.
Next, we apply the least squares estimator to these simulated observations and compute the least
squares residuals.
In the Regression dialog box, the Input Y Range should be D2:D52, and the Input X Range
should be B2:B52. Select New Worksheet Ply, name it Simulated Model 2 and do check the
box next to Residual Plots. Finally select OK.
In addition to the Summary Output you now have a Residual Output table and a Residual Plot
in your new worksheet.
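To see why the residual plot from this regression reveals the specification error, the steps above can be mimicked in a short Python sketch. It is illustrative only: the seed, the helper ols_fit and the cut-offs used to define the "middle" and the "tails" of the x range are assumptions made for this example, not part of the Excel procedure.

```python
import random


def ols_fit(x, y):
    """Return (b1, b2), the least squares intercept and slope of y = b1 + b2*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


rng = random.Random(2024)                      # arbitrary seed (assumption)
x = [2.5 - i / 10 for i in range(51)]          # 2.5, 2.4, ..., -2.5
e = [rng.gauss(0, 2) for _ in x]               # mean 0, standard deviation 2
y = [15 - 4 * xi ** 2 + ei for xi, ei in zip(x, e)]

# Fit a straight line to data generated from a quadratic relationship.
b1, b2 = ols_fit(x, y)
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

# Compare the average residual in the middle of the x range with the tails.
middle = [r for xi, r in zip(x, residuals) if abs(xi) <= 1]
tails = [r for xi, r in zip(x, residuals) if abs(xi) >= 2]
mean_middle = sum(middle) / len(middle)
mean_tails = sum(tails) / len(tails)
print(round(mean_middle, 2), round(mean_tails, 2))
```

The residuals are systematically positive in the middle of the range and negative in the tails, which is the inverted-U pattern a residual plot makes visible when a linear model is fitted to quadratic data.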
After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.9
on p. 147 of Principles of Econometrics, 4e):
[Figure: plot of the least squares residuals against x]
Our analysis of normality of the regression errors will include a histogram of the residuals and the
Jarque-Bera test for normality.
Go back to your Food Regression worksheet. If you do not see your Food Regression tab, it is
because it is hidden. Use either one of the left-arrows at the lower left corner of your screen so that the
first worksheets you were working with can be seen again. (If the worksheet you need to go back
to is a recently created one, use the right-arrows.)
Next to the columns of Residuals in the residual output section of the worksheet, we will create a
BIN column. In cell D24, type BIN. The bin values will determine the range of residual values
for each column of the histogram. The bin values have to be given in ascending order. Starting
with the lowest bin value, a residual value will be counted in a particular bin if it is equal to or
less than the bin value.
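The counting rule just described can be sketched in Python as follows. This is a minimal illustration, not Excel's implementation: the function name histogram_counts is hypothetical, and the extra last slot for values above the largest bin value is an assumption modeled on the "More" row of Excel's histogram output.

```python
from bisect import bisect_left


def histogram_counts(values, bins):
    """Count values per bin.

    bins must be in ascending order. A value is counted in the first bin
    whose bin value is greater than or equal to it. Values larger than
    the last bin value are counted in one extra slot at the end.
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left returns the index of the first bin value >= v
        counts[bisect_left(bins, v)] += 1
    return counts


bins = list(range(-225, 226, 25))          # -225, -200, ..., 225
sample = [-225, -224.9, 0, 0.1, 300]       # made-up values for illustration
counts = histogram_counts(sample, bins)
print(counts)
```

A residual exactly equal to a bin value, such as 0 here, falls in that bin, while 0.1 falls in the next one.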
Fill in the bin values as shown below. Note that all you need to do is enter the first two values,
-225 and -200, select cells D25:D26, move your cursor to the lower right corner of your selection
until it turns into a skinny cross, left-click, hold it and drag it down to cell D43:
Excel recognizes the series and automatically completes it for you.
      D
24    BIN
25    -225
26    -200
27    -175
28    -150
29    -125
30    -100
31    -75
32    -50
33    -25
34    0
35    25
36    50
37    75
38    100
39    125
40    150
41    175
42    200
43    225
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.
A Histogram dialog box pops up. For the Input Range, specify C25:C64; for the Bin Range,
specify D25:D43. The Input Range indicates the data set Excel will look at to determine how
many values are counted in each bin of the Bin Range. Check the New Worksheet Ply option
and name it Residuals Histogram; check the box next to Chart Output. Finally, select OK.
Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap.
Finally, delete the Legend, and increase the size of the Chart area (see Section 2.3.4 for more
details on that). The result should be very similar to Figure 4.10 on p. 148 of Principles of
Econometrics, 4e:
[Figure: histogram of the least squares residuals]
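4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST
Functions
When the residuals are normally distributed, the Jarque-Bera statistic (JB) follows a chi-squared
distribution with m = 2 degrees of freedom:
$$JB = \frac{N}{6}\left(S^2 + \frac{(K-3)^2}{4}\right) \sim \chi^2_{(m=2)} \qquad (4.16)$$
where $S = \tilde{\mu}_3/\tilde{\sigma}^3$ is a measure of skewness and $K = \tilde{\mu}_4/\tilde{\sigma}^4$ is a measure of kurtosis,
and where
$$\tilde{\sigma} = \sqrt{\frac{\sum (\hat{e}_i - \bar{\hat{e}})^2}{N}} \qquad (4.17)$$
$$\tilde{\mu}_3 = \frac{\sum (\hat{e}_i - \bar{\hat{e}})^3}{N} \qquad (4.18)$$
$$\tilde{\mu}_4 = \frac{\sum (\hat{e}_i - \bar{\hat{e}})^4}{N} \qquad (4.19)$$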
If the hypothesis of normally distributed residuals is true, there is a 100α percent chance that the
computed JB statistic is equal to or greater than the chi-square critical value χ²(1-α,m). If the
computed JB statistic is equal to or greater than the chi-square critical value χ²(1-α,m), then this
presents us with evidence that our hypothesis of normally distributed errors is false; we thus
reject it.
[Figure: chi-squared density with m degrees of freedom; the region where H0 is rejected lies to the right of the critical value χ²(1-α,m)]
We will create a template for the Jarque-Bera test for normality. But before we do that, we need
to go back to our Food Regression worksheet to perform intermediate calculations.
Before we compute the measure of skewness S and the measure of kurtosis K, note that since
$\bar{\hat{e}} = 0$, the numerators of equations (4.17)-(4.19), $\sum(\hat{e}_i - \bar{\hat{e}})^2$, $\sum(\hat{e}_i - \bar{\hat{e}})^3$ and $\sum(\hat{e}_i - \bar{\hat{e}})^4$,
simplify to $\sum\hat{e}_i^2$, $\sum\hat{e}_i^3$ and $\sum\hat{e}_i^4$.
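The calculation in equations (4.16)-(4.19), using the simplification that the residuals average to zero, can be sketched in Python. This is an illustration under that assumption (least squares residuals from a model with an intercept); the function name jarque_bera and the tiny residual list are made up for the example.

```python
import math


def jarque_bera(residuals):
    """Return (S, K, JB) for residuals that are assumed to have mean zero."""
    n = len(residuals)
    sigma = math.sqrt(sum(e ** 2 for e in residuals) / n)   # (4.17)
    mu3 = sum(e ** 3 for e in residuals) / n                # (4.18)
    mu4 = sum(e ** 4 for e in residuals) / n                # (4.19)
    s = mu3 / sigma ** 3                                    # skewness
    k = mu4 / sigma ** 4                                    # kurtosis
    jb = (n / 6) * (s ** 2 + (k - 3) ** 2 / 4)              # (4.16)
    return s, k, jb


# Made-up, symmetric residuals with mean zero (illustration only).
s, k, jb = jarque_bera([-2, -1, 0, 1, 2])
print(round(s, 4), round(k, 4), round(jb, 6))
```

Because the example residuals are symmetric, the skewness is zero and the whole JB value comes from the kurtosis term.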
To the right of the residual output section, create the following table:
      F              G              H
24    residuals^2    residuals^3    residuals^4
25    =C25^2         =C25^3         =C25^4
Copy cells F25:H25 to cells F26:H64. Your worksheet should now look like the one below (only
partly shown):
[Worksheet excerpt: rows 24 to 29 of columns F:H, holding the squared, cubed and fourth-power residuals]
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Jarque-Bera Tests.
Create the following template (in the last column you will find the numbers of the equations used):
      A                   B                      C
1     Data Input          N =                    ='Food Regression'!B8
2                         α =
3                         df or m =              2
4
5     Computed Values     σ-tilde =              =SQRT(SUM('Food Regression'!F25:F64)/C1)    (4.17)
6                         μ3-tilde =             =SUM('Food Regression'!G25:G64)/C1          (4.18)
7                         μ4-tilde =             =SUM('Food Regression'!H25:H64)/C1          (4.19)
8                         S =                    =C6/C5^3
9                         K =                    =C7/C5^4
10                        χ²-critical value =    =CHIINV(C2,C3)
11
12    Jarque-Bera Test    JB =                   =(C1/6)*(C8^2+((C9-3)^2)/4)                 (4.16)
13                        Conclusion =           =IF(C12>=C10,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
14                        p-value =              =CHIDIST(C12,C3)
15                        Conclusion =           =IF(C14<=C2,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
The χ²-critical value is computed using the CHIINV statistical function. For our purpose, this
function syntax is:
=CHIINV(α,m)
where α is the level of significance of the Jarque-Bera test, and m is the degrees of freedom of the
chi-squared distribution.
The p-value is computed using the CHIDIST statistical function. For our purpose, this function
syntax is:
=CHIDIST(χ²-value,m)
where χ²-value is the value of the chi-squared statistic (here, the computed JB statistic) for which we are computing the p-value, and m is the
degrees of freedom of the chi-squared distribution.
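For the special case m = 2 used by the Jarque-Bera test, the chi-squared upper-tail probability has a simple closed form, P(χ² ≥ x) = exp(-x/2), so both quantities can be checked without Excel. The Python sketch below relies on that closed form; the function names are hypothetical, and it is valid only for 2 degrees of freedom, unlike CHIINV and CHIDIST, which accept any m.

```python
import math


def chi2_critical_df2(alpha):
    """Critical value with right-tail area alpha, for 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


def chi2_pvalue_df2(stat):
    """Right-tail probability of a chi-squared(2) variable at stat."""
    return math.exp(-stat / 2.0)


critical = chi2_critical_df2(0.05)          # plays the role of =CHIINV(0.05,2)
p_value = chi2_pvalue_df2(0.1998875)        # plays the role of =CHIDIST(JB,2)
reject = p_value <= 0.05
print(round(critical, 7), round(p_value, 7), reject)
```

The two decision rules in the template are equivalent: the statistic exceeds the critical value exactly when the p-value is at or below α.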
At α = 0.05, the results of the Jarque-Bera test are (see p. 148 of Principles of Econometrics, 4e):
      A                   B                      C
1     Data Input          N =                    40
2                         α =                    0.05
3                         df or m =              2
4
5     Computed Values     σ-tilde =              87.250383
6                         μ3-tilde =             -64639.66
7                         μ4-tilde =             173220834
8                         S =                    -0.097319
9                         K =                    2.9890333
10                        χ²-critical value =    5.9914645
11
12    Jarque-Bera Test    JB =                   0.0633402
13                        Conclusion =           Do not reject the hypothesis of normally distributed errors
14                        p-value =              0.9688262
15                        Conclusion =           Do not reject the hypothesis of normally distributed errors
4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food
Expenditure Model
We first go back to our Log-Linear Food Model worksheet to perform intermediate calculations.
To the right of the residual output section, create the following table:
      F              G              H
24    residuals^2    residuals^3    residuals^4
25    =C25^2         =C25^3         =C25^4
Copy cells F25:H25 to cells F26:H64. Your worksheet should now look like the one below (only
partly shown):
[Worksheet excerpt: the first rows of columns F:H, holding the squared, cubed and fourth-power residuals]
Now, we are ready to modify a few cell references in our Jarque-Bera test template.
Replace all references to the Food Regression worksheet with references to the Log-Linear Food Model
worksheet, as shown below.
      A                   B                      C
1     Data Input          N =                    ='Log-Linear Food Model'!B8
2                         α =
3                         df or m =              2
4
5     Computed Values     σ-tilde =              =SQRT(SUM('Log-Linear Food Model'!F25:F64)/C1)
6                         μ3-tilde =             =SUM('Log-Linear Food Model'!G25:G64)/C1
7                         μ4-tilde =             =SUM('Log-Linear Food Model'!H25:H64)/C1
8                         S =                    =C6/C5^3
9                         K =                    =C7/C5^4
10                        χ²-critical value =    =CHIINV(C2,C3)
11
12    Jarque-Bera Test    JB =                   =(C1/6)*(C8^2+((C9-3)^2)/4)
13                        Conclusion =           =IF(C12>=C10,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
14                        p-value =              =CHIDIST(C12,C3)
15                        Conclusion =           =IF(C14<=C2,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
At α = 0.05, the results of the Jarque-Bera test are (see p. 149 of Principles of Econometrics, 4e):
      A                   B                      C
1     Data Input          N =                    40
2                         α =                    0.05
3                         df or m =              2
4
5     Computed Values     σ-tilde =              89.248579
6                         μ3-tilde =             99251.00
7                         μ4-tilde =             203335377
8                         S =                    0.13961
9                         K =                    3.2048499
10                        χ²-critical value =    5.9914645
11
12    Jarque-Bera Test    JB =                   0.1998875
13                        Conclusion =           Do not reject the hypothesis of normally distributed errors
14                        p-value =              0.9048883
15                        Conclusion =           Do not reject the hypothesis of normally distributed errors
Open the Excel file wa-wheat. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE
Chapter 4 Excel file, name it wa-wheat data, and in it, copy the data set you just opened.
This data set gives average wheat yield for different regions of Australia for the period 1950-
1997. Time is measured using the values 1, 2, ..., 48 in column E. We would like to plot the
yield data for the Greenough Shire area, reported in column D.
Select the Insert tab located next to the Home tab. Select D2:E49. In the Charts group of
commands select Scatter, and then Scatter with only Markers.
[Figure: initial scatter plot of Series1, with the yield values on the horizontal axis and the time values on the vertical axis]
You can see that our yield values are on the horizontal axis and our time values are on the vertical
axis; we would like to change that around as we did in Chapter 2 with our plot of food
expenditure data. Select the points on your plot, right-click and select Select Data.
In the Edit Series dialog box, highlight the text from the Series X values window. Press the
Delete key on your keyboard. Select E2:E49. Highlight and delete the text from the Series Y
values window. Select D2:D49. Select OK.
The Select Data Source dialog box reappears. Select OK again. You have just told Excel that
the time values are the X-values and the yield values are the Y-values, not the other way around.
After editing your chart like you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.11
on p. 150 of Principles of Econometrics, 4e):
[Figure: scatter plot of wheat yield against time]
In the Regression dialog box, the Input Y Range should be D2:D49, the Input X Range should
be E2:E49. Select New Worksheet Ply and name it Linear Equation Model; and do check the
box next to Residual Plots.
The results are (only part of the residual output section is shown below; the residual plot is not
shown at all):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.805849601
R Square             0.64939358
Adjusted R Square    0.641771701
Standard Error       0.218692234
Observations         48

ANOVA
              df    SS             MS             F              Significance F
Regression    1     4.074859899    4.074859899    85.20124832    4.87577E-12
Residual      46    2.200009496    0.047826293
Total         47    6.274869395

                Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept       0.637777837     0.064130508       9.944999006    4.8492E-13     0.508689822    0.766865852
X Variable 1    0.021031942     0.002278539       9.230452221    4.87577E-12    0.016445482    0.025618402

RESIDUAL OUTPUT
Observation    Predicted Y    Residuals
1              0.658809779    0.255290221
2              0.679841721    -0.007741721
3              0.700873663    0.018226337
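Several entries of the Excel regression output are simple functions of the sums of squares, which gives a quick way to check a transcription of the output. The Python sketch below recomputes them from the two sums of squares and the number of observations reported for the linear equation model; the variable names are chosen for this example only.

```python
import math

# Values taken from the ANOVA section of the linear equation model output.
ss_regression = 4.074859899
ss_residual = 2.200009496
n_obs = 48
n_params = 2                      # intercept and slope

df_regression = n_params - 1
df_residual = n_obs - n_params

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
ms_residual = ss_residual / df_residual
standard_error = math.sqrt(ms_residual)     # "Standard Error" in the output
f_stat = (ss_regression / df_regression) / ms_residual

print(round(r_squared, 6), round(adj_r_squared, 6),
      round(standard_error, 6), round(f_stat, 4))
```

The MS residual computed here is the same quantity that the prediction template later in the chapter reads from cell D13 of a regression output.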
The estimated linear equation model is (see also p. 150 of Principles of Econometrics, 4e):
$$\widehat{YIELD}_t = 0.6378 + 0.0210\,TIME_t$$
[Figure: plot of the residuals from the linear equation model against time]
Note: to draw the horizontal axis below all the points, select the vertical axis on your chart, right-click,
and select Format Axis. In the Format Axis dialog box, under the Axis options panel,
select the Horizontal axis crosses at the Axis value -0.6. To draw a horizontal line at level 0 of
the residual values, select the plot of residuals on your chart, right-click and select Add
Trendline. Choose the Linear option, and Close.
Let TIMECUBE_t = TIME_t^3/1,000,000: our explanatory variable is redefined as our original
explanatory variable, cubed; and it is also rescaled before the equation above is estimated.
Go back to your wa-wheat data worksheet. In F1, enter the column label time3. In cell F2, enter
the formula =(E2^3)/1000000; copy it to cells F3:F49. Here is how your table should look (only
the first five values are shown below):
      D            E       F
1     greenough    time    time3
2     0.9141       1       0.000001
3     0.6721       2       0.000008
4     0.7191       3       0.000027
5     0.7258       4       0.000064
6     0.7998       5       0.000125
We want to re-estimate our wheat yield model using our original y values and our re-defined and
re-scaled x values.
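The redefinition and rescaling of the explanatory variable can be sketched in Python as follows. The function name time_cubed_scaled is hypothetical; the calculation mirrors the worksheet formula =(E2^3)/1000000 for time values 1 to 48.

```python
def time_cubed_scaled(t):
    """Cube the time index and divide by 1,000,000."""
    return t ** 3 / 1_000_000


timecube = [time_cubed_scaled(t) for t in range(1, 49)]   # t = 1, ..., 48
print(timecube[:5])
```

Dividing the regressor by 1,000,000 multiplies its estimated coefficient by 1,000,000 and leaves the fitted values and residuals unchanged, so the rescaling only affects how the coefficient is displayed.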
In the Regression dialog box, the Input Y Range should be D2:D49, the Input X Range should
be F2:F49. Select New Worksheet Ply and name it Cubic Equation Model; and do check the
box next to Residual Plots.
The results are (only part of the residual output section is shown below):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.866495734
R Square             0.750814858
Adjusted R Square    0.745397789
Standard Error       0.184367557
Observations         48

ANOVA
              df    SS             MS             F              Significance F
Regression    1     4.711265172    4.711265172    138.6016965    1.76803E-15
Residual      46    1.563604223    0.033991396
Total         47    6.274869395

                Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept       0.874116582     0.035630663       24.53270271    4.60223E-28    0.802395769    0.945837396
X Variable 1    9.68151584      0.822354527       11.77292217    1.76803E-15    8.026202058    11.33682962

RESIDUAL OUTPUT
Observation    Predicted Y    Residuals
1              0.874126264    0.039973736
2              0.874194034    -0.202094034
3              0.874377983    -0.155277983
The estimated cubic equation model is (see also p. 151 of Principles of Econometrics, 4e):
$$\widehat{YIELD}_t = 0.8741 + 9.6815\,TIMECUBE_t$$
Notice that when you choose the Residual Plots option in the Regression dialog box, Excel
generates a plot of the residuals against the explanatory variable, which, in this case, is
TIMECUBE. We would like to have a plot of residuals against time instead. Select the data points
in your chart, right-click and select Select Data. A Select Data Source dialog box pops up. Select
Series1 and then Edit. In the Edit Series dialog box, change the Series X values references to
E2:E49 from the wa-wheat data worksheet. Finally, select OK, twice.
After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.13
on p. 151 of Principles of Econometrics, 4e):
[Figure: plot of the residuals from the cubic equation model against time]
where y_t = ln(YIELD_t); i.e., our dependent variable is redefined as the natural logarithm of our
original dependent variable.
In your wa-wheat data worksheet, move your charts to the right a little bit if you would like. In
cell G1, enter the column label ln(greenough); resize the width of your column so it fits the new
label. In cell G2, enter the formula =ln(D2); copy it to cells G3:G49. Here is how your table
should look (only the first few values are shown below):
      D            E       F        G
1     greenough    time    time3    ln(greenough)
2     0.9141       1       1E-06    -0.089815304
3     0.6721       2       8E-06    -0.39734814
4     0.7191       3       3E-05    -0.329754849
We want to re-estimate our wheat yield model using our original x values and our re-defined y
values.
In the Regression dialog box, the Input Y Range should be G2:G49, the Input X Range should
be E2:E49. Select New Worksheet Ply and name it Growth Model.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.785168587
R Square             0.61648971
Adjusted R Square    0.60815253
Standard Error       0.199164869
Observations         48

ANOVA
              df    SS             MS             F              Significance F
Regression    1     2.93313542     2.93313542     73.94463042    3.93229E-11
Residual      46    1.824665679    0.039666645
Total         47    4.757801099

                Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept       -0.343366453    0.058404196       -5.879140084    4.39317E-07    -0.460928001    -0.225804905
X Variable 1    0.017843872     0.002075084       8.599106374     3.93229E-11    0.013666943     0.022020801
The estimated growth model is (see also p. 153 of Principles of Econometrics, 4e):
$$\widehat{\ln(YIELD_t)} = -0.3434 + 0.0178\,t$$
Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE
Chapter 4 Excel file, name it cps4_small data, and in it, copy the data set you just opened.
This data set gives information on hourly wages, years of education and other variables. Based on
this data, we would like to estimate the following wage equation:
y = β1 + β2 EDUC + e (4.26)
where y = ln(WAGE); i.e., our dependent variable is defined as the natural logarithm of the
variable WAGE.
In cell M1 of the cps4_small data worksheet, enter the column label ln(wage). In cell M2, enter
the formula =ln(A2); copy it to cells M3:M1001. Here is how your table should look (only
columns A, B and M of a few rows are shown below):
      A        B     ...    M
3     11.5     12    ...    2.44234704
4     15.04    16    ...    2.71071332
5     25.95    14    ...    3.25617151
6     24.03    12    ...    3.17930305
We want to estimate our wage equation using our original x values and our re-defined y values.
In the Regression dialog box, the Input Y Range should be M2:M1001, the Input X Range
should be B2:B1001. Select New Worksheet Ply and name it Wage Equation.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.422142751
R Square             0.178204502
Adjusted R Square    0.17738106
Standard Error       0.526611364
Observations         1000

ANOVA
              df     SS             MS             F             Significance F
Regression    1      60.01584269    60.01584269    216.414051    1.74559E-44
Residual      998    276.7648898    0.277319529
Total         999    336.7807325

                Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept       1.609444465     0.086422944       18.62288381    1.14645E-66    1.439852937    1.779035995
X Variable 1    0.090408247     0.006145615       14.71101802    1.74559E-44    0.078348438    0.102468056
The estimated wage equation is (see also p. 153 of Principles of Econometrics, 4e):
$$\widehat{\ln(WAGE)} = 1.6094 + 0.0904\,EDUC$$
4.8.3 Prediction
For the natural logarithm the antilog is the exponential function, so a natural choice for prediction
in a log-linear model is:
$$\hat{y}_c = \exp\left(b_1 + b_2 x + \hat{\sigma}^2/2\right) \qquad (4.29)$$
where $b_1$ and $b_2$ are the estimated intercept and slope coefficients of the log-linear model, and $\hat{\sigma}^2$
is the estimate of the error variance or mean square residual (MS residual).
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it Prediction in Log-Linear Model.
Create the following template to make predictions (in the last column below you will find the
numbers of the equations used in the template):
      A                  B                          C
1     Data Input         x0 =                       12
2                        b1 =                       ='Wage Equation'!B17
3                        b2 =                       ='Wage Equation'!B18
4                        MS residual =              ='Wage Equation'!D13
5
6     Computed Values    natural predicted y0 =     =EXP(C2+C3*C1)      (4.28)
7                        corrected predicted y0 =   =C6*EXP(C4/2)       (4.29)
Here are the results you should get (see also p. 154 of Principles of Econometrics, 4e):
      A                  B                          C
1     Data Input         x0 =                       12
2                        b1 =                       1.609444
3                        b2 =                       0.090408
4                        MS residual =              0.27732
5
6     Computed Values    natural predicted y0 =     14.7958
7                        corrected predicted y0 =   16.99643
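The two predictors in the template can be reproduced with a few lines of Python. This sketch is illustrative: the function names are hypothetical, and the coefficient and MS residual values are the ones reported in the Wage Equation output earlier in this chapter.

```python
import math


def natural_predictor(b1, b2, x):
    """Antilog of the fitted log value, as in =EXP(C2+C3*C1)."""
    return math.exp(b1 + b2 * x)


def corrected_predictor(b1, b2, x, ms_residual):
    """Natural predictor times exp(MS residual / 2), as in =C6*EXP(C4/2)."""
    return natural_predictor(b1, b2, x) * math.exp(ms_residual / 2)


b1 = 1.609444465          # intercept from the Wage Equation output
b2 = 0.090408247          # slope from the Wage Equation output
ms_residual = 0.277319529 # MS residual from the Wage Equation output

y_n = natural_predictor(b1, b2, 12)
y_c = corrected_predictor(b1, b2, 12, ms_residual)
print(round(y_n, 4), round(y_c, 4))
```

Because exp(MS residual / 2) is always greater than 1, the corrected prediction is always larger than the natural one; here the correction factor is about 1.15.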
Next, we want to show graphically how the correction affects our prediction. Go to your
cps4_small data worksheet. Here are the formulas and labels you should enter (in the last row of
each of the tables below, you will find the numbers of the equations used):
      N       O
1     educ    Yhatn
2     0       =EXP('Wage Equation'!$B$17 + 'Wage Equation'!$B$18 * N2)
3     1       (4.28)
      P
1     Yhatc
2     =O2*EXP('Wage Equation'!$D$13/2)
3     (4.29)
Select cells N2:N3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross, left-click, hold it and drag it down to cell N23.
Select O2:P2 and copy their content to O3:P23.
Select the Insert tab located next to the Home tab. Select N1:P23. In the Charts group of
commands select Scatter, and then Scatter with only Markers.
[Figure: scatter plot of the Yhatn and Yhatc series against educ]
Next, we would like to plot the actual values on the same chart. Select the points on your plot,
right-click and select Select Data. A Select Data Source dialog box pops up. Select Add. In the
Edit Series dialog box, specify earnings per hour for the Series name, select B2:B1001 for the
Series X values and A2:A1001 for the Series Y values, all from the cps4_small data
worksheet. Select OK, and then OK again in the Select Data Source dialog box.
After editing your chart like you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.14,
p. 155 of Principles of Econometrics, 4e):
[Figure: earnings per hour with the natural (yhatn) and corrected (yhatc) predictors, plotted against years of education]
$R_g^2 = r_{y\hat{y}_c}^2$ (4.30)
Make sure you are in your cps4_small data worksheet. We will compute the corrected predicted
y values in column Q, and next to it, we will compute the generalized R2.
Here are the formulas and labels you should enter (in the last row of each of the tables below, you
will find the numbers of the equations used):
     Q
1    corrected predicted y
2    =EXP('Wage Equation'!$B$17+'Wage Equation'!$B$18*B2)*EXP('Wage Equation'!$D$13/2)
3    (4.29)

     R
1    generalized R2
2    =(CORREL(A2:A1001,Q2:Q1001))^2
3    (4.30)
     Q                        R
1    corrected predicted y    generalized R2
2    24.40129449              0.185930705
3    16.99642785
4    24.40129449
5    20.36503968
6    16.99642785
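The generalized R2 is the squared correlation between the actual values and the corrected predictions, which is what the CORREL formula computes. Below is a hedged Python sketch of the same calculation. The function names and the five wage/education pairs are hypothetical illustration data, not the cps4_small data set, so the printed value will not match the worksheet.

```python
import math

def pearson_correlation(xs, ys):
    """Sample correlation coefficient, the quantity Excel's CORREL returns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def generalized_r2(actual, x, b1, b2, ms_residual):
    """Squared correlation between y and the corrected predictor, equation (4.30)."""
    corrected = [math.exp(b1 + b2 * xi) * math.exp(ms_residual / 2.0) for xi in x]
    return pearson_correlation(actual, corrected) ** 2

# Hypothetical wages and years of education, for illustration only.
wages = [18.7, 11.5, 15.0, 26.0, 24.0]
educ = [16, 12, 16, 14, 12]
r2_g = generalized_r2(wages, educ, 1.609444, 0.090408, 0.27732)
print(round(r2_g, 4))
```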
The lower limit (LL) and upper limit (UL) of the prediction interval in a log-linear model are:
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it PI in Log-Linear Model.
Copy the template from the Prediction Interval worksheet (if you cannot see it, it is because it is
hidden further to the left of your visible worksheets) to the PI in Log-Linear Model worksheet.
You just need to make a few modifications to it: (1) get your regression results from the Wage
Equation worksheet instead of the Food Regression worksheet, (2) change x0 to 12, (3) compute
x-bar from the cps4_small data worksheet instead of the food data worksheet, and (4) take the
antilogs of the interval limits using the EXP function. Those modifications are outlined in the table
below.
     A                      B                         C
1    Data Input             Sample Size =             ='Wage Equation'!B8
2                           Confidence percentage =
3                           x0 =                      12
4                           b1 =                      ='Wage Equation'!B17
5                           b2 =                      ='Wage Equation'!B18
6                           se(b2) =                  ='Wage Equation'!C18
7                           MS residual =             ='Wage Equation'!D13
8
9    Computed Values        α =                       =1-C2
10                          df or m =                 =C1-2
11                          tc =                      =TINV(C9,C10)
12                          predicted y0 =            =C4+C5*C3                                (4.2)
13                          x-bar =                   =AVERAGE('cps4_small data'!B2:B1001)
14                          se(f) =                   =SQRT(C7+C7/C1+((C3-C13)^2)*C6)          (4.3)
15
16   Prediction Interval    Lower Limit =             =EXP(C12-C11*C14)                        (4.31)
17                          Upper Limit =             =EXP(C12+C11*C14)                        (4.32)
Here are the results you should get (see also p. 155 of Principles of Econometrics, 4e):
     A                      B                     C
1    Data Input             Sample Size =         1000
2                           Confidence Level =    95%
3                           x0 =                  12
4                           b1 =                  1.609444
5                           b2 =                  0.090408
6                           se(b2) =              0.006146
7                           MS residual =         0.27732
9    Computed Values        α =                   5%
10                          df or m =             998
11                          tc =                  1.962344
16   Prediction Interval    Lower Limit =         5.063106
17                          Upper Limit =         43.23744
Note that the results above and the ones from your textbook might differ slightly due to
rounding.
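The last step of the template, taking antilogs of the limits, can be sketched in Python as follows. This is an illustration under stated assumptions: the function name is hypothetical, the standard error of the forecast is taken as an already computed input (cell C14), and the value 0.5 passed for it below is a made-up placeholder rather than the worksheet value.

```python
import math

def log_linear_prediction_interval(b1, b2, x0, t_crit, se_f):
    """Prediction interval for y0 when the estimated model is ln(y) = b1 + b2*x."""
    log_prediction = b1 + b2 * x0                        # predicted ln(y0), cell C12
    lower = math.exp(log_prediction - t_crit * se_f)     # equation (4.31)
    upper = math.exp(log_prediction + t_crit * se_f)     # equation (4.32)
    return lower, upper

# se_f = 0.5 is a hypothetical placeholder for the value in cell C14.
lower, upper = log_linear_prediction_interval(1.609444, 0.090408, 12, 1.962344, 0.5)
print(round(lower, 3), round(upper, 3))
```

Because the limits are symmetric on the log scale, the interval is not symmetric around the predicted wage: the upper limit lies further from the prediction than the lower limit does.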
Next, we want to show graphically how our prediction interval changes over the range of years of
education. Go to your cps4_small data worksheet. Here are the formulas and labels you should
enter (in the last row of each of the tables below, you will find the numbers of the equations used
in the template):
     S
1    lb_wage
2    =O2*EXP(-'PI in Log-Linear Model'!$C$11*'PI in Log-Linear Model'!$C$14)
3    (4.31)

     T
1    ub_wage
2    =O2*EXP('PI in Log-Linear Model'!$C$11*'PI in Log-Linear Model'!$C$14)
3    (4.32)
Select S2:T2 and copy their content to S3:T23. Here is how your table should look (only the first
five values are shown below):
     S           T
1    lb_wage     ub_wage
2    1.711005    14.61149
3    1.872902    15.99404
4    2.050118    17.50741
5    2.244103    19.16398
6    2.456442    20.9773
Select the whole plot area you completed in Section 4.8.3, which compares the natural and
corrected predictors of wage (replica of Figure 4.14 p. 155 of Principles of Econometrics, 4e).
Select Copy and then Paste. You should have two identical charts. Below we will work with one
of them. On that chart, we want to remove the yhatc series and add the lb_wage and ub_wage
series instead.
Select the points on the chart, right-click and select Select Data. A Select Data Source dialog
box pops up. Select the yhatc series, and then Remove. Then select Add. In the Edit Series
dialog box, specify lb_wage for the Series name, select N2:N23 for the Series X values and
S2:S23 for the Series Y values-all from the cps4_small data worksheet. Select OK.
Select Add. In the Edit Series dialog box, specify ub_wage for the Series name, select N2:N23
for the Series X values and T2:T23 for the Series Y values-all from the cps4_small data
worksheet. Select OK, and then OK again in the Select Data Source dialog box.
After editing your chart like you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.15,
p. 156 of Principles of Econometrics, 4e):
[Figure: earnings per hour, yhatn, lb_wage and ub_wage plotted against Years of Education]
Open the Excel file newbroiler. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE
Chapter 4 Excel file, name it newbroiler data, and in it, copy the data set you just opened.
where Q is the U.S. per capita consumption of chicken, in pounds and P is the real price of
chicken, for annual observations over the period 1950 - 2001.
In cells K1:L2 of your newbroiler data worksheet, enter the following column labels and
formulas.
     K          L
1    ln(q)      ln(p)
2    =LN(B2)    =LN(D2)
Select K2:L2 and copy their content to K3:L53. Here is how your table should look (only the
first five values are shown below):
K I L J
_1_ ln.{q) ln{p)
2 2.66026 1.0591116
-3 2. 714595 1.030993
4 2.727853 1 .0 1 4683
T 2.721295 0_992232
6 2. 7@'01 0.872986
In the Regression dialog box, the Input Y Range should be K2:K53, and the Input X Range
should be L2:L53. Select New Worksheet Ply and name it Log-Log Model. Finally select OK.
The result is (matching the one reported on p. 157 of Principles of Econometrics, 4e):
SUMMARY OUTPUT (rows 16-18)
                Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept       3.716943882     0.022359414       166.2362        2.94446E-70    3.672033677     3.761854086
X Variable 1    -1.121358001    0.048756431       -22.99918135    2.99987E-28    -1.219288174    -1.023427829
Make sure you are in your newbroiler data worksheet. We will compute the corrected predicted
y values in column M, and next to it, we will compute the generalized R2.
Here are the formulas and labels you should enter (in the last row of each of the tables below, you
will find the numbers of the equations used):
     M
1    corrected predicted y
2    =EXP('Log-Log Model'!$B$17+'Log-Log Model'!$B$18*L2)*EXP('Log-Log Model'!$D$13/2)
3    (4.29)

     N
1    generalized R2
2    =(CORREL(B2:B53,M2:M53))^2
3    (4.30)
Enter the following formulas and labels in your newbroiler data worksheet (in the
last row of each of the tables below, you will find the numbers of the equations used):
     O      P
1    p      yhatc
2    0.9    =EXP('Log-Log Model'!$B$17+'Log-Log Model'!$B$18*LN(O2))*EXP('Log-Log Model'!$D$13/2)
3    1.0    (4.29)
Select cells O2:O3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross, left-click, hold it and drag it down to cell O22.
Select P2 and copy its content to P3:P22. Here is how your table should look (only the first five
values are shown below):
     O      P
1    p      yhatc
2    0.9    46.62103
3    1.0    41.42584
4    1.1    37.22676
5    1.2    33.76609
6    1.3    30.8674
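As a cross-check on the column P formula, here is a small Python sketch of the corrected predictor in the log-log model. The function name is hypothetical, and the MS residual value is a made-up placeholder because cell D13 of the Log-Log Model worksheet is not reproduced here. The ratio of two predictions does not depend on that placeholder, so it can be compared with the table.

```python
import math

def log_log_corrected_prediction(b1, b2, ms_residual, p):
    """Corrected predictor of q at price p for ln(q) = b1 + b2*ln(p), equation (4.29)."""
    return math.exp(b1 + b2 * math.log(p)) * math.exp(ms_residual / 2.0)

b1, b2 = 3.716943882, -1.121358001   # coefficients from the Log-Log Model worksheet
ms_residual = 0.014                  # hypothetical placeholder, not the worksheet value

q_at_09 = log_log_corrected_prediction(b1, b2, ms_residual, 0.9)
q_at_10 = log_log_corrected_prediction(b1, b2, ms_residual, 1.0)
ratio = q_at_09 / q_at_10            # equals 0.9 ** b2; the correction factor cancels
print(round(ratio, 4))
```

The ratio is about 1.1254, in line with the first two yhatc values in the table (46.62103 and 41.42584).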
Select the Insert tab located next to the Home tab. Select O1:P22. In the Charts group of
commands select Scatter, and then Scatter with only Markers.
[Chart: scatter plot of yhatc against p]
Next, we would like to plot the actual values on the same chart. Select the points on your plot,
right-click and select Select Data. A Select Data Source dialog box pops up. Select Add. In the
Edit Series dialog box, specify actual values for the Series name, select D2:D53 for the Series
X values and B2:B53 for the Series Y values, all from the newbroiler data worksheet. Select
OK, and then OK again in the Select Data Source dialog box.
After editing your chart like you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.16,
p. 157 of Principles of Econometrics, 4e):
[Figure: actual values and yhatc plotted against Price of Chicken]
CHAPTER 5
CHAPTER OUTLINE
5.1 Least Squares Estimates Using the Hamburger Chain Data
5.2 Interval Estimation
5.3 Hypothesis Tests for a Single Coefficient
  5.3.1 Tests of Significance
  5.3.2 One-Tail Tests
    5.3.2a Left-Tail Test of Elastic Demand
    5.3.2b Right-Tail Test of Advertising Effectiveness
5.4 Polynomial Equations: Extending the Model for Burger Barn Sales
5.5 Interaction Variables
  5.5.1 Linear Models
  5.5.2 Log-Linear Models
5.6 Measuring Goodness-of-Fit
This chapter is a simple extension of the material covered in Chapters 2-4. Instead of only one
explanatory variable in the simple linear regression model, two or more explanatory variables will
be used in the multiple linear regression model.
Open the Excel file andy. Save your file as POE Chapter 5. Rename Sheet 1 data.
We would like to estimate the following multiple linear regression model for Big Andy's Burger
Barn hamburger chain:
$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + e$ (5.1)
where SALES represents monthly sales revenue in a given city (in $1000), PRICE represents a
price index in that city (in $), and ADVERT is monthly advertising expenditure in that city (in
$1000).
144 Chapter 5
As we have done before, we will use the Excel Regression analysis tool. There are only two
things to note.
• First, because we have more than one explanatory variable, we will include the labels of
the variables in the input ranges we specify. Those labels will then be reported in the
summary output Excel produces, and we will be able to distinguish the different
estimated slope coefficients.
• Second, as long as the data on the explanatory variables are stored in adjacent columns,
all we have to do is select the whole range of data and Excel will recognize each column
of data as separate observations on separate explanatory variables.
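For readers who want to see what the Regression tool computes when several adjacent columns are supplied, here is a minimal pure-Python sketch of least squares via the normal equations. It is an illustration only: the function `ols` and the six observations are hypothetical, and this is not a description of how Excel implements its tool.

```python
def ols(y, columns):
    """Least squares coefficients [b1, b2, ..., bK] for y on an intercept plus the given columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Hypothetical data generated without error from SALES = 100 - 8*PRICE + 2*ADVERT.
price = [5.0, 5.5, 6.0, 6.5, 5.2, 5.8]
advert = [1.0, 2.0, 1.5, 0.5, 3.0, 2.5]
sales = [100 - 8 * p + 2 * a for p, a in zip(price, advert)]
coefficients = ols(sales, [price, advert])
print([round(b, 6) for b in coefficients])
```

Each column passed in `columns` plays the role of one explanatory variable, just as each adjacent column in the Input X Range does.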
In the Regression dialog box, the Input Y Range should be A1:A76, and the Input X Range
should be B1:C76; do check the box next to Labels. Finally, select New Worksheet Ply and
name it Regression.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.66952055
R Square             0.448257766
Adjusted R Square    0.432931593
Standard Error       4.886124039
Observations         75

ANOVA
              df    SS             MS             F
Regression    2     1396.538993    698.2694963    29.24785998
Residual      72    1718.942995    23.87420813
Total         74    3115.481988

             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    118.9136131     6.351637595       18.72172512     2.21429E-29    106.2518552     131.5753711
PRICE        -7.907854804    1.095993037       -7.215241826    4.42399E-10    -10.09267696    -5.723032645
ADVERT       1.862583787     0.683195483       2.726282349     0.008038199    0.500658501     3.224509073
Multiple Linear Regression 145
Recall from Chapter 3 that the interval estimator of βk is defined as:
$b_k \pm t_c \, se(b_k)$ (5.2)
The one important thing to notice is that, in the case of the multiple linear regression model, the
critical value tc is from a t-distribution with m = N - K degrees of freedom, where K is the
number of parameters in the multiple linear regression model.
To compute interval estimates, we could use the template we created in Chapter 3 and make sure
we specify the degrees of freedom correctly.
Instead, we use the interval estimates Excel has already generated in the regression summary
output.
The results of interest to us, reported on pp. 182-183 of Principles of Econometrics, 4e, are
the Lower 95% and Upper 95% columns below:
             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    118.9136131     6.351637595       18.72172512     2.21429E-29    106.2518552     131.5753711
PRICE        -7.907854804    1.095993037       -7.215241826    4.42399E-10    -10.09267696    -5.723032645
ADVERT       1.862583787     0.683195483       2.726282349     0.008038199    0.500658501     3.224509073
Recall that to obtain interval estimates other than the 95% ones, all we have to do is to specify a
different Confidence Level in the Regression dialog box (see Section 3.1.3c).
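The interval estimates in the summary output follow directly from the coefficient, its standard error and the critical value. A short Python sketch, offered as an illustration: the function name is hypothetical, and the critical value 1.993464 is an assumption, backed out from the PRICE limits in the output rather than taken from the text (it is the value TINV(0.05,72) is expected to return).

```python
def interval_estimate(b_k, se_b_k, t_crit):
    """Interval estimate b_k -/+ t_c * se(b_k) for a single coefficient."""
    half_width = t_crit * se_b_k
    return b_k - half_width, b_k + half_width

t_crit = 1.993464   # assumed critical value for m = N - K = 75 - 3 = 72 and 95% confidence
lower, upper = interval_estimate(-7.907854804, 1.095993037, t_crit)
print(round(lower, 4), round(upper, 4))
```

With these inputs the limits agree with the Lower 95% and Upper 95% entries for PRICE to about four decimal places.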
Similarly to the results from Chapter 3, we have the following: if the null hypothesis H0: βk = c is
true, then the test statistic t = (bk - c)/se(bk) follows a t-distribution with m = N - K
degrees of freedom:
$t = \frac{b_k - c}{se(b_k)} \sim t_{(m = N-K)}$ (5.3)
Again, note that in the case of the multiple linear regression model, the t-distribution of interest
has m = N - K degrees of freedom, where K is the number of parameters in the multiple linear
regression model.
Recall that when the null hypothesis of a test is that the parameter is zero, the test is called a test
of significance. Results of the two-tail tests of significance are reported in the t Stat and P-value
columns of the Excel summary output (see also pp. 185-186 of Principles of Econometrics, 4e):
             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    118.9136131     6.351637595       18.72172512     2.21429E-29    106.2518552     131.5753711
PRICE        -7.907854804    1.095993037       -7.215241826    4.42399E-10    -10.09267696    -5.723032645
ADVERT       1.862583787     0.683195483       2.726282349     0.008038199    0.500658501     3.224509073
Note: you could also have used the Two-Tail Tests template you created in Chapter 3.
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the data tab. Name it Left-Tail Tests.
Open your POE Chapter 3 Excel file and go to the Left-Tail Tests worksheet. Copy its content
to the Left-Tail Tests worksheet you just created in your POE Chapter 5 Excel file.
You will need to make just a few modifications to create the left-tail test template shown below.
First, go back to each formula and delete the references to the POE Chapter 3 Excel file: [POE
Chapter 3.xlsx]; this way the test will be computed based on the regression results
of your current Excel file: POE Chapter 5. Next, insert a new row, underneath the first one, for
K. Finally, modify the degrees of freedom formula. All needed changes are shown below:
     A                  B                C
1    Data Input         N =              =Regression!B8
2                       K =              =Regression!B12+1
3                       bk =             =Regression!B18
4                       se(bk) =         =Regression!C18
5                       Ho: βk =
6                       α =
7
8    Computed Values    df or m =        =C1-C2
9                       tc =             =-TINV(C6*2,C8)
10
11   Left-Tail Test     t-statistic =    =(C3-C5)/C4
12                      Conclusion:      =IF(C11<=C9,"Reject Ho","Do Not Reject Ho")
13                      p-value =        =TDIST(ABS(C11),C8,1)
14                      Conclusion:      =IF(C13<=C6,"Reject Ho","Do Not Reject Ho")
Let α = 0.05; H0: β2 ≥ 0 and H1: β2 < 0. The result is (p. 187 of Principles of Econometrics,
4e):
     A                  B                C
1    Data Input         N =              75
2                       K =              3
3                       bk =             -7.907855
4                       se(bk) =         1.095993
5                       Ho: βk =         0
6                       α =              0.05
7
8    Computed Values    df or m =        72
9                       tc =             -1.6662937
10
11   Left-Tail Test     t-statistic =    -7.2152418
12                      Conclusion:      Reject Ho
13                      p-value =        2.212E-10
14                      Conclusion:      Reject Ho
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the Left-Tail Tests tab. Name it Right-Tail Tests.
In your POE Chapter 3 Excel file, go to the Right-Tail Tests worksheet. Copy its content to the
Right-Tail Tests worksheet you just created in your POE Chapter 5 Excel file.
You will need to make just a few modifications to create the right-tail test template shown below.
First, go back to each formula and delete the references to the POE Chapter 3 Excel file: [POE
Chapter 3.xlsx]; this way the test will be computed based on the regression results
of your current Excel file: POE Chapter 5. Next, change the references for bk and se(bk) to the
ADVERT coefficient estimates instead of the PRICE coefficient estimates. Also, insert a new row,
underneath the first one, for K. Finally, modify the degrees of freedom formula. All needed
changes are shown below:
     A             B           C
1    Data Input    N =         =Regression!B8
2                  K =         =Regression!B12+1
3                  bk =        =Regression!B19
4                  se(bk) =    =Regression!C19
5                  Ho: βk =
6                  α =
Let α = 0.05; H0: β3 ≤ 1 and H1: β3 > 1. The result is (see also p. 188 of Principles of
Econometrics, 4e):
     A                  B                C
1    Data Input         N =              75
2                       K =              3
3                       bk =             1.862583787
4                       se(bk) =         0.683195483
5                       Ho: βk =         1
6                       α =              0.05
7
8    Computed Values    df or m =        72
9                       tc =             1.666293697
10
11   Right-Tail Test    t-statistic =    1.262572438
12                      Conclusion:      Do Not Reject Ho
13                      p-value =        0.105408444
14                      Conclusion:      Do Not Reject Ho
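The decision rule used by the two one-tail templates can be summarized in a few lines of Python. This is a sketch under stated assumptions: the function name and its `tail` argument are hypothetical, and the critical value is passed in as a positive number (the left-tail template stores its negative in cell C9).

```python
def one_tail_t_test(b_k, se_b_k, c, t_crit, tail):
    """Return (t-statistic, reject?) for a one-tail test of a single coefficient.

    t_crit is the positive critical value; tail is "left" or "right".
    """
    t_stat = (b_k - c) / se_b_k
    if tail == "left":
        reject = t_stat <= -t_crit
    elif tail == "right":
        reject = t_stat >= t_crit
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return t_stat, reject

t_crit = 1.666293697   # critical value reported in the templates for α = 0.05, m = 72

# Left-tail test on the PRICE coefficient, null value 0.
price_t, price_reject = one_tail_t_test(-7.907854804, 1.095993037, 0.0, t_crit, "left")
# Right-tail test on the ADVERT coefficient, null value 1.
advert_t, advert_reject = one_tail_t_test(1.862583787, 0.683195483, 1.0, t_crit, "right")
print(price_t, price_reject, advert_t, advert_reject)
```

Both conclusions agree with the templates: the null hypothesis is rejected for PRICE and not rejected for ADVERT.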
We estimate the following extended model for Big Andy's Burger Barn hamburger chain:
$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e$ (5.4)
Go back to your data worksheet. In D1, enter the column label ADVERT2. In cell D2, enter the
formula =C2^2; copy it to cells D3:D76. Here is how your table should look (only the first five
values are shown below):
     A        B        C         D
1    SALES    PRICE    ADVERT    ADVERT2
2    73.2     5.69     1.3       1.69
3    71.8     5.49     2.9       8.41
4    62.4     5.63     0.8       0.64
5    67.4     5.22     0.7       0.49
6    89.3     5.02     1.5       2.25
In the Regression dialog box, the Input Y Range should be A1:A76, and the Input X Range
should be B1:D76. Check the box next to Labels. Select New Worksheet Ply and name it
Extended Model. Finally, select OK.
SUMMARY OUTPUT (selected rows)

Regression Statistics
Multiple R           0.712906125
R Square             0.508235142
Adjusted R Square    0.487456345

           Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
PRICE      -7.640000543    1.045938915       -7.304444384    3.23648E-10    -9.725543479    -5.554457608
ADVERT     12.15123398     3.556164048       3.416949784     0.0010516      5.060444353     19.2420236
ADVERT2    -2.767962762    0.940624059       -2.942687607    0.004391       -4.643513842    -0.892411683
where PIZZA is annual expenditure on pizza, AGE is age, and INCOME is income of a random
sample of 40 individuals, age 18 and older.
Open the Excel file pizza4. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 5 in one file, create a new worksheet in your POE
Chapter 5 Excel file, rename it pizza4 data, and in it, copy the data set you just opened.
In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range
should be F1:G41. Check the box next to Labels. Select New Worksheet Ply and name it Life
Cycle Model 1. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.573803829
R Square             0.329250834
Adjusted R Square    0.292994123
Standard Error       131.070099
Observations         40

ANOVA
              df    SS             MS             F              Significance F
Regression    2     312015.1787    156007.5894    9.081100278    0.000618533
Residual      37    635636.7213    17179.37085
Total         39    947651.9

             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    342.8848279     72.34341966       4.739682        3.14373E-05    196.3031373     489.4665184
income       1.832478934     0.464300741       3.946749963     0.000340943    0.891716278     2.773241589
age          -7.575555694    2.316988          -3.269571209    0.002332607    -12.27021864    -2.880893153
To account for an effect of income that depends on the age of the individual, we add the
interaction variable (AGE x INCOME) to the life-cycle model:
Go back to your pizza4 data worksheet. In H1, enter the column label age x income. In cell H2,
enter the formula =F2*G2; copy it to cells H3:H41. Here is how your table should look (only the
first five values are shown below):
     H
1    age x income
2    487.5
3    1755
4    312
5    728
6    487.5
In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range
should be F1:H41. Check the box next to Labels. Select New Worksheet Ply and name it Life
Cycle Model 2. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R    0.622349295
R Square      0.387318645

ANOVA
              df    SS           MS             F              Significance F
Regression    3     367043.25    122347.75      7.586037514    0.000468085
Residual      36    580608.65    16128.01806
Total         39    947651.9

                Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept       161.465432      120.6634096       1.338147434     0.189          -83.25130349    406.1821675
income          6.97990507      2.82276761        2.472717        0.01826628     1.255067055     12.70474309
age             -2.977423365    3.352100814       -0.888226       0.380315589    -9.775798869    3.820952139
age x income    -0.123239351    0.066718728       -1.847147792    0.072957528    -0.258551202    0.0120725
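With the interaction term included, the estimated effect of one more unit of income on pizza expenditure is the income coefficient plus the interaction coefficient times age. The sketch below illustrates that arithmetic with the estimates above. The function name and the two ages (25 and 50) are hypothetical choices for illustration, not values taken from this manual.

```python
def marginal_effect_of_income(b_income, b_interaction, age):
    """Estimated change in PIZZA for a one-unit change in INCOME at a given AGE."""
    return b_income + b_interaction * age

b_income = 6.97990507         # coefficient on income
b_interaction = -0.123239351  # coefficient on age x income

effect_at_25 = marginal_effect_of_income(b_income, b_interaction, 25)
effect_at_50 = marginal_effect_of_income(b_income, b_interaction, 50)
print(round(effect_at_25, 4), round(effect_at_50, 4))
```

Because the interaction coefficient is negative, the estimated response of pizza expenditure to income declines with age.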
Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 5 in one file, create a new worksheet in your POE
Chapter 5 Excel file, rename it cps4_small data, and in it, copy the data set you just opened.
Go back to your cps4_small data worksheet. In cells M1:P2, enter the following column labels
and formulas.
     M           N       O        P
1    ln(wage)    educ    exper    educ x exper
2    =LN(A2)     =B2     =C2      =N2*O2
Copy the content of cells M2:P2 to cells M3:P1001. Here is how your table should look (only the
first five values are shown below):
     M            N       O      P
1    ln(wage)     educ    exp    educ x exp
2    2.9285235    16      39     624
3    2.442347     12      16     192
4    2.7107133    16      13     208
5    3.2561716    14      11     154
6    3.179303     12      51     612
In the Regression dialog box, the Input Y Range should be M1:M1001, and the Input X Range
should be N1:P1001. Check the box next to Labels. Select New Worksheet Ply and name it
Log-Linear Model w Interaction. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.44115987
R Square             0.194622031
Adjusted R Square    0.192196
Standard Error       0.521847758
Observations         1000

ANOVA
              df     SS             MS             F              Significance F
Regression    3      65.54495019    21.84831673    80.22880786    1.7205E-46
Residual      996    271.2357823    0.272325083
Total         999    336.7807325

              Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept     1.392317989     0.206645          6.737738       2.7172E-11     0.986808989     1.797826988
educ          0.094938495     0.014624587       6.491712999    1.33643E-10    0.066239995     0.123636994
exp           0.006329514     0.00669851        0.944913664    0.344932118    -0.006815298    0.019474326
educ x exp    -3.64453E-05    0.00048377        -0.07533629    0.93996        -0.000985766    0.000912875
The coefficient of determination R2 is reported in the Excel regression summary output. For Big
Andy's Burger Barn multiple linear regression model of Section 5.1, it is the R Square entry below:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.66952055
R Square             0.448257766
Adjusted R Square    0.432931593
Standard Error       4.886124039
Observations         75
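The R Square entry can be reproduced from the ANOVA part of the same output. The following Python sketch is an illustration: the function names are hypothetical, the sums of squares are the Regression and Total entries of the Section 5.1 ANOVA table, and the adjusted R2 uses the standard definition 1 - (1 - R2)(N - 1)/(N - K), which is assumed here rather than quoted from this manual.

```python
def r_squared(ss_regression, ss_total):
    """Coefficient of determination: share of total variation explained by the model."""
    return ss_regression / ss_total

def adjusted_r_squared(r2, n, k):
    """Adjusted R2 with N observations and K parameters (standard definition)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

ss_regression = 1396.538993   # Regression SS from the ANOVA table
ss_total = 3115.481988        # Total SS from the ANOVA table

r2 = r_squared(ss_regression, ss_total)
adj_r2 = adjusted_r_squared(r2, n=75, k=3)
print(round(r2, 6), round(adj_r2, 6))
```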
CHAPTER 6
CHAPTER OUTLINE
6.1 Testing the Effect of Advertising: the F-test
  6.1.1 The Logic of the Test
  6.1.2 The Unrestricted and Restricted Models
  6.1.3 Test Template
6.2 Testing the Significance of the Model
  6.2.1 Null and Alternative Hypotheses
  6.2.2 Test Template
  6.2.3 Excel Regression Output
6.3 The Relationship between t- and F-Tests
6.4 Testing Some Economic Hypotheses
  6.4.1 The Optimal Level of Advertising
  6.4.2 The Optimal Level of Advertising and Price
6.5 The Use of Nonsample Information
6.6 Model Specification
  6.6.1 Omitted Variables
  6.6.2 Irrelevant Variables
  6.6.3 The RESET Test
6.7 Poor Data, Collinearity and Insignificance
  6.7.1 Correlation Matrix
  6.7.2 The Car Mileage Model Example
In this chapter we continue to work with the multiple linear regression model of Big Andy's
Burger Barn hamburger chain to illustrate the F-test procedure. We also work with additional
examples to address nonsample information, model specification and collinearity issues.
In Chapters 3 and 5 we worked with t-tests for null hypotheses consisting of a single restriction
on one parameter βk. An F-test is used when the null hypothesis consists of one or more
restrictions, which together may involve two or more parameters.
Further Inference in the Multiple Regression Model 155
An F-test is based on a comparison of the sum of squared errors from the original, unrestricted
model, with the sum of squared errors from the model in which the null hypothesis is assumed to
be true and the restrictions it implies have been imposed. This latter model is
referred to as the restricted model.
If the null hypothesis is true, then the following F-statistic follows an F-distribution with m1 = J
numerator degrees of freedom and m2 = N - K denominator degrees of freedom:
$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} \sim F_{(m_1 = J,\; m_2 = N-K)}$ (6.1)
where SSE_R is the sum of squared errors from the restricted model, SSE_U is the sum of squared
errors from the unrestricted model, J is the number of restrictions, N is the number of
observations, and K is the number of parameters in the unrestricted model.
If the null hypothesis is not true, then the value of the computed F-statistic will tend to be
unusually large. We will reject the null hypothesis if F ≥ Fc, where Fc is the critical value of the
F-distribution for the chosen level of significance α.
We will use the Big Andy's Burger Barn model to illustrate the F-test procedure. We start by
specifying and estimating the unrestricted and restricted models.
Recall from Chapter 5 the following multiple linear regression model for Big Andy's Burger
Barn hamburger chain. This is the unrestricted model.
$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e$ (6.2)
where SALES represents monthly sales revenue in a given city (in $1000), PRICE represents a
price index in that city (in $), and ADVERT is monthly advertising expenditure in that city (in
$1000).
156 Chapter 6
Suppose we wish to test the hypothesis that advertising has no effect on sales revenue
against the alternative that advertising does have an effect. The null and alternative hypotheses
are H0: β3 = 0, β4 = 0 and H1: β3 ≠ 0 or β4 ≠ 0 or both are nonzero. If we impose the
restrictions of the null hypothesis on equation (6.2), we obtain the following restricted model:
$SALES = \beta_1 + \beta_2 PRICE + e$ (6.3)
We would like to successively estimate the unrestricted model (6.2) and the restricted model
(6.3). First, open your Excel file andy. Save your file as POE Chapter 6. Rename Sheet 1 andy's
hamburger chain data.
In D1, enter the column label ADVERT2. In cell D2, enter the formula =C2^2; copy it to cells
D3:D76. Here is how your table should look (only the first values are shown below):
     A        B        C         D
1    SALES    PRICE    ADVERT    ADVERT2
2    73.2     5.69     1.3       1.69
3    71.8     5.49     2.9       8.41
4    62.4     5.63     0.8       0.64
For the unrestricted model (6.2), the Input Y Range should be A1:A76, and the Input X Range
should be B1:D76. Check the box next to Labels. Select New Worksheet Ply and name it
Unrestricted Model. Finally select OK.
SUMMARY OUTPUT (selected rows)

Regression Statistics
Multiple R           0.712906133
R Square             0.508235155
Adjusted R Square    0.487456358
Standard Error       4.645283021
Observations         75

             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    109.719036      6.799045455       16.13741763     1.87037E-25    96.16212457     123.2759474
price        -7.640000035    1.045938884       -7.304442117    3.23648E-10    -9.725542907    -5.554457162
advert       12.15123567     3.556163941       3.416950354     0.001051598    5.060446253     19.24202509
advert2      -2.767963089    0.940624011       -2.942688043    0.004392655    -4.643514112    -0.892412065
Go back to your andy's hamburger chain data worksheet. For the restricted model (6.3), the
Input Y Range should be A1:A76, and the Input X Range should be only the PRICE data
B1:B76. Check the box next to Labels. Select New Worksheet Ply and name it Restricted
Model. Finally, select OK.
The result is:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.62554053
R Square             0.391300955
Adjusted R Square    0.382962612
Standard Error       5.096857529
Observations         75

ANOVA
              df    SS             MS             F              Significance F
Regression    1     1219.09103     1219.09103     46.92790295    1.97078E-09
Residual      73    1896.390837    25.97795667
Total         74    3115.481867

             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    121.9001736     6.526290698       18.67832421     1.5876E-29     108.8932951     134.907052
price        -7.829073515    1.142864644       -6.850394365    1.97078E-09    -10.10679943    -5.551347597
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it F-test.
F-critical values are obtained in Excel by using the FINV function. The syntax of the FINV
function is as follows:
=FINV(α, m1, m2)
where α is the level of significance of the test, m1 is the numerator degrees of freedom and m2 is
the denominator degrees of freedom of the F-distribution.
p-values for F-statistics are obtained in Excel by using the FDIST function. For hypothesis testing
purposes, the syntax of the FDIST function is as follows:
=FDIST(F-statistic, m1, m2)
     A                  B                C
 1   Data Input         J =
 2                      N =              ='Unrestricted Model'!B8
 3                      K =              ='Unrestricted Model'!B12+1
 4                      SSEu =           ='Unrestricted Model'!C13
 5                      SSER =           ='Restricted Model'!C13
 6                      α =
 8   Computed Values    m1 =             =C1
 9                      m2 =             =C2-C3
10                      Fc =             =FINV(C6,C8,C9)
11
12   F-test             F-statistic =    =((C5-C4)/C8)/(C4/C9)
13                      Conclusion =     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                      p-value =        =FDIST(C12,C8,C9)
15                      Conclusion =     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")
Note that the number of parameters K is equal to the Excel regression degrees of freedom plus
one (see cell C3 above).
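The arithmetic behind the template can also be sketched outside Excel. The Python snippet below is only an illustration and is not part of the textbook's workflow: the function name f_statistic is our own, and the inputs are rounded values of the restricted and unrestricted sums of squared errors for the test of β3 = 0, β4 = 0 in the Big Andy's model.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """Return the F-statistic, mirroring =((C5-C4)/C8)/(C4/C9)."""
    m1 = j          # numerator degrees of freedom (cell C8)
    m2 = n - k      # denominator degrees of freedom (cell C9)
    return ((sse_r - sse_u) / m1) / (sse_u / m2)


# Rounded inputs (assumed here for illustration): SSE_R, SSE_U, J, N, K
f_value = f_statistic(sse_r=1896.391, sse_u=1532.084, j=2, n=75, k=4)
print(round(f_value, 3))
```

The critical value and the p-value still come from FINV and FDIST in the worksheet; the sketch covers only the F-statistic itself.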
With 2 restrictions in the null hypothesis $H_0: \beta_3 = 0, \beta_4 = 0$, and at α = 0.05, the results of the
F-test are (see also p. 225 of Principles of Econometrics, 4e):
     A                  B                C
 1   Data Input         J =              2
 2                      N =              75
 3                      K =              4

 8   Computed Values    m1 =             2
 9                      m2 =             71
10                      Fc =             3.125764
11
12   F-test             F-statistic =    8.44136
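For a general unrestricted multiple regression model with K − 1 explanatory variables and K
unknown coefficients, $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \cdots + \beta_K x_{iK} + e_i$, the null and alternative
hypotheses of a test of significance of the model are:
$H_0: \beta_2 = 0, \beta_3 = 0, \ldots, \beta_K = 0$
$H_1$: at least one of the $\beta_k$ is nonzero, for $k = 2, 3, \ldots, K$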
Note that, in this one case, in which we are testing the null hypothesis that all the model
parameters are zero, except the intercept, the sum of squared errors from the restricted model is
equal to the total sum of squares from the unrestricted model: SSER = SSTu.
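Because SSER = SSTu in this special case, the F-statistic can be computed from the unrestricted regression alone. The short Python sketch below illustrates this under our own assumptions: the function name is hypothetical, and the inputs are rounded values of the total and error sums of squares for the Big Andy's model.

```python
def overall_f_statistic(sst, sse, n, k):
    """F-statistic for H0: all slope coefficients are zero (SSE_R = SST)."""
    m1 = k - 1      # one restriction per slope coefficient
    m2 = n - k      # denominator degrees of freedom
    return ((sst - sse) / m1) / (sse / m2)


# Rounded inputs (assumed here for illustration): SST_U, SSE_U, N, K
overall_f = overall_f_statistic(sst=3115.482, sse=1532.084, n=75, k=4)
print(round(overall_f, 2))
```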
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Test of Significance of Model.
Copy the template from your F-test worksheet into your new worksheet. You just need to modify
the reference in cell C5, as highlighted below, to obtain a template for a test of the overall
significance of the regression model.
A B c
1 Data Input J=
2 N= ='Unrestricted Model'!B8
3 K= ='Unrestricted Model'!B12+1
4 SSEu= ='Unrestricted Model'!C13
5 SSER = ='Unrestricted Model'!C14
6 a=
For the unrestricted model (6.2), $SALES_i = \beta_1 + \beta_2 PRICE_i + \beta_3 ADVERT_i + \beta_4 ADVERT_i^2 + e_i$,
the null and alternative hypotheses of a test of significance of the model are:
$H_0: \beta_2 = 0, \beta_3 = 0, \beta_4 = 0$
$H_1$: at least one of $\beta_2$, $\beta_3$, $\beta_4$ is nonzero
The null hypothesis above contains three restrictions. With 3 restrictions, at α = 0.05, the results
of the test of significance of model (6.2) are (see also pp. 226-227 of Principles of Econometrics,
4e):
     A                  B                C
 2                      N =              75
 4                      SSEu =           1532.084
 5                      SSER =           3115.482
 6                      α =              0.05

 8   Computed Values    m1 =             3
 9                      m2 =             71
13                      Conclusion =     Reject Ho
14                      p-value =        5.6E-11
15                      Conclusion =     Reject Ho
For the test of significance of a model, since SSER = SSTu, there is no need to estimate a
restricted model: all the information needed to compute the F-statistic is available from the
regression analysis of the unrestricted model. This is why the F-statistic of the test of significance
of a model and its p-value are found in the Excel summary output (see your Unrestricted Model
worksheet):
A I B I G I D I E F
11 I rJF SS MS F Sifl_niffr;anc;e F
.R Reg,re-ssion t
2 1396.536993 693 .26'94963 29.24785,998 5.0'408SE-10
Jl. Reisidual 72 rna.94.2sss 23-8:7420B13
14 Total I 74 3115-.481978
Reconsider the following multiple linear regression model for Big Andy's Burger Barn
hamburger chain. This is the unrestricted model.
$SALES_i = \beta_1 + \beta_2 PRICE_i + \beta_3 ADVERT_i + \beta_4 ADVERT_i^2 + e_i$   (6.2)
Suppose we wish to test the hypothesis that changes in price have no effect on sales revenue
against the alternative that changes in price do have an effect. The null and alternative hypotheses
are $H_0: \beta_2 = 0$ and $H_1: \beta_2 \neq 0$. If we impose our null hypothesis, or restriction, on equation (6.2),
we obtain the following restricted model:
$SALES_i = \beta_1 + \beta_3 ADVERT_i + \beta_4 ADVERT_i^2 + e_i$   (6.4)
Go back to your andy's hamburger chain data worksheet. In the Regression dialog box, the
Input Y Range should be A1:A76, and the Input X Range should be C1:D76. Check the box
next to Labels. Select Output Range and specify it to be cell A1 in your Restricted Model
worksheet: you can place your cursor in the Output Range window and move it to that cell to do
that, or type 'Restricted Model'!A1 in the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do select OK to overwrite
the data in the specified range.
A I B l c I [) I E 1 F 6 H I I
t
1 SUMMARY OUTPUlf
-
3 I Regression Statistics
-
7 star11dard Error 6.1048829
& Observations I 75
j
�
-
10 A NOVA I
11 I I cJf 'SS MS F Significan oe F
12 1 Regres�irnn 5. 796561616 0.004632556.
t
2 432.0710103 216.0355051
-
14 Total 74 3115.481857 I I
15
16J Coefficients Standan:i Error t Stat P-vafu·e Lower.95% Upper.95% Low.er.95.Cl% Upper-.95.0%
17 lrnt·ercept 64.1l4148981 3.827012492 1·6. 9431()� 7.87896E-27 57.2.12:47994 72.47Q4S968 57.. 21247994 n.4704996B
18 advert 14.249.15942 4.6582829 3.058886559' 0.003118901 4.96304.2303 23.53527653, 4.%3042303 :B.53527653
19 advert2 -3 .3 55•8'94266 1.231488631 - 2 7331915C 7
. J 0.00788'726& - 5.&2082195 -0.9'lfr966582 -5 .. 82082195 -0.910965582
Go back to your F-test worksheet. With 1 restriction, at α = 0.05, the result is (see also p. 227
in Principles of Econometrics, 4e):
     A                  B                C
 1   Data Input         J =              1
 2                      N =              75
 3                      K =              4
 4                      SSEu =           1532.084
 5                      SSER =           2683.411
 6                      α =              0.05

 8   Computed Values    m1 =             1
 9                      m2 =             71
10                      Fc =             3.97581
11
12   F-test             F-statistic =    53.35487
13                      Conclusion =     Reject Ho
14                      p-value =        3.24E-10
15                      Conclusion =     Reject Ho
Note that we used a t-test in Chapter 5 (Section 5.3.1) for this same test of significance of $\beta_2$.
When testing a single "equality" null hypothesis (a single restriction) against a "not equal to"
alternative hypothesis, either a t-test or an F-test can be used and the test outcomes will be
identical, because the F-statistic is the square of the t-statistic.
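The link between the two tests can be checked numerically: with a single restriction, the F-statistic equals the squared t-statistic. The Python sketch below is illustrative only; it assumes the t Stat for price reported in the Unrestricted Model output and rounded sums of squared errors for the two models.

```python
# Assumed inputs: t Stat for price and rounded SSE values for the two models
t_stat = -7.304442117
sse_r, sse_u = 2683.411, 1532.084
n, k = 75, 4

# With J = 1 restriction the numerator degrees of freedom equal 1
f_stat = (sse_r - sse_u) / (sse_u / (n - k))

# The two numbers agree up to rounding of the inputs
print(round(t_stat ** 2, 2), round(f_stat, 2))
```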
If you go back to your Unrestricted Model worksheet and look at the p-value for b2, you should
find that it is exactly the same as the one computed in your F-test template. We highlight both
results below:
- - - -
-
A I B I c I Di I E
A B c 1fj Coefficients Standard Error t'Stat P-11u/ue
11 F-tes1 F-statishc = 53.35487 17 Intercept 109.719:D35 5.7990454551 15.13741753 1.87937E-25
,_
13 Conclusion= Reject Ho 18 price -7.540000035 1.04.59:388841 -7.304442117 3.23648E"l0
14 p-valuB = 3.24E-10 19 advert
-
12.1512356.7 3 .555153941 3.416�50364 0.001051598
Go back to your andy's hamburger chain data worksheet. Because explanatory variables must
be adjacent, insert a new column to the right of the PRICE data column. In C1, enter the column
label x*. In C2, enter the formula =E2-3.8*D2; copy it to cells C3:C76. In F1, enter the column
label y*. In F2, enter the formula =A2-D2; copy it to cells F3:F76.
Here is how your table should look (only the first five values are shown below):
     A       B       C       D        E         F
1    SALES   PRICE   x*      ADVERT   ADVERT2   y*
2    73.2    5.69    -3.25   1.3      1.69      71.9
3    71.8    6.49    -2.61   2.9      8.41      68.9
4    62.4    5.63    -2.4    0.8      0.64      61.6
5    67.4    6.22    -2.17   0.7      0.49      66.7
6    89.3    5.02    -3.45   1.5      2.25      87.8
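The two worksheet formulas can be checked outside Excel. The Python sketch below is a minimal illustration using the first two observations of the worksheet; the function name transform is our own.

```python
# (SALES, PRICE, ADVERT, ADVERT2) for the first two observations
rows = [(73.2, 5.69, 1.3, 1.69), (71.8, 6.49, 2.9, 8.41)]


def transform(sales, price, advert, advert2):
    """Return (x*, y*) for one observation; price is not used here."""
    x_star = advert2 - 3.8 * advert   # mirrors =E2-3.8*D2
    y_star = sales - advert           # mirrors =A2-D2
    return x_star, y_star


transformed = [transform(*row) for row in rows]
for x_star, y_star in transformed:
    print(round(x_star, 2), round(y_star, 1))
```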
For the restricted model (6.5), the Input Y Range should be F1:F76, and the Input X Range
should be B1:C76. Check the box next to Labels. Select Output Range and specify it to be cell
A1 in your Restricted Model worksheet: you can place your cursor in the Output Range
window and move it to that cell to do that, or type 'Restricted Model'!A1 in the Output Range
window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:
mSUMMARY OUTPUT I I I I I
A B C. I 0 E I F G I H I
3 I Regression S falistics
4 R
Multipl., 0 _ 693339057
f--
� R Sqwarn 0-480719048
6 Adjusted R �quare
,__
o_4662945n -.
T Standard Error 4.643224.3-9
1-�
a Ol>sen1a.tions 75
1�AN0v'A
J_'.1_ I dt SS MS F Sig_riificance F
12
f--
Regression 2 1437_0 1327 1 718._5066355 33�321'&3303 5J>-818E-11
R�sidua.I 72 1552.28"6;357 21 �55953273
�
14 Total 74 2989-2�96.?8"
15 I
�16-I Coefficients Sla11dard Error t Slat P-v:alue Lower95% Upper95% Lov,.er 95.0% Ueeer95.0%
17 Intercept 110.35B95-9'9 ,6_763!10,3393 16.31610996 6.84193E-26 96.87556446 123-B4Z 3554 96_8.7,5.56446. 123.. 8423554
-7 .60310422 -7.21722771 -5:5203-727·7 -9_ 6 8 5 835675 , -5 §2037_27GS'
J_i[ PRICE
19 x* -2-87651491
1.044 78()'30 9
0.9334%59 -3.Cl8144457
3.3961 TE-10
0.0029'17717
-9Ji,85835675
4. fi7404337 -1-0156·2'549, 4-737404337 -1.01562503
Go back to your F-test worksheet. With 1 restriction, at α = 0.05, the result is (see also p. 229 in
Principles of Econometrics, 4e):
     A                  B                C
 1   Data Input         J =
 2                      N =
 3                      K =              4
 4                      SSEu =           1532.085
 5                      SSER =           1552.286
 6                      α =              0.05

 8   Computed Values    m1 =             1
 9                      m2 =             71
10                      Fc =             3.97581
11
12   F-test             F-statistic =    0.936194
13                      Conclusion =     Do Not Reject Ho
14                      p-value =        0.336543
15                      Conclusion =     Do Not Reject Ho
Go back to your andy's hamburger chain data worksheet. In cells G1:I2, enter the following
column labels and formulas.
     G               H        I
1    y**             x1**     x2**
2    =A2-D2-78.1     =B2-6    =E2-3.8*D2+3.61
Copy the content of cells G2:I2 to cells G3:I76. Here is how your table should look (only the
first five values are shown below):
     G       H       I
1    y**     x1**    x2**
2    -6.2    -0.31   0.36
3    -9.2    0.49    1
4    -16.5   -0.37   1.21
5    -11.4   0.22    1.44
6    9.7     -0.98   0.16
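The three formulas in columns G to I can be sketched the same way. The Python snippet below is a minimal illustration for the first observation of the worksheet; the function name is hypothetical.

```python
def transform_two_restrictions(sales, price, advert, advert2):
    """Return (y**, x1**, x2**) for one observation."""
    y_ss = sales - advert - 78.1             # mirrors =A2-D2-78.1
    x1_ss = price - 6                        # mirrors =B2-6
    x2_ss = advert2 - 3.8 * advert + 3.61    # mirrors =E2-3.8*D2+3.61
    return y_ss, x1_ss, x2_ss


# First observation: SALES, PRICE, ADVERT, ADVERT2
first_row = transform_two_restrictions(73.2, 5.69, 1.3, 1.69)
print([round(value, 2) for value in first_row])
```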
For the restricted model (6.6), notice that there is no intercept, so you will need to select the
Constant is Zero option in the Regression dialog box. The Input Y Range should be G1:G76,
and the Input X Range should be H1:I76. Check the boxes next to Labels and Constant is Zero.
Select Output Range and specify it to be cell A1 in your Restricted Model worksheet: you can
place your cursor in the Output Range window and move it to that cell to do that, or type
'Restricted Model'!A1 in the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:
A B I G I D E I F I G H
1- SUMMARY OUTPUT
2
3 R.�gr.ession Statlstic.s
4 .Multiple R 0._699423441
5R Square 0-489193159!
6 Adjusted R Square 0•-466497175
'7 Standard Error - 4_937778213
T Dbservations 75
9'
10 ANOVA
J1_.�
I ____________ � s_s�-�-1w _ s _______ F w�n_m_ca_n_
s� ec _F_
12 Regression 2 1 704_549'/'rn 1152-2748861 34-_95558173 2_46249E-'11
B Residual 73 1719_8'60-71.9 24-18165368
'14
f--.-\
.
Total 75 3484_410'495 , ,
15
���������������������������-
Goeffrcienls StandB'rd Error t SlB't F'-Vil']ue .Lower 95% Upper 95% lowedl5. 0% Upper 95 0%-
� Intercept _ 0. #NIA #NJ.A ___ #NIA, _ _ #NIA #NIA #NIA #NIA
-.i1-_17957010 -:.s_2os f2o s3'4 -4.1191sf81�1
_
Go back to your F-test worksheet. With 2 restrictions, at α = 0.05, the result is (see also p. 231
in Principles of Econometrics, 4e):
     A                  B                C
 1   Data Input         J =              2
 2                      N =              75
 3                      K =              4
 4                      SSEu =           1532.085
 5                      SSER =           1779.861
 6                      α =              0.05

 8   Computed Values    m1 =             2
 9                      m2 =             71
10                      Fc =             3.125764
11
12   F-test             F-statistic =    5.741233
13                      Conclusion =     Reject Ho
14                      p-value =        0.004885
15                      Conclusion =     Reject Ho
where Q is the quantity demanded, PB is the price of beer, PL is the price of liquor, PR is the
price of all other remaining goods and services, and I is income. All information for this model
has been collected over a period of 30 years from a randomly selected household.
The assumption that economic agents do not suffer from "money illusion" can be imposed on the
demand model. This leads to the following restricted demand model for beer (see pp. 231-232 in
Principles of Econometrics, 4e for more details):
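$\ln(Q_t) = \beta_1 + \beta_2 \ln(PB_t/PR_t) + \beta_3 \ln(PL_t/PR_t) + \beta_5 \ln(I_t/PR_t) + e_t$   (6.8)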
Open the Excel file beer. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, rename it beer data, and in it, copy the data set you just opened.
In cells F1:I2 of your beer data worksheet, enter the following column labels and formulas.
     F           G              H              I
1    y*          x1*            x2*            x3*
2    =ln(A2)     =ln(B2/D2)     =ln(C2/D2)     =ln(E2/D2)
Copy the content of cells F2:I2 to cells F3:I31. Here is how your table should look (only the first
five values are shown below):
F G H
r x1• x:l X3�
2 4_403054 0_472253 1_834382 10_025:7a
3 4.0412915 1.2:20257 2.39'1:088 10.58768
4 4_160444 0_979322' 2_Wi509 10_.33316
5 4.180522 1-05315 2.2'58981 10.49'711
6 4.160444 0.757095 1.%1287 10.15131
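The log transformations in columns F to I can be sketched in Python as follows. This is only an illustration: it assumes that columns A to E hold Q, PB, PL, PR and I in that order (as the worksheet formulas suggest), the function name is our own, and the numbers are hypothetical rather than taken from the beer data file. The second call shows why the restricted model has no money illusion: doubling all prices and income leaves every regressor unchanged.

```python
import math


def beer_transform(q, pb, pl, pr, income):
    """Mirror =ln(A2), =ln(B2/D2), =ln(C2/D2), =ln(E2/D2)."""
    return (math.log(q), math.log(pb / pr),
            math.log(pl / pr), math.log(income / pr))


# Hypothetical observation, not taken from the beer data file
original = beer_transform(q=80.0, pb=2.0, pl=8.0, pr=1.25, income=30000.0)

# Same quantity, with all prices and income doubled
doubled = beer_transform(q=80.0, pb=4.0, pl=16.0, pr=2.5, income=60000.0)

unchanged = all(abs(a - b) < 1e-12 for a, b in zip(original, doubled))
print(unchanged)
```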
In the Regression dialog box, the Input Y Range should be F1:F31, and the Input X Range
should be G1:I31. Check the box next to Labels. Select New Worksheet Ply and name it
Restricted Beer Demand Model. Finally, select OK.
A B c I D I E I F I G I H I
11-1""'1-"'-
S � UM�M
�....
ARY.....: OLJTp UT
'2
3 -----Re- a-,r. -e s-·
s, on_
- S_ta _t1-
- sti -c_s __
4 Multiple R 0.898659761
'5 RSquar-e 0.80794887
L AdJ usied R Square
-
o i as i a9124
_
�ANOVA
11 I df SS MS F Significance F _
15
Coefficienls Sfapd,.rd Error t Sfat P-value LowerY5% Upper 95% lowe.r95J)% Upper95_0%
17 1 lnterce·pt _-4.7'f7797376 3.7139(}504
- .:1.2_9184707-9 0207775913 -12�43'183 844 2.83624369'1 -12.43'183844 2.83624369'1
1if !K"'l' -1299�8¥8:4- _0.16573!623 -7.840021241 �.57799E-08 -l.640065044 - 0 95s i' o1925 �1.64oo6s-044
_ -0_9�8ro7�.?�
tl
,_
x2. 0.186615879 0284383258 0.656915882 051700'8126
- - - -0,3!!'77�?275 0. 771374032 -01.3917742275 0. 77137403'2
20' x3� 0.945628579 �-427046831 2.214812313 O.Qo3574:2225 0.0&.8021255 1. 823&35904 0·.0&8021255
. 1.823&359.04
$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + e_i$   (6.9)
where FAMINC is the annual family income of married couples where both husbands and wives
work; HEDU is the years of education of the husband and WEDU is the years of education of the
wife.
If we incorrectly omit the relevant variable WEDU (wife's education) from the family income
model, it becomes:
$FAMINC_i = \beta_1 + \beta_2 HEDU_i + e_i$   (6.10)
If we add the omitted relevant variable KL6 (number of children less than 6 years old) to the
family income model, it becomes:
$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + e_i$   (6.11)
You can estimate models (6.9)-(6.11) using the edu_inc data set. Below, we will show you how
to get the correlation matrix as shown in Table 6.1 of Principles of Econometrics, 4e (p. 235).
Open the Excel file edu_inc. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, name it education and income data, and in it, copy the data set you just
opened.
Select the Data tab, in the middle of your tab list located on top of your screen. On the Analysis
group of commands, to the far right, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Correlation (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.
A Correlation dialog box pops up. Specify the Input Range to be A1:F429. Select Grouped by
Columns, as this is the way the data on each variable are stored. Check the box next to Labels in
first row. Select New Worksheet Ply and name it Correlation Matrix. Finally, select OK.
� FAMINC 1
3 HE 0.354·684 1
•
T WE
>---
0-3-62328 0.594343 1
'
5, KL·6 -D·. .0-7195 (}_ 104877 0 . 1 2�34 1
T XTRA XS 0.289!!17 0.!!35468 0.517798· 0.148742 1
T XmA-X6 0.351.365, 0.820563 o.7:m6& 0.159522 0.900206 1
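What the Correlation tool computes can be sketched in a few lines of Python. This is a minimal illustration, not Excel's implementation: the helper names are our own, and the small data sample is hypothetical. Like Excel's output, the sketch fills only the lower triangle of the matrix.

```python
import math


def pearson(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Lower-triangular correlation matrix keyed by column label."""
    labels = list(columns)
    return {
        row: {col: pearson(columns[row], columns[col])
              for col in labels[: i + 1]}
        for i, row in enumerate(labels)
    }


# Hypothetical sample of three variables (five observations each)
data = {"HE": [12, 9, 12, 10, 12],
        "WE": [12, 12, 12, 12, 14],
        "KL6": [1, 0, 1, 0, 1]}
matrix = correlation_matrix(data)
for label, row in matrix.items():
    print(label, [round(value, 3) for value in row.values()])
```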
To see the effect of irrelevant variables, we can add two artificially generated variables X5 and X6
to the family income model (6.11):
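$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \beta_5 X_{5i} + \beta_6 X_{6i} + e_i$   (6.12)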
You can estimate model (6.12) using the edu_inc.xls data set. Below, we will show you how the
variables X5 and X6 were generated.
Variables X5 and X6 were constructed so that they are correlated with HEDU and WEDU, but
they are not expected to influence family income. Specifically, they were defined as follows:
$X_5 = HEDU + 2 \cdot N(0,1)$
$X_6 = X_5 + WEDU + N(0,1)$
where each N(0,1) is a random number from a normal distribution with mean 0 and standard
deviation 1, generated the way we generated our random samples in Section 2.4.4 and Section
3.1.4.
Go back to your education and income data worksheet. In cells H1:N2 enter the following
column labels and formulas. In the last row of the table you will find the numbers of the
equations used in the formulas.
     H               I               J      K      L      M            N
1    N(0,1) for X5   N(0,1) for X6   HEDU   WEDU   KL6    X5           X6
2                                    =B2    =C2    =D2    =J2+2*H2     =M2+K2+I2
                                                          (6.14)       (6.15)
Note that we copy the values of the HEDU, WEDU and KL6 variables in columns J-L. The
reason we are doing this is that we need to have the columns of explanatory variables next to one
another to be able to use the Excel regression analysis tool.
In columns H-I, we will generate samples of random numbers from a normal distribution with
mean 0 and standard deviation 1.
Select the Data tab, in the middle of your tab list located on top of your screen. On the Analysis
group of commands, to the far right, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
A Random Number Generation dialog box pops up. We need to generate two sets of random
numbers: one for our X5 variable and one for our X6 variable, so we specify 2 in the Number of
Variables window. We would like to generate as many data points as we have in the data set we
are working with, so we specify 428 in the Number of Random Numbers window. We select
Normal in the Distribution window; the selected Parameters should be Mean equal to 0, and
Standard deviation equal to 1. Select Output Range and specify it to be H2:I429. Finally, we
select OK.
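The same construction can be sketched in Python. The snippet below is an illustration under stated assumptions: it uses the standard library's random.gauss in place of Excel's Random Number Generation tool, an arbitrary seed for repeatability, and a few hypothetical education values instead of the full 428-row data set. The function name is our own.

```python
import random


def make_irrelevant_variables(hedu, wedu, seed=12345):
    """Build X5 and X6 from HEDU, WEDU and standard normal draws."""
    rng = random.Random(seed)   # arbitrary seed, chosen only for repeatability
    x5, x6 = [], []
    for he, we in zip(hedu, wedu):
        z5 = rng.gauss(0.0, 1.0)        # plays the role of column H
        z6 = rng.gauss(0.0, 1.0)        # plays the role of column I
        x5_value = he + 2 * z5          # mirrors =J2+2*H2
        x5.append(x5_value)
        x6.append(x5_value + we + z6)   # mirrors =M2+K2+I2
    return x5, x6


# Hypothetical education values, not the full data set
x5, x6 = make_irrelevant_variables([12, 9, 12, 10, 12], [12, 12, 12, 12, 14])
print(len(x5), len(x6))
```

As in Excel, a different seed (or no seed) produces different draws, so the resulting regression estimates will differ from run to run.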
After you copy the content of cells J2:N2 to cells J3:N429, your table should look like the one
below (only the first five values are shown below):
l;-l I I I j I K I L I M I N
1 NIO� 1 �for x� Nfa>, 1) for x6 HE WE KLG X'!J XG
2 1-167550>181 0-2U471 5il9 12 12 1 14.3351 2&_53982'
]. 0_2412639'33 0_08421011 9 12 0 9_482528 2'1 _56'674
T -0_ 7'237940·74 0_549'94871 12 12'. 1 10_55241 23,_1023G
-
5,- 0.459443648 0.53'153258 10 12 0 1 o.:J36:89 23.47042
_? 1_7905404-0.9 -0. 5.&18233 12 14 1 15_58108 29'_01926
In the Regression dialog box, the Input Y Range should be A1:A429, and the Input X Range
should be J1:N429. Check the box next to Labels. Select New Worksheet Ply and name it
Irrelevant Variable Model. Finally, select OK.
Note: we obtained different random samples than the ones recorded in the edu_inc data set. This is
why our resulting estimated equation will also differ from the one reported on p. 236 of
Principles of Econometrics, 4e. You will also obtain different parameter estimates for equation
(6.12) because your random numbers will differ from those above.
_-6_ B c D E F H
1 SUMMARY OUTPUT
2
3 Re ression Sfalistics
4 MultipleR 0 _421302759<
5 _R ��;uarn ,,,--+-0.1774960·�5
6 R Square-
T
Adjust•ed 0_.1§n�_0707
7 Stantlard Erri:rr 40247-24063
8 0 bs erva't i o-n s 428
190 IANOVA
11 I df SS MS F Sig_nificance F !
t2 Regressi(m 5 1A751.5E+11 29502937711 18.21348455 2.23?01E-i 6:
13 R·esidt:1al 422' 6_83573E+11 1619'840378
14 fatal 427 8_31 OS7'E+11
151
16-I C1Jeffic;lerrls Slandard &ror t Stat P-�aJue L1Jwer 95% Upper95% Lower95.0% Ue_e_e95"0%
17 Intercept -7'682_625 15.2 11�8!U2S23 -0_6.86602894 DA927!�098 -2967· 6 . 3&31 2 14311.132'81 -2967iU8"312
.. .. ... 14311_132'81
18 HE 2_ 4592'645
_ 1
� .
(6.15)
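Consider further the following two artificial models and their associated tests for misspecification.
We will use an F-test for both, even though a t-test could be used for RESET test 1.
$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \gamma_1 \widehat{FAMINC}_i^2 + e_i$   (6.16)
$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \gamma_1 \widehat{FAMINC}_i^2 + \gamma_2 \widehat{FAMINC}_i^3 + e_i$   (6.17)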
Go back to your education and income data worksheet, from where we will first estimate the
restricted model (6.11). In the Regression dialog box, the Input Y Range should be A1:A429,
and the Input X Range should be B1:D429. Check the box next to Labels. Select Output Range
and specify it to be cell A1 in your Restricted Model worksheet: you can place your cursor in the
Output Range window and move it to that cell to do that, or type 'Restricted Model'!A1 in the
Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:
A I B I c I D E I F I G H I
1 SUMMARY OUTPUT I I
-
2 t I
3 Regr;ession SJatisfics
l Multiple R ()_420919613
2-
-
15
16 CoefficietJ/S Standar:d Etror 1 t Stat P-val!I� Lowe-r95% UtJtJer95% Lovler 95.0% Uooer 95. 0%•
JL Intercept -7755.33133 11162. 9'344 7 -Q_,594 7394!l 1 0-4 87.5 9 9 098 - . -
-29696.9'12. 14186�24934 -29696.912 14186.24934
1S HE 3211.525676 796.7(}2:!);365 4.03"1021775 6_5 84QITE-05 1645_547195. 4777.504158 1645_547195 4.7H5CM158
J2_ WE 4776.907489 1ID6.,.16:�I2- 4 .5�157�47 8.727013E-OG
_ .
.. -
2691.11101? 6862.703965 2691.1110·12 - --
6862. 70·3%5--
20 KL6 -1431 Or.9203 5 0 03. 9'28; 369· -2_85993709 0.004446558 -241465149'2 -44 75.32572 -24146.5-14-9'2' -4475.3'2:5-719
Go back to your education and income data worksheet. In cells P1:W2 enter column labels and
formulas as shown in the tables below.
     P        Q
1    b1 =     ='Restricted Model'!B17
2    b2 =     ='Restricted Model'!B18
3    b3 =     ='Restricted Model'!B19
4    b4 =     ='Restricted Model'!B20
In the last row of the table you will find the numbers of the equations used in the formulas, if any.
     R                                           S      T      U      V         W
1    yhat                                        HEDU   WEDU   KL6    yhat^2    yhat^3
2    =($Q$1+$Q$2*J2+$Q$3*K2+$Q$4*L2)/10000       =J2    =K2    =L2    =R2^2     =R2^3
                                                                      (6.16)
Again note that we copy the values of the HEDU, WEDU and KL6 variables in columns S-U
because we need adjacent columns of explanatory variables. Also, in cell R2, the division by
10,000 is there to re-scale the yhat values.
Copy the content of cells R2:W2 to cells R3:W429. Here is how your table should look (only the
first five values are shown below):
     P       Q          R          S      T      U     V            W
1    b1 =    -7755.33   yhat       HEDU   WEDU   KL6   yhat^2       yhat^3
2    b2 =    3211.526   7.379495   12     12     1     54.456941    401.8647041
3    b3 =    4776.907   7.847129   9      12     0     61.5774329   483.2060575
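The construction of the RESET regressors can be sketched in Python. The snippet below is illustrative only: the coefficient values are the estimates as read from the Restricted Model output (treat the trailing digits as approximate), the function name is our own, and the observation used has HEDU = 12, WEDU = 12 and KL6 = 1.

```python
# Coefficient estimates as read from the Restricted Model worksheet (approximate)
b1, b2, b3, b4 = -7755.33133, 3211.525676, 4776.907489, -14310.9203


def reset_regressors(hedu, wedu, kl6):
    """Return yhat (in units of 10,000), yhat^2 and yhat^3."""
    yhat = (b1 + b2 * hedu + b3 * wedu + b4 * kl6) / 10000
    return yhat, yhat ** 2, yhat ** 3


yhat, yhat2, yhat3 = reset_regressors(hedu=12, wedu=12, kl6=1)
print(round(yhat, 4), round(yhat2, 3), round(yhat3, 2))
```

Dividing by 10,000 only changes the scale of the added regressors; it keeps the squared and cubed fitted values from becoming very large numbers.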
We are now ready to estimate equation (6.16) and subsequently run the RESET test 1.
In the Regression dialog box, the Input Y Range should be A1:A429, and the Input X Range
should be S1:V429. Check the box next to Labels. Select Output Range and specify it to be cell
A1 in your Unrestricted Model worksheet: you can place your cursor in the Output Range
window and move it to that cell to do that, or type 'Unrestricted Model'!A1 in the Output
Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:
A I B I c I D I E I F I G I H I
1 SUMMARY OUTPUT
--
2
!
3 I Reg_ression Statistics
�
4 M�lfip.le R
--t--c0-4343�9!!37
2- R S.qLJare 0 . 1886510 94
6 A_OjustedR Square o,1� 0978 �64
__]___ St an-d a rd Error �992G.fOT72
8 dbservatioas 426
___._;._,
__i_
10. ANO VA
11 I df SS MS F Sic:mificance F !
12 Regression 4 1.56786E+11 391.95383356: 24.58850071 2. 5•6531'E-18
15940940T7
�
Go back to your F-test worksheet. With 1 restriction, at α = 0.05, the result of RESET test 1 is
(see also p. 239 of Principles of Econometrics, 4e):
     A                  B                C
 1   Data Input         J =              1
 2                      N =              428
 3                      K =              5
 4                      SSEu =           6.74E+11
 5                      SSER =           6.84E+11
 6                      α =              0.05

 8   Computed Values    m1 =             1
 9                      m2 =             423
10                      Fc =             3.863536
11
12   F-test             F-statistic =    5.983983
13                      Conclusion =     Reject Ho
14                      p-value =        0.014643
15                      Conclusion =     Reject Ho
Next, we estimate equation (6.17) and subsequently run the RESET test 2.
Go back to your education and income data worksheet. From there, go to the Regression dialog
box. The Input Y Range should be A1:A429, and the Input X Range should be S1:W429.
Check the box next to Labels. Select Output Range and specify it to be cell A1 in your
Unrestricted Model worksheet: you can place your cursor in the Output Range window and
move it to that cell to do that, or type 'Unrestricted Model'!A1 in the Output Range window.
Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:
A I B I c I D I E I F G I H I I
1 SUMMARY OUTPUT I
..._, ,_
>---·-
2
3 Re_qresslon Statistics
� Multiple·H 0•.434939912'
. . .
5 R .Squa!e 0 .189172727
1--
f-6_
Adjws!oo R Square 0 .179565 768
J_ Standard Er.ror 39960.53362 - -
B Observations 428
9
,__
1 0 AN OVA
1j df SS MS F SirJ!J.ificanr:;e f
-=� Regre5sion 5 1.57219E+11 31443811188 19.69121988 1_19924E-17
13 Residual 422 •6.738G8E+11 1596844247
t- -
14 Total 427 B.3'1087E+11
15
161 Coefficients Staf'fdarcf Err-or I Stat P-vaJu e i:ower95% Upper95% l·ower 95. 0% Upper95.0%
1-17
- I nlerc epl 150186S287 127386_8411 1.H8979927 0-.239070463 -100205.2101 4005782&74 -100205.2101 400578.2'674
With 2 restrictions, at α = 0.05, the result of RESET test 2 is (see also p. 239 of Principles of
Econometrics, 4e):
     A                  B                C
 1   Data Input         J =              2
 2                      N =              428
 3                      K =              5
 4                      SSEu =           6.74E+11
 5                      SSER =           6.84E+11

 8   Computed Values    m1 =             2
 9                      m2 =             422
10                      Fc =             3.0171
11
12   F-test             F-statistic =    3.122582
13                      Conclusion =     Reject Ho
14                      p-value =        0.045063
Open the Excel file car. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, name it cars data, and in it, copy the data set you just opened.
Select the Data tab, in the middle of your tab list located on top of your screen. On the Analysis
group of commands, to the far right, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Correlation (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.
A Correlation dialog box pops up. Specify the Input Range to be A1:D393. Select Grouped by
Columns, as this is how the data are stored. Check the box next to Labels in first row. Select
Output Range and specify it to be F1. Finally, select OK.
The result is:
     F       G          H          I          J
1            MPG        CYL        ENG        WGT
2    MPG     1
3    CYL     -0.77762   1
4    ENG     -0.80513   0.950823   1
5    WGT     -0.83224   0.897527   0.932994   1
$MPG_i = \beta_1 + \beta_2 CYL_i + e_i$   (6.18)
In the Regression dialog box, the Input Y Range should be A1:A393, and the Input X Range
should be B1:B393. Check the box next to Labels. Select New Worksheet Ply and name it Car
Mileage Model. Finally, select OK.
j I I I I I I I
�SUMMARY A
OUTPUT
B
I
c D E F G
l
H I
-�J S liS'tistics I
_i_ M.l!_ltiple
R1;ig_re'5'sicm
R 1 0.177617509
I
5 'R Square; i1'.&046ss99
6 Aaju5t,EJd R Squar� 0.60367'5372
_]_ Standard Er.ror 4.913589267
MANO
8 ·Ol:lse.rvati·ons 392
I
VA t
11 df SS lv!S F Srg_niffc-anc-e F
1
12 Regress ii;rn 1 14403.08236 144U3.08:28G 596.5649839 U1138E-8()
13 Re�jdL1al
�
390 9415.910199 24.14 335948
14 Total I 391 23818.99306
Hi[
113'! Co efficients Standard Error I Sl at P-va/CJe lower95% Uo.oer.95% Lov.1er 95. 0% Upper 95. 0%.
1 7 l1jtercepl 4-2. 915 505.2 0.!!)4866841 51.4040121 8.1!2�_E-1/6 4127410251 44 55690789 41.2741_0251 44 56.690789'
18 CYL -3.55 8078341 0.145G7.5537 -24.42467981 1.31131!E-fl0 -3. 84448 5952 :J.271670729 -3.844485952 -3 .27 1 61 0 729
$MPG_i = \beta_1 + \beta_2 CYL_i + \beta_3 ENG_i + \beta_4 WGT_i + e_i$   (6.19)
where ENG is the engine displacement in cubic inches and WGT is vehicle weight in pounds.
In the Regression dialog box, the Input Y Range should be A1:A393, and the Input X Range
should be B1:D393. Check the box next to Labels. Select Output Range and specify it to be cell
A1 in your Unrestricted Model worksheet: you can place your cursor in the Output Range
window and move it to that cell to do that, or type 'Unrestricted Model'!A1 in the Output
Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is (see also p. 242 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.836237128
R Square              0.699292534
Adjusted R Square     0.696967477
Standard Error        4.296530924
Observations          392

ANOVA
              df     SS             MS             F              Significance F
Regression    3      16656.444      5552.148001    300.7635141    7.5854E-101
Residual      388    7162.549057    18.46017798

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     44.37096115     1.480685053       29.96650844     5.3199E-103    41.459791       47.2821313
CYL           -0.267796747    0.413067315       -0.648312588    0.517166276    -1.079927094    0.544333601
ENG           -0.01267396     0.008250068       -1.536224887    0.125298269    -0.028894392    0.003546473
WGT           -0.005707884    0.000713919       -7.995142549    1.50112E-14    -0.007111518    -0.00430425
To test the null hypothesis $H_0: \beta_2 = \beta_3 = 0$ against the alternative $H_1: \beta_2 \neq 0$ and/or $\beta_3 \neq 0$, we
need to run the following restricted model:
$MPG = \beta_1 + \beta_4 WGT + e$   (6.20)
Go back to your cars data worksheet, and then to the Regression dialog box. For the restricted
model (6.20), the Input Y Range should be A1:A393, and the Input X Range should be
D1:D393 (the WGT column). Check the box next to Labels. Select Output Range and specify it to be cell A1 in
your Restricted Model worksheet: you can place your cursor in the Output Range window and
move it to that cell to do that, or type 'Restricted Model'!A1 in the Output Range window.
Finally, select OK.
With the restrictions in the null hypothesis $H_0: \beta_2 = \beta_3 = 0$, and at $\alpha = 0.05$, the results of the F-test are (see also p. 242
of Principles of Econometrics, 4e):
[F-test worksheet; legible entries only]
Data Input    N =       392
              SSEu =    7162.549
              α =       0.05
CHAPTER 7
CHAPTER OUTLINE
7.1 Indicator Variables: The University Effect on House Prices Example
7.2 Applying Indicator Variables
    7.2.1 Interactions Between Qualitative Factors
    7.2.2 Qualitative Factors with Several Categories
    7.2.3 Testing the Equivalence of Two Regressions
7.3 Log-Linear Models: A Wage Equation Example
7.4 The Linear Probability Model: A Marketing Example
7.5 The Difference Estimator: The Project STAR Example
7.6 The Differences-in-Differences Estimator: The Effect of Minimum Wage Change Example
This chapter considers the use of indicator variables to add more flexibility to the regression
model. We work with different examples to illustrate the use of this tool.
where PRICE is house price in $1000, SQFT is number of hundreds of square feet of living area,
and AGE is the age of the house in years. Three dummy variables are used to indicate the house
location (UTOWN = 1 for homes near the university, 0 otherwise), whether the house has a pool
(POOL = 1 if a pool is present, 0 otherwise) and whether the house has a fireplace (FPLACE =
1 if a fireplace is present, 0 otherwise).
Open the Excel file utown. Save your file as POE Chapter 7. Rename sheet 1 utown data. In
cell G1 of your utown data worksheet, enter the column label sqft x utown. In cell G2, enter the
Using Indicator Variables 181
formula =B2*D2; copy it to cells G3:G1001. Here is how your table should look (only the first
five values are shown below):
     A         B       C     D       E      F        G
1    price     sqft    age   utown   pool   fplace   sqft x utown
2    205.452   23.46   6     0       0      1        0
3    185.328   20.03   5     0       0      1        0
4    248.422   27.77   6     0       0      0        0
5    154.69    20.17   1     0       0      0        0
6    221.801   26.45   0     0       0      1        0
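For readers who want to cross-check the sqft x utown column outside Excel, the same product can be sketched in Python. This is only an illustration under stated assumptions: the list-of-dictionaries layout and the helper name add_interaction are hypothetical, and the third row is made up so that one house has UTOWN = 1.

```python
def add_interaction(rows, a, b, name):
    """Return new rows with an extra column: name = row[a] * row[b]."""
    return [{**row, name: row[a] * row[b]} for row in rows]


houses = [
    {"sqft": 23.46, "utown": 0},
    {"sqft": 20.03, "utown": 0},
    {"sqft": 25.00, "utown": 1},  # hypothetical row, not from the utown file
]

# Same role as the worksheet formula =B2*D2 copied down column G.
result = add_interaction(houses, "sqft", "utown", "sqft_x_utown")
for row in result:
    print(row)
```

The interaction equals SQFT for houses near the university and zero otherwise, which is what allows the regression to estimate a different SQFT slope for the two locations.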
In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range
should be B1:G1001. Check the box next to Labels. Select New Worksheet Ply and name it
house price equation. Finally, select OK.
The result is (see also Table 7.2 on p. 264 of Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.933043369
R Square              0.870569928
Adjusted R Square     0.869787873
Standard Error        15.22521141
Observations          1000

ANOVA
              df     SS             MS             F              Significance F
Regression    6      1548261.73     258043.6216    1113.182743    0
Residual      993    230184.413     231.8070624
Total         999    1778446.143

                Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept       24.49998329     6.191721216       3.956893801     8.13332E-05    12.34962326     36.65034333
sqft            7.612176612     0.245176458       31.04775        1.8674E-148    7.131053169     8.093300056
age             -0.190086388    0.051204606       -3.712290812    0.000216812    -0.290568043    -0.089604732
utown           27.45295601     8.42258204        3.259446555     0.001154208    10.9248533      43.98105872
pool            4.377164078     1.196691689       3.65772104      0.000267836    2.028829359     6.725498798
fplace          1.64917557      0.971956791       1.696758113     0.090055792    -0.258149475    3.556500614
sqft x utown    1.29940476      0.332047741       3.913307036     9.72454E-05    0.647808951     1.95100057
182 Chapter 7
where WAGE is hourly wage and EDUC is years of education. BLACK and FEMALE are dummy
variables for race and gender.
Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 7 in one file, create a new worksheet in your POE
Chapter 7 Excel file, rename it cps4_small data, and in it, copy the data set you just opened.
Note that we first copy the values of EDUC, BLACK and FEMALE in columns M-O. Next, we
create our interaction variable in column P; this way we end up with contiguous columns of
explanatory variables.
Copy the content of cells M2:P2 to cells M3:P1001. Here is how your table should look (only the
first five values are shown below):
     M      N       O        P
1    educ   black   female   black x female
2    16     0       1        0
3    12     0       0        0
4    16     1       0        0
5    14     1       1        1
6    12     0       0        0
In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range
should be M1:P1001. Check the box next to Labels. Select New Worksheet Ply and name it
Unrestricted Model. Finally, select OK.
The result is (see also Table 7.3 on p. 265 of Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.457009544
R Square              0.208857724
Adjusted R Square     0.205677252
Standard Error        11.43892091
Observations          1000

ANOVA
              df     SS             MS             F              Significance F
Regression    4      34370.76063    8592.690158    65.66879348    2.52617E-49
Residual      995    130194.6671    130.8489117
Total         999    164565.4278

                  Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept         -5.281159154    1.900457714       -2.778873387    0.005557491    -9.010543846    -1.551774462
educ              2.070390643     0.134878085       15.35008912     6.90939E-48    1.805712498     2.335068788
black             -4.169077148    1.774713857       -2.34915456     0.019010856    -7.651688634    -0.686465662
female            -4.784607177    0.773413912       -6.186347764    8.98137E-10    -6.302316651    -3.266897704
black x female    3.844293991     2.327652828       1.651575331     0.098936686    -0.723377831    8.411965813
To test the hypothesis that neither race nor gender affects wages ($H_0: \delta_1 = 0, \delta_2 = 0, \gamma = 0$), the
restricted model is:
$WAGE = \beta_1 + \beta_2 EDUC + e$   (7.3)
Go back to your cps4_small data worksheet. From there, go to the Regression dialog box. The
Input Y Range should be A1:A1001, and the Input X Range should be B1:B1001. Check the
box next to Labels. Select New Worksheet Ply and name it Restricted Model. Finally, select
OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.418296152
R Square              0.174971671

ANOVA
              df     SS             MS             F              Significance F
Regression    1      28794.28782    28794.28782    211.6554318    1.24945E-43
Residual      998    135771.1399    136.0432264
Total         999    164565.4278

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     -6.71032842     1.914155839       -3.505633214    0.000475773    -10.46656027    -2.954096574
educ          1.980287588     0.136117372       14.54838244     1.24945E-43    1.713178506     2.247396669
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen,
and rename it F-test.
Open your POE Chapter 6 Excel file, go to its F-test worksheet and copy its content in the one
you created in your POE Chapter 7 Excel file. Go back to each formula and delete the references
to POE Chapter 6 Excel file: [POE Chapter 6.xlsx]; this way the F-statistic will be computed
based on the regression results of your current Excel file: POE Chapter 7. Your F-test template
should look like the one below:
      A                  B               C
1     Data Input         J =
2                        N =             ='Unrestricted Model'!B8
3                        K =             ='Unrestricted Model'!B12+1
4                        SSEu =          ='Unrestricted Model'!C13
5                        SSER =          ='Restricted Model'!C13
6                        α =
7
8     Computed Values    m1 =            =C1
9                        m2 =            =C2-C3
10                       Fc =            =FINV(C6,C8,C9)
11
12    F-test             F-statistic =   =((C5-C4)/C8)/(C4/C9)
13                       Conclusion =    =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                       p-value =       =FDIST(C12,C8,C9)
15                       Conclusion =    =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")
Note that the extension to your POE Chapter 6 Excel file might be different than .xlsx if you
chose to save your file in a different format.
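The arithmetic behind the template can be checked with a few lines of Python. This is a minimal sketch, not part of the Excel workflow: the function name f_statistic is hypothetical, the two sums of squared errors are the values read from the Unrestricted Model and Restricted Model worksheets for the wage equation, and the critical value is the one the worksheet's FINV cell reports at α = 0.01 rather than something computed here.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """F = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K)), as in cell C12."""
    m1 = j          # numerator degrees of freedom (cell C8)
    m2 = n - k      # denominator degrees of freedom (cell C9)
    f = ((sse_r - sse_u) / m1) / (sse_u / m2)
    return f, m1, m2


# Inputs as read from the regression worksheets (assumed legible values).
f, m1, m2 = f_statistic(sse_r=135771.1399, sse_u=130194.6671, j=3, n=1000, k=5)

f_crit = 3.80134471  # taken from the worksheet's FINV result, not recomputed
decision = "Reject Ho" if f >= f_crit else "Do Not Reject Ho"
print(round(f, 4), m1, m2, decision)
```

The p-value cell (FDIST) is left out on purpose because the Python standard library has no F distribution; a statistics package would be needed for that step.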
With 3 restrictions, at α = 0.01, the results of the F-test are (see also p. 266 in Principles of
Econometrics, 4e):
      A                  B               C
1     Data Input         J =             3
2                        N =             1000
3                        K =             5
4                        SSEu =          130194.6671
5                        SSER =          135771.1399
6                        α =             0.01
7
8     Computed Values    m1 =            3
9                        m2 =            995
10                       Fc =            3.80134471
11
12    F-test             F-statistic =   14.20588255
13                       Conclusion =    Reject Ho
14                       p-value =       4.53E-09
15                       Conclusion =    Reject Ho
Go back to your cps4_small data worksheet. In cells Q1:S2, enter the following column labels
and formulas.
     Q       R         S
1    south   midwest   west
2    =I2     =H2       =J2
Note that all we are doing is copying the values of SOUTH, MIDWEST and WEST in columns
Q-S so as to create columns of explanatory variables next to one another.
Copy the content of cells Q2:S2 to cells Q3:S1001. Here is how your table should look (only the
first five values are shown below):
     Q       R         S
1    south   midwest   west
2    1       0         0
3    0       1         0
4    0       0         1
5    1       0         0
6    0       0         0
In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range
should be M1:S1001. Check the box next to Labels. Select Output Range and specify it to be
cell A1 in your Unrestricted Model worksheet: you can place your cursor in the Output Range
window and move it to that cell to do that, or type 'Unrestricted Model'!A1 in the Output
Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range.
The result is (see also Table 7.4 on p. 267 of Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.467853351
R Square              0.218886758
Adjusted R Square     0.213374871
Standard Error        11.38335986
Observations          1000

ANOVA
              df     SS             MS             F              Significance F
Regression    7      36021.19302    5145.884717    39.71175875    2.31981E-49
Residual      992    128544.2347    129.5808818
Total         999    164565.4278

[Coefficient rows not recoverable; see Table 7.4 on p. 267 of Principles of Econometrics, 4e.]
To test the hypothesis that there are no regional differences ($H_0: \delta_3 = 0, \delta_4 = 0, \delta_5 = 0$), the
restricted model is our old unrestricted model (7.2).
Go back to your cps4_small data worksheet. From there, go to the Regression dialog box. The
Input Y Range should be A1:A1001, and the Input X Range should be M1:P1001. Check the
box next to Labels. Select Output Range and specify it to be cell A1 in your Restricted Model
worksheet: you can place your cursor in the Output Range window and move it to that cell to do
that, or type 'Restricted Model'!A1 in the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range. With 3 restrictions, at α = 0.01, the results of the F-test are (see
also p. 268 in Principles of Econometrics, 4e):
      A                  B               C
1     Data Input         J =             3
2                        N =             1000
3                        K =             8
4                        SSEu =          128544.2347
5                        SSER =          130194.6671
6                        α =             0.01
7
8     Computed Values    m1 =            3
9                        m2 =            992
10                       Fc =            3.8014
11
12    F-test             F-statistic =   4.245566
13                       Conclusion =    Reject Ho
14                       p-value =       0.005427611
15                       Conclusion =    Reject Ho
Go back to your cps4_small data worksheet. Insert four columns to the left of the midwest
column R (see Section 1.4 for more details on how to do that). In your new cells R1:U2, enter the
following column labels and formulas.
     R               S                T                 U
1    educ x south    black x south    female x south    black x female x south
2    =M2*Q2          =N2*Q2           =O2*Q2            =P2*Q2
Copy the content of cells R2:U2 to cells R3:U1001. Here is how your table should look (only the
first five values are shown below):
     R               S                T                 U
1    educ x south    black x south    female x south    black x female x south
2    16              0                1                 0
3    0               0                0                 0
4    0               0                0                 0
5    14              1                1                 1
6    0               0                0                 0
In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range
should be M1:U1001. Check the box next to Labels. Select Output Range and specify it to be
cell A1 in your Unrestricted Model worksheet: you can place your cursor in the Output Range
window and move it to that cell to do that, or type 'Unrestricted Model'!A1 in the Output
Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range. The result is (see also column (1) in Table 7.5 on p. 269 of Principles
of Econometrics, 4e):
A B c D E F H
1 SUMMARY OUTPUT
-
2 I
3 Re mssion Slatistir;s
4 Multiple.R 0-458405258
5 §�ware
R Q_.Li R1053Bj
6 Atljuste.d R S_q;uare 0.2rf2954794
-11.4585071
7 Standard Error
8 O'bs ew.ati om;. 100()
J
9
10 ANOVA
11 I df SS MS F Sf nmcanc.e F
12 Regressfon '9 34.56 UM 886 3842.335429' 29-2•64 37183, .2'. 00107E-45
13 Residual 129984.4089 13 1 .29'73827
14 Total 164565.4278
15
<----- 1
16 Coefficients Standard Error t Stef P·watue Lower 95% er 95% Lower95.0% U. er U.. 95 0%
17 lntercep� -6.16 05!'!72133 2',336627-655 -2.826968225. 0,�047937�2 -11.19088392 -2.0:W,260348 -11.1908S392 -2.02026.oJ48
18 ed'Oc- 2.17255-37'05 o j 654&3 BBB 13_05 120125 4.B3652E�36 1_s4.539{12·1 2499'21628:9 1-8458.91.121 2499216.2119
19 black -5.06935'.99'�16 2.543060109 -1.9255-5587], 0.05444•6013 � 1Q27q00343 O.Q9!283596' -10.2750()343 Cl.0972835�6
-20 femal·e 1 ·5.00501788� Cl.8990CP7421 -5.567337678, 3.330B1E-08 -6.76i92561H5 -3.240898937 -6.769256636
-3 . 240898 93-7
2'1 black x female· 5.305574257 J.49726GB24 Ui170'63045 0.12957•0005 - 1 55 733295 5
. 12.161848147 -·1_557J32955 12. 1684.8147
22 s.outh 3 . 943 9103 83 4.048453462 0.9'74177033, 0.3302066-12 -4.000625124 1Ul&8445!l9 -4.rioo&2s124 11.88844.589
.2'3 edlilc x sowtn· -0.3-0854104 Cf.285734274 - 1 . 0 798 1 8 0 99'
' 0.26048·6184 -
0 86925.542 3
. . 0.25217i34-2 -0.869255423 CJ.2521733421
2:4 blai;k X SCJUth 1.704395981 3.633326787 0.4-0910065& 0.6391009'76 -5.42551027-6 8.834302238 -5.425510276 8._834"302;238·
25 femal-e x sou1h I 0.'9 011119838 1J72§6���2 0.50834187� 0.61 B2SB56 -2...5774923- 94 4.3J�?�2.0�·9 -2.57!�923�4 4.37973-2()S9
2.6 black ·x-female x south 1 -2 . :-935833839 4-.787,647()47 -D�S:t3210166, 0: 5398-782-'32 -12_3:fo93553 6.459'2&7852 -12.33093553 &.45 926785'2'
To test the hypothesis that there is no difference in the wage equation between the southern
region and the rest of the country ($H_0: \theta_1 = \theta_2 = \theta_3 = \theta_4 = \theta_5 = 0$), the restricted model is our
old unrestricted model (7.3). With 5 restrictions, at α = 0.1, the results of the F-test are (see also
p. 270 in Principles of Econometrics, 4e):
      A                  B               C
1     Data Input         J =             5
2                        N =             1000
3                        K =             10
4                        SSEu =          129984.4089
5                        SSER =          130194.6671
6                        α =             0.01
7
8     Computed Values    m1 =            5
9                        m2 =            990
10                       Fc =            3.0357
11
12    F-test             F-statistic =   0.320278
13                       Conclusion =    Do Not Reject Ho
14                       p-value =       0.900945
15                       Conclusion =    Do Not Reject Ho
Note that, as explained on pp. 268-269 of Principles of Econometrics, 4e, estimating (7.5) is
equivalent to estimating (7.2) twice: once for the southern workers and again for workers in the
rest of the country.
We first sort our data according to the region of origin of the workers. Go back to your
cps4_small data worksheet. Go to the Data tab in the middle of your tab list on top of your
screen. On the Sort & Filter group of commands, select the Sort button.
Note, alternatively you can select the Sort & Filter button in the Editing group of commands on
the Home tab. On the drop down menu, select Custom Sort.
A Sort dialog box opens. Select the box next to My data has headers. Select the south dummy
variable column in the Sort by window. Values should be selected in the Sort On window and
Smallest to Largest in the Order window. Finally, select OK.
In the Regression dialog box, for the non-southern workers wage equation, the Input Y Range
should be A1:A705, and the Input X Range should be M1:P705. Check the box next to Labels.
Select New Worksheet Ply and name it Non-South Wage Equation. Finally, select OK.
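The sort-then-select step can be sketched in Python to show why the non-southern range ends at row 705. This is an illustration only: the helper names sort_and_split and last_sheet_row are hypothetical, and the three workers are made-up rows, not records from cps4_small.

```python
def sort_and_split(rows, dummy):
    """Sort rows smallest to largest on a 0/1 column and split into two groups."""
    ordered = sorted(rows, key=lambda r: r[dummy])   # stable sort, zeros first
    n_zero = sum(1 for r in ordered if r[dummy] == 0)
    return ordered[:n_zero], ordered[n_zero:]


def last_sheet_row(n_obs):
    """Last worksheet row used by n_obs observations when row 1 holds the labels."""
    return n_obs + 1


# Hypothetical workers, for illustration only.
workers = [
    {"wage": 18.70, "south": 1},
    {"wage": 11.50, "south": 0},
    {"wage": 15.04, "south": 0},
]
non_south, south = sort_and_split(workers, "south")
print(len(non_south), len(south))

# With 704 non-southern workers the range ends at row 705, and after
# re-sorting largest to smallest the 296 southern workers end at row 297.
print(last_sheet_row(704), last_sheet_row(296))
```

Sorting in place and selecting by row range is what the worksheet requires; in code the two groups can simply be filtered, so the sort is kept here only to mirror the Excel steps.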
The result is (see also column (2) in Table 7.5 on p. 269 of Principles of Econometrics, 4e):
A I B I c I D I E F I G H I I
1 SUMMARY OUTPUT I
2
3 Re�rossion Stafistjcs
4 Multiple· R 0.470626476
t-
1i R Square 0_22 148.928
f-
El ,Adjustoo R Squam 0 .21703428:3
>---
7 Standard Ermr
f--·-
1_2 3943329 t I
8 0 bs•ervatlons 704 I
� �
_.!..
10 AN OVA
11 df SS MS F Sii:/nifica-nce F
12 Regression 4 2-5346.00835. 6336"502086 4�L71704395 UB666E-3T
>--
13 Residual 699 B�08l'L4S154 127.4513641
f-
14 Total 703 114434.46991 I
15
161 .coefflc:fents Starrdaro Error t Stat P-value Loww!l'5% Uppe1 !15%· Lo�;er 95. 0% Upper\15.0%
17 ln.tercepl -6. 505572:13.3 2.302150012 -2_86930569'2 0_004237901 -11-12552952 -2..085614746 -1 '1. 125 52952 -2'. 08551474·6
18 educ 2'- 17:2'55 3 705 o� 164007663 -13.246659'7 6-7&256E.:-36 1.850547039 2.49466037 i._8505470r39 2_49456037
19 black -5.089'35�·91'.6 _].�94059.954 -1.9543�3537 0 . 0 5 1 0530 1 -10.20·20782·6 0.023358424 -10-20207326 0·. 02335.8424
][ female
21 black x female
_5,_00507788-6
-f>.305574257
0.883742296. -5.650715684
3.445663602: 1.539783005
-
2.32'704E-08
0_ 124065
, -
. 8()1
-&.7441�2'013
- 1 .45S51&ot5
-3_2.66043 76 -6_744112013
-
12_0 7066453
�
-1_4595 16Wi
-32660437;6
1 2. 07066453
Go back to your cps4_small data worksheet, and then to the Sort dialog box. Change the Order
to Largest to Smallest, and select OK.
In the Regression dialog box, for the southern workers wage equation, the Input Y Range
should be A1:A297, and the Input X Range should be M1:P297. Check the box next to Labels.
Select New Worksheet Ply and name it South Wage Equation. Finally, select OK.
The result is (see also column (3) in Table 7.5 on p. 269 of Principles of Econometrics, 4e):
A I B I c D I E F G H I I
_j_ SUMMARY OUTPUT
2
_3 I ReQression Stalfstics
4_ 'MlJltipile R 0-429191687
--
-
5 R Square
6 Adjuste.� H Square
Q_ 184iCJ5504
D.172S91834
7 Sta.ndaid E'r.ror i 1.as478:>91
-·
8 O.t•servat:i1Jns 296
-
-M ANOVA T
-11 I
�R��ssion Re.sidual
df
2�1
4
SS
*!
tlhl a ck
fe m al e
1 tllack xfuma1e
-3.384953936
-4. I039580fl9
2.3&9740418
2.57926843!!
1.58062'1274
3.:J.82738.729
-1.3.1237365
-2.59642086
0. 70 05390 04
CL 190428274
0.009r89!l453
Cl.484150483
-fl..46134 9801
-72�4857006
-4 .28 79 9509
1.i691'4�19.3 -8:461349801 1:69142193
-0.993059091 -7.214857006 -Oi.993Q59091
9.027475927 -4.287995·09 9.027475921
Go back to your cps4_small data worksheet. In cells X1:Z2, enter the following column labels
and formulas.
     X           Y      Z
1    ln(wage)    educ   female
2    =ln(A2)     =M2    =O2
Copy the content of cells X2:Z2 to cells X3:Z1001. Here is how your table should look (only the
first five values are shown below):
I x I y I l
1 lnlwaget educ female
2 2_929'524 16. 1
3
-
3.25&172 14 1
4
-
3.766-9'97 16. i
-
5 2.956472 12 0
6 2.6396.5t 14 i
In the Regression dialog box, the Input Y Range should be X1:X1001, and the Input X Range
should be Y1:Z1001. Check the box next to Labels. Select New Worksheet Ply and name it
Log-Linear Model with Dummy (Dummy Variables is another name used for Indicator
Variables). Finally, select OK.
111 A I B I c I D I E I F I G I H I I
I
f
1 SUMMARY OUTPUT
'2
� I Reqressioa Slatislics
I
I
4 Multiplt)R 0.4704•&4761
5 R Square 0.221337091 I
c--g- Adjuste� R Square o.219775079
cl-.
f--"�
S.t�ndard. Error 0.51286-2309
8 JObservat1pns 1000
9 I
to AN OVA I
.iti
11
Regressic;r1
R�s1dual
df
2
997
SS MS
74.54206772 37 .271 (}3386
262.238&647 -0.26302:774 8
F
14-1.7000075
Sig_nificance F
6.B8208E-55
336.7807:125
-
Total ·999
151
161 Coefficients Standard Error t Stat P-11aiue Lo�1er 95% ue.eer95% Lower95.0% ueeer!!5.0�
J:1 1 ntew erit 1.553857936 0.()843785>78 19.60056664 1.299'.IE-72 H882a7955 1.819447917 1.46828795,5 1.819447'917
18 educ 0.09624B417 OJJD6036534 15_9443:1559 :n&1'J2E-51 o_os440:2547 o:1oao·94187 !Unl440264 7 O_ ioB094187
1'9 1iemale -0.2432 1 395 8 a·_o32m.s 05 -7.4J14 8'4915 2'.30536E-13 ag;91 24 -ojo7436652
-- . . ·-·-
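In a log-linear model the coefficient on an indicator variable is only an approximate proportional effect; the exact percentage difference is 100(e^δ − 1). The sketch below illustrates the calculation. It assumes the female coefficient in the output above reads roughly −0.2432, and the function name exact_percent_effect is hypothetical.

```python
import math


def exact_percent_effect(delta):
    """Exact percentage change in the dependent variable when the dummy switches from 0 to 1."""
    return 100.0 * (math.exp(delta) - 1.0)


delta_female = -0.2432            # assumed reading of the female coefficient
approx = 100.0 * delta_female     # rough approximation: 100 * delta
exact = exact_percent_effect(delta_female)
print(round(approx, 2), round(exact, 2))
```

The gap between the two numbers grows with the size of the coefficient, so the exact formula matters most for large dummy effects.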
where COKE = 1 if Coke is chosen, and COKE = 0 if Pepsi is chosen.
Open the Excel file coke. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 7 in one file, create a new worksheet in your POE
Chapter 7 Excel file, rename it coke data, and in it, copy the data set you just opened.
In the Regression dialog box, the Input Y Range should be A1:A1141, and the Input X Range
should be D1:F1141. Check the box next to Labels. Select New Worksheet Ply and name it
Marketing Model. Finally, select OK.
              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     0.890215056     0.065484883       13.59420692     ≈4.1E-39       0.761730152     1.018699959
disp_pepsi    -0.165663685    0.035599674       -4.653516882    ≈3.5E-06       -0.235512182    -0.095815187
disp_coke     0.077174455     0.034391933       2.243969687     0.025026335    0.009695612     0.144653298
pratio        -0.400861399    0.061349448       -6.534066944    ≈9.5E-11       -0.521232352    -0.280490445
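Because the dependent variable is a 0/1 indicator, the fitted value of this regression is read as the estimated probability that Coke is chosen. A minimal sketch of that calculation follows, using the coefficients above rounded to four decimals; the function name predict_prob and the example inputs are assumptions made for illustration.

```python
# Coefficients rounded from the Marketing Model output (assumed legible values).
COEF = {
    "intercept": 0.8902,
    "disp_pepsi": -0.1657,
    "disp_coke": 0.0772,
    "pratio": -0.4009,
}


def predict_prob(pratio, disp_coke, disp_pepsi):
    """Fitted value of the linear probability model: estimated P(COKE = 1)."""
    return (COEF["intercept"]
            + COEF["pratio"] * pratio
            + COEF["disp_coke"] * disp_coke
            + COEF["disp_pepsi"] * disp_pepsi)


# Equal prices (price ratio of 1) and no store displays.
p_equal = predict_prob(pratio=1.0, disp_coke=0, disp_pepsi=0)
# Same prices, but a Coke display is present.
p_display = predict_prob(pratio=1.0, disp_coke=1, disp_pepsi=0)
print(round(p_equal, 4), round(p_display, 4))
```

Nothing in a linear probability model keeps fitted values between 0 and 1, so extreme price ratios can produce predictions outside that interval.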
$TOTALSCORE = \beta_1 + \beta_2 SMALL + e$   (7.8)
$TOTALSCORE = \beta_1 + \beta_2 SMALL + \beta_3 TCHEXPER + e$   (7.9)
where SMALL = 1 if the student was assigned to a small class and 0 otherwise,
TOTALSCORE is the combined reading and math achievement score, and TCHEXPER is the
teacher's years of experience.
Open the Excel file star. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 7 in one file, create a new worksheet in your POE
Chapter 7 Excel file, rename it star data, and in it, copy the data set you just opened.
We first sort our star data so we can easily select the subset of regular and small classes only,
that is, those characterized by the absence of a teacher aide. Go to the Data tab in the middle of your tab
list on top of your screen. Select all star data. On the Sort & Filter group of commands, select
the Sort button.
A Sort dialog box opens. Select the box next to My data has headers. Select the aide variable
column in the Sort by window. Values should be selected in the Sort On window and Largest
to Smallest in the Order window. Finally, select OK.
For model (7.8), the Input Y Range should be H1:H3744, and the Input X Range should be
Q1:Q3744. Check the box next to Labels. Select New Worksheet Ply and name it Star Model
1. Finally, select OK.
The result is (see also column (1) in Table 7.7 on p. 280 of Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.0924832
R Square              0.008553138
Adjusted R Square     0.008288116
Standard Error        74.65066365
Observations          3743

ANOVA
              df      SS             MS             F              Significance F
Regression    1       179850.2664    179850.2664    32.27332709    1.44147E-08
Residual      3741    20847551.44    5572.721583
Total         3742    21027401.71

              Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept     918.0428928     1.667156939       550.6637504    0              914.7742678    921.3115178
small         13.89899446     2.446591778       5.68096181     1.44147E-08    9.102210873    18.69577804
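When the only regressor is an indicator variable, the least squares slope equals the difference between the two group means, and the intercept equals the mean of the group coded 0. The sketch below checks this on made-up numbers; the function name ols_simple and the five scores are hypothetical and are not taken from the star data.

```python
def ols_simple(x, y):
    """Least squares intercept and slope for a regression of y on a single x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return b1, b2


# Hypothetical scores, for illustration only.
small = [1, 1, 0, 0, 0]
score = [930.0, 940.0, 910.0, 920.0, 915.0]

b1, b2 = ols_simple(small, score)
mean_small = sum(s for s, d in zip(score, small) if d == 1) / small.count(1)
mean_regular = sum(s for s, d in zip(score, small) if d == 0) / small.count(0)
print(b1, b2, mean_small - mean_regular)
```

Read this way, the coefficient on small in the output above is the estimated gap in average total score between small and regular classes.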
Go back to your sorted star data worksheet. In cells T1:U2, enter the following column labels
and formulas.
     T        U
1    small    tchexper
2    =Q2      =D2
Note that all we are doing is copying the values of SMALL and TCHEXPER in columns T-U so
as to create columns of explanatory variables next to one another.
Copy the content of cells T2:U2 to cells T3:U5787. Here is how your table should look (only the
first five values are shown below):
     T        U
1    small    tchexper
2    1        3
3    0        12
4    1        7
5    1        4
6    0        6
For model (7.9), the Input Y Range should be H1:H3744, and the Input X Range should be
T1:U3744. Check the box next to Labels. Select New Worksheet Ply and name it Star Model 2.
Finally, select OK.
The result is (see also column (2) in Table 7.7 on p. 280 of Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.127853
R Square              0.016346388
Adjusted R Square     0.01582037
Standard Error        74.36662979
Observations          3743

ANOVA
              df      SS             MS             F              Significance F
Regression    2       343722.0648    171861.0324    31.07572116    4.12E-14
Residual      3740    20683679.64    5530.395627
Total         3742    21027401.71

              Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept     907.5643429     2.542413485       356.9696072    0              902.579691     912.5489948
small         13.98326835     2.437332          5.73712        1.03937E-08    9.204638942    18.76189775
tchexper      1.155510532     0.212275513       5.443447136    5.56172E-08    0.739323495    1.571697569
FREELUNCH = 1 if free lunch is provided, and 0 if free lunch is not provided.
Go back to your sorted star data worksheet. In cells V1:Y2, enter the following column labels
and formulas.
     V      W              X           Y
1    boy    white_asian    tchexper    freelunch
2    =I2    =J2            =D2         =N2
Note that all we are doing is copying the values of our explanatory variables in columns V-Y that
are next to one another.
Copy the content of cells V2:Y2 to cells V3:Y5787. Here is how your table should look (only the
first five values are shown below):
v I w I x I y
1 boy whit-e asian tchexper free lunch
,__
2 0 0 3 1
,___
3 1 1 12 0
.._...__
4 1 1 7 0
,__
5 0 1 4 1
1 6 0,
� 1
For model (7.10), the Input Y Range should be T1:T3744, and the Input X Range should be
V1:Y3744. Check the box next to Labels. Select New Worksheet Ply and name it Check
Random Assignment Model. Finally, select OK.
A I B I c D I E I F G: I H I I
1 SUMMARY OUTPUT
T
i_I Ren_re-ssion Sfalis 1.ios: I
4- Mu ltipl e R 0 0079470[)5
5 R Square· 6'. 31549E-05
" -
6. Ad]us!ed R Sqware'- -0.001006868
f--
l Standard Error D.499043.954
8 0 ose rvatio ns 3743
9
r---
10 ANO VA t
11 I df SS MS c:
r Significance F
t
12 Regres'Sion 4 0. 0 587 9 6:4 78 0.014699119 0.0590:21973 o·.9:9 3554 656
930�9297154 o . .24!11 0 448&8
·--
l3 Res.idual .'.3738
14 Total 3742 930.988�119 I I
1.5
16 C.o&ffitients. Stimdaro EITOr tSlat P-v:alue Lo1>11&r95% Vpp6'r95% lower95.0% U;:.pef 95. 0,%
17 lnterc_�pt 0_46646232.7 0_025155394 18.5424953 7
- - ·-
1.G.S957E-73 0_417140731 0.5157839'23 0·.417140731 0.515783923.
18 b-oy 0. 0014 10759 O_Ol63JB·S12 0.08&345603 0 .931196313 -0.030622509 0_033444G26 -0 .030622509' 0.033444025
19 white- asfan -
0.004405672 o_o 19597025 0�224813302 0_822136805 -0.034016231 o._([42:�27?! 5 �0.034016��1 Cl>.042827575
B:s : .
.. ....
-�_4.1 Bi685 CJ.00221s4fis
,___
20 tchexper -0.000602546 iHJ 0 1438 0_'6754093811 -0.003423556 0_002218465 -0.00342355&
21 freelunch -0 000-885877 o_ri1B1.9Ji1 1 -Q_0486'9297il 0_961166577 -0.0 3 655 5:267 0.034783513 -IJ.036555267 �.034783513
(7.11)
In equation (7.12), we add explanatory variables in addition to the ones included in (7.11):
(7.12)
In equation (7.13), we add explanatory variables in addition to the ones included in (7.12):
(7.13)
Open the Excel file njmin3. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 7 in one file, create a new worksheet in your POE
Chapter 7 Excel file, rename it njmin3 data, and in it, copy the data set you just opened.
     O     P     Q      R     S      T        U          V        W          X
1    nj    d     d_nj   kfc   roys   wendys   co_owned   southj   centralj   pa1
2    =G2   =L2   =M2    =I2   =J2    =K2      =A2        =B2      =C2        =D2
Note that all we are doing is copying the values of our explanatory variables in columns O-Q that
are next to one another.
Copy the content of cells O2:X2 to cells O3:X821. Here is how your table should look (only the
first five values are shown below):
"I 0 I p I Q I R I s I T u v I w I x
I
-
nj d d_nij kfc: roys• wendys co-nw.ned southj_ c:entra!j pa1
2
- , 0 0 0 rn 0 0 o�. ·1 0
3
- 1 o. -0 Q1 0.1 0 0 (} 1 o,
_i_ 1 0 --·
0 0 1 0 0 O'' 1
a!
� 1 0• 0 0 1 0 1 0 0 0
__§____ 1 0 a 0 D' a 0 0 0 o:
We first sort our data according to fte because we have missing values, which means we cannot
use the corresponding observations to estimate our regression model. Go to the Data tab in the
middle of your tab list on top of your screen. Select all your njmin3 data. On the Sort & Filter
group of commands, select the Sort button.
A Sort dialog box opens. Select the box next to My data has headers. Select the fte variable
column in the Sort by window. Values should be selected in the Sort On window and Largest
to Smallest in the Order window. Finally, select OK.
For model (7.11), the Input Y Range should be N1:N795, and the Input X Range should be
O1:Q795. Check the box next to Labels. Select New Worksheet Ply and name it Minimum
Wage Model 1. Finally select OK.
The result is (see also column (1) in Table 7.9 p. 285 of Principles of Econometrics, 4e):
A I B I c I D I E F I G I H ! I I
SUMMARY OUTPUT I I
-1_
2' I
t
3 Re,qres.sion Statistics.·
4 Mrultipl�R Q1_08503�6f41 I
5 R �q,uare
-
(i, A_djustoo_ R Square
G_ 0074 01277
01.00303•1915
!
t 1
J_ Standa.rd E::rr,or
a, ObseroatiGns
!M05618976
794 j
9,
w ANOVA l
J.:11 df SS MS F Sig_n.ific'1111 Ge F j
1 2' R_e9re s s I o_n _31 52'1. 11-64632 173-7054877 ·1_953635584 a_ 11798 2119
f----<-
13 Re·sif1ual 7901 69'8 87-8 7797 8 8.465658 32'
14 Total 79;3 70408.99444 I j
I
1.6
rnJ Coefficients Standard' Error t Stat P-value Lower95% Upper95% L.ower 95_ 0% Upoe.r�50%
17 lritercept 23-331168'83 1_Q7186976 21. 7§_679152 1_,_1635_,_JE-82 21-22711921 25_43521846 2'_!,22711921 25.4352'1846·
ffi nj -2-8917607.31 1.193523676 "2.422876721 0.01�i6219'92 -5-?34&13.598 -0_.548�07955
-
-5.2346'13508 -0_548;JC1'l' 95.5:
�d
. I - -
-2'. 1'65.584416 1.51585275-2 - 1.428624'.523 © . 153507425 -5.1411599'3.2
. - . 0.809991101 -5.1411599132 ·0.809991101
'2e d nj 2:.753605783 u:s8409131 1 ..63o-ss77w (>_ 10331257
· 8 -0.56069296 6.0S.7904526 -0 .. 56069296 €.0&79045261
Go back to your njmin3 data worksheet. For model (7.12), the Input Y Range should be
N1:N795, and the Input X Range should be O1:U795. Check the box next to Labels. Select
New Worksheet Ply and name it Minimum Wage Model 2. Finally select OK.
The result is (see also column (2) in Table 7.9 p. 285 of Principles of Econometrics, 4e):
A I B I c I D I E I F I G I H I I
1 SUMMARY OUTPUT I I
r--
2
3 Re_Qression Sfatjslics ·
Tol ANovA -
-
11 1
,___
,df SS MS F Srgnificarwe F
12 Re�r-ess_ion 7 1J830_43275, 19.75_n61Cl8 27 .44785-2.59'
�
7-724nE-34
r--- --
-
J.!
19 d
nJ -2.376608094
-2_2235&5041
1.079192:119 -2'.2:022103�'8
1. 36;7;692339 , -1 _6:25778677'
0.02'794tl'p9
0.104.397.5'95
-4.49'504784
-4_908-326875
-0.258168•347 -4.49504784 -0.258168347
0-46'119S793 -4. 903326875 0-46i1_1%793
� ��j 2-8450.6.6555 1.523.336497 1.86765.22391 0: 06
-1
i 825
- a4 -Oc 1452266.13
- -
· 5Jl:-i5359724 -n_ 14s22s&l:3 5_83535�724
2i kfc
r--
-1 CL453·3897 1 0_848955906: - 1' 3 1 32 1 5 9 3
2 . ' 5'.5243 SE-32 -12_ 1198808 - 8 78 689 8-61 4
- ' -121196808 -8_ 1ass9mif4.
22 roys -1 _6:24999072 0_8�·9797951 -1-889977836 0.05:91286'21 -3:' 3121109gi 0_052772848 _3_·312no992 0_062772848
23 wandys -1 _0{)3708623 0_9'29-1'5025 -1 1 448 187 5
. 4- 0252632J2 9 -2 8 !!76 1 8 1 81
- 0-760200934
-
-2_887518181
- 0_7602009
· �4
f--
24 co-ownoo -1.168'54545 0.7161661246' -1.&3166786'.6 0_1-Q:3150035 -2. 5 743 702.4 7 0.2372l9347 -2.574370247 0.237279'347
Go back to your njmin3 data worksheet. For model (7.13), the Input Y Range should be
N1:N795, and the Input X Range should be O1:X795. Check the box next to Labels. Select
New Worksheet Ply and name it Minimum Wage Model 3. Finally select OK.
The result is (see also column (3) in Table 7.9 p. 285 of Principles of Econometrics, 4e):
A I B I c I D I E I F I G I H I I I
-
1 SUMMARY OUTPUT
f-
2
3 _R.·eqression Slaf"islics
4 Multiple- R 0 .470 527732:'
T
,___
R'squ<ne· 022{39634&
6i Adjustoo R S·quar>e 0-211452494
7 Standard Err.or 8_3674 1691'31
r--
9 ObseNatioris 7.94.:
9 1
10 AN OVA
I
-+
1·1 df SS MS F SigniffoaJJGe F
10• 22_2646449'7 7 _05,564E-3:7
g R�gre�_sio·n 155!l8_29412 1558-829412
13 Residual 783 54820-70G32 70_0136&58
14 Tota.I 793 70408_99444
15 I
1& 1 Coefficients Standa.r:d Error t Stat P-value Lowe-r-95% Upper95% L ower 95 0%
_ Upper95.0%
1 7 lnte.rcept 2?)20:5'1?� 1.2_1Cl90Jl35 20· 91 Oi4
' 361'7
.. _ 1- ?391E-FI 22,9435119'4 27_6975� 346. 22_ 94351 194 27.6975.1346
m nj
� • ·
-0_90·7%3'605 1 -2'1'1741824 -0-713%2776 ()_475469143 -3_404390609 1-5884633991 -3.4 043 9050� 1-588453399
J1_ d 2 21 1 85 0 952'
- . ' 1.340859-584 -1.6·39793333 0.101449806 -4.es9659985 0.4 3.595 808 -4.859659985 0.4359'5808
JJL d_nj
kfc
2.8.1490803
-f0_0580'01'i3.
1.5�i36;1'65 1.873630464
0·.844S-i1 0 8 9 -11-90759558
().06135362'7
3_6J.754E�3o
-0.134264�.5.5 5.764080614 -0.134264.555 5.7�Q80§14
-11-7160896-2 �aj�9g13837 -1 1-71608962 -8_399913837
22--
22' roys -1-693 392 5'95 0_85·918373 - 1 970 93 1 8 7
- · ' (Ul49083476 -3.37 9968772 -0_006816418 -3-37996sn2 -0_006816418
2 3 wendy�
1---
- 1 0 64-95 1 933
- . 0_�0538473 -1_15675 3612 0241nsfo9 -2_8.72163664 0_742259798 -2-B721�3S64 0_742259798
� CO-O·Wned -
- 0 71 6 309731
_ H
I -18·990484 -0_9·96271505 O_J1'9426023 -2_ 12768'6808
0.695067345 -2- 127686S08 0_6:95057346
25 S·Outhj' - UG 1 7&D689 1 o_ nsi953:1'91 - -:4-746_131�:59 ?A64'8?E-06 ��232807456 -2 _170713:923 -5-232807456 -2-17071392:3
f----
26 �>efltralj 0 ,001 7 88}354 Qo_8,9749%57 o_9D87p744 0_9·92993914 -1-75�9�947 1-769 661655 -1-_?53894947 1.769651655
e---- _
2.7 pa1 0.923861954 1.384927728 0.�67083152 0.5049':1.5554 -1.794746784 .3:642472692 -1. 794748784 3.642472692
(7.14)
Go back to your njmin3 data worksheet, select all your njmin3 data, and then go to the Sort
dialog box. Change the variable in the Sort by window to demp, and select OK.
For model (7.14), the Input Y Range should be F1:F769, and the Input X Range should be
G1:G769. Check the box next to Labels. Select New Worksheet Ply and name it Minimum
Wage Model 4. Finally select OK.
- -= -- -
I Reg ression
Stetistia.s
�%
Multipl.e R 0_120992&85
c on�.
R Square OJM4639'23
Adjusted R Square 0_013352:858 ool
Standard Error I 41
ObsEl'rvati-ons
10 ANOV.A.
F
Regression 912_81Z3828 912_8173828 .()_OD 0'779502
Re.sidual 61441.376&7 80.2'106745
1�1
II Total I B I 62354_19405
c I D I
I
E I F I G I H I
3 I lnt�r��ptReurossion -2.2B3;?J3:3�3
Standard" Error
OJ3'12577CM
t Stat P-value
0_0018608.17
Lower95%
-_3-7188
_ 40264
lm¥er 95.D%
i
-01• 84 782-64013 : -3. 7188402<64 -0_847826403
'
95_0% :
'
8_95604.1 22 9
J_ :
8: 768
9
r----
'
11 df SS MS F Sjqnificarroe
•
12 1 1 U 8 0248'.IB
T3 766
T4
- '
707
15
J fi
16[
18 nJ
Goefticfenfs
0.81518'6215
-3_122474225�
3 .3Z3462.344
Upper95%
. . .
Upper
CHAPTER 8
Heteroskedasticity
Chapter Outline
8.1 The Nature of Heteroskedasticity
8.2 Detecting Heteroskedasticity
8.2.1 Residual Plots
8.2.2 Lagrange Multiplier Tests
8.2.2a Using the Lagrange Multiplier or Breusch-Pagan Test
8.2.2b Using the White Test
8.2.3 The Goldfeld-Quandt Test
8.2.3a The Logic of the Test
8.2.3b Test Template
8.2.3c Wage Equation Example
8.2.3d Food Expenditure Example
8.3 Heteroskedasticity-Consistent Standard Errors or the White Standard Errors
8.4 Generalized Least Squares: Known Form of Variance
8.4.1 Variance Proportional to x: Food Expenditure Example
8.4.2 Grouped Data: Wage Equation Example
8.4.2a Separate Wage Equations for Metropolitan and Rural Areas
8.4.2b GLS Wage Equation
8.5 Generalized Least Squares: Unknown Form of Variance
This chapter is concerned with the nature of heteroskedasticity, tests for heteroskedasticity, as
well as generalized least squares estimation for heteroskedastic models.
$y_i = \beta_1 + \beta_2 x_i + e_i$ (8.1)
where y is weekly food expenditure in dollars and x is weekly income in units of $100 for a
random sample of 40 three-person households.
Open the Excel file food. Save your file as POE Chapter 8. Rename sheet 1 food data.
In this section we illustrate the nature of heteroskedasticity by re-estimating (8.1) and plotting the
estimated regression line along with the food expenditure data.
In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range
should be B1:B41. Check the box next to Labels. Select New Worksheet Ply and name it Food
Expenditure Equation. Check the boxes next to Residual Plots and Line Fit Plots. Finally
select OK.
The regression analysis results are (see also p. 300 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.620485472
R Square             0.385002221
Adjusted R Square    0.368818069
Standard Error       89.51700429
Observations         40

ANOVA
             df   SS            MS            F             Significance F
Regression   1    190626.9788   190626.9788   23.78884107   1.94586E-05
Residual     38   304505.1742   8013.294058
Total        39   495132.153

             Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept    83.41600997    43.41016192      1.921577951   0.062182379   -4.463267721   171.2952877   -4.463267721   171.2952877
income       10.2096425     2.093263461      4.877380554   1.94586E-05   5.972052202    14.4472328    5.972052202    14.4472328
After editing the income Line Fit Plot (see Section 2.3.4 for more details on how to do that), you
should obtain a replica of Figure 8.2 p. 301 in Principles of Econometrics, 4e:
[Figure 8.2: food expenditure plotted against income, with the estimated regression line]
By checking the box next to Line Fit Plots in the Regression dialog box, you were able to obtain
a replica of Figure 8.2 p. 301 in Principles of Econometrics, 4e (Section 8.1). If you go back to
your Food Expenditure Equation worksheet, you will find the plot of the residuals against
income, which was generated following your selection of Residual Plots in the Regression
dialog box.
[Residual plot: least squares residuals plotted against income]
Consider the following general heteroskedasticity assumption for the food expenditure model:
$\operatorname{var}(e_i) = \sigma_i^2 = h(\alpha_1 + \alpha_2 x_i)$ (8.2)
Consequently, the null and alternative hypotheses for a test for heteroskedasticity based on the
variance function (8.2) are: $H_0: \alpha_2 = 0$ and $H_1: \alpha_2 \neq 0$.
$\hat{e}_i^2 = \alpha_1 + \alpha_2 x_i + v_i$ (8.3)
where $\hat{e}_i^2$ are the squares of the least squares residuals from model (8.1).
When $H_0$ is true, the sample size N multiplied by the $R^2$ goodness-of-fit statistic from (8.3)
has a chi-square distribution with m = S - 1 degrees of freedom, where S is the number of
parameters in (8.3):
$\chi^2 = N \times R^2 \sim \chi^2_{(m=S-1)}$ (8.4)
Because a large $R^2$ value provides evidence against the null hypothesis, the rejection region for
the statistic in (8.4) is the right tail of the distribution. Thus, at significance level α, we reject
$H_0$ and conclude that heteroskedasticity exists when the computed $\chi^2$-statistic is greater than the
chi-square critical value $\chi^2_{(1-\alpha,\, m=S-1)}$.
[Figure: $\chi^2_{(m)}$ density, with the rejection region to the right of the critical value $\chi^2_{(1-\alpha,\, m=S-1)}$]
Note that we have used a test based on a chi-square distribution before, in Section 4.6.2 (for the
Jarque-Bera test for Normality).
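As a cross-check outside Excel, the N × R² calculation can be sketched in Python. This is only an illustration under stated assumptions, not part of the Excel template: the function names `chi2_right_tail` and `lm_test` are hypothetical, and the closed-form tail probabilities cover only m = 1 or m = 2 degrees of freedom, which is all the variance functions in this section require. The inputs N = 40 and R² = 0.184611 are the values reported for the Variance Function regression in this section.

```python
import math

def chi2_right_tail(x, m):
    """P(chi-square with m degrees of freedom > x), for m = 1 or m = 2 only."""
    if m == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if m == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers m = 1 or m = 2")

def lm_test(n, r2, s, alpha):
    """Statistic N * R^2 with m = S - 1 degrees of freedom; decision by p-value."""
    m = s - 1
    stat = n * r2
    p_value = chi2_right_tail(stat, m)
    decision = "Reject Ho" if p_value <= alpha else "Do Not Reject Ho"
    return stat, m, p_value, decision

# Inputs taken from the Variance Function regression: N = 40, R^2 = 0.184611, S = 2
stat, m, p_value, decision = lm_test(n=40, r2=0.184611, s=2, alpha=0.05)
print(round(stat, 5), m, decision)
```

The p-value rule used here corresponds to the second conclusion in the test template (reject when the p-value is no larger than α); comparing the statistic with the critical value leads to the same decision.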
Go back to your food expenditure equation worksheet, if you are not there already. In cells
D24:D25 enter the following column label and formula.
     D
24   residuals²
25   =C25^2
Copy the content of cell D25 to cells D26:D64. Here is how your table should look (only the first
five values are shown below):
D
24 residuafs 2
25 34.452084J.3
26 s 9. 9M I 99a.5
21 158.0505536
28 go.1.2o972(}7
29 5 S0.7.54189'9
In the Regression dialog box, the Input Y Range should be D24:D64, from your food
expenditure equation worksheet, and the Input X Range should be B1:B41, from your food
data worksheet. Check the box next to Labels. Select New Worksheet Ply and name it
Variance Function. Finally select OK.
The result is:
I
I B I c D I E I F I G H I I
7 SUMMAR� OuTPLJT I
2
r-1-1 Reoretiiiion SfatistiGs
Multipl� R 0.429663-36
,_i_
y_ R Square (>, 1 S.4610"&03
6 Adjusted R Square 0·.163152"9'87
7 Standard Error .9946.!}40"92
• 2
8 Observations. 40
s
f-- ·-· --
10 ANO VA
11 I
I
df SS MS F Sif}_nificanc� F
12
,___
Regression
'
1 8.511930·27 851193027 8.60350028 0.00!5&5S
-- 104
Test Template:
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Lagrange Multiplier Test.
     A                          B                     C
1    Data Input                 N =                   ='Variance Function'!B8
2                               S =                   ='Variance Function'!B12+1
3                               R² =                  ='Variance Function'!B5
4                               α =
5
6    Computed Values            m =                   =C2-1
7                               χ²-critical value =   =CHIINV(C4,C6)
8
9    Lagrange Multiplier Test   χ² =                  =C1*C3
10                              Conclusion =          =IF(C9>=C7,"Reject Ho","Do Not Reject Ho")
11                              p-value =             =CHIDIST(C9,C6)
12                              Conclusion =          =IF(C11<=C4,"Reject Ho","Do Not Reject Ho")
At α = 0.05, the result of the test is (see also p. 306 in Principles of Econometrics, 4e):
     A                 B      C
1    Data Input        N =    40
2                      S =    2
3                      R² =   0.184611
4                      α =    0.05
5
6    Computed Values   m =
For the White version of the test, we base the test statistic on the following variance function:
$\hat{e}_i^2 = \alpha_1 + \alpha_2 x_i + \alpha_3 x_i^2 + v_i$ (8.5)
where $\hat{e}_i^2$ are the squares of the least squares residuals from model (8.1).
Go back to your food data worksheet. In cell C1, enter the column label x2. In cell C2 type the
formula =B2^2; copy it to cells C3:C41. Here is how your table should look (only the first five
values are shown below):
     C
1    x2
2    13.6161
3    19.2721
4    22.5625
5    36.3609
6    155.5009
In the Regression dialog box, the Input Y Range should be D24:D64, from your food
expenditure equation worksheet, and the Input X Range should be B1:C41, from your food
data worksheet. Check the box next to Labels. Select Output Range and specify it to be cell A1
in your Variance Function worksheet: you can place your cursor in the Output Range window
and move it to that cell to do that, or type 'Variance Function'!A1 in the Output Range
window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do select OK to overwrite
the data in the specified range. The result is:
A I B I c I D I E I 'F I G I H I I
1 SUMMARY OUP
T UT
T
3 I R&:_c:ression Statistics
_i_ Mu lt iple R 0-434599748
. � . -
--
8 Observations 40
9
1fr ANOVA l-
111 I df SS MS F Sill_rriflcanc;e F
J� ReQFession 2 8'70>864-013!1.9 435432(}19-.4 4.307883213 0.020801 �88·:
1 3· Residua.I 37 3739884282 1 o·forrs53.G
14 Total 39, 4610748321
15
1S .yoetf.icienfs Stamfa.rd Error I Stat P-value Lower 95% U(>per95% LOV/&f 950% Upper 95.0%
JI lnterce;pt -29 0 8 785889'
_ 8100.107691 -0.359104595 ..0.721558112: -19-32U&2'9r2: 13503,59114 -19321_
-
16292 13503.5'9114
.
rn income 291.7463539 915.. 8460198 0 .318553935 0.75185.6075 -1563.9.rnH4 2147.426642 -1563.!}33934 214J. 42-&64?, 1
- ' ' {
-40.11'67009'4 62.4:4723543 £0.·11 5foo94 52.44723543..
c •
19> x2
�
At α = 0.05, the result of the test is (see also p. 306 in Principles of Econometrics, 4e):
     A                          B                     C
1    Data Input                 N =                   40
2                               S =                   3
3                               R² =                  0.188877
4                               α =                   0.05
5
6    Computed Values            m =                   2
7                               χ²-critical value =   5.991465
8
9    Lagrange Multiplier Test   χ² =                  7.555076
10                              Conclusion =          Reject Ho
11                              p-value =             0.022879
12                              Conclusion =          Reject Ho
Consider the right-tail hypothesis test $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 > \sigma_2^2$, where $\sigma_1^2$ is the error
variance of the subsample 1 model and $\sigma_2^2$ is the error variance of the subsample 2 model. If $H_0$ is true,
$F = \hat{\sigma}_1^2 / \hat{\sigma}_2^2 \sim F_{(N_1-K_1,\, N_2-K_2)}$ (8.6)
where $\hat{\sigma}_1^2$ is the estimated error variance from the subsample 1 model with $K_1$ parameters and $N_1$
observations; $\hat{\sigma}_2^2$ is the estimated error variance from the subsample 2 model with $K_2$ parameters and
$N_2$ observations.
If $H_0$ is not true, then the value of the computed F-statistic will tend to be unusually large. We
will reject the null hypothesis if $F > F_c$, where $F_c$ is the critical value shown below.
[Figure: F density, with the rejection region to the right of $F_c$]
The right-tail Goldfeld-Quandt Test is similar to the F-test from Section 6.1.
For a two-tail hypothesis test, the alternative is $H_1: \sigma_1^2 \neq \sigma_2^2$. If $H_0$ is not true, then the value of the
computed F-statistic will tend to be unusually large or unusually small. We will reject the null hypothesis if
$F < F_{Lc}$ or $F > F_{Uc}$, where $F_{Lc}$ and $F_{Uc}$ are the lower and upper critical values shown below.
Note that in this case, α/2 of the probability is in each tail of the distribution.
[Figure: F density, with rejection regions of probability α/2 below $F_{Lc}$ and above $F_{Uc}$]
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Goldfeld-Quandt Test.
     A                      B                 C
1    Data Input             N1 =              ='Subsample 1 Model'!B8
2                           K1 =              ='Subsample 1 Model'!B12+1
3                           MS Residual 1 =   ='Subsample 1 Model'!D13
4                           N2 =              ='Subsample 2 Model'!B8
5                           K2 =              ='Subsample 2 Model'!B12+1
6                           MS Residual 2 =   ='Subsample 2 Model'!D13
7                           α =
8
9    Computed Values        m1 =              =C1-C2
10                          m2 =              =C4-C5
11                          F-statistic =     =C3/C6
12   Goldfeld-Quandt test
13   Right-tail             Fc =              =FINV(C7,C9,C10)
14                          Conclusion =      =IF(C11>=C13,"Reject Ho","Do Not Reject Ho")
15
16   Two-tail               FLc =             =FINV(1-C7/2,C9,C10)
17                          FUc =             =FINV(C7/2,C9,C10)
18                          Conclusion =      =IF(OR(C11<=C16,C11>=C17),"Reject Ho","Do Not Reject Ho")
Cells C16:C17 are where the lower and upper critical values of the two-tail Goldfeld-Quandt test
are computed. Recall that, in this case, α/2 of the probability is in each tail of the distribution.
The FINV function, on the other hand, gives us an $F_c$ value such that $P(F_{(m_1,m_2)} > F_c) = \alpha$. So,
to get the correct upper critical value, we need to divide the specified α value by 2
in the FINV function (see cell C17). Further note that the FINV function returns an F-critical
value once we have specified the probability to the right of that value. For our lower critical
value, the probability to its right is 1 - α/2; that is what we specify in cell C16.
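The arithmetic behind the template can be sketched in Python as a cross-check. This is a minimal illustration, not the Excel template itself: the function names are hypothetical, and because the Python standard library has no inverse F distribution, the critical values are passed in as numbers read off the worksheet rather than computed. The inputs below are the figures reported for the food expenditure example later in this chapter (N1 = N2 = 20, K1 = K2 = 2, MS Residual 1 = 12921.93, MS Residual 2 = 3574.772, right-tail Fc = 2.217197).

```python
def goldfeld_quandt(n1, k1, ms_resid1, n2, k2, ms_resid2):
    """Return the F-statistic and its degrees of freedom (m1, m2)."""
    m1 = n1 - k1
    m2 = n2 - k2
    return ms_resid1 / ms_resid2, m1, m2

def finv_tail_probabilities(alpha):
    """Right-tail probabilities to give FINV for Fc, FLc and FUc."""
    return {"Fc": alpha, "FLc": 1 - alpha / 2, "FUc": alpha / 2}

def right_tail_conclusion(f_stat, f_c):
    """Reject when the statistic is at least the right-tail critical value."""
    return "Reject Ho" if f_stat >= f_c else "Do Not Reject Ho"

def two_tail_conclusion(f_stat, f_lc, f_uc):
    """Reject when the statistic falls in either tail."""
    if f_stat <= f_lc or f_stat >= f_uc:
        return "Reject Ho"
    return "Do Not Reject Ho"

# Food expenditure example; the critical value is copied from the worksheet output
f_stat, m1, m2 = goldfeld_quandt(20, 2, 12921.93, 20, 2, 3574.772)
print(round(f_stat, 4), m1, m2, right_tail_conclusion(f_stat, f_c=2.217197))
```

Note how the lower critical value asks FINV for a right-tail probability of 1 - α/2, while the upper critical value asks for α/2.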
where WAGE is hourly wage, EDUC is years of education, EXPER is years of experience, and
METRO is an indicator variable equal to 1 for workers who live in a metropolitan area and 0 for
workers who live in a rural area.
Open the Excel file cps2. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 8 in one file, create a new worksheet in your POE
Chapter 8 Excel file, rename it cps2 data, and in it, copy the data set you just opened.
Insert a column to the left of the female column D (see Section 1.4 for more details on how to do
that). In your new cell D1, enter the column label metro. In cell D2, enter the formula =K2; copy
it to cells D3:D1001. Now we have a table where the explanatory variables of interest to us are in
columns next to each other.
     A      B      C       D
1    wage   educ   exper   metro
2    2.03   13     2       0
3    2.07   12     7       1
4    2.12   12     35      1
5    2.54   16     20      1
6    2.68   12     24      1
In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range
should be B1:D1001. Check the box next to Labels. Select New Worksheet Ply and name it
Wage Equation. Finally select OK.
B --�
l C D -�
l_ E 1 F I I H I
_,__��
+ SUMMARY-- "_J_
OUTPUT
___ � _ __ _ _,__
3 R�wssion St11tislii;s
_i_ 'Multi_�le- R 0.5%6266,77 +
-5
__ R Square _ 0.2_6�90312�
6 Adjuste<l .R Sqware 0.264695001
T Standa.rd Err�r - 5.3564897
T ObseNatio-rr5 -moo
s
lQ ANOVA
1 1�-------�_f ___ _ s_s____ M_S ___ ___ F _s�m�
m_ F
fic_a_n c_�_
12 Re11rce•ssion 3 10404-28343 3468.09-4.475 1?0Jll3?919 !l-.2�965E-67
13 Residual 996 2'857721.397 28. 691981 9•
14 T t loa ·999 389B1.4974
Ts·1
Cot;ftiaienfs Standard Error t Stat P-vaJu-e Lower 95% Upper 95% l.oi6'er9q_0% 1Jppe.r 950%
-9.913984216 1.07566251 T -9.21 6630734- 1.77326E-19 .
-12.02480904- -7.80-3159397 -12:.0•2480904 - -7.80.3"159'397
j_23395399!J
. -· -
We estimate the following equation (8.8) twice: once for workers living in a metropolitan area
and again for workers living in a rural area.
We first sort our data according to the area of residence of the workers and then successively
estimate (8.8) with metropolitan area observations only (subsample 1), and with rural area
observations only (subsample 2).
Go back to your cps2 data worksheet. Select the whole worksheet by left clicking on the upper
left corner of the worksheet. Your cursor should turn into a fat cross.
Select the Sort & Filter button in the Editing group of commands on the Home tab. On the drop
down menu, select Custom Sort.
A Sort dialog box opens. Select the box next to My data has headers. Select the metro indicator
variable column in the Sort by window. Values should be selected in the Sort On window. Select
Largest to Smallest in the Order window. Finally, select OK.
In the Regression dialog box, for the metropolitan area wage equation, the Input Y Range
should be A1:A809, and the Input X Range should be B1:C809. Check the box next to Labels.
Select New Worksheet Ply and name it Subsample 1 Model. Finally select OK.
I A I B I c D I E F I G I J:-t II I
m·S_U M MARY _OLJTPUT
3 I Reare•ssion SlatistiGs
,_1_ Mdltirile R 0 .508117361
0 .258183252
� R �g,u�·B>
Ao:Jjus!ed R SIJ.Uare 0.256:340229
._!__
L Standard Error -5.6412'5268
8 O b.servafo... �s . 808
+
9
To A'NOVA
111 df SS MS F Signi fi�11 c e F
J1_ Re9res.s1on 2 8916.17':1611 4458.085806 -
140.0868331 6.22867E-53 -
13. Residu.al 805 25618.1041 31.8237318
f--14 -
T1Jtal 807 34534.21511
15
16 I Coefficients Sti1ndard Error I Still P-11aiue l.ower95% Upper95% Lower95.0% Uf!P.eor 95.0%
�lnter�e�t -9.052478207 1 .18945608.2 7 6 1 0603 1 53
-
. 7.6J367E-14 -1 1.38727966 -6.717676756 -1 1.38727966 -6.7 1 7676.756
edu·c 1 .281714419 o:o 79762684 16. o·s969843 i,Jzs15E-5o 1. 12514fo33 1.43.a2s 1 so6 1.125147033 -f A.3B.281sci6
19 1ex�er 0.1345�9682 .
0 0 1 7947 5 84 f.4�7370149 1-71985E-13 O. O �9330096 0. 1 &9789269 0. 09•9330096 0 169789269
Go back to your cps2 data worksheet, and then to the Sort dialog box. Change the Order to
Smallest to Largest, and select OK.
In the Regression dialog box, for the rural area wage equation, the Input Y Range should be
A1:A193, and the Input X Range should be B1:C193. Check the box next to Labels. Select
New Worksheet Ply and name it Subsample 2 Model. Finally select OK.
I I E
,_1i-------A B c D F I G H I I
SUMMARY OUTPUT
, --1__,
2
3[ Rearession Slalislics
4 Multiple R 0.508673076
�� R �q.u_are . _ (} 25'8748:298
....§_ A-Oj usted R S_quare 0.250904365 T
___]._9_0422_6?_6
,_l_ S!a:ndard I
t
Error
8 ·O·bserva:tions 192
19o·1ANOVA
1_:1_ I df SS MS F SiQnificunce F
2 1()05:642618 502.8213091 32.987 05973 5.H943E-13
,_g R·egr�ssion
13 Residual 189 2880.924466 15.24298659
'14 Total 191 388-6�567084
�
15
1&J Coefficients Standard Error t Sfat P-v-alue Lowe195% Uooe195% Lower95.0% UoDer95.0%
Jn_ti;irrnpt -6.16.5854725 1.89!!510693 -3247732418 0.001376545 -9.9'1084 7494 �2.42086195'6 -!f.910847494 -2.42086'1956
'
edl!c- 0.9555B5JsJ 0.133189909 _J.174607953 1.6011E-11 0.&9285.5629 121!i315137 0.&92855629 1.218:i.1 51}7
·ex per 0.125973719 a_o24na91 5 _()85538445 IL790&9E-O 7 0.0 77110627 0.174836811 0_()77110627 ()_ 1748-36811
At α = 0.05, the result of the Goldfeld-Quandt test for the wage equation example is (see also p.
308 in Principles of Econometrics, 4e):
     A                 B                 C
2                      K1 =              3
3                      MS Residual 1 =   31.82373
4                      N2 =              192
5                      K2 =              3
9    Computed Values   m1 =              805
13   Right-tail        Fc =              1.215033
14                     Conclusion =      Reject Ho
16   Two-tail          FLc =             0.805198
We would like to test the hypothesis that the error variance increases as income increases. This is
a right-tail hypothesis test where $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 > \sigma_2^2$. To get the estimated error
variances and test this hypothesis, we will split our sample into two equal subsamples of 20
observations each, and successively estimate (8.1) with higher income observations only
(subsample 1), and with lower income observations only (subsample 2).
Go back to your food data worksheet. Place your cursor in any cell of the income data column B.
Go to the Data tab in the middle of your tab list. In the Sort & Filter group of commands select
the Sort Largest to Smallest button. Your data set should be sorted by descending order of
income values as shown below.
A I B
1 - food_exp income·
2- 37S._73; 33_4
3 257.96' 29.4
I ZA 1 1\. Sort fi!t�r , _., '4 587_65' 28.62
�'1 l\cfllaace"d 438.2'9. 27_ 1•S.
�"'{
..
Formu[a.s Data!J�ew
, ��-
Sort& Rlte-r
, ..£
� 48:2:.S.5 :27. 14
In the Regression dialog box, for the higher income food expenditure model, the Input Y Range
should be A1:A21, and the Input X Range should be B1:B21. Check the box next to Labels.
Uncheck the box next to Constant is Zero. Select Output Range and specify it to be cell A1 in
your Subsample 1 Model worksheet: you can place your cursor in the Output Range window
and move it to that cell to do that, or type 'Subsample 1 Model'!A1 in the Output Range
window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do select OK to overwrite
the data in the specified range.
A I B I c I Di I E F I G I H I I
=
SUMMARY OUTPUT
,J_
2
3 Reomssion Sfalisfics
i.. Multiple R 0.41248222
5 R Squa-re 0.170141582
I-
ti �pjust·ed R Squar.e 0.124038337
rr- Standard Error 1n&146495
8 Observationi;; 20
9
To ANO VA
f1 1df SS MS F S1'.qnificam;e F
12 Regression 1 4 76 8 7-68234 47687.68234 3.69•04469'64 0.0707070921
-
,_
13 Residual
- -
19 232594.6668 129213259'3
f---
14 Tntal 19 280282.3492
1.5
16 Goeffi-Gients Standard Error t Stal P-va!ue Lovter 95% Upper95% Lov.·er 95. 0% l.JJJoer95.0%
17 lntmeept -24. 9146:22�4 �84.92:4846 -0_ 1'3.472 836:9 0. 8943217'37 -413.42'73[)71 361.598.0612 -413.4273071 %3_5�80512'.
1
Ta income 14.26400003 7.42509'2131 1 W'1tl536018 2
0 . 070 7 0 70 9 - 1 . 335539657 .29.1!6353971 -1. 3J5.S3965·7 29'.86353971
Go back to your food data worksheet. Place your cursor in any cell of the income data column B.
Go to the Data tab in the middle of your tab list. In the Sort & Filter group of commands, select
the Sort Smallest to Largest button. Your data set should be sorted by ascending order of
income values as shown below.
A I B
1 foo·d_e-x:p inc.ome
�
I Formula.s J �. Review
Sort&. Fllte r
5
6
114.96
187.05
6.03
12.47
In the Regression dialog box, for the lower income food expenditure model, the Input Y Range
should be A1:A21, and the Input X Range should be B1:B21. Check the box next to Labels.
Select Output Range and specify it to be cell A1 in your Subsample 2 Model worksheet: you
can place your cursor in the Output Range window and move it to that cell to do that, or type
'Subsample 2 Model'!A1 in the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do select OK to overwrite
the data in the specified range. The result is:
A I B I c I D I E I F I G H I I
1 SUMMARY OUTPLIT
-2
l
3 Reoression Sfatisfic.S
__1__ _Multipl� R,_ 0. 734079402
s R Squarn 0.538872568
-
_§_ Adjus.ted R Square 0.513254J78
_]_ �Janda�. Emir 59-789399'39 - -
8 Obser;alions 20 --
9
-
10 AN OVA I
11 I dt SS MS F Sig_11ificanoe F
11 RE!gressie>n 1 75194-48762 75194.48762' 2H34762-9fl 0.000229013
18 income 1-1.50037 916 2_5075.B89 4_586367679· o_ood229ti13 6.232287967 16-7-&!!47035 6.23228796-7 16.76!!47 (}35
At α = 0.05, the result of the Goldfeld-Quandt test for the food expenditure example is (see also
p. 309 in Principles of Econometrics, 4e):
     A                      B                 C
1    Data Input             N1 =              20
2                           K1 =              2
3                           MS Residual 1 =   12921.93
4                           N2 =              20
5                           K2 =              2
6                           MS Residual 2 =   3574.772
9    Computed Values        m1 =              18
10                          m2 =              18
11                          F-statistic =     3.614755
12   Goldfeld-Quandt test
13   Right-tail             Fc =              2.217197
Go back to your food data worksheet, if you are not there already. In cells D1:E2 and G1:H2,
enter the following column labels and formulas.
     D                E
1    x-bar =          =AVERAGE(B2:B41)
2    White se(b2) =   =SQRT(SUMPRODUCT(G2:G41,H2:H41)/SUM(G2:G41)^2)
     G                H
1    (xi - x-bar)²    residuals²
2    =(B2-$E$1)^2     ='Food Expenditure Equation'!C25^2
Note that we are using the SUMPRODUCT mathematical function to compute the White
standard error. The general syntax of the SUMPRODUCT function is as follows:
=SUMPRODUCT(cell_range_in_column1,cell_range_in_column2)
where the cell range in both column 1 and column 2 must specify an identical number of rows.
For each row, the value from column 1 is multiplied by the value from column 2, and all products
are then summed.
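For readers who want to verify the worksheet result outside Excel, the row-by-row logic of SUMPRODUCT and the White standard error formula built on it can be sketched in Python. This is an illustrative sketch only: `sumproduct` and `white_se_b2` are hypothetical helper names, and the three-observation data set is made up purely to show the mechanics.

```python
import math

def sumproduct(col1, col2):
    """Multiply the two columns row by row and sum the products."""
    if len(col1) != len(col2):
        raise ValueError("both ranges must have the same number of rows")
    return sum(a * b for a, b in zip(col1, col2))

def white_se_b2(x, residuals):
    """sqrt( sum((x_i - x_bar)^2 * e_i^2) / (sum((x_i - x_bar)^2))^2 )."""
    x_bar = sum(x) / len(x)
    dev_sq = [(xi - x_bar) ** 2 for xi in x]   # worksheet column G
    res_sq = [e ** 2 for e in residuals]       # worksheet column H
    return math.sqrt(sumproduct(dev_sq, res_sq) / sum(dev_sq) ** 2)

# Made-up data, for illustration only
print(sumproduct([1, 2, 3], [4, 5, 6]))
print(white_se_b2([1, 2, 3], [1, -2, 1]))
```

The two intermediate lists play the role of columns G and H in the worksheet, so the final line of `white_se_b2` has the same shape as the formula entered in cell E2.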
Copy the content of cells G2:H2 to cells G3:H41. Here is how your table should look (only the
first five values are shown below):
D E F G H
'
19_60475 (x;: x .h a rf l . residual�
1.75.327 253.279269 34.452084
231.488'619 59.9642
220.&63599 156.05055
184.273839 901.20972
50_ 9 0•4,6583 560. 7541.9
The estimated White se(b2) above differs slightly from the value reported on p. 310 of Principles
of Econometrics, 4e. The reason is that the value reported in Principles of Econometrics, 4e
was computed using the following modified White standard error estimator:
$$\widehat{se}(b_2) = \sqrt{\frac{N}{N-2}\cdot\frac{\sum (x_i-\bar{x})^2\,\hat{e}_i^2}{\left[\sum (x_i-\bar{x})^2\right]^2}}$$
The source of this adjustment follows from the discussion on pp. 64-65 of Principles of
Econometrics, 4e. Namely, the expected value of the sum of squared regression errors is
$E[\sum e_i^2] = N\sigma^2$. However, the expected value of the sum of the squared least squares residuals is
$E[\sum \hat{e}_i^2] = (N-2)\sigma^2$. The squared least squares residuals are smaller, on average, than the true
squared errors.
In cells D3:E4 of your food data worksheet, add the following column labels and formulas.
    D                          E
3   N =                        =COUNT(B2:B41)
4   Modified White se(b2) =    =SQRT((SUMPRODUCT(G2:G41,H2:H41)/SUM(G2:G41)^2)*(E3/(E3-2)))
The estimated modified White se(b2) should be equal to the value reported on p. 310 of
Principles of Econometrics, 4e:
�
D I E
3 N= 4-0·
T Modified Whit·e se(b ll = 1 !lCJ9CJ
• 77
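Consider the following heteroskedasticity assumption for the food expenditure model:
$$\operatorname{var}(e_i) = \sigma_i^2 = \sigma^2 x_i \qquad (8.11)$$
Given assumption (8.11), the following food expenditure model has homoskedastic errors:
$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^* \qquad (8.12)$$
where the transformed dependent and explanatory variables are defined as $y_i^* = y_i/\sqrt{x_i}$,
$x_{i1}^* = 1/\sqrt{x_i}$, and $x_{i2}^* = x_i/\sqrt{x_i} = \sqrt{x_i}$.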
Note that model (8.12) does not have an intercept.
Below, we first calculate the transformed dependent and explanatory variables, and then use
Excel's regression analysis tool to get the generalized least squares estimates of model (8.12).
Go back to your food data worksheet. In cells J1:L2 enter the following column labels and
formulas.
    J     K      L
1   y*    x1*    x2*
Copy the content of cells J2:L2 to cells J3:L41. Here is how your table should look (only the
first five values are shown below):
J K L
1 Y'. x{ x{
2 59_98114 0.52:0579 1-9120937
3' 64_899'71 0 . 477274 2_095.23,3
4 54_75S9S 0.4588.31 2.179'449
4K81S33
- - �
0 . 40.7231 2.45.5606
.. ,. . . .. .. .
In the Regression dialog box, the Input Y Range should be J1:J41, and the Input X Range
should be K1:L41. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it GLS Food Expenditure Equation. Finally select OK.
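The transformation and the regression through the origin can be sketched in Python as follows. This is a minimal sketch, assuming the error variance is proportional to x, so that every variable is divided by the square root of x; the function name and the sample numbers are hypothetical.

```python
import math

def gls_sqrt_x(y, x):
    """GLS under var(e_i) = sigma^2 * x_i (assumed form of the variance).

    Divide every variable by sqrt(x_i), then run least squares without an
    intercept on the two transformed regressors 1/sqrt(x_i) and sqrt(x_i).
    """
    w = [math.sqrt(xi) for xi in x]
    y_star = [yi / wi for yi, wi in zip(y, w)]
    x1_star = [1.0 / wi for wi in w]               # transformed intercept column
    x2_star = [xi / wi for xi, wi in zip(x, w)]    # equals sqrt(x_i)
    # Normal equations for a two-regressor model through the origin.
    s11 = sum(a * a for a in x1_star)
    s12 = sum(a * b for a, b in zip(x1_star, x2_star))
    s22 = sum(b * b for b in x2_star)
    s1y = sum(a * c for a, c in zip(x1_star, y_star))
    s2y = sum(b * c for b, c in zip(x2_star, y_star))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Made-up data lying exactly on y = 3 + 2x, so GLS returns (3, 2).
print(gls_sqrt_x([5.0, 11.0, 21.0, 35.0], [1.0, 4.0, 9.0, 16.0]))
```

Checking Constant is Zero in Excel plays the same role as solving the normal equations without a column of ones: the intercept of the original model is carried by the transformed column x1*.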
A 8 I c I D I E I F I G H I I
1 SUMMARY OUTPUT I
-- -
2
3 Re!J.re.Ssion Slaffrilics
4 Multiple R 0_952446484
_,_
___&____ R, Sqi.rnre 0.925303234-
__§____ A<.ljusted_R Square 0.898048056· �
18.75005��2
-
7 St.ao<.lard Error
8 Observalicms 40.
I
-·
9
Ta AN OVA
-,.,-1 dt SS MS F Sionifioance F
12 'R�egr•es si on
--
1� Residw1I
2:1 167916.5405 63'158.27027
351.5645932
236. 613213 6 7.06526E-22 �
14 TGt.al.
15
38
40
13359_45454
181275.9951 I
16 J Coefficients Stam;iar.d Error t Stat P- value_ Lower95% Uge_er9.5% Lower95.0% Upper95.0%
JI. lnter.cept 0 #NIA #NIA #N!A #NIA #NJA #NIA :#NfA
18 xi' 7&..684082:03 23._78672165 3 .30'7621263 0.00206413 30.52633316 126.641830 9 30.52·63 3316: 126.64163091
19 ;x2• 10_45100:89 1-3858!1�227 7 541002276 .. 4_6137E>E-a9 7_645418811 1J.256!i9899 7.645418811 132565'9899
If we assume that the error variances in the metropolitan and rural areas are different, instead of
estimating equation (8.7), we can estimate the following equation (8.13) twice, once for workers
living in a metropolitan area and again for workers living in a rural area.
Now, if the assumption that the effect of education and experience on wages is the same for
metropolitan and rural areas is true, then better estimates can be obtained by combining both
subsets of data and applying a generalized least squares estimator to the complete set of data, with
recognition given to the existence of heteroskedasticity. That is what we do next.
Given the assumption that the error variances in the metropolitan and rural areas are different, the
following wage model has homoskedastic errors:
(8.14)
where $\hat\sigma_i = \hat\sigma_M$ for metropolitan area observations, and $\hat\sigma_i = \hat\sigma_R$ for rural area observations; $\hat\sigma_M$ is
the estimated standard error from (8.8) using metropolitan area observations only (subsample 1
model), and $\hat\sigma_R$ is the estimated standard error from (8.8) using rural area observations only
(subsample 2 model).
Go back to your cps2 data worksheet. In cells M1:N2 and P1:T2, enter the following column
labels and formulas.
    M                N                          P                          Q
1   σ-hat metro =    ='Subsample 1 Model'!B7    y*                         x1*
2   σ-hat rural =    ='Subsample 2 Model'!B7    =A2/IF(D2=1,$N$1,$N$2)     =1/IF(D2=1,$N$1,$N$2)
    R                          S                          T
1   x2*                        x3*                        x4*
2   =B2/IF(D2=1,$N$1,$N$2)     =C2/IF(D2=1,$N$1,$N$2)     =D2/IF(D2=1,$N$1,$N$2)
Copy the content of cells P2:T2 to cells P3:T1001. Here is how your table should look (only the
first five values are shown below):
I M I N I 0 I p I a I R I s I T I
1 o--hat metrn = 5.641253 >J x;{ x.2· X·:{ x./
I-
2 u-hat rural = 3.904227 0.519949 01256133 3.329725, 0.512265. 0
3
I-
0.80.9379, Q1.25613J 3.329725 0.2§�1.:n 0
3.0iJ5921 3.84199'
-� - . - .
4 . 0]6829 Q·,2'.56133 0
5
I-
OJ36829 0�256133 3.073592 S..196245· 0
6 0.9425·68 01.256133' 3.073592 1.280663· 0
In the Regression dialog box, the Input Y Range should be P1:P1001, and the Input X Range
should be Q1:T1001. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it GLS Wage Equation. Finally select OK.
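The IF-based transformation in columns P:T can be sketched in Python. This is an illustration only: the function name is hypothetical, and the column order (wage, education, experience, metro dummy in columns A to D) is an assumption read off the cell formulas.

```python
def scale_by_group_sigma(rows, sigma_metro, sigma_rural):
    """Divide each observation by the error standard deviation of its group.

    rows: (wage, educ, exper, metro) tuples, with metro equal to 1 or 0
    (assumed column order A:D). Returns (y*, x1*, x2*, x3*, x4*) tuples.
    """
    out = []
    for wage, educ, exper, metro in rows:
        # Same choice as IF(D2=1,$N$1,$N$2) in the worksheet.
        s = sigma_metro if metro == 1 else sigma_rural
        out.append((wage / s, 1.0 / s, educ / s, exper / s, metro / s))
    return out

# Standard errors as displayed in cells N1 and N2; the observation is illustrative.
print(scale_by_group_sigma([(2.03, 13, 2, 0)], 5.641253, 3.904227)[0])
```

Because every column, including the column of ones, is divided by the same group-specific value, the regression on the transformed data is run with Constant is Zero checked.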
A I B I c I D I E F I G I H I I
1 SUMMARY OUTPUT
+
T
3 Rearessfon StatistJc;s
4 Multiple R ()_8917416094
'T R Square 0_805355646
6 Adj_u·sted R Square o _ s 013765352
7 �tandar<I_ Error 1 :00·1216522
I
�a - Obserwtions 1 0 00
9-
10 AN OVA
11 df SS MS F Sirmific-.11n c-.e f
12 Regressiori 4 413'1-057618 1032. 7644 0·4 1030.2562:2 0
13 Resi<Jua.l 996 998.424786'7 1_00243452'5
T4 Total 1000 5129.48240 5 I
15
16 J Coefficients Standard EtTOr t Stat P-v-11Jue L-ower95"% Uee!r!.l5% Lower95.0% Upper95.0%
�lnter(>il'pt 0 #NIA #NIA #NIA #NIA #N!A #NIA #NJA
x1 ' - 9 3 :10 3 6i 561
. 1.()195726'0 8 -9 -2170384 77 1_ f5706E-19 -113 9931 47•& -7-39740856 -11.39931476 -7.39740856·
1-1 �5720589 QJl68507954 17.45374644 l00115E-!i9 1.0�12841 OJ 1.330157075 1_061284103 1-331l157075
()_ 132208766 0Jl l 4548 50·2' 9.087443443 5-3446·1E-19 0.1 ()3659533 Q,_%0·758 o_ 103659-533 0.160758
JC4' 1.53BB03242 0.346285576: 4.4437405 0•4 9_!!3178E-06 0.8592702-31 2.218JJ 6253 0:8592702''.31 2.218336253
��: -
Consider the following more general heteroskedasticity assumption for the food expenditure
model:
(8.15)
Given assumption (8.15), the following food expenditure model has homoskedastic errors:
$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^* \qquad (8.16)$$
where the transformed dependent and explanatory variables are defined as $y_i^* = y_i/\hat\sigma_i$,
$x_{i1}^* = 1/\hat\sigma_i$, and $x_{i2}^* = x_i/\hat\sigma_i$.
Note that $\hat\sigma_i^2 = \exp(\hat\alpha_1 + \hat\alpha_2 \ln(x_i))$, where $\hat\alpha_1$ and $\hat\alpha_2$ are the least squares estimates of (8.17):
$$\ln(\hat e_i^2) = \alpha_1 + \alpha_2 \ln(x_i) + v_i \qquad (8.17)$$
and the $\hat e_i^2$ are the squares of the least squares residuals from model (8.1).
Below, we first estimate (8.17). We will then calculate the transformed dependent and
explanatory variables and use Excel's regression analysis tool to get the generalized least
squares estimates of model (8.16).
Go back to your food data worksheet. In cells N1:O2 enter the following column labels and
formulas.
    N              O
1   ln(e-hat^2)    ln(x)
2   =LN(H2)        =LN(B2)
Copy the content of cells N2:O2 to cells N3:O41. Here is how your table should look (only the
first five values are shown below):
I N I 0
1 ln(,e-hatl) ln(x)
2 3.5395§9 1.30562§
,_
3 4.093748 1-479' 32:9
I-
4 5.06291.5 1.558145
E Ei.8.03738 1.79&747
I-
6 6.329283 2:.S.2,3326
In the Regression dialog box, the Input Y Range should be N1:N41, and the Input X Range
should be O1:O41. Check the box next to Labels. Make sure the box next to Constant is Zero is
not checked. Select New Worksheet Ply and name it Log-Log Variance Function. Finally
select OK.
A I B' I c I D E I F G I H I I
1 SUMMARY OUTPUT I
f-
2
.3 R,eare.ssion Slali:>lics
� Mult'iple· R 0_5723�9254
.5 R Sq�are ll.3275.97405
i6
i--
�dj u s't.e<:J R S;quare o 3 Oi990.2'6
_
7
1--
Stal'ldard Ermr 1-720854391
8 Ob.seMlfions 40
9
1I
1 o ANOVA
t
I
11 df SS MS F Si!J.nificarwe F
R139re-s ion s 1 54.82554 54.82554 18.51376:169 0.000, i 3872
$
1 3 Res.id'UaJ 38 ·112_5,30.9137 2,961339835
-
14 Toti
< :J 39 167.3'5 64537 I
15
f--�
l6 CDetfi"'r;iimts Standa!d Emir f Stat P-value Lower95% Uor;w .95% Lower95.0% Uoo.er95.0%
17 Jnter· c �pt 0.91719'654 1.583105245 o,_5923773s1 O.S.57106:301 -2.i67032452' 4.142'6255].2 -2.267032452 4. 1426255.32
f-
ln·�i
- -
18 2.3292-38594 [J_5413,35668 4.3027, 621' 1l1 D.DD01'1387:2. 1.233361 B37 3..425115351 1-2333&1837 3-42.5115351
Go back to your food data worksheet. In cells Q1:R2 and T1:W2, enter the following column
labels and formulas.
    Q            R
1   α1-hat =     ='Log-Log Variance Function'!B17
2   α2-hat =     ='Log-Log Variance Function'!B18
    T        U     V      W
1   σ-hat    y*    x1*    x2*
Copy the content of cells T2:W2 to cells T3:W41. Here is how your table should look (only the
first five values are shown below):
Q I R s I T I u I v I w
1 •CJ.1-hat = 0.93ll97 o-hat y'* -
X1" xl"
2 1a2-hat = 2.329239 --
7.31155£ 1.5.758&2 0<.13677 (}_504fi,81
-
-
J. B.9'5GB96 15<_19'171 0.111721 0'.490454_
-
4 9.S.11386 1 2:1£342.,
.
0_ 10,1922
'
0'.484B1
-
& 12.95426 8.81430'2 0.077195 0_4'55484
6 30.19'306 6.19.5132 Q_(}3J12 Qi-41'3009·
In the Regression dialog box, the Input Y Range should be U1:U41, and the Input X Range
should be V1:W41. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it GLS Food Expenditure Equation 2. Finally select OK.
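The two stages just described (estimate the variance function, then rescale the data) can be sketched in Python. This is a minimal sketch, assuming the variance function is log-linear in ln(x); all function names are hypothetical, and the final regression through the origin is left to the reader.

```python
import math

def fit_line(x, y):
    """Simple least squares with an intercept; returns (intercept, slope)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    slope = (sum((a - xb) * (b - yb) for a, b in zip(x, y))
             / sum((a - xb) ** 2 for a in x))
    return yb - slope * xb, slope

def fgls_sigma(x, residuals):
    """Estimate sigma_i from the regression ln(e_i^2) = a1 + a2*ln(x_i) + v_i."""
    a1, a2 = fit_line([math.log(v) for v in x],
                      [math.log(e ** 2) for e in residuals])
    # sigma_i^2 = exp(a1 + a2*ln(x_i)), so sigma_i is its square root.
    return [math.sqrt(math.exp(a1 + a2 * math.log(v))) for v in x]

def transform(y, x, sigma):
    """Rows (y*, x1*, x2*) = (y/sigma, 1/sigma, x/sigma)."""
    return [(yi / s, 1.0 / s, xi / s) for yi, xi, s in zip(y, x, sigma)]

# Made-up residuals whose squares follow ln(e^2) = 1 + 2*ln(x) exactly.
x = [1.0, 2.0, 4.0]
res = [math.sqrt(math.e) * v for v in x]
print(fgls_sigma(x, res))
```

Note that the fitted variance function gives the variance, so the square root is taken before dividing; forgetting it would over-weight observations with small x.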
The result is (see also p. 318 in Principles of Econometrics, 4e):
Constarnt
I
SUMMARY OUTPUT @=J �.1�
�ptions
J. Slatislics
0.9'7588(}711
Qutput �1
Squar,e 0. %.2:34.31161 j I
ste-d R Square 0.!12.4773245
Error 1.54&73969
40
AN OVA
11 f
I
A
R�re-ssion I B I c
181&.712874! D
908-35&4372 E
319.6836 I F
2.19348E-25 I G I H I
J_
1
Res.idual 90·.9'H3'.J.9>'.3.5 2.392403667 - I
2 14 Total 1907.624-214 I
I Regression
�ultipleR
17 RIntercept
Coefflciertfs
0
StanclBJ.cl Error
#NIA
t Stat
#NIA
P-vafue
#NIA
Lower95%
#NIA
.Lowerc95�0%
#NIA
95.0%
,_Ji18
;i�JLI
--
�
1 0 63349'15 8 o 9-71514 2.81 10.94527563 2.61541E-13 6,666753754 12.600-2194
· 12.600219.4
!Observations
.. ..
t
6'
9'
To I
rJF SS MS f Sig_nifican:oe
2
J£
13 38
>---
40 I
1�1
15·1 Upper95% Uooer
tlN/A #NIA
x1•; 11n22&,
x2' 8.666763754
CHAPTER 9
Chapter Outline
9.1 Finite Distributed Lags
9.1.1 US Economic Time Series
9.1.2 An Example: The Okun's Law
9.2 Serial Correlation
9.2.1 Serial Correlation in Output Growth
9.2.1a Scatter Diagram for Gt and Gt-1
9.2.1b Correlogram for G
9.2.2 Serially Correlated Errors
9.2.2a Australian Economic Time Series
9.2.2b A Phillips Curve
9.2.2c Correlogram for Residuals
9.3 Lagrange Multiplier Tests for Serially Correlated Errors
9.3.1 t-Test Version
9.3.2 T x R2 Version
9.4 Estimation with Serially Correlated Errors
9.4.1 Generalized Least Squares Estimation of an AR(1) Error Model
9.4.1a The Prais-Winsten Estimator
9.4.1b The Cochrane-Orcutt Estimator
9.4.2 Autoregressive Distributed Lag (ARDL) Model
9.5 Forecasting
9.5.1 Using an Autoregressive (AR) Model
9.5.2 Using an Exponential Smoothing Model
9.6 Multiplier Analysis
This chapter is concerned with the nature of autocorrelation, generalized least squares estimation
of AR(1) models, and tests for autocorrelation. Forecasting, finite distributed lag models, and
autoregressive distributed lag models are also introduced.
Open the Excel file okun. Save your file as POE Chapter 9. Rename sheet 1 okun data.
Below we plot the time series of some important economic variables for the US economy as in
Figure 9.4 on p. 345 of Principles of Econometrics, 4e.
c
2 Du
3 =B3-B2
Copy the content of cell C3 to cells C4:C99. Here is how your table should look (only the first
five values are shown below):
    C
2   DU
3   -0.1
4   -0.2
5   0
6   0.2
7   -0.2
Select the Insert tab located next to the Home tab. Select C3:C99. In the Charts group of
commands select Line, and Line with Markers.
After editing, the result is (see also Figure 9.4(a) p. 345 in Principles of Econometrics, 4e):
[Figure: line chart of DU, the quarterly change in the U.S. unemployment rate]
To plot the change in the US GDP series, select cells A3:A99. After editing, the result is (see also
Figure 9.4(b) p. 345 in Principles of Econometrics, 4e):
[Figure: line chart of G, the percentage change in U.S. GDP, by quarter]
(9.1)
where DU is the change in the U.S. unemployment rate and G is the percentage change in Gross
Domestic Product (GDP) from quarter 2 , 1985 to quarter 3, 2009; t = 1, .. , T where T = 98.
In cells D4:G5 and H3:J4 of your okun data worksheet enter the following labels and formulas.
    D      E       F       G       H      I       J
3                                  g      gt-1    gt-2
4   g      gt-1    gt-2    gt-3    =A4    =A3     =A2
5   =A5    =A4     =A3     =A2
Copy the content of cells D5:G5 to cells D6:G99 and that of cells H4:J4 to cells H5:J99. Here is
how your table should look (only the first five values are shown below):
D I E I F I G I H I I I J
3 g gH gt-2
.-
4 g g,_, g,_� g,_, 1.4 2 1.4
5 1.5 1.4 2 1.4 1.5 1.4 2
·o 0.9 1.5 1.4 2 o:� 1.5 1.4
.-
7 1.5 0.9 1.5 1.4 1.5 0.9 1.5
.s 1.2 1.5 0.9 1.5 1.2 1.5 0:9
:g 1.5 1.2 1.5 0.9 1.5 1.2 1.5
In the Regression dialog box, the Input Y Range should be C4:C99, and the Input X Range
should be D4:G99. Check the box next to Labels. Select New Worksheet Ply and name it
Okun's Law Lag Model q=3. Finally select OK.
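The construction of the lag columns and the regression can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, `du` and `g` are assumed to be lists aligned by quarter (with a placeholder where DU is unavailable), and the small least squares solver stands in for Excel's regression tool.

```python
def lagged_rows(du, g, q):
    """Pair each DU_t with [G_t, G_{t-1}, ..., G_{t-q}].

    The first q quarters are dropped because their lags are unavailable.
    """
    return [(du[t], [g[t - s] for s in range(q + 1)]) for t in range(q, len(g))]

def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def fit_distributed_lag(du, g, q):
    """Intercept followed by the q + 1 lag coefficients."""
    rows = lagged_rows(du, g, q)
    X = [[1.0] + lags for _, lags in rows]
    y = [d for d, _ in rows]
    return ols(X, y)
```

With q = 3 each usable row needs three earlier quarters, which is why the Excel ranges for that model start three rows lower than the first growth observation.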
The result is (see also Table 9.2 p. 346 in Principles of Econometrics, 4e, for Lag Length q = 3):
A I B I c I D I . E I F I G I H I I
1 SU MMARV
-
OUTPUT
l
2
l RegreS$ion Statistir;s
-
4 Multiple R 0. .&07716384
'
s R
-
Square 0.155240575 7
� Adjustecl R Square 0.636957124
1s I
16 I Coefficien t:s Standard Error t Stat P-value Lower.95% Upper.95% Lower.:J5.a% Upper35.0%,
17 lnter<tept
--
0.58-0974603 O.Q.53889266 10.780.8%58' 5.9l581E-18 Q.473914173 (),688035034 0.473914173 0.688035034
Go back to your okun data worksheet. In the Regression dialog box, the Input Y Range should
be C3:C99, and the Input X Range should be H3:J99. Check the box next to Labels. Select
New Worksheet Ply and name it Okun's Law Lag Model q = 2. Finally select OK.
The result is (see also Table 9.2 p. 346 in Principles of Econometrics, 4e, for Lag Length q = 2):
A I B I .c I D E I F I G H I I
1 .SUMMARY OUTPUT
-
2
3 Regression Stat1stics I
4 Multiple R 0..80866'9257 I
5 �S q u are 0.653945967
.fi �dj u sted R Square 0. 542551596
161 Caef1ir::ien ts Standard Error t Stat . P-vcilue Lower95% Upper95% Lower 95.. 0% Upper95. .0%
17 Intercept 0.583556Il2
. -
0.047211917 12.36035632 2..<J455E-21 0.48ffZ89173 0.677323052 0.489789'173 0.677323052
    K      L
2   g      gt-1
3   =A3    =A2
Copy the content of cells K3:L3 to cells K4:L99. Here is how your table should look (only the
first five values are shown below):
'I K I L ,I
I
2 g gt-1
-
3 2
-
4
-
1.4 1.:1
5 1.5, 1.4
-
6
- 0.9 1.5
7 1.5 0.9
Select K2:L99. Select the Insert tab located next to the Home tab. In the Charts group of
commands select Scatter, and Scatter with only Markers.
After editing, the result is (see also Figure 9.5 p. 348 in Principles of Econometrics, 4e):
[Figure: scatter diagram of Gt against Gt-1]
9.2.1b Correlogram for G
Let $r_k$ be the correlation between $G_t$ and $G_{t-k}$; in other words, it is the correlation between
growth rates that are k periods apart. The null and alternative hypotheses for a test of
autocorrelation are $H_0: \rho_k = 0$ and $H_1: \rho_k \neq 0$. When $H_0$ is true, the product of the square root of
the sample size and the estimated correlation $r_k$ has an approximate standard normal distribution:
$$Z = \sqrt{T}\, r_k \overset{a}{\sim} N(0, 1) \qquad (9.2)$$
In cells M1:N13 and O1:P2 of your okun data worksheet enter the following column labels and
formulas.
     M      N                            O       P
1    lag    rk                           LB      UB
2    1      =CORREL(A3:A99,A2:A98)       =-P2    =1.96/SQRT(COUNT($A$2:$A$99))
3    2      =CORREL(A4:A99,A2:A97)
4    3      =CORREL(A5:A99,A2:A96)
5    4      =CORREL(A6:A99,A2:A95)
6    5      =CORREL(A7:A99,A2:A94)
7    6      =CORREL(A8:A99,A2:A93)
8    7      =CORREL(A9:A99,A2:A92)
9    8      =CORREL(A10:A99,A2:A91)
10   9      =CORREL(A11:A99,A2:A90)
11   10     =CORREL(A12:A99,A2:A89)
12   11     =CORREL(A13:A99,A2:A88)
13   12     =CORREL(A14:A99,A2:A87)
Copy the content of cells O2:P2 to cells O3:P13. Here is how your table should look (see also
reported correlations up to four lags on p. 349 in Principles of Econometrics, 4e):
,,, M I N I 0 I p I
I I
1 lag fk LB UB
-
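The correlogram columns can be reproduced outside Excel. The Python sketch below is an illustration only: the function names are hypothetical, and the lag-k correlation is computed the way CORREL does it, as an ordinary Pearson correlation between the series shifted by k rows and the series with its last k rows removed.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long lists (what CORREL returns)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def correl_lag(series, k):
    """Lag-k correlation in the style of =CORREL(A(2+k):A99, A2:A(99-k))."""
    return pearson(series[k:], series[:-k])

def significance_bounds(series):
    """(LB, UB) = (-1.96/sqrt(T), 1.96/sqrt(T)) at the 5% level."""
    ub = 1.96 / math.sqrt(len(series))
    return -ub, ub

# A correlation outside the bounds is significantly different from zero.
lb, ub = significance_bounds([0.0] * 98)
print(round(lb, 6), round(ub, 6))
```

Each of the two ranges passed to `pearson` has its own mean and variance, which is one reason the results can differ slightly from a formula that uses the full-sample mean and variance.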
Note that your Excel results differ slightly from those reported in Principles of Econometrics,
4e. By using the Excel function CORREL we constrained ourselves to computing the
autocorrelations by using (T - k) observations in the numerator and the denominator of the
correlation coefficient, an alternative, mentioned on p. 349 of your textbook, that leads to larger
estimates in finite samples and is given by
[Figure: line chart of the lower bound (LB) series for lags 1 to 12]
We would like to add to it our upper bound values and our correlation coefficient values. Right-click
anywhere in your chart area and choose Select Data on the list of options that pops up. In
the Select Data Source dialog box, select Add. In the Edit Series dialog box, the Series name is
the one found in cell P1 and the Series values are from P2:P13. Select OK.
The Select Data Source dialog box reappears. Select Add again. In the Edit Series dialog box,
type Correlation for the Series name; the Series values are from N2:N13. Select OK.
The Select Data Source dialog box reappears again. Select OK one more time.
On your chart, select the Correlation series, right-click and select Change Series Chart Type in
the menu of options. Select Clustered Column in the Column group of chart type. Select OK.
Select the horizontal axis in your chart, right-click and select the Format Axis in the menu of
options. In the Axis Options panel of the Format Axis dialog box, change the Axis labels
location to Low. Select Close.
After you add axis titles and delete the legend, your chart should look similar to Figure 9.6 on p.
350 of Principles of Econometrics, 4e:
[Figure: correlogram for G, correlations for lags 1 to 12 with the LB and UB bounds; horizontal axis: Lag]
Open the Excel file phillips_aus. Excel opens the data set in Sheet 1 of a new Excel file. Since
we would like to save all our work from Chapter 9 in one file, create a new worksheet in your
POE Chapter 9 Excel file, rename it phillips_aus data, and copy into it the data set you just
opened.
Below we plot the time series of some important economic variables for the Australian economy
as in Figure 9.7 on p. 352 of Principles of Econometrics, 4e.
Select the Insert tab located next to the Home tab. Select A1:A92. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 9.7(a) p. 352 in Principles of Econometrics, 4e):
[Figure: line chart of the Australian inflation rate, by quarter]
c
2 du
3 =B3-B2
Copy the content of cell C3 to cells C4:C92. Here is how your table should look (only the first
five values are shown below):
c
2 du
3 -0.1
4 -0.2
-0.1
-0.4
O•
To plot the time series for the quarterly change in the Australian unemployment rate, select cells
C2:C92. After editing, the result is (see also Figure 9.7(b) p. 352 in Principles of Econometrics,
4e):
(9.4)
where DU is the change in the Australia unemployment rate and INFL is the inflation rate from
quarter 2, 1987 to quarter 3, 2009; t = 1, . . , T where T = 90 observations.
In the Regression dialog box, the Input Y Range should be A2:A92, and the Input X Range
should be C2:C92. Check the boxes next to Labels and Residuals. Select New Worksheet Ply
and name it Phillips Curve Model. Finally select OK.
A I !l c I D I E F I H I
-
1 SUMMARYOUTPlJlT
....__ - -
2
3 Regression Statistic;<;
....:!._ Mllltiple R 0.23822·694·
� RSqu<ire 0.055752075
__§__ Adju>ted R Square 0.046033348
7 Standard Error 0.621988587
8 OIJ.s.ervations 90
t
-
9
10 ANOllA. I
nl elf SS MS .F Significcmce F
_g_ Rt:�ression 1• 2.0481345334 2..048346334 5.294555.&65 0.023753914
13 Residual 88. 34.04454256 0.186869802
14 Total 89 36.0"9<2888&9 I
Coefficfe<nts· Standard Error t Stat P-vafue lowe/95% Upper 95";f Lower 95.0"'"
A Upper 9S.0%
t?J!_ntercep·t 0.777521'157 0.065824943 11.81347414 7,53029-E-20 0.6458'{}8019 0.90.8434495 0.545808019 0.908434495
18ldlu -0.52i8o3&47 0:229404373 -2. 301014095 0.023153�14 -0.'983757818 -0.071969&77 -0. 983757818 -o.bn9.59g77
19
2.0
-
2.1
22 RESIDUAL OUTPUT
+
23
24 'observation Predicted 1 Residuals
25 1 0.830407'642 0.66�592358_
-f- '--
-
26 2: o.ss3194Q27 o:si:&&05973
2J 3. !).li?.30407'642 0.96�592358·'
In cells E22:F34 and G22:H23 of your Phillips Curve Model worksheet enter the following
column labels and formulas.
     E      F                              G         H
22   lag    rk                             LB        UB
23   1      =CORREL(C26:C114,C25:C113)     =-H23     =1.96/SQRT($B$8)
24   2      =CORREL(C27:C114,C25:C112)
25   3      =CORREL(C28:C114,C25:C111)
26   4      =CORREL(C29:C114,C25:C110)
27   5      =CORREL(C30:C114,C25:C109)
28   6      =CORREL(C31:C114,C25:C108)
29   7      =CORREL(C32:C114,C25:C107)
30   8      =CORREL(C33:C114,C25:C106)
31   9      =CORREL(C34:C114,C25:C105)
32   10     =CORREL(C35:C114,C25:C104)
33   11     =CORREL(C36:C114,C25:C103)
34   12     =CORREL(C37:C114,C25:C102)
Copy the content of cells G23:H23 to cells G24:H34. Here is how your table should look (see
also reported correlations up to five lags on p. 353 in Principles of Econometrics, 4e):
·� E I F I G I H
22 la.g r1c LB UR
23 1 0.552:909832 -0. 20 (j(j{)214 0.206()02:14
-
-
26 4 0.44694�916 -0.226321306 U.22•6321305
E... 5 0.3667341"Ci8
·------ --
-0.225321306
- -- -------
0.226321305
----- -- ·-
Again, note that your Excel results differ slightly from those reported in Principles of
Econometrics, 4e (see Section 9.2.1b for more details on that).
Proceed as in Section 9.2.1b to get the following correlogram for residuals (see also Figure 9.8 p.
353 in Principles of Econometrics, 4e):
[Figure: correlogram for the least squares residuals, lags 1 to 12]
Reconsider the Phillips curve model (9.4), restated below in a general form:
(9.5)
(9.6)
One way to test the null hypothesis $H_0: \rho = 0$ is to use a t- or F-test to test the significance of the
coefficient of $\hat e_{t-1}$ in (9.8):
$$y_t = \beta_1 + \beta_2 x_t + \rho\,\hat e_{t-1} + v_t \qquad (9.8)$$
where the $\hat e_{t-1}$ are the lagged least squares residuals from the Phillips curve model (9.4).
The estimation of (9.8) requires a value for $\hat e_0$. Two common ways of overcoming the
unavailability of $\hat e_0$ are (i) to delete the first observation and hence use a total of 89 observations,
and (ii) to set $\hat e_0 = 0$ and use all 90 observations.
In cells D2:D4 of your phillips_aus data worksheet, enter the following column label, value and
formula.
    D
2   et-1
3   0
4   ='Phillips Curve Model'!C25
Copy the content of cell D4 to cells D5:D92. Here is how your table should look (only the first
five values are shown below):
D
2 e1
•. 1
3 0
4 0.815805
>--
5 0.95959'2
>--
,5 0.811233
�
7 0.922.379
In the Regression dialog box, the Input Y Range should be A2:A92, and the Input X Range
should be C2:D92. Check the box next to Labels. Select New Worksheet Ply and name it t-test
Version of LM Test. Finally select OK.
The result of the t-test is highlighted below (see also p. 354 in Principles of Econometrics, 4e):
A I B
I c I D I E I F I G I H I I
-
18 du - 0 57!1358279
. , 0.1'93570737' -3.5()780(!34- 0.0007-17437 -1.064299813 --0.2944167ZS· -:1.064.2998::13 -Q.29·4415725
19 et-1 0.55&7-83928 0.030096701 6.202046518- t:82193E-08 0.3797069&3 (l.737860872. 0.379 7069&3 0.73.7&60872.
9.3.2 T x R2 Version
The T x R2 version of the Lagrange multiplier test is the one we worked with in Section 8.2.2a.
$$\hat e_t = \gamma_1 + \gamma_2 x_t + \rho\,\hat e_{t-1} + v_t \qquad (9.9)$$
where the $\hat e_t$ are the least squares residuals and the $\hat e_{t-1}$ the lagged least squares residuals from
the Phillips curve model (9.4).
Once again, the estimation of (9.9) requires a value for $\hat e_0$. Two common ways of overcoming
the unavailability of $\hat e_0$ are (iii) to delete the first observation and hence use a total of 89
observations, and (iv) to set $\hat e_0 = 0$ and use all 90 observations.
Go back to your phillips_aus data worksheet. From there go to the Regression dialog box. The
Input Y Range should be C24:C114 from the Phillips Curve Model worksheet. The Input X
Range should be C2:D92 from the phillips_aus data worksheet. Check the box next to Labels.
Select New Worksheet Ply and name it Auxiliary Regression. Finally select OK.
The results we are going to use for the Lagrange multiplier test are highlighted below:
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.55369814
R Square             0.30658163
Adjusted R Square    0.290640978
Standard Error       0.520908923
Observations         90
ANOVA
              df    SS             MS             F              Significance F
Regression    2     10.43743135    5.218715674    19.23269051    1.21143E-07
Residual      87    23.60711121    0.271346106
Total         89    34.04454256
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Lagrange Multiplier Test.
The degrees of freedom for the Lagrange multiplier test are equal to the number of hypotheses
being tested or number of parameters in the null hypothesis. When used to test for
heteroskedasticity based on the variance function (Section 8.2.2a), the degrees of freedom also
corresponded to the number of parameters in the auxiliary regression minus one. This is not the
case anymore. We make the appropriate modifications to our template to reflect that.
     A                           B                      C
1    Data Input                  N =                    ='Auxiliary Regression'!B8
2                                R2 =                   ='Auxiliary Regression'!B5
3                                α =
4                                m =
5
6    Computed Values             χ2-critical value =    =CHIINV(C3,C4)
7
8    Lagrange Multiplier Test    χ2 =                   =C1*C2
9                                Conclusion =           =IF(C8>=C6,"Reject Ho","Do Not Reject Ho")
10                               p-value =              =CHIDIST(C8,C4)
11                               Conclusion =           =IF(C10<=C3,"Reject Ho","Do Not Reject Ho")
At α = 0.05, and with m = 1, the result of the test is (see also p. 355 in Principles of
Econometrics, 4e):
     A                           B                      C
1    Data Input                  N =                    90
2                                R2 =                   0.306582
3                                α =                    0.05
4                                m =                    1
5
6    Computed Values             χ2-critical value =    3.841459
7
8    Lagrange Multiplier Test    χ2 =                   27.59235
9                                Conclusion =           Reject Ho
10                               p-value =              1.51E-07
11                               Conclusion =           Reject Ho
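The template's arithmetic can be sketched in Python for the one-restriction case. This is an illustration only: the function and constant names are hypothetical, the critical value is copied from the worksheet rather than recomputed, and the p-value uses the identity that, for one degree of freedom, the chi-square right-tail probability equals erfc(sqrt(x/2)).

```python
import math

CHI2_CRIT_1DF_05 = 3.841459  # value of =CHIINV(0.05,1) shown in the worksheet

def lm_test(n, r_squared):
    """T x R^2 Lagrange multiplier test with m = 1 and alpha = 0.05."""
    lm = n * r_squared
    # Right-tail chi-square(1) probability, the m = 1 case of CHIDIST.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value, lm >= CHI2_CRIT_1DF_05

lm, p_value, reject = lm_test(90, 0.306582)
print(round(lm, 5), "Reject Ho" if reject else "Do Not Reject Ho")
```

The erfc shortcut only holds for m = 1; with more restrictions a general chi-square distribution function would be needed.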
Reconsider the Phillips curve model (9.4), where the error $e_t$ is assumed to follow a first-order
autoregressive AR(1) model (9.6). The following Phillips curve model (9.10) has an error term $v_t$
that is homoskedastic and uncorrelated over time (see Appendix 9C pp. 397-399 in Principles of
Econometrics, 4e for more details):
(9.10)
where the $\hat e_t$ are the least squares residuals from the Phillips curve model (9.4).
The process of first estimating model (9.4), second using the least squares residuals $\hat e_t$ from
(9.4) to estimate model (9.11), and third using the least squares estimate $\hat\rho$ of (9.11) to transform
the dependent and independent variables and estimate model (9.10), is similar to what we have
done in Section 8.5.
Note that neither model (9.10) nor model (9.11) has an intercept.
We already went through the first step of the process and estimated model (9.4) in Section
9.2.2b. The data we need to estimate (9.11) are in our Phillips Curve Model worksheet. From
there, go to the Regression dialog box. The Input Y Range should be C26:C114, and the Input
X Range should be D4:D92 from the phillips_aus data worksheet. Uncheck the box next to
Labels. Check the box next to Constant is Zero. Select New Worksheet Ply and name it AR(1)
Error Model. Finally select OK.
Note that we are losing one observation, as there is no t - 1 = 0 residual value corresponding to
the first (t = 1) residual value.
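The regression of the residuals on their own lag, without an intercept, has a closed form that can be sketched in Python. The function name is hypothetical and the residuals in the usage line are made up for illustration.

```python
def ar1_rho(residuals):
    """Slope from regressing e_t on e_{t-1} with no intercept.

    Uses T - 1 pairs, because the first residual has no lagged value.
    """
    current = residuals[1:]
    lagged = residuals[:-1]
    num = sum(e * l for e, l in zip(current, lagged))
    den = sum(l * l for l in lagged)
    return num / den

# Made-up residuals that halve each period, so the estimate is 0.5.
print(ar1_rho([1.0, 0.5, 0.25, 0.125]))
```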
The result is:
A B I c D I E I F
I
G I H I I
1 SUMMARY OUTPUT
,__ I
2 r
3 Regresslo·n St:atistics
I
: .
4 Multiple R 0.55292.2734
j
5 R square n.:3osn:355,
�
6 Adjusted R square 0.29435'9'914
I
j
7 5.ti'lnqard
-
Error _g.51�3.71611 I
f-- 1
8 Observations 89
:
� j
9
I
10 AN OVA
iii I df SS MS .F Sign ific an,c;e .F
12 Regression 1 10.27114506 10.27114606 38.75066.255 L5466E-Q8 I
,__
�
13 Resi·d,ual 8& 23.32504257 0.255057302 I j
14 Total 89 33.59618861 I ..I
15
171
16 Co.efficients Standard Error tStat P...vafue lower95% Upper95% l.ow.er 95. 0% Up•per 95, 0% ,
lnfe[cept o, 4lN/A .t!N/A tlN/A ttN/A ltN/A flN/A UN/A
18 X Variable 1 0.549B&l589 0.08&'!3434:9 5. 2 2500 '!016 1 L59'542E--OS: 0.374335537 0. 725427542 0.3743•35637 o. 72542 7542
Go back to your phillips_aus data worksheet. In cells F1:G1 and I2:K4, enter the following
column labels and formulas.
    F          G
1   ρ-hat =    ='AR(1) Error Model'!B18
    I                     J                  K
2   y*                    x1*                x2*
3   =SQRT(1-G1^2)*A3      =SQRT(1-G1^2)      =SQRT(1-G1^2)*C3
4   =A4-$G$1*A3           =1-$G$1            =C4-$G$1*C3
Copy the content of cells I4:K4 to cells I5:K92. Here is how your table should look (only the
first five values are shown below):
    F          G
1   ρ-hat =    0.549882
In the Regression dialog box, the Input Y Range should be I2:I92, and the Input X Range
should be J2:K92. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it Prais-Winsten estimates. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.615057872
R Square             0.378295186
Adjusted R Square    0.359867734
Standard Error       0.516781598
Observations         90

ANOVA
              df    SS             MS             F              Significance F
Regression    2     14.30030669    7.150153344    26.77325068    8.08792E-10
Residual      88    23.50157241    0.267063323
Total         90    37.8018791

             Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept    0               #N/A             #N/A            #N/A           #N/A            #N/A            #N/A            #N/A
x1*          0.785837616     0.119563257      6.572568664     3.…E-09        0.548230672     1.02344456      0.548230672     1.02344456
x2*          -0.699420865    0.242804756      -2.880594353    0.004984369    -1.181950286    -0.216891445    -1.181950286    -0.216891445
Note that these results do not match those in equation (9.45) in Principles of Econometrics, 4e, p.
362. What we have described is a simple two-step estimation process sometimes called the
Prais-Winsten estimator. However, there are advantages to "iterating" this procedure, as we
describe in the following section.
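The worksheet formulas above can also be checked outside Excel. The following Python sketch is only an illustration under stated assumptions, not part of the Excel procedure: the function names are hypothetical, and the sample values are the first few inflation and unemployment-change observations as they can be read off the worksheet tables in this chapter. It applies the same transformation as cells I3:K4 and then runs a least squares regression through the origin (the equivalent of checking Constant is Zero).

```python
import math


def prais_winsten_transform(y, x, rho):
    """Transform y and x for the AR(1) error model, keeping the first observation.

    Mirrors the worksheet: row 3 uses SQRT(1 - rho^2), later rows use
    quasi-differences such as y_t - rho * y_(t-1).
    """
    w = math.sqrt(1.0 - rho ** 2)
    n = len(y)
    y_star = [w * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, n)]
    x1_star = [w] + [1.0 - rho] * (n - 1)          # transformed constant
    x2_star = [w * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, n)]
    return y_star, x1_star, x2_star


def ols_no_intercept(y, x1, x2):
    """Two-regressor least squares through the origin via the normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Illustrative subset only (assumption: values read from the worksheet tables).
inf = [1.5, 1.7, 1.8, 1.8, 1.7, 1.9]
du = [-0.1, -0.2, -0.1, -0.4, 0.0, -0.6]
rho_hat = 0.549881589                      # coefficient from the AR(1) Error Model sheet

y_star, x1_star, x2_star = prais_winsten_transform(inf, du, rho_hat)
b1, b2 = ols_no_intercept(y_star, x1_star, x2_star)
```

With the full data set in place of the six-observation subset, `b1` and `b2` should correspond to the x1* and x2* coefficients of the Prais-Winsten estimates worksheet.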
Reconsider the GLS model (9.10). To estimate it, another option is to not include a
transformation for the first observation as we did for the Prais-Winsten estimator, and proceed
with the estimation on the basis of T - 1 observations only. We then repeat the GLS estimation
process outlined in Section 9.4.1a until the least squares estimates b1 and b2 from model (9.10) do
not change in value. This iterative procedure is known as the Cochrane-Orcutt estimator. Below
we walk you through the first two iterations of this process.
Note that the omission of the first observation is not, in general, a good strategy. It simplifies the
calculations, however, so we will use this trick. You might want to test your understanding by
extending the iterative process we describe below to include the first observation.
First Iteration
Go back to your phillips_aus data worksheet. In cells M2:O4 enter the following column labels
and formulas.

     M                            N       O
2    Cochrane-Orcutt estimator
3    y*                           x1*     x2*
4    =I4                          =J4     =K4

Copy the content of cells M4:O4 to cells M5:O92. Here is how your table should look (only the
first five values are shown below):
     M                            N           O
2    Cochrane-Orcutt estimator
3    y*                           x1*         x2*
4    0.875177616                  0.450118    -0.145012
5    0.865201298                  0.450118    0.009976
6    0.810213139                  0.450118    -0.345012
7    0.710213139                  0.450118    0.219953
8    0.965201298                  0.450118    -0.6
In the Regression dialog box, the Input Y Range should be M3:M92, and the Input X Range
should be N3:O92. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it Cochrane-Orcutt estimates. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.599724987
R Square             0.359670061
Adjusted R Square    0.340815693
Standard Error       0.516404118
Observations         89

ANOVA
              df    SS             MS             F           Significance F
Regression    2     13.03164155    6.515820774    24.43372    3.95754E-09

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    0              #N/A             #N/A     #N/A      #N/A        #N/A        #N/A          #N/A
Second Iteration
Go back to your phillips_aus data worksheet. In cells Q1:R2 and T3:U4, enter the following
column labels and formulas.

     Q       R
1    b1 =    ='Cochrane-Orcutt estimates'!B18
2    b2 =    ='Cochrane-Orcutt estimates'!B19

     T                      U
3    e-hat_t                e-hat_t-1
4    =A4-$R$1-$R$2*C4       =A3-$R$1-$R$2*C3

Copy the content of cells T4:U4 to cells T5:U92. Here is how your table should look (only the
first five values are shown below):
     Q       R           T             U
1    b1 =    0.761084
3                        e-hat_t       e-hat_t-1
4                        0.80058015    0.66974803
5                        0.96974803    0.80058015
6                        0.76224438    0.96974803
7                        0.93891591    0.76224438
8                        0.72390861    0.93891591
In the Regression dialog box, the Input Y Range should be T3:T92, and the Input X Range
should be U3:U92. Check the boxes next to Labels and Constant is Zero. Select Output Range
and specify it to be cell A1 in your AR(1) Error Model worksheet: you can place your cursor in
the Output Range window and move it to that cell to do that, or type 'AR(1) Error Model'!A1 in
the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range. The result is:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.560169009
R Square             0.313789408
Adjusted R Square    0.302425772
Standard Error       0.513441029
Observations         89

ANOVA
              df    SS             MS             F              Significance F
Regression    1     10.60827273    10.60827273    40.24051548    9.74557E-09
Residual      88    23.19870879    0.263621691
Total         89    33.80698151

             Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept    0              #N/A             #N/A           #N/A           #N/A           #N/A           #N/A           #N/A
e-hat_t-1    0.557261979    0.087847144      6.343541241    9.42502E-09    0.382684245    0.731839714    0.382684245    0.731839714
Go back to your phillips_aus data worksheet. Notice that in cell G1 your ρ-hat value has been
updated, and so have your transformed dependent and independent variables in columns M-O.
In the Regression dialog box, the Input Y Range should be M3:M92, and the Input X Range
should be N3:O92. Check the boxes next to Labels and Constant is Zero. Select Output Range
and specify it to be cell A1 in your Cochrane-Orcutt estimates worksheet: you can place your
cursor in the Output Range window and move it to that cell to do that, or type
'Cochrane-Orcutt estimates'!A1 in the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do select OK to overwrite
the data in the specified range. The result is:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.594570371
R Square             0.353513926
Adjusted R Square    0.334588798
Standard Error       0.516383048
Observations         89

ANOVA
              df    SS             MS             F              Significance F
Regression    2     12.6855867     6.34279335     23.78683218    5.99174E-09
Residual      87    23.19867637    0.266651452
Total         89    35.88426307

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
We have run two additional iterations. The third and fourth iterations give identical estimates at
the fourth decimal place. After a total of four iterations, we thus obtain stable estimates (see
also p. 362 in Principles of Econometrics, 4e).
The table below reports the Cochrane-Orcutt estimates obtained for each iteration:

Iteration    1           2           3           4
ρ-hat =      0.5499      0.5573      0.5574      0.5574
b1 =         +0.76108    +0.76087    +0.76087    +0.76087
b2 =         -0.69168    -0.69434    -0.69439    -0.69439
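The iteration logic behind the table can be sketched in Python. This is a minimal illustration under stated assumptions rather than the worksheet procedure itself: the function names, the tolerance and the iteration cap are hypothetical choices. Instead of regressing y* on x1* = 1 - ρ-hat and x2* through the origin, the sketch regresses y* on x* with an intercept and divides that intercept by 1 - ρ-hat, which yields the same b1 and b2.

```python
def fit_line(y, x):
    """Least squares intercept and slope of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def cochrane_orcutt(y, x, tol=1e-6, max_iter=100):
    """Iterate between estimating rho from residuals and re-estimating b1, b2.

    The first observation is dropped in every transformed regression,
    as in the worksheet columns M to O.
    """
    b1, b2 = fit_line(y, x)            # starting values: ordinary least squares
    rho = 0.0
    for _ in range(max_iter):
        e = [yt - b1 - b2 * xt for yt, xt in zip(y, x)]
        # rho: regression of e_t on e_(t-1) through the origin
        new_rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
                   / sum(e[t - 1] ** 2 for t in range(1, len(e))))
        # quasi-differenced data (T - 1 observations)
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        a, new_b2 = fit_line(y_star, x_star)
        new_b1 = a / (1.0 - new_rho)   # undo the (1 - rho) scaling of the constant
        change = max(abs(new_rho - rho), abs(new_b1 - b1), abs(new_b2 - b2))
        rho, b1, b2 = new_rho, new_b1, new_b2
        if change < tol:
            break
    return rho, b1, b2
```

Calling `cochrane_orcutt(inf, du)` on the full inflation and unemployment-change columns would be the analogue of repeating the two Excel regressions until the estimates stop changing.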
(9.14)
(9.15)
In cells W3:Y4 and Z2:AA3 of your phillips_aus data worksheet enter the following labels and
formulas.

     W          X       Y         Z          AA
2                                 inf_t-1    du_t
3    inf_t-1    du_t    du_t-1    =A2        =C3
4    =A3        =C4     =C3

Copy the content of cells W4:Y4 to cells W5:Y92 and the content of cells Z3:AA3 to cells
Z4:AA92. Here is how your table should look (only the first five values are shown below):
     W      X       Y       Z          AA
2                           inf_t-1    du_t
4    1.5    -0.2    -0.1    1.5        -0.2
5    1.7    -0.1    -0.2    1.7        -0.1
6    1.8    -0.4    -0.1    1.8        -0.4
7    1.8    0       -0.4    1.8        0
8    1.7    -0.6    0       1.7        -0.6
In the Regression dialog box, the Input Y Range should be A3:A92, and the Input X Range
should be W3:Y92. Check the box next to Labels. Select New Worksheet Ply and name it
ARDL(1,1) Phillips Curve Model. Finally select OK.
The result is (see also p. 364 and p. 365 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.590704343
R Square             0.348931621
Adjusted R Square    0.325952737
Standard Error       0.522078251
Observations         89

ANOVA
              df    SS             MS            F              Significance F
Regression    3     12.41663373    4.13887791    15.18488113    5.37017E-08
Residual      85    23.16808537    0.27256571
Total         88    35.5847191

             Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept    0.33363253      0.089902785      3.711036532     0.000367565    0.15488171      0.512383349     0.15488171      0.512383349
inf_t-1      0.559267573     0.090796248      6.159589153     2.3386E-08     0.378740314     0.739794832     0.378740314     0.739794832
du_t         -0.688185225    0.24987037       -2.754168989    0.007195323    -1.184994454    -0.191375997    -1.184994454    -0.191375997
du_t-1       0.319952627     0.257504011      1.242515119     0.217463514    -0.192034325    0.831939579     -0.192034325    0.831939579
In the Regression dialog box, the Input Y Range should be A2:A92, and the Input X Range
should be Z2:AA92. Check the box next to Labels. Select New Worksheet Ply and name it
ARDL(1,0) Phillips Curve Model. Finally select OK.
The result is (see also p. 364 and p. 365 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.588552275
R Square             0.346393781
Adjusted R Square    0.33136835
Standard Error       0.520726026
Observations         90

             Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept    0.35479511      0.0876023        4.050064624     0.00011085     0.180675991     0.528914229     0.180675991     0.528914229
inf_t-1      0.528247247     0.085075625      6.209149071     1.7658E-08     0.359150231     0.697344263     0.359150231     0.697344263
du_t         -0.490864743    0.192149138      -2.554602891    0.012370485    -0.872781953    -0.108947533    -0.872781953    -0.108947533
9.5 FORECASTING
(9.16)
where G is the percentage change in Gross Domestic Product (GDP) from quarter 2, 1985 to
quarter 3, 2009; t = 1, .. , T where T = 96.
In the Regression dialog box, the Input Y Range should be H3:H99, and the Input X Range
should be I3:J99. Check the box next to Labels. Select New Worksheet Ply and name it AR(2)
Model. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.539149511
R Square             0.290682195
Adjusted R Square    0.275428049
Standard Error       0.552687512
Observations         96

ANOVA
              df    SS             MS             F              Significance F
Regression    2     11.64179165    5.820895823    19.05594643    1.15904E-07
Residual      93    28.40810419    0.305463486
Total         95    40.04989583

         Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
g_t-2    0.246239399    0.102868812      2.393722589    0.018686064    0.041962333    0.450516467    0.041962333    0.450516467
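Once estimated, the AR(2) model in (9.16) can be used to forecast GDP growth into the future.
Let $G_T$ be the last sample observation; the forecasts for GDP growth 1 quarter into the future
($\hat{G}_{T+1}$), 2 quarters into the future ($\hat{G}_{T+2}$), and 3 quarters into the future
($\hat{G}_{T+3}$) are given by (for more details see pp. 372-373 in Principles of Econometrics, 4e):

$\hat{G}_{T+1} = \hat{\delta} + \hat{\theta}_1 G_T + \hat{\theta}_2 G_{T-1}$    (9.17)
$\hat{G}_{T+2} = \hat{\delta} + \hat{\theta}_1 \hat{G}_{T+1} + \hat{\theta}_2 G_T$    (9.18)
$\hat{G}_{T+3} = \hat{\delta} + \hat{\theta}_1 \hat{G}_{T+2} + \hat{\theta}_2 \hat{G}_{T+1}$    (9.19)

The estimated standard errors of the forecast errors for GDP growth 1 quarter into the future
($\hat{\sigma}_1$), 2 quarters into the future ($\hat{\sigma}_2$), and 3 quarters into the future
($\hat{\sigma}_3$) are given by (for more details see p. 374 in Principles of Econometrics, 4e):

$\hat{\sigma}_1 = \hat{\sigma}_v$    (9.20)
$\hat{\sigma}_2 = \hat{\sigma}_v \sqrt{1 + \hat{\theta}_1^2}$    (9.21)
$\hat{\sigma}_3 = \hat{\sigma}_v \sqrt{(\hat{\theta}_1^2 + \hat{\theta}_2)^2 + \hat{\theta}_1^2 + 1}$    (9.22)

Finally, the lower limit (LL) and upper limit (UL) of a forecast interval for $G_{T+j}$ are given by:

$LL = \hat{G}_{T+j} - t_c \hat{\sigma}_j$    (9.23)
$UL = \hat{G}_{T+j} + t_c \hat{\sigma}_j$    (9.24)

where $t_c$ is the $100(1 - \alpha/2)$th percentile from the t-distribution with T - K degrees of freedom,
and K is the number of parameters in the autoregressive model (9.16).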
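As a cross-check on the template that follows, the forecast recursion and interval formulas can be sketched in Python. This is an illustrative sketch, not the Excel template: the function name is hypothetical, the coefficient estimates and critical value are rounded from the values reported in this section, and the last two observations (0.8 and -0.2) are an assumption, chosen because they reproduce the reported two-quarter forecast.

```python
import math


def ar2_forecasts(delta, theta1, theta2, g_T, g_Tm1, sigma_v, t_c):
    """Return (forecast, standard error, lower limit, upper limit) for 1 to 3 steps ahead."""
    f1 = delta + theta1 * g_T + theta2 * g_Tm1      # one quarter ahead
    f2 = delta + theta1 * f1 + theta2 * g_T         # two quarters ahead
    f3 = delta + theta1 * f2 + theta2 * f1          # three quarters ahead
    se = [
        sigma_v,
        sigma_v * math.sqrt(1 + theta1 ** 2),
        sigma_v * math.sqrt((theta1 ** 2 + theta2) ** 2 + theta1 ** 2 + 1),
    ]
    return [(f, s, f - t_c * s, f + t_c * s) for f, s in zip([f1, f2, f3], se)]


# Inputs: estimates reported in this section; g_T and g_Tm1 are assumed values.
rows = ar2_forecasts(
    delta=0.46572617, theta1=0.37700148, theta2=0.2462394,
    g_T=0.8, g_Tm1=-0.2, sigma_v=0.55268751, t_c=1.9858,
)
for step, (f, s, lo, hi) in enumerate(rows, start=1):
    print(f"T+{step}: forecast={f:.4f} se={s:.4f} interval=({lo:.4f}, {hi:.4f})")
```

The critical value is passed in rather than computed so that the sketch needs only the standard library; in Excel it comes from TINV.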
Below we create a forecast interval template, similar to the prediction interval template we
created in Section 4.1.
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Forecast Interval.
Create the following template to construct forecast intervals. In the bordering columns and rows
you will find the numbers of the equations and the formatting options used, if any, in the
template.
     A             B                        C
1    Data Input    Sample Size T =          ='AR(2) Model'!B8
2                  Confidence percentage
3                  K =                      ='AR(2) Model'!B12+1
4                  σ-hat_v =                ='AR(2) Model'!B7
5                  δ-hat =                  ='AR(2) Model'!B17
6                  θ1-hat =                 ='AR(2) Model'!B18
7                  θ2-hat =                 ='AR(2) Model'!B19
8                  y_T-1 =                  ='okun data'!H98
9                  y_T =                    ='okun data'!H99
10
11   Computed      α =                      =1-C2
     Values
12                 df or m =                =C1-C3
13                 t_c =                    =TINV(C11,C12)
14
15   Forecast
16                 y-hat_T+1 =              =C5+C6*C9+C7*C8       (9.17)
17                 y-hat_T+2 =              =C5+C6*C16+C7*C9      (9.18)
18                 y-hat_T+3 =              =C5+C6*C17+C7*C16     (9.19)

               D                                 E                  F
14                                               Forecast Interval
15             σ-hat_j                           Lower Limit        Upper Limit
16   (9.20)    =C4                               =C16-C13*D16       =C16+C13*D16
17   (9.21)    =C4*SQRT(1+C6^2)                  =C17-C13*D17       =C17+C13*D17
18   (9.22)    =C4*SQRT((C6^2+C7)^2+C6^2+1)      =C18-C13*D18       =C18+C13*D18
                                                 (9.23)             (9.24)
Here are the results you should get (see also Table 9.7 p. 374 in Principles of Econometrics, 4e):

     A                  B                  C             D             E              F
1    Data Input         Sample Size T =    96
3                       K =                3
4                       σ-hat_v =          0.55268751
5                       δ-hat =            0.46572617
6                       θ1-hat =           0.37700148
7                       θ2-hat =           0.2462394
8                       y_T-1 =            -0.2
9                       y_T =              0.8
10
11   Computed Values    α =                0.05
12                      df or m =          93
13                      t_c =              1.98580177
14                                                                     Forecast Interval
17                      y-hat_T+2 =        0.93343472    0.59065984    -0.23949864    2.10636807
     R          S                    T                        U
1    T =        =COUNT(A2:A99)
2    T/2 =      =S1/2                ghat α=0.38              ghat α=0.8
3    ghat1 =    =AVERAGE(A2:A50)     =S4*A2+(1-S4)*S3         =S5*A2+(1-S5)*S3
4    α =        0.8                  =$S$4*A3+(1-$S$4)*T3     =$S$5*A3+(1-$S$5)*U3
5    α =        0.38

Copy the content of cells T4:U4 to cells T5:U99. Here is how your table should look (only the
first five values are shown below):

     R        S     T              U
1    T =      98
2    T/2 =    49    ghat α=0.38    ghat α=0.8
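The recursion in columns T and U can be written compactly in Python. The sketch below is a hedged illustration: the function name is hypothetical, and it assumes, as in the worksheet, that the starting value is the average of the first half of the sample and that each smoothed value is α times the previous observation plus (1 - α) times the previous smoothed value.

```python
def exponential_smoothing(g, alpha):
    """Exponentially smoothed one-step-ahead forecasts.

    out[i] is the forecast formed after observing g[i], so it lines up with
    the worksheet row below observation g[i].
    """
    half = len(g) // 2
    level = sum(g[:half]) / half          # starting value: mean of the first T/2 observations
    out = []
    for obs in g:
        level = alpha * obs + (1 - alpha) * level
        out.append(level)
    return out


# Small made-up series, used only to show the call.
print(exponential_smoothing([1.0, 2.0, 3.0, 4.0], alpha=0.5))
```

A larger α puts more weight on the most recent observation, so the smoothed series tracks the data more closely; a smaller α gives a smoother line.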
Select the Insert tab located next to the Home tab. Select T2:T99. In the Charts group of
commands select Line, and Line again.
After adding the US GDP series (actual, cells A3:A99) and editing, the result is (see also Figure
9.12(a) p. 377 in Principles of Econometrics, 4e):
[Figure: line chart of the series ghat α=0.38 and g-actual]
To plot the change in the US GDP series select cells A3:A99. After editing, the result is (see also
Figure 9.12(b) p. 377 in Principles of Econometrics, 4e):
[Figure: line chart of the series ghat α=0.8 and g-actual]
(9.25)
In cells W3:Y4 of your okun data worksheet enter the following labels and formulas.
=A3
Copy the content of cells W4:Y4 to cells W5:Y99. Here is how your table should look (only the
first five values are shown below):

     W         X      Y
3    du_t-1    g_t    g_t-1
4    -0.1      1.4    2
5    -0.2      1.5    1.4
6    0         0.9    1.5
7    0.2       1.5    0.9
8    -0.2      1.2    1.5
In the Regression dialog box, the Input Y Range should be C3:C99, and the Input X Range
should be W3:Y99. Check the box next to Labels. Select New Worksheet Ply and name it
ARDL(1,1) Okun's Law Model. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.833126945
R Square             0.694100506
Adjusted R Square    0.684125523
Standard Error       0.162277406
Observations         96

ANOVA
              df    SS             MS             F              Significance F
Regression    3     5.497276008    1.832425336    69.58412576
Residual      92    2.422723992    0.026333956
Total         95    7.92

             Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept    0.378010424     0.0578398        6.535472545     3.47005E-09    0.263135591     0.492885256     0.263135591     0.492885256
g_t-1        -0.099155204    0.036824425      -2.692647515    0.008423035    -0.172291695    -0.026018714    -0.172291695    -0.026018714
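Estimates from (9.25) can be used to compute estimates of the impact multiplier and the delay
multipliers for the first 7 quarters (see pp. 66-72 in Principles of Econometrics, 4e for more
details):

$\hat{\beta}_0 = \hat{\delta}_0$    (9.26)
$\hat{\beta}_1 = \hat{\delta}_1 + \hat{\beta}_0 \hat{\theta}_1$    (9.27)
$\hat{\beta}_j = \hat{\beta}_{j-1} \hat{\theta}_1$ for $j \geq 2$    (9.28)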
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Multipliers.
Create the following table to compute the lag weights. In the last column you will find the
numbers of the equations used in the table. Because the worksheet name contains an apostrophe,
the apostrophe is doubled inside the quoted sheet reference.

     A                  B            C
1    Data Input         δ0-hat =     ='ARDL(1,1) Okun''s Law Model'!B19
2                       δ1-hat =     ='ARDL(1,1) Okun''s Law Model'!B20
3                       θ1-hat =     ='ARDL(1,1) Okun''s Law Model'!B18
4
5    Computed Values    j            βj-hat
6                       0            =C1             (9.26)
7                       1            =C2+C6*C3       (9.27)
8                       2            =C7*$C$3        (9.28)
     A    B           C
3         θ1-hat =    0.35011576
4
8         2           -0.057281044
9         3           -0.020054996
10        4           -0.00702157
11        5           -0.002458362
12        6           -0.000860711
13        7           -0.000301349
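The same recursion can be sketched in Python to check the lag weights. This is an illustration only: the function name is hypothetical, `theta1` and `delta1` are taken from the estimates shown in this section, and the value used for `delta0` is a placeholder assumption, because the corresponding coefficient row is not legible in the output above.

```python
def ardl11_multipliers(delta0, delta1, theta1, n_lags):
    """Impact and delay multipliers beta_0 .. beta_n for an ARDL(1,1) model."""
    betas = [delta0]                              # (9.26): impact multiplier
    if n_lags >= 1:
        betas.append(delta1 + delta0 * theta1)    # (9.27): one-period delay multiplier
    for _ in range(2, n_lags + 1):
        betas.append(betas[-1] * theta1)          # (9.28): geometric decay afterwards
    return betas


# delta0 below is a hypothetical placeholder, not a value read from the worksheet.
weights = ardl11_multipliers(delta0=-0.184, delta1=-0.099155204,
                             theta1=0.35011576, n_lags=7)
for j, b in enumerate(weights):
    print(j, round(b, 6))
```

From lag 2 onward each weight is the previous one multiplied by θ1-hat, which is why the values in rows 8 to 13 shrink by a factor of about 0.35 at each step.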
Select cells B5:C13. Go to the Insert tab to the left of your tab list. Select the Scatter button in
the Charts group of commands, and select Scatter with Straight Lines on the menu of Scatter
chart type.
After editing, the result is (see also Figure 9.13 p. 381 in Principles of Econometrics, 4e):
[Figure: scatter chart with straight lines of the multiplier estimates against the lag j]
CHAPTER 10
Random Regressors and Moment-Based Estimation
CHAPTER OUTLINE
10.1 OLS Estimation of a Wage Equation
10.2 Instrumental Variables Estimation of the Wage Equation
  10.2.1 With a Single Instrument
    10.2.1a First Stage Equation for EDUC
    10.2.1b Stage 2 Least Squares Estimates
  10.2.2 With a Surplus Instrument
    10.2.2a First Stage Equation for EDUC
    10.2.2b Stage 2 Least Squares Estimates
10.3 Specification Tests for the Wage Equation
  10.3.1 The Hausman Test
  10.3.2 Testing Surplus Moment Conditions
where WAGE is hourly wage, EDUC is years of education and EXPER is years of work
experience.
Open the Excel file mroz. Save your file as POE Chapter 10 Excel file. Rename sheet 1 mroz
data.
In your mroz data worksheet enter the following labels and formulas.
     AA          AB      AC       AD
1    ln(wage)    educ    exper    exper2
2    =LN(M2)     =L2     =Y2      =AC2^2
The data set includes information on working and non-working women. lfp is a dummy variable
which identifies labor force participation: it is set to 1 if a woman is in the labor force and 0
otherwise (you can find this variable in column G). We will use data on working women only,
which span rows 2 to 429.
Copy the content of cells AA2:AD2 to cells AA3:AD429. Here is how your table should look
(only the first five values are shown below):
     AA          AB      AC       AD
1    ln(wage)    educ    exper    exper2
2    1.210154    12      14       196
3    0.328512    12      5        25
4    1.514138    12      15       225
5    0.092123    12      6        36
6    1.524272    14      7        49
In the Regression dialog box, the Input Y Range should be AA1:AA429, and the Input X
Range should be AB1:AD429. Check the box next to Labels. Select New Worksheet Ply and
name it OLS Wage Equation. Finally select OK.
The regression analysis results are (see also p. 407 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.396005544
R Square             0.156820391
Adjusted R Square    0.150854497
Standard Error       0.666420217
Observations         428

ANOVA
              df     SS             MS             F             Significance F
Regression    3      35.02229647    11.67409882    26.2861534    1.30177E-15
Residual      424    188.305144     0.444115906
Total         427    223.3274405

             Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept    -0.522040559    0.198632066      -2.628178668    0.00889594     -0.912466707    -0.131614411    -0.912466707    -0.131614411
educ         0.107489639     0.014146478      7.598332005     1.93993E-13    0.07968368      0.135295598     0.07968368      0.135295598
The instrumental variables estimators, derived using the method of moments, are also called
two-stage least squares (2SLS) estimators, because they can be obtained using two least squares
regressions.
In the case of a multiple linear regression model with one instrument, the first stage equation has
the random regressor or endogenous variable as the dependent variable, and the instrumental
variable plus all the exogenous variables as the explanatory variables.
Let MOTHEREDUC be our instrumental variable; the first stage equation for EDUC is:
Go back to your mroz data worksheet and enter the following label and formula.
AE
1 mothereduc
2 =U2
Copy the content of cell AE2 to cells AE3:AE429. Here is how your table should look (only the
first five values are shown below):

     AE
1    mothereduc
2    12
3    7
4    12
5    7
6    12
In the Regression dialog box, the Input Y Range should be AB1:AB429, and the Input X
Range should be AC1:AE429. Check the boxes next to Labels and Residuals. Select New
Worksheet Ply and name it 1st Stage Eq. for EDUC 1 IV. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R    0.3907609
R Square      0.15269411

ANOVA
              df    SS             MS             F              Significance F
Regression    3     340.5378336    113.5126112    25.46986611    3.61726E-15

             Coefficients   Standard Error   t Stat         P-value         Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
Intercept    9.77510269     0.423888615      23.06054547    7.5742…E-77     8.941917936     10.60828739    8.941917936     10.60828739
exper        0.0488615      0.04166926       1.172603007    0.241613422     -0.033042541    0.130765541    -0.033042541    0.130765541

RESIDUAL OUTPUT

      Observation    Predicted educ    Residuals
27    1              13.42036467       -1.420364665
28    2              11.8612192        0.1387808
29    3              13.43207528       -1.432075281
In stage 2, we obtain the predicted values of our endogenous variable, in this case EDUC, from
the estimated first stage equation (10.2) and insert them in the original linear regression model
(10.1) to replace the EDUC values. Then, we estimate the resulting equation (10.3) by least
squares:
(10.3)
where EDUC-hat are the predicted years of education from the estimated first stage equation (10.2).
Go back to your mroz data worksheet and enter the following labels and formulas.

     AG                                      AH       AI
1    educ-hat                                exper    exper2
2    ='1st Stage Eq. for EDUC 1 IV'!B27      =AC2     =AH2^2

Copy the content of cells AG2:AI2 to cells AG3:AI429. Here is how your table should look
(only the first five values are shown below):
     AG          AH       AI
1    educ-hat    exper    exper2
2    13.42036    14       196
3    11.86122    5        25
4    13.43208    15       225
5    11.89599    6        36
6    13.26565    7        49
In the Regression dialog box, the Input Y Range should be AA1:AA429, and the Input X
Range should be AG1:AI429. Check the box next to Labels. Uncheck the box next to Residuals.
Select New Worksheet Ply and name it Stage 2 LS Estimates 1 IV. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.213515069
R Square             0.045588685
Adjusted R Square    0.038835775
Standard Error       0.709015788
Observations         428

ANOVA
              df     SS             MS             F              Significance F
Regression    3      10.18120431    3.393734769    6.750958574    0.000186085
Residual      424    213.1462361    0.502703387
Total         427    223.3274405
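Note that while using this two-stage least squares approach yields proper instrumental variables
estimates, the accompanying standard errors are not correct (see also p. 412 in Principles of
Econometrics, 4e).
The correct standard error of the instrumental variable estimator of $\beta_k$ is estimated using equation
(10.4) below:

$se(\hat{\beta}_k)_{IV} = \dfrac{\hat{\sigma}_{IV}}{\hat{\sigma}_{Stage\,2\,LS}} \, se(\hat{\beta}_k)_{Stage\,2\,LS}$    (10.4)

where $\hat{\sigma}_{IV} = \sqrt{\dfrac{\sum \hat{e}_{IV}^2}{N - K}}$    (10.5)

and $\hat{e}_{IV} = \ln(WAGE) - \hat{\beta}_1 - \hat{\beta}_2 EDUC - \hat{\beta}_3 EXPER - \hat{\beta}_4 EXPER^2$    (10.6)

In equation (10.6), $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\beta}_3$ and $\hat{\beta}_4$ are the least squares estimates from equation (10.3).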
Go back to your mroz data worksheet and enter the following labels and formulas. In the last
column, you will find the numbers of the equations used, if any.

     AK                                     AL
1    wage equation IV estimates using 2SLS, 1 instrument
2    β1-hat_IV = β1-hat_Stage 2 LS =        ='Stage 2 LS Estimates 1 IV'!B17
3    β2-hat_IV = β2-hat_Stage 2 LS =        ='Stage 2 LS Estimates 1 IV'!B18
4    β3-hat_IV = β3-hat_Stage 2 LS =        ='Stage 2 LS Estimates 1 IV'!B19
5    β4-hat_IV = β4-hat_Stage 2 LS =        ='Stage 2 LS Estimates 1 IV'!B20

     AN
1    e-hat_IV^2
2    =(AA2-$AL$2-$AL$3*AB2-$AL$4*AC2-$AL$5*AD2)^2        (10.6)

      AK                            AL
7     N =                           ='Stage 2 LS Estimates 1 IV'!B8
8     K =                           ='Stage 2 LS Estimates 1 IV'!B12+1
9     σ-hat_Stage 2 LS =            ='Stage 2 LS Estimates 1 IV'!B7
10    se(β1-hat)_Stage 2 LS =       ='Stage 2 LS Estimates 1 IV'!C17
11    se(β2-hat)_Stage 2 LS =       ='Stage 2 LS Estimates 1 IV'!C18
12    se(β3-hat)_Stage 2 LS =       ='Stage 2 LS Estimates 1 IV'!C19
13    se(β4-hat)_Stage 2 LS =       ='Stage 2 LS Estimates 1 IV'!C20
14    σ-hat_IV =                    =SQRT(SUM(AN2:AN429)/(AL7-AL8))       (10.5)
15    se(β1-hat)_IV =               =(AL14/AL9)*AL10                      (10.4)
16    se(β2-hat)_IV =               =(AL14/AL9)*AL11                      (10.4)
17    se(β3-hat)_IV =               =(AL14/AL9)*AL12                      (10.4)
18    se(β4-hat)_IV =               =(AL14/AL9)*AL13                      (10.4)
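The standard error correction in rows 14 to 18 can be sketched in Python. The sketch is illustrative and rests on assumptions: the function names are hypothetical, and the small data set in the usage lines is made up purely to show the calls. The key point it mirrors from the worksheet is that the IV residuals are computed with the original regressors (actual EDUC), not the stage 2 regressors (EDUC-hat).

```python
import math


def sigma_hat_iv(y, X, beta):
    """Square root of sum(e_IV^2) / (N - K), with e_IV computed from the ORIGINAL regressors.

    X is a list of rows; each row holds the regressors including the constant,
    in the same order as the coefficients in beta.
    """
    n, k = len(y), len(beta)
    sse = sum(
        (yi - sum(b * x for b, x in zip(beta, row))) ** 2
        for yi, row in zip(y, X)
    )
    return math.sqrt(sse / (n - k))


def corrected_se(se_stage2, sigma_stage2, sigma_iv):
    """Rescale the stage 2 standard errors by sigma_IV / sigma_stage2."""
    return [sigma_iv / sigma_stage2 * s for s in se_stage2]


# Made-up numbers, for illustration only.
y = [1.0, 2.0, 4.0]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
s_iv = sigma_hat_iv(y, X, beta=[1.0, 1.0])
print(corrected_se([0.2, 0.4], sigma_stage2=2.0, sigma_iv=s_iv))
```

Only the scale factor changes between the stage 2 and IV standard errors, so the t-statistics must be recomputed from the corrected values.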
The result is (see also standard errors estimates on p. 415 in Principles of Econometrics, 4e):
-
AK I AL I AM I AN
£> ()_L35H26
:�
7 N= 428 ()'_1()84702
8 K=4 0.565 ·39
:�
9' 11-bat;i.�t.l LS= 0.709(11579 ()'_{;88703
IO
- se(IJrEl.aO,,.go l LS =·Cl.49334166_ 0336899
11 se(p�-ll:at),til�•H.S. 0_0390562
= ()_048497
13
,_
se-(jJ4-bilf);t.igelLS.= 0_00042397 (1'.1_13528
14
,_
cr-bat1v =.0.6 7.�603 55 ()_()0306&_
-
In the case of the multiple linear regression model with two instruments, the reduced form
equation has the random regressor or endogenous variable as the dependent variable, and the two
instrumental variables plus all the exogenous variables as explanatory variables. In Section 10.2.1
we used "mother's education" as an instrument; let us add "father's education" as an additional
instrumental variable. The first stage equation for EDUC is:
Go back to your mroz data worksheet and enter the following label and formula.
AF
1 fathereduc
2 =V2
Copy the content of cell AF2 to cells AF3:AF429. Here is how your table should look (only the
first five values are shown below):

     AF
1    fathereduc
2    7
3    7
4    7
5    7
6    14
In the Regression dialog box, the Input Y Range should be AB1:AB429, and the Input X
Range should be AC1:AF429. Check the boxes next to Labels and Residuals. Select New
Worksheet Ply and name it 1st Stage Eq. for EDUC 2 IV. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.459859135
R Square             0.211470525
Adjusted R Square    0.204014083
Standard Error       2.038967457
Observations         428

ANOVA
              df     SS             MS             F              Significance F
Regression    4      471.6209982    117.9052496    28.36041288    5.87297E-21
Residual      423    1758.575263    4.15738833
Total         427    2230.196262

              Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
Intercept     9.10264011      0.426561357      21.33957927     4.09847E-69    8.264196239     9.941083981    8.264196239     9.941083981
exper         0.045225423     0.040250712      1.123593117     0.261821938    -0.033890891    0.124341737    -0.033890891    0.124341737
exper2        -0.001009091    0.001203345      -0.838571743    0.402183285    -0.003374371    0.001356189    -0.003374371    0.001356189
mothereduc    0.157597033     0.035894116      4.390609167     1.429E-05      0.087043993     0.228150073    0.087043993     0.228150073
fathereduc    0.18954841      0.033756467      5.615173276     3.56151E-08    0.123197107     0.255899714    0.123197107     0.255899714

RESIDUAL OUTPUT
We obtain the predicted values EDUC-hat from the estimated first stage equation (10.7) and insert
them in the original multiple linear regression model (10.1) to replace the EDUC values. Then,
we estimate the resulting equation (10.8) by least squares.
(10.8)
Go back to your mroz data worksheet and enter the following labels and formulas.

     AP                                      AQ       AR
1    educ-hat                                exper    exper2
2    ='1st Stage Eq. for EDUC 2 IV'!B28      =AH2     =AQ2^2

Copy the content of cells AP2:AR2 to cells AP3:AR429. Here is how your table should look
(only the first five values are shown below):
     AP            AQ       AR
1    educ-hat      exper    exper2
2    12.7560175    14       196
3    11.7335631    5        25
5    11.7686835    6        36
6    13.9146148    7        49
In the Regression dialog box, the Input Y Range should be AA1:AA429, and the Input X
Range should be AP1:AR429. Check the box next to Labels. Uncheck the box next to
Residuals. Select New Worksheet Ply and name it Stage 2 LS Estimates 2 IV. Finally select
OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R      0.223120224
R Square        0.049782634
Observations    428

ANOVA
              df     SS             MS             F              Significance F
Regression    3      11.11782834    3.705942779    7.404564396
Residual      424    212.2096121    0.500494368
Total         427    223.3274405

             Coefficients   Standard Error   t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept    0.048100303    0.419756475      0.114590972     0.908823579    -0.776962371    0.873162976     -0.776962371    0.873162976
educ-hat     0.061396628    0.032962356      1.86262868      0.0632059      -0.003393342    0.126186598     -0.003393342    0.126186598
exper        0.044170394    0.01408437       3.136128625     0.001831158    0.016486515     0.071854274     0.016486515     0.071854274
exper2       -0.00089897    0.00042118       -2.134407758    0.033382382    -0.00172683     -7.11091E-05    -0.00172683     -7.11091E-05
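While using the two-stage least squares approach yields proper instrumental variables estimates,
the accompanying standard errors are not correct. The correct standard error of the instrumental
variable estimator of $\beta_k$ is estimated using equations (10.4)-(10.6) restated next:

$se(\hat{\beta}_k)_{IV} = \dfrac{\hat{\sigma}_{IV}}{\hat{\sigma}_{Stage\,2\,LS}} \, se(\hat{\beta}_k)_{Stage\,2\,LS}$    (10.4)

where $\hat{\sigma}_{IV} = \sqrt{\dfrac{\sum \hat{e}_{IV}^2}{N - K}}$    (10.5)

and $\hat{e}_{IV} = \ln(WAGE) - \hat{\beta}_1 - \hat{\beta}_2 EDUC - \hat{\beta}_3 EXPER - \hat{\beta}_4 EXPER^2$    (10.6)

In equation (10.6), $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\beta}_3$ and $\hat{\beta}_4$ are the least squares estimates from equation (10.8).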
Go back to your mroz data worksheet and enter the following labels and formulas. In the last
column, you will find the numbers of the equations used, if any. This is identical to what you
have done in Section 10.2.lb except for the fact that you now retrieve the information needed
from your Stage 2 LS Estimates 2 IV worksheet instead of your Stage 2 LS Estimates 1 IV
worksheet.
      AT                                      AU
1     wage equation IV estimates using 2SLS, 2 instruments
2     β1-hat_IV = β1-hat_Stage 2 LS =         ='Stage 2 LS Estimates 2 IV'!B17
3     β2-hat_IV = β2-hat_Stage 2 LS =         ='Stage 2 LS Estimates 2 IV'!B18
4     β3-hat_IV = β3-hat_Stage 2 LS =         ='Stage 2 LS Estimates 2 IV'!B19
5     β4-hat_IV = β4-hat_Stage 2 LS =         ='Stage 2 LS Estimates 2 IV'!B20

      AW
1     e-hat_IV^2
2     =(AA2-$AU$2-$AU$3*AB2-$AU$4*AC2-$AU$5*AD2)^2          (10.6)

      AT                                      AU
7     N =                                     ='Stage 2 LS Estimates 2 IV'!B8
8     K =                                     ='Stage 2 LS Estimates 2 IV'!B12+1
9     σ-hat_Stage 2 LS =                      ='Stage 2 LS Estimates 2 IV'!B7
10    se(β1-hat)_Stage 2 LS =                 ='Stage 2 LS Estimates 2 IV'!C17
11    se(β2-hat)_Stage 2 LS =                 ='Stage 2 LS Estimates 2 IV'!C18
12    se(β3-hat)_Stage 2 LS =                 ='Stage 2 LS Estimates 2 IV'!C19
13    se(β4-hat)_Stage 2 LS =                 ='Stage 2 LS Estimates 2 IV'!C20
14    σ-hat_IV =                              =SQRT(SUM(AW2:AW429)/(AU7-AU8))      (10.5)
15    se(β1-hat)_IV =                         =(AU14/AU9)*AU10                     (10.4)
16    se(β2-hat)_IV =                         =(AU14/AU9)*AU11                     (10.4)
17    se(β3-hat)_IV =                         =(AU14/AU9)*AU12                     (10.4)
18    se(β4-hat)_IV =                         =(AU14/AU9)*AU13                     (10.4)
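As a cross-check outside Excel, the same correction can be sketched in Python. This is a minimal sketch, assuming the IV residuals and the stage-2 regression output have already been exported from the worksheet; the function name `corrected_iv_se` and the toy numbers are hypothetical and are not taken from the mroz data.

```python
import math

def corrected_iv_se(iv_residuals, n_params, sigma_stage2, se_stage2):
    """Rescale stage-2 least squares standard errors, as in (10.4)-(10.5).

    iv_residuals : residuals computed with the ORIGINAL regressors, as in (10.6)
    n_params     : K, the number of estimated coefficients
    sigma_stage2 : "Standard Error" reported by the stage-2 regression
    se_stage2    : list of stage-2 coefficient standard errors
    """
    n_obs = len(iv_residuals)
    # (10.5): sigma-hat_IV = sqrt(sum of squared IV residuals / (N - K))
    sigma_iv = math.sqrt(sum(e * e for e in iv_residuals) / (n_obs - n_params))
    # (10.4): multiply each stage-2 standard error by sigma_IV / sigma_stage2
    scale = sigma_iv / sigma_stage2
    return sigma_iv, [scale * se for se in se_stage2]

# Hypothetical toy inputs, for illustration only.
toy_residuals = [0.5, -0.5, 1.0, -1.0, 0.0, 0.0]
sigma_iv, se_iv = corrected_iv_se(toy_residuals, n_params=2,
                                  sigma_stage2=0.5, se_stage2=[0.2, 0.1])
print(sigma_iv, se_iv)
```

The residuals passed in must be built from the original regressors rather than from the first-stage fitted values; that is the only difference from the standard errors Excel reports for the stage-2 regression.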
[Screenshot: worksheet excerpt showing N = 428, K = 4 and se(β1-hat)_Stage 2 LS = 0.41975647]
Let us revisit the first stage equation (10.7) from Section 10.2.2a:
We obtain the residuals $\hat{v}$ from the estimated reduced form equation (10.7) and insert $\hat{v}$ in the
original wage equation (10.1) as an additional explanatory variable. We estimate the resulting
equation (10.9) by least squares:
Go back to your mroz data worksheet and enter the following labels and formulas.
     AY       AZ       BA         BB
1    educ     exper    exper2     v-hat
2    =AB2     =AC2     =AZ2^2     ='1st Stage Eq. for EDUC 2 IV'!C28
Copy the content of cell AY2:BB2 to cells AY3:BB429. Here is how your table should look
(only the first five values are shown below):
     AY      AZ       BA        BB
1    educ    exper    exper2    v-hat
2    12      14       196       -0.756017
3    12      5        25        0.26644
4    12      15       225       -0.771979
5    12      6        36        0.231317
6    14      7        49        0.085385
In the Regression dialog box, the Input Y Range should be AA1:AA429, and the Input X
Range should be AY1:BB429. Check the box next to Labels. Select New Worksheet Ply and
name it Hausman Test for Wage Equation. Finally select OK.
The result is (see also Table 10.2 on p. 422 of Principles of Econometrics, 4e):

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     0.0481         0.3946           0.1219    0.9030    -0.7275     0.8237
educ          0.0614         0.0310           1.9815    0.0482    0.0005      0.1223
We have outlined the p-value of the t-test of interest to us in the above table. The coefficient of
the reduced form residuals is significant at the 10% level of significance using a two-tail test.
While this is not strong evidence of the endogeneity of education, it is sufficient cause for
concern to consider using instrumental variables estimation.
For the wage equation (10.1), restated below, if we use MOTHEREDUC and FATHEREDUC as
instruments there is one surplus moment condition.
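We obtain the residuals $\hat{e}_{IV}$ from the IV estimates for equation (10.1), as we did in Section 10.2.2:
$\hat{e}_{IV} = \ln(WAGE) - \hat{\beta}_1 - \hat{\beta}_2 EDUC - \hat{\beta}_3 EXPER - \hat{\beta}_4 EXPER^2$    (10.6)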
We then regress the residuals $\hat{e}_{IV}$ on all available exogenous and instrumental variables. In other
words, we estimate the following equation:
(10.10)
Finally, we use the $R^2$ from the estimated equation (10.10) to run a Lagrange multiplier test.
Go back to your mroz data worksheet and enter the following label and formula.
     BD
1    e-hat_IV
2    =AA2-$AU$2-$AU$3*AY2-$AU$4*AZ2-$AU$5*BA2
Copy the content of cell BD2 to cells BD3:BD429. Here is how your table should look (only the
first five values are shown below):
     BD
1    e-hat_IV
2    -0.016894
3    -0.654726
4 026899
5 -0.915390
In the Regression dialog box, the Input Y Range should be BD1:BD429, and the Input X
Range should be AC1:AF429. Check the box next to Labels. Select New Worksheet ply and
name it IV Residuals Regression. Finally select OK.
[Screenshot: Regression dialog box, New Worksheet Ply named IV Residuals Regression]

SUMMARY OUTPUT (IV Residuals Regression worksheet; values rounded)

Regression Statistics
Multiple R           0.029721
R Square             0.000883345
Adjusted R Square    -0.008565
Standard Error       0.675210
Observations         428

ANOVA
              df     SS          MS        F         Significance F
Regression    4      0.1705      0.0426    0.0935    0.9845
Residual      423    192.8495    0.4559
Total         427    193.0200

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     0.0110         0.1413           0.0776     0.9382    -0.2667     0.2886
exper         -1.83E-05      0.0133           -0.0014    0.9989    -0.0262     0.0262
exper2        7.34E-07       0.0004           0.0018     0.9985    -0.0008     0.0008
mothereduc    -0.0066        0.0119           -0.5558    0.5786    -0.0300     0.0168
fathereduc    0.0058         0.0112           0.5173     0.6052    -0.0162     0.0278
The results we are going to use for the Lagrange multiplier test are highlighted in the above
summary output.
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Lagrange Multiplier Test.
      A                           B                       C
1     Data Input                  N =                     ='IV Residuals Regression'!B8
2                                 R² =                    ='IV Residuals Regression'!B5
3                                 α =
4                                 m =
5
6     Computed Values             χ²-critical value =     =CHIINV(C3,C4)
7
8     Lagrange Multiplier Test    χ² =                    =C1*C2
9                                 Conclusion =            =IF(C8>=C6,"Reject Ho","Do Not Reject Ho")
10                                p-value =               =CHIDIST(C8,C4)
11                                Conclusion =            =IF(C10<=C3,"Reject Ho","Do Not Reject Ho")
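The template above can be mirrored in a few lines of Python. This is a minimal sketch for the special case of one surplus instrument (m = 1), where the chi-square critical value and p-value can be obtained from the standard normal distribution; the function name `lm_overid_test_df1` is hypothetical, and the inputs are N = 428 and R² rounded to 0.000883 from the IV Residuals Regression worksheet.

```python
import math
from statistics import NormalDist

def lm_overid_test_df1(n_obs, r_squared, alpha=0.05):
    """N * R^2 Lagrange multiplier test with ONE degree of freedom (m = 1)."""
    lm_stat = n_obs * r_squared
    # A chi-square(1) variable is the square of a standard normal variable,
    # so its upper-alpha critical value is the squared two-tail z critical value.
    critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    # P(chi-square(1) > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(lm_stat / 2))
    decision = "Reject Ho" if lm_stat >= critical else "Do Not Reject Ho"
    return lm_stat, critical, p_value, decision

lm_stat, critical, p_value, decision = lm_overid_test_df1(428, 0.000883)
print(round(lm_stat, 3), round(critical, 3), round(p_value, 3), decision)
```

For m greater than 1 the normal-distribution shortcut no longer applies, and a chi-square routine (such as Excel's CHIINV and CHIDIST) is needed instead.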
At α = 0.05, and with m = 1, the result of the test, found in the Lagrange Multiplier Test
worksheet, is:

      A                           B                       C
1     Data Input                  N =                     428
2                                 R² =                    0.000883
3                                 α =                     0.05
4                                 m =                     1
6     Computed Values             χ²-critical value =     3.841
8     Lagrange Multiplier Test    χ² =                    0.378
9                                 Conclusion =            Do Not Reject Ho
10                                p-value =               0.538637
11                                Conclusion =            Do Not Reject Ho

Note that the difference between the χ²-statistic value reported above and the one reported on p.
422 of Principles of Econometrics, 4e is due to rounding.
CHAPTER 11
CHAPTER OUTLINE
11.1 Supply and Demand Model for Truffles
  11.1.1 The Reduced Form Equations
    11.1.1a Reduced Form Equation for Q
    11.1.1b Reduced Form Equation for P
  11.1.2 The Structural Equations or Stage 2 Least Squares Estimates
    11.1.2a 2SLS Estimates for Truffle Demand
    11.1.2b 2SLS Estimates for Truffle Supply
11.2 Supply and Demand Model for the Fulton Fish Market
  11.2.1 The Reduced Form Equations
    11.2.1a Reduced Form Equation for lnQ
    11.2.1b Reduced Form Equation for lnP
  11.2.2 The Structural Equations or Stage 2 Least Squares Estimates
    11.2.2a 2SLS Estimates for Fulton Fish Demand
In this chapter, we estimate simultaneous equation models where there are two or more dependent
variables that need to be estimated jointly. Ordinary least squares estimation is not possible when
we are dealing with more than one equation. For example, to explain both price and quantity of a
good, we need both supply and demand equations which work together to determine price and
quantity jointly.
Demand: $Q_i = \alpha_1 + \alpha_2 P_i + \alpha_3 PS_i + \alpha_4 DI_i + e_i^d$    (11.1)
Supply: $Q_i = \beta_1 + \beta_2 P_i + \beta_3 PF_i + e_i^s$    (11.2)
where Q is the quantity of truffles traded in a particular French marketplace, indexed by i and
measured in ounces. P is the market price of truffles and PS is the market price of a substitute for
real truffles; both are measured in $ per ounce. DI is per capita monthly disposable income of
local residents, measured in $1,000, and PF is the hourly rental rate ($) for a truffle-finding pig.
Simultaneous Equations Models 279
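Consider the following reduced form equations for the supply and demand model for truffles:
$Q_i = \pi_{11} + \pi_{21} PS_i + \pi_{31} DI_i + \pi_{41} PF_i + v_{i1}$    (11.3)
$P_i = \pi_{12} + \pi_{22} PS_i + \pi_{32} DI_i + \pi_{42} PF_i + v_{i2}$    (11.4)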
Open the Excel file truffles. Save your file as POE Chapter 11. Rename sheet 1 truffles data.
In the Regression dialog box, the Input Y Range should be B1:B31, and the Input X Range
should be C1:E31. Check the box next to Labels. Select New Worksheet Ply and name it
Truffles Reduced Form Eq. for Q. Finally select OK.
[Screenshot: Regression dialog box with the settings described above]
The result is (see also Table 11.2a p. 456 in Principles of Econometrics, 4e):

SUMMARY OUTPUT (values rounded)

Regression Statistics
Multiple R           0.8351
R Square             0.6974
Adjusted R Square    0.6625
Standard Error       2.6801
Observations         30

ANOVA
              df    SS          MS          F
Regression    3     430.3826    143.4609    19.9727
Residual      26    186.7542    7.1829
Total         29    617.1368

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     7.8951         3.2434           2.4342     0.022     1.2282      14.5620
ps            0.6564         0.1425           4.6051     0.000     0.3634      0.9494
di            2.1672         0.7005           3.0938     0.005     0.7273      3.6070
pf            -0.5070        0.1213           -4.1809    0.000     -0.7562     -0.2577
In the Regression dialog box, the Input Y Range should be A1:A31, and the Input X Range
should be C1:E31. Check the boxes next to Labels and Residuals. Select New Worksheet Ply
and name it Truffles Reduced Form Eq. for P. Finally select OK.
[Screenshot: Regression dialog box with the settings described above]
The result is (see also Table 11.2b p. 456 in Principles of Econometrics, 4e):

SUMMARY OUTPUT (values rounded)

Regression Statistics
Multiple R           0.9427
R Square             0.8887
Adjusted R Square    0.8758
Standard Error       6.5975
Observations         30

ANOVA
              df    SS           MS           F          Significance F
Regression    3     9034.7755    3011.5918    69.1893    1.59671E-12
Residual      26    1131.6972    43.5268

RESIDUAL OUTPUT

Observation    Predicted p    Residuals
1              31.83041       -2.19041
2              40.46577       -0.23577
3              38.50108       -3.79108
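Demand: $Q_i = \alpha_1 + \alpha_2 \hat{P}_i + \alpha_3 PS_i + \alpha_4 DI_i + e_i^d$    (11.5)
Supply: $Q_i = \beta_1 + \beta_2 \hat{P}_i + \beta_3 PF_i + e_i^s$    (11.6)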
Go back to your truffles data worksheet and enter the following labels and formulas.
G H I
1 p-hat ps di
2 ='Truffles Reduced Form Eq. for P'!B27 =C2 =D2
Copy the content of cells G2:I2 to cells G3:I31. Here is how your table should look (only the
first five values are shown below):
     G           H        I
1    p-hat       ps       di
2    31.83041    19.97    2.103
3    40.46577    18.04    2.043
4    38.50108    22.36    1.87
5    39.03302    20.87    1.525
In the Regression dialog box, the Input Y Range should be B1:B31, and the Input X Range
should be G1:I31. Check the box next to Labels. Uncheck the box next to Residuals. Select New
Worksheet Ply and name it Stage 2 LS Demand for Truffles. Finally select OK.
[Screenshot: Regression dialog box with the settings described above]
The result is (see also Table 11.3a on p. 456 in Principles of Econometrics, 4e):

SUMMARY OUTPUT (values rounded)

Regression Statistics
Multiple R           0.8351
R Square             0.6974
Adjusted R Square    0.6625
Standard Error       2.6801
Observations         30

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     -4.2795        3.0138           -1.4199    0.1675    -10.4745    1.9156
p-hat         -0.3745        0.0896           -4.1809    0.0003    -0.5586     -0.1904
ps            1.2960         0.1931           6.7119     0.0000    0.8991      1.6929
di            5.0140         1.2414           4.0389     0.0004    2.4622      7.5657
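Note that while this two-stage least squares approach yields the correct instrumental variables
estimates, the accompanying standard errors are not correct. The correct standard error of the
instrumental variables estimator of $\alpha_k$ is estimated using equations (10.4) and (10.5), restated below:
$se(\hat{\alpha}_k)_{IV} = \dfrac{\hat{\sigma}_{IV}}{\hat{\sigma}_{Stage\,2\,LS}} \; se(\hat{\alpha}_k)_{Stage\,2\,LS}$    (10.4)
where $\hat{\sigma}_{IV} = \sqrt{\dfrac{\sum \hat{e}_{IV}^{2}}{N-K}}$    (10.5)
and $\hat{e}_{IV} = Q - \hat{\alpha}_1 - \hat{\alpha}_2 P - \hat{\alpha}_3 PS - \hat{\alpha}_4 DI$    (11.7)
where $\hat{\alpha}_1$, $\hat{\alpha}_2$, $\hat{\alpha}_3$ and $\hat{\alpha}_4$ are the least squares estimates from equation (11.5).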
Go back to your truffles data worksheet and enter the following labels and formulas. In the last
column, you will find the numbers of the equations used, if any.
      K                                       L
1     Demand for Truffles, structural equation or IV estimates using 2SLS
2     α1-hat_IV = α1-hat_Stage 2 LS =         ='Stage 2 LS Demand for Truffles'!B17
3     α2-hat_IV = α2-hat_Stage 2 LS =         ='Stage 2 LS Demand for Truffles'!B18
4     α3-hat_IV = α3-hat_Stage 2 LS =         ='Stage 2 LS Demand for Truffles'!B19
5     α4-hat_IV = α4-hat_Stage 2 LS =         ='Stage 2 LS Demand for Truffles'!B20

      N
1     e-hat_IV^2
2     =(B2-$L$2-$L$3*A2-$L$4*C2-$L$5*D2)^2                  (11.7)

      K                                       L
7     N =                                     ='Stage 2 LS Demand for Truffles'!B8
8     K =                                     ='Stage 2 LS Demand for Truffles'!B12+1
9     σ-hat_Stage 2 LS =                      ='Stage 2 LS Demand for Truffles'!B7
10    se(α1-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Truffles'!C17
11    se(α2-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Truffles'!C18
12    se(α3-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Truffles'!C19
13    se(α4-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Truffles'!C20
14    σ-hat_IV =                              =SQRT(SUM(N2:N31)/(L7-L8))           (10.5)
15    se(α1-hat)_IV =                         =(L14/L9)*L10                        (10.4)
16    se(α2-hat)_IV =                         =(L14/L9)*L11                        (10.4)
17    se(α3-hat)_IV =                         =(L14/L9)*L12                        (10.4)
18    se(α4-hat)_IV =                         =(L14/L9)*L13                        (10.4)
The result is (see also the standard error estimates in Table 11.3a on p. 456 of Principles of
Econometrics, 4e):

      K                                       L           N
1     Demand for Truffles, structural equation or IV estimates using 2SLS      e-hat_IV^2
2     α1-hat_IV = α1-hat_Stage 2 LS =         -4.27947    1.340364
3     α2-hat_IV = α2-hat_Stage 2 LS =         -0.37446    1.537591
4     α3-hat_IV = α3-hat_Stage 2 LS =         1.296033    2.156431
5     α4-hat_IV = α4-hat_Stage 2 LS =         5.013979    4.967461
6                                                         57.50169
7     N =                                     30          65.8819
8     K =                                     4           23.92123
9     σ-hat_Stage 2 LS =                      2.680084    12.05736
10    se(α1-hat)_Stage 2 LS =                 3.013834    17.75874
11    se(α2-hat)_Stage 2 LS =                 0.089564    7.303551
12    se(α3-hat)_Stage 2 LS =                 0.193094    30.53484
13    se(α4-hat)_Stage 2 LS =                 1.241414    5.98552
14    σ-hat_IV =                              4.92996     6.781699
15    se(α1-hat)_IV =                         5.543885    0.61651
16    se(α2-hat)_IV =                         0.164752    3.568511
17    se(α3-hat)_IV =                         0.355193    13.7012
18    se(α4-hat)_IV =                         2.283556    32.17105
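The scaling step (10.4) in rows 15 to 18 is easy to verify by hand. The following minimal sketch simply re-does that arithmetic with the values read from the worksheet output for the truffle demand equation; the variable names are illustrative.

```python
# Values read from the worksheet output for the truffle demand equation.
sigma_iv = 4.92996        # sigma-hat_IV, equation (10.5)
sigma_stage2 = 2.680084   # "Standard Error" of the stage-2 regression
se_stage2 = {"intercept": 3.013834, "p": 0.089564, "ps": 0.193094, "di": 1.241414}

# Equation (10.4): every stage-2 standard error is multiplied by the same ratio.
scale = sigma_iv / sigma_stage2
se_iv = {name: scale * se for name, se in se_stage2.items()}

for name, se in se_iv.items():
    print(f"{name:9s} {se:.4f}")
```

Because every coefficient is rescaled by the same ratio, the corrected t-statistics are the stage-2 t-statistics divided by that ratio.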
In your truffles data worksheet, enter the following labels and formulas.
     P                                          Q
1    p-hat                                      pf
2    ='Truffles Reduced Form Eq. for P'!B27     =E2
Copy the content of cells P2:Q2 to cells P3:Q31. Here is how your table should look (only the
first five values are shown below):
     P           Q
1    p-hat       pf
2    31.83041    10.52
3    40.46577    19.67
4    38.50108    13.74
5    39.03302    17.95
6    40.44901    13.71
In the Regression dialog box, the Input Y Range should be B1:B31, and the Input X Range
should be P1:Q31. Check the box next to Labels. Select New Worksheet Ply and name it Stage
2 LS Supply for Truffles. Finally select OK.
[Screenshot: Regression dialog box with the settings described above]
The result is (see also Table 11.3b on p. 457 in Principles of Econometrics, 4e):

SUMMARY OUTPUT (values rounded)

Regression Statistics
Multiple R           0.8321
R Square             0.6924
Adjusted R Square    0.6696
Standard Error       2.6517
Observations         30
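Again, note that while this two-stage least squares approach yields the correct instrumental
variables estimates, the accompanying standard errors are not correct. The correct standard error of the
instrumental variables estimator of $\beta_k$ is estimated using equations (10.4) and (10.5), restated below:
$se(\hat{\beta}_k)_{IV} = \dfrac{\hat{\sigma}_{IV}}{\hat{\sigma}_{Stage\,2\,LS}} \; se(\hat{\beta}_k)_{Stage\,2\,LS}$    (10.4)
where $\hat{\sigma}_{IV} = \sqrt{\dfrac{\sum \hat{e}_{IV}^{2}}{N-K}}$    (10.5)
and $\hat{e}_{IV} = Q - \hat{\beta}_1 - \hat{\beta}_2 P - \hat{\beta}_3 PF$    (11.8)
where $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$ are the least squares estimates from equation (11.6).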
Go back to your truffles data worksheet and enter the following labels and formulas. In the last
column, you will find the numbers of the equations used, if any.
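      S                                       T
1     Supply for Truffles, structural equation or IV estimates using 2SLS
2     β1-hat_IV = β1-hat_Stage 2 LS =         ='Stage 2 LS Supply for Truffles'!B17
3     β2-hat_IV = β2-hat_Stage 2 LS =         ='Stage 2 LS Supply for Truffles'!B18
4     β3-hat_IV = β3-hat_Stage 2 LS =         ='Stage 2 LS Supply for Truffles'!B19

      V
1     e-hat_IV^2
2     =(B2-$T$2-$T$3*A2-$T$4*E2)^2                          (11.8)

      S                                       T
6     N =                                     ='Stage 2 LS Supply for Truffles'!B8
7     K =                                     ='Stage 2 LS Supply for Truffles'!B12+1
8     σ-hat_Stage 2 LS =                      ='Stage 2 LS Supply for Truffles'!B7
9     se(β1-hat)_Stage 2 LS =                 ='Stage 2 LS Supply for Truffles'!C17
10    se(β2-hat)_Stage 2 LS =                 ='Stage 2 LS Supply for Truffles'!C18
11    se(β3-hat)_Stage 2 LS =                 ='Stage 2 LS Supply for Truffles'!C19
12    σ-hat_IV =                              =SQRT(SUM(V2:V31)/(T6-T7))           (10.5)
13    se(β1-hat)_IV =                         =(T12/T8)*T9                         (10.4)
14    se(β2-hat)_IV =                         =(T12/T8)*T10                        (10.4)
15    se(β3-hat)_IV =                         =(T12/T8)*T11                        (10.4)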
The result is (see also the standard error estimates in Table 11.3b on p. 457 of Principles of
Econometrics, 4e):

      S                                       T           V
1     Supply for Truffles, structural equation or IV estimates using 2SLS      e-hat_IV^2
2     β1-hat_IV = β1-hat_Stage 2 LS =         20.0328     0.136153
3     β2-hat_IV = β2-hat_Stage 2 LS =         0.337982    0.813448
4     β3-hat_IV = β3-hat_Stage 2 LS =         -1.00091    2.554734
5                                                         1.125603
6     N =                                     30          3.234284
7     K =                                     3           2.921237
8     σ-hat_Stage 2 LS =                      2.651687
9     se(β1-hat)_Stage 2 LS =                 2.165696
11    se(β3-hat)_Stage 2 LS =                 0.146127
12    σ-hat_IV =                              1.497585
13    se(β1-hat)_IV =                         1.223115
14    se(β2-hat)_IV =                         0.02492
15    se(β3-hat)_IV =                         0.082528
11.2 SUPPLY AND DEMAND MODEL FOR THE FULTON FISH MARKET
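Consider the following supply and demand model for the Fulton fish market:
Demand: $\ln(QUAN_t) = \alpha_1 + \alpha_2 \ln(PRICE_t) + \alpha_3 MON_t + \alpha_4 TUE_t + \alpha_5 WED_t + \alpha_6 THU_t + e_t^d$    (11.9)
Supply: $\ln(QUAN_t) = \beta_1 + \beta_2 \ln(PRICE_t) + \beta_3 STORMY_t + e_t^s$    (11.10)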
where QUAN is the quantity of fish sold, in pounds, and PRICE is the average daily price per
pound. The subscript "t" is used to index daily observations collected over the period December
2, 1991 to May 8, 1992. MON, TUE, WED, THU are dummy variables for the days of the week;
they capture the day-to-day shifts in demand. STORMY is a dummy variable indicating stormy
weather during the previous 3 days; this variable is important in the supply equation because
stormy weather makes fishing more difficult, reducing the supply of fish brought to market.
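Consider the following reduced form equations for the supply and demand model for the Fulton
fish market:
$\ln(QUAN_t) = \pi_{11} + \pi_{21} MON_t + \pi_{31} TUE_t + \pi_{41} WED_t + \pi_{51} THU_t + \pi_{61} STORMY_t + v_{t1}$    (11.11)
$\ln(PRICE_t) = \pi_{12} + \pi_{22} MON_t + \pi_{32} TUE_t + \pi_{42} WED_t + \pi_{52} THU_t + \pi_{62} STORMY_t + v_{t2}$    (11.12)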
Open the Excel file fultonfish. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 11 in one file, create a new worksheet in your POE
Chapter 11 Excel file, rename it fultonfish data, and in it, copy the data set you just opened.
We first estimate the reduced form equation for ln(QUAN) (equation (11.11)).
In the Regression dialog box, the Input Y Range should be D1:D112, and the Input X Range
should be E1:I112. Check the box next to Labels. Select New Worksheet Ply and name it Fish
Reduced Form Eq. for lnQ. Finally select OK.
[Screenshot: Regression dialog box with the settings described above]
The result is (see also Table 11.4a p. 459 in Principles of Econometrics, 4e):

SUMMARY OUTPUT (values rounded)

Regression Statistics
Multiple R           0.4397
R Square             0.1934
Adjusted R Square    0.1550
Standard Error       0.6818
Observations         111

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
mon           0.1010         0.2065           0.4891    0.6258    -0.3085     0.5105
Next, we estimate the reduced form equation for ln(PRICE) (equation (11.12)).
In the Regression dialog box, the Input Y Range should be B1:B112, and the Input X Range
should be E1:I112. Check the boxes next to Labels and Residuals. Select New Worksheet Ply
and name it Fish Reduced Form Eq. for lnP. Finally select OK.
[Screenshot: Regression dialog box with the settings described above]
The result is (see also Table 11.4b p. 459 in Principles of Econometrics, 4e):

SUMMARY OUTPUT (values rounded)

Regression Statistics
Multiple R           0.4230
R Square             0.1789
Adjusted R Square    0.1398
Standard Error       0.3542
Observations         111

ANOVA
              df     SS         MS        F         Significance F
Regression    5      2.8705     0.5741    4.5751    0.0008
Residual      105    13.1757    0.1255
Total         110    16.0461

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     -0.2717        0.0764           -3.5568    0.0006    -0.4232     -0.1202
stormy        0.3464         0.0747           4.6387     0.0000    0.1983      0.4945

RESIDUAL OUTPUT

Observation    Predicted lprice    Residuals
1              -0.038222           -0.392261
2              0.033551            -0.033551
3              -0.283530           0.355851
Next, we use the results of the estimated reduced form equation for ln(PRICE) to test the
significance of the daily dummy variables.
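Equation (11.12), restated below, is our unrestricted model (for a review of the F-test, see Section
6.1):
$\ln(PRICE_t) = \pi_{12} + \pi_{22} MON_t + \pi_{32} TUE_t + \pi_{42} WED_t + \pi_{52} THU_t + \pi_{62} STORMY_t + v_{t2}$    (11.12)
Under the null hypothesis that the coefficients of the daily dummy variables are all zero, the
restricted model is:
$\ln(PRICE_t) = \pi_{12} + \pi_{62} STORMY_t + v_{t2}$    (11.13)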
Go back to your fultonfish data worksheet. In the Regression dialog box, the Input Y Range
should be B1:B112, and the Input X Range should be I1:I112. Check the box next to Labels.
Uncheck the box next to Residuals. Select New Worksheet Ply and name it Restricted Model.
Finally select OK.
[Screenshot: Regression dialog box with the settings described above]
The result is:

SUMMARY OUTPUT (values rounded)

Regression Statistics
Multiple R           0.3994
R Square             0.1595
Adjusted R Square    0.1518
Standard Error       0.3517
Observations         111

ANOVA
              df     SS         MS        F          Significance F
Regression    1      2.5599     2.5599    20.6899    1.40774E-05
Residual      109    13.4862    0.1237
Total         110    16.0461

              Coefficients   Standard Error   t Stat     P-value        Lower 95%   Upper 95%
Intercept     -0.2902        0.0396           -7.3338    4.13476E-11    -0.3688     -0.2117
stormy        0.3353         0.0737           4.5486     1.40774E-05    0.1892      0.4813
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it F-test.
Replace the following reference: [POE Chapter 6.xlsx]Unrestricted Model by Fish Reduced
Form Eq. for lnP. Also delete references to POE Chapter 6 attached to the Restricted Model
references to obtain the following modified template:
      A                  B                C
1     Data Input         J =
2                        N =              ='Fish Reduced Form Eq. for lnP'!B8
3                        K =              ='Fish Reduced Form Eq. for lnP'!B12+1
4                        SSE_U =          ='Fish Reduced Form Eq. for lnP'!C13
5                        SSE_R =          ='Restricted Model'!C13
6                        α =
7
8     Computed Values    m1 =             =C1
9                        m2 =             =C2-C3
10                       Fc =             =FINV(C6,C8,C9)
11
12    F-test             F-statistic =    =((C5-C4)/C8)/(C4/C9)
13                       Conclusion =     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                       p-value =        =FDIST(C12,C8,C9)
15                       Conclusion =     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")
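The F-statistic in row 12 of the template can be reproduced with a short Python sketch. It assumes the residual sums of squares are read, rounded to four decimals, from the Residual rows of the unrestricted and restricted regression outputs, and it takes the critical value 2.45821 from the F-test worksheet rather than computing it; the function name `f_statistic` is hypothetical.

```python
def f_statistic(sse_restricted, sse_unrestricted, n_restrictions, n_obs, n_params):
    """F = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))."""
    numerator = (sse_restricted - sse_unrestricted) / n_restrictions
    denominator = sse_unrestricted / (n_obs - n_params)
    return numerator / denominator

# Residual sums of squares, rounded, from the two regression outputs.
f_stat = f_statistic(sse_restricted=13.4862, sse_unrestricted=13.1757,
                     n_restrictions=4, n_obs=111, n_params=6)

# Critical value FINV(0.05, 4, 105) as reported in the F-test worksheet.
f_critical = 2.45821
decision = "Reject Ho" if f_stat >= f_critical else "Do Not Reject Ho"
print(round(f_stat, 3), decision)
```

The p-value is left to Excel's FDIST here, since the Python standard library has no F-distribution function.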
With 4 restrictions, at α = 0.05, the results of the F-test are (see also p. 460 of Principles of
Econometrics, 4e):

      A                  B        C
1     Data Input         J =      4
2                        N =      111
3                        K =      6
8     Computed Values    m1 =     4
9                        m2 =     105
10                       Fc =     2.45821
The joint F-test of significance of the daily dummy variables has a p-value of 0.65, so we
cannot reject the null hypothesis that all these coefficients are zero.
This means that, in this case, the supply equation is not identified in practice, and we will not
report estimates for it in the next section (for more details, see Section 11.7.2 pp. 458-460 in
Principles of Econometrics, 4e).
We obtain the predicted values $\widehat{\ln(PRICE_t)}$ from the estimated reduced form equation (11.12)
and insert them in the structural demand equation (11.9) to replace the $\ln(PRICE_t)$ values.
Demand: $\ln(QUAN_t) = \alpha_1 + \alpha_2 \widehat{\ln(PRICE_t)} + \alpha_3 MON_t + \alpha_4 TUE_t + \alpha_5 WED_t + \alpha_6 THU_t + e_t^d$    (11.14)
Go back to your fultonfish data worksheet and enter the following labels and formulas.
     Q                                        R      S      T      U
1    lnp-hat                                  mon    tue    wed    thu
2    ='Fish Reduced Form Eq. for lnP'!B29     =E2    =F2    =G2    =H2
Copy the content of cells Q2:U2 to cells Q3:U112. Here is how your table should look (only the
first five values are shown below):
     Q           R      S      T      U
1    lnp-hat     mon    tue    wed    thu
2    -0.03822    1      0      0      0
3    0.033551    0      1      0      0
4    -0.28353    0      0      1      0
5    0.124346    0      0      0      1
6    0.0747      0      0      0      0
In the Regression dialog box, the Input Y Range should be D1:D112, and the Input X Range
should be Q1:U112. Check the box next to Labels. Select New Worksheet Ply and name it
Stage 2 LS Demand for Fish. Finally select OK.
[Screenshot: Regression dialog box with the settings described above]
The result is (see also Table 11.5 on p. 460 in Principles of Econometrics, 4e):

SUMMARY OUTPUT (values rounded)

Regression Statistics
Multiple R           0.4397
R Square             0.1934
Adjusted R Square    0.1550
Observations         111

ANOVA
              df    SS         MS        F         Significance F
Regression    5     11.7006    2.3401    5.0343    0.0004
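While this two-stage least squares approach yields the correct instrumental variables estimates, the
accompanying standard errors are not correct. The correct standard error of the instrumental
variables estimator of $\alpha_k$ is estimated using equations (10.4) and (10.5), restated below:
$se(\hat{\alpha}_k)_{IV} = \dfrac{\hat{\sigma}_{IV}}{\hat{\sigma}_{Stage\,2\,LS}} \; se(\hat{\alpha}_k)_{Stage\,2\,LS}$    (10.4)
where $\hat{\sigma}_{IV} = \sqrt{\dfrac{\sum \hat{e}_{IV}^{2}}{N-K}}$    (10.5)
and $\hat{e}_{IV} = \ln(QUAN_t) - \hat{\alpha}_1 - \hat{\alpha}_2 \ln(PRICE_t) - \hat{\alpha}_3 MON_t - \hat{\alpha}_4 TUE_t - \hat{\alpha}_5 WED_t - \hat{\alpha}_6 THU_t$
where $\hat{\alpha}_1$, $\hat{\alpha}_2$, $\hat{\alpha}_3$, $\hat{\alpha}_4$, $\hat{\alpha}_5$ and $\hat{\alpha}_6$ are the least squares estimates from equation (11.14).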
Go back to your fultonfish data worksheet and enter the following labels and formulas. In the
last column, you will find the numbers of the equations used, if any.
      W                                       X
1     Demand for Fish, structural equation or IV estimates using 2SLS
2     α1-hat_IV = α1-hat_Stage 2 LS =         ='Stage 2 LS Demand for Fish'!B17
3     α2-hat_IV = α2-hat_Stage 2 LS =         ='Stage 2 LS Demand for Fish'!B18
4     α3-hat_IV = α3-hat_Stage 2 LS =         ='Stage 2 LS Demand for Fish'!B19
5     α4-hat_IV = α4-hat_Stage 2 LS =         ='Stage 2 LS Demand for Fish'!B20
6     α5-hat_IV = α5-hat_Stage 2 LS =         ='Stage 2 LS Demand for Fish'!B21
7     α6-hat_IV = α6-hat_Stage 2 LS =         ='Stage 2 LS Demand for Fish'!B22

      Z
1     e-hat_IV^2
2     =(D2-$X$2-$X$3*B2-$X$4*E2-$X$5*F2-$X$6*G2-$X$7*H2)^2

      W                                       X
9     N =                                     ='Stage 2 LS Demand for Fish'!B8
10    K =                                     ='Stage 2 LS Demand for Fish'!B12+1
11    σ-hat_Stage 2 LS =                      ='Stage 2 LS Demand for Fish'!B7
12    se(α1-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Fish'!C17
13    se(α2-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Fish'!C18
14    se(α3-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Fish'!C19
15    se(α4-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Fish'!C20
16    se(α5-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Fish'!C21
17    se(α6-hat)_Stage 2 LS =                 ='Stage 2 LS Demand for Fish'!C22
18    σ-hat_IV =                              =SQRT(SUM(Z2:Z112)/(X9-X10))         (10.5)
19    se(α1-hat)_IV =                         =(X18/X11)*X12                       (10.4)
20    se(α2-hat)_IV =                         =(X18/X11)*X13                       (10.4)
21    se(α3-hat)_IV =                         =(X18/X11)*X14                       (10.4)
22    se(α4-hat)_IV =                         =(X18/X11)*X15                       (10.4)
23    se(α5-hat)_IV =                         =(X18/X11)*X16                       (10.4)
24    se(α6-hat)_IV =                         =(X18/X11)*X17                       (10.4)
The result is (see also the standard error estimates in Table 11.5 on p. 460 of Principles of
Econometrics, 4e):

      W                                       X
1     Demand for Fish, structural equation or IV estimates using 2SLS
2     α1-hat_IV = α1-hat_Stage 2 LS =         8.505911
3     α2-hat_IV = α2-hat_Stage 2 LS =         -1.11942
4     α3-hat_IV = α3-hat_Stage 2 LS =         -0.0254
5     α4-hat_IV = α4-hat_Stage 2 LS =         -0.53077
6     α5-hat_IV = α5-hat_Stage 2 LS =         -0.56635
7     α6-hat_IV = α6-hat_Stage 2 LS =         0.109267
9     N =                                     111
10    K =                                     6
11    σ-hat_Stage 2 LS =                      0.68179
12    se(α1-hat)_Stage 2 LS =                 0.160846
13    se(α2-hat)_Stage 2 LS =                 0.41492
18    σ-hat_IV =                              0.704342
19    se(α1-hat)_IV =                         0.166167
20    se(α2-hat)_IV =                         0.428645
21    se(α3-hat)_IV =                         0.214774
22    se(α4-hat)_IV =                         0.208
23    se(α5-hat)_IV =                         0.212755
24    se(α6-hat)_IV =                         0.208787
CHAPTER 12
CHAPTER OUTLINE
12.1 Stationary and Nonstationary Variables
  12.1.1 US Economic Time Series
  12.1.2 Simulated Data
12.2 Spurious Regressions
12.3 Unit Root Tests for Stationarity
12.4 Cointegration
Open the Excel file usa. Save your file as POE Chapter 12. Rename sheet 1 usa data.
Below we plot the time series of some important economic variables for the US economy as in
Figure 12.1 on p. 476 of Principles of Econometrics, 4e.
     E         F         G         H
2    Δgdp      Δinf      ΔF        ΔB
3    =A3-A2    =B3-B2    =C3-C2    =D3-D2
Copy the content of cells E3:H3 to cells E4:H105. Here is how your table should look (only the
first five values are shown below):
Nonstationary Time-Series Data and Cointegration 295
     E       F        G        H
2    Δgdp    Δinf     ΔF       ΔB
...
7    58.5    -1.27    -0.56    -0.92
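The differenced columns E to H apply the same operation to each series: every value minus the value one period earlier. A minimal Python sketch of that operation follows; the function name `first_differences` and the short series are hypothetical and are not the usa data.

```python
def first_differences(series):
    """Return [x[1]-x[0], x[2]-x[1], ...], one element shorter than the input."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical short series, for illustration only.
levels = [100.0, 103.5, 102.0, 108.25]
print(first_differences(levels))
```

As in the worksheet, the first observation has no difference, which is why the Excel formulas start in row 3.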
Select the Insert tab located next to the Home tab. Select A1:A105. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 12.1(a) p. 476 in Principles of Econometrics, 4e):
[Figure: US GDP 1984q1-2009q4]
To plot the change in the US GDP series, select cells E2:E105. After editing, the result is (see also
Figure 12.1(b) p. 476 in Principles of Econometrics, 4e):
[Figure: change in US GDP]
You can proceed similarly to replicate any of the other plots from Figure 12.1 p. 476 of
Principles of Econometrics, 4e.
$y_t = \rho y_{t-1} + v_t$    (12.1)
Below, we generate our $v_t$ and $y_t$ values, similarly to the way we generated random samples in
Sections 6.6.2, 3.1.4 and 2.4.4.
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it simulated data.
In cells A1:E3 of your simulated data worksheet, enter the following labels, values and
formulas. In the last column, you will find the numbers of the equations used, if any.
     A       B      C    D      E
1    y0 =    0           v_t    y_t
2    ρ =     0.7                =B2*B1+D2         (12.1)
3                               =$B$2*E2+D3       (12.1)
In column D, we generate a sample of 500 random numbers from a normal distribution with
mean 0 and standard deviation 1.
Select the Data tab, in the middle of your tab list located on top of your screen. On the Analysis
group of commands, to the far right, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
A Random Number Generation dialog box pops up. We need to generate one set of random
numbers for our random errors, so we specify 1 in the Number of Variables window. We would
like to generate 500 random numbers, so we specify 500 in the Number of Random Numbers
window. We select Normal in the Distribution window; the selected Parameters should be
Mean equal to 0, and Standard deviation equal to 1. Select Output Range and specify it to be
D2:D501. Finally, select OK.
After you copy the content of cell E3 to cells E4:E501, here is how your table should look (only
the first five values are shown below):
     A       B      C     D           E
1    y0 =    0            v_t         y_t
2    ρ =     0.7          -0.86857    -0.86857
3                         -0.70454    -1.31254
4                         -0.34472    -1.2635
5                         0.914442    0.029994
6                         0.171311    0.192307
Note: you will obtain a different random sample than the one we obtained, so your v_t and y_t
values will differ from the ones reported above.
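The worksheet recursion can also be checked outside Excel. The following is a minimal Python sketch, not part of the original manual: the function name `simulate_ar1` and the seed are illustrative assumptions, and the random draws will not match those produced by Excel's Random Number Generation tool.

```python
import random

def simulate_ar1(rho, n, y0=0.0, seed=None):
    """Return (v, y): n standard-normal shocks and the path y_t = rho*y_{t-1} + v_t."""
    rng = random.Random(seed)          # seed is an arbitrary, illustrative choice
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = []
    prev = y0                          # plays the role of cell B1 (y0 = 0)
    for shock in v:
        prev = rho * prev + shock      # same recursion as =$B$2*E2+D3
        y.append(prev)
    return v, y

v, y = simulate_ar1(rho=0.7, n=500, y0=0.0, seed=12345)
print(y[:5])
```

As in the worksheet, the first value equals the first shock because y0 = 0.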
Select the Insert tab located next to the Home tab. Select E1:E501. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 12.2(a) p. 479 in Principles of Econometrics, 4e):
[Figure: line plot of the simulated AR(1) series]
Again note that since you obtain a different random sample than ours, your plot will be slightly
different than the one shown above. For the same reason, our plot and yours are also slightly
different than Figure 12.2(a) on p. 479 of Principles of Econometrics, 4e.
Algebraically, it can be shown that, for the AR(1) model (12.1), the mean, variance and
covariance of the time series y_t are:
(12.2)
(12.3)
(12.4)
(12.5)
(12.7)
The AR(1) model in (12.1) is a classic example of a stationary process with a zero mean. AR(1)
models fluctuating around a nonzero mean and AR(1) models fluctuating around a linear trend
are extensions to (12.1).
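For a stationary AR(1) process with |ρ| < 1 and error variance σ_v², the standard textbook results are E(y_t) = 0, var(y_t) = σ_v²/(1 − ρ²) and cov(y_t, y_{t+s}) = ρ^s σ_v²/(1 − ρ²). The sketch below, which is an illustration and not part of the original manual, compares these values with sample moments from a long simulated series; the helper names, the sample size and the seed are assumptions.

```python
import random

def ar1_path(rho, n, seed):
    """Simulate y_t = rho*y_{t-1} + v_t with v_t ~ N(0, 1) and y_0 = 0."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y

def sample_mean(x):
    return sum(x) / len(x)

def sample_autocov(x, s):
    """Sample autocovariance at lag s (divides by n)."""
    m, n = sample_mean(x), len(x)
    return sum((x[t] - m) * (x[t + s] - m) for t in range(n - s)) / n

rho = 0.7
y = ar1_path(rho, 200_000, seed=1)
theory_var = 1.0 / (1.0 - rho ** 2)       # sigma_v^2 = 1 in this simulation

print("mean      ", sample_mean(y), "theory", 0.0)
print("variance  ", sample_autocov(y, 0), "theory", theory_var)
print("lag-1 cov ", sample_autocov(y, 1), "theory", rho * theory_var)
```

With a sample this long the sample moments should sit close to the theoretical ones; with only 500 observations, as in the worksheet, the agreement is looser.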
The special case where ρ = 1 in equation (12.1) leads to a random walk model. Extensions of the
random walk model are random walk with drift and random walk with a deterministic trend. In
contrast to AR(1) models, random walk models display properties of nonstationarity.
Examples of all those models are illustrated in Figures 12.2(b)-(f) on p. 479 of Principles of
Econometrics, 4e. You too can consider all those additional models by proceeding as we did
above.
Two independent random walk series, rw1 and rw2, were generated similarly to the way we
generated our AR(1) time series in Section 12.1.2. The data set is named spurious.
Open the Excel file spurious. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 12 in one file, create a new worksheet in your POE
Chapter 12 Excel file, rename it spurious data, and in it, copy the data set you just opened.
Select the Insert tab located next to the Home tab. Select A1:B701. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 12.3(a) p. 483 in Principles of Econometrics, 4e):
[Figure: time-series plot of rw1 and rw2]
Select the Insert tab located next to the Home tab. Select A1:B701. This time, in the Charts
group of commands select Scatter, and Scatter with only Markers.
After editing (refer back to Section 2.1 if needed), the result is (see also Figure 12.3(b) p. 483 in
Principles of Econometrics, 4e):
[Figure: scatter plot of rw1 against rw2]
These time series were generated independently and, in truth, have no relation to one another, yet
when we plot them, as we have done above, we see a positive relationship between them.
Next, we estimate a simple regression of series one (rw1) on series two (rw2):
$rw_{1t} = \beta_1 + \beta_2 rw_{2t} + e_t$    (12.8)
In the Regression dialog box, the Input Y Range should be A1:A701, and the Input X Range
should be B1:B701. Check the box next to Labels. Select New Worksheet Ply and name it
Spurious Regression. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.83960906
R Square             0.704943374
Adjusted R Square    0.704520657
Standard Error       8.557267989
Observations         700

ANOVA
              df     SS             MS             F              Significance F
Regression    1      122116.5568    122116.5568    1667.647606    3.5686E-187
Residual      698    51112.33113    73.22683543
Total         699    173228.8879

              Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept     17.81804111     0.62048           28.71665471    2.4603E-120    16.59981498    19.03626723
rw2           0.84204116      0.020619645       40.83684128    3.5686E-187    0.801557201    0.88252512
This result suggests that the simple linear regression model fits the data well (R² = 0.70), and the
estimated slope is highly significant (tiny p-value). These results are, however, completely
meaningless, or spurious. The apparent significance of the relationship is false.
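How often unrelated random walks produce a "significant" slope can be explored by simulation. The following Python sketch is an illustration only: it does not use the spurious data file, and the function names, the seeds and the choice of 50 replications are assumptions.

```python
import random

def random_walk(n, seed):
    """Cumulative sum of n independent N(0, 1) shocks."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def simple_ols(y, x):
    """Return (intercept, slope, R-squared, t-statistic of the slope)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    se_b2 = (sse / (n - 2) / sxx) ** 0.5
    return b1, b2, 1.0 - sse / sst, b2 / se_b2

replications, rejections = 50, 0
for rep in range(replications):
    rw1 = random_walk(700, seed=2 * rep)        # two independent series
    rw2 = random_walk(700, seed=2 * rep + 1)
    _, _, r2, t_slope = simple_ols(rw1, rw2)
    if abs(t_slope) > 1.96:                     # naive 5% two-sided test
        rejections += 1

print(f"{rejections} of {replications} regressions look 'significant'")
```

Because the two series are generated independently, any apparent significance in these regressions is spurious by construction.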
The Federal Funds rate (Ft) and the 3-year Bond rate (Bt) series exhibit wandering behavior, so
we suspect that they may be nonstationary variables. In addition, the series fluctuate around a
nonzero mean, so the appropriate Dickey-Fuller test equation is the one that includes a constant
term. Finally, following the procedures described in Sections 9.3 and 9.4 of Principles of
Econometrics, 4e, we find that the inclusion of one lagged difference term is sufficient to
eliminate autocorrelation in the residuals in both cases. The extended test equations are thus as
follows:
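$\Delta F_t = \alpha + \gamma F_{t-1} + a_1 \Delta F_{t-1} + v_t$    (12.9)
$\Delta B_t = \alpha + \gamma B_{t-1} + a_1 \Delta B_{t-1} + v_t$    (12.10)
The null and alternative hypotheses of the unit root test for stationarity are:
$H_0: \gamma = 0$ (the series is nonstationary) versus $H_1: \gamma < 0$ (the series is stationary).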
Go back to your usa data worksheet. In cells J3:P4 enter the following labels and formulas.
     J         K       L          M    N         O       P
3    ΔFt       Ft-1    ΔFt-1           ΔBt       Bt-1    ΔBt-1
4    =C4-C3    =C3     =C3-C2          =D4-D3    =D3     =D3-D2
Copy the content of cells J4:L4 to cells J5:L105, and the content of cells N4:P4 to cells
N5:P105. Here is how your table should look (only the first few values are shown below):
     J        K        L        M    N        O        P
3    ΔFt      Ft-1     ΔFt-1         ΔBt      Bt-1     ΔBt-1
4    0.83     10.56    0.87          0        12.64    1.45
5    -2.12    11.39    0.83          -1.54    12.64    0
6    -0.79    9.27     -2.12         -0.42    11.1     -1.54
7    -0.56    8.48     -0.79         -0.92    10.68    -0.42
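The three worksheet formulas (=C4-C3, =C3 and =C3-C2) can be expressed as a small function. This Python sketch is illustrative: the function name is hypothetical, and the five starting values of F are back-computed from the table above rather than read from the data file.

```python
def dickey_fuller_columns(series):
    """For each t >= 2 return (Δy_t, y_{t-1}, Δy_{t-1}), like columns J:L."""
    rows = []
    for t in range(2, len(series)):
        diff = series[t] - series[t - 1]            # =C4-C3
        lag = series[t - 1]                         # =C3
        lag_diff = series[t - 1] - series[t - 2]    # =C3-C2
        rows.append((diff, lag, lag_diff))
    return rows

# Illustrative values implied by the first table rows (assumption).
F = [9.69, 10.56, 11.39, 9.27, 8.48]
for row in dickey_fuller_columns(F):
    print(tuple(round(value, 2) for value in row))
```

Two observations are lost at the start of the sample: one to the first difference and one to its lag.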
In the Regression dialog box, the Input Y Range should be J3:J105, and the Input X Range
should be K3:L105. Check the box next to Labels. Select New Worksheet Ply and name it
Dickey-Fuller Test for F. Finally select OK.
The result of estimated equation (12.9) is (see also p. 487 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R    0.582724854
R Square      0.339568256

ANOVA
              df     SS             MS             F              Significance F
Regression    2      10.09571579    5.047857897    25.45091022    1.20638E-09
Residual      99     19.6353195     0.198336561
Total         101    29.73103529

              Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept     0.172522121     0.100233301       1.721205623    0.088337136    -0.026362489    0.371406731
Ft-1          -0.04462129     0.017814175       -2.50481925    0.013883951    -0.079968478    -0.009274102
ΔFt-1         0.561058175     0.080982746       6.928119927    4.3602E-10     0.400370842     0.721745508
The t-statistic (also referred to as the τ-statistic in the context of a Dickey-Fuller testing
procedure; for more details on that, refer to Section 12.3 of Principles of Econometrics, 4e) is
-2.505, and the 5% critical value for tau, τc, is -2.86 (value found in Table 12.2 on p. 486 of
Principles of Econometrics, 4e). In this case, since -2.505 > -2.86, we do not reject the null
hypothesis that the series is nonstationary. In other words, there is insufficient evidence to
suggest that Ft is stationary.
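The test regression and the decision rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the manual's procedure: the helper names are hypothetical, the series is simulated rather than taken from the usa data file, and the critical values are simply those quoted in the text.

```python
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [value / p for value in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]

def ols(y, X):
    """Least squares via the normal equations; returns (coefficients, std. errors)."""
    n, k = len(y), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, yi in zip(X, y))
    s2 = sse / (n - k)
    return b, [(s2 * inv[i][i]) ** 0.5 for i in range(k)]

def adf_tau(series, constant=True):
    """tau from  Δy_t = [α] + γ y_{t-1} + a1 Δy_{t-1} + v_t  (one lagged difference)."""
    y, X = [], []
    for t in range(2, len(series)):
        y.append(series[t] - series[t - 1])
        regressors = [series[t - 1], series[t - 1] - series[t - 2]]
        X.append(([1.0] if constant else []) + regressors)
    b, se = ols(y, X)
    g = 1 if constant else 0            # position of γ in the coefficient list
    return b[g] / se[g]

def reject_unit_root(tau, critical_value):
    """Reject the null of nonstationarity when tau is below the critical value."""
    return tau < critical_value

# Simulated stationary AR(1) series (rho = 0.2), purely for illustration.
rng = random.Random(2024)
stationary, level = [], 0.0
for _ in range(2000):
    level = 0.2 * level + rng.gauss(0.0, 1.0)
    stationary.append(level)

tau = adf_tau(stationary, constant=True)
print(tau, reject_unit_root(tau, -2.86))
print(reject_unit_root(-2.505, -2.86))   # False: do not reject, as for Ft in the text
print(reject_unit_root(-5.487, -1.94))   # True: reject
```

Setting `constant=False` drops the intercept, which corresponds to checking Constant is Zero in the Regression dialog box.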
In the Regression dialog box, the Input Y Range should be N3:N105, and the Input X Range
should be O3:P105. Check the box next to Labels. Select New Worksheet Ply and name it
Dickey-Fuller Test for B. Finally select OK.
The result of estimated equation (12.10) is (see also p. 487 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.379402286
R Square             0.143946095
Adjusted R Square    0.126652077
Standard Error       0.502618036
Observations         102

ANOVA
              df     SS             MS             F          Significance F
Regression    2      4.205427073    2.102713537    8.32346    0.000455832
Residual      99     25.0098641     0.25262489
Total         101    29.21529118

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     0.23687297      0.129173          1.833764081     0.069693252    -0.019434455    0.493180396
Bt-1          -0.056241169    0.020808115       -2.702847917    0.008091462    -0.097528982    -0.014953357
ΔBt-1         0.290307786     0.089606852       3.239794507     0.001629198    0.1125          0.4681
The t-statistic (also referred to as the τ-statistic in the context of a Dickey-Fuller testing
procedure) is -2.703, and the 5% critical value for tau, τc, is -2.86 (value found in Table 12.2 on
p. 486 of Principles of Econometrics, 4e). In this case again, since -2.703 > -2.86, we do not
reject the null hypothesis that the series is nonstationary. In other words, there is insufficient
evidence to suggest that Bt is stationary.
Since we found insufficient evidence to suggest that Ft and Bt are stationary, we would like to
determine whether these series can be made stationary by taking their first difference. If we can
establish that this is the case, then these series would be integrated of order 1, or I(1). In general,
the order of integration of a series is the minimum number of times it must be differenced to
make it stationary.
Because the series ΔFt and ΔBt appear to fluctuate around zero, we test the first difference of the
Federal Funds rate (ΔFt = Ft − Ft-1) and the first difference of the Bond rate (ΔBt = Bt − Bt-1)
for stationarity using Dickey-Fuller test equations without a constant term:
$\Delta(\Delta F_t) = \gamma \Delta F_{t-1} + v_t$    (12.11)
$\Delta(\Delta B_t) = \gamma \Delta B_{t-1} + v_t$    (12.12)
Go back to your usa data worksheet. In cells M3:M4 and Q3:Q4 enter the following labels and
formulas.
     M           Q
3    Δ(ΔF)t      Δ(ΔB)t
4    =G4-G3      =H4-H3
Copy the content of cell M4 to cells M5:M105, and the content of cell Q4 to cells Q5:Q105.
Here is how your table should look (only the first five values are shown below):
     M          Q
3    Δ(ΔF)t     Δ(ΔB)t
4    -0.04      -1.45
5    -2.95      -1.54
6    1.33       1.12
7    0.23       -0.5
8    0.54       0.45
In the Regression dialog box, the Input Y Range should be M3:M105, and the Input X Range
should be L3:L105. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it Dickey-Fuller Test for changeF. Finally select OK.
The result of estimated equation (12.11) is (see also p. 488 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.47920942
R Square             0.229641658
Adjusted R Square    0.219740678
Standard Error       0.457601805
Observations         102

ANOVA
              df     SS             MS             F              Significance F
Regression    1      6.304559392    6.304559392    30.10781804    3.09269E-07
Residual      101    21.14934061    0.209399412
Total         102    27.4539

              Coefficients    Standard Error    t Stat          P-value       Lower 95%      Upper 95%
Intercept     0               #N/A              #N/A            #N/A          #N/A           #N/A
ΔFt-1         -0.446986047    0.081461861       -5.487059143    3.0409E-07    -0.60858446    -0.285387633
The t-statistic (also referred to as the τ-statistic in the context of a Dickey-Fuller testing
procedure) is -5.487, and the 5% critical value for tau, τc, is -1.94 (value found in Table 12.2 on
p. 486 of Principles of Econometrics, 4e). In this case, since -5.487 < -1.94, we do reject the null
hypothesis that the series ΔFt is nonstationary and accept the alternative that it is stationary.
In the Regression dialog box, the Input Y Range should be Q3:Q105, and the Input X Range
should be P3:P105. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it Dickey-Fuller Test for changeB. Finally select OK.
The result of estimated equation (12.12) is (see also p. 488 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.606293811
R Square             0.367592185
Adjusted R Square    0.357691195
Standard Error       0.522402752
Observations         102

ANOVA
              df     SS             MS             F           Significance F
Regression    1      16.02143188    16.02143188    58.70707    1.20231E-11
Residual      101    27.56336812    0.272904635
Total         102    43.5848

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     0               #N/A              #N/A            #N/A           #N/A            #N/A
ΔBt-1         -0.70179559     0.091593563       -7.662053993    1.14557E-11    -0.883492774    -0.520098406
The t-statistic (also referred to as the τ-statistic in the context of a Dickey-Fuller testing
procedure) is -7.662, and the 5% critical value for tau, τc, is -1.94 (value found in Table 12.2 on
p. 486 of Principles of Econometrics, 4e). In this case, since -7.662 < -1.94, we do reject the null
hypothesis that the series ΔBt is nonstationary and accept the alternative that it is stationary.
These results imply that while the Federal Funds rate (Ft) and the Bond rate (Bt) are
nonstationary, their first differences, ΔFt and ΔBt, are stationary. We say that the series Ft and
Bt are integrated of order 1, I(1).
12.4 COINTEGRATION
As a general rule, nonstationary time-series variables should not be used in regression models, to
avoid the problem of spurious regression. However, there is an exception to this rule. If yt and xt
are nonstationary I(1) variables and their difference, or any linear combination of them, such as
et = yt − β1 − β2xt, is a stationary I(0) process, then yt and xt are said to be cointegrated. In other
words, in this case, there is a fundamental relationship between these two variables, and an
estimated regression between them is valid and not spurious.
We have already established in Section 12.3 that Ft and Bt are nonstationary. Now, we would like
to test whether these series are cointegrated. The test for cointegration is a test of the stationarity
of the residuals et = Bt - b1 - b2Ft, where b1 and b2 are the least squares estimates of the
regression of Bt on Ft.
$B_t = \beta_1 + \beta_2 F_t + e_t$    (12.13)
In the Regression dialog box, the Input Y Range should be D1:D105, and the Input X Range
should be C1:C105. Check the boxes next to Labels and Residuals; uncheck the box next to
Constant is Zero. Select New Worksheet Ply and name it Regression of B on F. Finally select
OK.
The result of estimated equation (12.13) is (see also p. 489 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.94582
R Square             0.894584726
Adjusted R Square    0.893551243
Standard Error       0.81018
Observations         104

ANOVA
              df     SS             MS             F              Significance F
Regression    1      568.1739601    568.1739601    865.6017148    1.22562E-51
Residual      102    66.95197449    0.656391907
Total         103    635.1259346

              Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept     1.139829659     0.174083328       6.547609544    2.39926E-09    0.794536213    1.485123105
F             0.914411397     0.031080112       29.42111002    1.22562E-51    0.852764144    0.976058651

RESIDUAL OUTPUT

Observation    Predicted B    Residuals
1              10.0004761     1.189523901
2              10.79601401    1.843985985
3              11.55497547    1.085024526
The test for stationarity of the residuals is based on the test equation (12.14) which follows. This
is the augmented Dickey-Fuller version of the test equation (12.7) found on p. 489 of Principles
of Econometrics, 4e. It includes one lagged term Δê(t-1) to correct for autocorrelation.
$\Delta\hat{e}_t = \gamma \hat{e}_{t-1} + a_1 \Delta\hat{e}_{t-1} + v_t$    (12.14)
     S                                T            U             V
1    e-hat(t)
2    ='Regression of B on F'!C25
3    ='Regression of B on F'!C26      Δe-hat(t)    e-hat(t-1)    Δe-hat(t-1)
4    ='Regression of B on F'!C27      =S4-S3       =S3           =S3-S2
Copy the content of cells S4:V4 to cells S5:V105. Here is how your table should look (only the
first five values are shown below):
     S           T            U             V
1    e-hat(t)
2    1.189524
3    1.843986    Δe-hat(t)    e-hat(t-1)    Δe-hat(t-1)
4    1.085025    -0.75896     1.843986      0.654462
5    1.483577    0.398552     1.085025      -0.75896
6    1.785962    0.302385     1.483577      0.398552
7    1.378032    -0.40793     1.785962      0.302385
8    0.92632     -0.45171     1.378032      -0.40793
In the Regression dialog box, the Input Y Range should be T3:T105, and the Input X Range
should be U3:V105. Check the boxes next to Labels and Constant is Zero; uncheck the box next
to Residuals. Select New Worksheet Ply and name it Cointegration Test. Finally select OK.
The result of estimated equation (12.14) is (see also p. 489 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R    0.410996241
R Square      0.16891791

ANOVA
              df     SS             MS             F              Significance F
Regression    2      3.539073191    1.769536595    10.16252859    9.68E-05
Residual      100    17.41236524    0.174123652
Total         102    20.95143843

               Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept      0               #N/A              #N/A            #N/A           #N/A            #N/A
e-hat(t-1)     -0.224509324    0.053503858       -4.196133318    5.88749E-05    -0.330659451    -0.118359196
Δe-hat(t-1)    0.254044805     0.093700632       2.711236981     0.00789        0.068145425     0.439944185
The t-statistic (also referred to as the τ-statistic in the context of a Dickey-Fuller testing
procedure) is -4.196, and the 5% critical value for tau, τc, is -3.37 (value found in Table 12.4 on
p. 489 of Principles of Econometrics, 4e). In this case, since -4.196 < -3.37, we do reject the null
hypothesis that the residuals are nonstationary and accept the alternative that they are stationary.
This implies that the Bond rate and the Federal Funds rate are cointegrated. In other words, the
regression relationship between them, estimated above, is valid.
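The two-step logic of the residual-based test can be sketched in Python. This is a simplified illustration, not the manual's exact procedure: it uses simulated data, hypothetical function names, and a test equation without the lagged difference term that (12.14) includes. Only the critical value -3.37 is taken from the text.

```python
import random

def simple_ols(y, x):
    """Return (intercept, slope, residuals) of a regression of y on x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return b1, b2, [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

def residual_tau(e):
    """tau from  Δe_t = γ e_{t-1} + v_t  (no constant, no lagged difference)."""
    de = [e[t] - e[t - 1] for t in range(1, len(e))]
    lag = e[:-1]
    sxx = sum(v * v for v in lag)
    gamma = sum(l * d for l, d in zip(lag, de)) / sxx
    sse = sum((d - gamma * l) ** 2 for l, d in zip(lag, de))
    se = (sse / (len(de) - 1) / sxx) ** 0.5
    return gamma / se

# Simulated cointegrated pair: x is a random walk, y = 1 + 0.9 x + stationary error.
rng = random.Random(7)
x, level = [], 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    x.append(level)
y = [1.0 + 0.9 * xi + rng.gauss(0.0, 1.0) for xi in x]

b1, b2, resid = simple_ols(y, x)          # step 1: levels regression
tau = residual_tau(resid)                 # step 2: unit root test on the residuals
print(b1, b2, tau, tau < -3.37)
```

The comparison uses the cointegration critical value rather than the ordinary Dickey-Fuller one because the residuals come from an estimated regression.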
CHAPTER 13
CHAPTER OUTLINE
13.1 Estimating a VEC Model
  13.1.1 Test for Cointegration
  13.1.2 The VEC Model
13.2 Estimating a VAR Model
  13.2.1 Test for Cointegration
  13.2.2 The VAR Model
13.3 Impulse Responses Functions
  13.3.1 The Univariate Case
  13.3.2 The Bivariate Case
Open the Excel file gdp. Save your file as POE Chapter 13. Rename sheet 1 gdp data.
Insert a new column to the left of the column labeled usa. In your new cells A1:A5, enter the
following label and values.
     A
1    q*-year
2    q1-1970
3    q2-1970
4    q3-1970
5    q4-1970
Select cells A2:A5, move your cursor to the lower right corner of your selection until it turns into
a skinny cross as shown below; left-click, hold it and drag it down to cell A125.
Vector Error Correction and Vector Autoregressive Models 311
Excel recognizes the series and automatically completes it for you. Here is how your table should
look (only the last five values are shown below):
       A
121    q4-1999
122    q1-2000
123    q2-2000
124    q3-2000
125    q4-2000
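The series that Excel's fill handle completes can be reproduced programmatically. A minimal Python sketch follows; the function name `quarter_labels` is hypothetical.

```python
def quarter_labels(start_year, end_year):
    """Labels of the form q1-1970, q2-1970, ... for every quarter in the range."""
    return [f"q{q}-{year}"
            for year in range(start_year, end_year + 1)
            for q in range(1, 5)]

labels = quarter_labels(1970, 2000)
print(len(labels), labels[0], labels[-1])   # 124 q1-1970 q4-2000
```

The 124 labels fill cells A2:A125, one per quarter from 1970 through 2000.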
Next, we plot the time series of the quarterly real GDP of Australia and the United States for the
sample period 1970 to 2000.
Select the Insert tab located next to the Home tab. Select A1:C125. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 13.1 p. 502 in Principles of Econometrics, 4e):
[Figure: quarterly real GDP of Australia and the United States, 1970-2000]
It appears from the figure above that both series are nonstationary and possibly cointegrated.
Formal unit root tests of the series have confirmed that they are indeed nonstationary.
We first estimate the regression of Australia's GDP (A) on the United States' GDP (U); the
intercept term is omitted because it has no economic meaning:
$A_t = \beta U_t + e_t$    (13.1)
In the Regression dialog box, the Input Y Range should be C1:C125, and the Input X Range
should be B1:B125. Check the boxes next to Labels, Constant is Zero, and Residuals.
Select New Worksheet Ply and name it Regression of ausGDP on usaGDP. Finally select OK.
The result of the estimated equation (13.1) is (see also p. 502 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.999826204
R Square             0.999652439
Adjusted R Square    0.991522358
Standard Error       1.219374742
Observations         124

ANOVA
              df     SS             MS             F              Significance F
Regression    1      526014.2115    526014.2115    353771.6996    4.4122E-213
Residual      123    182.8856       1.486874761
Total         124    526197.0971

              Coefficients    Standard Error    t Stat         P-value        Lower 95%     Upper 95%
Intercept     0               #N/A              #N/A           #N/A           #N/A          #N/A
usa           0.985349542     0.001656642       594.7871045    1.3477E-214    0.98207032    0.988628764
Select the Insert tab located next to the Home tab. Select cells C25:C148 from your Regression
of ausGDP on usaGDP worksheet. In the Charts group of commands select Line, and Line
again.
After editing, the result is (see also Figure 13.2 p. 502 in Principles of Econometrics, 4e):
To compute the first order autocorrelation use the CORREL function as you have done in Chapter
9. In cells E24:E25 of your Regression of ausGDP on usaGDP worksheet, enter the following
label and formula.
      E
24    r1
25    =CORREL(C26:C148,C25:C147)
The result is:
r1
0.871647553
Again, note that your Excel results differ slightly from the one reported in Principles of
Econometrics, 4e (see Section 9.2.1b for more details on that).
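One way to see why the CORREL result can differ from a textbook value is to compare two ways of computing a lag-1 autocorrelation. In the Python sketch below, `lag1_correl` mirrors =CORREL(C26:C148,C25:C147); `lag1_full_sample` uses a common sample autocorrelation formula with the full-sample mean and variance, which is assumed here (not confirmed by this manual) to be the form the textbook uses. All names are hypothetical.

```python
def correl(x, y):
    """Pearson correlation, the quantity Excel's CORREL returns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def lag1_correl(e):
    """Correlation between e_t (t = 2..T) and e_{t-1} (t = 1..T-1)."""
    return correl(e[1:], e[:-1])

def lag1_full_sample(e):
    """sum_t (e_t - m)(e_{t-1} - m) / sum_t (e_t - m)^2, with full-sample mean m."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    return num / sum((v - m) ** 2 for v in e)

e = [1.0, 2.0, 3.0, 4.0, 5.0]          # toy series, not the GDP residuals
print(lag1_correl(e), lag1_full_sample(e))
```

On this short toy series the two measures differ sharply; on long series they are usually much closer.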
The test for stationarity of the residuals is based on the Dickey-Fuller test equation (12.7) found
on p. 489 of Principles of Econometrics, 4e. It is restated below:
$\Delta\hat{e}_t = \gamma \hat{e}_{t-1} + v_t$    (13.2)
     E                                            F            G
1    e-hat(t)
2    ='Regression of ausGDP on usaGDP'!C25        Δe-hat(t)    e-hat(t-1)
3    ='Regression of ausGDP on usaGDP'!C26        =E3-E2       =E2
Copy the content of cells E3:G3 to cells E4:G125. Here is how your table should look (only the
first five values are shown below):
     E           F            G
1    e-hat(t)
2    0.495527    Δe-hat(t)    e-hat(t-1)
3    0.943886    0.448359     0.495527
4    0.624073    -0.31981     0.943886
5    1.156798    0.532725     0.624073
6    0.77726     -0.37954     1.156798
In the Regression dialog box, the Input Y Range should be F2:F125, and the Input X Range
should be G2:G125. Check the boxes next to Labels and Constant is Zero; uncheck the box
next to Residuals. Select New Worksheet Ply and name it Cointegration Test for GDPs.
Finally select OK.
The result of estimated equation (13.2) is (see also p. 502 in Principles of Econometrics, 4e):
[Excel SUMMARY OUTPUT for equation (13.2)]
The t-statistic (also referred to as the τ-statistic in the context of a Dickey-Fuller testing
procedure) is -2.889, and the 5% critical value for tau, τc, is -2.76 (value found in Table 12.4 on
p. 489 of Principles of Econometrics, 4e). In this case, since -2.889 < -2.76, we do reject the null
hypothesis that the residuals are nonstationary and accept the alternative that they are stationary.
This implies that Australia's GDP and the United States' GDP are cointegrated. In other words,
the regression relationship between them, estimated above, is valid.
According to the estimated equation (13.1), if the United States' GDP increases by one unit, the
GDP of Australia would increase by 0.985 of a unit. But the Australian economy may not
respond fully by this amount within the quarter. To ascertain how much it will respond within a
quarter, we estimate the vector error correction model.
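The vector error correction model (VEC model) for Australia's GDP (At) and the United States'
GDP (Ut) is as follows:
$\Delta A_t = \alpha_{10} + \alpha_{11}\hat{e}_{t-1} + v_t^A$    (13.3)
$\Delta U_t = \alpha_{20} + \alpha_{21}\hat{e}_{t-1} + v_t^U$    (13.4)
where ê(t-1) are the lagged residuals from estimated equation (13.1).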
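The estimation steps behind the VEC model can be sketched in Python. This is an illustration on simulated data, not the gdp data file; the function names, the simulated adjustment speed of -0.3 and the seed are assumptions.

```python
import random

def slope_through_origin(y, x):
    """Least squares slope of y on x with no intercept (Constant is Zero)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def simple_ols(y, x):
    """Return (intercept, slope) of a regression of y on x with a constant."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope

def vec_coefficients(A, U):
    """Step 1: A on U without intercept. Step 2: ΔA and ΔU on lagged residuals."""
    beta = slope_through_origin(A, U)
    e = [a - beta * u for a, u in zip(A, U)]
    dA = [A[t] - A[t - 1] for t in range(1, len(A))]
    dU = [U[t] - U[t - 1] for t in range(1, len(U))]
    e_lag = e[:-1]                       # e_{t-1} lines up with the change at t
    return beta, simple_ols(dA, e_lag), simple_ols(dU, e_lag)

# Simulated pair in which A adjusts toward U and U is a pure random walk.
rng = random.Random(11)
U, A = [100.0], [100.0]
for _ in range(500):
    gap = A[-1] - U[-1]                  # last period's disequilibrium
    U.append(U[-1] + rng.gauss(0.0, 0.5))
    A.append(A[-1] - 0.3 * gap + rng.gauss(0.0, 0.5))

beta, (a10, a11), (a20, a21) = vec_coefficients(A, U)
print(beta, a11, a21)
```

A negative error correction coefficient in the first equation indicates that A falls back toward the cointegrating relationship after a positive deviation.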
     I         J
2    Δusa      Δaus
3    =B3-B2    =C3-C2
Copy the content of cells I3:J3 to cells I4:J125. Here is how your table should look (only the
first five values are shown below):
     I           J
2    Δusa        Δaus
3    0.0723      0.5196
4    0.340297    0.015499
5    -0.4146     0.124199
6    1.062401    0.667301
7    0.222099    0.078103
In the Regression dialog box, the Input Y Range should be J2:J125, and the Input X Range
should be G2:G125. Check the box next to Labels; uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it VEC Model Eq. for ausGDP. Finally select OK.
The result of estimated equation (13.3) is (see also p. 503 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.18556
R Square             0.03443
Adjusted R Square    0.026453511
Standard Error       0.64088
Observations         123

ANOVA
              df     SS             MS             F        Significance F
Regression    1      1.772239218    1.772239218    4.315    0.039892949
Residual      121    49.69777165    0.410725386
Total         122    51.470

              Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept     0.491705874     0.057909469       8.490940752    6.12439E-14    0.377058807     0.606352942
e-hat(t-1)    -0.098702599    0.047515749       -2.0773        0.039892949    -0.192772639    -0.004633
In the Regression dialog box, the Input Y Range should be I2:I125, and the Input X Range
should be G2:G125. Check the box next to Labels; uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it VEC Model Eq. for usaGDP. Finally select OK.
The result of estimated equation (13.4) is (see also p. 503 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.071619325
R Square             0.005129323
Adjusted R Square    -0.003092744
Standard Error       0.516568014
Observations         123

ANOVA
              df     SS             MS             F        Significance F
Regression    1      0.166469321    0.166469321    0.624    0.431165838
Residual      121    32.28794       0.266842513
Open the Excel file fred. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 13 in one file, create a new worksheet in your POE
Chapter 13 Excel file, rename it fred data, and in it, copy the data set you just opened.
Insert a new column to the left of the column labeled A. In your new cells A1:A5, enter the
following label and values.
     A
1    q*-year
2    q1-1960
3    q2-1960
4    q3-1960
5    q4-1960
Select cells A2:A5, move your cursor to the lower right corner of your selection until it turns into
a skinny cross as shown below; left-click, hold it and drag it down to cell A201.
Excel recognizes the series and automatically completes it for you. Here is how your table should
look (only the last five values are shown below):
       A
197    q4-2008
198    q1-2009
199    q2-2009
200    q3-2009
201    q4-2009
Next, we plot the time series of the quarterly log of Real Personal Disposable Income (denoted as
Y or ly in your Excel file) and log of Real Personal Consumption Expenditure (denoted as C or lc
in your Excel file) for the US economy over the period 1960:1 to 2009:4.
Select the Insert tab located next to the Home tab. Select A1:C201. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 13.3 p. 504 in Principles of Econometrics, 4e):
[Figure: log of Real Personal Disposable Income (Y) and log of Real Personal Consumption Expenditure (C), 1960:1 to 2009:4]
It appears from the figure above that both series are nonstationary.
We first estimate the regression of the log of Real Personal Consumption Expenditure (C) on the
log of Real Personal Disposable Income (Y) for the US economy:
$C_t = \beta_1 + \beta_2 Y_t + e_t$    (13.5)
In the Regression dialog box, the Input Y Range should be B1:B201, and the Input X Range
should be C1:C201. Check the boxes next to Labels and Residuals. Select New Worksheet Ply
and name it Regression of C on Y. Finally select OK.
The result of the estimated equation (13.5) is (see also p. 503 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.999198794
R Square             0.998398229
Adjusted R Square    0.998390139

ANOVA
              df     SS             MS             F              Significance F
Regression    1      47.80108067    47.80108067    123415.1893    1.02061E-278
Residual      198    0.076689215    0.000387319
Total         199    47.87776988

RESIDUAL OUTPUT

Observation    Predicted lc    Residuals
1              7.441661973     0.037355027
2              7.447258738     0.044331262
3              7.441153227     0.039468773
The test for stationarity of the residuals is based on the Dickey-Fuller test equation (12.7) found
on p. 489 of Principles of Econometrics, 4e. It is restated below (and includes the extra term
Llet-1):
(13.6)
320 Chapter 13
     E                              F           G            H
1    e-hat_t
2    ='Regression of C on Y'!C25    Δe-hat_t
3    ='Regression of C on Y'!C26    =E3-E2      e-hat_t-1    Δe-hat_t-1
4                                               =E3          =F3
Copy the content of cells E3:F3 to cells E4:F201 and cells G4:H4 to cells G5:H201. Here is
how the top of your table should look:
     E           F           G    H
1    e-hat_t
2    0.037355    Δe-hat_t
In the Regression dialog box, the Input Y Range should be F3:F201, and the Input X Range
should be G3:H201. Check the boxes next to Labels and Constant is Zero; uncheck the box
next to Residuals. Select New Worksheet Ply and name it Cointegration Test for C and Y.
Finally select OK.
The result of estimated equation (13.6) is (see also p. 503 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.388877197
R Square              0.151225474
Adjusted R Square     0.141792951
Standard Error        0.008189208
Observations          198

ANOVA
              df    SS             MS             F              Significance F
Regression    2     0.002341922    0.001170961    17.46058111    1.05786E-07

              Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept     0               #N/A              #N/A           #N/A           #N/A            #N/A
e-hat_t-1     -0.087647619    0.030508415       -2.87289975    0.004515395    -0.147814521    -0.027480717
The t-statistic (also referred to as the $\tau$-statistic in the context of a Dickey-Fuller testing
procedure) is -2.873, and the 5% critical value for tau, $\tau_c$, is -3.37 (value found in Table 12.4
on p. 489 of Principles of Econometrics, 4e). Since -2.873 > -3.37, we cannot reject the null
hypothesis that the errors are nonstationary, and hence the relationship between C (i.e., ln(RPCE))
and Y (i.e., ln(RPDI)) is spurious. That is, we have no cointegration. Thus we do not apply a VEC
model to examine the dynamic relationship between the log of Real Personal Disposable Income Y
and the log of Real Personal Consumption Expenditure C. Instead we estimate a VAR model for the
set of I(0) variables $\{\Delta Y_t, \Delta C_t\}$.
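For readers who want to cross-check the spreadsheet steps outside Excel, the two-step residual-based test can be sketched in Python. This is only an illustrative sketch, not the textbook's code: the function names are hypothetical, the data are simulated random walks rather than the C and Y series, and the critical value -3.37 is simply the one quoted above.

```python
import math
import random


def ols_with_constant(y, x):
    """Regress y on a constant and x; return (intercept, slope, residuals)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, residuals


def dickey_fuller_tau(e):
    """tau statistic on e[t-1] in d_e[t] = g*e[t-1] + a*d_e[t-1] + v[t] (no constant)."""
    de = [e[t] - e[t - 1] for t in range(1, len(e))]
    y = de[1:]      # change in e at time t
    x1 = e[1:-1]    # e at time t-1
    x2 = de[:-1]    # change in e at time t-1
    s11 = sum(u * u for u in x1)
    s22 = sum(w * w for w in x2)
    s12 = sum(u * w for u, w in zip(x1, x2))
    s1y = sum(u * c for u, c in zip(x1, y))
    s2y = sum(w * c for w, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    g = (s22 * s1y - s12 * s2y) / det        # coefficient on e[t-1]
    a_coef = (s11 * s2y - s12 * s1y) / det   # coefficient on d_e[t-1]
    sse = sum((c - g * u - a_coef * w) ** 2 for u, w, c in zip(x1, x2, y))
    sigma2 = sse / (len(y) - 2)              # two estimated coefficients
    se_g = math.sqrt(sigma2 * s22 / det)
    return g / se_g


def is_cointegrated(tau, critical_value=-3.37):
    """Reject nonstationary residuals only if tau is below the critical value."""
    return tau < critical_value


# Illustration with two simulated, unrelated random walks (hypothetical data).
random.seed(0)
c_series, y_series = [0.0], [0.0]
for _ in range(199):
    c_series.append(c_series[-1] + random.gauss(0, 1))
    y_series.append(y_series[-1] + random.gauss(0, 1))
_, _, resid = ols_with_constant(c_series, y_series)
tau = dickey_fuller_tau(resid)
print(round(tau, 3), is_cointegrated(tau))
```

With 200 observations the second regression uses 198 rows, matching the Observations count in the Excel output.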
The vector autoregressive model (VAR model) for the log of US Real Personal Disposable
Income (Yt) and the log of US Real Personal Consumption Expenditure (Ct) is as follows. For
illustrative purposes, the order of the lag in this example has been restricted to 1.
(13.7)
(13.8)
     I         J         K         L
3    ΔC_t      ΔY_t      ΔC_t-1    ΔY_t-1
4    =B4-B3    =C4-C3    =B3-B2    =C3-C2
Copy the content of cells I4:L4 to cells I5:L201. Here is how your table should look (only the
first rows are shown below; "…" marks a value that is not legible in the source):
     I            J           K            L
3    ΔC_t         ΔY_t        ΔC_t-1       ΔY_t-1
4    -0.003968    0.000864    0.012573     0.005406
5    0.001343     -0.00061    -0.003968    0.000864
6    …            0.009061    0.001343     -0.00061
In the Regression dialog box, the Input Y Range should be J3:J201, and the Input X Range
should be K3:L201. Check the box next to Labels; uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it VAR Model Eq. for Y. Finally select OK.
The result of estimated equation (13.7) is (see also p. 504 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.334387691
R Square              0.111815128
Standard Error        0.008561528
Observations          198

ANOVA
              df     SS             MS             F              Significance F
Regression    2      0.001799428    0.000899714    12.27444346    9.52969E-06
Total         197    0.016092881

              Coefficients    Standard Error    t Stat     P-value        Lower 95%       Upper 95%
Intercept     0.006036673     0.000986078       6.1219     4.98959E-09    0.004091927     0.00798142
ΔC_t-1        0.475427604     0.097320409       4.8852     2.15226E-06    0.283480071     0.667375137
ΔY_t-1        -0.217167947    0.075172994       -2.8889    0.004303069    -0.365424427    -0.068911466
In the Regression dialog box, the Input Y Range should be I3:I201, and the Input X Range
should be K3:L201. Check the box next to Labels; uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it VAR Model Eq. for C. Finally select OK.
The result of estimated equation (13.8) is (see also p. 504 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
R Square              0.120487027
Adjusted R Square     0.111466381
Standard Error        0.006575419

ANOVA
              df     SS             MS             F          Significance F
Regression    2      0.001154993    0.000577497    13.3568    3.66117E-06
Residual      195    0.008431046    4.32361E-05
$$y_t = \rho y_{t-1} + v_t \qquad (13.9)$$
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it simulated data.
     A      B      C    D
1    ρ =    0.9         y_t
2                       1
3                       =$B$1*D2
Copy the content of cell D3 to cells D4:D31. Here is how your table should look (only the first
five values are shown below):
     A      B      C    D
1    ρ =    0.9         y_t
2                       1
3                       0.9
4                       0.81
5                       0.729
6                       0.6561
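The column of values above is the impulse response of the AR(1) process in (13.9) to a unit shock with no further shocks. A minimal Python sketch of the same recursion follows; the function name is hypothetical and the 30 periods mirror cells D2:D31.

```python
def ar1_impulse_response(rho, shock=1.0, periods=30):
    """Path of y after a one-time shock when y[t] = rho * y[t-1] and no later shocks occur."""
    path = [shock]
    for _ in range(periods - 1):
        path.append(rho * path[-1])
    return path


response = ar1_impulse_response(0.9)
print([round(v, 4) for v in response[:5]])
```

Because |ρ| < 1 the response decays geometrically toward zero, which is the shape seen in the line chart.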
Select the Insert tab located next to the Home tab. Select D1:D31. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 13.4, p. 505 in Principles of Econometrics, 4e):
[Figure: line chart of the impulse response of y, declining from 1 toward 0 over periods 1 to 30]
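$$y_t = \delta_{10} + \delta_{11} y_{t-1} + \delta_{12} x_{t-1} + v_t^y$$
$$x_t = \delta_{20} + \delta_{21} y_{t-1} + \delta_{22} x_{t-1} + v_t^x \qquad (13.10)$$
where the errors $v_t^y$ and $v_t^x$ are independent of each other (contemporaneously uncorrelated);
$v^y \sim N(0, \sigma_y^2)$ and $v^x \sim N(0, \sigma_x^2)$.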
In this case, there are two possible shocks to the system: one to y and the other to x. Thus we are
interested in four impulse response functions: the effect of a shock to y on the time-paths of y
and x, and the effect of a shock to x on the time-paths of y and x.
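First, let us consider what happens when there is a one standard deviation shock to y, so that
$v_1^y = \sigma_y$ and $v_t^y = 0$ for t > 1; assume $v_t^x = 0$ for all t.
We take $\sigma_y = 1$, $\delta_{11} = 0.7$ and $\delta_{12} = 0.2$, $\delta_{21} = 0.3$ and $\delta_{22} = 0.6$.
Note: this implies $y_1 = 1$ and $x_1 = 0$. For t > 1, $y_t$ and $x_t$ are given by equations (13.12) and
(13.13):
$$y_t = \delta_{10} + \delta_{11} y_{t-1} + \delta_{12} x_{t-1} \qquad (13.12)$$
$$x_t = \delta_{20} + \delta_{21} y_{t-1} + \delta_{22} x_{t-1} \qquad (13.13)$$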
     F        G
1    δ10 =    0
2    δ11 =    0.7
3    δ12 =    0.2
4    δ20 =    0
5    δ21 =    0.3
6    δ22 =    0.6
In cells I1:K3 enter the following labels, values and formulas. In the last row, you will find the
numbers of the equations used, if any.
     I              J                         K
1    Shock to y:    y_t                       x_t
2                   1                         0
3                   =$G$1+$G$2*J2+$G$3*K2     =$G$4+$G$5*J2+$G$6*K2
                    (13.12)                   (13.13)
Copy the content of cells J3:K3 to cells J4:K31. Here is how your table should look (only the
first five values are shown below):
     F        G      H    I              J         K
1    δ10 =    0           Shock to y:    y_t       x_t
2    δ11 =    0.7                        1         0
3    δ12 =    0.2                        0.7       0.3
4    δ20 =    0                          0.55      0.39
5    δ21 =    0.3                        0.463     0.399
6    δ22 =    0.6                        0.4039    0.3783
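The spreadsheet recursion in columns J and K can be mirrored in a few lines of Python as a cross-check. This is a sketch under the parameter values listed in cells G1:G6; the function name is hypothetical, and the starting values (y1, x1) are passed in so that the same function also covers a shock to x.

```python
def var_impulse_response(delta, y1, x1, periods=30):
    """Iterate equations (13.12) and (13.13) forward from the starting values (y1, x1)."""
    d10, d11, d12, d20, d21, d22 = delta
    path = [(y1, x1)]
    for _ in range(periods - 1):
        y_prev, x_prev = path[-1]
        y_next = d10 + d11 * y_prev + d12 * x_prev
        x_next = d20 + d21 * y_prev + d22 * x_prev
        path.append((y_next, x_next))
    return path


# Parameter values as entered in cells G1:G6.
DELTA = (0.0, 0.7, 0.2, 0.0, 0.3, 0.6)

shock_to_y = var_impulse_response(DELTA, y1=1.0, x1=0.0)
shock_to_x = var_impulse_response(DELTA, y1=0.0, x1=2.0)
for y_val, x_val in shock_to_y[:5]:
    print(round(y_val, 4), round(x_val, 4))
```

Both variables must be updated from the previous row's values, which is why the loop computes `y_next` and `x_next` before appending.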
Select the Insert tab located next to the Home tab. Select J1:J31. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 13.5, p. 507 in Principles of Econometrics, 4e). We also
show the response of x to y, plotted by selecting cells K1:K31.
[Figure: two line charts over periods 1 to 29, the response of y to y (left) and the response of x to y (right)]
Note that the figures above look slightly different from the ones found in Principles of
Econometrics, 4e. The difference is explained by the fact that in the above figures we did not plot
the $y_0$ and $x_0$ values, but started instead with $y_1$ and $x_1$.
Next, let us consider what happens when there is a one standard deviation shock to x, so that
$v_1^x = \sigma_x$ and $v_t^x = 0$ for t > 1; assume $v_t^y = 0$ for all t.
We further assume the following numerical value: $\sigma_x = 2$. Note: this implies $y_1 = 0$ and $x_1 = 2$.
For t > 1, $y_t$ and $x_t$ are given by equations (13.12) and (13.13).
In cells M1:O3 enter the following labels, values and formulas. In the last row, you will find the
numbers of the equations used, if any.
     M              N                         O
1    Shock to x:    y_t                       x_t
2                   0                         2
3                   =$G$1+$G$2*N2+$G$3*O2     =$G$4+$G$5*N2+$G$6*O2
                    (13.12)                   (13.13)
Copy the content of cells N3:O3 to cells N4:O31. Here is how your table should look (only the
first five values are shown below):
     M              N         O
1    Shock to x:    y_t       x_t
2                   0         2
3                   0.4       1.2
4                   0.52      0.84
5                   0.532     0.66
6                   0.5044    0.5556
[Figure: two line charts over periods 1 to 29, "Response of y to x" (left) and "Response of x to x" (right)]
CHAPTER 14
CHAPTER OUTLINE
14.1 Time-Varying Volatility
14.1.1 Returns Data
14.1.2 Simulated Data
14.2 Testing and Forecasting
14.2.1 Testing for ARCH Effects
14.2.1a Time Series and Histogram
14.2.1b Lagrange Multiplier Test
14.2.2 Forecasting Volatility
14.3 Extensions
14.3.1 The GARCH Model
14.3.2 The T-GARCH Model
14.3.3 The GARCH-In-Mean Model
Open the Excel file returns. Save your file as POE Chapter 14. Rename sheet 1 returns data.
Insert a new column to the left of the column labeled nasdaq. In your new cells A1:A13, enter
the following label and values.
      A
1     m*-year
2     m1-1988
3     m2-1988
4     m3-1988
5     m4-1988
6     m5-1988
7     m6-1988
8     m7-1988
9     m8-1988
10    m9-1988
11    m10-1988
12    m11-1988
13    m12-1988
Time-Varying Volatility and ARCH Models 329
Select cells A2:A13, move your cursor to the lower right corner of your selection until it turns
into a skinny cross, left-click, hold it and drag it down to cell A272.
Excel recognizes the series and automatically completes it for you. Here is how your table should
look (only the last five values are shown below):
       A
268    m3-2010
269    m4-2010
270    m5-2010
271    m6-2010
272    m7-2010
Next, we plot the time series of the monthly returns to the United States Nasdaq stock price index
(NASDAQ).
Select the Insert tab located next to the Home tab. Select A1:B272. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 14.1(a), p. 520 in Principles of Econometrics, 4e):
[Figure: line chart of monthly Nasdaq returns; the vertical axis runs from -30 to 20]
The values of this series change rapidly from period to period in an apparently unpredictable
manner; we say the series is volatile. Furthermore, there are periods when large changes are
followed by further large changes and periods when small changes are followed by further small
changes. In this case, the series is said to display time-varying volatility as well as "clustering" of
changes.
We proceed as we have done before in Section 4.6.1. First, we create a BIN column. In cell G1,
type BIN. The bin values will determine the range of values for each column of the histogram.
The bin values have to be given in ascending order. Starting with the lowest bin value, a value
will be counted in a particular bin if it is equal to or less than the bin value (and greater than
the previous bin value).
Note that econometric packages such as SAS or Stata automate the choice of the number and
width of bins. Thus the figures they produce might differ slightly from ours.
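The counting rule described above (a value falls in the first bin whose bin value it does not exceed, and anything above the last bin value goes to a final overflow group) can be sketched in Python as follows. The function name and the small data set are hypothetical and serve only to illustrate the rule.

```python
import bisect


def excel_style_histogram(values, bins):
    """Count values per bin: a value v is counted in the first bin with bin value >= v.

    `bins` must be in ascending order. The last element of the result is the
    overflow count for values greater than the largest bin value.
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        index = bisect.bisect_left(bins, v)  # position of first bin value >= v
        counts[index] += 1
    return counts


bins = [-30, -27.5, -25, 0]
sample = [-30, -29, -27.5, 0, 31]
print(excel_style_histogram(sample, bins))
```

Here -30 lands in the first bin, -29 and -27.5 in the second, 0 in the fourth, and 31 in the overflow group.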
Fill in the bin values as shown below. Note that all you need to do is enter the first two values,
-30 and -27.5, select cells G2:G3, move your cursor to the lower right corner of your selection
until it turns into a skinny cross, left-click, hold it and drag it down to cell G26:
Excel recognizes the series and automatically completes it for you.
      G
1     BIN
2     -30
3     -27.5
...
24    25
25    27.5
26    30
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.
A Histogram dialog box pops up. For the Input Range, specify B2:B272; for the Bin Range,
specify G2:G26. The Input Range indicates the data set Excel will look at to determine how
many values are counted in each bin of the Bin Range. Check the New Worksheet Ply option
and name it US Nasdaq Histogram; check the box next to Chart Output. Finally, select OK.
Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap. Select Close.
Go to the Border Color tab and select Solid line, choose a different Color if you would like.
Select Close.
Finally, delete the Legend, and increase the size of the Chart area (see Section 2.3.4 for more
details on that). After editing, the result is (see Figure 14.2(a), p. 521 in Principles of
Econometrics, 4e):
[Figure: histogram of monthly Nasdaq returns; vertical axis: Frequency, from 0 to 50]
We would like to draw a normal distribution on top of this histogram so we can better assess
whether or not the returns display normal properties.
Go back to your returns data worksheet. In cells I1:J4, enter the following labels and formulas.
     I                       J
1                            Nasdaq
2    sample mean =           =AVERAGE(B2:B272)
3    sample variance =       =VAR(B2:B272)
4    standard deviation =    =SQRT(J3)
     L            M
1    Mid-point    NormalNasdaq
2    -31.25       =NORMDIST(L2,$J$2,$J$4,FALSE)
3    -28.75
In column L, we specify the mid-point or mid-value of the bins or class intervals we used to
construct the US Nasdaq histogram. In column M, we compute the normal distribution values
corresponding to those mid-point values, where the normal distribution is specified to have a
mean and variance corresponding to the sample mean and variance of the monthly returns of US
Nasdaq.
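As a cross-check on the NORMDIST(x, mean, standard deviation, FALSE) column, the normal density at each bin mid-point can be computed directly. The sketch below is illustrative only: the function name is hypothetical, and the mean and standard deviation are the rounded sample values that the worksheet reports (0.708548 and 6.808318), typed in by hand.

```python
import math


def normal_density(x, mean, sd):
    """Normal probability density, the quantity NORMDIST returns with cumulative = FALSE."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# 25 mid-points, from -31.25 up to 28.75 in steps of 2.5 (cells L2:L26).
midpoints = [-31.25 + 2.5 * k for k in range(25)]

SAMPLE_MEAN = 0.708548   # rounded value shown in cell J2
SAMPLE_SD = 6.808318     # rounded value shown in cell J4
densities = [normal_density(m, SAMPLE_MEAN, SAMPLE_SD) for m in midpoints]
print(midpoints[:3], densities[:3])
```

The densities are tiny in the far tails and peak near the sample mean, which is why the added series first appears flat at the bottom of the histogram until it is moved to a secondary axis.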
Select cells L2:L3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross, left-click, hold it and drag it down to cell L26: Excel recognizes
the series and automatically completes it for you.
Copy cell M2 to cells M3:M26. Here is how your table should look (only the first five values are
shown below):
     I                       J           K    L            M
1                            Nasdaq           Mid-point    NormalNasdaq
2    sample mean =           0.708548         -31.25       9.62135E-07
3    sample variance =       46.35319         -28.75       5.0411E-06
4    standard deviation =    6.808318         -26.25       2.30812E-05
5                                             -23.75       9.2349E-05
6                                             -21.25       0.000322886
Go back to your US Nasdaq Histogram worksheet. Select the histogram, right-click and choose
Select Data on the list of options that pops up. In the Select Data Source dialog box, select Add.
In the Edit Series dialog box, specify the Series values to be M2:M26 from the returns data
worksheet. Finally select OK. The Select Data Source dialog box reappears. Select OK
one more time.
The series you just added is barely visible, at the bottom of your plot area. Select it, right-click
and select Change Series Chart Type. In the Change Chart Type dialog box, select Line. In
the list of Line charts, select Line again. Finally select OK.
Your series is now a little bit more visible, still at the bottom of your plot area. Select it again,
right-click and select Format Data Series this time. In the Series Options, select Secondary
Axis.
In the Line Style options, select Smoothed Line. Finally, select Close.
Select the right-vertical axis, right-click and select Format Axis. Select the Axis Options tab,
specify Fixed Minimum at 0.0 and Fixed Maximum at 0.09. Finally select Close.
The result is (see also Figure 14.2(a), p. 521 in Principles of Econometrics, 4e):
[Figure: histogram of monthly Nasdaq returns (Frequency, left axis, 0 to 50) with the fitted normal density overlaid (right axis, 0 to 0.08)]
Note that there are more observations around the mean and in the tails. Distributions with these
properties-more peaked around the mean and relatively fat tails-are said to be leptokurtic.
You can proceed similarly to plot the time series and histograms of the monthly returns to the
Australian All Ordinaries stock price index (ALLODS), the Japanese Nikkei stock price index
(NIKKEI), and the United Kingdom FTSE stock price index (FTSE). They are shown in Figures
14.1(b)-(d) and 14.2(b)-(d), pp. 520-521 of Principles of Econometrics, 4e.
$$e_t \mid I_{t-1} \sim N(0, h_t) \qquad (14.1b)$$
$$h_t = \alpha_0 + \alpha_1 e_{t-1}^2 \qquad (14.1c)$$
where $\beta_0 = 0$, $\alpha_0 = 1$ and $\alpha_1 = 0$. Note: these values imply $h_t = 1$, which means that
$var(e_t \mid I_{t-1})$ is constant and not time varying.
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it simulated data.
     A       B    C    D      E
1    β0 =    0         e_t    y_t
2    α0 =    1                =$B$1+D2
3    α1 =    0
In column D, we generate a sample of 200 random numbers from a normal distribution with
mean 0 and standard deviation 1.
Select the Data tab, in the middle of your tab list located on top of your screen. On the Analysis
group of commands, to the far right, select Data Analysis. The Data Analysis dialog box pops
up. In it, select Random Number Generation (you might need to use the scroll up and down bar
to the right of the Analysis Tools window to find it), then select OK.
A Random Number Generation dialog box pops up. We need to generate one set of random
numbers for our random errors, so we specify 1 in the Number of Variables window. We would
like to generate 200 random numbers, so we specify 200 in the Number of Random Numbers
window. We select Normal in the Distribution window; the selected Parameters should be
Mean equal to 0, and Standard deviation equal to 1. Select Output Range and specify it to be
D2:D201. Finally, we select OK.
After you copy the content of cell E2 to cells E3:E201, here is how your table should look (only
the first five values are shown below):
     A       B    C    D           E
1    β0 =    0         e_t         y_t
2    α0 =    1         1.96313     1.96313
3    α1 =    0         -0.21504    -0.21504
4                      0.625423    0.625423
5                      -0.09043    -0.09043
6                      -0.13262    -0.13262
Note: you will obtain a different random sample than the ones we obtained, so your et and Yt
values should be slightly different than the ones reported above.
Select the Insert tab located next to the Home tab. Select E1:E201. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.
After editing, the result is (see also Figure 14.3(a), p. 522 in Principles of Econometrics, 4e):
[Figure: line plot of the 200 simulated values of y_t]
Next, we standardize the simulated data we just generated. That is, for each observation we
subtract the sample mean and divide by the sample standard deviation.
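The same simulate-then-standardize step can be sketched in Python. This is an illustration, not a replacement for the worksheet: the seed and variable names are arbitrary choices, and `statistics.stdev` uses the n - 1 divisor, which matches Excel's VAR function.

```python
import random
import statistics

random.seed(123)  # arbitrary seed so the illustration is repeatable

# 200 draws from N(0, 1); with beta0 = 0 the simulated y equals the error term.
y = [random.gauss(0.0, 1.0) for _ in range(200)]

sample_mean = statistics.mean(y)
sample_sd = statistics.stdev(y)  # sample standard deviation (divisor n - 1)

standardized = [(value - sample_mean) / sample_sd for value in y]

print(round(statistics.mean(standardized), 6), round(statistics.stdev(standardized), 6))
```

By construction the standardized series has sample mean 0 and sample standard deviation 1, whatever random sample is drawn.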
G H I J
1 Yt standardized Yt
2 sample mean = =AVERAGE(E2:E201) =(E2-$H$2)/$H$4
3 sample variance = =VAR(E2:E201)
4 standard deviation = =SQRT(H3)
After you copy the content of cell J2 to cells J3:J201, here is how your table should look (only
the first five values are shown below):
     G                              H           I    J
1                                   y_t              standardized y_t
2    sample mean =                  0.005703         2.1530
3    sample variance =              0.826602         -0.2428
4    sample standard deviation =    0.909177         0.681628202
5                                                    -0.105734753
6                                                    -0.152135987
Remember that your numbers are going to be different from ours since you are working with a
different random sample.
To plot the histogram of the standardized Yt, we proceed as we have done in Section 14.1.1.
L
1 BIN
2 -4
3 =L2+1/3
Copy the content of cell L3 to cells L4:L26. Here is how your table should look (only the first
five values are shown below):
     L
1    BIN
2    -4
3    -3.66667
4    -3.33333
5    -3
6    -2.66667
In the Histogram dialog box, the Input Range should be J2:J201, and the Bin Range should be
L2:L26. Check the New Worksheet Ply option and name it Simulated Data Histogram; check
the box next to Chart Output. Finally, select OK.
Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap. In the Border Color tab, select Solid line, and
change the Color to black. Finally select Close.
After editing, the result is (see Figure 14.4(a), p. 522 in Principles of Econometrics, 4e):
[Figure: histogram of the standardized simulated data]
Note: we obtain a different histogram than the one illustrated in Figure 14.4(a) since ours is based
on a different random sample. Yours will be different than ours and that of the textbook for the
same reason.
We would like to draw a normal distribution on top of this histogram so we can better assess
whether or not the ARCH(1) model with constant variance displays normal properties.
Go back to your simulated data worksheet. In cells N1:O4, enter the following labels and
formulas.
     N                       O
1                            standardized y_t
2    sample mean =           =AVERAGE(J2:J201)
3    sample variance =       =VAR(J2:J201)
4    standard deviation =    =SQRT(O3)
You should find that the sample mean of the standardized y_t is 0 and the variance is 1.
     Q                   R
1    Mid-point           StandardNormal
2    =L2-0.5*(L3-L2)     =NORMDIST(Q2,0,1,FALSE)
3    =(L2+L3)/2
In column Q, we specify the mid-point or mid-value of the bins or class intervals we used to
construct the simulated data histogram. In column R, we compute the normal distribution values
corresponding to those mid-point values, where the normal distribution is specified to have
mean 0 and variance 1.
Copy the content of cell Q3 to cells Q4:Q26, and copy the content of cell R2 to cells R3:R26.
Here is how your table should look (only the first five values are shown below):
Go back to your Simulated Data Histogram worksheet. Select the histogram, right-click and
choose Select Data on the list of options that pops up. In the Select Data Source dialog box,
select Add. In the Edit Series dialog box, specify the Series values to be R2:R26 from the
simulated data worksheet. Finally select OK. The Select Data Source dialog box reappears.
Select OK one more time.
The series you just added is barely visible, at the bottom of your plot area. Select it, right-click
and select Change Series Chart Type. In the Change Chart Type dialog box, select Line. In
the list of Line charts, select Line again. Finally select OK.
Your series is now a little bit more visible, still at the bottom of your plot area. Select it again,
right-click and select Format Data Series this time. In the Series Options, select Secondary
Axis.
Select the right-vertical axis, right-click and select Format Axis. Select the Axis Options tab,
specify Fixed Minimum at 0.0 and Fixed Maximum at 0.85. Finally select Close.
The result is (see also Figure 14.4(a), p. 522 in Principles of Econometrics, 4e):
[Figure: histogram of the standardized simulated data with the standard normal density overlaid (right axis, 0 to 0.8)]
The bottom panels in Figures 14.3 and 14.4 of Principles of Econometrics, 4e (p. 522) illustrate
the case of a time-varying variance. It would be much more complicated to generate such a
series in Excel; we will not investigate this problem at this point.
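Outside Excel, a series with time-varying variance is straightforward to generate, because each variance depends only on the previous error. The sketch below is an illustration under stated assumptions: it uses the ARCH(1) recursion $h_t = \alpha_0 + \alpha_1 e_{t-1}^2$ with $e_t = \sqrt{h_t}\, z_t$ and $z_t \sim N(0,1)$, a starting value of zero for the pre-sample error, and a hypothetical value $\alpha_1 = 0.8$; the function name is also hypothetical.

```python
import math
import random


def simulate_arch1(n, beta0, alpha0, alpha1, seed=None):
    """Simulate y[t] = beta0 + e[t], where e[t] has conditional variance h[t] = alpha0 + alpha1 * e[t-1]**2."""
    rng = random.Random(seed)
    e_prev = 0.0  # assumed pre-sample error
    y, h = [], []
    for _ in range(n):
        h_t = alpha0 + alpha1 * e_prev ** 2
        e_t = math.sqrt(h_t) * rng.gauss(0.0, 1.0)
        y.append(beta0 + e_t)
        h.append(h_t)
        e_prev = e_t
    return y, h


# alpha1 = 0 reproduces the constant-variance case simulated in the worksheet.
y_const, h_const = simulate_arch1(200, beta0=0.0, alpha0=1.0, alpha1=0.0, seed=1)
# alpha1 = 0.8 is a hypothetical value that produces volatility clustering.
y_arch, h_arch = simulate_arch1(200, beta0=0.0, alpha0=1.0, alpha1=0.8, seed=1)
print(max(h_const), round(max(h_arch), 3))
```

Plotting `y_arch` against time shows bursts of large changes following large changes, while `y_const` fluctuates with a constant spread.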
Open the Excel file byd. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 14 in one file, create a new worksheet in your POE
Chapter 14 Excel file, rename it byd data, and in it, copy the data set you just opened.
Select the Insert tab located next to the Home tab. Select A1:A501. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.
After editing, the result is (see also top panel of Figure 14.5, p. 524 in Principles of Econometrics,
4e):
[Figure: time-series plot of returns, titled "BYD Lighting"]
To plot the histogram of returns for BYD Lighting, we first create our BIN column.
D
1 BIN
2 -8
3 -7.5
Select cells D2:D3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross, left-click, hold it and drag it down to cell D34: Excel recognizes
the series and automatically completes it for you.
      D
1     BIN
2     -8
3     -7.5
...
32    7
33    7.5
34    8
In the Histogram dialog box, the Input Range should be A2:A501, and the Bin Range
should be D2:D34. Check the New Worksheet Ply option and name it BYD Lighting
Histogram; check the box next to Chart Output. Finally, select OK.
Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap. In the Border Color tab, select Solid line, and
change the Color to black. Finally select Close.
After editing, the result is (see lower panel of Figure 14.5, p. 524 in Principles of Econometrics,
4e):
[Figure: histogram of BYD Lighting returns; horizontal axis from -8 to 8]
$$r_t = \beta_0 + e_t \qquad (14.2)$$
where $r_t$ is the monthly return on shares of BYD, the hypothetical company BrightenYourDay
Lighting.
$$r_t = \beta_0 x_t + e_t, \quad x_t = 1 \qquad (14.3)$$
We estimate equation (14.3) instead of (14.2). First, we create our explanatory variable x. In cells
B1:B2 of your byd data worksheet, enter the following label and value.
     B
1    x
2    1
Copy the content of cell B2 to cells B3:B501. Here is how your table should look (only the first
five values are shown below):
     B
1    x
2    1
3    1
4    1
5    1
6    1
In the Regression dialog box, the Input Y Range should be A1:A501, and the Input X Range
should be B1:B501. Check the boxes next to Labels, Constant is Zero and Residuals. Select
New Worksheet Ply and name it Mean Equation for BYD. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.673382652
R Square              0.453444196
Adjusted R Square     0.451440186
Standard Error        1.185024524
Observations          500

ANOVA
              df    SS             MS             F              Significance F
Regression    1     581.3592124    581.3592124    413.9900303    1.97762E-67

RESIDUAL OUTPUT
Observation   Predicted r    Residuals
1             1.07829422     -1.07829422
2             1.07829422     -1.30548722
3             1.07829422     0.27254878
A Lagrange multiplier (LM) test (used previously in Sections 9.3.2 and 8.2.2a) is used to test for
the presence of ARCH effects.
To test for first-order ARCH, we first consider the following auxiliary regression:
$$\hat{e}_t^2 = \gamma_0 + \gamma_1 \hat{e}_{t-1}^2 + v_t \qquad (14.4)$$
where $\hat{e}_t^2$ are the squared residuals and $\hat{e}_{t-1}^2$ are the lagged squared residuals from model (14.2)
or (14.3).
The null and alternative hypotheses for a test of the presence of ARCH effects based on the
auxiliary regression (14.4) are: $H_0: \gamma_1 = 0$ and $H_1: \gamma_1 \neq 0$.
When $H_0$ is true, there are no ARCH effects, and the sample size (T - q), where q is the order of
the lag, multiplied by the $R^2$ goodness-of-fit statistic from (14.4) has a chi-square distribution
with m = S - 1 degrees of freedom, where S is the number of parameters in (14.4); note that
m = S - 1 = q:
$$(T - q) R^2 \sim \chi^2_{(m = S - 1 = q)} \qquad (14.5)$$
In cells D25:E26 of your Mean Equation for BYD worksheet, enter the following labels and
formulas:
      D               E
25    Residuals_t^2   Residuals_t-1^2
26    =C26^2          =C25^2
Copy the content of cells D26:E26 to cells D27:E524. Here is how your table should look (only
the first five values are shown below):
      D               E
25    Residuals_t^2   Residuals_t-1^2
26    1.704296882     1.162718425
27    0.074282837     1.704296882
28    0.000743368     0.074282837
29    1.025754158     0.000743368
30    0.363482116     1.025754158
In the Regression dialog box, the Input Y Range should be D25:D524, and the Input X Range
should be E25:E524. Check the box next to Labels. Uncheck the boxes next to Constant is Zero
and Residuals. Select New Worksheet Ply and name it Auxiliary Regression. Finally select
OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.352942118
R Square              0.124568
Adjusted R Square     0.122807
Standard Error        2.450017971
Observations          499

ANOVA
              df     SS             MS             F              Significance F
Regression    1      424.501319     424.501319     70.71979859    4.3871E-15
Residual      497    2983.286254    6.002588067
The results we are going to use for the Lagrange multiplier test are highlighted in the above table.
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Lagrange Multiplier Test.
      A                           B                     C
1     Data Input                  N =                   ='Auxiliary Regression'!B8
2                                 S =                   ='Auxiliary Regression'!B12+1
3                                 R2 =                  ='Auxiliary Regression'!B5
4                                 α =
5
6     Computed Values             m =                   =C2-1
7                                 χ2-critical value =   =CHIINV(C4,C6)
8
9     Lagrange Multiplier Test    χ2 =                  =C1*C3
10                                Conclusion =          =IF(C9>=C7,"Reject Ho","Do Not Reject Ho")
11                                p-value =             =CHIDIST(C9,C6)
12                                Conclusion =          =IF(C11<=C4,"Reject Ho","Do Not Reject Ho")
At α = 0.05, the result of the test is (see also p. 524 in Principles of Econometrics, 4e):
      A                  B               C
1     Data Input         N =             499
2                        S =             2
3                        R² =            0.124568
4                        α =             0.05
6     Computed Values    m =             1
11                       p-value =       3.17E-15
12                       Conclusion =    Reject Ho
The value of the Lagrange multiplier statistic reported in Principles of Econometrics, 4e, p. 524,
is LM = (T - 1)R² = 499 × 0.124 = 61.876. Our calculation is slightly different because more
decimal places are used for R².
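As a cross-check outside Excel, the test can be sketched in Python. This is only an illustration under stated assumptions: the function name `arch_lm_test` is hypothetical, the default critical value is the 5% chi-square(1) value, and the p-value formula used is valid only for 1 degree of freedom (the ARCH(1) case considered here).

```python
import math

def arch_lm_test(n_obs, r_squared, critical_value=3.841459):
    """LM statistic (T - q) * R^2 for an ARCH test with q = 1 lag.

    n_obs is the number of observations used in the auxiliary regression,
    i.e. (T - q). The p-value below assumes exactly 1 degree of freedom.
    """
    lm = n_obs * r_squared
    # Chi-square(1) upper-tail probability: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    reject = lm >= critical_value
    return lm, p_value, reject

# Values taken from the Auxiliary Regression worksheet above
lm, p_value, reject = arch_lm_test(499, 0.124568)
print(round(lm, 3), p_value, reject)
```

With the full-precision R² the statistic is slightly larger than the rounded textbook value of 61.876, which is the difference the text points out.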
Equation (14.6) shows the results from estimating an ARCH(1) model applied to the monthly
returns from buying shares in the company BrightenYourDayLighting. These results are obtained
using econometrics software such as EViews, Stata or GRETL-computer manuals for those can
be found at http://www.principlesofeconometrics.com/. The estimated mean of the series is
described in (14.6a) while the estimated variance is given in (14.6b):
We can use the estimated model to forecast next period's return $r_{t+1}$ and the conditional volatility
$h_{t+1}$. For our case study of investing in BrightenYourDayLighting, the forecast return and
volatility are:
$\hat{r}_{t+1} = \hat{\beta}_0 = 1.063$ (14.7a)
$\hat{h}_{t+1} = \hat{\alpha}_0 + \hat{\alpha}_1 (r_t - \hat{\beta}_0)^2 = 0.642 + 0.569 (r_t - 1.063)^2$ (14.7b)
In cells F1:I4 enter the following labels, values and formula. In the last column, you will find the
numbers of the equations used, if any.
      F            G        H    I
1     ARCH(1)
2     β0-hat =     1.063         ht+1-hat
3     α0-hat =     0.642         =$G$3+$G$4*((A2-$G$2)^2)    (14.7b)
4     α1-hat =     0.569
Copy the content of cell I3 to cells I4:I501. Here is how your table should look (only the first five
values are shown below):
      F            G        H    I
1     ARCH(1)
2     β0-hat =     1.063         ht+1-hat
3     α0-hat =     0.642         1.284952
4     α1-hat =     0.569         1.589156
5                                0.689144
6                                0.64303
7                                1.20811
Select the Insert tab located next to the Home tab. Select I2:I501. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.
After editing, the result is (see also Figure 14.6 p. 525 in Principles of Econometrics, 4e):
[Figure: BYD Lighting; plotted series: ht-hat]
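The worksheet formula for (14.7b) can be mirrored in a few lines of Python. This is a minimal sketch, assuming the returns from column A are available as a list; the function name `arch1_variance` is hypothetical, and the default parameter values are the estimates quoted above.

```python
def arch1_variance(returns, beta0=1.063, alpha0=0.642, alpha1=0.569):
    """Return h_{t+1} = alpha0 + alpha1 * (r_t - beta0)^2 for each return r_t."""
    return [alpha0 + alpha1 * (r - beta0) ** 2 for r in returns]

# A return of 0 reproduces the first worksheet value in cell I3
print(round(arch1_variance([0.0])[0], 6))
```

Each forecast depends only on the current return, so the list comprehension plays the role of copying cell I3 down the column.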
14.3 EXTENSIONS
The GARCH model, or generalized ARCH model, allows capturing long lagged effects with few
parameters. The general GARCH(p,q) model has p lagged h terms and q lagged e² terms. The
conditional variance function of a GARCH(1,1) model is given by:
$h_t = \delta + \alpha_1 e_{t-1}^2 + \beta_1 h_{t-1}$ (14.8)
where $\alpha_1 + \beta_1 < 1$.
The returns to shares in our BrightenYourDayLighting example have been re-estimated under the
new GARCH(1,1) model:
$\hat{r}_t = 1.049$ (14.9a)
$\hat{h}_t = 0.401 + 0.492 \hat{e}_{t-1}^2 + 0.238 \hat{h}_{t-1}$ (14.9b)
We use the estimated GARCH(1,1) model to forecast next period's return $r_{t+1}$ and the
conditional volatility $h_{t+1}$:
$\hat{r}_{t+1} = \hat{\beta}_0 = 1.049$ (14.10a)
$\hat{h}_{t+1} = \hat{\delta} + \hat{\alpha}_1 (r_t - \hat{\beta}_0)^2 + \hat{\beta}_1 \hat{h}_t = 0.401 + 0.492 (r_t - 1.049)^2 + 0.238 \hat{h}_t$ (14.10b)
In cells K1:N5 enter the following labels, values and formulas. In the last column, you will find
the numbers of the equations used, if any. Note that for our first ht+1 value, there is no ht value
available, hence the shortened version of equation (14.10b) in cell N3.
      K              L        M    N
1     GARCH(1,1)
2     β0-hat =       1.049         ht+1-hat
3     δ-hat =        0.401         =$L$3+$L$4*((A2-$L$2)^2)            (14.10b)
4     α1-hat =       0.492         =$L$3+$L$4*((A3-$L$2)^2)+$L$5*N3    (14.10b)
5     β1-hat =       0.238
Copy the content of cell N4 to cells N5:N501. Here is how your table should look (only the first
five values are shown below):
      K              L        M    N
1     GARCH(1,1)
2     β0-hat =       1.049         ht+1-hat
3     δ-hat =        0.401         0.942397
Select the Insert tab located next to the Home tab. Select N2:N501. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.
After editing, the result is (see also Figure 14.7b p. 527 in Principles of Econometrics, 4e):
[Figure: GARCH(1,1); plotted series: ht-hat; horizontal axis from 0 to 500]
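The recursion in column N can be sketched in Python as follows. This is an illustration only: the function name `garch11_variance` is hypothetical, the defaults are the estimates quoted above, and, as in cell N3, the first value omits the lagged-variance term because no earlier value exists.

```python
def garch11_variance(returns, beta0=1.049, delta=0.401, alpha1=0.492, beta1=0.238):
    """Return the GARCH(1,1) variance forecasts, one per return, as in column N."""
    h = []
    for t, r in enumerate(returns):
        value = delta + alpha1 * (r - beta0) ** 2
        if t > 0:
            # From the second value onward, add the lagged variance term
            value += beta1 * h[-1]
        h.append(value)
    return h

# A first return of 0 reproduces the worksheet value in cell N3
print(round(garch11_variance([0.0])[0], 6))
```

Unlike the ARCH(1) case, each value depends on the previous one, which is why the worksheet needs a different formula in N3 than in N4 onward.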
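In the T-GARCH version of the model, the specification of the conditional variance is:
$h_t = \delta + \alpha_1 e_{t-1}^2 + \gamma d_{t-1} e_{t-1}^2 + \beta_1 h_{t-1}$, where $d_t = 1$ if $e_t < 0$ and $d_t = 0$ otherwise (14.11)
The returns to shares in our BrightenYourDayLighting example have been re-estimated with a
T-GARCH(1,1) specification:
$\hat{r}_t = 0.994$ (14.12a)
$\hat{h}_t = 0.356 + 0.263 \hat{e}_{t-1}^2 + 0.492 d_{t-1} \hat{e}_{t-1}^2 + 0.287 \hat{h}_{t-1}$ (14.12b)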
In cells P1:U6 enter the following labels, values and formulas. In the last column, you will find
the numbers of the equations used, if any. Note that for our first ht value, there is no ht-1 value
available, hence the shortened version of equation (14.12b) in cell U3.
      P                Q        R    S
1     T-GARCH(1,1)                   et
2     β0-hat =         0.994         =A2-$Q$2
3     δ-hat =          0.356
4     α1-hat =         0.263
5     γ-hat =          0.492
6     β1-hat =         0.287
      T                U
1     dt
2     =IF(S2<0,1,0)    ht-hat
3                      =$Q$3+$Q$4*(S2^2)+$Q$5*T2*(S2^2)            (14.12b)
4                      =$Q$3+$Q$4*(S3^2)+$Q$5*T3*(S3^2)+$Q$6*U3    (14.12b)
Copy the content of cells S2:T2 to cells S3:T501 and copy the content of cell U4 to cells
U5:U501. Here is how your table should look (only the first five values are shown below):
      P                Q        R    S         T     U
1     T-GARCH(1,1)                   et        dt
2     β0-hat =         0.994         -0.994    1     ht-hat
After editing, the result is (see also Figure 14.7d p. 527 in Principles of Econometrics, 4e):
[Figure: T-GARCH(1,1); plotted series: ht-hat]
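The T-GARCH worksheet columns S, T and U can be sketched in Python. This is a hedged illustration: the function name `tgarch11_variance` is hypothetical, the defaults are the estimates quoted above, and the asymmetry indicator follows the IF formula in column T (1 when the residual is negative, 0 otherwise).

```python
def tgarch11_variance(returns, beta0=0.994, delta=0.356, alpha1=0.263,
                      gamma=0.492, beta1=0.287):
    """Return the T-GARCH(1,1) variance series, one value per return."""
    h = []
    for t, r in enumerate(returns):
        e = r - beta0               # residual, column S
        d = 1 if e < 0 else 0       # indicator for a negative residual, column T
        value = delta + alpha1 * e ** 2 + gamma * d * e ** 2
        if t > 0:
            value += beta1 * h[-1]  # lagged variance, available from the second value on
        h.append(value)
    return h

# A negative residual raises the variance more than a positive one of similar size
print(tgarch11_variance([0.0])[0], tgarch11_variance([2.0])[0])
```

The two printed values illustrate the asymmetry: the residual is about one unit in absolute value in both cases, but only the negative one triggers the extra gamma term.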
$y_t = \beta_0 + \theta h_t + e_t$ (14.13a)
(14.13c)
We use the estimated GARCH-in-mean model to forecast conditional return and volatility.
In cells W1:AC7 enter the following labels, values and formulas. In the last row, you will find
the numbers of the equations used, if any. Note that for our first ht value, there is no ht-1 value
available, hence the shortened version of equation (14.14b) in cell AC3.
      W                 X        Y    Z                    AA
1     GARCH-in-mean                   et                   dt
2     β0-hat =          0.818         =A2-$X$2             =IF(Z2<0,"1","0")
3     θ-hat =           0.196         =A3-$X$2-$X$3*AC3
4     δ-hat =           0.370
5     α1-hat =          0.295
6     γ-hat =           0.321
7     β1-hat =          0.278
      AB                AC
1
2     E(rt)             ht-hat
3     =$X$2+$X$3*AC3    =$X$4+$X$5*(Z2^2)+$X$6*AA2*(Z2^2)
4                       =$X$4+$X$5*(Z3^2)+$X$6*AA3*(Z3^2)+$X$7*AC3
      (14.14a)          (14.14b)
Copy the content of cell Z3 to cells Z4:Z501, copy the content of cell AA2 to cells AA3:AA501,
copy the content of cell AB3 to cells AB4:AB501, and copy the content of cell AC4 to cells
AC5:AC501. Here is how your table should look (only the first five values are shown below):
      W                 X        Y    Z           AA    AB          AC
1     GARCH-in-mean                   et          dt
2     β0-hat =          0.818         -0.818      1     E(rt)       ht-hat
3     θ-hat =           0.196         -1.1985     1     0.971307    0.782181
4     δ-hat =           0.37          0.244278    0     1.106565    1.47227
5     α1-hat =          0.295
6     γ-hat =           0.321         -0.86944    1     0.934939
7     β1-hat =          0.278         0.666892          1.014297
After editing, the result is (see also Figure 14.7e p. 527 in Principles of Econometrics, 4e):
[Figure: GARCH-in-mean: E(rt); horizontal axis from 0 to 500]
[Figure: GARCH-in-mean: ht; horizontal axis from 0 to 500]
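The GARCH-in-mean worksheet interleaves three columns: the residual in Z depends on the variance in AC of the same row, which in turn depends on the residual of the previous row. A minimal Python sketch of that ordering follows; the function name `garch_in_mean` is hypothetical, the defaults are the estimates entered in X2:X7, and the output lists start at worksheet row 3.

```python
def garch_in_mean(returns, beta0=0.818, theta=0.196, delta=0.370,
                  alpha1=0.295, gamma=0.321, beta1=0.278):
    """Return (expected returns, variances) following the worksheet layout."""
    expected, variance = [], []
    e = returns[0] - beta0          # first residual: no variance term is available yet
    h_prev = None
    for r_next in returns[1:]:
        d = 1 if e < 0 else 0       # indicator for a negative residual
        h = delta + alpha1 * e ** 2 + gamma * d * e ** 2
        if h_prev is not None:
            h += beta1 * h_prev     # lagged variance from the second value onward
        variance.append(h)
        expected.append(beta0 + theta * h)
        e = r_next - beta0 - theta * h   # residual net of the variance-in-mean term
        h_prev = h
    return expected, variance

exp_r, var_h = garch_in_mean([0.0, -0.2272])
print(round(exp_r[0], 6), round(var_h[0], 6))
```

For n returns the sketch yields n - 1 values, matching the worksheet, where E(rt) and ht-hat begin one row below the first return.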
CHAPTER 15
Panel Data Models
Chapter Outline
15.1 Pooled Least Squares Estimates of Wage Equation
15.2 The Fixed Effects Model
  15.2.1 Estimates of Wage Equation for Small N
    15.2.1a The Least Squares Dummy Variable Estimator for Small N
    15.2.1b The Fixed Effects Estimator: Estimates of Wage Equation for N = 10
  15.2.2 Fixed Effects Estimates of Wage Equation from Complete Panel
15.3 The Random Effects Model
  15.3.1 Testing for Random Effects
  15.3.2 Random Effects Estimation of the Wage Equation
15.4 Sets of Regression Equations
  15.4.1 Estimation: Equal Coefficients, Equal Error Variances
  15.4.2 Estimation: Different Coefficients, Equal Error Variances
  15.4.3 Estimation: Different Coefficients, Different Error Variances
  15.4.4 Seemingly Unrelated Regressions: Testing for Contemporaneous Correlation
Open the Excel file nls_panel. Save your file as POE Chapter 15. Rename sheet 1 nls panel
data. The nls panel data contains information on a sample of N = 716 women who were
interviewed over T = 5 years: 1982, 1983, 1985, 1987 and 1988.
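$\ln(WAGE)_{it} = \beta_1 + \beta_2 EDUC_i + \beta_3 EXPER_{it} + \beta_4 EXPER_{it}^2 + \beta_5 TENURE_{it} + \beta_6 TENURE_{it}^2 + \beta_7 BLACK_i + \beta_8 SOUTH_{it} + \beta_9 UNION_{it} + e_{it}$ (15.1)
where EDUC measures years of education. EXPER measures total labor force experience, and its
square is measured by EXPER2. TENURE measures tenure in current job, and its square is
measured by TENURE2. BLACK, SOUTH and UNION are indicator variables.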
      T         U       V        W         X         Y          Z        AA       AB
1     lnwage    educ    exper    exper2    tenure    tenure2    black    south    union
2     =C2       =F2     =O2      =P2       =Q2       =R2        =M2      =L2      =N2
Copy the content of cells T2:AB2 to cells T3:AB3581. Here is how your table should look (only
the first five values are shown below):
      T           U       V           W           X           Y           Z        AA       AB
1     lnwage      educ    exper       exper2      tenure      tenure2     black    south    union
2     1.808289    12      7.666667    58.77777    7.666667    58.77777    1        0        1
3     1.863417    12      8.583333    73.67361    8.583333    73.67361    1        0        1
4     1.789367    12      10.17949    103.622     1.833333    3.361111    1        0        1
5     1.84653     12      12.17949    148.3399    3.75        14.0625     1        0        1
6     1.856449    12      13.62179    185.5533    5.25        27.5625     1        0        1
In the Regression dialog box, the Input Y Range should be Tl:T3581, and the Input X Range
should be Ul:AB3581. Check the box next to Labels and Residuals. Select New Worksheet Ply
and name it Pooled LS Wage Equation. Finally select OK.
The result is (see also Table 15.2 p. 543 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.570601
R Square             0.325585903
Adjusted R Square    0.324075
Standard Error       0.381975
Observations         3580

ANOVA
              df      SS             MS             F             Significance F
Regression    8       251.5350441    31.44188051    215.495803
Residual      3571    521.0261811    0.145904839
Total         3579    772.5612252

             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    0.476600026     0.056156          8.4871          3.06E-17       0.366499268     0.586700784
educ         0.071448792     0.002689342       26.56689212     4.5664E-142    0.066175893     0.076721691
exper        0.055685059     0.00860716        6.469620676     1.116E-10      0.038809616     0.072560501
exper2       -0.001147538    0.000361287       -3.176250626    0.001504632    -0.001855887    -0.000439188
tenure
tenure2
black        -0.116713867    0.015715895       -7.426485425    1.3874E-13     -0.147526899    -0.085900835
south        -0.106002565    0.014200329       -7.4648         1.04458E-13    -0.133845115    -0.078160016
union        0.132243201     0.014961648       8.838836        1.48971E-18    0.102909047     0.161577355

RESIDUAL OUTPUT
Observation    Predicted lnwage    Residuals
1              1.79510891          0.01318
2              1.835533305         0.027883695
3              1.823243235         -0.033876235
Again, we consider a wage equation model, but this time we are working with N = 10 women,
over the same period of T = 5 years. Furthermore, we assume that all behavioral differences
between individuals and over time are captured by the intercept. Assuming equal variances of the
error terms across individuals, this model can follow the dummy variable format of (15.2):
      AD               AE               AF               AG               AH
1     d1               d2               d3               d4               d5
2     =IF(A2=1,1,0)    =IF(A2=2,1,0)    =IF(A2=3,1,0)    =IF(A2=4,1,0)    =IF(A2=5,1,0)
      AI               AJ               AK               AL               AM
1     d6               d7               d8               d9               d10
2     =IF(A2=6,1,0)    =IF(A2=7,1,0)    =IF(A2=8,1,0)    =IF(A2=9,1,0)    =IF(A2=10,1,0)
      AN       AO        AP        AQ         AR
1     exper    exper2    tenure    tenure2    union
2     =O2      =P2       =Q2       =R2        =N2
Below cells AD1:AM1, we assign values to the dummy variables. In cell AD2, Excel is
instructed to look at the value contained in cell A2. If this value is equal to 1, i.e. if we are
looking at information regarding individual 1, then the dummy variable d1 is assigned the value
1, and 0 otherwise. In cell AE2, Excel is again instructed to look at the value contained in cell
A2. This time, if this value is equal to 2, i.e. if we are looking at information regarding individual
2, then the dummy variable d2 is assigned the value 1, and 0 otherwise. 1's and 0's are assigned
similarly to dummy variables d3-d10 (for more details on the IF function, see Section 3.1.4e).
Copy the content of cells AD2:AR2 to cells AD3:AR51. Here is how your table should look
(only the first five values are shown below):
     AD  AE  AF  AG  AH  AI  AJ  AK  AL  AM   AN          AO          AP          AQ          AR
1    d1  d2  d3  d4  d5  d6  d7  d8  d9  d10  exper       exper2      tenure      tenure2     union
2    1   0   0   0   0   0   0   0   0   0    7.666667    58.77777    7.666667    58.77777    1
3    1   0   0   0   0   0   0   0   0   0    8.583333    73.67361    8.583333    73.67361    1
4    1   0   0   0   0   0   0   0   0   0    10.17949    103.622     1.833333    3.361111    1
5    1   0   0   0   0   0   0   0   0   0    12.17949    148.3399    3.75        14.0625     1
6    1   0   0   0   0   0   0   0   0   0    13.62179    185.5533    5.25        27.5625     1
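The ten IF formulas amount to one indicator column per individual. A short Python sketch of the same construction follows; the function name `individual_dummies` is hypothetical, and the identifiers are assumed to be the integers 1 to N stored in column A.

```python
def individual_dummies(ids, n_individuals):
    """Return one row of 0/1 indicators per observation, one column per individual."""
    return [[1 if i == j else 0 for j in range(1, n_individuals + 1)] for i in ids]

# Two observations on individual 1 followed by one on individual 2
print(individual_dummies([1, 1, 2], 3))
```

Each row contains exactly one 1, which is why the dummy variable regression is run with Constant is Zero checked: an intercept would be collinear with the full set of dummies.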
In the Regression dialog box, the Input Y Range should be C1:C51, and the Input X Range
should be AD1:AR51. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it LS Dummy Variable Wage Equation. Finally select OK.
The result is (see also Table 15.3 p. 545 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.994617953
R Square             0.989264872
Adjusted R Square
Standard Error       0.276053302
Observations         50
$H_0: \beta_{1,1} = \beta_{1,2} = \cdots = \beta_{1,10}$
$H_1$: the $\beta_{1,i}$ are not all equal (15.3)
Our unrestricted model is equation (15.2). In the restricted model, equation (15.4) below, all the
intercept parameters are equal.
In the Regression dialog box, the Input Y Range should be Cl:C51, and the Input X Range
should be AN1:AR51. Check the box next to Labels. Uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it Restricted Model. Finally select OK.
The result of this pooled regression model is (see also Table 15.4 p. 546 in Principles of
Econometrics, 4e):
SUMMARY OUTPUT

ANOVA
              df    SS             MS             F             Significance F
Regression    5     1.46524182     0.293048364    2.34333615    0.057
Residual      44    5.502466225    0.125056051
Total         49    6.967708045

             Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept    0.620852256     1.01720872        0.610348942    0.544770491    -1.429197176    2.670901709
exper        0.194749226     0.173043985       1.125431931    0.26650765     -0.153998005    0.543496456
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it F-test.
      A                  B                C
1     Data Input         J =
2                        NT =             ='LS Dummy Variable Wage Equation'!B8
3                        K =              ='LS Dummy Variable Wage Equation'!B12
4                        SSEU =           ='LS Dummy Variable Wage Equation'!C13
5                        SSER =           ='Restricted Model'!C13
6                        α =
8     Computed Values    m1 =             =C1
9                        m2 =             =C2-C3
10                       Fc =             =FINV(C6,C8,C9)
12    F-test             F-statistic =    =((C5-C4)/C8)/(C4/C9)
13                       Conclusion =     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                       p-value =        =FDIST(C12,C8,C9)
15                       Conclusion =     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")
With 9 joint null hypotheses, at α = 0.05, the results of the F-test are (see also p. 546 of
Principles of Econometrics, 4e):
      A                  B                C
1     Data Input         J =              9
2                        NT =             50
3                        K =              15
4                        SSEU =           2.66719
5                        SSER =           5.502466
8     Computed Values    m1 =             9
9                        m2 =             35
10                       Fc =             2.160829
12    F-test             F-statistic =    4.133967
13                       Conclusion =     Reject Ho
14                       p-value =        0.001084
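The F-statistic in cell C12 can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name `f_statistic` is hypothetical, and the critical value is taken from the worksheet above rather than computed.

```python
def f_statistic(sse_r, sse_u, j, nt, k):
    """F = ((SSE_R - SSE_U) / J) / (SSE_U / (NT - K))."""
    return ((sse_r - sse_u) / j) / (sse_u / (nt - k))

# Inputs from the F-test worksheet for the N = 10 sample
f_stat = f_statistic(sse_r=5.502466, sse_u=2.66719, j=9, nt=50, k=15)
f_crit = 2.160829   # Fc from cell C10 (alpha = 0.05, 9 and 35 degrees of freedom)
print(round(f_stat, 4), "Reject Ho" if f_stat >= f_crit else "Do Not Reject Ho")
```

Here K = 15 counts the ten dummy variable coefficients plus the five slopes, so NT - K = 35 is the denominator degrees of freedom m2.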
(15.5)
Go back to your nls panel data worksheet where we will first transform our data into
deviation-from-the-mean form.
      AT        AU       AV        AW        AX         AY
1     lnwage    exper    exper2    tenure    tenure2    union
2     =C2       =O2      =P2       =Q2       =R2        =N2
      AT          AU          AV          AW          AX          AY
1     lnwage      exper       exper2      tenure      tenure2     union
2     1.808289    7.666667    58.77777    7.666667    58.77777    1
3     1.863417    8.583333    73.67361    8.583333    73.67361    1
4     1.789367    10.17949    103.622     1.833333    3.361111    1
5     1.84653     12.17949    148.3399    3.75        14.0625     1
6     1.856449    13.62179    185.5533    5.25        27.5625     1
In cells BA1:BA6 and BB1:BF1, enter the following labels and formulas.
      BA
1     lnwagebar
2     =AVERAGE(AT2:AT6)
3     =AVERAGE(AT2:AT6)
4     =AVERAGE(AT2:AT6)
5     =AVERAGE(AT2:AT6)
6     =AVERAGE(AT2:AT6)
      BB          BC           BD           BE            BF
1     experbar    exper2bar    tenurebar    tenure2bar    unionbar
Select cells BA2:BF6, move your cursor to the lower right corner of your selection until it turns
into a skinny cross as shown below; left-click, hold it and drag it down to cell BF51.
      BA           BB          BC           BD           BE            BF
1     lnwagebar    experbar    exper2bar    tenurebar    tenure2bar    unionbar
2     1.8328104    10.44615    113.99332    5.4166666    35.4874978    1
3     1.8328104    10.44615    113.99332    5.4166666    35.4874978    1
4     1.8328104    10.44615    113.99332    5.4166666    35.4874978    1
5     1.8328104    10.44615    113.99332    5.4166666    35.4874978    1
6     1.8328104    10.44615    113.99332    5.4166666    35.4874978    1
Here is how your table should look (only the last five values are shown below) :
      BA          BB          BC           BD           BE          BF
47    2.276607    13.19102    179.63768    2.1333334    8.647222    0
48    2.276607    13.19102    179.63768    2.1333334    8.647222    0
49    2.276607    13.19102    179.63768    2.1333334    8.647222    0
50    2.276607    13.19102    179.63768    2.1333334    8.647222    0
51    2.276607    13.19102    179.63768    2.1333334    8.647222    0
BH BI BJ BK BL BM
1 lnwaged experd exper2d tenured tenure2d uniond
2 =AT2-BA2
Copy the content of cell BH2 to cells BI2:BM2 and then copy the content of cells BH2:BM2 to
cells BH3:BM51. Here is how your table should look (only the first five values are shown
below):
      BH          BI          BJ          BK          BL          BM
1     lnwaged     experd      exper2d     tenured     tenure2d    uniond
2     -0.02452    -2.77949    -55.2155    2.25        23.29027    0
3     0.030607    -1.86282    -40.3197    3.166666    38.18611    0
4     -0.04344    -0.26666    -10.3713    -3.58333    -32.1264    0
5     0.01372     1.733336    34.3466     -1.66667    -21.425     0
6     0.023639    3.175636    71.55998    -0.16667    -7.925      0
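The deviation-from-the-mean transformation can be sketched in Python. This is an illustration only: the function name `demean_by_group` is hypothetical, and the panel is assumed to be balanced and sorted by individual, as it is in the worksheet (five consecutive rows per woman).

```python
def demean_by_group(values, group_size):
    """Subtract each individual's mean from her observations (balanced panel)."""
    out = []
    for start in range(0, len(values), group_size):
        block = values[start:start + group_size]
        mean = sum(block) / len(block)
        out.extend(v - mean for v in block)
    return out

# lnwage for the first individual (cells AT2:AT6)
lnwage = [1.808289, 1.863417, 1.789367, 1.84653, 1.856449]
print([round(d, 5) for d in demean_by_group(lnwage, 5)])
```

The deviations within each group sum to zero, which is the source of the lost degrees of freedom discussed after the fixed effects output.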
In the Regression dialog box, the Input Y Range should be BH1:BH51, and the Input X Range
should be BI1:BM51. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it Fixed Effects Wage Equation. Finally select OK.
The result is (see also Table 15.6 p. 549 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R    0.470143855
R Square      0.221035244
Note that the sum of squared least squares residuals from (15.5), SSE = 2.66719, is the same as the
sum of squared least squares residuals from (15.2). Furthermore, the least squares estimates of the
$\beta_k$ parameters from (15.5), shown in the above table, are identical to the least squares estimates
from the dummy variable fixed effects model (15.2) shown in Section 15.2. The standard errors of those
coefficient estimates are slightly different though. This is because the estimate of the error
variance above uses $\hat{\sigma}^2_{e,WRONG} = \frac{SSE}{NT - K}$ when it should use
$\hat{\sigma}^2_{e,CORRECT} = \frac{SSE}{NT - N - K}$. The calculation of $\hat{\sigma}_{e,WRONG}$ ignores
the loss of N = 10 degrees of freedom from correcting the variables by their sample means. So, if we
multiply the standard error estimates of the coefficients, from the above table, by the correction
factor $\sqrt{\frac{NT - K}{NT - N - K}}$, the resulting standard errors will be correct and identical
to those obtained in Section 15.2:
$\hat{\sigma}_{e,WRONG} \times \sqrt{\frac{NT - K}{NT - N - K}} = \sqrt{\frac{SSE}{NT - K}} \times \sqrt{\frac{NT - K}{NT - N - K}} = \sqrt{\frac{SSE}{NT - N - K}} = \hat{\sigma}_{e,CORRECT}$ (15.6)
With N = 10, T = 5 and K = 5, the correction factor for the standard error estimates from the
table above is:
$\sqrt{\frac{50 - 5}{50 - 10 - 5}} = \sqrt{\frac{45}{35}} \approx 1.1339$ (15.7)
Select cells D16:D22, right click and select Insert in the menu of options that pops up. In the
Insert window, select Shift cells right and then OK.
In your new cells D16:D18, enter the following label and formula:
D
16 Correct SE
17
18 =SQRT(45/35)*C18
Copy the content of cell D18 to cells D19:D22. The result is:
      A            B               C                 D
16                 Coefficients    Standard Error    Correct SE
17    Intercept    0               #N/A
18    experd       0.237998541     0.165585769       0.187756613
19    exper2d      -0.008188       0.006971395       0.007904819
Note that the subsequent t-statistics, p-values and confidence interval estimates would need
correction as well.
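The degrees-of-freedom correction is simple enough to verify by hand or in Python. The sketch below assumes only the formula for the correction factor stated above; the function name `fe_se_correction` is hypothetical.

```python
import math

def fe_se_correction(n, t, k):
    """Correction factor sqrt((NT - K) / (NT - N - K)) for fixed effects standard errors."""
    return math.sqrt((n * t - k) / (n * t - n - k))

small = fe_se_correction(10, 5, 5)     # sqrt(45/35), the N = 10 sample
full = fe_se_correction(716, 5, 6)     # sqrt(3574/2858), the complete panel
# Corrected standard error for experd in the N = 10 sample (cell D18)
print(round(small * 0.165585769, 6))
```

The factor shrinks toward 1 only when N is small relative to NT - K; with T fixed at 5 it stays well above 1 even for the complete panel, so the correction matters in both cases.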
Go back to your nls panel data worksheet where we will transform our data into
deviation-from-the-mean form.
      BO        BP       BQ        BR        BS         BT       BU
1     lnwage    exper    exper2    tenure    tenure2    south    union
2     =C2       =O2      =P2       =Q2       =R2        =L2      =N2
In cells BW1:BW6 and BX1:CC1, enter the following labels and formulas.
      BW
1     lnwagebar
2     =AVERAGE(BO2:BO6)
3     =AVERAGE(BO2:BO6)
4     =AVERAGE(BO2:BO6)
5     =AVERAGE(BO2:BO6)
6     =AVERAGE(BO2:BO6)
      BX          BY           BZ           CA            CB          CC
1     experbar    exper2bar    tenurebar    tenure2bar    southbar    unionbar
Select cells BW2:CC6, move your cursor to the lower right corner of your selection until it turns
into a skinny cross as shown below; left-click, hold it and drag it down to cell CC3581.
      BW           BX          BY           BZ           CA            CB          CC
1     lnwagebar    experbar    exper2bar    tenurebar    tenure2bar    southbar    unionbar
2     1.8328104    10.44615    113.99332    5.4166666    35.4874978    0           1
3     1.8328104    10.44615    113.99332    5.4166666    35.4874978    0           1
4     1.8328104    10.44615    113.99332    5.4166666    35.4874978    0           1
5     1.8328104    10.44615    113.99332    5.4166666    35.4874978    0           1
6     1.8328104    10.44615    113.99332    5.4166666    35.4874978    0           1
Here is how your table should look (only the last five values are shown below) :
        BW          BX          BY           BZ           CA            CB    CC
3577    1.466357    15.52051    246.62264    6.1999998    44.3444452    1     0
3578    1.466357    15.52051    246.62264    6.1999998    44.3444452    1     0
3579    1.466357    15.52051    246.62264    6.1999998    44.3444452    1     0
3580    1.466357    15.52051    246.62264    6.1999998    44.3444452    1     0
3581    1.466357    15.52051    246.62264    6.1999998    44.3444452    1     0
      CE         CF        CG         CH         CI          CJ        CK
1     lnwaged    experd    exper2d    tenured    tenure2d    southd    uniond
2     =BO2-BW2
Copy the content of cell CE2 to cells CF2:CK2 and then copy the content of cells CE2:CK2 to
cells CE3:CK3581. Here is how your table should look (only the last five values are shown
below):
        CE          CF          CG          CH          CI          CJ    CK
3577    0.143081    -3.08461    -91.971     -3.11667    -34.3375    0     0
3578    -0.00692    -2.08461    -66.0993    -2.11667    -27.6708    0     0
3579    -0.03924    -0.08461    -8.35573    -0.11667    -7.3375     0     0
3580    0.028011    1.915385    57.38786    1.966667    22.35       0     0
3581    -0.12494    3.338457    109.0383    3.383333    47.49583    0     0
In the Regression dialog box, the Input Y Range should be CE1:CE3581, and the Input X
Range should be CF1:CK3581. Check the boxes next to Labels and Constant is Zero. Select
New Worksheet Ply and name it Fixed Effects Wage Equation All. Finally select OK.
The result is (see also Table 15.7 p. 350 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.378119087
R Square             0.142974044
Adjusted R Square    0.141495272

            Coefficients     Standard Error    t Stat          P-value        Lower 95%       Upper 95%
tenured     0.013908943      0.002931175       4.7452          2.16481E-06    0.008161999     0.019655887
tenure2d    -0.000896227     0.000184088       -4.868462433    1.17E-06       -0.001257155    -0.000535298
southd      -0.016322397     0.032325859       -0.504933       0.6136         -0.079701378    0.047056584
uniond      0.063697234      0.01274631        4.997307672     6.09E-07       0.038706463     0.088688005
With N = 716, T = 5 and K = 6, the correction factor for the standard error estimates from the
table above is:
$\sqrt{\frac{3580 - 6}{3580 - 716 - 6}} = \sqrt{\frac{3574}{2858}} \approx 1.1183$ (15.9)
Select cells D16:D23, right click and select Insert in the menu of options that pops up. In the
Insert window, select Shift cells right and then OK.
In your new cells D16:D23, enter the following label and formula:
      D
16    Correct SE
17
18    =SQRT(3574/2858)*C18
Copy the content of cell D18 to cells D19:D23. The result is:
      A            B                C                 D
16                 Coefficients     Standard Error    Correct SE
17    Intercept    0                #N/A
18    experd       0.041083173      0.005919878       0.006620014
19    exper2d      -0.000409052     0.000244425       0.000273333
20    tenured      0.013908943      0.002931175       0.003277841
21    tenure2d     -0.000896227     0.000184088       0.000205861
Note that the subsequent t-statistics, p-values and confidence interval estimates would need
correction as well.
Our unrestricted model is equation (15.8). In the restricted model, equation (15.11) below, all the
intercept parameters are equal.
In the Regression dialog box, the Input Y Range should be BO1:BO3581, and the Input X
Range should be BP1:BU3581. Check the box next to Labels. Uncheck the box next to
Constant is Zero. Select Output Range and specify it to be A1 in your Restricted Model
worksheet. Finally select OK.
Excel informs you that the output range will overwrite existing data. Do select OK to overwrite
the data in the specified range.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.408141951
R Square             0.16657985
Adjusted R Square    0.165180322
Standard Error       0.424504153
Observations         3580

ANOVA
              df      SS             MS             F              Significance F
Regression    6       128.6931345    21.44885575    119.0255624    1.6148E-137
Residual      3573    643.8680907    0.180203776
Total         3579    772.5612252

             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    1.284930375     0.052094637       24.6653         3.5454E-124    1.182792166     1.387068585
exper        0.078366738     0.009511456       8.239194458     2.4061E-16     0.059718309     0.097015166
exper2       -0.002009946    0.000399301       -5.033658213    5.04964E-07    -0.002792827    -0.001227065
tenure       0.012062145     0.004896715       2.463313468     0.013812755    0.002461507     0.021662783
In your F-test worksheet replace the reference LS Dummy Variable Wage Equation by Fixed
Effects Wage Equation All. Also, the denominator degrees of freedom, m2, in cell C9, needs to
be corrected to account for the loss of N = 716 degrees of freedom from correcting the
variables by their sample means.
      A                  B                C
1     Data Input         J =
2                        NT =             ='Fixed Effects Wage Equation All'!B8
3                        K =              ='Fixed Effects Wage Equation All'!B12
4                        SSEU =           ='Fixed Effects Wage Equation All'!C13
5                        SSER =           ='Restricted Model'!C13
6                        α =
8     Computed Values    m1 =             =C1
9                        m2 =             =C2-C3-716
10                       Fc =             =FINV(C6,C8,C9)
12    F-test             F-statistic =    =((C5-C4)/C8)/(C4/C9)
13                       Conclusion =     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                       p-value =        =FDIST(C12,C8,C9)
15                       Conclusion =     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")
With 715 joint null hypotheses, at α = 0.01, the results of the F-test are:
      A                  B                C
1     Data Input         J =              715
2                        NT =             3580
3                        K =              6
4                        SSEU =           108.79135
5                        SSER =           643.8681
6                        α =              0.01
8     Computed Values    m1 =             715
9                        m2 =             2858
10                       Fc =             1.144628
12    F-test             F-statistic =    19.65819
13                       Conclusion =     Reject Ho
14                       p-value =        0
15                       Conclusion =     Reject Ho
Thus we reject the null hypothesis of no fixed effect differences between these women; it is
proper to include individual effects in the model.
In the random effects model we again assume that all individual differences are captured by the
intercept parameters, but we also recognize that the individuals in our sample were randomly
selected, and thus we treat the individual differences as random rather than fixed, as we did in the
fixed effects dummy variable model.
Below we re-consider the wage equation of Section 15.2.2 and treat individual differences
between the 716 women as random effects:
(15.12)
where $\bar{\beta}_1$ is a fixed population parameter, and $v_{it} = e_{it} + u_i$, where $u_i$ are random individual
differences or random effects. The component $u_i$ that is common to all time periods implies that
the errors $v_{it}$ are correlated over time for a given individual, but otherwise uncorrelated. The
correlation is given by:
$\rho = corr(v_{it}, v_{is}) = \frac{cov(v_{it}, v_{is})}{\sqrt{var(v_{it}) var(v_{is})}} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$ (15.13)
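We test for the presence of random effects by testing the null hypothesis $H_0: \sigma_u^2 = 0$ against the
alternative hypothesis $H_1: \sigma_u^2 > 0$. If the null hypothesis is true, i.e. there are no random effects,
then the Lagrange multiplier test statistic (15.14) is distributed as a $\chi^2_{(1)}$ random variable in large
samples:
$LM = \sqrt{\frac{NT}{2(T-1)}} \left\{ \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} \hat{e}_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{e}_{it}^2} - 1 \right\}$ (15.14)
where $\hat{e}_{it}$ are estimated residuals from model (15.15) below, which (15.12) reduces to when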
there is no need for a random effects model:
(15.15)
You will recognize model (15.15) as model (15.1) from Section 15.1.
In your new cells C1:F2, enter the following labels and formulas:
      C                                 D                  E                     F
1     e-hatit                           Σe-hatit over t    (Σe-hatit over t)²    e-hat²it
2     ='Pooled LS Wage Equation'!C32    =SUM(C2:C6)        =D2^2                 =C2^2
Copy the content of cell C2 to cells C3:C3581 and the content of cell F2 to cells F3:F3581. Here
is how your table should look (only the first five values are shown below):
      C           F
1     e-hatit     e-hat²it
2     0.01318     0.000174
3     0.027884    0.000778
4     -0.03388    0.001148
5     -0.06024    0.003629
Select cells D2:E6, move your cursor to the lower right corner of your selection until it turns into
a skinny cross as shown below; left-click, hold it and drag it down to cell E3581.
      D                  E
1     Σe-hatit over t    (Σe-hatit over t)²
2     -0.15686248        0.024605838
3
4
5
6
Here is how your table should look (only the last five values are shown below):
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Lagrange Multiplier Test.
In it copy the simplified Lagrange multiplier test template you created in Chapter 9.
In cell B2, replace R² by T. Delete the content of cells C1:C2. In cell C8, we enter the formula
for the Lagrange multiplier statistic given by equation (15.14).
      A                           B                      C
1     Data Input                  N =
2                                 T =
3                                 α =
4                                 m =
5
6     Computed Values             χ²-critical value =    =CHIINV(C3,C4)
8     Lagrange Multiplier Test    χ² =                   =SQRT((C1*C2)/(2*(C2-1)))*((SUM('nls panel data'!E2:E3581)/SUM('nls panel data'!F2:F3581))-1)
9                                 Conclusion =           =IF(C8>=C6,"Reject Ho","Do Not Reject Ho")
10                                p-value =              =CHIDIST(C8,C4)
11                                Conclusion =           =IF(C10<=C3,"Reject Ho","Do Not Reject Ho")
At α = 0.05, with T = 5, N = 716 and m = 1, the result of the test is (see also p. 556 in
Principles of Econometrics, 4e):
      A                           B                      C
1     Data Input                  N =                    716
2                                 T =                    5
3                                 α =                    0.05
4                                 m =                    1
6     Computed Values             χ²-critical value =    3.841459
8     Lagrange Multiplier Test    χ² =                   62.12314
9                                 Conclusion =           Reject Ho
10                                p-value =              3.23E-15
11                                Conclusion =           Reject Ho
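The worksheet formula in cell C8 can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `random_effects_lm` is hypothetical, the residuals are assumed to be the pooled least squares residuals ordered by individual with T consecutive observations each, and the toy residuals in the usage lines are made up purely to show the mechanics.

```python
import math

def random_effects_lm(residuals, n, t):
    """LM statistic as computed in cell C8: sqrt(NT / (2(T-1))) * (ratio - 1)."""
    sum_sq_of_sums = 0.0
    for i in range(n):
        block = residuals[i * t:(i + 1) * t]
        sum_sq_of_sums += sum(block) ** 2          # (sum over t of e_it)^2, column E
    sum_of_squares = sum(e ** 2 for e in residuals)  # sum of e_it^2, column F
    return math.sqrt(n * t / (2 * (t - 1))) * (sum_sq_of_sums / sum_of_squares - 1)

# Toy residuals (hypothetical): same sign within each individual vs alternating signs
print(random_effects_lm([1, 1, -1, -1], n=2, t=2))
print(random_effects_lm([1, -1, 1, -1], n=2, t=2))
```

Residuals that share a sign within an individual push the statistic up, which is the pattern a common individual effect would produce; residuals that cancel within an individual push it down.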
Estimation of the random effects model is done via generalized least squares (GLS). As was the
case when we had heteroskedasticity or autocorrelation, we obtain the GLS estimator in the
random effects model by applying least squares to a transformed model. The transformed model
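is:
$$\ln(WAGE)^*_{it} = \beta_1 x^*_{1it} + \beta_2 EDUC^*_{it} + \beta_3 EXPER^*_{it} + \beta_4 EXPER^{2*}_{it} + \beta_5 TENURE^*_{it} + \beta_6 TENURE^{2*}_{it} + \beta_7 BLACK^*_{it} + \beta_8 SOUTH^*_{it} + \beta_9 UNION^*_{it} + v^*_{it} \qquad (15.16)$$
where the transformed variables are: $x^*_{1it} = 1 - \alpha$ and $x^*_{it} = x_{it} - \alpha \bar{x}_i$ for all other variables.
(15.17)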
The regression error variance $\sigma_e^2$ comes from the fixed effects wage equation estimated in Section
15.2.2 and re-stated below:
The wage equation (15.18) is referred to as the deviation (DV) regression as it uses variables in
deviation from the mean form.
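$$\frac{NT - K_{DV}}{NT - N - K_{DV}} \qquad (15.19)$$
A consistent estimator of $\sigma_e^2$ is obtained by multiplying the estimate of the error variance from
(15.18) by the correction factor (15.19) (see also Appendix 15B p. 583 in Principles of
Econometrics, 4e):
$$\hat{\sigma}^2_{e,DV,WRONG} \times \left(\frac{NT - K_{DV}}{NT - N - K_{DV}}\right) = \frac{SSE_{DV}}{NT - K_{DV}} \times \left(\frac{NT - K_{DV}}{NT - N - K_{DV}}\right) = \frac{SSE_{DV}}{NT - N - K_{DV}} = \hat{\sigma}^2_{e,DV,CORRECT}$$
where $\hat{\sigma}_{e,DV,WRONG}$ is the estimated standard error of the regression, $SSE_{DV}$ is the sum of squared
least squares residuals and $K_{DV}$ is the number of parameters from model (15.18); $K_{DV} = 6$, because
they are all slope parameters. $K_{DV}$ is also referred to as $K_{slopes}$. With N = 716, T = 5 and $K_{DV} = 6$,
the correction factor for the standard error is:
$$\sqrt{\frac{NT - K_{DV}}{NT - N - K_{DV}}} = \sqrt{\frac{3580 - 6}{3580 - 716 - 6}} = \sqrt{\frac{3574}{2858}} \approx 1.1183 \qquad (15.20)$$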
     C
6    Correct SE
7    =SQRT(3574/2858)*B7

     C
6    Correct SE
7    0.195110384
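The arithmetic behind the correction in cell C7 can be reproduced as a quick check. This is a minimal Python sketch using the values of N, T and K stated in the text; the function name corrected_se is hypothetical, and its argument stands for the standard error that Excel reports in cell B7.

```python
import math

N, T, K_DV = 716, 5, 6                          # values stated in the text
factor = (N * T - K_DV) / (N * T - N - K_DV)    # 3574 / 2858, equation (15.19)

def corrected_se(se_wrong):
    """Scale the reported standard error by the square root of the factor."""
    return math.sqrt(factor) * se_wrong
```

Because the factor is larger than 1, the corrected standard error is always larger than the one printed in the regression output.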
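Next, we obtain an estimate of $\sigma_u^2$ by getting the regression error variance of the following model:
$$\overline{\ln(WAGE)}_i = \beta_1 + \beta_2 \overline{EDUC}_i + \beta_3 \overline{EXPER}_i + \beta_4 \overline{EXPER^2}_i + \beta_5 \overline{TENURE}_i + \beta_6 \overline{TENURE^2}_i + \beta_7 \overline{BLACK}_i + \beta_8 \overline{SOUTH}_i + \beta_9 \overline{UNION}_i + u_i + \bar{e}_i \qquad (15.21)$$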
Equation (15.21) is referred to as the between estimator (BE) regression as it uses variation
between individuals as a basis for estimating the regression parameters.
$$\widehat{var}(u_i + \bar{e}_i) = \widehat{\sigma_u^2 + \frac{\sigma_e^2}{T}} = \frac{SSE_{BE}}{N - K_{BE}} = \hat{\sigma}_*^2 \qquad (15.22)$$
where $\hat{\sigma}_*^2$ is the mean square residual, $SSE_{BE}$ is the sum of squared least squares residuals and $K_{BE}$ is the
number of parameters from model (15.21); $K_{BE} = 9$, the intercept and 8 slope parameters.
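With estimate (15.22) in hand, we can estimate $\sigma_u^2$ as (see also Appendix 15B p. 583 in Principles
of Econometrics, 4e):
$$\hat{\sigma}_u^2 = \widehat{\sigma_u^2 + \frac{\sigma_e^2}{T}} - \frac{\hat{\sigma}_e^2}{T} = \frac{SSE_{BE}}{N - K_{BE}} - \frac{1}{T}\left(\frac{SSE_{DV}}{NT - N - K_{slopes}}\right) = \hat{\sigma}_*^2 - \frac{1}{T}\hat{\sigma}^2_{e,DV,CORRECT} \qquad (15.23)$$
where T = 5 years.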
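Equation (15.23) can be checked numerically with the two ingredients reported in this section. The snippet below is a small Python sketch; the variable names are hypothetical, while the two inputs are the mean square residual of the between regression (0.115887285) and the corrected fixed effects standard error (0.195110384).

```python
T = 5                               # years per individual
ms_residual_between = 0.115887285   # MS Residual of the between regression
se_fixed_effects = 0.195110384      # corrected SE of the fixed effects regression

# Equation (15.23): subtract the scaled fixed effects error variance.
sigma_u2 = ms_residual_between - se_fixed_effects ** 2 / T
```

The result agrees with the value 0.108273673 shown in the worksheet.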
In cells CQ1:CQ6 and CR1:CY1, enter the following labels and formulas.
     CQ
1    lnwagebar
2    =AVERAGE(X2:X6)
3    =AVERAGE(X2:X6)
4    =AVERAGE(X2:X6)
5    =AVERAGE(X2:X6)
6    =AVERAGE(X2:X6)
     CR         CS          CT           CU
1    educbar    experbar    exper2bar    tenurebar
     CV            CW          CX          CY
1    tenure2bar    blackbar    southbar    unionbar
Here is how your table should look (only the first five values are shown below):
     CQ          CR        CS         CT          CU          CV           CW         CX         CY
1    lnwagebar   educbar   experbar   exper2bar   tenurebar   tenure2bar   blackbar   southbar   unionbar
2    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
3    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
Select cells CQ2:CY6, move your cursor to the lower right corner of your selection until it turns
into a skinny cross as shown below; left-click, hold it and drag it down to cell CY3581.
     CQ          CR        CS         CT          CU          CV           CW         CX         CY
1    lnwagebar   educbar   experbar   exper2bar   tenurebar   tenure2bar   blackbar   southbar   unionbar
2    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
3    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
4    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
5    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
6    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
Here is how your table should look (only the last five values are shown):
Place your cursor in cell DA1. Right-click, select Paste Special. In the Paste Special dialog box
that pops up, select Values. Finally, select OK.
Here is how your table should look (only the first five values are shown below):
     DA          DB        DC         DD          DE          DF           DG         DH         DI
1    lnwagebar   educbar   experbar   exper2bar   tenurebar   tenure2bar   blackbar   southbar   unionbar
2    1.83281     12        10.44615   113.9933    5.416667    35.4875      1          0          1
3    1.83281     12        10.44615   113.9933    5.416667    35.4875      1          0          1
4    1.83281     12        10.44615   113.9933    5.416667    35.4875      1          0          1
5    1.83281     12        10.44615   113.9933    5.416667    35.4875      1          0          1
6    1.83281     12        10.44615   113.9933    5.416667    35.4875      1          0          1
Go to the Insert tab, at the upper left corner of your screen. In the Tables group of commands,
select Table. A Create Table dialog box pops up. The data for your table are found in cells
DA1:DI3581. Select My table has headers. Finally, select OK.
Here is how your table should look (only the first five values are shown):
Table Tools and its Design tab show up. In the Tools group of commands of the Design tab,
select Remove Duplicates. This will delete duplicated rows across selected columns. All
columns of your table should be selected. If not, select the Select All button. Finally, select OK.
Excel informs you that 2864 duplicate values were found and removed, and 716 unique values
remain. Those are the 716 time-averaged observations we need to run model (15.21). Select OK.
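The time-averaging and de-duplication steps above amount to computing one mean per individual. A minimal Python sketch of that idea follows; it groups by an individual identifier instead of removing duplicate rows, which is a different route to the same 716 averaged observations. The function name individual_means and the sample values are hypothetical.

```python
from collections import defaultdict

def individual_means(ids, values):
    """Return {id: time-average of the values observed for that individual}."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    return {i: sum(vs) / len(vs) for i, vs in groups.items()}

# Hypothetical panel: two individuals observed twice each.
ids = [1, 1, 2, 2]
lnwage = [1.8, 2.0, 1.0, 1.4]
means = individual_means(ids, lnwage)
```

Grouping by identifier also keeps two individuals apart if they happen to share identical averages, a case in which removing duplicate rows would merge them.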
Here is how your table should look (only the first five values are shown below):
Left-click anywhere in your table. Go back to the Tools group of commands, and select Convert
to Range. When asked, confirm that you do want to convert your table back to a normal range by
selecting YES.
Here is how your table should look (only the first five values are shown below):
In the Regression dialog box, the Input Y Range should be DA1:DA717, and the Input X
Range should be DB1:DI717. Check the box next to Labels. Select New Worksheet Ply and
name it Between Wage Equation. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.604539637
R Square             0.365468173
Adjusted R Square    0.35828818
Standard Error       0.340422217
Observations         716

ANOVA
              df     SS             MS             F              Significance F
Regression    8      47.19014976    5.89876872     50.90091369    5.4524E-65
Residual      707    81.93231089    0.115887285
Total         715    129.1224606

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     0.416688577     0.135761818       3.069261906     0.002127968    0.150144005     0.68323315
educbar       0.070772318     0.005387371       13.13670748     (illegible)    0.060195157     0.081349478
experbar      0.06619243      0.023455392       2.822056097     0.004905678    0.020141874     0.112242989
exper2bar     -0.001606476    0.000999826       -1.606754494    0.108554128    -0.00356946     0.000356509
...
tenure2bar    -0.000494785    0.0007028         -0.7040         0.481657249    -0.001874627    0.000885056
...
unionbar      0.155735498     0.035460749       4.391771186     1.29639E-05    0.086114522     0.225356474
D E
3 'Fixed Effects Wa
     D             E
3    σ²u-hat =     0.108273673
We now have the ingredients we need for our transformation parameter α and thus for estimating
model (15.16).
     DK     DL
1    α =    =1-('Fixed Effects Wage Equation All'!C7/SQRT((5*'Between Wage Equation'!E3)+('Fixed Effects Wage Equation All'!C7^2)))

     DK     DL
1    α =    0.743683
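The worksheet formula for α can be mirrored in a few lines of code. The sketch below is a Python illustration of that formula, assuming the corrected fixed effects standard error (0.195110384) and the estimate of the individual-effect variance (0.108273673) reported above; the function name gls_alpha is hypothetical.

```python
import math

def gls_alpha(sigma_e, sigma_u2, T):
    """Transformation parameter: 1 - sigma_e / sqrt(T * sigma_u^2 + sigma_e^2)."""
    return 1 - sigma_e / math.sqrt(T * sigma_u2 + sigma_e ** 2)

alpha = gls_alpha(sigma_e=0.195110384, sigma_u2=0.108273673, T=5)
```

A value of α close to 1 would make the transformed data resemble deviations from individual means, while a value close to 0 would leave the data almost untransformed.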
In cells DN1:DP2 and DQ1:DW1, enter the following labels and formulas.
     DN               DO          DP
1    lnwage*          x1*         educ*
2    =X2-$DL$1*CQ2    =1-$DL$1    =Y2-$DL$1*CR2
     DQ        DR         DS         DT          DU        DV        DW
1    exper*    exper2*    tenure*    tenure2*    black*    south*    union*
Copy the content of cell DP2 to cells DQ2:DW2, and then copy the content of cells DN2:DW2
to cells DN3:DW3581.
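The formulas in row 2 apply the same transformation to every variable: subtract α times the individual's time average. A minimal Python sketch of that step is shown below, assuming the rounded α = 0.743683 from cell DL1; the function name gls_transform is hypothetical.

```python
def gls_transform(values, group_mean, alpha):
    """Return each value minus alpha times the individual's time average."""
    return [v - alpha * group_mean for v in values]

alpha = 0.743683   # rounded value from cell DL1

# An individual with 12 years of education in all five years (educbar = 12).
educ_star = gls_transform([12, 12, 12, 12, 12], 12, alpha)

# The transformed intercept column x1* is the same for every observation.
x1_star = 1 - alpha
```

With the rounded α, educ* comes out at about 3.0758 and x1* at 0.256317, in line with the first row of the worksheet.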
Here is how your table should look (only the first five values are shown below):
     DN          DO          DP          DQ          DR          DS          DT          DU          DV        DW
1    lnwage*     x1*         educ*       exper*      exper2*     tenure*     tenure2*    black*      south*    union*
2    0.445259    0.256317    3.075805    -0.10196    -25.9971    3.638384    32.3863     0.256317    0         0.256317
In the Regression dialog box, the Input Y Range should be DN1:DN3581, and the Input X
Range should be DO1:DW3581. Check the boxes next to Labels and to Constant is Zero.
Select New Worksheet Ply and name it Random Effects Wage Equation. Finally select OK.
The result is (see also Table 15.9 p. 556 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9317197
R Square             0.868101666
Adjusted R Square    0.867526141
Open the Excel file grunfeld2. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 15 in one file, create a new worksheet in your POE
Chapter 15 Excel file, rename it grunfeld2 data, and in it, copy the data set you just opened.
We consider a model for describing gross firm investment for N = 2 firms, General Electric
(GE) and Westinghouse (WE), over a period of T = 20 years.
$$INV_{it} = \beta_1 + \beta_2 V_{it} + \beta_3 K_{it} + e_{it} \qquad (15.24)$$
where t = 1, ..., 20; i = GE or WE; and $var(e_{GE,t}) = \sigma^2_{GE} = \sigma^2_{WE} = var(e_{WE,t})$.
$INV_{it}$ denotes gross firm investment, for the ith firm in the tth period of time. $V_{it}$ denotes the
stock market value of firm i at the beginning of year t, and is used as a proxy for expected profits.
$K_{it}$ denotes the actual capital stock of firm i at the beginning of the year t, and is used as a proxy
for permanent desired capital stock.
If, in addition, we assume the errors are uncorrelated, both over time for each firm and between
firms, then equation (15.24) can be estimated with the General Electric and Westinghouse data
pooled together using the least squares regression technique, as in Section 15.1 for our wage
equation example.
     G      H    I
1    inv    v    k
Copy the content of cells A2:C21 to cells G2:I21, and the content of cells D2:F21 to cells
G22:I41. Here is how your table should look (only the first five values are shown below):
     G       H         I
1    inv     v         k
2    33.1    1170.6    97.8
3    45      2015.8    104.4
4    77.2    2803.3    118
5    44.6    2039.7    156.2
6    48.1    2256.2    172.6
In the Regression dialog box, the Input Y Range should be G1:G41, and the Input X Range
should be H1:I41. Check the box next to Labels. Select New Worksheet Ply and name it
Pooled LS Investment Model. Finally select OK.
The result is (see also Table 15.11 p. 564 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.899873334
R Square             0.809772017
Adjusted R Square    0.799489423

ANOVA
              df    SS             MS             F              Significance F
Regression    2     70506.221      35253.1105     78.75172745    4.04063E-14
Residual      37    16563.00288    447.6487265
Total         39    87069.22388

              Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept     17.87200128     7.024080507       2.544390154    0.015252924    3.639862407    32.10414015
v             0.015192638     0.006196238       2.451913329    0.019050853    0.002637868    0.027747409
k             0.143579159     0.018600986       7.718900416    3.19392E-09    0.105889981    0.181268336
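$$INV_{it} = \beta_{1,GE} + \delta_1 D_i + \beta_{2,GE} V_{it} + \delta_2 (D_i \times V_{it}) + \beta_{3,GE} K_{it} + \delta_3 (D_i \times K_{it}) + e_{it} \qquad (15.25)$$
where $var(e_{GE,t}) = var(e_{WE,t})$, and $D_i$ is a dummy variable equal to 1 for Westinghouse
observations and 0 for General Electric observations.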
Equation (15.25) is estimated using the pooled set of General Electric and Westinghouse data.
Insert a column to the left of the column labeled v and one to the left of the column labeled k.
In your new cells H1:L2, enter the following labels and formulas for the dummy variables.
     H
1    d
Enter value 0 in cells H2:H21, and value 1 in cells H22:H41. Copy the content of cell J2 to cells
J3:J41, and the content of cell L2 to cells L3:L41. Here is how your table should look (only the
first five values are shown below):
     H    I         J      K        L
1    d    v         dxv    k        dxk
2    0    1170.6    0      97.8     0
3    0    2015.8    0      104.4    0
4    0    2803.3    0      118      0
5    0    2039.7    0      156.2    0
6    0    2256.2    0      172.6    0
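The dummy and interaction columns built above follow a simple rule: d is 0 for the first firm's rows and 1 for the second firm's rows, and each interaction is d times the regressor. The following Python sketch illustrates the construction; the function name build_dummy_columns is hypothetical, and the last two rows of data are made-up placeholders for the second firm.

```python
def build_dummy_columns(v, k, n_first_firm):
    """Return the d, d*v and d*k columns for data stacked firm by firm.

    The first n_first_firm rows get d = 0, the remaining rows get d = 1.
    """
    d = [0] * n_first_firm + [1] * (len(v) - n_first_firm)
    dxv = [di * vi for di, vi in zip(d, v)]
    dxk = [di * ki for di, ki in zip(d, k)]
    return d, dxv, dxk

# First two GE rows from the worksheet, followed by two hypothetical rows
# standing in for the second firm.
v = [1170.6, 2015.8, 200.0, 500.0]
k = [97.8, 104.4, 2.0, 1.0]
d, dxv, dxk = build_dummy_columns(v, k, n_first_firm=2)
```

The interaction columns are zero for every General Electric row, which is why the GE coefficients in the dummy variable model coincide with those of a GE-only regression.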
In the Regression dialog box, the Input Y Range should be G1:G41, and the Input X Range
should be H1:L41. Check the box next to Labels. Select New Worksheet Ply and name it
Dummy Variable Model. Finally select OK.
The result is (see also Table 15.12 p. 565 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.909857235
R Square             0.827840188
Adjusted R Square    0.802522568
Standard Error       20.99707349
Observations         40

ANOVA
              df    SS             MS             F              Significance F
Regression    5     72079.40264    14415.88053    32.69818434    4.6E-12
Residual      34    14989.82123    440.8770951
Total         39    87069.22388

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     -9.956308498    23.62636432       -0.421406712    0.67611045     -57.97085739    38.05824039
d             9.446920615     28.80535028       0.327957151     0.744955154    -49.092594      67.98643523
v             0.02655119      0.011722048       2.265064058     0.029996268    0.002729122     0.050373257
dxv           0.02634293      0.034352678       0.766838        0.448470172    -0.043470106    0.096155966
k             0.151693875     0.019356449       7.836865        4.01578E-09    0.112356839     0.191030911
dxk           -0.05928736     0.116946429       -0.5069617      0.615454       -0.296951096    0.178376377
If we assume that these two firms have distinct investment behaviors, fixed over time, their
separate regressions can be specified as:
$$INV_{it} = \beta_{1,i} + \beta_{2,i} V_{it} + \beta_{3,i} K_{it} + e_{it} \qquad (15.26)$$
where t = 1, ..., 20; i = GE or WE; and $var(e_{GE,t}) = \sigma^2_{GE} \neq \sigma^2_{WE} = var(e_{WE,t})$.
If, in addition, we assume there is no contemporaneous correlation, then equation (15.26) can be
estimated twice, first with General Electric data, and then with Westinghouse data, using the least
squares regression technique. Equation (15.26) can be written equivalently as the set of equations (15.26a) and
(15.26b):
$$INV_{GE,t} = \beta_{1,GE} + \beta_{2,GE} V_{GE,t} + \beta_{3,GE} K_{GE,t} + e_{GE,t} \qquad (15.26a)$$
$$INV_{WE,t} = \beta_{1,WE} + \beta_{2,WE} V_{WE,t} + \beta_{3,WE} K_{WE,t} + e_{WE,t} \qquad (15.26b)$$
Note that the dummy variable format equation (15.25) becomes equations (15.27a) and (15.27b):
The least squares estimates of $\beta_{k,GE}$ from (15.26a) will be equal to the least squares estimates of
In the Regression dialog box, the Input Y Range should be A1:A21, and the Input X Range
should be B1:C21. Check the boxes next to Labels and Residuals. Select New Worksheet Ply
and name it GE Investment Equation. Finally select OK.
The result is (see also Table 15.13 p. 566 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.839825405
R Square             0.70530671
Adjusted R Square    0.670636911
Standard Error       27.88272414
Observations         20

ANOVA
              df    SS             MS             F              Significance F
Regression    2     31632.0322     15816.0161     20.34354783    3.08779E-05
Residual      17    13216.58719    777.4463053
Total         19    44848.61939

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     -9.956308498    31.37424837       -0.317340144    0.754849862    -76.15018584    56.23756885
...
In the Regression dialog box, the Input Y Range should be D1:D21, and the Input X Range
should be E1:F21. Check the boxes next to Labels and Residuals. Select New Worksheet Ply
and name it WE Investment Equation. Finally select OK.
The result is (see also Table 15.13 p. 566 in Principles of Econometrics, 4e):
SUMMARY OUTPUT

Regression Statistics
Multiple R    0.8628
...

RESIDUAL OUTPUT

Observation    Predicted inv_we    Residuals
1              9.786167743         3.143832257
2              26.85790303         -0.95790303
...
Below we use the Goldfeld-Quandt test to test the null hypothesis $H_0: \sigma^2_{GE} = \sigma^2_{WE}$.
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Goldfeld-Quandt Test.
     A               B                  C
1    Data Input      N1 =               ='GE Investment Equation'!B8
2                    K1 =               ='GE Investment Equation'!B12+1
3                    MS Residual 1 =    ='GE Investment Equation'!D13
4                    N2 =               ='WE Investment Equation'!B8
5                    K2 =               ='WE Investment Equation'!B12+1
6                    MS Residual 2 =    ='WE Investment Equation'!D13
7                    α =
8
9    Computed        m1 =               =C1-C2
     Values
10                   m2 =               =C4-C5
11                   F-statistic =      =C3/C6
12   Goldfeld-
     Quandt test
13   Right-tail      Fc =               =FINV(C7,C9,C10)
14                   Conclusion =       =IF(C11>=C13,"Reject Ho","Do Not Reject Ho")
15
16   Two-tail        FLc =              =FINV(1-C7/2,C9,C10)
17                   FUc =              =FINV(C7/2,C9,C10)
18                   Conclusion =       =IF(OR(C11<=C16,C11>=C17),"Reject Ho","Do Not Reject Ho")
At α = 0.05, the result of the Goldfeld-Quandt test is (see also p. 566 in Principles of
Econometrics, 4e):
     A                      B                  C
1    Data Input             N1 =               20
3                           MS Residual 1 =    777.4463
4                           N2 =               20
7                           α =                0.05
9    Computed Values        m1 =               17
10                          m2 =               17
11                          F-statistic =      7.45338
12   Goldfeld-Quandt test
14                          Conclusion =       Reject Ho
18                          Conclusion =       Reject Ho
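The computed values in rows 9 to 11 of the template are simple enough to verify by hand. The sketch below does so in Python, using the mean square residuals reported for the two investment equations (777.4463 and 104.3079); the function name goldfeld_quandt is hypothetical, and the critical values are left to Excel's FINV because the Python standard library has no F distribution.

```python
def goldfeld_quandt(ms_resid_1, ms_resid_2, n1, k1, n2, k2):
    """Return the F-statistic and its numerator and denominator degrees of freedom."""
    return ms_resid_1 / ms_resid_2, n1 - k1, n2 - k2

# GE in the numerator, WE in the denominator; each equation has 20
# observations and 3 parameters (intercept plus two slopes).
f_stat, m1, m2 = goldfeld_quandt(777.4463, 104.3079, n1=20, k1=3, n2=20, k2=3)
```

Placing the larger variance estimate in the numerator keeps the statistic above 1, so the right-tail comparison in row 14 is the relevant one here.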
Again, we consider a model for describing gross firm investment for General Electric (GE) and
Westinghouse (WE), over a period of T = 20 years. These two firms have distinct investment
behaviors that are fixed over time.
This time we assume that the variances of the error terms are different across firms, (15.28), and
the error terms across firms, at the same point in time, are correlated, (15.29):
$$var(e_{GE,t}) = \sigma^2_{GE} \neq var(e_{WE,t}) = \sigma^2_{WE} \qquad (15.28)$$
$$cov(e_{GE,t}, e_{WE,t}) = \sigma_{GE,WE} \qquad (15.29)$$
Correlation like (15.29) is called contemporaneous correlation, and to account for it, a dummy
variable model, not separate investment equations, has to be estimated. As we saw in Section
15.4.2, a dummy variable model like (15.25) implies that $var(e_{GE,t}) = var(e_{WE,t})$. So, the
dummy variable model will have to (1) correct for the heteroskedasticity implied by (15.28) and
(2) account for the contemporaneous correlation between the errors of GE and WE implied by
(15.29). This is what a seemingly unrelated regressions (SUR) model does.
We would like to test whether or not $\sigma_{GE,WE} = 0$ to determine if we need a SUR model. To carry
out such a test we compute the squared correlation:
$$r^2_{GE,WE} = \frac{\hat{\sigma}^2_{GE,WE}}{\hat{\sigma}^2_{GE}\,\hat{\sigma}^2_{WE}} \qquad (15.30)$$
where $\hat{\sigma}^2_{GE}$ and $\hat{\sigma}^2_{WE}$ are the mean square residuals from the GE and WE investment equations.
The estimated covariance is computed from (see also p. 567 in Principles of Econometrics, 4e):
$$\hat{\sigma}_{GE,WE} = \frac{1}{T - 3}\sum_{t=1}^{20} \hat{e}_{GE,t}\,\hat{e}_{WE,t} \qquad (15.31)$$
Go back to your grunfeld2 data worksheet.
In cells N1:O5, enter the following labels and formulas. In the last column, you will find the
numbers of the equations used, if any.
     N                O
1    σ-hatGE,WE =     =(1/17)*SUMPRODUCT('GE Investment Equation'!C26:C45,'WE Investment Equation'!C26:C45)    (15.31)
2    σ²-hatGE =       ='GE Investment Equation'!D13
3    σ²-hatWE =       ='WE Investment Equation'!D13
4    r²GE,WE =        =(O1^2)/(O2*O3)    (15.30)
5    rGE,WE =         =SQRT(O4)
     N                O
1    σ-hatGE,WE =     207.5871
2    σ²-hatGE =       777.4463
3    σ²-hatWE =       104.3079
4    r²GE,WE =        0.53139
5    rGE,WE =         0.728965
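The SUMPRODUCT formula and the squared correlation can be reproduced as a check on cells O1:O5. The following Python sketch assumes the two residual series are available as equal-length lists; the function name contemporaneous_cov is hypothetical, and the numeric check reuses the values shown in the table above.

```python
import math

def contemporaneous_cov(res_a, res_b, k):
    """Sum of cross products of two residual series divided by T - k."""
    t = len(res_a)
    return sum(a * b for a, b in zip(res_a, res_b)) / (t - k)

# Values reported in the worksheet (cells O1, O2 and O3).
cov_ge_we, var_ge, var_we = 207.5871, 777.4463, 104.3079

r2 = cov_ge_we ** 2 / (var_ge * var_we)   # equation (15.30)
r = math.sqrt(r2)
```

With T = 20 observations and k = 3 parameters per equation the divisor is 17, matching the 1/17 factor in cell O1.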
-
lb tr I .---1\
I!Insert Warksheet{Shilt-'-flll·I l____/
In it copy the simplified Lagrange multiplier test template you created in Chapter 9.
In cell B1, replace N by T. In cell B2, replace R2 by r²GE,WE. Delete the content of cells C1:C2.
In cell C2, we get the value of r²GE,WE from our grunfeld2 data worksheet, as shown in the table
below.
     A                     B                      C
1    Data Input            T =
2                          r²GE,WE =              ='grunfeld2 data'!O4
3                          α =
4                          m =
5
6    Computed Values       χ²-critical value =    =CHIINV(C3,C4)
7
8    Lagrange Multiplier   χ² =                   =C1*C2
     Test
9                          Conclusion =           =IF(C8>=C6,"Reject Ho","Do Not Reject Ho")
10                         p-value =              =CHIDIST(C8,C4)
11                         Conclusion =           =IF(C10<=C3,"Reject Ho","Do Not Reject Ho")
At α = 0.05, with T = 20 and m = 1, the result of the test is (see also p. 569 in Principles of
Econometrics, 4e):
     A             B               C
1    Data Input    T =             20
2                  r²GE,WE =       0.53139
3                  α =             0.05
4                  m =             1
9                  Conclusion =    Reject Ho
10                 p-value =       0.001114
To implement the SUR estimation use one of the econometric software programs listed at
www.principlesofeconometrics.com.
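The Lagrange multiplier test for contemporaneous correlation described above reduces to T times the squared correlation, compared against a chi-square distribution with m = 1 degree of freedom. The Python sketch below reproduces it under that assumption; the function name lm_correlation_test is hypothetical, and the p-value uses the identity that a chi-square variable with one degree of freedom is a squared standard normal, so it only applies when m = 1.

```python
import math

def lm_correlation_test(T, r2):
    """Return the LM statistic T * r^2 and its chi-square(1) p-value."""
    lm = T * r2
    p_value = math.erfc(math.sqrt(lm / 2))   # valid for 1 degree of freedom only
    return lm, p_value

lm, p_value = lm_correlation_test(T=20, r2=0.53139)
reject = lm >= 3.841459   # critical value for alpha = 0.05, m = 1
```

The statistic is well above the critical value, which is consistent with the Reject Ho conclusion in the worksheet.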
CHAPTER 16
CHAPTER OUTLINE
16.1 Least Squares Fitted Linear Probability Model
16.2 Limited Dependent Variables
16.2.1 Censored Data
16.2.2 Simulated Data
Open the Excel file transport. Save your file as POE Chapter 16. Rename sheet 1 transport
data.
We consider a model for explaining individuals' choices between driving (private transportation)
and taking the bus (public transportation) when commuting to work, assuming that these are the
only two alternatives:
$$y = \beta_1 + \beta_2 x + e \qquad (16.1)$$
$$y = \begin{cases} 1 & \text{individual drives to work} \\ 0 & \text{individual takes bus to work} \end{cases} \qquad (16.2)$$
where p is the probability that y takes the value 1. This discrete random variable has expected
value E[y] = p and variance var(y) = p(1 - p).
A priori we expect that as x increases, and commuting time by bus increases relative to
commuting time by car, an individual would be more inclined to drive. That is, we expect a
positive relationship between x and p, the probability that an individual will drive to work:
In the Regression dialog box, the Input Y Range should be D1:D22, and the Input X Range
should be C1:C22. Check the boxes next to Labels and Residuals. Select New Worksheet Ply
and name it LS Linear Probability Model. Finally select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.781873104
R Square             0.61132555
Adjusted R Square    0.590869
Standard Error       0.327342874
Observations         21

ANOVA
              df    SS             MS             F             Significance F
Regression    1     3.202181455    3.202181455    29.8840983    2.8342E-05
Residual      19    2.035913784    0.107153357
Total         20    5.238095238

                Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept       0.484795068     0.0714494         6.785151347    1.76499E-06    0.335249732    0.634340404
x (column C)    0.007030992     0.001286164       5.466635007    2.8342E-05     0.004339019    0.009722965
RESIDUAL OUTPUT

Observation    Predicted auto    Residuals
1              0.143791971       -0.143791971
2              0.656351265       -0.656351265
3              1.056961201       -0.056961201
4              0.311832673       -0.311832673
5              0.262615731       -0.262615731
6              1.124615311       -0.124615311
...
10             0.122698996       -0.122698996
11             -0.152915867      0.152915867
12             0.945325023       0.054674977
13             0.175431434       0.824568566
14             0.435578126       -0.435578126
15             0.847594225       0.152405775
16             0.712599213       0.287400787
17             0.050279789       -0.050279789
18             0.723848785       0.276151215
19             0.680959736       0.319040264
20             -0.02776424       0.02776424
21             0.835641567       0.164358433
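The residual output includes predicted values above 1 and below 0, which cannot be probabilities. The short Python sketch below illustrates how a fitted straight line produces such values, using the intercept and slope from the output above; the function name predicted_probability and the x values are hypothetical and chosen only for illustration.

```python
B1, B2 = 0.484795068, 0.007030992   # intercept and slope from the LS output

def predicted_probability(x):
    """Fitted value of the linear probability model at x."""
    return B1 + B2 * x

inside = predicted_probability(10)     # a valid probability
too_high = predicted_probability(90)   # exceeds 1
too_low = predicted_probability(-90)   # falls below 0
```

Because the fitted line has a constant slope, any sufficiently large or small x pushes the prediction outside the unit interval.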
The underlying feature that causes this problem is that the linear probability model (16.1)
implicitly assumes that as x increases the probability of driving increases at a constant rate.
However, since 0 ≤ p ≤ 1, a constant rate of increase is impossible. To overcome this problem a
nonlinear probit or logit model must be used. These estimation options are available in
Open the Excel file mroz. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 16 in one file, create a new worksheet in your POE
Chapter 16 Excel file, rename it mroz data, and in it, copy the data set you just opened.
To plot the histogram of the wife's hours of work in 1975 (hours), we proceed as we have done
previously in Chapters 4 and 14.
AA
1 BIN
2 0
3 200
Select cells AA2:AA3, move your cursor to the lower right corner of your selection until it turns
into a skinny cross as shown below; left-click, hold it and drag it down to cell AA27: Excel
recognizes the series and automatically completes it for you.
      AA
23    4200
24    4400
25    4600
26    4800
27    5000
In the Histogram dialog box, the Input Range should be H2:H754, and the Bin Range should
be AA2:AA27. Check the New Worksheet Ply option and name it Censored Data Histogram;
check the box next to Chart Output. Finally, select OK.
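The bin series and the counting rule behind the histogram can be sketched in a few lines. The Python snippet below builds the same 26 bins (0 to 5000 in steps of 200) and counts each value in the first bin whose upper limit it does not exceed, with a final overflow category, which is how Excel's Histogram tool is understood to treat bins; the function name histogram_counts and the sample hours are hypothetical.

```python
from bisect import bisect_left

bins = list(range(0, 5001, 200))   # 0, 200, ..., 5000 as in cells AA2:AA27

def histogram_counts(values, bins):
    """Count values per bin; bin limits are inclusive upper bounds."""
    counts = [0] * (len(bins) + 1)   # last slot is the overflow ("More") category
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts

# Hypothetical hours of work, including two non-participants with zero hours.
counts = histogram_counts([0, 0, 150, 200, 201, 6000], bins)
```

With censored data such as hours of work, the first bin collects every observation at the limit value of zero.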
Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap. In the Border Color tab, select Solid line, and
change the Color to black. Finally select Close.
After editing, the result is (see Figure 16.3 p. 614 in Principles of Econometrics, 4e):
Qualitative and Limited Dependent Variable Models 395
[Figure: histogram of the wife's hours of work in 1975 (hours)]
The histogram shows the large fraction of women who did not enter the labor force. This is an
example of censored data, meaning that a substantial fraction of the observations on the
dependent variable take a limit value, which is zero in the case of market hours worked by
married women.
$$y_i^* = \beta_1 + \beta_2 x_i + e_i = -9 + x_i + e_i \qquad (16.6)$$
where $x_i$ are uniformly distributed over the interval [0,20] and $e_i$ are normally distributed with
mean 0 and standard deviation 4.
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it simulated data.
     A            B    C    D                   E
1    y*           x    e    y                   E(y*)
2    =-9+B2+C2              =IF(A2<=0,0,A2)     =-9+B2
In column B we generate a sample of 200 random values that are uniformly distributed over the
interval [0,20] and in column C we generate a sample of 200 random values from a normal
distribution with mean 0 and standard deviation 4. We proceed as we have done before in
Chapters 14, 12 and 3.
We first use the Random Number Generation dialog box to generate our x values. We need to
generate one set of random numbers for our x values, so we specify 1 in the Number of
Variables window. We would like to generate 200 random numbers, so we specify 200 in the
Number of Random Numbers window. We select Uniform in the Distribution window; the
selected range should be Between 0 and 20. Select Output Range and specify it to be B2:B201.
Finally, we select OK.
Next, we use the Random Number Generation dialog box to generate our e values. We need to
generate one set of random numbers for our e values, so we specify 1 in the Number of
Variables window. We would like to generate 200 random numbers, so we specify 200 in the
Number of Random Numbers window. We select Normal in the Distribution window; the
selected Parameters should be Mean equal to 0, and Standard deviation equal to 4. Select
Output Range and specify it to be C2:C201. Finally, we select OK.
After you copy the content of cell A2 to cells A3:A201, and the content of cells D2:E2 to cells
D3:E201, here is how your table should look (only the first five values are shown below):
     A     B    C    D    E
1    y*    x    e    y    E(y*)
Note: you will obtain different random samples than the ones we obtained for x and e, so your
y*, y and E(y*) values should also be different than the ones reported above.
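The simulated worksheet can be mimicked in code, which also makes the role of the censoring rule explicit. The sketch below is a minimal Python version assuming the same design as the worksheet (200 draws, x uniform on [0,20], e normal with mean 0 and standard deviation 4); the function name simulate_censored and the seed are hypothetical, so the numbers will differ from any Excel sample.

```python
import random

def simulate_censored(n=200, seed=12345):
    """Return rows (y_star, x, e, y, expected_y_star) as in worksheet columns A to E."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 20)
        e = rng.gauss(0, 4)
        y_star = -9 + x + e          # latent variable
        y = max(0.0, y_star)         # same rule as IF(A2<=0,0,A2)
        rows.append((y_star, x, e, y, -9 + x))
    return rows

rows = simulate_censored()
n_censored = sum(1 for row in rows if row[3] == 0.0)
```

The count n_censored gives the number of observations piled up at the limit value of zero.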
Next, we plot the uncensored sample data and the latent regression function as we have done
before, in Chapter 2 for example. We choose a Scatter with only Markers chart type for the
uncensored sample data series, where the x-axis values are B2:B201 and the y-axis values are
A2:A201. The latent regression function is plotted using a Scatter with Smooth Lines chart
type, where the x-axis values are B2:B201 and the y-axis values are E2:E201.
The result is (see also Figure 16.4 p. 616 in Principles of Econometrics, 4e):
[Figure: scatter plot of the uncensored data y* against x, with the latent regression function E(y*)]
The latent or uncensored data y* are scattered along the latent regression function. If we observed
these data we could estimate the parameters using the least squares principle, by fitting a line
through the center of the data.
However, we do not observe all the latent data. What we can do is estimate the parameters of our
regression, using the least squares principles, by fitting a line through the center of the observed
or censored data-which is what we do next.
In the Regression dialog box, the Input Y Range should be D1:D201, and the Input X Range
should be B1:B201. Check the box next to Labels. Uncheck the box next to Residuals. Select
New Worksheet Ply and name it LS Fitted Censored Data Model. Finally select OK.
The result is:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.742330341
R Square             0.551054335
Adjusted R Square    0.548786933
Standard Error       3.167307796
Observations         200

ANOVA
              df     SS             MS             F              Significance F
Regression    1      2438.071126    2438.071126    243.0333267    2.80475E-36
Residual      198    1986.304057    10.03183867
Total         199    4424.375184

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     -2.451600913    0.463559917       -5.288638697    3.24646E-07    -3.365749149    -1.537452677
x             0.597924573     0.038354249       15.58952619     2.80475E-36    0.522289325     0.673559821
The estimated regression function in the table above gives different parameter estimates than the
ones reported in equation (16.32a) on p. 616 in Principles of Econometrics, 4e because it is based
on a different sample of censored data. Your estimated regression function will also be different
from ours and from that of Principles of Econometrics, 4e for the same reason.
Cell F1: LS Fitted
Cell F2: ='LS Fitted Censored Data Model'!$B$17+'LS Fitted Censored Data Model'!$B$18*'simulated data'!B2
After you copy the content of cell F2 to cells F3:F201, here is how your table should look (only
the first five values are shown below):
       F
1      LS Fitted
2      2.609965
3      6.944174
4      6.04018
5      0.75891
6      9.486453
Note: since you are working with different random samples than the ones we are working with,
your LS fitted values should also be different than the ones reported above.
Next, we plot the censored sample data, its least squares fitted regression function, as well as the
latent regression function we plotted earlier. We choose a Scatter with only Markers chart type
for the censored sample data series, where the x-axis values are B2:B201 and the y-axis values
are D2:D201. The regression functions are plotted using a Scatter with Smooth Lines chart type.
For the least squares fitted regression function, based on the censored sample data, the x-axis
values are B2:B201 and the y-axis values are F2:F201. For the latent regression function, as
plotted earlier, the x-axis values are B2:B201 and the y-axis values are E2:E201.
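The result is (see also Figure 16.5, p. 617 in Principles of Econometrics, 4e):
[Figure: the censored data plotted against x, together with the latent regression function E(y*)
and the least squares fitted line (Fitted LS); the x-axis runs from 0 to 20.]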
Note that the least squares principle fails to estimate $\beta_1 = -9$ and $\beta_2 = 1$ because the observed
data do not fall along the underlying regression function $E(y^*) = \beta_1 + \beta_2 x = -9 + x$.
Finally, we can estimate the parameters of our regression, using the least squares principle, by
fitting a line through the center of only the positive sample data, which is what we do next.
Next, select cells H2:I201. Right-click, select Copy. Place your cursor in cell K2. Right-click,
select Paste Special. In the Paste Special dialog box that pops up, select Values. Finally, select
OK.
Here is how your table should look (only the first five values are shown below):
K L J
1 v x
2 1_18(}205 8_4'65224
3 11-94971
- .... ...
15_713:98
4 13.Q9
. 52 14..20:209'
5 0 5.369421
G 14.776$8 1 '9 . 96582
Select the Data tab in the middle of your tab list. In the Sort & Filter group of commands, select
the Sort Largest to Smallest option.
Here is how your table should look (only the first five values are shown below):
·JI k L I
1 v )(
2 19.1556;3 18.06635
3 ·1.§.176?5 1 9.72:045
4 15.50975 15.Q�.]69
-
5 15. 0975 17.97295
G 14.77688 19.96582
-
In the Regression dialog box, select only your positive y-values in column K and their
corresponding x-values in column L. Our Input Y Range is K1:K112, and our Input X Range is
L1:L112; yours will be different because you have a different sample of data. Check the box
next to Labels. Select New Worksheet Ply and name it LS Fitted Positive Data Model. Finally
select OK.
The result is:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.574162303
R Square             0.32966235
Adjusted R Square    0.323512463
Standard Error       3.520766127
Observations         111

ANOVA
              df     SS             MS             F              Significance F
Regression    1      664.4718542    664.4718542    53.60462167    4.45216E-11
Residual      109    1351.141559    12.39579412
Total         110    2015.613413

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     -2.600786231    1.35100075        -1.925081264    0.056825344    -5.278425714    0.076853253
Note that once again the least squares principle fails to estimate $\beta_1 = -9$ and $\beta_2 = 1$. If the
dependent variable is censored, having a lower limit and/or an upper limit, then the least squares
estimators of the regression parameters are biased and inconsistent. In this case we can apply an
alternative estimation procedure, which is called Tobit in honor of James Tobin, winner of the
1981 Nobel Prize in Economics, who first studied this model. The Tobit estimation procedure is
available in standard econometric software.
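The three least squares fits discussed in this section can also be tried outside Excel. The sketch
below is only an illustration under stated assumptions: the error standard deviation (4), the
uniform design for x on [0, 20], the censoring rule y = max(0, y*), the seed and the sample size
of 2,000 are all choices made here, not values taken from the worksheet. It fits the same line to
the latent data, to the censored data, and to the positive observations only.

```python
import random

def ols(x, y):
    """Least squares intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(2011)                           # arbitrary fixed seed (assumption)
n = 2000                                            # assumption: larger than the 200 worksheet rows
x = [rng.uniform(0, 20) for _ in range(n)]          # assumed design for x
y_star = [-9 + xi + rng.gauss(0, 4) for xi in x]    # latent data; sigma = 4 is assumed
y = [max(0.0, v) for v in y_star]                   # observed data, censored at zero

latent_fit = ols(x, y_star)                         # uses data we would not observe
censored_fit = ols(x, y)                            # all observations, zeros included
x_pos = [xi for xi, yi in zip(x, y) if yi > 0]
y_pos = [yi for yi in y if yi > 0]
positive_fit = ols(x_pos, y_pos)                    # positive observations only

for name, (b1, b2) in [("latent", latent_fit),
                       ("censored", censored_fit),
                       ("positive", positive_fit)]:
    print(f"{name:>9}: b1 = {b1:7.3f}, b2 = {b2:6.3f}")
```

Only the fit to the latent data should land near -9 and 1; the other two slopes fall well short
of 1, which is the pattern seen in the Excel output.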
APPENDIX A
CHAPTER OUTLINE
A.1 Mathematical Operations
A.1.1 Exponents
A.1.2 Scientific Notation
A.1.3 Logarithms and the Number e
A.2 Percentages
If you have not done so, read Chapter 1 in this manual. Do it now.
The basic arithmetic operations are described in Section 1.3.1. Here we explain the use of some
Excel functions that may help in computations. Open Excel and save the workbook as Appendix
A. Rename Sheet 1 as math functions. In cell A1 type the label x to name the column. Enter the
values 1, 5, -3, 3 in cells A2:A5.
Many mathematical functions are built into Excel. These are easy to access with a few clicks.
Suppose we want to find the sum $\sum_{i=1}^{4} x_i$. In cell A8 type the label sum x. Click in cell B8 to
make it the Active cell. Locate the Insert Function icon, to the left of the formula bar.
A click opens a dialog box. In the search box you can enter the term you are seeking. Type sum
and click Go. The recommended function is called SUM. The command format is shown below
the function window, and the very important Help on this function link is given in the lower left
corner.
Click on OK.
Several changes occur. First, in the formula bar "=SUM()" appears; the summation command
is awaiting a range of values to add up. In the active cell B8 the command is mirrored.
Click on A2 and drag the mouse down to A5. As you do so the argument in the SUM function
changes to A2:A5.
If the Function Argument dialog box happens to be in the way of numbers you want to select,
click on the oddly but aptly named collapse dialog button. It will temporarily reduce the dialog
box and allow you to drag it out of the way. After you are done selecting data click the restore
button to return the dialog box to full size.
A more direct approach is to type a formula, beginning with an equal sign, in an active cell. To
illustrate, compute $\sum_{i=1}^{4} x_i^2$. In cell A9 enter sum x^2, where the caret is a way to indicate a power.
In cell B9 enter =sum and a drop down list of functions appears.
Type "=" and the first few letters of the command you seek; Excel's drop down list then provides
some choices, including SUMSQ, which returns the sum of the squares of the arguments.
Double-click SUMSQ and the function enters B9. Specify the function arguments by filling in
the range A2:A5. Don't forget the closing parenthesis. Then press Enter to obtain the sum of
squared values.
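As a cross-check on the two worksheet results, the same sums can be computed in a few lines of
Python. This is only a sketch for comparison; the variable names are illustrative and not part of
the Excel workbook.

```python
x = [1, 5, -3, 3]                      # the values entered in A2:A5

total = sum(x)                         # counterpart of =SUM(A2:A5)
total_sq = sum(v ** 2 for v in x)      # counterpart of =SUMSQ(A2:A5)

print(total, total_sq)                 # prints: 6 44
```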
The result in cell B9 is 44.
The trick is to know what functions are available. The key tool here is the Help button that will
be found in the upper right corner of the window.
In the resulting Help window you can find resources for functions and many other tasks. If you
do not see what you are seeking, enter a short phrase or keyword into the Search window and
press Enter.
Click on Function reference. There are sections for Math and trigonometry and Statistical
functions.
Click on Math and trigonometry functions (reference). Below is a very abbreviated list (copied
directly from the Excel Help) of some useful functions.
Function Description
SUMX2MY2 Returns the sum of the difference of squares of corresponding values in two arrays
SUMX2PY2 Returns the sum of the sum of squares of corresponding values in two arrays
SUMXMY2 Returns the sum of squares of differences of corresponding values in two arrays
Click on SUMX2PY2. The resulting help window includes an equation, so that you can quickly
see that the function is designed to compute the sum of the sums of squares of corresponding
values in two arrays.
SUMX2PY2 function
Returns the sum of the sum of squares of corresponding values in two arrays.
Syntax
SUMX2PY2(array_x, array_y)
Remarks
• The arguments should be either numbers or names, arrays, or references that contain numbers.
• If an array or reference argument contains text, logical values, or empty cells, those values are
ignored; however, cells with the value zero are included.
• If array_x and array_y have a different number of values, SUMX2PY2 returns the #N/A error value.
The equation for the sum of the sum of squares is:
$$\text{SUMX2PY2} = \sum \left(x^2 + y^2\right)$$
The formula tells all.
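To make the three paired-array functions concrete, here is a small Python sketch of what each one
computes. The lower-case function names and the second array y are hypothetical, chosen here for
illustration; raising an error for arrays of unequal length is a stand-in for Excel's #N/A result.

```python
def _check(xs, ys):
    # Excel returns #N/A when the arrays differ in length; here we raise instead.
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same number of values")

def sumx2my2(xs, ys):
    """Sum of the differences of squares: sum(x^2 - y^2)."""
    _check(xs, ys)
    return sum(a * a - b * b for a, b in zip(xs, ys))

def sumx2py2(xs, ys):
    """Sum of the sums of squares: sum(x^2 + y^2)."""
    _check(xs, ys)
    return sum(a * a + b * b for a, b in zip(xs, ys))

def sumxmy2(xs, ys):
    """Sum of squared differences: sum((x - y)^2)."""
    _check(xs, ys)
    return sum((a - b) ** 2 for a, b in zip(xs, ys))

x = [1, 5, -3, 3]        # the worksheet values in A2:A5
y = [2, 0, 1, 4]         # hypothetical second array, for illustration only

print(sumx2my2(x, y), sumx2py2(x, y), sumxmy2(x, y))   # prints: 23 65 43
```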
A.1.1 Exponents
The notation $x^n$ means take x to the nth power (see p. 635 in Principles of Econometrics, 4e).
The function POWER achieves this in Excel. We will use this function to raise each value in the
array x to the power -3. Note that $x^{-3} = 1/x^3$ as long as x is not zero.
Close the Excel help window. In cell B1 enter x^-3. In B2 enter =POWER(A2,-3) and press
Enter. Select cell B2. Move the cursor to the lower right corner of B2 until it turns into a skinny
cross. Drag the cross down to cell B5 and release. Cells B2:B5 contain the calculated values.
       A      B
1      x      x^-3
2      1      1
3      5      0.008
4      -3     -0.037037
5      3      0.037037
Instead of using the POWER function we could have entered =A2^-3 into cell B2 and pressed
Enter, then dragged the formula down to achieve the same result.
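For comparison, the same calculation can be sketched in Python, where `**` plays the role of the
caret and `math.pow` plays the role of POWER. The variable names are illustrative only.

```python
import math

x = [1, 5, -3, 3]                               # the values in A2:A5

with_operator = [v ** -3 for v in x]            # like =A2^-3
with_function = [math.pow(v, -3) for v in x]    # like =POWER(A2,-3)

for v, p in zip(x, with_operator):
    print(v, round(p, 6))                       # 1 1.0, 5 0.008, -3 -0.037037, 3 0.037037
```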
A.1.2 Scientific Notation
Very large or very small numbers can be expressed as a number between 1 and 10 times a power
of 10. For example, 0.00000034 is $3.4 \times 10^{-7}$, which Excel writes as 3.4E-7. In cell A10 enter small x and in B10
enter .00000034. Right-click on the cell B10 and select Format Cells from the menu.
Select Scientific with 2 Decimal places and then OK, the number is now represented in scientific
notation.
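The same display format can be reproduced in Python as a quick check; the format code `.2E` asks
for scientific notation with two decimal places. This is a side illustration, not an Excel feature.

```python
small_x = 0.00000034

as_text = format(small_x, ".2E")    # scientific notation, 2 decimal places
print(as_text)                      # prints: 3.40E-07

# Reading the notation back gives the original number.
assert float("3.4E-7") == small_x
```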
Cell B10 now displays 3.40E-07.
A.1.3 Logarithms and the Number e
In cell D1 enter the label y and in E1 enter the label ln(y). In D2:D8 enter powers of 10 starting
with 1 and ending with 1,000,000. In cell E2 enter the formula =ln(D2) and press Enter. The
function LN is the natural logarithm. All the logarithms in Principles of Econometrics, 4e are
natural logarithms, rather than those to the base 10, or some other base.
Move the cursor to the lower right corner of E2 and drag the formula down to E8.
       D          E
1      y          ln(y)
2      1          0
3      10         2.302585
4      100        4.60517
5      1000       6.907755
6      10000      9.21034
7      100000     11.51293
8      1000000    13.81551
Logarithms are very useful in econometrics. The properties of logarithms are discussed on p. 636
of Principles of Econometrics, 4e. For example, $z = \ln(y^{0.5}) = 0.5 \times \ln(y)$. In cell F1 enter the
label z=0.5ln(y). In cell F2 enter the formula =0.5*E2. Copy the formula from cell F2 down to
cells F3:F8. Now we have z, a variable in logarithmic form. Next, we would like to convert z
back into a non-logarithmic form (this is called taking the antilogarithm). To do that we use the
exponential function. In G1 enter the label exp(z). In G2 enter the formula =EXP(F2), and then
press Enter. Copy that formula down to G3:G8. Now, compare columns D and G. The values in
G2:G8 are the square roots of the values in D2:D8. Of course we could have simply used the
SQRT function to do this calculation, but the point here was to demonstrate operations with
logarithms.
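The chain of steps in columns D through G can be sketched in Python to confirm that taking the
antilogarithm of 0.5 ln(y) returns the square root of y. The list names mirror the column labels
and are otherwise arbitrary.

```python
import math

y = [10 ** k for k in range(7)]         # 1, 10, ..., 1,000,000 (column D)
ln_y = [math.log(v) for v in y]         # natural logarithm, like =LN(D2)  (column E)
z = [0.5 * v for v in ln_y]             # z = 0.5 ln(y), like =0.5*E2      (column F)
exp_z = [math.exp(v) for v in z]        # antilogarithm, like =EXP(F2)     (column G)

for v, g in zip(y, exp_z):
    print(v, round(g, 6), round(math.sqrt(v), 6))   # the last two columns agree
```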
       I      J
1      x      y=.03exp(x)
2      1      0.081548
3      2      0.221672
4      3      0.602566
5      4      1.637945
6      5      4.452395
7      6      12.10286
8      7      32.89899
9      8      89.42874
10     9      243.0925
11     10     660.794
Highlight I1:J11 (all the cells from these two columns, including labels). Click on the ribbon tab
Insert and then select Scatter charts.
From the drop down menu choose the one showing curvy lines. A graph is superimposed showing
the plotted relationship with a title (since you included the header row with text labels in your
cells selection).
[Figure: scatter chart with smooth lines of y=.03exp(x) against x; the curve rises steeply as x
goes from 1 to 10.]
Select the figure and drag it off to the side. Place the cursor over the column header to select
column J. Select the Home tab. Once there, go to the Cells group of commands, select Insert and
then Insert Sheet Columns.
Now column J is empty and column K contains the y values. In the new J1 enter the label ln(y),
in J2 enter the formula =ln(K2), then press Enter. Copy the formula from cell J2 to cells J3:J11.
Now graph the relationship between x and ln(y). As you can see it is a straight line.
       I      J            K
1      x      ln(y)        y=.03exp(x)
2      1      -2.50656     0.081548455
3      2      -1.50656     0.221671683
4      3      -0.50656     0.602566108
5      4      0.493442     1.637944501
6      5      1.493442     4.452394773
7      6      2.493442     12.1028638
8      7      3.493442     32.89899475
9      8      4.493442     89.42873961
10     9      5.493442     243.0925178
11     10     6.493442     660.7939738
[Figure: ln(y) plotted against x is a straight line.]
For econometric analysis the ability to convert "curved" relationships to straight lines is
sometimes very important.
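The reason the plot of ln(y) is a straight line is that ln(0.03 exp(x)) = ln(0.03) + x, so each
unit increase in x adds exactly 1 to ln(y). A short Python sketch, with illustrative variable
names, checks this numerically:

```python
import math

x = list(range(1, 11))                          # 1, 2, ..., 10
y = [0.03 * math.exp(v) for v in x]             # the curved relationship
ln_y = [math.log(v) for v in y]                 # the transformed values

# Successive differences of ln(y) are constant, which is what "straight line" means.
steps = [b - a for a, b in zip(ln_y, ln_y[1:])]
print([round(s, 6) for s in steps])             # every entry is 1.0
print(round(ln_y[0], 6), round(math.log(0.03) + 1, 6))   # intercept ln(0.03) plus slope 1
```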
A.2 PERCENTAGES
While we have understood percentages since grade school, let us consider them again. In
particular we should keep the distinction between a percentage change and its decimal form clear.
In the Appendix A workbook label a new worksheet percentages. In A1 enter the label y. In
A2:A7 enter values 1.01, 1.05, 1.10, 1.15, 1.20, and 1.25. If the y-value changes from $y_0$ to $y_1$
then the percentage change is
$$\%\Delta y = \frac{y_1 - y_0}{y_0} \times 100$$
For each of the values in A2:A7 compute the percentage change from the value $y_0 = 1$. This
choice of $y_0$ value implies that the percentage change equation becomes
$\%\Delta y = (y_1 - 1) \times 100$.
In B1 enter the label pct chg. In B2 enter the formula =100*(A2-1), then press Enter. Place your
cursor on the lower right corner of B2 to form a skinny cross, then drag it down to B7.
For y = 1.10, for example,
$$\frac{y_1 - y_0}{y_0} = \frac{1.10 - 1}{1} = 0.10$$
is the decimal equivalent of the percentage change, but the percentage change itself is this value
multiplied by 100, that is, 10%. Column C contains ln(y):
       A       B          C
1      y       pct chg    ln(y)
2      1.01    1          0.00995
3      1.05    5          0.04879
4      1.10    10         0.09531
5      1.15    15         0.139762
6      1.20    20         0.182322
7      1.25    25         0.223144
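You can see that the approximation works pretty well for the first few cases: ln(y) in column C is
close to the decimal change y - 1. In Principles of Econometrics, 4e, p. 638, it is shown that this
trick with logarithms can be used to approximate percentage changes when the change is small:
$\%\Delta y \approx 100(\ln(y_1) - \ln(y_0))$. Since $y_0 = 1$ and $\ln(1) = 0$, this
reduces to $\%\Delta y \approx 100\ln(y_1)$. In D1 enter the label approx pct chg. In D2 enter the
formula =100*(LN(A2)) and press Enter. Drag this formula down to D3:D7.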
       A       B          C           D
1      y       pct chg    ln(y)       approx pct chg
2      1.01    1          0.00995     0.995033085
3      1.05    5          0.04879     4.879016417
4      1.10    10         0.09531     9.53101798
5      1.15    15         0.139762    13.97619424
6      1.20    20         0.182322    18.23215568
7      1.25    25         0.223144    22.31435513
Highlight C2:D7, right-click and select Format Cells. Choose the Number format with 4
Decimal places. Click OK.
Compare the results in columns B and D. For the first few values of y the approximation is pretty
good, but when y = 1.10 the approximation error is already about half a percentage point (10
versus 9.53).
Use the approximation $\%\Delta y \approx 100(\ln(y_1) - \ln(y_0))$ only for small changes in y.
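To see how quickly the approximation deteriorates, the exact and approximate percentage changes
can be put side by side. The Python sketch below uses the same six y values with y0 = 1; the
variable names are illustrative.

```python
import math

y0 = 1.0
y1_values = [1.01, 1.05, 1.10, 1.15, 1.20, 1.25]

rows = []
for y1 in y1_values:
    exact = 100 * (y1 - y0) / y0                      # percentage change
    approx = 100 * (math.log(y1) - math.log(y0))      # log approximation
    rows.append((y1, exact, approx, exact - approx))

for y1, exact, approx, error in rows:
    print(f"{y1:.2f}  exact = {exact:6.2f}  approx = {approx:9.6f}  error = {error:.4f}")
```

The error column grows from about 0.005 at y = 1.01 to roughly 2.7 at y = 1.25.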
APPENDIX B
CHAPTER OUTLINE
B.1 Binomial Probabilities
B.1.1 Computing Binomial Probabilities Directly
B.1.2 Computing Binomial Probabilities Using BINOMDIST
B.2 The Normal Distributions
B.2.1 The STANDARDIZE Function
B.2.2 The NORMSDIST Function
B.2.3 The NORMSINV Function
B.2.4 The NORMDIST Function
B.2.5 The NORMINV Function
B.2.6 A Template for Normal Distribution Probability Calculations
B.3 Distributions Related to the Normal
B.3.1 The Chi-Square Distribution
B.3.2 The t-Distribution
B.3.3 The F-Distribution
Excel has a number of functions for computing probabilities. In this chapter we will show you
how to work with the probability function of a binomial random variable and how to compute
probabilities involving normal random variables.
A binomial experiment consists of a fixed number of trials, n. On each independent trial the
outcome is success or failure, with the probability of success, p, being the same for each trial. The
random variable X is the number of successes in n trials, so x = 0, 1, ..., n. For this discrete
random variable, the probability that X = x is given by the probability function:
$$P(X = x) = f(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$$
We can compute these probabilities two ways: the hard way and the easy way.
For example, with n = 5, p = 0.3 and x = 3:
$$P(X = 3) = f(3) = \frac{5!}{3!\,(5 - 3)!}\, 0.3^3 (1 - 0.3)^{5 - 3}$$
Open Excel and name the workbook Appendix B. Rename Sheet 1 binomial. Make cell A1
active by clicking it.
Eventually you will learn many shortcuts in Excel, but should you forget how to compute some
mathematical or statistical quantity, there is an Insert Function fx button to the right of the cell
reference window.
Click on the Insert Function button, select Math & Trig in the Or select a category window.
Next, scroll down the list of functions in Select a function window. Select FACT; this function
returns the factorial of a number.
Click OK. Enter 5 in the Number window of the Function Arguments dialog box that opens
up. Excel determines that 5! = 120. Click Cancel.
Your cell A1 should still be active. Click on Insert Function again. This time search for the term
factorial and click Go.
The function FACT should be selected in the list that appears in the Select a function window.
Click OK. The Function Arguments dialog box appears again. Click Cancel.
Alternatively, in cell A1 type P[X=3], and in B1 type the following formula:
=(FACT(5)/(FACT(3)*FACT(2)))*(0.3^3)*(0.7^2)
Note that we have used parentheses to group operations. Press Enter. The result is 0.1323.
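The same direct calculation can be sketched in Python, with `math.factorial` standing in for
FACT. The variable names are illustrative; the grouping of operations mirrors the Excel formula.

```python
import math

n, x, p = 5, 3, 0.3

# Number of ways to get x successes in n trials: n! / (x! (n - x)!)
ways = math.factorial(n) / (math.factorial(x) * math.factorial(n - x))

prob = ways * (p ** x) * ((1 - p) ** (n - x))
print(ways, round(prob, 4))     # prints: 10.0 0.1323
```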
Make cell B6 active. Click on the Insert Function button. Select Statistical in the Or select a
category window of the Insert Function dialog box. Next, scroll down the list of functions in the
Select a function window, and select BINOMDIST. Select OK.
The Excel function BINOMDIST can be used to find either the cumulative probability, $P(X \le x)$, or
the probability function, $P(X = x)$, for a binomial random variable. Syntax for the function is:
BINOMDIST(number_s, trials, probability_s, cumulative)
• number_s is the number of successes, x.
• trials is the number of independent trials, n.
• probability_s is the probability of success on each trial, p.
• cumulative is a logical value. If set equal to 1 (true), the cumulative probability $P(X \le x)$
is returned; if set to 0 (false), the probability $P(X = x)$ is returned.
Note that Excel defines each argument for which it is prompting you. In the middle portion of the
screen shot shown below, you can find the definition of the Cumulative argument-this is the
argument that is defined because the cursor is in the Cumulative window. Using the values
n = 5, p = .3, x = 3 and setting Cumulative to 0, we obtain the probability 0.1323, as
above.
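A rough Python counterpart of this function is sketched below, to make the role of the cumulative
argument explicit. The name `binomdist` and its argument order are chosen here to mirror the Excel
function; this is an illustration, not Excel's implementation.

```python
import math

def binomdist(number_s, trials, probability_s, cumulative):
    """P(X = x) if cumulative is false, P(X <= x) if cumulative is true."""
    def pmf(k):
        return (math.comb(trials, k)
                * probability_s ** k
                * (1 - probability_s) ** (trials - k))
    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))   # add up f(0), ..., f(x)
    return pmf(number_s)

print(round(binomdist(3, 5, 0.3, 0), 4))    # probability function, P(X = 3)
print(round(binomdist(3, 5, 0.3, 1), 5))    # cumulative probability, P(X <= 3)
```

With n = 5, p = 0.3 and x = 3 these two calls give 0.1323 and 0.96922.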
Press Cancel. Next, we will set up a "template" that will allow you to compute any binomial
probability with a simple click or two. In A3:A7 enter some labels for the number of successes x
(B3) in n trials (B4) with probability p on each independent trial (B5). In B6 we will compute the
probability that $X = x$ and in B7 we will compute the probability that $X \le x$.
       A                B
1      P[X=3]           0.1323
2
3      successes x
4      trials n
5      probability p
6      P[X=x]
7      P[X<=x]
Make cell B6 active again. Access the BINOMDIST function via the Insert Function button as
you have just done above, or type the function directly in cell B6. Either way, this time around,
instead of specifying the values of the arguments x, n and p, specify the locations (cell
references) where Excel can find those values. Repeat the exercise in cell B7, but this time set
Cumulative to 1.
       A                B
1      P[X=3]           =(FACT(5)/(FACT(3)*FACT(2)))*(0.3^3)*(0.7^2)
2
3      successes x
4      trials n
5      probability p
6      P[X=x]           =BINOMDIST(B3,B4,B5,0)
7      P[X<=x]          =BINOMDIST(B3,B4,B5,1)
In this book we will use "templates" a great deal. These templates are Excel pages with cell
addresses in the formulas so that by changing a numerical value (say in B3) we can compute an
alternative probability. It is very instructive to see the formulas to check on exactly the structure
of the commands. Select the Formulas tab on the Excel ribbon, and then go to the Formula
Auditing group of commands.
Select Show Formulas. You can switch between the numerical values, shown below, and the
formulas shown above.
       A                B
1      P[X=3]           0.1323
2
3      successes x      3
4      trials n         5
5      probability p    0.3
6      P[X=x]           0.1323
7      P[X<=x]          0.96922
Next time you need to compute a binomial probability you can call up the function BINOMDIST
or you can open your Appendix B workbook, go to the binomial worksheet and enter values into
the template. For example, use the template to compute the probabilities for 5 successes in 10
trials if the probability is 0.7. Here are the results you should get:
       A                B
1      P[X=3]           0.1323
2
3      successes x      5
4      trials n         10
5      probability p    0.7
6      P[X=x]           0.102919345
7      P[X<=x]          0.150268333
So that this template can be perfectly general, delete the entries in the first row and, in cell A1,
enter the label Computing binomial probabilities. Save your workbook.
       A                B
1      Computing binomial probabilities
2
3      successes x      5
4      trials n         10
5      probability p    0.7
6      P[X=x]           0.102919345
7      P[X<=x]          0.150268333
Excel provides several functions related to the Normal and Standard Normal Distributions.
The STANDARDIZE function computes the Z value for given values of X, $\mu$ and $\sigma$. That is, it
computes:
$$Z = \frac{X - \mu}{\sigma}$$
The NORMSDIST function computes the area, or cumulative probability, less than a given Z
value. Geometrically, the cumulative probability is the area under the standard normal probability
density function to the left of the given value. In many statistics books the cumulative distribution
function of a standard normal random variable is denoted by the special symbol $\Phi$. Then,
$$\Phi(z) = P(Z \le z)$$
Example:
$$P(Z \le 1.73) = \Phi(1.73) = 0.9582$$
[Figure: standard normal density with the area to the left of z = 1.73 shaded; the z-axis runs from -4 to 4.]
Instead of a table in the book we will use the function in Excel. The format of this function is:
NORMSDIST(Z)
If we wanted to find the area below a Z value of 1.0, we would enter =NORMSDIST(1.0) in a
cell, and the value computed would be .8413.
The NORMSINV function computes the Z value, $z_c$, corresponding to a given cumulative area
under the normal curve. The format of this function is:
NORMSINV(prob)
where prob is the area under the standard normal curve less than $z_c$. That is, prob = $P(Z < z_c)$.
If we wanted to find the $z_c$ value corresponding to a cumulative area of .10, we would enter
=NORMSINV(.10) in a cell and the value computed would be -1.2816.
The NORMDIST function computes the area, or probability, less than a given X value, or the
value of the normal pdf, for given values of the distribution mean $\mu$ and standard deviation $\sigma$. The
format of this function is:
NORMDIST(X, µ, σ, CUMULATIVE)
Let $X \sim N(\mu, \sigma^2)$. Then, with CUMULATIVE set to 1, the function NORMDIST will compute:
$$P(X \le x) = P\left(Z \le \frac{x - \mu}{\sigma}\right)$$
CUMULATIVE is a logical value; TRUE, which can be replaced by 1, returns the cumulative
probability. If we wanted to find the area below an X value of 6 when $\mu = 3$ and $\sigma = 3$, we
would enter =NORMDIST(6,3,3,1) in a cell, and the value computed would be .8413.
The NORMINV function computes the x value corresponding to a cumulative area under the
normal curve. The format of this function is:
NORMINV(prob, µ, σ)
where prob is the area under the normal curve less than x. That is, prob = $P(X < x)$. To
compute the value of x such that .10 of the probability is to its left, enter =NORMINV(.10,3,3)
in a cell, yielding -0.8447.
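For readers who want to check these five functions outside Excel, the sketch below uses
`NormalDist` from Python's standard `statistics` module. The helper `standardize` and the variable
names are defined here for illustration; the mapping to the Excel functions is noted in comments.

```python
from statistics import NormalDist

def standardize(x, mean, standard_dev):
    """Z value for a given x, like Excel's STANDARDIZE."""
    return (x - mean) / standard_dev

std_normal = NormalDist()               # N(0, 1)
x_dist = NormalDist(mu=3, sigma=3)      # N(3, 9): mean 3, standard deviation 3

z = standardize(6, 3, 3)                # 1.0
p_z = std_normal.cdf(1.0)               # like =NORMSDIST(1.0)
z_c = std_normal.inv_cdf(0.10)          # like =NORMSINV(.10)
p_x = x_dist.cdf(6)                     # like =NORMDIST(6,3,3,1)
x_c = x_dist.inv_cdf(0.10)              # like =NORMINV(.10,3,3)

print(z, round(p_z, 4), round(z_c, 4), round(p_x, 4), round(x_c, 4))
```

Because 6 standardizes to z = 1, the two cumulative probabilities are the same number.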
Rename your Sheet 2 normal and build a template for normal probabilities by entering the
formulas shown below. The highlighted cells require user input. The formulas in the other cells
do the computations.
       A                              B
1      Normal Probabilities
2
3      mean
4      standard_dev
5
6      Left-tail probability
7      a
8      P(X<=a)                        =NORMDIST(B7,B3,B4,1)
9
10     Right-tail probability
11     b
12     P(X>=b)                        =1-NORMDIST(B11,B3,B4,1)
13
14     Interval probability
15     a
16     b
17     P(a<=X<=b)                     =NORMDIST(B16,B3,B4,1)-NORMDIST(B15,B3,B4,1)
18
19     Inverse probability
20     Left-tail probability
21     Critical value, or quantile    =NORMINV(B20,B3,B4)
Using $X \sim N(\mu = 3, \sigma^2 = 9)$, the above template would produce the following results:
       A                              B
1      Normal Probabilities
2
3      mean                           3
4      standard_dev                   3
5
6      Left-tail probability
7      a                              6
8      P(X<=a)                        0.841344746
9
10     Right-tail probability
11     b                              6
12     P(X>=b)                        0.158655254
13
14     Interval probability
15     a                              4
16     b                              6
17     P(a<=X<=b)                     0.210786086
18
19     Inverse probability
20     Left-tail probability          0.95
21     Critical value, or quantile    7.934560881
Note that the quantile equal to 7.93 gives the top 5% "cut off" value.
The template works equally well for standard normal calculations. For example,
       A                              B
1      Normal Probabilities
2
3      mean                           0
4      standard_dev                   1
5
6      Left-tail probability
7      a                              2
8      P(X<=a)                        0.977249868
9
10     Right-tail probability
11     b                              2
12     P(X>=b)                        0.022750132
13
14     Interval probability
15     a                              1.5
16     b                              2.5
17     P(a<=X<=b)                     0.060597536
18
19     Inverse probability
20     Left-tail probability          0.95
21     Critical value, or quantile    1.644853627
It might be a useful exercise for you to compute these normal probabilities using Table 1 m
The chi-square distribution, the t-distribution and the F-distribution are related to the normal
distribution. For each we will make a few remarks and then provide a template for probability
calculations.
If $Z_1$ is a standard normal random variable with mean 0 and variance 1, then $Z_1^2$ has a chi-square
distribution with one degree of freedom. If $Z_1, Z_2, \ldots, Z_m$ are independent N(0,1) random
variables then:
$$V = \sum_{i=1}^{m} Z_i^2 \sim \chi^2_{(m)}$$
This notation means that V has a chi-square distribution with m degrees of freedom. The
expected value of V is E(V) = m. The variance of V is var(V) = 2m. The 90th, 95th and 99th
percentiles, and some others, are given in Table 3, Appendix E of Principles of Econometrics, 4e.
The template we will create next will make calculations to answer the following two questions:
1. For any value v > 0 what is the probability that a chi-square random variable will be
greater than v?
2. What is the "critical value" for the percentile p? That is, what is the value c such that
P(V < c) = p?
To answer the first type of question we use the Excel function CHIDIST. The format of the
function is:
CHIDIST(x, df)
Here x is the value of the chi-square variable and df is its degrees of freedom. The CHIDIST
function returns the probability in the right-tail of the distribution, the probability that V > x. To
calculate the probability that V < x, use 1-CHIDIST(x, df).
To answer the second question we use the function CHIINV. The format of the function is:
CHIINV(probability, df)
where probability is the right-tail probability and df is the degrees of freedom. To find the 95th
percentile use the function CHIINV(.05,df).
       A                        B
1      Chi-square probabilities
2
3      value
4      df
5
6      P[V<=value]              =1-CHIDIST(B3,B4)
7      P[V>value]               =CHIDIST(B3,B4)
8
9      Cumulative Percentile
10     Critical value           =CHIINV(1-B9,B4)
The calculations are illustrated below for a chi-square distribution with 5 degrees of freedom and
the value 7.7. We find that $P(\chi^2_{(5)} > 7.7) = 0.173563$, and that the 95th percentile value is
11.0705.
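Python's standard library has no chi-square functions, so the sketch below builds them from the
series expansion of the regularized incomplete gamma function and then inverts the cdf by
bisection. The function names are hypothetical, chosen here; the right-tail convention of
`chidist` and `chiinv` is meant to mirror CHIDIST and CHIINV, but this is an illustration and not
Excel's algorithm.

```python
import math

def chi2_cdf(x, df):
    """P(V <= x) for a chi-square variable, via the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a                      # first term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > 1e-17 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chidist(x, df):
    """Right-tail probability P(V > x)."""
    return 1.0 - chi2_cdf(x, df)

def chiinv(probability, df):
    """Value c with right-tail probability P(V > c) = probability, by bisection."""
    target = 1.0 - probability
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df)    # search bracket (assumed wide enough)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chidist(7.7, 5), 6))      # right-tail probability at 7.7
print(round(chiinv(0.05, 5), 4))      # 95th percentile
```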
       A                        B
1      Chi-square probabilities
2
3      value                    7.7
4      df                       5
5
6      P[V<=value]              0.826437
7      P[V>value]               0.173563
8
9      Cumulative Percentile    0.95
10     Critical value           11.0705
A t-probability density function is bell-shaped and centered at zero, like the normal distribution.
Its variance depends upon its degrees of freedom parameter m, and is equal to m/(m - 2) for m > 2. We
denote the t-distribution with m degrees of freedom as t(m). As m → ∞ the t-distribution
converges to the standard normal N(0,1). The function used to compute t-probabilities is TDIST.
The function used to compute critical values is TINV. The format of the TDIST function is:
TDIST(x, df, tails)
In this function x is the value of the t-random variable and x > 0. The df is the degrees of
freedom parameter, and tails takes the value either 1 or 2.
TDIST(x, df, 1) computes P(t(df) > x); this is the right-tail probability.
TDIST(x, df, 2) computes P(t(df) < -x) + P(t(df) > x); this is the two-tail probability.
To compute left-tail probabilities, for x > 0, P(t(df) < x), use 1-TDIST(x,df,1). To compute
probabilities for negative x values we use the symmetry of the distribution. For example,
for x < 0 the left-tail probability P(t(df) < x) is TDIST(-x,df,1).
The format of the TINV function is:
TINV(probability, df)
where probability is the two-tail probability. This function computes the value tc such that
P(t(df) < -tc) + P(t(df) > tc) = probability.
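The tail conventions above can be cross-checked outside Excel. The following is a minimal Python sketch, assuming only the standard library; the helper names t_pdf and t_cdf are hypothetical, and the cumulative probability is obtained by numerical integration (Simpson's rule), which is a substitute technique and not how Excel computes TDIST. It uses the symmetry of the density to handle negative x, exactly as described for the worksheet.

```python
import math


def t_pdf(x, m):
    """Density of the t-distribution with m degrees of freedom."""
    coef = math.gamma((m + 1) / 2) / (math.sqrt(m * math.pi) * math.gamma(m / 2))
    return coef * (1 + x * x / m) ** (-(m + 1) / 2)


def t_cdf(x, m, steps=2000):
    """P(t(m) <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, m) + t_pdf(a, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


df = 5
left_pos = t_cdf(2.3, df)        # corresponds to 1-TDIST(2.3,5,1)
right_pos = 1 - left_pos         # corresponds to TDIST(2.3,5,1)
left_neg = t_cdf(-1.5, df)       # corresponds to TDIST(1.5,5,1), by symmetry
print(round(left_pos, 6), round(right_pos, 6), round(left_neg, 6))
```

The two probabilities for the positive value sum to one by construction, which is a quick sanity check to apply to any worksheet template as well.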
Insert a new sheet, name it t-distribution, and create the following template for basic probability
calculations.
      A                        B
 1    t-distribution
 2
 3    df
 4
 5    value > 0
 6    P[t<=value]              =1-TDIST(B5,B3,1)
 7    P[t>value]               =TDIST(B5,B3,1)
 8
 9    value < 0
10    P[t<=value]              =TDIST(-B9,B3,1)
11    P[t>value]               =1-TDIST(-B9,B3,1)
12
13    cumulative percentile
14    critical value           =TINV(2*(1-B13),B3)
The calculations are illustrated below for a t-distribution with 5 degrees of freedom, a positive
t = 2.3, a negative t = -1.5 and the 95th percentile.
      A                        B
 1    t-distribution
 2
 3    df                       5
 4
 5    value > 0                2.3
 6    P[t<=value]              0.96511377
 7    P[t>value]               0.03488623
 8
 9    value < 0                -1.5
10    P[t<=value]              0.09695184
11    P[t>value]               0.90304816
12
13    cumulative percentile    0.95
14    critical value           2.01504837
The F-distribution is used in a variety of hypothesis testing situations. Its shape is controlled by
two degrees of freedom parameters called the numerator degrees of freedom and the
denominator degrees of freedom. Probabilities are computed using the Excel function FDIST.
Critical values are computed using FINV. The format of the FDIST function is:
FDIST(x, df1, df2)
This function computes the probability that an F-random variable, with numerator degrees of
freedom df1 and denominator degrees of freedom df2, is greater than x, P(F > x).
The FINV function computes the critical value Fc so that P(F > Fc) = α. The format of the
function is:
FINV(probability, df1, df2)
Here probability is the right-tail probability, and df1 and df2 are the numerator and denominator
degrees of freedom.
Insert a new sheet, name it F-distribution, and create the following template to compute
cumulative probabilities, right-tail probabilities and percentile critical values.
      A                              B
 1    F-distribution probabilities
 2
 3    value
 4    df_numerator
 5    df_denominator
 6
 7    P[F<=value]                    =1-FDIST(B3,B4,B5)
 8    P[F>value]                     =FDIST(B3,B4,B5)
 9
10    cumulative percentile
11    critical value                 =FINV(1-B10,B4,B5)
To illustrate, let the numerator degrees of freedom equal 2, the denominator degrees of freedom
equal 10 and the F-random variable value equal 3.2. Finally, let us find the 95th percentile value.
      A                              B
 1    F-distribution probabilities
 3    value                          3.2
 4    df_numerator                   2
 5    df_denominator                 10
 7    P[F<=value]                    0.915709
 8    P[F>value]                     0.084291
CHAPTER OUTLINE
C.1 Examining a Sample of Data
C.2 Estimating Population Parameters
  C.2.1 Creating Random Samples
  C.2.2 Estimating a Population Mean
  C.2.3 Estimating a Population Variance
  C.2.4 Standard Error of the Sample Mean
C.3 The Central Limit Theorem
C.4 Interval Estimation
  C.4.1 Interval Estimation with $\sigma^2$ Unknown
  C.4.2 Interval Estimation with the Hip Data
C.5 Hypothesis Tests About a Population Mean
  C.5.1 An Example
  C.5.2 The p-value
  C.5.3 A Template for Hypothesis Tests
C.6 Other Useful Tests
  C.6.1 Simulating Data
  C.6.2 Testing a Population Variance
  C.6.3 Testing Two Population Means
  C.6.4 Testing Two Population Variances
C.7 Testing Population Normality
  C.7.1 A Histogram
  C.7.2 The Jarque-Bera Test
When faced with a new set of data observations, or a data set, it is wise to "look" at the data
graphically, and to look at its summary statistics. To illustrate, open the Excel file hip. You will
find a single list of numbers with the label y in cell A1. Examining the definition file hip.def we
find that the variable y is the hip width of 50 individuals; we also find some basic summary
statistics that we will recompute. Save the workbook as Appendix C. Rename the worksheet
hip data.
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
If the Data Analysis tool does not appear on the ribbon, you need to load it first.
432 Appendix C
Select the Office Button in the upper left corner of your screen, Excel Options on the bottom of
the Office Button tasks panel, Add-Ins in the Excel Options dialog box, Excel Add-ins in the
Manage window at the bottom of the Excel Options dialog box, and then Go.
In the Add-Ins dialog box, check the box in front of Analysis ToolPak. Select OK.
Now Data Analysis should be available on the Analysis group of commands. Select it. From the
dialog box choose Descriptive Statistics and select OK.
In the dialog box that results specify the input range of the data to be A1:A51, indicate that the
data are in Columns and indicate that there is a Label in the First Row. Under Output options
choose New Worksheet Ply and assign the name hip data summary stats. Finally, check the
box next to Summary statistics so that the statistics will actually be transferred to the new
worksheet. Select OK.
Review of Statistical Inference 433
The summary statistics are squeezed into two narrow columns, so the labels are not fully visible,
but the output range should still be highlighted. Back in the Home tab, go to the Cells group of
commands. Select Format and then AutoFit Column Width.
This makes the columns fully visible. The values reported and brief explanations are given next.
y
Mean                  17.15819992     $\bar{y} = \sum y_i / N$
Standard Error        0.255550251     $se(\bar{y}) = \hat{\sigma}/\sqrt{N}$
Median                17.0850005      50th percentile
Mode                  16.4            most frequent value
Standard Deviation    1.807013155     $\hat{\sigma}$
Sample Variance       3.265296541     $\hat{\sigma}^2 = \sum (y_i - \bar{y})^2/(N - 1)$
Kurtosis              -0.610148853    measure of peakedness
Skewness              -0.014256196    measure of symmetry
Range                 6.87            max - min
Minimum               13.53           minimum value
Maximum               20.4            maximum value
Sum                   857.909996      $\sum y_i$
Count                 50              N
The values of Kurtosis and Skewness are calculated by Excel using slightly different formulas
than used in Principles of Econometrics, 4e, p. 702. In fact, the statistic reported by Excel is often
called "excess" Kurtosis. It is a measure of Kurtosis minus 3, which is the Kurtosis value for the
normal distribution. The formulas are equivalent in large samples, except for the minus 3. We
will take this opportunity to show the calculations explicitly. The formulas are used in Section
C.7.2 of this manual as part of a test for normality.
Copy the hip data to a new worksheet named statistics calculations. In column G2:G18 enter the
labels shown below.
[Column A holds the label y in A1 and the hip data in A2:A51.]
      G
 2    Mean
 3    Standard Error
 4    Median
 5    Mode
 6    Standard Deviation
 7    Sample Variance
 8    Kurtosis
 9    Skewness
10    Range
11    Minimum
12    Maximum
13    Sum
14    Count
15    Sigma tilde
16    Mu2
17    Mu3
18    Mu4
• In H14 enter the formula =count(A2:A51) to obtain the sample size N = 50.
• In H2 enter the formula =sum(A2:A51)/H14 to obtain the sample mean $\bar{y}$ = 17.1582.
• In B1 enter the label y-ybar. In B2 enter the formula =A2-$H$2. Copy this formula to
B3:B51.
• In C1:E1 enter labels (y-ybar)^2, (y-ybar)^3, (y-ybar)^4. In C2 enter =B2^2, in D2
enter =B2^3, in E2 enter =B2^4. This will create the square, cube and fourth power of
the difference between the value of y and the sample mean $\bar{y}$.
• Highlight C2:E2 and move your cursor to the lower right corner of your selection until a
skinny cross is formed. Left-click, hold it, and drag it down to cell E51.
      A        B          C            D            E
 1    y        y-ybar     (y-ybar)^2   (y-ybar)^3   (y-ybar)^4
 2    14.96    -2.1982    4.832083     -10.62188    23.34903
 3    17.34    0.1818     0.033051     0.006009     0.001092
 4    16.4     -0.7582    0.574867     -0.435864    0.330472
 5    19.33    2.1718     4.716716     10.24376     22.24741
 6    17.69    0.5318     0.282812     0.1504       0.079983
      G                     H
 4    Median                =MEDIAN(A2:A51)
 5    Mode                  =MODE(A2:A51)
 6    Standard Deviation    =SQRT(H7)
 7    Sample Variance       =SUM(C2:C51)/(H14-1)
 8    Kurtosis              =H18/(H15^4)
 9    Skewness              =H17/(H15^3)
10    Range                 =H12-H11
11    Minimum               =MIN(A2:A51)
12    Maximum               =MAX(A2:A51)
13    Sum                   =SUM(A2:A51)
14    Count                 =COUNT(A2:A51)
15    Sigma tilde           =SQRT(H16)
16    Mu2                   =SUM(C2:C51)/H14
17    Mu3                   =SUM(D2:D51)/H14
18    Mu4                   =SUM(E2:E51)/H14
The numerical values are shown on the following page. Note that the values match Excel's
descriptive statistics except for the Skewness and Kurtosis, which are computed using the
formulas in the middle of p. 702 of Principles of Econometrics, 4e. The value of excess Kurtosis
= Kurtosis - 3 = -0.66847, which is close to the value Excel reports for Kurtosis automatically
when computing descriptive statistics. In large samples, the calculation using our approach and
that of Excel will converge to the same value.
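The worksheet steps for Mu2, Mu3, Mu4, Sigma tilde, Skewness and Kurtosis can be summarized in a few lines of code. The sketch below is a minimal Python illustration of those same moment formulas (central moments divided by N); the function name moment_statistics and the five-value sample are hypothetical and are not the hip data.

```python
import math


def moment_statistics(y):
    """Skewness and kurtosis from central moments that divide by N."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]            # column B: y - ybar
    mu2 = sum(d ** 2 for d in dev) / n     # cell H16
    mu3 = sum(d ** 3 for d in dev) / n     # cell H17
    mu4 = sum(d ** 4 for d in dev) / n     # cell H18
    sigma_tilde = math.sqrt(mu2)           # cell H15
    kurtosis = mu4 / sigma_tilde ** 4      # cell H8
    return {
        "skewness": mu3 / sigma_tilde ** 3,  # cell H9
        "kurtosis": kurtosis,
        "excess_kurtosis": kurtosis - 3,     # comparable to Excel's Kurtosis
    }


sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data, symmetric about 3
stats = moment_statistics(sample)
print(stats)
```

For a symmetric sample such as this one the skewness is zero, which gives a simple way to check that the formulas have been entered correctly.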
Estimators of the population mean and variance are the sample mean $\bar{Y} = \sum Y_i / N$ and sample variance
$\hat{\sigma}^2 = \sum (Y_i - \bar{Y})^2/(N - 1)$. These estimators are random variables because their values change from
one sample of values to another. In order to illustrate this we will carry out a "simulation"
experiment, creating data such as that in Table C.2 of Principles of Econometrics, 4e, p. 696. We
will create 10 samples of random data from a normal population with mean µ = 17 and variance
$\sigma^2$ = 6.25.
Label a new worksheet ten samples. In cells B1:K1 place the labels h1, ..., h10 as sample names
of the random samples we draw next.
Create 10 samples of 40 observations each based on the normal distribution with Mean = 17 and
Standard deviation = 2.5. Recall that the standard deviation is the square root of the variance.
For $\sigma^2$ = 6.25 this means $\sigma = \sqrt{6.25}$ = 2.5. Creating numbers that behave randomly is a
science, and how they are created is beyond the scope of this work. It is quite a fascinating
subject and many web sites provide introductions. See, for example, http://www.random.org/ or
visit Wikipedia http://en.wikipedia.org/wiki/Random_number. We specify a Random Seed =
12345. The actual value of the seed does not matter, but odd numbers with 5 to 7 digits are
frequently chosen. If you do not include a random seed value, Excel will create its own value
based on the time and date. If you do not use a seed, each time you generate a set of random
values you will obtain different values. This is an exciting possible approach, and one that we use
at various points in this manual. However at this point we will use a seed value so that you can
follow our steps and replicate our values. The values will be placed in the cells B2:K41. Select
OK.
The 10 columns of numbers we obtain represent values we might have collected from a
population. We have N = 40 observations in each sample. The first few rows should look as
shown below.
[Screenshot: the labels h1 to h10 in B1:K1 and the first rows of the ten random samples]
The values in sample h1 are values of the random variable Y, namely $y_1, y_2, \ldots, y_{40}$. Using these
40 values we compute the sample mean $\bar{y} = \sum y_i / 40$. In cell A43 enter the label ybar. In B43 enter
the equation to calculate the sample mean, =average(B2:B41).
      A       B                    C
40            12.27654             21.13776
41            16.83417             17.30435
42
43    ybar    =average(B2:B41)
Press Enter. The value computed is 16.99915. Move your cursor to the lower right corner of the
cell B43 until it turns into a skinny cross. Drag this horizontally to K43 to copy the formula. The
sample averages of the 10 samples are shown below.
      A       B         C         D         E         F         G        H         I         J         K
43    ybar    16.99915  16.39387  17.08115  17.14914  17.26064  17.1576  16.71567  16.93504  16.53494  16.71281
Different samples yield different sample means, and the sample mean is an estimator of the
population parameter µ. As shown in Principles of Econometrics, 4e, p. 697, the sample mean is
an unbiased estimator because $E(\bar{Y}) = \mu$. This property says that if we take many samples from
this population, the average value of the sample mean will equal the true value µ. Our illustration
has only constructed 10 samples, which is not enough to qualify for "many samples," but we can
still compute the average of these 10 values of ybar to illustrate this property. In cell M42 put the
label average, and in M43 the formula =average(B43:K43).
Press Enter. The resulting average is 16.894. This is not exactly µ = 17, because only 10 samples
were averaged. Repeating the experiment with 1000 samples, the average of the sample means is
16.9902, which is very close to the true mean 17.
The estimator of $\sigma^2$ is $\hat{\sigma}^2 = \sum (y_i - \bar{y})^2/(N - 1)$. This too is an unbiased estimator. For each of the
10 samples h1-h10 compute the sample variance using the var function. Enter the label sighat^2
in A44. In B44 enter the formula =var(B2:B41). Press Enter. Copy this formula across to K44.
In M44 compute the average of the 10 variance estimates by entering the formula
=average(B44:K44). The result is shown below.
      A          B         C         D         E         F         G        H         I         J         K         M
42                                                                                                                   average
43    ybar       16.99915  16.39387  17.08115  17.14914  17.26064  17.1576  16.71567  16.93504  16.53494  16.71281  16.894
44    sighat^2   6.457193  5.74982   7.329992  3.713185  6.886413  7.62167  6.840028  5.797105  9.319506  5.741701  6.545661
The average of the 10 variance estimates is 6.545661. If we repeat this for 1000 samples, the
average value of $\hat{\sigma}^2$ is 6.252163, which is very close to the true value 6.25.
The variance of the sample mean is $var(\bar{Y}) = \sigma^2/N = 6.25/40 = 0.15625$. This value
indicates how much the sample mean $\bar{Y}$ varies from sample to sample. In the worksheet ten
samples, label cell N42 variance. In N43 enter the formula =var(B43:K43). This is the sampling
variation of $\bar{Y}$. In the 10 samples h1-h10 the sampling variation is 0.084495. For 1000 samples
we obtain a calculated variance of the sample mean of 0.144631. Sampling variation is harder to
capture than the average value in a simulation experiment. In a larger number of samples the
variance of $\bar{Y}$ will approach 0.15625.
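The same simulation experiment can be repeated outside Excel with many more samples. The following is a minimal Python sketch, assuming only the standard library; Python's random number generator differs from Excel's, so the individual values will not match the worksheet even though the seed 12345 is reused here for repeatability. It draws 1000 samples of size 40 from N(17, 2.5^2) and reports the average of the sample means, the average of the sample variances, and the variance of the sample means.

```python
import random
import statistics

rng = random.Random(12345)  # seed reused for repeatability; draws differ from Excel's
MU, SIGMA, N, SAMPLES = 17.0, 2.5, 40, 1000

means, variances = [], []
for _ in range(SAMPLES):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.mean(sample))           # like =average(B2:B41)
    variances.append(statistics.variance(sample))   # like =var(B2:B41), divides by N - 1

avg_of_means = statistics.mean(means)          # should be near mu = 17
avg_of_variances = statistics.mean(variances)  # should be near sigma^2 = 6.25
var_of_means = statistics.variance(means)      # should be near sigma^2 / N = 0.15625

print(f"average of sample means     : {avg_of_means:.4f}")
print(f"average of sample variances : {avg_of_variances:.4f}")
print(f"variance of sample means    : {var_of_means:.4f}")
```

Increasing SAMPLES should bring all three figures closer to their theoretical values, with the variance of the sample means typically converging most slowly.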
The value of the variance of $\bar{Y}$ is usually unknown because $\sigma^2$ is unknown. The estimated
variance is $\widehat{var}(\bar{Y}) = \hat{\sigma}^2/N$. The square root of $\widehat{var}(\bar{Y})$ is called the standard error of the
mean or sometimes the standard error of the estimate. It can be referred to as $se(\bar{Y}) = \hat{\sigma}/\sqrt{N}$.
The standard error of the mean is a very important component of hypothesis tests and confidence
intervals. It is reported automatically when we use Descriptive Statistics in the Data Analysis
tool. Let us add it to our ten samples worksheet. In cell A45 enter the label N, for sample size. In
B45 enter the function =count(B2:B41) and press Enter. This counts the sample size N = 40.
Copy this formula to C45:K45.
In A46 enter the label Std error. In B46 enter =SQRT(B44/B45). Because B44 contains the
estimated variance, the command takes the square root of $\hat{\sigma}^2/N$, which is $se(\bar{Y}) = \hat{\sigma}/\sqrt{N}$. Copy
this equation to C46:K46. The calculated values should look as shown below.
      A          B         C         D         E         F         G         H         I         J         K         M         N
42                                                                                                                    average   variance
43    ybar       16.99915  16.39387  17.08115  17.14914  17.26064  17.1576   16.71567  16.93504  16.53494  16.71281  16.894    0.084495
44    sighat^2   6.457193  5.74982   7.329992  3.713185  6.886413  7.62167   6.840028  5.797105  9.319506  5.741701  6.545661
45    N          40        40        40        40        40        40        40        40        40        40
46    Std error  0.401783  0.379138  0.428077  0.30468   0.414922  0.436511  0.413522  0.380694  0.482688  0.37887
An amazing result in the theory of statistics is the central limit theorem. It says, if we take N
random variables, $Y_1, Y_2, \ldots, Y_N$, that are statistically independent and identically distributed (no
matter what that distribution might be), then the sample mean $\bar{Y}$ will have approximately a
normal distribution with mean µ and variance $\sigma^2/N$. This is what in statistics is called a "large
sample" or "asymptotic" result, which means that for the approximation to hold the sample size N
must be large.
Specifically, the theorem (Principles of Econometrics, 4e, p. 699) is stated in terms of the
standardized variable:
$$Z_N = \frac{\bar{Y} - \mu}{\sigma/\sqrt{N}} \overset{a}{\sim} N(0,1)$$
This standardized variable has an approximate standard normal distribution in large samples. To
illustrate, we use one of the simplest but most useful distributions in statistics: a uniform random
variable in the interval between 0 and 1. Create a new worksheet called CLT. Select Data and
then Data Analysis. In the Data Analysis window choose Random Number Generation. We
will create 1000 variables; these will be our samples. Each sample will consist of N = 10
values. The Distribution is Uniform between 0 and 1, and we use a Random Seed = 12345 so
that you can replicate our results. In the Output Range simply specify cell A1. Select OK.
We will not use column and row labels in this example (too many) so remember that each column
is a sample of 10 observations, and the rows 1:10 are observation values from a uniform
distribution between 0 and 1.
The result is 10 random numbers between 0 and 1 in each of the columns A to ALL. In cell A12
enter the formula =average(A1:A10), and press Enter. This will compute the sample mean $\bar{Y}$ for the
values in A1:A10, which represent the first of 1000 samples. Next, you need to copy and paste this
formula to cells B12:ALL12 to compute the sample means for all the samples. The easiest way
to do this in this case is to first select A12, select Copy, select B12, press and hold down the SHIFT
key, press the CTRL and END keys, and finally select Paste.
Remark: When faced with the task of copying formulas across large ranges,
Excel's keyboard shortcuts become very useful. Click on the Help button and
search for keyboard shortcuts.
The results for the first few columns are shown below. The values in row 12 are sample means.
[Screenshot: rows 1 to 10 of columns A to J hold the uniform random draws]
      A         B         C        D         E         F         G        H         I         J
12    0.341014  0.453856  0.71872  0.670196  0.409162  0.617356  0.60137  0.471978  0.547169  0.595572
To display the shape of the uniform distribution, first enter the label Bin in cell ALN1. In
ALN2:ALN11 put the values 0.1, 0.2, ..., 1.0. On the Data tab, select Data Analysis,
Histogram, and then OK. Use the 10000 values in A1:ALL10 to construct the histogram.
Specify the Bin Range to be ALN2:ALN11 and the Output Range to be ALN15. Finally select
Chart Output and then OK.
The distribution shows that the 10000 values are evenly spread over the interval [0, 1], with
about 1000 values in each of the intervals of width 0.1.
Bin Frequency
0.1 983
0.2 1002
0.3 975
0.4 1011
0.5 1030
0.6 1003
0.7 1036
0.8 944
0.9 987
1 1029
[Figure: Histogram of the 10000 uniform values; horizontal axis Bin (0.1 to 1), vertical axis Frequency (0 to 1500)]
The uniform random variable U on the interval [a,b] has mean E(U) = (a + b)/2 and variance
var(U) = (b - a)^2/12. If U is on the interval [0, 1], it has mean E(U) = 0.5 and variance
var(U) = 1/12. The central limit theorem says that the standardized sample mean is
asymptotically standard normal distributed, which in this case is
$$Z_N = \frac{\bar{U} - 0.5}{\sqrt{(1/12)/N}} = \frac{\bar{U} - 0.5}{\sqrt{1/(12 \times 10)}} \overset{a}{\sim} N(0,1)$$
In cell A13 enter the formula for the standardized variable, =(A12-0.5)/SQRT(1/(12*10)), and
press Enter. Next you need to copy and paste this formula to B13:ALL13. An easy way to do
this in this case is to first select A13, select Copy, select B13, press the F8 key, use your scroll bar
at the bottom of your Excel window to get to the right of your table of data, select ALL13, and
finally select Paste. The first few values are shown below.
      A         B         C         D         E         F         G         H         I         J
12    0.341014  0.453856  0.71872   0.670196  0.409162  0.617356  0.60137   0.471978  0.547169  0.595572
13    -1.74161  -0.50548  2.395958  1.8644    -0.99508  1.285569  1.110456  -0.30697  0.516715  1.046936
Now we repeat the steps of constructing a histogram. In A14 put the label Bin. In A15:A30 put
the values -3.5, -3.0, -2.5, ..., 4.0. On the Data tab, select Data Analysis and then Histogram.
Fill in the dialog box as shown below to chart all the standardized values in row 13. Finally select
OK.
The resulting histogram shows a bell-shaped curve (we have eliminated the gaps between the bars),
which is characteristic of a normally distributed random variable. Note that it is centered at 0 and
that the values range from about -3 to 3. Using Table 1 in Appendix E of Principles of
Econometrics, 4e, you will see that this range contains 0.9974 of the probability of a standard
normal distribution.
[Figure: Histogram of the 1000 standardized sample means; horizontal axis Bin (-3.5 to 4), vertical axis Frequency (0 to 250)]
What we have shown is that if we take samples of 10 values from a uniform distribution, which is
not bell shaped at all, then the standardized means, or averages, of these samples of 10 values have
a probability distribution that is approximately normal.
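The central limit theorem experiment can also be sketched in code. The following minimal Python illustration, assuming only the standard library, mirrors the worksheet: 1000 samples of N = 10 uniform draws, each sample mean standardized with mean 0.5 and variance 1/(12 x 10). The seed is arbitrary and the draws will not match Excel's. Instead of a histogram it reports the mean and variance of the standardized values and the share falling within plus or minus 1.96, which should be near 0, 1 and 0.95 if the normal approximation is good.

```python
import math
import random
import statistics

rng = random.Random(12345)  # arbitrary seed; draws will not match Excel's
N, SAMPLES = 10, 1000
scale = math.sqrt(1 / (12 * N))  # standard deviation of the mean of N uniforms

z_values = []
for _ in range(SAMPLES):
    sample = [rng.random() for _ in range(N)]  # uniform on [0, 1)
    ubar = statistics.mean(sample)             # like =average(A1:A10)
    z_values.append((ubar - 0.5) / scale)      # like =(A12-0.5)/SQRT(1/(12*10))

z_mean = statistics.mean(z_values)
z_var = statistics.variance(z_values)
share_within = sum(abs(z) <= 1.96 for z in z_values) / SAMPLES

print(f"mean of Z              : {z_mean:.4f}")
print(f"variance of Z          : {z_var:.4f}")
print(f"share with |Z| <= 1.96 : {share_within:.3f}")
```

Changing N to a smaller value, such as 2, is a simple way to see the approximation deteriorate.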
The critical values from the N(0,1) probability distribution such that 2.5% of the probability is in
either tail are -1.96 and 1.96 (see Figure C.4 on p. 704 of Principles of Econometrics, 4e).
Consequently:
$$P\left[\bar{Y} - 1.96\frac{\sigma}{\sqrt{N}} \le \mu \le \bar{Y} + 1.96\frac{\sigma}{\sqrt{N}}\right] = 0.95$$
In general, if $\Phi(z_c) = 1 - \alpha/2$, then the $100(1 - \alpha)\%$ confidence interval estimator of µ is:
$$\bar{Y} \pm z_c \frac{\sigma}{\sqrt{N}}$$
It must be emphasized that a 95% interval estimator will contain the true population mean µ in
95% of many repeated samples of size N. To illustrate, return to the ten samples worksheet. In
cell A48 put the label LL and in A49 put the label UL. In B48 enter the formula
=B43-1.96*SQRT(6.25/B45) and in B49 enter =B43+1.96*SQRT(6.25/B45). These calculations find
the lower and upper bounds of the interval estimate of µ given that $\sigma^2$ is known. Copy B48:B49
across to K48:K49.
      A      B         C         D         E         F         G         H         I         J         K
43    ybar   16.99915  16.39387  17.08115  17.14914  17.26064  17.1576   16.71567  16.93504  16.53494  16.71281
48    LL     16.22439  15.61911  16.30639  16.37439  16.48588  16.38284  15.94092  16.16028  15.76018  15.93805
49    UL     17.77391  17.16863  17.85591  17.9239   18.03539  17.93236  17.49043  17.7098   17.3097   17.48757
Note that the intervals created move around because they are centered at the sample mean $\bar{Y}$,
which varies from sample to sample. As it happens all 10 of the intervals we have created contain
the true population mean µ = 17. We can ask Excel to tell us this using some logical functions.
In cell A50 enter the label Cover. In A51 enter the label Success. In B50 enter the formula
=AND(B48<=17,17<=B49). Press Enter. This logical function is TRUE if the value of 17 is
between the lower and upper bounds, otherwise it is FALSE. In B51 enter the formula
=IF(B50,1,0). If the result in B50 is TRUE, then we assign the value 1; and if B50 is FALSE we
assign a value of 0. In this way we can record whether our interval estimate successfully contains
(or covers) the true parameter µ. Copy the formulas from B50:B51 to C50:K51. The result is
shown below.
      A        B         C         D         E         F         G         H         I         J         K
48    LL       16.22439  15.61911  16.30639  16.37439  16.48588  16.38284  15.94092  16.16028  15.76018  15.93805
49    UL       17.77391  17.16863  17.85591  17.9239   18.03539  17.93236  17.49043  17.7098   17.3097   17.48757
50    Cover    TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE
51    Success  1         1         1         1         1         1         1         1         1         1
To further illustrate, create in a new worksheet 1000 samples of size 40 from a normal
distribution with mean 17 and standard deviation 2.5.
For our 1000 samples we obtained an average value of success of 0.961. This means that 96.1%
of the 1000 interval estimates cover the true parameter µ = 17, which is close to the expected
95%. With many more than 1000 samples we would expect the success rate to be closer to 95%.
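The coverage experiment translates directly into a short program. The sketch below is a minimal Python illustration, assuming only the standard library; it is not the Excel procedure itself, and since its random draws differ from Excel's the success rate will not be exactly 0.961. It builds the known-variance interval for each of 1000 samples of size 40 and records the share of intervals that contain µ = 17.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(12345)  # arbitrary seed; draws will not match Excel's
MU, SIGMA, N, SAMPLES = 17.0, 2.5, 40, 1000

z_c = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval
half_width = z_c * SIGMA / math.sqrt(N)  # same width for every sample, sigma known

covered = 0
for _ in range(SAMPLES):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    ybar = statistics.mean(sample)
    lower, upper = ybar - half_width, ybar + half_width
    if lower <= MU <= upper:             # like =AND(B48<=17,17<=B49)
        covered += 1

coverage = covered / SAMPLES             # average of the 0/1 success indicators
print(f"z_c = {z_c:.4f}, coverage = {coverage:.3f}")
```

Raising SAMPLES is the natural way to see the success rate settle near 0.95.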
The interval estimation procedure described above depended upon specific knowledge of the
value of $var(Y) = \sigma^2$. If the variance is not known we substitute the estimated sample variance
$\hat{\sigma}^2 = \sum (Y_i - \bar{Y})^2/(N - 1)$. When we do so, the standardized variable follows the t-distribution
with N - 1 degrees of freedom.
$$t = \frac{\bar{Y} - \mu}{\hat{\sigma}/\sqrt{N}} = \frac{\bar{Y} - \mu}{se(\bar{Y})} \sim t_{(N-1)}$$
The $100(1 - \alpha)\%$ interval estimator is then:
$$\bar{Y} \pm t_c \frac{\hat{\sigma}}{\sqrt{N}}, \quad \text{or} \quad \bar{Y} \pm t_c \, se(\bar{Y})$$
The critical value $t_c$ is the $1 - \alpha/2$ percentile from the t-distribution with N - 1 degrees of
freedom, or $t_{(1-\alpha/2,\,N-1)}$.
Return now to the ten samples worksheet. Put the label tc in cell A53. In cell B53 use the Insert
Function key and scroll to the statistical function TINV. Given the two-tail probability α, this
function returns the $1 - \alpha/2$ percentile. Given α = 0.05 and degrees of freedom N - 1 = 39, we
see that the critical value is tc = 2.02269. Click OK.
In A54 enter the label LL and in A55 enter UL. In B54 enter the formula =B43-B53*B46, which
computes $\bar{Y} - t_c \, se(\bar{Y})$. In B55 enter =B43+B53*B46, which computes $\bar{Y} + t_c \, se(\bar{Y})$. Copy the
content of B53:B55 to C53:K55.
      A          B         C         D         E         F         G         H         I         J         K
43    ybar       16.99915  16.39387  17.08115  17.14914  17.26064  17.1576   16.71567  16.93504  16.53494  16.71281
46    Std error  0.401783  0.379138  0.428077  0.30468   0.414922  0.436511  0.413522  0.380694  0.482688  0.37887
53    tc         2.022691  2.022691  2.022691  2.022691  2.022691  2.022691  2.022691  2.022691  2.022691  2.022691
54    LL         16.18646  15.62699  16.21528  16.53287  16.42138  16.27467  15.87925  16.16501  15.55861  15.94647
55    UL         17.81183  17.16075  17.94702  17.76542  18.0999   18.04052  17.5521   17.70506  17.51127  17.47915
56    cover      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE
57    success    1         1         1         1         1         1         1         1         1         1
The resulting 10 interval estimates for µ all cover the true parameter value µ = 17, but note that
now both the center and the width of the interval vary from sample to sample. For a given standard
error the intervals are also slightly wider because the t-distribution critical value is larger
than 1.96. In a large number of samples such intervals will cover the true parameter 95% of the
time.
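The repeated-sampling claim can be illustrated with a small simulation. This is only a sketch: the seed, the number of replications, and the population standard deviation SIGMA = 2.5 are assumptions made for the illustration, while µ = 17, N = 40 and tc = 2.022691 come from the worksheet.

```python
import math
import random

random.seed(12345)        # arbitrary seed, for reproducibility only
MU = 17.0                 # true population mean used in the worksheet
SIGMA = 2.5               # assumed population standard deviation
N = 40                    # sample size used in the worksheet
TC = 2.022691             # t critical value for 39 degrees of freedom

def interval(sample):
    """Return (LL, UL) = y-bar -/+ tc * se(y-bar) for one sample."""
    n = len(sample)
    ybar = sum(sample) / n
    var = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    se = math.sqrt(var / n)
    return ybar - TC * se, ybar + TC * se

reps = 2000
covered = 0
for _ in range(reps):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    ll, ul = interval(sample)
    covered += int(ll <= MU <= ul)

coverage = covered / reps
print(coverage)           # share of intervals that contain MU
```

With many replications the printed share should be close to 0.95, although any single run will differ slightly from it.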
Now we create a template for constructing interval estimates for the population mean. Label a new
worksheet Interval Template. Set up your template as shown below.
[Figure: Interval Template worksheet, titled "Interval estimation of the population mean", with a
Data Input section containing the sample size, the confidence level (0.95) and the estimated mean
(Y-bar).]
The values shown in the shaded area for the mean and standard error, under the section called
Data Input can be copied and pasted from the hip data summary stats worksheet. For example,
highlight the entry for Mean, then press both the Ctrl and C keys (Ctrl+C) to copy the number to
the Windows clipboard. Return to the Interval Template and click on the target cell to make it
active, then select Paste and Paste Values to transfer the numbers (or Ctrl+V).
[Figure: Paste menu with Paste Values selected]

Data Input
   Sample size                  50
   Confidence level             0.95
   Estimated mean (Y-bar)       17.15819992

Confidence Interval
   lower limit                  16.64465247
   upper limit                  17.67174737
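The template arithmetic can be sketched in a few lines of Python. The function name interval_estimate is hypothetical. The standard error 0.2556 is the rounded value quoted for the hip data, and tc = 2.009575 is an assumed value for =TINV(0.05,49), so the limits agree with the template only to about three decimals.

```python
def interval_estimate(mean, se, tc):
    """Return the (lower, upper) limits mean -/+ tc * se."""
    return mean - tc * se, mean + tc * se

mean = 17.15819992    # estimated mean from the template
se = 0.2556           # rounded standard error of the mean (hip data)
tc = 2.009575         # assumed 97.5th percentile of t(49)

lower, upper = interval_estimate(mean, se, tc)
print(round(lower, 3), round(upper, 3))
```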
Hypothesis tests about the population mean are based on the statistic:
$$t = \frac{\bar{Y} - c}{\hat{\sigma}/\sqrt{N}} = \frac{\bar{Y} - c}{\mathrm{se}(\bar{Y})} \sim t_{(N-1)}$$
The null hypothesis will be rejected if the value of the test statistic becomes too large or too
small, depending upon the nature of the alternative hypothesis.
For the right-tail alternative hypothesis H1: µ > c, we reject the null hypothesis and accept the
alternative if t ≥ tc = t(1−α, N−1), where t(1−α, N−1) is the 100(1 − α) percentile of the t(N−1)
distribution. The value α is the level of significance of the test, and is the probability of rejecting
the null hypothesis when it is true [Type I error]. In the figure below m = N − 1.
[Figure: t(m) density with the rejection region for H0: µ = c in the right tail]
For the left-tail alternative hypothesis H1: µ < c, we reject the null hypothesis and accept the
alternative if t ≤ tc = t(α, N−1), where t(α, N−1) is the 100α percentile of the t(N−1) distribution.
The value α is the level of significance of the test, and is the probability of rejecting the null
hypothesis when it is true [Type I error]. In the figure below m = N − 1.
[Figure: t(m) density with the rejection region for H0: µ = c in the left tail and the "Do not
reject H0: µ = c" region to its right]
For the two-tail alternative hypothesis H1: µ ≠ c, we reject the null hypothesis and accept the
alternative if t ≤ −tc = t(α/2, N−1) or if t ≥ tc = t(1−α/2, N−1). The value α is the level of
significance of the test, and is the probability of rejecting the null hypothesis when it is true [Type
I error]. The rejection regions each include α/2 of the rejection probability. In the figure below
m = N − 1.
[Figure: t(m) density with rejection regions for H0: µ = c in both tails and the "Do not reject
H0: µ = c" region between them]
C.5.1 An Example
Using the hip data, let us test the null hypothesis H0: µ = 16.5 against the right-tail alternative
hypothesis H1: µ > 16.5. For the hip data N = 50, and the degrees of freedom for the t-distribution
are N − 1 = 49. We reject the null hypothesis and accept the alternative if t ≥ tc = t(1−α, N−1),
where t(1−α, N−1) is the 100(1 − α) percentile of the t(N−1) distribution. The value α
is the level of significance of the test. Let us choose the standard α = 0.05 level of significance.
The t-critical value is tc = 1.68. We will reject the null hypothesis in favor of the alternative if
t ≥ 1.68. The value of the test statistic is:
$$t = \frac{\bar{Y} - c}{\mathrm{se}(\bar{Y})} = \frac{17.1582 - 16.5}{0.2556} = 2.5756$$
We reject the null hypothesis and conclude that the population mean hip width is greater than
16.5 inches.
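The decision in this example can be reproduced with a short sketch. The inputs are the rounded values printed above, so the computed statistic differs from 2.5756 in the fourth decimal place; the manual's figure presumably uses the unrounded standard error.

```python
ybar = 17.1582    # sample mean of the hip data (rounded)
c = 16.5          # hypothesized mean under H0
se = 0.2556       # standard error of the mean (rounded)
tc = 1.68         # right-tail critical value quoted for alpha = 0.05, 49 df

t = (ybar - c) / se
reject = t >= tc  # right-tail rule: reject H0 if t >= tc

print(round(t, 3), "Reject Ho" if reject else "Do not reject Ho")
```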
The p-value is a number associated with a hypothesis test. If we have the p-value of a test, p, we
can determine the outcome of the test by comparing the p-value to the chosen level of significance, α,
without looking up or calculating the critical values ourselves. The rule is:
p-value rule: Reject the null hypothesis when the p-value is less than, or equal
to, the level of significance α. That is, if p ≤ α then reject H0. If p > α then do
not reject H0.
If you have chosen the level of significance to be α = 0.01, 0.05, 0.10 or any other value, you can
compare it to the p-value of a test and then reject, or not reject, without checking the critical value
tc.
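How the p-value is computed depends on the alternative. If t is the calculated value [not the
critical value tc] of the t-statistic with N − 1 degrees of freedom, then:
• if H1: µ > c, p = probability to the right of t
• if H1: µ < c, p = probability to the left of t
• if H1: µ ≠ c, p = sum of probabilities to the right of |t| and to the left of −|t|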
For the numerical example in the previous section the p-value is the area under the t(49)
distribution to the right of 2.5756. This probability is 0.00654 and is smaller than α = 0.05.
Following the p-value rule we reject the null hypothesis. In the next section we will build a
testing template for each type of test.
[Figure: t(49) density; the area to the right of t = 2.5756 is p = 0.00654]
The syntax for TDIST is TDIST(x, m, tails) where x > 0 is the value at which the distribution is
evaluated, m is the degrees of freedom, and tails is 1 or 2. If tails = 1, the function returns
TDIST = P(t(m) > x). If tails = 2, the function returns TDIST = P(t(m) < −x) + P(t(m) > x).
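To make the meaning of the tails argument concrete, here is a hedged Python sketch that mimics the description of TDIST given above. It is not Excel's implementation: the function tdist and its numerical integration by Simpson's rule are assumptions made for illustration.

```python
import math

def t_pdf(x, m):
    """Density of the t-distribution with m degrees of freedom."""
    log_c = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
             - 0.5 * math.log(m * math.pi))
    return math.exp(log_c - (m + 1) / 2 * math.log1p(x * x / m))

def tdist(x, m, tails, steps=2000):
    """Mimic TDIST(x, m, tails): a one- or two-tail probability for x >= 0."""
    if x < 0:
        raise ValueError("x must not be negative")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    h = x / steps
    total = t_pdf(0.0, m) + t_pdf(x, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    right_tail = 0.5 - total * h / 3     # P(t(m) > x)
    return tails * right_tail            # two tails double it, by symmetry

# Hip data example: t = 2.5756 with 49 degrees of freedom.
p_one = tdist(2.5756, 49, 1)
p_two = tdist(2.5756, 49, 2)
print(round(p_one, 5), round(p_two, 5))
```

The one-tail value can be compared with the p = 0.00654 reported for the hip data example.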
IF(logical_test, value_if_true, value_if_false) evaluates the condition "logical_test", which is
either TRUE or FALSE. If the condition is TRUE then the function returns "value_if_true", and
if the condition is FALSE the function returns "value_if_false".
Recall that for helpful descriptions such as that above you can click the question mark icon, and
in the resulting Excel Help window you can type into the search box the term you seek help on.
Fill in your worksheet test mean template with the following formulas.
In this template note that for p-value calculations we must first ascertain whether the calculated
t-statistic is positive, or not. Recall that the argument x of the function TDIST(x, m, tails) must
satisfy x > 0. Thus the p-value command uses the logical IF statement to check on that. For example,
for the right-tail test, the command is:
=IF(B12>0,TDIST(B12,B11,1),1-TDIST(ABS(B12),B11,1))
• If B12>0 (the t-statistic > 0) is TRUE, then the p-value is TDIST(B12,B11,1) where B12
is the t-statistic value and B11 is the degrees of freedom N − 1. This is P(t(N−1) > t).
• If B12>0 (the t-statistic > 0) is FALSE, then the p-value is 1-TDIST(ABS(B12),B11,1)
where B12 is the t-statistic value and B11 is the degrees of freedom N − 1. Here we use
the symmetry of the t-distribution. For a negative t-statistic the p-value for this right-tail
test is P(t(N−1) > t) = 1 − P(t(N−1) > |t|) by symmetry. The TDIST function only computes
probability values for positive values of the t-statistic. So we take the absolute value of the
t-statistic (which is negative based on the IF statement) to change its sign to positive and then
use the fact that the total probability is "one".
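The sign-handling logic of the IF formula carries over to the left-tail and two-tail cases. The sketch below is one way to write it in Python; p_value and t_right_tail are hypothetical helper names, and the tail area is computed by Simpson's rule rather than by Excel's TDIST.

```python
import math

def t_pdf(x, m):
    """Density of the t-distribution with m degrees of freedom."""
    log_c = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
             - 0.5 * math.log(m * math.pi))
    return math.exp(log_c - (m + 1) / 2 * math.log1p(x * x / m))

def t_right_tail(x, m, steps=2000):
    """P(t(m) > x) for x >= 0, the role played by TDIST(x, m, 1)."""
    h = x / steps
    total = t_pdf(0.0, m) + t_pdf(x, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    return 0.5 - total * h / 3

def p_value(t, m, alternative):
    """p-value of a t-test for alternative 'right', 'left' or 'two'."""
    tail = t_right_tail(abs(t), m)       # area beyond |t| in one tail
    if alternative == "right":           # P(t(m) > t)
        return tail if t > 0 else 1 - tail
    if alternative == "left":            # P(t(m) < t)
        return 1 - tail if t > 0 else tail
    if alternative == "two":             # area beyond |t| in both tails
        return 2 * tail
    raise ValueError("alternative must be 'right', 'left' or 'two'")

p_right = p_value(2.5756, 49, "right")
p_left = p_value(2.5756, 49, "left")
p_two = p_value(2.5756, 49, "two")
print(round(p_right, 4), round(p_left, 4), round(p_two, 4))
```

For a positive t-statistic the right-tail and left-tail p-values sum to one, which is the symmetry argument used in the bullet points above.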
The resulting values in the template are given below. Note that the p-value for the test H0: µ = 16.5
against the alternative H1: µ > 16.5 is 0.0065, which is smaller than α = 0.05. Based on the
p-value rule (reject the null hypothesis when p ≤ α), we reject the hypothesis that µ = 16.5 and
accept the alternative that µ > 16.5. Recall that µ is the population mean hip size for adults, and
this result means we can conclude that the average hip size is greater than 16.5 inches, at the
5% level of significance.
[Figure: test mean template results; the right-tail test Decision is "Reject Ho", followed by the
left-tail test section]
In this section we will carry out various tests. To illustrate we will use some randomly generated
data from normal distributions. Label a new worksheet three samples. In A1:C1 enter the labels
Y1, Y2, Y3. In these columns we will create 3 samples of size N = 20 from the following
distributions: Y1 ~ N(0,1), Y2 ~ N(1.5,1), Y3 ~ N(1.5,4). Select the Data tab and the Data Analysis
button in the Analysis group of commands found in the far right of the Excel ribbon. Choose
Random Number Generation from the menu. First create the N(0,1) data values as shown
below.
[Figure: Random Number Generation dialog box settings for Y1]
Then create the N(1.5,1) values, starting in B2 and using Random Seed 123. Because we will be
using tests comparing one population to another, by using a different Random Seed we ensure
that the populations are independent of each other.
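The same three samples can be imitated outside Excel. The sketch below uses Python's random module; it will not reproduce Excel's numbers because the generators differ. The helper draw_normal is hypothetical, the seeds for Y1 and Y3 are assumptions, and N(1.5,4) is read as variance 4, that is, standard deviation 2.

```python
import random

def draw_normal(mean, sd, n, seed):
    """Draw n normal values from a generator with its own seed."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]

N = 20
y1 = draw_normal(0.0, 1.0, N, seed=12345)   # Y1 ~ N(0,1), assumed seed
y2 = draw_normal(1.5, 1.0, N, seed=123)     # Y2 ~ N(1.5,1), seed from the text
y3 = draw_normal(1.5, 2.0, N, seed=1234)    # Y3 ~ N(1.5,4), assumed seed

for name, y in (("Y1", y1), ("Y2", y2), ("Y3", y3)):
    print(name, round(sum(y) / N, 4))       # sample means
```

Giving each sample its own seeded generator plays the same role as choosing a different Random Seed in each Excel dialog box.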
[Figure: Random Number Generation dialog boxes for Y2 and Y3, followed by the first rows of the
three samples worksheet with columns Y1, Y2 and Y3]
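Suppose the random variable Y ~ N(µ, σ²). If we have a random sample Y1, Y2, ..., YN then an
unbiased estimator of the population variance is:
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}$$
To test the null hypothesis H0: σ² = σ0² we use the test statistic:
$$V = \frac{(N-1)\hat{\sigma}^2}{\sigma_0^2}$$
which has a chi-square distribution with N − 1 degrees of freedom if the null hypothesis is true.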
To carry out this test let us first compute the descriptive statistics for sample Y1, and store them
into the worksheet Y1 summary stats. Select again the Data Analysis button in the Analysis
group of commands. In the Data Analysis dialog box, select the Descriptive Statistics analysis
tool.
[Figure: Descriptive Statistics dialog box and the resulting Y1 summary stats worksheet. Among
other statistics, the output reports Mean 0.077479399, Sample Variance 0.490765237 and Count 20.]
The Sample Variance is 0.490765. This is the statistic we have called σ̂². Insert a new
worksheet and rename it test variance. In it, build the following template. Copy and paste the
value of the sample variance into the template.
[Figure: test variance template. Data Input: sample size = 20, sample variance = 0.490765237,
null hypothesis value of sigma^2 = 1.5, level of significance = 0.05. Computed Values: the degrees
of freedom and the chi-square statistic. The right critical value is =CHIINV(B7,B10), the left
critical value is =CHIINV(1-B7,B10), and each Decision cell uses an IF formula returning
"Reject Ho" or "Do not reject Ho".]
The function CHIINV is used to find the test critical value. The arguments of the function are
CHIINV(right_tail probability, degrees of freedom). Thus in B7 the right-tail probability is
0.05. The resulting values are:
Right-tail test
   Right critical value =    30.14352721
   Decision                  Do not reject Ho

Left-tail test
   Left critical value =     10.11701315
   Decision                  Reject Ho

Two-tail test
   Left critical value =     8.906516548
   Right critical value =    32.85232686
   Decision                  Reject Ho
Thus, when testing H0: σ² = 1.5 against the alternatives, at the 5% level of significance:
• H1: σ² > 1.5: we do not have enough evidence to reject the null hypothesis,
• H1: σ² < 1.5: we have enough evidence to reject the null hypothesis and conclude
that the population variance is less than 1.5,
• H1: σ² ≠ 1.5: we have enough evidence to reject the null in favor of the alternative
and conclude that the population variance is not 1.5.
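The three decisions can be traced with a short sketch. The statistic V = (N − 1)σ̂²/σ0² is computed from the template inputs; the critical values are simply copied from the Excel results rather than recomputed, and the variable names are hypothetical.

```python
n = 20
sample_var = 0.490765237       # sample variance of Y1
sigma0_sq = 1.5                # hypothesized variance under H0

V = (n - 1) * sample_var / sigma0_sq   # chi-square statistic with n - 1 = 19 df

# Critical values as reported by CHIINV in the template (alpha = 0.05).
right_cv = 30.14352721
left_cv = 10.11701315
two_left_cv, two_right_cv = 8.906516548, 32.85232686

reject_right = V >= right_cv
reject_left = V <= left_cv
reject_two = V <= two_left_cv or V >= two_right_cv

print(round(V, 4), reject_right, reject_left, reject_two)
```

Because V falls below both left-hand critical values, the left-tail and two-tail tests reject while the right-tail test does not.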
If we have two populations Y1 ~ N(µ1, σ1²) and Y2 ~ N(µ2, σ2²) we may like to test the null
hypothesis that the two populations have the same mean. This test is carried out differently if the
two population variances are equal (Case 1) or unequal (Case 2). Recall that the three samples
worksheet contains N = 20 observations from Y1 ~ N(0,1), Y2 ~ N(1.5,1), and Y3 ~ N(1.5,4). Let
us first test that the means of populations Y1 and Y2 are equal. The test statistic formula is on p.
717 of Principles of Econometrics, 4e.
Go back to your three samples worksheet. Select the Data tab, and then the Data Analysis
button on the Analysis group of commands. In the Data Analysis dialog box, select the t-test:
Two-Sample Assuming Equal Variances.
[Figure: Data Analysis dialog box with t-Test: Two-Sample Assuming Equal Variances selected]
In the t-Test dialog box enter the data ranges, select Labels, use the 5% level of significance and
output the results to a new worksheet. Since we are testing the null hypothesis that two
populations have the same mean, we specify the Hypothesized Mean Difference to be 0.
[Figure: t-Test dialog box with Alpha set to 0.05]
The result shows the calculated t-statistic as well as the one- and two-tail critical values and
p-values. Note that the one-tail p-value is calculated from the left tail because the test statistic
value is negative. If the test statistic value had been positive it would have computed the one-tail
p-value from the right tail of the t-distribution.
Go back to the three samples worksheet. Repeat the test using Y1 and Y3 and use the "unequal"
variance test option. The test statistic and adjusted degrees of freedom for this test are given on
pp. 717 and 718 of Principles of Econometrics, 4e.
[Figure: Data Analysis dialog box with t-Test: Two-Sample Assuming Unequal Variances selected]

t-Test: Two-Sample Assuming Unequal Variances

                                  Y1              Y3
Mean                              0.077479399     0.614201614
Variance                          0.490765237     2.769870711
Observations                      20              20
Hypothesized Mean Difference      0
df                                26
t Stat                            -1.329270642
P(T<=t) one-tail                  0.097652808
t Critical one-tail               1.705617901
P(T<=t) two-tail                  0.195305616
t Critical two-tail               2.055529
Recall that Y1 ~ N(0,1) and Y3 ~ N(1.5,4). We fail to reject the null hypothesis that the means are
equal in this case. Because the population means actually differ, we commit a Type II error.
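The reported t Stat and df can be checked from the summary statistics alone. The sketch below uses the usual unequal-variance (Welch) formulas; welch_t is a hypothetical helper, and the observation that Excel's df of 26 looks like the rounded value of the formula is an inference, not something stated in the manual.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2, diff0=0.0):
    """t statistic and approximate df for unequal population variances."""
    v1, v2 = var1 / n1, var2 / n2
    t = (mean1 - mean2 - diff0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics for Y1 and Y3 from the Excel output.
t_stat, df = welch_t(0.077479399, 0.490765237, 20,
                     0.614201614, 2.769870711, 20)

t_crit_two_tail = 2.055529          # two-tail 5% critical value for 26 df
reject = abs(t_stat) >= t_crit_two_tail

print(round(t_stat, 6), round(df, 2), reject)
```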
Given two normal populations, we can test whether their variances are equal. Recall that the
three samples we drew were from Y1 ~ N(0,1), Y2 ~ N(1.5,1) and Y3 ~ N(1.5,4). Let us first test the
hypothesis that the variance of Y2 equals the variance of Y1. Go back to your three samples
worksheet. Select the Data tab, and then the Data Analysis button on the Analysis group of
commands. In the Data Analysis dialog box, select the F-test Two-Sample for Variances. This
tool will carry out the F-test for equal variances shown on p. 718 of Principles of Econometrics,
4e.
In the dialog box, enter the range for Y1 first, and then enter the range for Y2. Which one is
labeled Variable 1 and which Variable 2 does not matter for the outcome (p-value) of the test.
[Figure: F-Test Two-Sample for Variances dialog box, with Alpha 0.05 and output to a new
worksheet named test var(y1)=var(y2)]
The test result shows the sample variances for Y1 and Y2, the value of the calculated F-statistic
(0.2403) and the left-tail (since F < 1) critical value for a 5% test. The p-value is also reported,
and based on this test we reject the equality of the population variances, even though we know
them to be equal. We commit a Type I error.
                      Y1              Y2
Variance              0.490765237     2.042565342
Observations          20              20
df                    19              19
F                     0.240269051
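The F-statistic is just the ratio of the two sample variances, which the following sketch reproduces. It also shows why the labeling of Variable 1 and Variable 2 is immaterial: swapping them gives the reciprocal statistic, and the test then looks at the opposite tail. The function name f_statistic is hypothetical.

```python
def f_statistic(var1, var2):
    """Ratio of two sample variances, with variable 1 in the numerator."""
    return var1 / var2

var_y1 = 0.490765237       # sample variance of Y1
var_y2 = 2.042565342       # sample variance of Y2
n1 = n2 = 20

F = f_statistic(var_y1, var_y2)
df1, df2 = n1 - 1, n2 - 1              # numerator and denominator df
F_swapped = f_statistic(var_y2, var_y1)

print(round(F, 6), df1, df2, round(F_swapped, 4))
```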
Testing the variances of Y2 and Y3 we find ourselves unable to reject the hypothesis that the
variances are equal, despite the fact that the null hypothesis is false. We commit a Type II error.
[Figure: F-Test Two-Sample for Variances dialog box for Y2 and Y3, with Variable 2 Range
C1:C21, Labels checked and Alpha 0.05, followed by the test output]
Hypothesis tests and interval estimation procedures are based on the underlying normality of the
population. If the population is not normal, the same procedures are used based on an appeal to
the Central Limit Theorem, and assuming the sample is adequately large. If the population is
normally distributed then no such worries exist. While there are many tests for normality we will
suggest two. First, construct a histogram and look for a bell shape. Second, use the test proposed
by Jarque and Bera.
C.7.1 A Histogram
Return to the worksheet hip data. For a histogram we must specify the "bins" into which the data
will be placed. The worksheet hip data summary statistics contains the descriptive statistics (see
Section C.1 of this workbook). The minimum hip width is 13.53 inches, and the maximum is
20.4. Specify the first bin to be "up to" 14 inches, and the last bin will be 20 "or more" inches. In
cell C1 enter the label bin. In C2:C8 enter the values 14, 15, ..., 20.
Select the Data tab, and then the Data Analysis button on the Analysis group of commands.
Select Histogram from the pull down list. Fill out the dialog box as shown below. Note that we
have selected Labels. The Output Range is worksheet hip data histogram, and, most
importantly, we want to Chart Output.
[Figure: Histogram dialog box, followed by the hip data histogram worksheet showing the bin and
Frequency table (bins 14, 15, ..., 20 and More) next to the default histogram chart]
To beautify the histogram remove the spaces between the bars. Click inside the histogram until
the bars have little circles surrounding them. Right-click and select Format Data Series.
[Figure: right-click menu on the chart and the Format Data Series dialog box, Series Options,
with the Series Overlap and Gap Width settings]
Select the corner of the figure box and drag it to the size you desire.
[Figure: Histogram of the hip data. Horizontal axis: bin (14, 15, ..., 20, More). Vertical axis:
Frequency.]
With only 50 data points, using too many bins can result in a figure with no shape. You should
experiment with fewer bins of alternative sizes to see if you can improve the figure. Using one
inch bins is logical.
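The binning rule can be made explicit with a small sketch. It assumes the usual behavior of Excel's Histogram tool, in which a value is counted in the first bin whose upper limit it does not exceed and anything above the last limit goes into "More". The function histogram_counts and the data values are hypothetical; they are not the hip data.

```python
from bisect import bisect_left

def histogram_counts(data, bins):
    """Count values per bin; the extra last slot is the 'More' bin."""
    counts = [0] * (len(bins) + 1)
    for y in data:
        # Index of the first bin limit that is >= y (len(bins) if none is).
        counts[bisect_left(bins, y)] += 1
    return counts

bins = [14, 15, 16, 17, 18, 19, 20]
data = [13.53, 14.0, 14.2, 16.9, 17.0, 20.4]   # made-up illustration values

counts = histogram_counts(data, bins)
for label, count in zip(bins + ["More"], counts):
    print(label, count)
```

Changing the bins list is the quickest way to experiment with fewer or wider bins.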
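The Jarque-Bera test for normality examines the skewness and kurtosis of the data (these terms
are defined in Section C.1 of this manual). For a normal distribution the skewness is zero, and the
"excess" kurtosis should be zero. The Jarque-Bera test statistic is:
$$JB = \frac{N}{6}\left(S^2 + \frac{(K-3)^2}{4}\right) \sim \chi^2_{(2)}$$
Using the formulas given in Principles of Econometrics, 4e, p. 702, the skewness and kurtosis
coefficients are:
$$\text{skewness} = S = \frac{\tilde{\mu}_3}{\tilde{\sigma}^3} \qquad \text{and} \qquad \text{kurtosis} = K = \frac{\tilde{\mu}_4}{\tilde{\sigma}^4}$$
where:
$$\tilde{\sigma} = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{N}}, \qquad \tilde{\mu}_3 = \frac{\sum (Y_i - \bar{Y})^3}{N}, \qquad \tilde{\mu}_4 = \frac{\sum (Y_i - \bar{Y})^4}{N}$$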
Use the results from the statistics calculations worksheet to make the following calculations. The
critical value for this chi-square test will be obtained using the CHIINV function (see Section
C.6.2 of this workbook) and the test p-value is obtained using CHIDIST. Enter the formulas
shown below in a new worksheet called Jarque Bera Template.
     A                                   B
1    Data Input
2    sample size: N                      50
3    skewness: S                         -0.013824895662746
4    kurtosis: K                         2.331534288324
5    level of significance: alpha        0.05
6    Calculated Values
7    JB test value =                     =(B2/6)*(B3^2+((B4-3)^2)/4)
8    chi-square(2) critical value =      =CHIINV(B5,2)
9    JB test p-value =                   =CHIDIST(B7,2)
     A                                   B
1    Data Input
2    sample size: N                      50
3    skewness: S                         -0.013824896
4    kurtosis: K                         2.331534288
5    level of significance: alpha        0.05
6    Calculated Values
7    JB test value =                     0.932522747
8    chi-square(2) critical value =      5.991464547
9    JB test p-value =                   0.627343294

The p-value shows that we cannot reject the hypothesis that the hip data comes from a normal
distribution.
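The Jarque-Bera template can be checked with a few lines of Python. With 2 degrees of freedom the chi-square distribution has simple closed forms: the right-tail probability is exp(−x/2) and the critical value is −2 ln(α). The sketch uses them in place of CHIDIST and CHIINV; the function names are hypothetical.

```python
import math

def jarque_bera(n, skew, kurt):
    """JB = (N/6) * (S^2 + (K - 3)^2 / 4)."""
    return (n / 6) * (skew ** 2 + (kurt - 3) ** 2 / 4)

def chi2_2df_right_tail(x):
    """P(chi-square(2) > x), the role played by CHIDIST(x, 2)."""
    return math.exp(-x / 2)

def chi2_2df_critical(alpha):
    """Value with right-tail probability alpha, as in CHIINV(alpha, 2)."""
    return -2 * math.log(alpha)

# Hip data inputs from the template.
jb = jarque_bera(50, -0.013824895662746, 2.331534288324)
crit = chi2_2df_critical(0.05)
p = chi2_2df_right_tail(jb)
reject = jb >= crit

print(round(jb, 6), round(crit, 6), round(p, 6), reject)
```

These closed forms hold only for 2 degrees of freedom, which is all the Jarque-Bera test needs.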
INDEX