
Using Excel

For Principles of Econometrics, Fourth Edition



GENEVIEVE BRIAND
Washington State University

R. CARTER HILL
Louisiana State University

JOHN WILEY & SONS, INC


New York | Chichester | Weinheim | Brisbane | Singapore | Toronto

Genevieve Briand dedicates this work to Tom Trulove

Carter Hill dedicates this work to Todd and Peter

This book was set by the authors.

To order books or for customer service call 1-800-CALL-WILEY (225-5945)

Copyright © 2010, 2011 John Wiley & Sons, Inc. All rights reserved. No part of this
publication may be reproduced, stored in a retrieval system or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act,
without either the prior written permission of the Publisher, or authorization through
payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222
Rosewood Drive, Danvers, MA 01923, website www.copyright.com. Requests to the
Publisher for permission should be addressed to the Permissions Department, John Wiley
& Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, (201) 748-6011, fax (201) 748-6008,
website http://www.wiley.com/go/permissions.

ISBN-13 978-111-803210-7

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1
Preface

This book is a supplement to Principles of Econometrics, 4th Edition by R. Carter Hill, William E.
Griffiths and Guay C. Lim (Wiley, 2011). This book is not a substitute for the textbook, nor is it a
stand-alone computer manual. It is a companion to the textbook, showing how to perform the
examples in the textbook using Excel 2007. This book will be useful to students taking
econometrics, as well as to their instructors and others who wish to use Excel for econometric
analysis.

In addition to this computer manual for Excel, there are similar manuals and support for the
software packages EViews, Gretl, Shazam, and Stata. All the data for Principles of
Econometrics, 4e, in various formats including Excel, are available at
http://www.wiley.com/college/hill. Individual data files, as well as errata for this manual and the
textbook, can also be found at http://principlesofeconometrics.com.

The chapters in this book parallel the chapters in Principles of Econometrics, 4e. Thus, if you
seek help for the examples in Chapter 11 of the textbook, check Chapter 11 in this book.
However, within a chapter, the section numbers in Principles of Econometrics, 4e do not
necessarily correspond to the section numbers in this Excel manual.

This work is a revision of Using Excel 2007 for Principles of Econometrics, 3rd Edition by
Genevieve Briand and R. Carter Hill (Wiley, 2010). Genevieve Briand is the corresponding
author.

We welcome comments on this book, and suggestions for improvement. *

Genevieve Briand
School of Economic Sciences
Washington State University
Pullman, WA 99164
gbriand@wsu.edu

R. Carter Hill
Economics Department
Louisiana State University
Baton Rouge, LA 70803
eohill@lsu.edu

* Microsoft product screen shot(s) reprinted with permission from Microsoft Corporation. Our use does not directly or indirectly imply
Microsoft sponsorship, affiliation, or endorsement.

BRIEF CONTENTS

1. Introduction to Excel 1

2. The Simple Linear Regression Model 19

3. Interval Estimation and Hypothesis Testing 67

4. Prediction, Goodness-of-Fit and Modeling Issues 95

5. The Multiple Linear Regression 143

6. Further Inference in the Multiple Regression Model 154

7. Using Indicator Variables 180

8. Heteroskedasticity 204

9. Regression with Time Series Data: Stationary Variables 228

10. Random Regressors and Moment-Based Estimation 262

11. Simultaneous Equations Models 278

12. Nonstationary Time-Series Data and Cointegration 294

13. Vector Error Correction and Vector Autoregressive Models 310

14. Time-Varying Volatility and ARCH Models 328

15. Panel Data Models 355

16. Qualitative and Limited Dependent Variable Models 391

A. Mathematical Tools 402

B. Review of Probability Concepts 416

C. Review of Statistical Inference 431

Index 466

CONTENTS

CHAPTER 1 Introduction to Excel 1
1.1 Starting Excel 1
1.2 Entering Data 3
1.3 Using Excel for Calculations 3
1.3.1 Arithmetic Operations 3
1.3.2 Mathematical Functions 4
1.4 Editing your Data 6
1.5 Saving and Printing your Data 8
1.6 Importing Data into Excel 10
1.6.1 Resources for Economists on the Internet 10
1.6.2 Data Files for Principles of Econometrics 13
1.6.2a John Wiley & Sons Website 13
1.6.2b Principles of Econometrics Website 14
1.6.3 Importing ASCII Files 14

CHAPTER 2 The Simple Linear Regression Model 19
2.1 Plotting the Food Expenditure Data 19
2.1.1 Using Chart Tools 21
2.1.2 Editing the Graph 23
2.1.2a Editing the Vertical Axis 23
2.1.2b Axis Titles 24
2.1.2c Gridlines and Markers 25
2.1.2d Moving the Chart 26
2.2 Estimating a Simple Regression 27
2.2.1 Using Least Squares Estimators' Formulas 27
2.2.2 Using Excel Regression Analysis Routine 31
2.3 Plotting a Simple Regression 34
2.3.1 Using Two Points 34
2.3.2 Using Excel Built-in Feature 38
2.3.3 Using a Regression Option 38
2.3.4 Editing the Chart 40
2.4 Expected Values of b1 and b2 44
2.4.1 Model Assumptions 45
2.4.2 Random Number Generation 47
2.4.3 The LINEST Function 49
2.4.4 Repeated Sampling 50
2.5 Variance and Covariance of b1 and b2 52
2.6 Nonlinear Relationships 53
2.6.1 A Quadratic Model 53
2.6.1a Estimating the Model 53
2.6.1b Scatter Plot of Data with Fitted Quadratic Relationship 55
2.6.2 A Log-Linear Model 57
2.6.2a Histograms of PRICE and ln(PRICE) 57
2.6.2b Estimating the Model 61
2.6.2c Scatter Plot of Data with Fitted Log-Linear Relationship 62
2.7 Regression with Indicator Variables 63
2.7.1 Histograms of House Prices 63
2.7.2 Estimating the Model 65

CHAPTER 3 Interval Estimation and Hypothesis Testing 67
3.1 Interval Estimation 68
3.1.1 The t-Distribution 68
3.1.1a The t-Distribution versus Normal Distribution 68
3.1.1b t-Critical Values and Interval Estimates 69
3.1.1c Percentile Values 69
3.1.1d TINV Function 69
3.1.1e Appendix E: Table 2 in POE 71
3.1.2 Obtaining Interval Estimates 71
3.1.3 An Illustration 71
3.1.3a Using the Interval Estimator Formula 71
3.1.3b Excel Regression Default Output 73
3.1.3c Excel Regression Confidence Level Option 74
3.1.4 The Repeated Sampling Context (Advanced Material) 75
3.1.4a Model Assumptions 75
3.1.4b Repeated Random Sampling 75
3.1.4c The LINEST Function Revisited 77
3.1.4d The Simulation Template 78
3.1.4e The IF Function 79
3.1.4f The OR Function 79
3.1.4g The COUNTIF Function 80
3.2 Hypothesis Tests 81
3.2.1 One-Tail Tests with Alternative "Greater Than" (>) 81
3.2.2 One-Tail Tests with Alternative "Less Than" (<) 82
3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠) 82
3.3 Examples of Hypothesis Tests 82
3.3.1 Right-Tail Tests 83
3.3.1a One-Tail Test of Significance 84
3.3.1b One-Tail Test of an Economic Hypothesis 84
3.3.2 Left-Tail Tests 84
3.3.3 Two-Tail Tests 86
3.3.3a Two-Tail Test of an Economic Hypothesis 87
3.3.3b Two-Tail Test of Significance 87
3.4 The p-Value 88
3.4.1 The p-Value Rule 88
3.4.1a Definition of p-value 88
3.4.1b Justification for the p-Value Rule 89
3.4.2 The TDIST Function 91
3.4.3 Examples of Hypothesis Tests Revisited 92
3.4.3a Right-Tail Test from Section 3.3.1b 92
3.4.3b Left-Tail Test from Section 3.3.2 92
3.4.3c Two-Tail Test from Section 3.3.3a 93
3.4.3d Two-Tail Test from Section 3.3.3b 93

CHAPTER 4 Prediction, Goodness-of-Fit and Modeling Issues 95
4.1 Least Squares Prediction 96
4.2 Measuring Goodness-of-Fit 98
4.2.1 Coefficient of Determination or R² 98
4.2.2 Correlation Analysis and R² 98
4.2.3 The Food Expenditure Example and the CORREL Function 99
4.3 The Effects of Scaling the Data 100
4.3.1 Changing the Scale of x 100
4.3.2 Changing the Scale of y 101
4.3.3 Changing the Scale of x and y 102
4.4 A Linear-Log Food Expenditure Model 104
4.4.1 Estimating the Model 104
4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship 105
4.5 Using Diagnostic Residual Plots 108
4.5.1 Random Residual Pattern 108
4.5.2 Heteroskedastic Residual Pattern 111
4.5.3 Detecting Model Specification Errors 112
4.6 Are the Regression Errors Normally Distributed? 115
4.6.1 Histogram of the Residuals 115
4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions 118
4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food Expenditure Model 121
4.7 Polynomial Models: An Empirical Example 122
4.7.1 Scatter Plot of Wheat Yield over Time 123
4.7.2 The Linear Equation Model 125
4.7.2a Estimating the Model 125
4.7.2b Residuals Plot 126
4.7.3 The Cubic Equation Model 126
4.7.3a Estimating the Model 126
4.7.3b Residuals Plot 128
4.8 Log-Linear Models 129
4.8.1 A Growth Model 129
4.8.2 A Wage Equation 130
4.8.3 Prediction 132
4.8.4 A Generalized R² Measure 135
4.8.5 Prediction Intervals 136
4.9 A Log-Log Model: Poultry Demand Equation 139
4.9.1 Estimating the Model 139
4.9.2 A Generalized R² Measure 140
4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship 140

CHAPTER 5 The Multiple Linear Regression 143
5.1 Least Squares Estimates Using the Hamburger Chain Data 143
5.2 Interval Estimation 145
5.3 Hypothesis Tests for a Single Coefficient 145
5.3.1 Tests of Significance 145
5.3.2 One-Tail Tests 146
5.3.2a Left-Tail Test of Elastic Demand 146
5.3.2b Right-Tail Test of Advertising Effectiveness 147
5.4 Polynomial Equations: Extending the Model for Burger Barn Sales 148
5.5 Interaction Variables 149
5.5.1 Linear Models 149
5.5.2 Log-Linear Models 151
5.6 Measuring Goodness-of-Fit 153

CHAPTER 6 Further Inference in the Multiple Regression Model 154
6.1 Testing the Effect of Advertising: the F-test 154
6.1.1 The Logic of the Test 154
6.1.2 The Unrestricted and Restricted Models 155
6.1.3 Test Template 158
6.2 Testing the Significance of the Model 159
6.2.1 Null and Alternative Hypotheses 159
6.2.2 Test Template 159
6.2.3 Excel Regression Output 160
6.3 The Relationship between t- and F-Tests 161
6.4 Testing Some Economic Hypotheses 163
6.4.1 The Optimal Level of Advertising 163
6.4.2 The Optimal Level of Advertising and Price 164
6.5 The Use of Nonsample Information 166
6.6 Model Specification 167
6.6.1 Omitted Variables 167
6.6.2 Irrelevant Variables 169
6.6.3 The RESET Test 172
6.7 Poor Data, Collinearity and Insignificance 176
6.7.1 Correlation Matrix 176
6.7.2 The Car Mileage Model Example 177

CHAPTER 7 Using Indicator Variables 180
7.1 Indicator Variables: The University Effect on House Prices Example 180
7.2 Applying Indicator Variables 182
7.2.1 Interactions Between Qualitative Factors 182
7.2.2 Qualitative Factors with Several Categories 185
7.2.3 Testing the Equivalence of Two Regressions 187
7.3 Log-Linear Models: a Wage Equation Example 191
7.4 The Linear Probability Model: A Marketing Example 192
7.5 The Difference Estimator: The Project STAR Example 193
7.6 The Differences-in-Differences Estimator: The Effect of Minimum Wage Change Example 198

CHAPTER 8 Heteroskedasticity 204
8.1 The Nature of Heteroskedasticity 204
8.2 Detecting Heteroskedasticity 206
8.2.1 Residual Plots 206
8.2.2 Lagrange Multiplier Tests 206
8.2.2a Using the Lagrange Multiplier or Breusch-Pagan Test 206
8.2.2b Using the White Test 209
8.2.3 The Goldfeld-Quandt Test 210
8.2.3a The Logic of the Test 210
8.2.3b Test Template 211
8.2.3c Wage Equation Example 212
8.2.3d Food Expenditure Example 216
8.3 Heteroskedasticity-Consistent Standard Errors or the White Standard Errors 219
8.4 Generalized Least Squares: Known Form of Variance 221
8.4.1 Variance Proportional to x: Food Expenditure Example 221
8.4.2 Grouped Data: Wage Equation Example 222
8.4.2a Separate Wage Equations for Metropolitan and Rural Areas 222
8.4.2b GLS Wage Equation 223
8.5 Generalized Least Squares: Unknown Form of Variance 224

CHAPTER 9 Regressions with Time Series Data: Stationary Variables 228
9.1 Finite Distributed Lags 228
9.1.1 US Economic Time Series 228
9.1.2 An Example: The Okun's Law 230
9.2 Serial Correlation 232
9.2.1 Serial Correlation in Output Growth 232
9.2.1a Scatter Diagram for Gt and Gt-1 232
9.2.1b Correlogram for G 233
9.2.2 Serially Correlated Errors 237
9.2.2a Australian Economic Time Series 237
9.2.2b A Phillips Curve 239
9.2.2c Correlogram for Residuals 240
9.3 Lagrange Multiplier Tests for Serially Correlated Errors 241
9.3.1 t-Test Version 241
9.3.2 T × R² Version 243
9.4 Estimation with Serially Correlated Errors 245
9.4.1 Generalized Least Squares Estimation of an AR(1) Error Model 245
9.4.1a The Prais-Winsten Estimator 245
9.4.1b The Cochrane-Orcutt Estimator 248
9.4.2 Autoregressive Distributed Lag (ARDL) Model 252
9.5 Forecasting 254
9.5.1 Using an Autoregressive (AR) Model 254
9.5.2 Using an Exponential Smoothing Model 257
9.6 Multiplier Analysis 258

CHAPTER 10 Random Regressors and Moment-Based Estimation 262
10.1 OLS Estimation of a Wage Equation 262
10.2 Instrumental Variables Estimation of the Wage Equation 264
10.2.1 With a Single Instrument 264
10.2.1a First Stage Equation for EDUC 264
10.2.1b Stage 2 Least Squares Estimates 265
10.2.2 With a Surplus Instrument 268
10.2.2a First Stage Equation for EDUC 268
10.2.2b Stage 2 Least Squares Estimates 270
10.3 Specification Tests for the Wage Equation 273
10.3.1 The Hausman Test 273
10.3.2 Testing Surplus Moment Conditions 274

CHAPTER 11 Simultaneous Equations Models 278
11.1 Supply and Demand Model for Truffles 278
11.1.1 The Reduced Form Equations 279
11.1.1a Reduced Form Equation for Q 279
11.1.1b Reduced Form Equation for P 280
11.1.2 The Structural Equations or Stage 2 Least Squares Estimates 281
11.1.2a 2SLS Estimates for Truffle Demand 281
11.1.2b 2SLS Estimates for Truffle Supply 283
11.2 Supply and Demand Model for the Fulton Fish Market 286
11.2.1 The Reduced Form Equations 286
11.2.1a Reduced Form Equation for lnQ 286
11.2.1b Reduced Form Equation for lnP 287
11.2.2 The Structural Equations or Stage 2 Least Squares Estimates 290
11.2.2a 2SLS Estimates for Fulton Fish Demand 290

CHAPTER 12 Nonstationary Time-Series Data and Cointegration 294
12.1 Stationary and Nonstationary Variables 294
12.1.1 US Economic Time Series 294
12.1.2 Simulated Data 296
12.2 Spurious Regressions 299
12.3 Unit Root Tests for Stationarity 301
12.4 Cointegration 306

CHAPTER 13 Vector Error Correction and Vector Autoregressive Models 310
13.1 Estimating a VEC Model 310
13.1.1 Test for Cointegration 312
13.1.2 The VEC Model 315
13.2 Estimating a VAR Model 317
13.2.1 Test for Cointegration 318
13.2.2 The VAR Model 321
13.3 Impulse Responses Functions 323
13.3.1 The Univariate Case 323
13.3.2 The Bivariate Case 325

CHAPTER 14 Time-Varying Volatility and ARCH Models 328
14.1 Time-Varying Volatility 328
14.1.1 Returns Data 328
14.1.2 Simulated Data 334
14.2 Testing and Forecasting 341
14.2.1 Testing for ARCH Effects 341
14.2.1a Time Series and Histogram 342
14.2.1b Lagrange Multiplier Test 344
14.2.2 Forecasting Volatility 347
14.3 Extensions 349
14.3.1 The GARCH Model 349
14.3.2 The T-GARCH Model 350
14.3.3 The GARCH-In-Mean Model 352

CHAPTER 15 Panel Data Models 355
15.1 Pooled Least Squares Estimates of Wage Equation 355
15.2 The Fixed Effects Model 357
15.2.1 Estimates of Wage Equation for Small N 357
15.2.1a The Least Squares Dummy Variable Estimator for Small N 357
15.2.1b The Fixed Effects Estimator: Estimates of Wage Equation for N=10 361
15.2.2 Fixed Effects Estimates of Wage Equation from Complete Panel 365
15.3 The Random Effects Model 371
15.3.1 Testing for Random Effects 371
15.3.2 Random Effects Estimation of the Wage Equation 373
15.4 Sets of Regression Equations 381
15.4.1 Estimation: Equal Coefficients, Equal Error Variances 381
15.4.2 Estimation: Different Coefficients, Equal Error Variances 383
15.4.3 Estimation: Different Coefficients, Different Error Variances 384
15.4.4 Seemingly Unrelated Regressions: Testing for Contemporaneous Correlation 388

CHAPTER 16 Qualitative and Limited Dependent Variable Models 391
16.1 Least Squares Fitted Linear Probability Model 391
16.2 Limited Dependent Variables 393
16.2.1 Censored Data 393
16.2.2 Simulated Data 395

APPENDIX A Mathematical Tools 402
A.1 Mathematical Operations 402
A.1.1 Exponents 408
A.1.2 Scientific Notation 409
A.1.3 Logarithm and the Number e 410
A.2 Percentages 413

APPENDIX B Review of Probability Concepts 416
B.1 Binomial Probabilities 416
B.1.1 Computing Binomial Probabilities Directly 417
B.1.2 Computing Binomial Probabilities Using BINOMDIST 419
B.2 The Normal Distributions 422
B.2.1 The STANDARDIZE Function 422
B.2.2 The NORMSDIST Function 423
B.2.3 The NORMSINV Function 423
B.2.4 The NORMDIST Function 424
B.2.5 The NORMINV Function 424
B.2.6 A Template for Normal Distribution Probability Calculations 424
B.3 Distributions Related to the Normal 426
B.3.1 The Chi-Square Distribution 426
B.3.2 The t-Distribution 428
B.3.3 The F-Distribution 429

APPENDIX C Review of Statistical Inference 431
C.1 Examining a Sample of Data 431
C.2 Estimating Population Parameters 436
C.2.1 Creating Random Samples 436
C.2.2 Estimating a Population Mean 438
C.2.3 Estimating a Population Variance 438
C.2.4 Standard Error of the Sample Mean 439
C.3 The Central Limit Theorem 439
C.4 Interval Estimation 444
C.4.1 Interval Estimation with σ² Unknown 446
C.4.2 Interval Estimation with the Hip Data 447
C.5 Hypothesis Tests About a Population Mean 449
C.5.1 An Example 450
C.5.2 The p-value 450
C.5.3 A Template for Hypothesis Tests 451
C.6 Other Useful Tests 454
C.6.1 Simulating Data 454
C.6.2 Testing a Population Variance 456
C.6.3 Testing Two Population Means 459
C.6.4 Testing Two Population Variances 461
C.7 Testing Population Normality 463
C.7.1 A Histogram 463
C.7.2 The Jarque-Bera Test 465

Index 467
CHAPTER 1

Introduction to Excel

CHAPTER OUTLINE
1.1 Starting Excel
1.2 Entering Data
1.3 Using Excel for Calculations
1.3.1 Arithmetic Operations
1.3.2 Mathematical Functions
1.4 Editing your Data
1.5 Saving and Printing your Data
1.6 Importing Data into Excel
1.6.1 Resources for Economists on the Internet
1.6.2 Data Files for Principles of Econometrics
1.6.2a John Wiley & Sons Website
1.6.2b Principles of Econometrics Website
1.6.3 Importing ASCII Files

1.1 STARTING EXCEL

Find the Excel shortcut on your desktop and double-click it (with the left mouse button) to start Excel.

Alternatively, left-click the Start menu in the bottom left corner of your computer screen.


Slide your mouse over All Programs, Microsoft Office, and finally Microsoft Office Excel
2007. Left-click this last item to start Excel. Better yet, if you would like to create a
shortcut, right-click it, slide your mouse over Send To, and then select (i.e., move your mouse
over and left-click) Desktop (create shortcut). An Excel 2007 shortcut is created on your
desktop. If you right-click your shortcut and select Rename, you can also type in a shorter
name like Excel.

2 Chapter 1

Excel opens to a new file titled Book1. You can find the name of the open file at the very top of
the Excel window, on the Title bar. An Excel file like Book1 contains several sheets. By default,
Excel opens to Sheet1 of Book1. You can tell which sheet is open by looking at the Sheet
tabs in the lower left corner of your Excel window.

[Screenshot: the Excel 2007 window, with the title bar, formula bar, Help button, Styles group, cell reference and a group of commands labeled]

There are lots of little bits that you will become more familiar with as we go along. The Active
cell is surrounded by a border and is in Column A and Row 1; its Cell reference is A1.

Below the title bar is a Tab list. The Home tab is the one Excel opens to. Under each tab you
will find groups of commands. Under the Home tab, the first one is the Clipboard group of
commands, named after the tasks it relates to. The wide bar including the tab list and the groups
of commands is referred to as the Ribbon. The content of the Active cell shows up in the
Formula bar (right now, there is nothing in it). Perhaps most important of all, locate the
Help button in the upper right corner of the Excel window. Finally, you can use the
Scroll bars and the arrows around them to navigate up and down and left and right in your
worksheet. And you have a long way to go: each worksheet in Microsoft Excel 2007 contains
1,048,576 rows and 16,384 columns!

Note that your Ribbon might look slightly different from the one shown above. If your screen is
bigger, Excel automatically displays more of its available options. For example, in the Styles
group of commands, instead of the Cell Styles button, you might see a colorful display of cell
styles.
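The worksheet dimensions quoted above are powers of two: 1,048,576 = 2^20 rows and 16,384 = 2^14 columns. Columns are labeled with letters (A, B, ..., Z, AA, AB, ...), so the last column of an Excel 2007 worksheet is XFD. The Python sketch below is not part of Excel; it uses hypothetical helper names (column_letters, column_number) and assumes labels follow the A..Z, AA..ZZ, AAA... scheme, converting between column numbers and their letter labels.

```python
# Sketch (not part of Excel): relate worksheet column numbers to letter labels.
# Assumption: labels run A..Z, then AA..ZZ, then AAA..., i.e. "bijective base 26".

ROWS = 2 ** 20     # 1,048,576 rows per Excel 2007 worksheet
COLUMNS = 2 ** 14  # 16,384 columns per Excel 2007 worksheet


def column_letters(n: int) -> str:
    """Hypothetical helper: 1-based column number -> label (1 -> 'A', 27 -> 'AA')."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def column_number(label: str) -> int:
    """Hypothetical helper: label -> 1-based column number ('A' -> 1, 'AA' -> 27)."""
    n = 0
    for ch in label.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


print(f"{ROWS:,} rows x {COLUMNS:,} columns")
print("last column:", column_letters(COLUMNS))  # last column: XFD
```

Subtracting 1 before each divmod is what makes the scheme work without a "zero" letter: after Z comes AA, not "A0".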
Introduction to Excel 3

1.2 ENTERING DATA

We will use Excel to analyze data. To enter labels and data into an Excel worksheet, move the
cursor to a cell and type. First, type X in cell A1. Press the Enter key on your keyboard to get to
cell A2, or navigate by clicking a cell with the mouse or by using the Arrow keys (to move right,
left, up or down). Fill in the rest as shown below:

Row   A
1     X
2     1
3     2
4     3
5     4
6     5

1.3 USING EXCEL FOR CALCULATIONS

What is Excel good for? Its primary usefulness is to carry out repeated calculations. We can add,
subtract, multiply and divide; and we can apply mathematical and statistical functions to the data
in our worksheet. To illustrate, we are going to compute the squares of the numbers we just
entered and then add them up. There are two main ways to perform calculations in Excel. One is
to write formulas using arithmetic operators; the other is to write formulas using mathematical
functions.

1.3.1 Arithmetic Operations

Select the Excel Help button in the upper right corner of your screen. In the Excel Help dialog box
that pops up, type arithmetic operators and select Search. In the list of results, select
Calculation operators and precedence.

Standard arithmetic operators are defined as shown below. To close the Excel Help dialog box,
select the X button in its upper right corner.

Arithmetic operator    Meaning           Example
+ (plus sign)          Addition          3+3
- (minus sign)         Subtraction       3-1
                       Negation          -1
* (asterisk)           Multiplication    3*3
/ (forward slash)      Division          3/3
% (percent sign)       Percent           20%
^ (caret)              Exponentiation    3^2
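To see the operators in the table at work outside Excel, here is a minimal Python sketch. It is only an analogy for illustration: Python writes exponentiation as ** instead of ^, and its % is the remainder operator rather than a percent sign, so 20% is written 20 / 100. It also shows one difference worth knowing: Excel applies negation before exponentiation, so =-3^2 returns 9, whereas Python evaluates -3**2 as -9.

```python
# Sketch: Python analogues of the Excel arithmetic operators in the table above.
# This is an illustration in Python, not Excel; Python spells ^ as ** and has no
# percent operator (its % is the remainder operator), so 20% is written 20 / 100.

examples = {
    "3+3": 3 + 3,      # addition
    "3-1": 3 - 1,      # subtraction
    "-1": -1,          # negation
    "3*3": 3 * 3,      # multiplication
    "3/3": 3 / 3,      # division (Python returns the float 1.0)
    "20%": 20 / 100,   # percent: Excel stores 20% as 0.2
    "3^2": 3 ** 2,     # exponentiation
}

for excel_formula, value in examples.items():
    print(f"={excel_formula} -> {value}")

# Precedence differs: Excel applies negation before ^, so =-3^2 returns 9,
# while Python applies ** first, so -3**2 is -9. Parentheses remove the ambiguity.
excel_style = (-3) ** 2
python_style = -3 ** 2
print(excel_style, python_style)  # 9 -9
```

When in doubt in either tool, add parentheses: =(-3)^2 and =-(3^2) leave no room for precedence surprises.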

Place your cursor in cell B1 and type X-squared. In cells B2 through B6 (henceforth
referred to as B2:B6), we are going to compute the squares of the corresponding values in cells
A2:A6. Let us emphasize that the trick to using Excel efficiently is NOT to re-type values already
stored in the worksheet, but instead to use references to the cells where the values are stored. So,
to compute the square of 1, which is the value stored in cell A2, instead of using the formula
=1*1, you should use the formula =A2*A2 or =A2^2. Place your cursor in cell B2 and type the
formula.

[Screenshot: cell B2 containing the formula =A2^2, which also appears in the Formula bar]

Then press Enter. Note that (1) a formula always starts with an equal sign; this is how Excel
recognizes it is a formula, and (2) formulas are not case sensitive, so you could also have typed
=a2^2 instead. Now, we want to copy this formula to cells B3:B6. To do that, place your cursor
back in cell B2 and move it to the bottom right (south-east) corner of the cell, until the fat cross
turns into a skinny one.

Left-click, hold, drag down over the next four cells, and release.

Excel has copied the formula you typed in cell B2 into the cells below. The way Excel
understands the instructions you gave in cell B2 is "square the value found at address A2".
Now, it is important to understand how Excel interprets "address A2". To Excel, "address A2"
means "from where you are, go left by one cell", because this is where A2 is located vis-à-vis
B2. In other words, an address gives directions (left-right, up-down) and distances (number of
cells away), all in reference to the cell where the formula is entered. So the formula we entered in
cell B2 instructed Excel to take the value stored one cell to its left and square it; when we copied
that formula, the exact same instructions were given in cells B3:B6. If you place your cursor back
in B3 and look at the Formula bar, you can see that, in this cell, these same instructions translate
into =A3^2.

[Screenshot: cell B3 selected; the Formula bar shows =A3^2]
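The relative-reference idea described above can be mimicked in a few lines of Python. This is a sketch, not how Excel is implemented: shift_reference is a hypothetical helper that moves a simple reference such as A2 by a number of rows and columns, and it assumes single-letter columns only. Copying the formula =A2^2 from B2 down to B6 amounts to shifting its reference by 0 to 4 rows.

```python
import re


def shift_reference(ref: str, rows: int = 0, cols: int = 0) -> str:
    """Shift a simple A1-style reference (single-letter column) by the given offsets.

    Hypothetical helper for illustration; it mimics how a relative reference
    changes when a formula is copied, it is not Excel's own code.
    """
    match = re.fullmatch(r"([A-Z])(\d+)", ref)
    if match is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    column, row = match.groups()
    return f"{chr(ord(column) + cols)}{int(row) + rows}"


# Values typed into column A (cells A2:A6).
column_a = {"A2": 1, "A3": 2, "A4": 3, "A5": 4, "A6": 5}

# The formula in B2 is =A2^2; copying it down shifts its reference one row per cell.
for offset in range(5):
    ref = shift_reference("A2", rows=offset)
    target = f"B{2 + offset}"
    print(f"{target}: ={ref}^2 -> {column_a[ref] ** 2}")
```

The loop prints one line per cell, from B2: =A2^2 -> 1 down to B6: =A6^2 -> 25, matching the squares that appear in column B.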

1.3.2 Mathematical Functions

There are a large number of mathematical functions. Again, you can find the list of functions
available in Excel by calling on our good friend, the Help button, and typing mathematical
functions. If you try it, you will see that the list is long; we will not copy it here.

We have computed the squares of our numbers. Now we will add them up: the numbers and the
squares of the numbers, separately. For that, we will use the SUM function.

We first need to select, or highlight, all the numbers in our table. There are several ways to
highlight cells. For this small area, the easiest way is to place your cursor in A2, hold down the
left mouse button and drag it across the area you wish to highlight, i.e., all the way to cell B6.
Here is how your worksheet should look:

Row   A   B
1     X   X-squared
2     1   1
3     2   4
4     3   9
5     4   16
6     5   25

Next, go to the Editing group of commands, found at the far right of the Home tab,
and select Σ AutoSum.

[Screenshot: the Editing group, with the Σ AutoSum, Clear, Sort & Filter and Find & Select buttons]

Excel sums the numbers in each column and places each sum in the first empty cell below the
column (row 7). The result is:

Row   A    B
1     X    X-squared
2     1    1
3     2    4
4     3    9
5     4    16
6     5    25
7     15   55

Notice that if you select the arrow to the right of Σ AutoSum, you can find a list of
additional calculations that Excel can automatically perform for you.

Alternatively, you could have placed your cursor in cell A7, typed =SUM(A2:A6), and pressed
the Enter key (and then copied this formula to cell B7).

[Screenshot: cell A7 containing =SUM(A2:A6)]

Note that: (1) as soon as you type the first letter of your function, a list of all the other available
functions that start with the same letter pops up. This can be very useful: if you left click on any
of them, Excel gives you its definition; if you double left-click on any of them, it automatically
finishes typing the function name for you, and (2) once the function name and the opening
parenthesis are typed, Excel reminds you of what the needed Arguments are, i.e. what else you
need to specify in your function to use it properly.
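As a quick check on the row 7 totals, the same sums can be computed in Python. The lists x and x_squared below are assumptions standing in for the worksheet ranges A2:A6 and B2:B6; Python's built-in sum plays the role of Excel's SUM function.

```python
# Sketch: the row 7 totals, computed outside Excel for comparison.
# The lists below are assumptions standing in for the worksheet ranges.
x = [1, 2, 3, 4, 5]              # A2:A6
x_squared = [v ** 2 for v in x]  # B2:B6, each cell holding =A<row>^2

total_x = sum(x)                  # what =SUM(A2:A6) returns in A7
total_x_squared = sum(x_squared)  # what =SUM(B2:B6) returns in B7

print(total_x, total_x_squared)  # 15 55
```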

You could also have used the Insert Function button, which you can find on the left side of
the Formula bar.

Once your cursor is placed in A7, select the Insert Function button. An Insert Function dialog
box pops up. You can Select a function you need (highlight it and select OK), or Search for a
function first (follow the instructions given in that window).

[Screenshot: the Insert Function dialog box, with the Search for a function box, the Or select a category list set to Most Recently Used, and the Select a function list]

In the Function Arguments dialog box that pops up, you need to specify the cell references of
the values you want to add. If they are not already properly specified, you can type A2:A6 in the
Number1 box, or place your cursor in the box, delete whatever is in it, and then select
A2:A6. Select OK. Now that you have the formula in A7, copy it into B7.

[Screenshot: the Function Arguments dialog box for SUM, with A2:A6 in the Number1 box]

1.4 EDITING YOUR DATA

Before wrapping up, you want to polish the presentation of your data. This actually has less to do
with appearance than with organization and communication. You want to make sure that anyone
can easily make sense of your table (like your instructor, for example, or yourself, for that
matter, when you come back to it after letting it sit for a while).

We are going to add labels and color/shading to our table. Hold your cursor over the column A
heading until it turns into a downward arrow; left-click to select the whole column; and select
Insert in the Cells group of commands, found to the left of the Editing group of commands.

[Screenshot: column A selected, and the Insert, Delete and Format buttons of the Cells group]

Excel adds a new column to the left of the one you selected. That is where we are going to write
our labels. In the new A1 cell, type Variables; in cell A2, type Values; in cell A7, type Sum.

[Screenshot: the worksheet before and after inserting the new column A and typing the labels]

Select column A again, make it Bold (Font group of commands, to the right of the Clipboard
group), and align it Left (Alignment group of commands, to the right of the Font group).

[Screenshot: the Font and Alignment groups of commands]

Select cells B1 and C1, and make them Bold. Repeat with cells B7 and C7. Better, but not there
yet. Select row 7 and make it Italic (next to Bold). Select column B, hold down the left mouse
button and drag over column C to select column C too; select Center alignment (next to Left).
Next, select A2:A6; left-click the arrow next to Merge & Center (in the Alignment group of
commands), and select Merge Cells.

Immediately after, select Middle Align, which is found right above the Center alignment button.

Select A1:C7, left-click the arrow next to the Bottom Border button and select All Borders.

[Screenshot: the Borders menu, with options including Bottom Border, Top Border, Left Border, Right Border, No Border and All Borders]

Select A7:C7 (A7:C7, not A1:C7, this time), left-click the arrow next to the Fill Color button,
and select a grey color to fill the cells with. Choose a different color for A1:C1.

[Screenshot: the Fill Color menu, showing the Theme Colors palette]

Finally, place your cursor on the boundary between the column C and column D headings until it
turns into a left-right arrow. Hold it there and double left-click so that the width of column C is
resized to better accommodate the length of the label "X-squared". The result is:

Row   A           B    C
1     Variables   X    X-squared
2     Values      1    1
3                 2    4
4                 3    9
5                 4    16
6                 5    25
7     Sum         15   55

Next, move your cursor over the Sheet1 tab, right-click, select Rename and type in a descriptive
name for your worksheet, like Excel for POE 1.2-1.4 (for Using Excel for Principles of
Econometrics, 4e, sections 1.2 through 1.4). Press the Enter key on your keyboard or left-click
anywhere on your worksheet.

[Screenshot: the renamed Excel for POE 1.2-1.4 sheet tab, next to the Sheet2 tab]

1.5 SAVING AND PRINTING YOUR DATA

All you need to do now is save your Excel file. Select the Save button in the upper left corner
of the Excel window.

A Save As dialog box pops up. Locate the folder you want to save your file in by using the
arrow-down located at the extreme right of the Save in window or browsing through the list of
folders displayed below it.

In the File name box, at the bottom of the Save As dialog box, the generic name Book1
should be highlighted. Type the descriptive name you would like to give to your Excel file, like
POE Chapter 1. Finally, select Save.

[Screenshot: the File name box before (Book1) and after (POE Chapter 1), with Save as type set to Excel Workbook]

If you need to create a new folder, use the Create New Folder button found to the right of the
Save in window.

A New Folder dialog box pops up, prompting you for the name you want to give to your new
folder, Excel for POE for example. Type it in the Name window and select OK. Finally, select
Save.

If you would like to print your table, select the Office Button, next to the Save button, go to
Print, and select one of the print options.

For more print options, you might want to check out the Page Layout tab, at the top left of
your screen, as well as the Page Layout button at the bottom right of your screen.

To close your file, select the X button in the upper right corner of your screen.

In the next section, we show you how to import data into an Excel spreadsheet. Getting data for
economic research is much easier today than it was years ago. Before the Internet, hours would be
spent in libraries, looking for and copying data by hand. Now we have access to rich data sources
which are a few clicks away.

First we illustrate how convenient it is when sites make data available in Excel format. Then we
illustrate how to import ASCII, or text, files into Excel.

1.6 IMPORTING DATA INTO EXCEL

1.6.1 Resources for Economists on the Internet

Suppose you are interested in analyzing the GDP of the United States. The website Resources for
Economists contains a wide variety of data, and in particular the macro data we seek. Websites
are continually updated and improved. We guide you through an example, but be prepared for
differences from what we show here.

First, open up the website http://rfe.org/.

Select the Data link and then select U.S. Macro and Regional Data.

This will open up a range of sub-data categories. For the example discussed here, select the
Bureau of Economic Analysis (BEA).


Finally, select Gross Domestic Product (GDP).


This page illustrates the point we are making: many government and other websites make data
available in Excel format. Select Current-dollar and "real" GDP.


You have the option of saving the resulting Excel file, gdplev.xls, to your computer or storage
device, or opening it right away, which we proceed to do next.

What opens is a workbook with headers explaining the variables it contains. We see that there is
a series of annual data and a quarterly series.

[Worksheet: Current-Dollar and "Real" Gross Domestic Product. Annual and Quarterly columns (seasonally adjusted annual rates) of GDP in billions of current dollars and GDP in billions of chained 2005 dollars; the annual series starts in 1929 and the quarterly series in 1947q1.]

The opened file is "Read Only" so you must save it under another name to work with it, graph,
run regressions and so on.

1.6.2 Data Files for Principles of Econometrics

The book Principles of Econometrics, 4e, uses many examples with data. These data files have
been saved as workbooks and are available for you to download to your computer. There are
about 150 such files. The data files and other supplementary materials can be downloaded from
two web locations: the publisher website or the book website maintained by the authors.

1.6.2a John Wiley and Sons Website

Using your web browser, enter the address www.wiley.com/college/hill. Among the authors
named "Hill", find the book Principles of Econometrics, 4e.

Follow the link to Resources for Students, and then Student Companion Site. There, you will
find links to supplementary materials, including a link to Data Files that will allow you to download
all the data definition files and data files at once.

1.6.2b Principles of Econometrics Website

The address for the book website is www.principlesofeconometrics.com. There, you will find
links to the data definition files, the Excel spreadsheets, and an Errata list. You can download
the data definition files and the Excel files all at once or select individual files. The data definition
files contain variable names, variable definitions, and summary statistics. The Excel spreadsheets
contain data only; those files were created using Excel 2003.

1.6.3 Importing ASCII Files


Sometimes the data you want to use are provided in ASCII, or text, format. To illustrate, go to
http://principlesofeconometrics.com. There you will find that one of the formats in which we
provide data is ASCII, or text, files. We use this format because such files contain no formatting
and, once imported, can be used by almost any software.

Select ASCII files and then go to the food data.



Right-click on the file name and select Save Target As. A Save As dialog box pops up. Locate the
folder you want to save your file in by using the arrow-down located at the extreme right of the
Save in window or browsing through the list of folders displayed below it. Finally, select Save.

Once the download of the file is completed, a Download complete window pops up. Choose
Close.

Start Excel. Select the Office Button in the upper left corner of the Excel window, then Open.

Navigate to the location of the data file. Make sure you have selected All Files in the Files of
type window. Select your food.dat file and then select Open.

What begins is a Windows "Wizard" that will take you through 3 steps to import the data into
Excel. Our ASCII data files are neatly lined up in columns, with no commas or anything else
separating the columns. Select Fixed width, and then Next.


In the next step the data are previewed. By clicking and dragging the vertical break lines you could
adjust the column widths, but most of the time there is no need. For neatly arrayed data like ours,
Excel can determine where the columns begin and end. Select Next again.

In the third and final step Excel permits you to format each column, or in fact to skip a column. In
our case you can simply select Finish.

This step concludes the process, and the data are now in a worksheet named food.

     A        B
1    115.22   3.69
2    135.98   4.39
3    119.34   4.75
4    114.96   6.03
5    187.05   12.47
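The short Python sketch below makes the idea behind the Fixed width option concrete. It does not describe how Excel works internally. It infers column breaks in the way the wizard's preview suggests: any character position that is blank on every line is treated as a gap between fields. It then converts each field to a number, much as the General column format does. The five lines are the values shown in the preview, but their exact spacing in food.dat is an assumption, and the function names are hypothetical.

```python
# A sketch of fixed-width parsing: infer column breaks from character
# positions that are blank on every line, then slice and convert to float.
# The spacing of these sample lines is assumed, not copied from food.dat.
SAMPLE = """\
   115.22    3.69
   135.98    4.39
   119.34    4.75
   114.96    6.03
   187.05   12.47
"""


def infer_fields(lines):
    """Return (start, end) character spans of the columns in `lines`.

    A position belongs to a column if at least one line has a non-space
    character there; each run of such positions forms one field.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    occupied = [any(line[i] != " " for line in padded) for i in range(width)]

    spans = []
    start = None
    for i, used in enumerate(occupied):
        if used and start is None:
            start = i                    # a field begins here
        elif not used and start is not None:
            spans.append((start, i))     # the field ended just before i
            start = None
    if start is not None:
        spans.append((start, width))     # the last field runs to line end
    return spans


def parse_fixed_width(lines):
    """Slice every line at the inferred breaks and convert fields to floats."""
    spans = infer_fields(lines)
    width = max(len(line) for line in lines)
    return [
        tuple(float(line.ljust(width)[a:b]) for a, b in spans)
        for line in lines
    ]


lines = SAMPLE.splitlines()
rows = parse_fixed_width(lines)
for food_exp, income in rows:
    print(food_exp, income)
```

Inferring breaks from blank columns works for neatly aligned files like ours. Fields that are not separated by at least one blank position on every line would need explicit break positions, which is what the wizard lets you set by hand.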

Next, you need to save your food data in an Excel File format. To do that, select the Office
Button, Save As, and finally Excel Workbook.


A Save As dialog box pops up. Locate the folder you want to save your file in by using the
arrow-down located at the extreme right of the Save in window or browsing through the list of
folders displayed below it.

Excel has automatically given a File name, food.xlsx, and specified the file format in the Save as
type window, Excel Workbook (*.xlsx). All you need to do is select Save.

From this point you are ready to analyze the data.

This completes our introductory chapter. The rest of this manual is designed to supplement your
readings of Principles of Econometrics, 4e. We will walk you through the analysis of examples
found in the text, using Excel 2007, and we aim to replicate most of the plots of data and tables
of results found in your text.
CHAPTER 2

The Simple Linear Regression Model

CHAPTER OUTLINE
2.1 Plotting the Food Expenditure Data
    2.1.1 Using Chart Tools
    2.1.2 Editing the Graph
        2.1.2a Editing the Vertical Axis
        2.1.2b Axis Titles
        2.1.2c Gridlines and Markers
        2.1.2d Moving the Chart
2.2 Estimating a Simple Regression
    2.2.1 Using Least Squares Estimators' Formulas
    2.2.2 Using Excel Regression Analysis Routine
2.3 Plotting a Simple Regression
    2.3.1 Using Two Points
    2.3.2 Using Excel Built-in Feature
    2.3.3 Using a Regression Option
    2.3.4 Editing the Chart
2.4 Expected Values of b1 and b2
    2.4.1 Model Assumptions
    2.4.2 Random Number Generation
    2.4.3 The LINEST Function
    2.4.4 Repeated Sampling
2.5 Variance and Covariance of b1 and b2
2.6 Nonlinear Relationships
    2.6.1 A Quadratic Model
        2.6.1a Estimating the Model
        2.6.1b Scatter Plot of Data with Fitted Quadratic Relationship
    2.6.2 A Log-Linear Model
        2.6.2a Histograms of PRICE and ln(PRICE)
        2.6.2b Estimating the Model
        2.6.2c Scatter Plot of Data with Fitted Log-Linear Relationship
2.7 Regression with Indicator Variables
    2.7.1 Histograms of House Prices
    2.7.2 Estimating the Model

In this chapter we estimate a simple linear regression model of weekly food expenditure. We also
illustrate the concept of unbiased estimation. In the first section, we start by plotting the food
expenditure data.

2.1 PLOTTING THE FOOD EXPENDITURE DATA

Open the Excel file food. Save it as POE Chapter 2.

Compare the values in your worksheet to the ones in Table 2.1, p. 49 of Principles of
Econometrics, 4e. The second part of Table 2.1 shows summary statistics. If you would like, you
can compute and check those by using the Excel mathematical functions introduced in Chapter 1.

Select the Insert tab, located next to the Home tab. Select A2:B41. In the Charts group of
commands select Scatter, and then Scatter with only Markers.

The result is:

[Figure: scatter chart of Series1, with food expenditure (0 to 700) on the horizontal axis and income (0 to 40) on the vertical axis]

Each point on this Scatter chart represents one household for which we have recorded a pair of
values: weekly food expenditure and weekly income. This is important: we chose a Scatter chart
precisely because we want to keep track of those pairs of values. For example, the point
highlighted below represents the pair of values (187.05, 12.47) found in row 6 of your table.

[Figure: the same scatter chart with one point highlighted; tooltip: Series 1 Point "187.050003" (187.050003, 12.47)]

When we select two columns of values to plot on a Scatter chart, Excel, by default, places the
values from the first column on the horizontal axis and the values from the second column on the
vertical axis. So, in this case, the food expenditure values are on the horizontal axis and the
income values are on the vertical axis. Indeed, you can see that the scale of the horizontal axis
matches the food expenditure values in column A, and the scale of the vertical axis matches the
income values in column B.

We would actually like to show the food expenditure values on the vertical axis and the
income values on the horizontal axis, the opposite of what we have now. By convention, across
disciplines, the variable whose level we monitor (the dependent variable) is shown on the
vertical axis (Y-variable). And by convention, across disciplines, the variable that we think might
explain the level of the dependent variable is shown on the horizontal axis (X-variable).

In our case, we think that the variation of levels of income across households might explain the
variation of levels of food expenditure across those same households. That is why we would like
to illustrate the food expenditure values on the vertical axis and the income values on the
horizontal axis.


2.1.1 Using Chart Tools

If you look up on your screen, to the right end of your tab list, you should notice that Chart Tools
are now displayed, adding the Design, Layout, and Format tabs to the list. The Design tab is
open. (If, at any time, the Chart Tools and its tabs seem to disappear, all you need to do is to put
your cursor anywhere in your Chart area, left-click, and they will be made available again.)


Go to the Data group of commands, to the left, and select the Select Data button.


A Select Data Source dialog box pops up. Select Edit.


In the Edit Series dialog box, highlight the text from the Series X values window. Press the
Delete key on your keyboard. Select B2:B41. Highlight and delete the text from the Series Y
values window. Select A2:A41. Select OK.


The Select Data Source dialog box reappears. Select OK again. You have just told Excel that
income supplies the X values and food expenditure the Y values, not the other way around.

The result is:


[Figure: scatter chart of Series1, with income (0 to 40) on the horizontal axis and food expenditure (0 to 700) on the vertical axis]

2.1.2 Editing the Graph

Now, we would like to do some editing. We do not need a Legend, since we have only one data
series. Our expenditure values do not go over 600, so we can restrict our vertical axis scale to
that. We definitely would like to label our axes. We might want to get rid of our Gridlines, and
change the Format of our data series. Finally, we would like to move our chart to a new
worksheet.

Select the Layout tab. In the Labels group of commands, select Legend and then None to delete the
legend.

2.1.2a Editing the Vertical Axis

Select the Axes button on the Axes group of commands. Go to Primary Vertical Axis, and select
More Primary Vertical Axis Options.

A Format Axis dialog box pops up. Change the Maximum value shown on the axis from
Auto to Fixed, and specify 600.

Next select Alignment, and use the arrow-down in the Text direction window to select Rotate
all text 270°.


Place your cursor on the upper blue border of your Format Axis dialog box.

Left-click, hold, and drag the box over so you can see your chart; then release. Look at the vertical
axis of your chart.

The numbers are now displayed vertically instead of horizontally, but fewer of them are displayed
as well.

We want to change that back.

Select Axis Options again. Change Major unit from Auto to Fixed, and specify 100. Select
Close.


2.1.2b Axis Titles

Back to the Labels group of commands; select Axis Titles, go to Primary Horizontal Axis
Title, and select Title Below Axis.


Select the generic Axis Title at the bottom of your chart and type in x = weekly income in $100.

Go back to Axis Titles, then to Primary Vertical Axis Title this time. Select Rotated Title.


Select the generic Axis Title on the left of your chart and press Delete, or put your cursor on top
of the Axis Title box, left-click, and press the Backspace key to delete the generic Axis Title.
Type in y = weekly food expenditure in $.

2.1.2c Gridlines and Markers

Back to the Axes group of commands now. Select Gridlines, go to Primary Horizontal
Gridlines, and select None.

Change the Current Selection (group of commands to the far left) to Series 1 (use the arrow
down button to the right of the window to make that selection). Select Format Selection.


A Format Data Series dialog box pops up. Select Marker Options. Change the Marker Type
from Automatic to Built-in, and then choose the marker Type and Size.

Next, select Marker Fill. Change it from Automatic to Solid fill. Color options pop up. Change
the Color to black. Select Marker Line Color, and change it from Automatic to No line. Select
Close.

The result is a replica of Figure 2.6, p. 50, in Principles of Econometrics, 4e. (If some of your
dots look like little flowers, first left-click anywhere on your screen.)

[Figure: replica of POE Figure 2.6; horizontal axis x = weekly income in $100 (0 to 40), vertical axis y = weekly food expenditure in $ (0 to 600)]

2.1.2d Moving the Chart

Go back to the Design tab. (Remember if you don't see your Chart Tools tabs, what you need to
do is place your cursor in your chart area and left-click). Select the Move Chart button on the
Location group of commands to the far right of your screen.


A Move Chart dialog box pops up. Select New sheet and give it a name like Figure 2.6. Select
OK.


Rename Sheet1 as Data (if needed, see Section 1.4 of this manual on how to do that).

We have plotted our data and edited our chart. Next, we want to estimate the regression line that
best fits the data, and add this line to the chart.

2.2 ESTIMATING A SIMPLE REGRESSION

In this section, we use two different methods to obtain the least squares estimates of
the intercept and slope parameters β1 and β2. Method 1 consists of plugging values into the
least squares estimator formulas for b1 and b2. Method 2 consists of using Excel's built-in
regression analysis routine.

2.2.1 Using Least Squares Estimators' Formulas

The least squares estimators are:

$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad (2.1)$$

$$b_1 = \bar{y} - b_2 \bar{x} \qquad (2.2)$$

These formulas tell us two things: (1) which values we need, and (2) how we need to
combine them to compute b1 and b2.

(1) Which values do we need?

We need the (xi, yi) pairs of values, which appear explicitly in equation (2.1). We also need x̄
and ȳ, the sample means, or simple arithmetic averages, of the xi values and yi values. Those
averages appear in both equation (2.1) and equation (2.2). Note that the subscript i in xi and yi
keeps count of the x and y values. In other words, i denotes the ith value or ith pair of values.
Also, x̄ and ȳ are read "x-bar" and "y-bar".

(2) How do we combine those values?

The numerator is a sum of products; Σ is the Greek capital letter "sigma", which denotes a sum.
The first term of each product is the deviation of an x value from its mean, (xi − x̄). The second
term of each product is the deviation of the corresponding y value from its mean, (yi − ȳ). The
products are computed for each (xi, yi) pair of values before they are added together.

The denominator is the sum of the squared deviations from the mean, for the x values only. In
other words, each x value's deviation from its mean is first squared, and then all those squared
deviations are summed.

Equation (2.2): b1 = ȳ − b2x̄

This equation tells us to multiply b2 by x̄ and then subtract this product from ȳ. Note that b2
must be computed first, before b1 can be computed.
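You can check the two formulas with a short Python sketch. It applies equation (2.1) and then equation (2.2) to the five observations shown in the worksheet earlier. The numbers it prints are therefore for that small illustrative subset, not the 40-household estimates in Principles of Econometrics, 4e. The function name least_squares is hypothetical.

```python
# Least squares estimates from equations (2.1) and (2.2), using only the five
# observations shown earlier (illustration only; POE uses all 40 households).
food_exp = [115.22, 135.98, 119.34, 114.96, 187.05]  # y values
income = [3.69, 4.39, 4.75, 6.03, 12.47]             # x values


def least_squares(x, y):
    """Return (b1, b2) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator of (2.1): sum of products of x and y deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator of (2.1): sum of squared x deviations.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar  # equation (2.2) needs b2 first
    return b1, b2


b1, b2 = least_squares(income, food_exp)
print(f"b2 = {b2:.4f}, b1 = {b1:.4f}")
```

Note that the code computes b2 before b1, following the order the text requires.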

There is actually no magic to this. We use the food expenditure and income values collected
from our random sample of 40 households, and perform simple arithmetic operations to
compute the estimates of the intercept and slope of our regression line.

As for the computation of b1 and b2 itself, there is only one trick: we need to make sure we
know which values are the x's and which ones are the y's. So, we start by adding
labels to our columns of data.

You should be in your Data worksheet. If not, you can go back to it by selecting its tab on the
bottom of your screen.

Select row 2 and insert a new row (see Section 1.4 of this manual if you need help on that). In the
new cell A2, type y; in the new cell B2, type x. Right-align A1:B2.

     A          B
1    food_exp   income
2           y        x

Next, we need to lay out the frame of the table in which we will store our intermediate and
final computations. Type x_bar= in cell D2, y_bar= in cell D3, b2 = in cell D6, and b1 = in cell
D7. In cells G2:J2, type x_deviation, y_deviation, (x_dev)(y_dev), and (x_deviation)2,
respectively. (Note that you can use the Tab key, instead of moving your cursor or using the
arrow keys, to move to the next cell to your right.)

     D        E   F   G             H             I                J
2    x_bar=           x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)2
3    y_bar=
6    b2 =
7    b1 =

Below x_deviation, we will compute and store the deviations of the x values from their
mean. Below y_deviation, we will compute and store the deviations of the y values from
their mean. Below (x_dev)(y_dev), we will compute and store the product of the x deviation
and the y deviation for each pair of values. Finally, below (x_deviation)2, we will compute and
store the squared x deviations.

To show the 2 of (x_deviation)2 as a square, place your cursor in J2, if it is not already there.
In the Formula bar, select the 2, and then select the arrow at the bottom right corner of the Font
group of commands.

A Format cells dialog box pops up. Select Superscript and then OK.

[Screenshot: Format Cells dialog box, Font tab, with Superscript selected under Effects]

In cells D6 and D7, format the 2 and 1 of b2 and b1 as Subscripts instead. Bold all the labels
you just typed, and Align Right the ones in G2:J2. Finally, resize the width of columns G:J to
accommodate the width of their labels (see Section 1.4 of this manual if you need help on that).

Now, your worksheet should look like this one:

     D        E   F   G             H             I                J
2    x_bar=           x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    y_bar=
6    b₂=
7    b₁=

We have computed averages before. The formula you should have in cell E2 is
=AVERAGE(B3:B42), and the one in cell E3 is =AVERAGE(A3:A42). Compare the averages
you get to the sample means in Table 2.1 of Principles of Econometrics, 4e (p. 49); they should
be the same.
     D        E          G             H             I                J
2    x_bar=   19.60475   x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    y_bar=   283.5735
6    b₂=
7    b₁=

Next, we want to compute the deviations. Think about what you are trying to compute, and then
type the needed formulas in G3:J3.

You should type =B3-E2 in cell G3, =A3-E3 in cell H3, =G3*H3 in cell I3, and =G3^2 in
cell J3. Here are the values you should get:

     D        E          G             H             I                J
2    x_bar=   19.60475   x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    y_bar=   283.5735   -15.9148      -168.3535     2679.3038        253.2793
6    b₂=
7    b₁=
(G3:J3 rounded to four decimal places; Excel displays more digits.)
Now, in cells G3 and H3, we gave the cell references E2 and E3, where the averages are stored.
We will need to use those averages again, taken from exactly the same locations, to compute the
deviations of the next 39 observations.

So, what we actually need to do is to transform these Relative cell references (E2 and E3) into
Absolute cell references ($E$2 and $E$3). This will allow us to copy the formula from G3:H3
down below without losing track of the fact that the values for the averages are stored in cells E2
and E3.

A Relative cell reference is made into an Absolute cell reference by preceding both the column
and the row references with a dollar sign. Place your cursor back in cell G3 (i.e., move your mouse
over it and left-click); in the Formula bar, place your cursor before the E and insert a dollar sign
(press the Shift key and the $ key at the same time); move your cursor before the 2 and insert another
dollar sign; place your cursor at the end of the formula and press Enter.

[Screenshot: Formula bar showing =B3-E2 changing to =B3-$E2 and then to =B3-$E$2]



Go to cell H3, and add the needed dollar signs there too. Now, select G3:J3 and select Copy in
the Clipboard group of commands. Select G4:J42, and select Paste (next to Copy). You have just
copied the formulas to compute the needed deviations for the rest of the $(x_i, y_i)$ pairs.
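To see why the dollar signs matter, here is a small Python sketch that mimics what happens to A1-style references when a formula is copied down n rows: relative row numbers shift, while row numbers preceded by $ stay fixed. The helper `shift_rows` is hypothetical and deliberately simplified (it ignores sheet names, named ranges, and function names that end in digits).

```python
import re

# An A1-style reference: optional "$", column letters, optional "$", row digits.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def shift_rows(formula: str, n: int) -> str:
    """Return the formula as it would read after being copied down n rows."""
    def adjust(match: re.Match) -> str:
        col_abs, col, row_abs, row = match.groups()
        # A "$" before the row number makes it absolute, so it does not move.
        new_row = int(row) if row_abs else int(row) + n
        return f"{col_abs}{col}{row_abs}{new_row}"

    return REF.sub(adjust, formula)


# Copying the formula of G3 one row down, without and with absolute references:
print(shift_rows("=B3-E2", 1))     # relative: the mean reference drifts to E3
print(shift_rows("=B3-$E$2", 1))   # absolute: still points at E2
```

Without the dollar signs, the copy in G4 would subtract the value in E3 (the mean of y) instead of the mean of x, and the rows below it would point at empty cells.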

Your worksheet should look like this:

     D        E          G             H             I                J
2    x_bar=   19.60475   x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    y_bar=   283.5735   -15.9148      -168.3535     2679.3038        253.2793
4                        -15.2148      -147.5935     2245.5983        231.4886
5                        -14.8548      -164.2335     2439.6476        220.6636
6    b₂=                 -13.5748      -168.6135     2288.8861        184.2738
7    b₁=                 -7.1348       -96.5235      688.6710         50.9047
(Only rows 2 to 7 are shown; G:J values rounded to four decimal places.)

We have everything we need to finalize the computation of b1 and b2.

Place your cursor in cell E6, and again think about what you need to compute b2. Recall that the
least squares estimators are:
$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad (2.1)$$

$$b_1 = \bar{y} - b_2\bar{x} \qquad (2.2)$$

If you refer back to equation (2.1), you can see that =SUM(I3:I42)/SUM(J3:J42) is the formula
you need in cell E6. The one you need in cell E7 is =E3-E6*E2, for equation (2.2).
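As a quick arithmetic check of equation (2.2), plug in the displayed values of $\bar{y}$ (cell E3), $\bar{x}$ (cell E2), and b2 (cell E6):

$$b_1 = \bar{y} - b_2\bar{x} = 283.5735 - 10.20964 \times 19.60475 \approx 283.5735 - 200.1574 = 83.4161$$

This agrees with the value in cell E7 (83.41601) up to the rounding of the displayed inputs.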

Your worksheet should look like this:

     A        B       D        E          G             H             I                J
2    y        x       x_bar=   19.60475   x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    115.22   3.69    y_bar=   283.5735   -15.9148      -168.3535     2679.3038        253.2793
4    135.98   4.39                        -15.2148      -147.5935     2245.5983        231.4886
5    119.34   4.75                        -14.8548      -164.2335     2439.6476        220.6636
6    114.96   6.03    b₂=      10.20964   -13.5748      -168.6135     2288.8861        184.2738
7    187.05   12.47   b₁=      83.41601   -7.1348       -96.5235      688.6710         50.9047
(Only rows 2 to 7 are shown; G:J values rounded to four decimal places.)
-
- _ _

In the table above we obtain exactly the same least squares estimates as those reported on p. 53 of
Principles of Econometrics, 4e.

That was Method 1 of obtaining the least squares estimates of the intercept and slope parameters
$\beta_1$ and $\beta_2$. For Method 2, we are going to use the Excel built-in regression analysis routine.

2.2.2 Using Excel Regression Analysis Routine

Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.

If the Data Analysis tool does not appear on the ribbon, you need to load it first.

Select the Office Button in the upper left corner of your screen, Excel Options at the bottom of
the Office Button tasks panel, Add-Ins in the Excel Options dialog box, Excel Add-ins in the
Manage window at the bottom of the Excel Options dialog box, and then Go.

[Screenshot: Excel Options dialog box, Add-Ins page, with Manage: Excel Add-ins and the Go button]

In the Add-Ins dialog box, check the box in front of Analysis ToolPak. Select OK.

[Screenshot: Add-Ins dialog box listing the Analysis ToolPak add-ins]

Now Data Analysis should be available on the Analysis group of commands. Select it.

A Data Analysis dialog box pops up. In it, select Regression (you might need to use the scroll up
and down bar to the right of the Analysis Tools window to find it), then select OK.

[Screenshot: Data Analysis dialog box, Analysis Tools list with Regression selected]

The Regression dialog box that pops up next is very similar to the Edit Series box we
encountered before (see Section 2.1.1). Place your cursor in the Input Y Range window, and
select A3:A42 to specify the y-values you are working with. Similarly, place your cursor in the
Input X Range window, and select B3:B42 to specify the x-values you are working with. Next,
place your cursor in the New Worksheet Ply window and type Regression; this will be the name
of the new worksheet where the Excel regression analysis results are stored.
Select OK.

[Screenshot: Regression dialog box with Input Y Range $A$3:$A$42, Input X Range $B$3:$B$42, and New Worksheet Ply: Regression]

The Summary Output that Excel just generated should be highlighted as shown below:

[Screenshot: SUMMARY OUTPUT in the new Regression worksheet, highlighted, before the columns are resized]

Select the Home tab. In the Cells group of commands, select Format, and then AutoFit Column
Width; this is an alternative way to adjust the width of the selected columns to fit their contents.

[Screenshot: Format menu in the Cells group, with Column Width, AutoFit Column Width, and Default Width]

Your worksheet should now look like this:

     A                   B              C                D              E             F
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R          0.6205
5    R Square            0.3850
6    Adjusted R Square   0.3688
7    Standard Error      89.5170
8    Observations        40

10   ANOVA
11                       df             SS               MS             F             Significance F
12   Regression          1              190626.9788      190626.9788    23.7888       1.94586E-05
13   Residual            38             304505.1742      8013.2941
14   Total               39             495132.153

16                       Coefficients   Standard Error   t Stat         P-value       Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17   Intercept           83.4160        43.4102          1.9216         0.0622        -4.4633     171.2953    -4.4633       171.2953
18   X Variable 1        10.2096        2.0933           4.8774         1.94586E-05   5.9721      14.4472     5.9721        14.4472
(Values other than the sums of squares are rounded to four decimal places.)

The least squares estimates are given in the Coefficients column of the last table of the
Summary Output. The estimate of the Intercept coefficient, b1, is the first one; it is followed by
the estimate of the slope coefficient (X Variable 1 coefficient), b2. The summary output
contains many other items that we will learn about shortly. For now, notice that the number of
observations (pairs of values), 40, is given in cell B8.
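The regression statistics in the Summary Output are tied together by a few standard formulas. The Python sketch below recomputes several of them from the two sums of squares in the ANOVA table, purely as an arithmetic check on the reported numbers; the symbol k (the number of estimated coefficients, here 2) is notation introduced for this illustration.

```python
import math

# Sums of squares copied from the ANOVA table of the Summary Output.
n = 40              # Observations
k = 2               # estimated coefficients: intercept and slope
ssr = 190626.9788   # Regression SS
sse = 304505.1742   # Residual SS

sst = ssr + sse                                        # Total SS
r_square = ssr / sst
multiple_r = math.sqrt(r_square)                       # |correlation of x and y| in simple regression
adj_r_square = 1 - (sse / (n - k)) / (sst / (n - 1))
mse = sse / (n - k)                                    # Residual MS
standard_error = math.sqrt(mse)
f_stat = (ssr / (k - 1)) / mse                         # Regression MS / Residual MS

print(f"Total SS          {sst:.4f}")
print(f"Multiple R        {multiple_r:.4f}")
print(f"R Square          {r_square:.4f}")
print(f"Adjusted R Square {adj_r_square:.4f}")
print(f"Standard Error    {standard_error:.4f}")
print(f"F                 {f_stat:.4f}")
```

The printed values agree with the Regression Statistics and ANOVA entries of the Summary Output, up to rounding.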

A convenient way to report the values for b1 and b2 is to write out the equation of the estimated
regression line:
$$\hat{y}_i = 83.42 + 10.21x_i \qquad (2.3)$$

Now that we have the equation of our straight line, we would like to graph it. We do this in the
next section.

2.3 PLOTTING A SIMPLE REGRESSION

There are different ways to draw a regression line. One way is to plot two points and draw the
line that passes through them; this is the method we are going to use first. Another way is to
plot many points, and then draw the line that passes through all of them; this is the method that
Excel uses in the built-in features we are going to look at next.

2.3.1 Using Two Points

When we draw a line by hand, on a piece of paper, using a pen and a ruler, we can use any two
points. We can extend our line between the points, as well as beyond them, up and down, or
right and left. Excel does not use a ruler. Instead, it uses the coordinates of two points to draw a
line, and it draws the line only between them. So, to have Excel draw a line that spans the
whole range of our data, we need to choose those two points a little more strategically
than usual.

If you look back at your scatter chart (Figure 2.6 worksheet) or back in your table (Data
worksheet), you can see that our x values range from about 0 to 35 (from 3.69 to 33.4 exactly).
So, we choose our first point to have an x value equal to 0, and our second point an x value of
35.

The point with an x value of zero is our y intercept. It is the point where the line crosses the
vertical axis. Its coordinates are x = 0 and y = b1 or (0, 83.42). This is our first point.

For our second point, we let x = 35; plug this x value in equation (2.3), and compute its
corresponding or predicted y value. We obtain:

$$\hat{y} = 83.42 + 10.21(35) = 440.77 \qquad (2.4)$$

This is our second point, with coordinates (35, 440.77).

Go back to your Data worksheet (if you are not already there). In cell L1, type Points to graph
regression line. In columns L and M we are going to record the coordinates of the two points we
are using to draw our regression line. In cell L2, type y; in cell M2, type x. In cell M3, type 0; in
cell M4, type 35. In cell L3, we want to record the value of our y intercept, b1, which
we already have in cell E7. So, we are going to get it from there: in cell L3, type =E7, and press
Enter. In cell L4, we want the predicted y value from (2.4), so we type
=E7+E6*M4, and press Enter. Note that instead of typing all those cell references, you can just
move your cursor to the cells of interest as if you were actually getting the needed values; this is
a very good way to avoid typing errors. So, you would type the equal sign, move your cursor to
E7 and left-click to select it, type the plus sign, move your cursor to cell E6 and left-click to
select it, type the asterisk, move your cursor to cell M4 and left-click to select it, and finally press
Enter. Once you have done all of that, your worksheet should look like this:

     L                                 M
1    Points to graph regression line
2    y                                 x
3    83.41601                          0
4    440.7535                          35

Note that the predicted y value we obtain in the worksheet for x = 35 is slightly different from
the one we just computed in equation (2.4) because of rounding.
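The rounding difference is easy to reproduce. The Python sketch below evaluates the fitted line at x = 35 twice: once with the two-decimal estimates of equation (2.3), and once with more precise estimates read from the Summary Output (b1 = 83.41600997 and b2 = 10.2096425; these are themselves rounded, so treat them as an approximation of what cells E7 and E6 hold). The function name `predict` is just for illustration.

```python
def predict(b1: float, b2: float, x: float) -> float:
    """Point on the estimated regression line: y_hat = b1 + b2 * x."""
    return b1 + b2 * x


two_decimals = predict(83.42, 10.21, 35)             # as in equation (2.4)
more_digits = predict(83.41600997, 10.2096425, 35)   # closer to the worksheet values

print(f"{two_decimals:.2f}")   # 440.77
print(f"{more_digits:.4f}")    # 440.7535, as in cell L4
```

Evaluating `predict` at x = 0 simply returns b1, which is why cell L3 can point directly at E7.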

Now, go back to your Figure 2.6 worksheet. The data we have plotted on the chart represent one
set or series of data. The two new pairs of values we want to add to this chart represent a second
set or series of data.

Select the Design tab, then the Select data button from the Data group of commands.

[Screenshot: Chart Tools with the Design, Layout, and Format tabs]



In the Legend Entries (Series) window of the Select data source dialog box, select the Add
button.
[Screenshot: Select Data Source dialog box, Legend Entries (Series) with the Add button; Series1 listed]

Place your cursor in the Series X values window of the Edit series dialog box, and select
M3:M4 in the Data worksheet. Place your cursor in the Series Y values window (delete
whatever is in there), and select L3:L4 in the Data worksheet. Select OK.

[Screenshot: Edit Series dialog box with Series X values =Data!$M$3:$M$4 (= 0, 35) and Series Y values =Data!$L$3:$L$4]

The Select data source dialog box reappears. A second data series, Series2, was created from the
selection you just specified. Select OK.

[Screenshot: Legend Entries (Series) now listing Series1 and Series2]

The two points from your new series are plotted on your chart (squares below):

[Chart: scatter plot of the food expenditure data, with the two Series2 points plotted as squares; horizontal axis x = weekly income in $100]


Now, we need to draw a line through those two points. Go to the Layout tab. Change the Current
Selection (group of commands to the far left) to Series 2 (use the down-arrow button to the right of
the window to make that selection). Select Format Selection.

[Screenshot: Layout tab, Current Selection group with Series 2 selected and Format Selection]

A Format data series dialog box pops up. Select Line color and change its selection from No
line to Solid line. Select Close.

[Screenshot: Format Data Series dialog box, Line Color changed from No line to Solid line]

The result is:

[Chart: the data with a straight line drawn between the two Series2 points; horizontal axis x = weekly income in $100]

Note that while you need only two points to draw a straight line, you can use more than two.
So we could have computed a predicted level of food expenditure for every level of income in
our original data set, and used the 40 $(x_i, \hat{y}_i)$ pairs of values as our data Series 2. This is
actually what Excel does when it adds a Linear Trendline to a Scatter chart, or a Line Fit Plot
as part of the Regression analysis routine.

We are going to delete the line and two points we just added to our graph and successively look at
these other two ways to plot our regression line.

2.3.2 Using Excel Built-in Feature

In the Design tab, go back to the Data group of commands, and select the Select Data button. In
the Select Data Source dialog box, select Series2 and Remove. Finally select OK.

[Screenshot: Select Data Source dialog box; Chart Tools with the Design, Layout, and Format tabs]

To add a Linear Trend Line, select the Layout tab. Go to the Analysis group of commands,
select Trendline, and then Linear Trendline.

[Screenshot: Trendline menu in the Analysis group of the Layout tab, with Linear Trendline]

Your chart should look like this (see also Figure 2.8, p. 54 in Principles of Econometrics, 4e):

[Chart: scatter plot of the data with a linear trendline; horizontal axis x = weekly income in $100]

2.3.3 Using a Regression Option

You can also have Excel add the line that best fits your data by choosing that option in the
Regression dialog box.

Go back to your Data worksheet (bottom left corner of your screen).



Select the Data tab, located in the middle of your tab list. Select Data Analysis in the Analysis
group of commands at the far right of the ribbon. Select Regression in the Data Analysis dialog
box, and then OK.
[Screenshot: Data Analysis dialog box with Regression selected]

In the Regression dialog box, proceed as you did before, except this time, name your worksheet
Regression and Line, and check the box in front of Line Fit Plots. Select OK.

[Screenshot: Output options in the Regression dialog box, with New Worksheet Ply: Regression and Line]

In addition to the Summary Output, you now have a Residual Output table and a chart in your
new worksheet. The Residual Output table is only partially shown below, after AutoFitting the
column width (see Section 2.2.2 for details).

     A             B             C
22   RESIDUAL OUTPUT

24   Observation   Predicted Y   Residuals
25   1             121.09        -5.87
26   2             128.24        7.74
27   3             131.91        -12.57
28   4             144.98        -30.02
29   5             210.73        -23.68
(Values rounded to two decimal places.)

[Chart: X Variable 1 Line Fit Plot, showing the Y and Predicted Y series against X Variable 1]

The Predicted Y, or $\hat{y}_i$, values have been computed for all the original observed $x_i$ values,
in the same way we computed $\hat{y}$ for x = 35 (see Section 2.3.1).

The least squares Residuals are defined as

$$\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2x_i \qquad (2.5)$$

You can compare the Predicted Y and Residuals values reported in the Excel Residual Output
to the ones reported in Table 2.3 of Principles of Econometrics, 4e (p. 66). They should be the
same.
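A short Python sketch of the same computation for the first five households (x = 3.69, 4.39, 4.75, 6.03, 12.47 and y = 115.22, 135.98, 119.34, 114.96, 187.05), using the estimates rounded to four decimals. It is an illustration only; with the rounded estimates, the predictions and residuals match the first rows of the Residual Output to two decimal places.

```python
# Estimates rounded to four decimals (cells B17 and B18 of the Summary Output).
b1, b2 = 83.4160, 10.2096

food_exp = [115.22, 135.98, 119.34, 114.96, 187.05]   # y: first five households
income = [3.69, 4.39, 4.75, 6.03, 12.47]              # x: weekly income in $100

predicted = [b1 + b2 * x for x in income]                          # Predicted Y
residuals = [y - y_hat for y, y_hat in zip(food_exp, predicted)]   # equation (2.5)

for obs, (y_hat, e_hat) in enumerate(zip(predicted, residuals), start=1):
    print(f"{obs}  {y_hat:8.2f}  {e_hat:7.2f}")
```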

2.3.4 Editing the Chart

Now, the chart needs a little bit of editing. For one, it looks like a Column chart rather than a
Scatter chart. The scales could be changed. Finally, the Chart and Axis titles are not currently
very helpful.

Place your cursor anywhere in the Chart area and left-click, so that Chart Tools are made
available to you again. Select the Design tab. Go to the far left group of commands, Type, and
select Change Chart Type. In the Change Chart Type dialog box, select X Y (Scatter) chart,
and then Scatter with only Markers. Finally, select OK.

[Screenshot: Change Chart Type dialog box with X Y (Scatter) selected]

The result is:

[Chart: X Variable 1 Line Fit Plot as an XY (Scatter) chart, with the Y and Predicted Y series plotted against X Variable 1]

Now that we have the correct chart type, we would like to draw a line through all the Predicted Y
points. Actually, since we are using those points to draw our regression line, what we want to
show is only the line. So, we will use the points to draw the line, and then get rid of those big
square points. This way our chart won't be as busy.

On your chart, select the Predicted Y points with your cursor. Your cursor should turn into a fat
cross as shown below:

[Screenshot: X Variable 1 Line Fit Plot with the Predicted Y series selected]

Right-click and select Format Data Series. A Format Data Series dialog box pops up. Select
Line Color and Solid line. Change the line color to something different from the Y points.
Select Marker Options, and change the Marker Type from Automatic to None. Select Close.

[Screenshot: right-click menu with Format Data Series; Format Data Series dialog box with Line Color: Solid line and Marker Options: Marker Type]
The result is:

[Chart: X Variable 1 Line Fit Plot with the Predicted Y series drawn as a line and the Y series shown as points]

On your chart, select the Legend with your cursor, right-click and select Delete.

[Screenshot: chart legend selected, with Delete in the right-click menu]

Change the Chart and Axis titles as you see fit. Below, we show you how you can change the
Chart title. You can follow a similar process to change the Axis titles.

Place your cursor in the title area and left-click.

[Screenshot: Chart Title area of the X Variable 1 Line Fit Plot selected]

Select the generic title.

[Screenshot: the generic title, X Variable 1 Line Fit Plot, selected for editing]

Type in your new title.

You can select any of the titles and change the Font size by going back to the Home tab. Select
what you need on the Font group of commands.

[Screenshot: Font group on the Home tab]

You can reformat the y-axis (and/or the x-axis) by selecting it with your cursor, right-clicking and
selecting Format Axis.
[Screenshot: chart titled Figure 2.8 The fitted regression, with the axis right-click menu showing Format Axis]

If you proceed as you did before to edit your vertical axis (see Section 2.1.2a), you should obtain
the following:
[Chart: Figure 2.8 The fitted regression, with the reformatted vertical axis]

To resize the whole Chart area, put your cursor over its lower border until it turns into a double
cross arrow as shown below.



Left-click, and it should turn into a skinny cross.

Hold it, and drag it down until you are satisfied with the way your chart looks.

[Chart: Figure 2.8 The fitted regression, resized; horizontal axis x = weekly income in $100]

You can delete the Gridlines by first selecting them, right-clicking and then selecting Delete.

[Screenshot: Figure 2.8 The fitted regression, with the gridlines selected and Delete in the right-click menu]

You can also reformat the Data Series Y by selecting the points, right-clicking, and selecting
Format Data Series. Then proceed as you did before to change your markers' options (see
Section 2.1.2c).

[Screenshot: Figure 2.8 The fitted regression, with the data points selected and Format Data Series in the right-click menu]

Your result might be (see also Figure 2.8, p. 54 in Principles of Econometrics, 4e):

[Chart: Figure 2.8 The fitted regression; horizontal axis x = weekly income in $100]

In this next section we illustrate the concept of unbiased estimators.

2.4 EXPECTED VALUES OF b1 AND b2

To show that, under the assumptions of the simple linear regression model, $E(b_1) = \beta_1$ and
$E(b_2) = \beta_2$, we first put ourselves in a situation where we know our population and regression
parameters (i.e., we know the truth). We then use the least squares regression technique to uncover
the truth (which we already know). This allows us to check the validity of the least squares
regression technique, and specifically the unbiasedness of the least squares
estimators.

2.4.1 Model Assumptions

First, let us restate the assumptions of the simple linear regression model (see p. 45 of Principles
of Econometrics, 4e):

• The mean value of y, for each value of x, is given by the linear regression function:

$$E(y \mid x) = \beta_1 + \beta_2 x \qquad (2.6)$$

• For each value of x, the values of y are distributed about their mean value, following
probability distributions that all have the same variance:

$$\operatorname{var}(y \mid x) = \sigma^2 \qquad (2.7)$$

• The sample values of y are all uncorrelated and have zero covariance, implying that there
is no linear association among them:

$$\operatorname{cov}(y_i, y_j) = 0 \qquad (2.8)$$

• The variable x is not random and must take at least two different values.

• (optional) The values of y are normally distributed about their mean for each value of x:

$$y \sim N\left[(\beta_1 + \beta_2 x), \sigma^2\right] \qquad (2.9)$$

In the specific, simplified case we are considering in this section, half of our hypothetical
population of three-person households has a weekly income of $1000 (x = 10), and half of it has
a weekly income of $2000 (x = 20). Because we are almighty, we know the values of our
population parameters, and consequently the values of our regression parameters. Let $\mu_{y|x=10} =
200$, $\mu_{y|x=20} = 300$, and $\operatorname{var}(y \mid x = 10) = \operatorname{var}(y \mid x = 20) = \sigma^2 = 2500$. This implies
$\beta_1 = 100$ and $\beta_2 = 10$.
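The values of $\beta_1$ and $\beta_2$ follow from evaluating equation (2.6) at the two income levels, which gives two equations in two unknowns:

$$\beta_1 + 10\beta_2 = 200, \qquad \beta_1 + 20\beta_2 = 300$$

$$\beta_2 = \frac{300 - 200}{20 - 10} = 10, \qquad \beta_1 = 200 - 10 \times 10 = 100$$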

The probability distribution functions of weekly food expenditure, y, given an income level
x = 10 and an income level x = 20, are assumed to be Normal. They look like this:

[Figure: probability density functions f(y|x = 10) and f(y|x = 20)]

The linear relationship between weekly food expenditure and weekly income looks like the
following:
[Figure: the line E(y|x) plotted against x, passing through (10, 200) and (20, 300)]

Let us emphasize the difference between this section and Chapter 2 of Principles of
Econometrics, 4e. In this section, we do know the truth. In other words, we have information
on weekly food expenditure and weekly income for all the three-person households that
constitute our population. In Chapter 2 of Principles of Econometrics, 4e, as is the case in
real life, you do not have that population information. You must thus rely solely on your random
sample information to make inferences about your population.

Now, as an exercise, and as a way to illustrate the unbiasedness of the least squares estimators, we
are going to use the least squares regression technique to uncover the truth.

Insert a new worksheet in your workbook by selecting the Insert Worksheet tab at the bottom of
your screen (or press the Shift and F11 keys). Name it Simulation.

[Screenshot: worksheet tab named Simulation]

We are going to draw a random sample of 40 households from our population. Half of the sample
is drawn from the first type of households, with weekly income x = 10; and half of the sample is
drawn from the second type of households, with weekly income x = 20.

Let us keep a record of the level of weekly income for our 40 households in column A of our
Simulation worksheet: in cell A1, type x and Right-Align it; in cells A2:A21, record the value
10; in cells A22:A41, record the value 20.

[Screenshot: column A of the Simulation worksheet, with x in A1, the value 10 in A2:A21, and the value 20 in A22:A41]

2.4.2 Random Number Generation

We use the Random Number Generation analysis tool to draw our random sample of
households. We keep a record of their weekly food expenditure in column B of our Simulation
worksheet: type y in B1, and Right-Align it.

     A   B
1    x   y

Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.


The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.

[Screenshot: Data Analysis dialog box with Random Number Generation selected]

A Random Number Generation dialog box pops up. Since we are drawing one random sample,
we specify 1 in the Number of Variables window. We first draw a random sample of 20 from
households with weekly income of x = 10, so we specify the Number of Random Numbers to
be 20. For simplicity, we assumed that our population of households has weekly food expenditure
that is normally distributed, so this is the distribution we choose. Once you have selected Normal
in the Distribution window, you will be able to specify its Parameters: for x = 10, its Mean is
$\mu_{y|x=10} = 200$ and its Standard deviation is $\sqrt{\operatorname{var}(y \mid x = 10)} = \sigma = 50$. Select the Output
Range in the Output options section, and specify it to be B2:B21 in your Simulation worksheet.
Finally, select OK.

[Screenshot: Random Number Generation dialog box with Number of Variables 1, Number of Random Numbers 20, Distribution Normal, Parameters (Mean, Standard deviation), Random Seed, and Output options]

Repeat the procedure to draw a random sample of 20 from households with weekly income of x = 20.
Change the Mean to $\mu_{y|x=20} = 300$ and the Output Range to B22:B41.
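The two Random Number Generation draws can also be mimicked in code. The Python sketch below is only an illustration under stated assumptions: it uses the standard library's `random.gauss` instead of Excel's tool, with the same means (200 for x = 10, 300 for x = 20) and standard deviation (50). The function name `draw_sample` and the seed are hypothetical choices, so the numbers will match neither our sample nor yours.

```python
import random

def draw_sample(rng, n_per_group=20, sigma=50.0):
    """Mimic the two Random Number Generation draws (hypothetical helper):
    n_per_group households with x = 10 (mean y = 200),
    then n_per_group households with x = 20 (mean y = 300)."""
    x = [10] * n_per_group + [20] * n_per_group
    y = [rng.gauss(200.0, sigma) for _ in range(n_per_group)]   # like B2:B21
    y += [rng.gauss(300.0, sigma) for _ in range(n_per_group)]  # like B22:B41
    return x, y

rng = random.Random(12345)  # fixed seed so the draw can be repeated
x, y = draw_sample(rng)
print(len(x), len(y))       # 40 households in total
```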


Here is the random sample that we obtained. NOTE: you will obtain a different random sample,
due to the nature of random sampling.

[Table: our random sample. Column A holds x (10 in rows 2 to 21, 20 in rows 22 to 41); column B holds the simulated weekly food expenditures y in B2:B41.]

2.4.3 The LINEST Function

Next, we use the LINEST function to obtain the least squares estimates for the intercept and
slope parameters, based on the random sample we just drew. The LINEST function is an
alternative to using the Least Squares Estimators' Formulas (see Section 2.2.1) or the Excel
Regression Analysis Routine (see Section 2.2.2). It allows us to quickly get the least squares
estimates for the intercept and slope parameters. For this purpose, the general syntax of the
LINEST function is as follows:
= LINEST(y's, x's)

The first argument of the LINEST function specifies the y values and the second argument
specifies the x values on which the least squares estimates are based. In our case, we thus need to
specify:
= LINEST(B2:B41,A2:A41)

The LINEST function creates a table where it stores the least squares estimates in Excel memory.
It first reports the slope coefficient estimate, and then the intercept coefficient estimate. So, if we
were to look into Excel memory, the estimates would be reported as shown below:

          column 1   column 2
row 1     b2         b1

We nest the LINEST function in the INDEX function to get the estimated coefficients, one at a
time. The INDEX function returns values from within a table. In the case of a table with only one
row, the general syntax of the INDEX function is as follows:

= INDEX(table of results, column_num)



The first argument of the INDEX function specifies which table to get the results from. In our
case, this is the table of results generated by the LINEST function above. So, we replace "table of
results" by "LINEST(B2:B41,A2:A41)". The second argument indicates from which column of
the table to retrieve the result of interest to us. So, if we want to retrieve the estimate of the
intercept coefficient, b1, from the table above, we would indicate that it can be found in column 2
by replacing "column_num" by "2".
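To make the layout of the LINEST table and the column numbering of INDEX concrete, here is a small Python sketch. It is an illustration only: `linest` and `index` are hypothetical helper names that imitate the two Excel functions for a simple regression, returning the slope in column 1 and the intercept in column 2, and counting columns from 1 as INDEX does.

```python
def linest(ys, xs):
    """Return [b2, b1] in the same order as Excel's LINEST(y's, x's):
    slope in column 1, intercept in column 2 (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b2 = sxy / sxx              # slope
    b1 = y_bar - b2 * x_bar     # intercept
    return [b2, b1]

def index(row, column_num):
    """Mimic INDEX on a one-row table: column numbers start at 1."""
    return row[column_num - 1]

xs = [10, 10, 20, 20]
ys = [190.0, 210.0, 290.0, 310.0]   # made-up values lying around y = 100 + 10x
b1 = index(linest(ys, xs), 2)        # like =INDEX(LINEST(...),2)
b2 = index(linest(ys, xs), 1)        # like =INDEX(LINEST(...),1)
print(b1, b2)
```

Because the table is stored slope first, asking INDEX for column 2 returns the intercept b1 and column 1 returns the slope b2.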

We are going to report our estimated coefficients at the bottom of our table. In cell A43, type b1
=; in cell A44, type b2 =. Bold those labels. In cells B43 and B44, type the following formulas,
respectively:
    A      B
43  b1 =   =INDEX(LINEST(B2:B41,A2:A41),2)
44  b2 =   =INDEX(LINEST(B2:B41,A2:A41),1)

Here are the estimates that we get:


    A      B
43  b1 =   67.64114
44  b2 =   11.47325

The estimates of the intercept and slope coefficients are based on one random sample. Our
random sample is different from yours, and each random sample yields different estimates, which
may or may not be close to the true parameter values. The property of unbiasedness is about the
average values of b1 and b2 if many samples of the same size are drawn from the same
population. In the next section, we therefore repeat our sampling and least squares
estimation exercise.

2.4.4 Repeated Sampling

Note that in Chapter 2 of Principles of Econometrics, 4e, the repeated samples given to you were
randomly collected from a population with unknown parameters. In this section, we draw our
samples from a population with known parameters.

Go back to the Random Number Generation dialog box. We would like to draw 9 additional
random samples, so we specify 9 in the Number of Variables window. Again, we first draw
random samples of 20 from households with weekly income of x = 10, so we specify the
Number of Random Numbers to be 20. We also select Normal in the Distribution window,
and specify its Parameters. For x = 10, its Mean is $\mu_{y|x=10} = 200$ and its Standard Deviation
is $\sqrt{\operatorname{var}(y|x=10)} = \sigma = 50$. Specify the Output Range to be C2:K21. Finally, select OK.


Repeat the procedure to draw random samples of 20 from households with weekly income of x = 20.
Change the Mean to $\mu_{y|x=20} = 300$ and the Output Range to C22:K41.


Next, before we copy the formulas that compute our coefficient estimates, we need to change the
relative cell references A2:A41 in them into absolute cell references $A$2:$A$41, since we will be
using the same x-values for our next 9 rounds of least squares estimates.

B43: =INDEX(LINEST(B2:B41,A2:A41),2)   becomes   =INDEX(LINEST(B2:B41,$A$2:$A$41),2)
B44: =INDEX(LINEST(B2:B41,A2:A41),1)   becomes   =INDEX(LINEST(B2:B41,$A$2:$A$41),1)

Copy the formulas from B43:B44 into C43:K44. In cells L43:L44 compute the AVERAGEs of
your estimates from your 10 samples. In cell L43, you should have =AVERAGE(B43:K43); in
cell L44, you should have =AVERAGE(B44:K44). The estimates and average values that we get
for our 10 samples are:

[Table: the b1 estimates (row 43) and b2 estimates (row 44) for our 10 samples in columns B to K, with their averages in column L: 89.14425 for b1 and 10.48296 for b2.]

If we took the averages of the estimates from many samples, these averages would approach the true
parameter values $\beta_1$ and $\beta_2$. To show you that this is the case, we repeated the exercise.
Here are the average values of b1 and b2 that we obtained as we increased the number of samples
from 10, to 100, and finally to 1000:

Number of samples      10         100        1000       Parameter values
Average value of b1    89.14425   98.44593   99.48067   100
Average value of b2    10.48296   10.08958   10.04135   10
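The repeated sampling experiment can be sketched in Python as well. This is an illustration under stated assumptions: the population is y = 100 + 10x + e with e normally distributed and σ = 50, which gives the means of 200 and 300 used above, and the same 40 x-values are reused in every sample, as in the worksheet. The helper names and the seed are hypothetical; since the random draws differ from ours, the averages will not reproduce the table exactly, but they should settle near 100 and 10 as the number of samples grows.

```python
import random

def ols(xs, ys):
    """Least squares intercept b1 and slope b2 for a simple regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

def average_estimates(n_samples, seed=2011, beta1=100.0, beta2=10.0, sigma=50.0):
    """Draw n_samples samples of 40 households (20 with x = 10, 20 with x = 20)
    from y = beta1 + beta2*x + e, e ~ N(0, sigma^2), and average b1 and b2."""
    rng = random.Random(seed)
    xs = [10] * 20 + [20] * 20          # same x-values in every sample
    sum_b1 = sum_b2 = 0.0
    for _ in range(n_samples):
        ys = [beta1 + beta2 * x + rng.gauss(0.0, sigma) for x in xs]
        b1, b2 = ols(xs, ys)
        sum_b1 += b1
        sum_b2 += b2
    return sum_b1 / n_samples, sum_b2 / n_samples

for n in (10, 100, 1000):
    avg_b1, avg_b2 = average_estimates(n)
    print(n, round(avg_b1, 3), round(avg_b2, 3))
```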

The next section of this chapter is very short. It points out how you can compute an estimate of
the variances and covariance of the least squares estimators b1 and b2 using Excel. It also outlines
other numbers you can recognize in the Excel summary output. Note that for this section we are
getting back to our food expenditure and income data of Sections 2.1-2.3, i.e. data from one
sample of 40 households that was drawn from a population with unknown parameters.

2.5 VARIANCES AND COVARIANCE OF b1 AND b2

You can compute estimates of the variances and covariance of the least squares estimators
b1 and b2 the same way you computed b1 and b2: consider their algebraic expressions (see
below or p. 65 of Principles of Econometrics, 4e), and perform the simple arithmetic operations
needed. You might want to do that as an exercise; you can check your work by
comparing your estimates to those reported on pp. 66-67 of Principles of Econometrics, 4e.

Estimates of the variances and covariance of the least squares estimators b1 and b2 are given by:

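$$\widehat{\operatorname{var}}(b_1) = \hat{\sigma}^2 \left[ \frac{\sum x_i^2}{N \sum (x_i - \bar{x})^2} \right] \qquad (2.10)$$

$$\widehat{\operatorname{var}}(b_2) = \frac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2} \qquad (2.11)$$

$$\widehat{\operatorname{cov}}(b_1, b_2) = \hat{\sigma}^2 \left[ \frac{-\bar{x}}{\sum (x_i - \bar{x})^2} \right] \qquad (2.12)$$

where N is the total number of pairs of values, and

$$\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - K} \qquad (2.13)$$

is an estimate of the error variance, where K is the number of regression parameters, K = 2,
and $\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i$ are the least squares residuals.

The square roots of the estimated variances are the standard errors of b1 and b2. They are denoted
as se(b1) and se(b2):

$$\operatorname{se}(b_1) = \sqrt{\widehat{\operatorname{var}}(b_1)}, \qquad \operatorname{se}(b_2) = \sqrt{\widehat{\operatorname{var}}(b_2)} \qquad (2.14)$$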

The Excel regression routine does not automatically generate estimates of the variances and
covariance of the least squares estimators b1 and b2, but it does compute the standard errors of b1
and b2, as well as other intermediate results.
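Equations (2.10) to (2.14) translate directly into a few lines of code. The Python sketch below is an illustration, not part of the Excel workflow; the function name `ols_var_cov` and the four-point data set are made up so that the results can be checked by hand.

```python
import math

def ols_var_cov(xs, ys, k=2):
    """Estimates from equations (2.10)-(2.14): var(b1), var(b2), cov(b1, b2),
    the error variance estimate sigma2_hat, and the standard errors."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b1 = y_bar - b2 * x_bar
    residuals = [y - b1 - b2 * x for x, y in zip(xs, ys)]     # e_hat_i
    sigma2_hat = sum(e * e for e in residuals) / (n - k)       # (2.13)
    var_b1 = sigma2_hat * sum(x * x for x in xs) / (n * sxx)   # (2.10)
    var_b2 = sigma2_hat / sxx                                  # (2.11)
    cov_b1_b2 = sigma2_hat * (-x_bar) / sxx                    # (2.12)
    return {
        "sigma2_hat": sigma2_hat,
        "var_b1": var_b1,
        "var_b2": var_b2,
        "cov_b1_b2": cov_b1_b2,
        "se_b1": math.sqrt(var_b1),                            # (2.14)
        "se_b2": math.sqrt(var_b2),
    }

# Small made-up data set, easy to check by hand
est = ols_var_cov([1, 2, 3, 4], [2, 3, 5, 6])
print(est)
```

For these four points b1 = 0.5 and b2 = 1.4, the residuals are 0.1, -0.3, 0.3 and -0.1, so the estimated error variance is 0.2/2 = 0.1, the estimated var(b2) is 0.1/5 = 0.02, the estimated var(b1) is 0.1 × 30/(4 × 5) = 0.15, and the estimated cov(b1, b2) is 0.1 × (-2.5)/5 = -0.05.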

Specifically, the following estimates can be found in the Excel Summary Output you generated
earlier:

Sum of Squared Residuals (SS Residual), $\sum \hat{e}_i^2$, in C13

Mean Square Residual (MS Residual), $\hat{\sigma}^2$, in D13

Standard Error of the Regression, $\hat{\sigma}$, in B7

Standard Errors of Intercept and X Variable 1, se(b1) and se(b2), in C17:C18

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.620485472
R Square            0.385002221
Adjusted R Square   0.368818059
Standard Error      89.51700429
Observations        40

ANOVA
              df   SS            MS            F             Significance F
Regression     1   190626.9788   190626.9788   23.78884107   1.94586E-05
Residual      38   304505.1742   8013.294058
Total         39   495132.153

               Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept      83.41600997    43.41016192      1.921577951   0.062182      -4.463267721   171.2952877   -4.463267721   171.2952877
X Variable 1   10.2096425     2.093263461      4.877380554   1.94586E-05   5.972052202    14.4472328    5.972052202    14.4472328

Note that $\sum \hat{e}_i^2$, the Sum of Squared Residuals (SS Residual), is also referred to as the Sum of
Squared Errors, hence the abbreviation SSE used on p. 51 of Principles of Econometrics, 4e.
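The cells listed above are linked by simple arithmetic, which the short Python check below makes explicit. It only re-uses numbers from the food expenditure summary output (it does not re-estimate anything), and relies on the standard ANOVA identities: R Square = SS Regression / SS Total, MS = SS / df, Standard Error = square root of MS Residual, and F = MS Regression / MS Residual.

```python
import math

# Numbers copied from the Excel Summary Output for the food expenditure data
ss_regression = 190626.9788   # SS, Regression row
ss_residual = 304505.1742     # SS Residual (sum of squared residuals), C13
df_regression = 1             # K - 1
df_residual = 38              # N - K = 40 - 2

ss_total = ss_regression + ss_residual            # SS Total
r_square = ss_regression / ss_total                # R Square
ms_regression = ss_regression / df_regression      # MS Regression
ms_residual = ss_residual / df_residual            # MS Residual = sigma-hat squared, D13
standard_error = math.sqrt(ms_residual)            # Standard Error = sigma-hat, B7
f_stat = ms_regression / ms_residual               # F

print(r_square, ms_residual, standard_error, f_stat)
```

Up to rounding, the printed values agree with R Square, MS Residual, Standard Error and F in the output.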

2.6 NONLINEAR RELATIONSHIPS

2.6.1 A Quadratic Model

2.6.1a Estimating the Model

Open the Excel file br. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 2 in one file, create a new worksheet in your POE
Chapter 2 Excel file, name it br data, and in it, copy the data set you just opened.


This data set contains data on 1080 houses sold in Baton Rouge, LA during mid-2005, which we
are using to estimate the following quadratic model for house prices:

$$PRICE = \alpha_1 + \alpha_2 SQFT^2 + e \qquad (2.15)$$

In your br data worksheet, insert a column to the right of the sqft column B (see Section 1.4 for
more details on how to do that). In your new cells C1:C2, enter the following column label and
formula.
    C
1   sqft2
2   =B2^2

Copy the content of cell C2 to cells C3:C1081. Here is how your table should look (only the
first five values are shown below):

    A        B       C
1   price    sqft    sqft2
2   66500    741     549081
3   66000    741     549081
4   68500    790     624100
5   102000   2783    7745089
6   54000    1165    1357225

In the Regression dialog box, the Input Y Range should be A2:A1081, and the Input X Range
should be C2:C1081. Select New Worksheet Ply and name it Quadratic Model. Finally, select
OK.

The result is (matching the one reported on p. 70 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.832075415
R Square            0.692349497
Adjusted R Square   0.692064107
Standard Error      68205.74032
Observations        1080

ANOVA
              df     SS           MS           F             Significance F
Regression       1   1.1286E+13   1.1286E+13   2425.976064   3.3748E-278
Residual      1078   5.0150E+12   4.652E+09
Total         1079   1.6301E+13

               Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept      55776.56564    2890.441213      19.29690357   1.67487E-71   50105.0373    61448.09398   50105.0373    61448.09398
X Variable 1   0.015421301    0.000313095      49.25419844   3.3748E-278   0.014806954   0.016035648   0.014806954   0.016035648

2.6.1b Scatter of Data and Fitted Quadratic Relationship

Go back to your br data worksheet and select A2:B1081. Select the Insert tab located next to the
Home tab. In the Charts group of commands select Scatter, and then Scatter with only
Markers.


The result is:

[Figure: default scatter chart (series "Series1") with house price on the horizontal axis and square footage on the vertical axis]

You can see that our house price values are on the horizontal axis and square footage values are
on the vertical axis; we would like to swap them and edit our chart as we did in Section
2.1 with our plot of food expenditure data. The result is (see also Figure 2.14 on p. 70 in
Principles of Econometrics, 4e):

[Figure: scatter of sale price (vertical axis) against total square feet (horizontal axis)]

Finally, we add the fitted quadratic relationship to our scatter plot. In cells N1:N2 and O1:O3 of
your br data worksheet, enter the following column labels and formula.

    N                                                                    O
1   quadratic price-hat                                                  sqft
2   ='Quadratic Model'!$B$17+'Quadratic Model'!$B$18*'br data'!O2^2      0
3                                                                        400

Select cells O2:O3 and move your cursor to the lower right corner of your selection until it turns into
a skinny cross; left-click, hold it and drag it down to cell O22: Excel recognizes
the series and automatically completes it for you. Next, copy the content of cell N2 to cells
N3:N22. Here is how your table should look (only the first five values are shown below):

    N                     O
1   quadratic price-hat   sqft
2   55777                 0
3   58244                 400
4   65646                 800
5   77983                 1200
6   95255                 1600
(Values rounded to the nearest dollar.)
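The formula in N2 and the series in O2:O22 can be mirrored in a short Python sketch, which is handy for checking a few fitted values by hand. The coefficient values are the ones reported in the Quadratic Model output (cells B17 and B18); the function name is a hypothetical choice.

```python
# Estimates from the Quadratic Model output (Coefficients column)
alpha1 = 55776.56564     # Intercept, cell B17
alpha2 = 0.015421301     # X Variable 1 (the SQFT^2 coefficient), cell B18

def quadratic_price_hat(sqft):
    """Fitted PRICE from the quadratic model: alpha1 + alpha2 * SQFT^2."""
    return alpha1 + alpha2 * sqft ** 2

sqft_grid = list(range(0, 8001, 400))   # the series 0, 400, ..., 8000 in O2:O22
fitted = [quadratic_price_hat(s) for s in sqft_grid]
for s, p in zip(sqft_grid[:5], fitted[:5]):
    print(s, round(p, 2))
```

Squaring the square footage before multiplying by the slope estimate is what makes the fitted curve bend upward; omitting the square would draw a nearly flat line instead.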

Go back to your scatter plot and right-click in the middle of your chart area. Select Select Data.
In the Legend Entries (Series) window of the Select Data Source dialog box, select the Add
button. In the Series name window, type Fitted Quadratic Relationship. Select O2:O22 for the
Series X values and select N2:N22 for the Series Y values. Finally, select OK. The Fitted
Quadratic Relationship series has been added to your graph.


Before you close the Select Data Source dialog box, select Series1 and Edit. Type the name
Actual in the Series name window. Select OK. In the Select Data Source window that
reappears, select OK again.

Make sure your chart is selected so that the Chart Tools are visible. In the Layout tab, go to the
Labels group of commands. Select the Legend button and choose either one of the Overlay
Legend options. Grab your legend with your cursor and move it to the upper left corner of your
chart area.


Finally, we want to reformat our Fitted Quadratic Relationship values series. Select the plotted
series in your chart area, right-click and select Format Data Series. A Format Data Series
dialog box pops up. Select Line Color and Solid line. Change the line color to something
different from the Actual series points. Select Marker Options, and change the Marker Type
from Automatic to None. Select Close.


The result is (see also Figure 2.14 on p. 70 in Principles of Econometrics, 4e):

[Figure: actual sale prices and the fitted quadratic relationship, plotted against total square feet]

2.6.2 A Log-Linear Model

2.6.2a Histograms of PRICE and ln(PRICE)


In your br data worksheet, insert a column to the right of the sqft2 column C (see Section 1.4 for
more details on how to do that). In your new cells D1:D2, enter the following column label and
formula.

D
1 ln(price)
2 =ln(A2)

Copy the content of cell D2 to cells D3:D1081. Here is how your table should look (only the
first five values are shown below):

    A        B       C         D
1   price    sqft    sqft2     ln(price)
2   66500    741     549081    11.10496
3   66000    741     549081    11.09741
4   68500    790     624100    11.13459
5   102000   2783    7745089   11.53273
6   54000    1165    1357225   10.89674

Next, we specify bin values. These values determine the range of PRICE and ln(PRICE)
values covered by each column of the histogram. The bin values have to be given in ascending order.
Starting with the lowest bin value, a PRICE or ln(PRICE) value is counted in the first bin whose
bin value it is equal to or less than; that is, each bin counts the values greater than the previous
bin value and equal to or less than its own bin value.
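The counting rule can be stated compactly in code. The Python sketch below illustrates the rule, not Excel's implementation: it uses `bisect.bisect_left` to find the first bin value greater than or equal to each observation, and keeps one extra count at the end for values above the largest bin value, which Excel's Histogram tool reports in a row labelled More. The helper name `histogram_counts` is hypothetical.

```python
from bisect import bisect_left

def histogram_counts(values, bins):
    """Count values per bin: a value goes to the first bin whose bin value it is
    less than or equal to. The extra last entry counts values above the largest
    bin value (Excel's "More" row)."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1   # index of the first bin value >= v
    return counts

# Bin values used for PRICE (S2:S34) and ln(PRICE) (T2:T29)
price_bins = [50000 * i for i in range(33)]                # 0, 50000, ..., 1600000
lnprice_bins = [round(9 + 0.2 * i, 1) for i in range(28)]  # 9.0, 9.2, ..., 14.4

print(histogram_counts([0, 50000, 50001, 2000000], price_bins)[:3])
```

A price of exactly 50000 is counted in the 50000 bin, while 50001 moves to the 100000 bin; that boundary behaviour is what the "equal to or less than" wording means.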

In cells S1:T3 of your br data worksheet, enter the following column labels and data.

    S           T
1   price bin   lnprice bin
2   0           9
3   50000       9.2

Select cells S2:S3 and move your cursor to the lower right corner of your selection until it turns into
a skinny cross; left-click, hold it and drag it down to cell S34: Excel recognizes
the series and automatically completes it for you. Similarly, select cells T2:T3, move your cursor
to the lower right corner of your selection until it turns into a skinny cross; left-click, hold it and
drag it down to cell T29. Here is how your table should look (only the first five values are shown
below):
    S           T
1   price bin   lnprice bin
2   0           9
3   50000       9.2
4   100000      9.4
5   150000      9.6
6   200000      9.8

Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.


The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.


A Histogram dialog box pops up. For the Input Range, specify A2:A1081; for the Bin Range,
specify S2:S34. The Input Range indicates the data Excel looks at to determine how
many values are counted in each bin of the Bin Range. Check the New Worksheet Ply option
and name it Price Histogram; check the box next to Chart Output. Finally, select OK.


Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Move the Gap Width slider
to the far left, towards No Gap.

Go to the Border Color tab and select Solid line; choose a different Color if you like.
Select Close.


After editing our chart as we did in Section 2.1 with our plot of food expenditure data, the result
is (see also Figure 2.16(a) on p. 72 in Principles of Econometrics, 4e):

[Figure: histogram of sale price, dollars (absolute frequencies)]

Note that the frequencies given in the graph above are absolute ones (counts), while the frequencies given
in Figure 2.16(a) of Principles of Econometrics, 4e are relative ones (counts divided by the number of observations, 1080).

Go back to your br data worksheet. In the Histogram dialog box, specify D2:D1081 for the
Input Range and T2:T29 for the Bin Range. Check the New Worksheet Ply option and name it
lnPrice Histogram; check the box next to Chart Output. Finally, select OK.


The final result is (see also Figure 2.16(b) on p. 72 in Principles of Econometrics, 4e):

[Figure: histogram of lnPrice (absolute frequencies)]

Again, note that the frequencies given in the graph above are absolute ones, while the frequencies
given in Figure 2.16(b) of Principles of Econometrics, 4e are relative ones.

2.6.2b Estimating the Model

We estimate the following log-linear model for house prices:

$$\ln(PRICE) = \gamma_1 + \gamma_2 SQFT + e \qquad (2.16)$$

In the Regression dialog box, the Input Y Range should be D2:D1081, and the Input X Range
should be B2:B1081. Select New Worksheet Ply and name it Log-Linear Model. Finally, select
OK.


The result is (matching the one reported on p. 72 in Principles of Econometrics, 4e):



SUMMARY OUTPUT

Regression Statistics
Multiple R          0.790413619
R Square            0.624753689
Adjusted R Square   0.624405594
Standard Error      0.321465013
Observations        1080

ANOVA
              df     SS            MS            F             Significance F
Regression       1   185.4720974   185.4720974   1794.779738   1.1066E-231
Residual      1078   111.4002553   0.103339754
Total         1079   296.8723527

               Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept      10.83859632    0.024607484      440.4593      0             10.79031232   10.88688031   10.79031232   10.88688031
X Variable 1   0.000411269    9.70779E-06      42.36484082   1.1066E-231   0.000392221   0.000430317   0.000392221   0.000430317

2.6.2c Scatter of Data and Fitted Log-Linear Relationship

In cells Q1:Q2 of your br data worksheet, enter the following column label and formula.

Q
1 log-linear price-hat
2 =EXP('Log-Linear Model'!$B$17+'Log-Linear Model'!$B$18*'br data'!P2)

Next, copy the content of cell Q2 to cells Q3:Q22. Here is how your table should look (only the
first five values are shown below):
    Q
1   log-linear price-hat
2   50950
3   60060
4   70800
5   83460
6   98383
(Values rounded to the nearest dollar.)
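Because the model explains ln(PRICE), the formula in Q2 exponentiates the fitted log value to return to dollars. A Python sketch of the same calculation follows, using the intercept and slope reported in the Log-Linear Model output (cells B17 and B18); the function name is hypothetical.

```python
import math

# Estimates from the Log-Linear Model output (Coefficients column)
gamma1 = 10.83859632    # Intercept, cell B17
gamma2 = 0.000411269    # X Variable 1 (SQFT), cell B18

def loglinear_price_hat(sqft):
    """Fitted PRICE from ln(PRICE) = gamma1 + gamma2*SQFT, i.e. exp(gamma1 + gamma2*SQFT)."""
    return math.exp(gamma1 + gamma2 * sqft)

for sqft in (0, 400, 800, 1200, 1600):
    print(sqft, round(loglinear_price_hat(sqft), 2))
```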

Select your scatter plot of actual data points and fitted quadratic relationship and make a copy of
it. Right-click in the middle of the copy of your chart. Select Select Data. In the Legend Entries
(Series) window of the Select Data Source dialog box, select the Fitted Quadratic
Relationship series, and then the Edit button. In the Series name window, replace the old name
with Fitted Log-Linear Relationship. Select P2:P22 for the Series X values and select Q2:Q22
for the Series Y values. Finally, select OK twice. In the copied chart, the Fitted Log-Linear
Relationship series now replaces the fitted quadratic series.



The result is (see also Figure 2.17 on p. 73 in Principles of Econometrics, 4e):

[Figure: actual sale prices and the fitted log-linear relationship, plotted against total square feet]

2.7 REGRESSION WITH INDICATOR VARIABLES

2.7.1 Histograms of House Prices

Open the Excel file utown. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 2 in one file, create a new worksheet in your POE
Chapter 2 Excel file, name it utown data, and in it, copy the data set you just opened.


This data file contains a sample of 1000 observations on house prices in two neighborhoods. One
neighborhood is near a major university and called University Town. Another similar
neighborhood, called Golden Oaks, is a few miles away from the university.

In cells H1:H3 of your utown data worksheet, enter the following column label and data.

H
1 bin
2 125
3 137.5

Select cells H2:H3 and move your cursor to the lower right corner of your selection until it turns into
a skinny cross; left-click, hold it and drag it down to cell H20. Here is how your
table should look (only the first five values are shown below):

    H
1   bin
2   125
3   137.5
4   150
5   162.5
6   175

In the Histogram dialog box, specify A2:A482 for the Input Range and H2:H20 for the Bin
Range. Check the New Worksheet Ply option and name it Golden Oaks Prices Histogram;
check the box next to Chart Output. Finally, select OK.


The final result is (see also Figure 2.18 on p. 74 in Principles of Econometrics, 4e):

[Figure: histogram of house prices ($1,000) in Golden Oaks (absolute frequencies)]

Note that the frequencies given in the graph above are absolute ones, while the frequencies given
in Figure 2.18 of Principles of Econometrics, 4e are relative ones.

Go back to your utown data worksheet. In the Histogram dialog box, specify A483:A1001 for
the Input Range and H2:H20 for the Bin Range. Check the New Worksheet Ply option and
name it U Town Prices Histogram; check the box next to Chart Output. Finally, select OK.


The final result is (see also Figure 2.18 on p. 74 in Principles of Econometrics, 4e):

[Figure: histogram of house prices ($1,000) in University Town (absolute frequencies)]

2.7.2 Estimating the Model

We estimate the following regression model for house prices:

$$PRICE = \beta_1 + \beta_2 UTOWN + e \qquad (2.17)$$


The indicator variable is

$$UTOWN = \begin{cases} 1 & \text{if house is in University Town} \\ 0 & \text{if house is in Golden Oaks} \end{cases} \qquad (2.18)$$
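With a 0/1 regressor such as UTOWN, the least squares intercept equals the sample mean of PRICE for the group with UTOWN = 0 (Golden Oaks), and the intercept plus the slope equals the mean for UTOWN = 1 (University Town). This is a standard property of least squares with an indicator variable; the Python sketch below checks it on a tiny made-up data set, not on the utown data, and its helper name is hypothetical.

```python
def ols(xs, ys):
    """Least squares intercept b1 and slope b2 for a simple regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b2 * x_bar, b2

utown = [0, 0, 1, 1, 1]              # made-up indicator values
price = [1.0, 3.0, 5.0, 6.0, 7.0]    # made-up prices
b1, b2 = ols(utown, price)
mean_golden_oaks = sum(p for d, p in zip(utown, price) if d == 0) / utown.count(0)
mean_utown = sum(p for d, p in zip(utown, price) if d == 1) / utown.count(1)
print(b1, mean_golden_oaks)   # intercept vs. mean of the UTOWN = 0 group
print(b1 + b2, mean_utown)    # intercept + slope vs. mean of the UTOWN = 1 group
```

So in the regression output, the Intercept estimates the average Golden Oaks price and the X Variable 1 coefficient estimates the difference between the two neighborhood averages.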

Go back to your utown data worksheet.

In the Regression dialog box, the Input Y Range should be A2:A1001, and the Input X Range
should be D2:D1001. Select New Worksheet Ply and name it Indicator Variable Model.
Finally select OK.

The result is (matching the one reported on p. 75 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.728744479
R Square            0.531068518
Adjusted R Square   0.530598645
Standard Error      28.90745008
Observations        1000

ANOVA
              df    SS            MS            F             Significance F
Regression      1   944476.7536   944476.7536   1130.242684   2.6479E-166
Residual      998   833969.3888   835.6406701
Total         999   1778446.14

               Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept      215.7324947    1.318066259      163.6734812   0             213.1459956   218.3189939   213.1459956   218.3189939
X Variable 1   61.5091066     1.829589113      33.61908214   2.6479E-166   57.9188238    65.09938951   57.9188238    65.09938951

This ends Chapter 2 of this manual. You might want to save your work before you close shop.
CHAPTER 3

Interval Estimation and Hypothesis Testing

CHAPTER OUTLINE
3.1 Interval Estimation
    3.1.1 The t-Distribution
        3.1.1a The t-Distribution versus Normal Distribution
        3.1.1b t-Critical Values and Interval Estimates
        3.1.1c Percentile Values
        3.1.1d TINV Function
        3.1.1e Appendix E: Table 2 in POE
    3.1.2 Obtaining Interval Estimates
    3.1.3 An Illustration
        3.1.3a Using the Interval Estimator Formula
        3.1.3b Excel Regression Default Output
        3.1.3c Excel Regression Confidence Level Option
    3.1.4 The Repeated Sampling Context (Advanced Material)
        3.1.4a Model Assumptions
        3.1.4b Repeated Random Sampling
        3.1.4c The LINEST Function Revisited
        3.1.4d The Simulation Template
        3.1.4e The IF Function
        3.1.4f The OR Function
        3.1.4g The COUNTIF Function
3.2 Hypothesis Tests
    3.2.1 One-Tail Tests with Alternative "Greater Than" (>)
    3.2.2 One-Tail Tests with Alternative "Less Than" (<)
    3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠)
3.3 Examples of Hypothesis Tests
    3.3.1 Right-Tail Tests
        3.3.1a One-Tail Test of Significance
        3.3.1b One-Tail Test of an Economic Hypothesis
    3.3.2 Left-Tail Tests
    3.3.3 Two-Tail Tests
        3.3.3a Two-Tail Test of an Economic Hypothesis
        3.3.3b Two-Tail Test of Significance
3.4 The p-Value
    3.4.1 The p-Value Rule
        3.4.1a Definition of p-Value
        3.4.1b Justification for the p-Value Rule
    3.4.2 The TDIST Function
    3.4.3 Examples of Hypothesis Tests Revisited
        3.4.3a Right-Tail Test from Section 3.3.1b
        3.4.3b Left-Tail Test from Section 3.3.2
        3.4.3c Two-Tail Test from Section 3.3.3a
        3.4.3d Two-Tail Test from Section 3.3.3b


In this chapter we will use the t-distribution to construct interval estimates and perform
hypothesis tests. We continue to work with the simple linear regression model of weekly food
expenditure.

3.1 INTERVAL ESTIMATION

Open the Excel file food. Save it as POE Chapter 3.

Rename Sheet 1 as Data. Quickly re-estimate the regression parameters using the Excel regression
analysis routine as in Section 2.2.2. In the Regression dialog box, the Input Y Range should be
A2:A41, and the Input X Range should be B2:B41. Select New Worksheet Ply and name it
Regression; you do not need to check the box next to Line Fit Plots.

3.1.1 The t-Distribution

3.1.1a The t-Distribution versus Normal Distribution

The t-distribution is a bell-shaped curve, centered at and symmetric around zero. It looks like
the standard normal distribution, except that it is more spread out, with a larger variance and
thicker tails. The exact shape of the t-distribution is controlled by a single parameter called
the degrees of freedom, often abbreviated as df. The notation t(m) is used to specify a
t-distribution with m degrees of freedom.
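To make the comparison concrete, the Python sketch below evaluates both densities at a few points. It is an illustration only, not part of the Excel workflow: `t_pdf` and `normal_pdf` are names chosen for this example, and the t density is the textbook formula written with `math.lgamma`. Running it should show the t(3) density below the normal density at zero and above it in the tails.

```python
import math


def t_pdf(x: float, m: int) -> float:
    """Density of the t-distribution with m degrees of freedom."""
    log_const = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
                 - 0.5 * math.log(m * math.pi))
    return math.exp(log_const) * (1 + x * x / m) ** (-(m + 1) / 2)


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    # Compare the two densities point by point: lower peak, thicker tails.
    for x in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(f"x = {x:.1f}   t(3): {t_pdf(x, 3):.5f}   N(0,1): {normal_pdf(x):.5f}")
```

As m grows, `t_pdf(x, m)` approaches `normal_pdf(x)`, which is why the two curves become hard to tell apart for large degrees of freedom.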

Below is a graph of the t-distribution with m = 3 degrees of freedom and the standard normal
distribution.

[Figure: probability density functions of the t(3) distribution and the standard normal distribution N(0,1), plotted from -6 to 6.]
Interval Estimation and Hypothesis Testing 69

3.1.1b t-Critical Values and Interval Estimates

In order to construct interval estimates, we will need critical values of t-distributions with
various degrees of freedom. The abbreviation used for a critical value is tc. The values -tc and tc
are the endpoints of a closed interval around zero such that the probability of drawing a t-value
in this interval is (1 - α), and the probability is α that a value is either less than -tc or
greater than tc. Since the distribution is symmetric, the probability that a t-value is less than
-tc is α/2, and the probability that a t-value is greater than tc is α/2.

We are usually interested in the critical value tc such that the probability that a randomly drawn
t-value is within the closed interval [-tc, tc] is 0.95 or 0.99, which means that the probability of
a value outside the interval, in the tails of the distribution, is only 0.05 or 0.01.

Let α = 0.05. This leads to a closed interval [-tc, tc] such that the probability of randomly
drawing a t-value in this interval is (1 - α) = (1 - 0.05) = 0.95.

3.1.1c Percentile Values

Since the probability is α/2 that a t-value is greater than tc, the probability of drawing a
t-value less than or equal to tc is (1 - α/2). The critical value tc is therefore the
100(1 - α/2)th percentile of the t-distribution, denoted t(1-α/2, m).

3.1.1d TINV Function

We will use the TINV function to compute t-critical values. First, we create a new worksheet and
a table in which to store our computations.

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the Data tab. Name it t-critical value.


Select cell A1. Select the Insert tab, located next to the Home tab. In the Text group of commands,
select Symbol. In the Symbol dialog box, the Symbols tab should be open. Select α (you might
need to use the scroll bar to move up and down the window to find this symbol). Finally, select
Insert.
[Screenshot: the Symbol command in the Text group of the Insert tab, and the Symbol dialog box with the Symbols tab open and Font set to (normal text).]

Fill in the rest as shown below:


     A      B
1    α =    0.05
2    m =    38
3    tc =

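t-critical values are obtained in Excel by using the TINV function. Its first argument is the
combined probability in the two tails, and its second argument is the degrees of freedom. The
syntax of the TINV function is as follows:
=TINV(α, m)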

To find the t-critical value for α = 0.05 (the combined probability in the two tails) and m = 38,
given the way we organized our table above, we need to write the following formula in B3:

B3: =TINV(B1,B2)


Here is the t-critical value that you should get:

     A      B
3    tc =   2.024394

Although we could have entered the α and m values directly into the TINV function, =TINV(0.05,
38), we chose instead to refer to the cells where we have stored and displayed those values.
Displaying the values of the function's arguments makes our worksheet much easier to read and
understand. In addition, we can compute a new t-critical value simply by changing one or both
argument values.

In cell B1, change α from 0.05 to 0.01. Here is how your table should look:

     A      B
1    α =    0.01
2    m =    38
3    tc =   2.711558

For α = 0.01, holding m constant, the t-critical value is 2.711558.
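Outside Excel, the same two-tailed critical value can be reproduced numerically. The Python sketch below is a minimal stand-in for TINV that assumes only the standard library: it integrates the t density with Simpson's rule to get the cumulative probability and then bisects for the value tc with P(|t(m)| ≥ tc) = α. The function names `t_pdf`, `t_cdf` and `tinv` are hypothetical, chosen for this illustration; in practice a statistics library would normally be used instead.

```python
import math


def t_pdf(x, m):
    """Density of the t-distribution with m degrees of freedom."""
    log_const = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
                 - 0.5 * math.log(m * math.pi))
    return math.exp(log_const) * (1 + x * x / m) ** (-(m + 1) / 2)


def t_cdf(x, m, n=2000):
    """P(t(m) <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    if x == 0:
        return 0.5
    b = abs(x)
    h = b / n  # n is even, as Simpson's rule requires
    total = t_pdf(0.0, m) + t_pdf(b, m)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    area = total * h / 3  # probability between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area


def tinv(alpha, m):
    """Two-tailed critical value tc with P(|t(m)| >= tc) = alpha."""
    target = 1 - alpha / 2  # tc is the 100(1 - alpha/2)th percentile
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection: halve the bracket at each step
        mid = (lo + hi) / 2
        if t_cdf(mid, m) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(f"alpha = {alpha}: tc = {tinv(alpha, 38):.6f}")
```

With m = 38, the two printed values should agree closely with 2.024394 and 2.711558.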



3.1.1e Appendix E: Table 2 in POE

Alternatively, we could have obtained those t-critical values from Table 2 at the end of Principles
of Econometrics, 4e. Recall that the critical value tc is also the 100(1 - α/2)th percentile of the
t-distribution, denoted t(1-α/2, m). For α = 0.05 and m = 38, the critical value tc is the
100(1 - α/2) = 100(1 - 0.05/2) = 100(1 - 0.025) = 97.5, or 97.5th, percentile of the t-distribution,
denoted t(.975,38). At the intersection of the column labeled "t(.975,df)" and the row for 38 degrees
of freedom (df), tc = 2.024.

For α = 0.01, holding m constant, the critical value tc is the 100(1 - α/2) = 100(1 -
0.01/2) = 100(1 - 0.005) = 99.5, or 99.5th, percentile of the t-distribution, t(.995,38). Its value
is found at the intersection of the column labeled "t(.995,df)" and the row for 38 degrees of
freedom (df): tc = 2.712. These t-critical values differ slightly from the ones we obtained
in Excel because Table 2 rounds them to three decimal places.

3.1.2 Obtaining Interval Estimates

The interval estimator of βk is defined as:

$$ b_k \pm t_c\,\mathrm{se}(b_k) \tag{3.1} $$

The interval bk ± tc se(bk) has probability (1 - α) of containing the true but unknown parameter
βk. When the estimator is applied to a sample of data, we say that we have a 100(1 - α)% interval
estimate, or 100(1 - α)% confidence interval.

We are usually interested in constructing either a 95% or a 99% confidence interval, so the
corresponding α values used to obtain our t-critical values are α = 0.05 and α = 0.01.

To obtain the interval estimates, we use equation (3.1), replacing the least squares estimator bk
and its standard error se(bk) by the values estimated from our sample, and using the critical value
tc that corresponds to the chosen α. The lower limit (LL) and the upper limit (UL) of the interval
will be:

$$ LL = b_k - t_c\,\mathrm{se}(b_k) \tag{3.2} $$

$$ UL = b_k + t_c\,\mathrm{se}(b_k) \tag{3.3} $$

3.1.3 An Illustration

In this section, we first illustrate how to obtain an interval estimate by plugging values into
the interval estimator's formula. Next, we go back to the Excel regression analysis tool and
look at the output we have already generated, as well as at the built-in option for generating
additional interval estimates.

3.1.3a Using the Interval Estimator Formula

We create a template to compute the interval estimates for the least squares regression parameters
of the food expenditure model.

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the t-critical value tab. Name it Interval Estimate.


Create the following template to construct interval estimates:

     A                   B                        C
1    Data Input          Sample Size =            =Regression!B8
2                        Confidence Level =
3                        Estimated bk =           =Regression!B18
4                        Standard Error of bk =   =Regression!C18
5
6    Computed Values     α =                      =1-C2
7                        df or m =                =C1-2
8                        tc =                     =TINV(C6,C7)
9
10   Interval Estimate   Lower Limit =            =C3-C8*C4
11                       Upper Limit =            =C3+C8*C4

Note that we get the sample size, estimated coefficient and standard error from our Regression
worksheet. All you have to do in cells C1 and C3:C4 is, first, type the equal sign, and then
select the needed value in the Regression worksheet with your cursor. Finally, press Enter. We
are computing the interval estimate for β2, the slope parameter. Cell C2 is left blank for now.
Later, you will enter either 95 or 99 depending on whether you are constructing a 95% or a 99%
confidence interval, but you could also enter any other confidence level. In cell C6, the α level
is computed from the level of confidence entered in C2. In cell C7, the degrees of
freedom are set equal to N - 2, where N is the sample size, which we record in cell C1. Cell C8
is where the critical t-value is computed, as shown in Section 3.1.1d. Cells C10:C11 are where
the limits of the interval estimate are computed, using equations (3.2) and (3.3).
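The template's logic can be mirrored in a few lines of Python, as a sketch only. Two assumptions are built in: the critical values are hard-coded from Section 3.1.1d (TINV(0.05, 38) = 2.024394 and TINV(0.01, 38) = 2.711558) rather than computed, and the slope estimate and standard error are the rounded values 10.20964 and 2.09326 from the food expenditure regression. The helper name `interval_estimate` is hypothetical.

```python
def interval_estimate(b_k, se_bk, t_c):
    """Return (LL, UL) = (b_k - t_c*se(b_k), b_k + t_c*se(b_k)), as in (3.2)-(3.3)."""
    half_width = t_c * se_bk
    return b_k - half_width, b_k + half_width


# Rounded slope estimate and standard error of the food expenditure model.
B2, SE_B2 = 10.20964, 2.09326
# Critical values TINV(alpha, 38) quoted in Section 3.1.1d, keyed by confidence level.
T_CRIT = {95: 2.024394, 99: 2.711558}

if __name__ == "__main__":
    for level, t_c in T_CRIT.items():
        alpha = 1 - level / 100  # what cell C6 computes from the confidence level
        lower, upper = interval_estimate(B2, SE_B2, t_c)
        print(f"{level}% (alpha = {alpha:.2f}): [{lower:.4f}, {upper:.4f}]")
```

Because the inputs are rounded, the limits agree with the worksheet only to about four decimal places.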

Before we specify our level of confidence, we reformat C2 so that the level of confidence is
displayed as a percentage. Right-click cell C2, and select Format Cells on the shortcut menu that
opens. In the Format Cells dialog box, select Percentage in the Category window and set Decimal
places to 0 (use the up and down arrows to the right of the Decimal places box). Finally, select OK.

Reformat cell C6 the same way.



[Screenshot: the right-click shortcut menu with Format Cells... highlighted, and the Format Cells dialog box, Number tab, with Category set to Percentage and Decimal places set to 0. The dialog notes that percentage formats multiply the cell value by 100 and display the result with a percent symbol.]

Here are the results you should get for a 95% confidence interval estimate for β2 (make sure you
type 95, and not 0.95, in C2):

     A                   B                        C
1    Data Input          Sample Size =            40
2                        Confidence Level =       95%
3                        Estimated bk =           10.20964
4                        Standard Error of bk =   2.09326
5
6    Computed Values     α =                      5%
7                        df or m =                38
8                        tc =                     2.024394
9
10   Interval Estimate   Lower Limit =            5.972052
11                       Upper Limit =            14.44723

The lower limit and upper limit of the interval estimate above should be the same as those
reported on p. 98 of Principles of Econometrics, 4e.

We obtained interval estimates by building a template that plugs values into equation (3.1). Next,
we go to our Regression worksheet and look at the interval estimates Excel has already
generated in the regression summary output.

3.1.3b Excel Regression Default Output

Go to your Regression worksheet, and look at the last table of the summary output. Columns F
and G of that table present the lower and upper limits of the interval estimates for the
intercept and slope parameters, β1 and β2 (cells F17:G18). Excel's regression analysis tool
automatically generates the 95% confidence interval estimates.

In cell F18, you can find the lower limit of the interval estimate for β2. In cell G18, you can find
the upper limit of the interval estimate for β2. Those values are identical to the ones you
computed in your Interval Estimate worksheet.

     A                  B              C                D              E              F              G              H              I
1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R         0.620485472
5    R Square           0.385002221
6    Adjusted R Square  0.368818069
7    Standard Error     89.51700429
8    Observations       40
9
10   ANOVA
11                      df             SS               MS             F              Significance F
12   Regression         1              190626.9788      190626.9788    23.78884107    1.94586E-05
13   Residual           38             304505.1742      8013.294058
14   Total              39             495132.153
15
16                      Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
17   Intercept          83.41600997    43.41016192      1.921577951    0.062182379    -4.463267721   171.2952877    -4.463267721   171.2952877
18   X Variable 1       10.2096425     2.093263461      4.877380554    1.94586E-05    5.972052202    14.4472328     5.972052202    14.4472328

Excel actually reported the interval estimate for β2 twice: in cells F18:G18, and again in cells
H18:I18. The table is laid out this way so that, if you request it, Excel can report a confidence
interval other than the 95% one in columns H and I.

3.1.3c Excel Regression Confidence Level Option

Go back to your Data worksheet. From there, select the Data tab, the Data Analysis button in
the Analysis group of commands, and Regression in the Analysis Tools window. In the
Regression dialog box, check the box next to Confidence Level and type in 99. Select New
Worksheet Ply and name it Regression and 99% CI (for Confidence Interval). Select OK.

[Screenshot: the Regression dialog box with Confidence Level checked and set to 99%, and New Worksheet Ply selected under Output options.]

Alongside the 95% interval estimates, Excel has now also generated 99% interval estimates for
β1 and β2 (cells H16:I18):

     A                  B              C                D              E              F              G              H              I
1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R         0.620485472
5    R Square           0.385002221
6    Adjusted R Square  0.368818069
7    Standard Error     89.51700429
8    Observations       40
9
10   ANOVA
11                      df             SS               MS             F              Significance F
12   Regression         1              190626.9788      190626.9788    23.78884107    1.94586E-05
13   Residual           38             304505.1742      8013.294058
14   Total              39             495132.153
15
16                      Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 99.0%    Upper 99.0%
17   Intercept          83.41600997    43.41016192      1.921577951    0.062182379    -4.463267721   171.2952877    -34.29314438   201.1251643
18   X Variable 1       10.2096425     2.093263461      4.877380554    1.94586E-05    5.972052202    14.4472328     4.533638051    15.88564695

The interpretation of confidence intervals requires a great deal of care. Being 95% or 99%
confident about our interval estimates means the following: if we were to repeat many times the
exercise of drawing a sample of size N = 40, estimating the least squares regression parameters,
and constructing interval estimates for those parameters, then 95% or 99% of all the interval
estimates constructed this way would contain the true parameter values. To illustrate this
concept, we return to our simulation exercise of Section 2.4.4.

3.1.4 The Repeated Sampling Context (Advanced Material)

In Section 2.4.4 we drew many random samples of size N = 40, and, based on each, estimated
the corresponding least squares regression parameters. We can repeat this exercise and extend it
to compute, for each sample, not only least squares estimates, but interval estimates as well.

Note that in Section 3.1.4 of Principles of Econometrics, 4e, 10 samples were randomly drawn
from a population with unknown parameters, while in this section we will draw 100 samples from
a population with known parameters.

3.1.4a Model Assumptions

In the simulation exercise we consider in this section, half of our hypothetical population
of three-person households has a weekly income of $1000 (x = 10), and the other half has a weekly
income of $2000 (x = 20). Because we know the data generation process, we know the values of
the population parameters of the normal distributions, and consequently the values of our regression
parameters. Let μy|x=10 = 200, μy|x=20 = 300, and var(y|x = 10) = var(y|x = 20) = σ² =
2500. This implies β1 = 100 and β2 = 10.
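The step from the two conditional means to the regression parameters can be written out as a short worked equation, assuming the linear mean function E(y|x) = β1 + β2x of the simple regression model, evaluated at x = 10 and x = 20:

$$
\beta_2 = \frac{\mu_{y|x=20} - \mu_{y|x=10}}{20 - 10} = \frac{300 - 200}{10} = 10,
\qquad
\beta_1 = \mu_{y|x=10} - 10\,\beta_2 = 200 - 100 = 100
$$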

3.1.4b Repeated Random Sampling

We will draw random samples of 40 households from our population. Half of each sample will be
drawn from the first type of households, with weekly income x = 10; and half of each sample
will be drawn from the second type of households, with weekly income x = 20.

First, insert a new worksheet in your workbook by selecting the Insert Worksheet tab at the
bottom of your screen, next to the Interval Estimate tab. Name it Simulation.

Let us record the level of weekly income for our 40 households in column A of our
Simulation worksheet: in cell A1, type x and right-align it; in cells A2:A21, enter the value
10; in cells A22:A41, enter the value 20.

[Screenshot: the x values in column A of the Simulation worksheet, with the value 20 in cells A22:A41.]

Next, use the Random Number Generation analysis tool to draw 100 random samples of
households.

Select the Data tab, in the middle of your tab list. In the Analysis group of commands, at the far
right of the ribbon, select Data Analysis.


The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll bar to the right of the Analysis Tools window to find it), then
select OK.

[Screenshot: the Data Analysis dialog box, with Random Number Generation highlighted in the Analysis Tools list.]

A Random Number Generation dialog box pops up. Since we are drawing 100 random
samples, we specify 100 in the Number of Variables window. We first draw random samples of
20 from households with weekly income of x = 10, so we specify the Number of Random
Numbers to be 20. For simplicity, we assume that food expenditure within each type of household
is normally distributed, so this is the distribution we choose. Once you have selected Normal in the
Distribution window, you will be able to specify its Parameters: for x = 10, its Mean is
μy|x=10 = 200 and its Standard Deviation is √var(y|x = 10) = σ = 50. Select the Output
Range in the Output options section, and specify it to be B2:CW21. Finally, select OK.

[Screenshot: the Random Number Generation dialog box with Number of Variables 100, Number of Random Numbers 20, Distribution Normal, Mean 200, Standard deviation 50, and Output Range selected under Output options.]

Repeat to draw a random sample of 20 from households with weekly income of x = 20. Change
the Mean to μy|x=20 = 300 and the Output Range to B22:CW41.
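For readers who want to reproduce the sampling design outside Excel, here is a minimal Python sketch of the same draws using the standard library's `random.gauss`. It is an analogue under stated assumptions, not a replica: Excel's generator produces different numbers, and `draw_samples` and its seed value are hypothetical choices for this illustration. Each inner list plays the role of one worksheet column, that is, one sample of 40 households.

```python
import random


def draw_samples(n_samples=100, n_per_group=20, sigma=50.0, seed=12345):
    """Return x and a list of samples; each sample holds n_per_group draws
    from N(200, sigma^2) (x = 10) followed by n_per_group draws from
    N(300, sigma^2) (x = 20), like one column of B2:CW41."""
    rng = random.Random(seed)  # fixed seed so the draws can be reproduced
    x = [10] * n_per_group + [20] * n_per_group
    samples = []
    for _ in range(n_samples):
        low_income = [rng.gauss(200.0, sigma) for _ in range(n_per_group)]
        high_income = [rng.gauss(300.0, sigma) for _ in range(n_per_group)]
        samples.append(low_income + high_income)
    return x, samples


if __name__ == "__main__":
    x, samples = draw_samples()
    first = samples[0]
    print("sample 1 mean at x = 10:", sum(first[:20]) / 20)
    print("sample 1 mean at x = 20:", sum(first[20:]) / 20)
```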

[Screenshot: the Parameters and Output options sections of the Random Number Generation dialog box for the x = 20 draws.]

3.1.4c The LINEST Function Revisited

This time we use the LINEST function to obtain both the least squares estimates and their standard
errors. LINEST computes the standard errors if you ask it to return additional regression
statistics. For this purpose, the general syntax of the LINEST function is as follows:

= LINEST(y's, x's, , TRUE)

The first argument of the LINEST function specifies the y values; the second argument specifies the
x values; we omit the third argument by leaving nothing between the second and third commas
(so the intercept is estimated as usual); and the fourth argument, TRUE, indicates that we would
like LINEST to return additional regression statistics.

LINEST returns its results as a table (an array). The least squares estimates and their standard
errors appear in the following positions:

         column 1   column 2
row 1    b2         b1
row 2    se(b2)     se(b1)

We nest the LINEST function in the INDEX function to extract the estimated coefficients and their
standard errors, one at a time. The INDEX function returns a value from within a table. Its general
syntax is as follows:
= INDEX(table of results, row_num, column_num)

The first argument of the INDEX function specifies which table to get the results from. The
second argument and third argument indicate the intersection of a row and a column at which the
result of interest can be found.

The nested commands will thus be as follows:

b1: =INDEX(LINEST(y-values,x-values,,TRUE),1,2)
se (b1): =INDEX(LINEST(y-values,x-values,,TRUE),2,2)
b2: =INDEX(LINEST(y-values,x-values,,TRUE),1,1)
se (b2): =INDEX(LINEST(y-values,x-values,,TRUE),2,1)
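The two LINEST/INDEX results used here can be sketched in Python with the textbook least squares formulas. This is an illustration under stated assumptions: `linest_simple` and `index` are hypothetical helper names, only the first two rows of LINEST's statistics table are reproduced, and the four-point data set in the usage lines is made up for the example.

```python
import math


def linest_simple(y, x):
    """Least squares fit of y = b1 + b2*x + e, returned as
    [[b2, b1], [se(b2), se(b1)]], matching the layout shown for LINEST."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)  # error variance estimate with N - 2 df
    se_b2 = math.sqrt(sigma2_hat / sxx)
    se_b1 = math.sqrt(sigma2_hat * sum(xi * xi for xi in x) / (n * sxx))
    return [[b2, b1], [se_b2, se_b1]]


def index(table, row_num, column_num):
    """1-based lookup, like INDEX(table, row_num, column_num)."""
    return table[row_num - 1][column_num - 1]


if __name__ == "__main__":
    # Made-up toy data, used only to show the call pattern.
    results = linest_simple(y=[2.0, 4.0, 5.0, 4.0], x=[1.0, 2.0, 3.0, 4.0])
    print("b1 =", index(results, 1, 2), " se(b1) =", index(results, 2, 2))
    print("b2 =", index(results, 1, 1), " se(b2) =", index(results, 2, 1))
```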

3.1.4d The Simulation Template

We will report our estimated coefficients and standard errors at the bottom of our table of random
samples. We will also compute our t-critical value and the limits of our interval estimates (lower
limit, LL, and upper limit, UL). Finally, we would like to count how many of our 100 interval
estimates contain the true parameter values.

We will specify cells A42:B57 as shown below (we outlined some cells in different shades of
gray only to distinguish groups of similar or related cells which we comment on shortly):

     A           B
42   N =         40
43   α =         0.05
44   m =         =B42-2
45   tc =        =TINV(B43,B44)
46   b1 =        =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),1,2)
47   se(b1) =    =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),2,2)
48   LL =        =B46-$B$45*B47
49   UL =        =B46+$B$45*B47
50   β1 in CI    =IF(OR(100<B48,100>B49),"No","Yes")
51   # Yes =     =COUNTIF(B50:CW50,"Yes")
52   b2 =        =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),1,1)
53   se(b2) =    =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),2,1)
54   LL =        =B52-$B$45*B53
55   UL =        =B52+$B$45*B53
56   β2 in CI    =IF(OR(10<B54,10>B55),"No","Yes")
57   # Yes =     =COUNTIF(B56:CW56,"Yes")

In cells A42:B43, the N (sample size) and α values are specified so that m (degrees of freedom)
and tc (t-critical value) can be computed and reported in cells A44:B45. tc is computed as shown
in Section 3.1.1d.

Cells A46:B47 and A52:B53 are used to compute and report the coefficient and standard error
estimates, as explained in Section 3.1.4c. The cell references to the x values are absolute,
$A$2:$A$41, rather than relative, because we use the same x values for all 100 repetitions: when
the formulas are copied across columns, the x range must stay fixed while the y range moves to the
next sample.

Cells A48:B49 and A54:B55 are used to compute and report the interval estimates, as explained in
Section 3.1.2. The value of tc is the same for all repetitions, so its cell reference, $B$45, is
absolute in the formulas for the interval limits.

3.1.4e The IF Function

We make use of the IF and OR logical functions to indicate, for each interval estimate, whether
or not it contains the true parameter value. The general syntax for the IF function is as follows:

IF(logical_test, value_if_true, value_if_false)

Logical_test is any value or expression that can be evaluated to be TRUE or FALSE. In this
exercise we want to determine whether or not the true parameter value, βk, is within the estimated
interval [LL, UL], where LL = bk - tc se(bk) and UL = bk + tc se(bk). The logical expression
we use is: βk < LL or βk > UL. If βk is outside [LL, UL], then this expression is TRUE.
Otherwise, the expression is FALSE.

Value_if_true is the value that is returned if logical_test is TRUE. For example, if this argument
is the text string "No" and the logical_test argument is TRUE, then the IF function displays the
text "No".

Value_if_false is the value that is returned if logical_test is FALSE. For example, if this
argument is the text string "Yes" and the logical_test argument is FALSE, then the IF function
displays the text "Yes".

3.1.4f The OR Function

We use the OR function to write our logical test. The general syntax of the OR function is as
follows:
OR(argument_1, argument_2)

If the first logical expression, argument_1, or the second logical expression, argument_2, is
TRUE, then the OR function returns TRUE. It returns FALSE only if both arguments are
FALSE.

The general syntax for the OR function, nested in the IF function, is:

IF(OR(argument_1, argument_2), value_if_true, value_if_false)

Applied to our exercise, the nested function looks like this (which is what we have in cells B50
and B56):

IF(OR(βk < LL, βk > UL), "No", "Yes")

If βk is outside [LL, UL], then the logical test βk < LL or βk > UL is TRUE, and "No" is
returned to indicate that βk is not in the estimated confidence interval. Otherwise, the logical
expression is FALSE, and "Yes" is returned to indicate that βk is in the estimated confidence
interval.
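The same decision rule can be written as a one-line Python function, shown here only as a sketch; `in_ci` is a hypothetical name. Like the worksheet formula, it treats a value exactly equal to LL or UL as inside the interval, because neither strict inequality holds there.

```python
def in_ci(beta_k, lower, upper):
    """Mirror of =IF(OR(beta_k<LL, beta_k>UL), "No", "Yes")."""
    return "No" if (beta_k < lower or beta_k > upper) else "Yes"


if __name__ == "__main__":
    print(in_ci(10, 5.972052, 14.447233))  # an interval that covers 10
    print(in_ci(10, 10.5, 12.0))           # an interval that misses 10
```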

3.1.4g The COUNTIF Function

Finally, we use the COUNTIF function to count the number of times βk is found within the
estimated interval [LL, UL].

The COUNTIF function is a statistical function that counts the number of cells within a range
that meet a given criterion. Its general syntax is:

COUNTIF(cell_range, criteria)

Cell_range is one or more cells to count. Criteria is the number, expression, cell reference, or
text that defines which cells will be counted. Since we are interested in counting how many of the
interval estimates we construct actually contain the true parameter value, we count the "Yes"
entries generated by our logical test (this is what we do in cells B51 and B57):
COUNTIF(cell_range, "Yes")

Once you have reviewed and understood the formulas and values in B42:B57, copy
the content of B46:B50 to C46:CW50 and copy the content of B52:B56 to C52:CW56.
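Counting the "Yes" entries, as COUNTIF does across B50:CW50, can be sketched in Python as follows. The helper name `countif` is hypothetical and the sketch supports only a plain text criterion; it compares case-insensitively because Excel's COUNTIF does not distinguish case when matching text.

```python
def countif(cells, criterion):
    """Count cells whose text equals criterion, ignoring case."""
    target = criterion.casefold()
    return sum(1 for cell in cells if str(cell).casefold() == target)


if __name__ == "__main__":
    row_50 = ["Yes", "Yes", "No", "Yes"]  # a made-up row of in-CI flags
    print(countif(row_50, "Yes"))
```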

Here is how our worksheet looks (only 10 of the 100 simulation results are shown):

[Screenshot: cells A42:K57 of the Simulation worksheet. Column B shows N = 40, α = 0.05, m = 38 and tc = 2.024394; for the first 10 samples (columns B:K), rows 46-57 show b1, se(b1), LL, UL and the β1 in CI flag, then b2, se(b2), LL, UL and the β2 in CI flag. All flags shown are "Yes" except in column D, and the counts in B51 and B57 are both 98.]

We find that 98 of our 100 confidence intervals contain the true parameter value, for both the
intercept and the slope coefficient. Note that you will draw different random samples, obtain
different interval estimates, and thus obtain a different number of intervals that contain the true
parameter values.

We then extended our repetitions to 1,000 samples, and found that 959 of the 1,000 interval
estimates contained β1, and 962 of the 1,000 interval estimates contained β2. Finally, we
extended the repetitions to 10,000 samples and found that 95.08% of the interval estimates
contained the true parameter values, for both the intercept and the slope coefficients.
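The whole experiment can also be sketched in Python to see the repeated-sampling interpretation at larger scales. This is an illustration under stated assumptions rather than a copy of the workbook: it uses the standard library's random number generator with an arbitrary seed, hard-codes tc = 2.024394 from Section 3.1.1d, and the names `ols_with_se` and `coverage` are hypothetical. Because the draws differ from Excel's, the coverage shares will not match the counts reported above exactly, but they should settle near 0.95 as the number of samples grows.

```python
import math
import random

T_C = 2.024394  # TINV(0.05, 38), quoted in Section 3.1.1d
BETA1, BETA2 = 100.0, 10.0  # true parameters of the simulated population


def ols_with_se(x, y):
    """Least squares b1, b2 and their standard errors for y = b1 + b2*x + e."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    sigma2_hat = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(sigma2_hat * sum(xi * xi for xi in x) / (n * sxx))
    se_b2 = math.sqrt(sigma2_hat / sxx)
    return b1, b2, se_b1, se_b2


def coverage(n_samples, seed=2011):
    """Share of 95% interval estimates that contain BETA1 and BETA2."""
    rng = random.Random(seed)
    x = [10] * 20 + [20] * 20
    hits1 = hits2 = 0
    for _ in range(n_samples):
        # y | x ~ N(100 + 10x, 50^2): means 200 and 300, sigma = 50
        y = [rng.gauss(BETA1 + BETA2 * xi, 50.0) for xi in x]
        b1, b2, se1, se2 = ols_with_se(x, y)
        hits1 += b1 - T_C * se1 <= BETA1 <= b1 + T_C * se1
        hits2 += b2 - T_C * se2 <= BETA2 <= b2 + T_C * se2
    return hits1 / n_samples, hits2 / n_samples


if __name__ == "__main__":
    for reps in (100, 1_000, 10_000):
        share1, share2 = coverage(reps)
        print(f"{reps:>6} samples: beta1 covered {share1:.2%}, beta2 covered {share2:.2%}")
```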

In the next section of this chapter, we perform hypothesis tests. For our examples of
hypothesis tests, we return to our simple linear regression model of weekly food
expenditure.

3.2 HYPOTHESIS TESTS

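If the null hypothesis H0: βk = c is true, then the test statistic t = (bk - c)/se(bk) follows a
t-distribution with m = N - 2 degrees of freedom:

$$ t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-2)} \tag{3.4} $$

When we reject H0, we accept a logical alternative hypothesis H1. There are three possible
alternative hypotheses to H0:

$$ H_1\colon \beta_k > c \tag{3.5} $$

$$ H_1\colon \beta_k < c \tag{3.6} $$

$$ H_1\colon \beta_k \neq c \tag{3.7} $$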
3.2.1 One-Tail Tests with Alternative "Greater Than" (>)


If the alternative hypothesis (3.5) is true, then the value of the computed test statistic will tend to
be unusually large. We will reject H0 if the test statistic is in the right-tail of the distribution.

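[Figure: the t(m) density, with the rejection region for H0: βk = c in the right tail, at and beyond tc, and the do-not-reject region for H0: βk = c to the left of tc.]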

Note that in this case the probability is α that a randomly drawn t-value is equal to or greater than
tc, where tc is defined as the lower limit of the right tail of the distribution shown in the graph
above.

3.2.2 One-Tail Tests with Alternative "Less Than" (<)

If the alternative hypothesis (3.6) is true, then the value of the computed test statistic will tend to
be unusually small. We will reject H0 if the test statistic is in the left tail of the distribution.

[Figure: the t(m) density, with the rejection region for H0: βk = c in the left tail, at and below tc.]

Note that in this case the probability is α that a randomly drawn t-value is equal to or less than tc,
where tc is defined as the upper limit of the left tail of the distribution shown in the graph above.

3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠)


If the alternative hypothesis (3.7) is true, then the value of the computed test statistic will tend to
be unusually small or large. We will reject H0 if the test statistic is in either the left tail or the
right tail of the distribution.
[Figure: the t(m) density, with rejection regions for H0: βk = c (accept H1: βk ≠ c) in both tails, and the do-not-reject region for H0: βk = c in between.]

Note that in this case the probability is α that a randomly drawn t-value will fall in the tails of the
distribution, either equal to or less than t(α/2, N-2) or equal to or greater than t(1-α/2, N-2). Those
limits are shown in the graph above. (Note that those limits correspond to the values -tc and tc first
defined in Section 3.1.1b.)
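As a sketch of the two-tail rule in code: because the t-distribution is symmetric, t(α/2, N-2) = -t(1-α/2, N-2), so the two rejection regions combine into the single condition |t| ≥ tc, where tc = TINV(α, N - 2). The Python below assumes the critical value 2.024394 for α = 0.05 and 38 degrees of freedom from Section 3.1.1d, uses the rounded food expenditure estimates, and treats the null value c = 0 only as an example; `two_tail_test` is a hypothetical helper name.

```python
def two_tail_test(b_k, se_bk, c, t_c):
    """Two-tail test of H0: beta_k = c against H1: beta_k != c.
    t_c is the two-tailed critical value, e.g. TINV(alpha, N - 2)."""
    t_stat = (b_k - c) / se_bk  # test statistic from (3.4)
    decision = "Reject Ho" if abs(t_stat) >= t_c else "Do Not Reject Ho"
    return t_stat, decision


if __name__ == "__main__":
    # Slope of the food expenditure model, alpha = 0.05, 38 degrees of freedom.
    print(two_tail_test(10.20964, 2.09326, 0.0, 2.024394))
```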

3.3 EXAMPLES OF HYPOTHESIS TESTS

We illustrate the mechanics of hypothesis testing using the food expenditure model. We give
examples of right-tail, left-tail, and two-tail tests. Note that when the null hypothesis of a test is
that the parameter is zero, the test is called a test of significance. We can have one-tail tests of
significance or two-tail tests of significance.

Recall our estimated regression model; below the estimated values of b1 and b2, we report their
estimated standard errors, se(b1) and se(b2):

$$ \underset{(\mathrm{se})}{\hat{y}_i} = \underset{(43.41)}{83.42} + \underset{(2.09)}{10.21}\,x_i \tag{3.8} $$

3.3.1 Right-Tail Tests

We create a template for right-tail tests.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Simulation tab. Name it Right-Tail Tests.


Create the following template to perform right-tail tests:

     A                 B                 C
1    Data Input        N =               =Regression!B8
2                      bk =              =Regression!B18
3                      se(bk) =          =Regression!C18
4                      Ho: βk =
5                      α =
6
7    Computed Values   df or m =         =C1-2
8                      tc =              =TINV(C5*2,C7)
9
10   Right-Tail Test   t-statistic =     =(C2-C4)/C3
11                     Conclusion:       =IF(C10>=C8,"Reject Ho","Do Not Reject Ho")

We get the sample size N, estimated coefficient b2 and standard error se(b2) from our
Regression worksheet. All you have to do in each of cells C1:C3 is, first, type the equal sign, and
then select the needed value in the Regression worksheet with your cursor. Next, press Enter.
We are performing hypothesis tests on the slope parameter, β2. Cells C4:C5 are left blank for
now. Later, you will specify the value you hypothesize β2 takes, as well as the level of
significance of your test (α). In cell C7, the degrees of freedom are set equal to N - 2, where N is
the sample size, which we record in cell C1.

Cell C8 is where the critical value for the right-tail rejection region is computed. Recall that in a
right-tail test, the whole probability α of rejecting H0 is in the right tail of the distribution, at or
above tc. The TINV function, on the other hand, splits its probability argument between the two
tails, so it gives us a tc value such that P(t(m) ≥ tc) = α/2. To get the correct critical value for the
right-tail rejection region, we therefore multiply the specified α value by 2 in the TINV function
(half of 2α is α, which is what we want).
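The reasoning can be written out as a short worked equation. Because TINV splits its probability argument equally between the two tails, supplying 2α leaves exactly α in the right tail; with α = 0.05 and m = 38, this gives the value 1.685954 that appears in the template in Section 3.3.1a.

$$
P\big(|t_{(m)}| \ge \mathrm{TINV}(2\alpha, m)\big) = 2\alpha
\;\Longrightarrow\;
P\big(t_{(m)} \ge \mathrm{TINV}(2\alpha, m)\big) = \frac{2\alpha}{2} = \alpha,
\qquad
\mathrm{TINV}(0.10, 38) = t_{(0.95,\,38)} = 1.685954
$$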

Cell C10 is where the test statistic t is computed, by plugging the least squares estimate and its
standard error into the equation for t in (3.4).

Finally, in cell Cll, we use the IF function to determine whether or not our t-statistic falls into
the rejection region. If it does, we reject our null hypothesis; if it does not, we do not reject it (see
Section 3.1.4e for details on how the IF logical function works).

3.3.1a One-Tail Test of Significance

Let α = 0.05; H0: β2 = 0 and H1: β2 > 0.

     A                 B                 C
1    Data Input        N =               40
2                      bk =              10.20964
3                      se(bk) =          2.09326
4                      Ho: βk =          0
5                      α =               0.05
6
7    Computed Values   df or m =         38
8                      tc =              1.685954
9
10   Right-Tail Test   t-statistic =     4.877381
11                     Conclusion:       Reject Ho

3.3.1b One-Tail Test of an Economic Hypothesis

Let α = 0.01; H0: β2 ≤ 5.5 and H1: β2 > 5.5.

Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≤ 5.5
against the alternative hypothesis H1: β2 > 5.5 is exactly the same as testing H0: β2 = 5.5 against
the alternative hypothesis H1: β2 > 5.5.
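Both right-tail examples can be checked with a small Python sketch of the template's last two rows. The critical values 1.685954 (α = 0.05) and 2.428568 (α = 0.01) are copied from the worksheets rather than recomputed, b2 and se(b2) are the rounded values 10.20964 and 2.09326, and `right_tail_test` is a hypothetical helper name.

```python
def right_tail_test(b_k, se_bk, c, t_c):
    """Right-tail test of H0: beta_k = c against H1: beta_k > c.
    t_c is the right-tail critical value, e.g. TINV(2*alpha, m)."""
    t_stat = (b_k - c) / se_bk  # test statistic from (3.4)
    decision = "Reject Ho" if t_stat >= t_c else "Do Not Reject Ho"
    return t_stat, decision


B2, SE_B2 = 10.20964, 2.09326  # rounded estimates from the Regression worksheet

if __name__ == "__main__":
    print(right_tail_test(B2, SE_B2, 0.0, 1.685954))  # Section 3.3.1a, alpha = 0.05
    print(right_tail_test(B2, SE_B2, 5.5, 2.428568))  # Section 3.3.1b, alpha = 0.01
```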

     A                 B                 C
1    Data Input        N =               40
2                      bk =              10.20964
3                      se(bk) =          2.09326
4                      Ho: βk =          5.5
5                      α =               0.01
6
7    Computed Values   df or m =         38
8                      tc =              2.428568
9
10   Right-Tail Test   t-statistic =     2.249904
11                     Conclusion:       Do Not Reject Ho

3.3.2 Left-Tail Tests

We create a template for left-tail tests.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Right-Tail Tests tab. Name it Left-Tail Tests.

Interval Estimation and Hypothesis Testing 85

The left-tail test template will be very similar to the right-tail test template. You can copy cells
A1:C11 from the Right-Tail Tests worksheet to cells A1:C11 in the Left-Tail Tests worksheet.

Alternatively, you can select the whole Right-Tail Tests worksheet by left-clicking on the upper-left
corner of the worksheet; your cursor should turn into a fat cross.

Select Copy. Left-click in cell Al of the Left-Tail Tests worksheet, and select Paste.


You will need to make just a few modifications to create the following left-tail test template:

A B C
1 Data Input N= =Regression!B8
2 bk= =Regression!B18
3 se(bk)= =Regression!C18
4 Ho: βk=
5 α=
6
7 Computed Values df or m= =C1-2
8 tc= =-TINV(C5*2,C7)
9
10 Left-Tail Test t-statistic= =(C2-C4)/C3
11 Conclusion: =IF(C10<=C8,"Reject Ho","Do Not Reject Ho")

The rejection region for a left-tail test is the mirror image of the rejection region for a right-tail
test; it is in the left tail instead of the right tail of the distribution. The critical value for a left-tail
test is thus the negative of the critical value for a right-tail test: in cell C8, we precede the TINV
function by a minus sign to reflect that.

In a left-tail test, we reject our null hypothesis if our t-statistic is less than or equal to our critical
value, not greater than or equal to our critical value as is the case in a right-tail test; we adjust
the equation in C11 accordingly.

Finally, change the label in cell A10 to "Left-Tail Test".
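The same sketch, adapted to the left-tail template (left_tail_test is again a hypothetical name): cell C8 holds the negative of TINV(2α, m), and H0 is rejected when t ≤ tc. The critical values passed in are the ones Excel reports for 38 degrees of freedom.

```python
def left_tail_test(b: float, se: float, c: float, tc_right: float):
    """Mirror cells C8, C10 and C11 of the Left-Tail Tests template.

    tc_right is the positive value TINV(2*alpha, m); cell C8 stores its negative.
    """
    tc = -tc_right                                               # C8: =-TINV(C5*2,C7)
    t = (b - c) / se                                             # C10: =(C2-C4)/C3
    conclusion = "Reject Ho" if t <= tc else "Do Not Reject Ho"  # C11
    return t, tc, conclusion


# H0: beta2 >= 15 against H1: beta2 < 15, with alpha = 0.05 and then alpha = 0.01
print(left_tail_test(10.20964, 2.093263, 15.0, 1.685954))
print(left_tail_test(10.20964, 2.093263, 15.0, 2.428568))
```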

Let α = 0.05; H0: β2 ≥ 15 and H1: β2 < 15.

Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≥ 15
against the alternative hypothesis H1: β2 < 15 is exactly the same as testing H0: β2 = 15 against
the alternative hypothesis H1: β2 < 15.

A B C
1 Data Input N = 40
2 bk = 10.20964
3 se(bk) = 2.093263
4 Ho: βk = 15
5 α = 0.05
6
7 Computed Values df or m = 38
8 tc = -1.685954
9
10 Left-Tail Test t-statistic = -2.288464
11 Conclusion: Reject Ho

3.3.3 Two-Tail Tests

We create a template for two-tail tests.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Left-Tail Tests tab. Name it Two-Tail Tests.


The two-tail test template will also be very similar to the right-tail test template. You can copy
cells A1:C11 from the Right-Tail Tests worksheet to cells A1:C11 in the Two-Tail Tests
worksheet. Alternatively, you can select the whole Right-Tail Tests worksheet and copy it into the
Two-Tail Tests worksheet.

You will need to make just a few modifications to create the following two-tail test template:

A B C
1 Data Input N= =Regression!B8
2 bk= =Regression!B18
3 se(bk)= =Regression!C18
4 Ho: βk=
5 α=
6
7 Computed Values df or m= =C1-2
8 tc= =TINV(C5,C7)
9
10 Two-Tail Test t-statistic= =(C2-C4)/C3
11 Conclusion: =IF(OR(C10<=-C8,C10>=C8),"Reject Ho","Do Not Reject Ho")

The rejection region for a two-tail test is split in half between the left tail and the right tail of the
distribution: only α/2 of the probability is in each tail of the distribution. So, we no longer need to
multiply α by 2 in the TINV function: delete *2 in cell C8.

In a two-tail test, we reject our null hypothesis if our t-statistic is less than or equal to the left-tail
critical value, or greater than or equal to the right-tail critical value: we adjust the equation in C11 to
reflect that (see Section 3.1.4f for details on how the OR logical function works).

Finally, change the label in cell A10 to "Two-Tail Test".
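For the two-tail template, the hypothetical helper below takes the positive critical value TINV(α, m) stored in cell C8 and reproduces the OR condition of cell C11.

```python
def two_tail_test(b: float, se: float, c: float, tc: float):
    """Mirror cells C10 and C11 of the Two-Tail Tests template (tc > 0)."""
    t = (b - c) / se                   # C10: =(C2-C4)/C3
    reject = t <= -tc or t >= tc       # C11: OR(C10<=-C8, C10>=C8)
    return t, "Reject Ho" if reject else "Do Not Reject Ho"


# H0: beta2 = 7.5 and H0: beta2 = 0, both with alpha = 0.05 (tc = 2.024394)
print(two_tail_test(10.20964, 2.093263, 7.5, 2.024394))
print(two_tail_test(10.20964, 2.093263, 0.0, 2.024394))
```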

3.3.3a Two-Tail Test of an Economic Hypothesis

Let α = 0.05; H0: β2 = 7.5 and H1: β2 ≠ 7.5.

A B C
1 Data Input N = 40
2 bk = 10.20964
3 se(bk) = 2.093263
4 Ho: βk = 7.5
5 α = 0.05
6
7 Computed Values df or m = 38
8 tc = 2.024394
9
10 Two-Tail Test t-statistic = 1.294458
11 Conclusion: Do Not Reject Ho

3.3.3b Two-Tail Test of Significance

Let α = 0.05; H0: β2 = 0 and H1: β2 ≠ 0.

A B C
1 Data Input N = 40
2 bk = 10.20964
3 se(bk) = 2.093263
4 Ho: βk = 0
5 α = 0.05
6
7 Computed Values df or m = 38
8 tc = 2.024394
9
10 Two-Tail Test t-statistic = 4.877381
11 Conclusion: Reject Ho

Note that the t-statistic in a two-tail test of significance is equal to the t-statistic in the one-tail test of
significance (compare the t-statistic value above to the one obtained in Section 3.3.1a). Also note
that this t-statistic value for tests of significance is reported in the regression summary output
generated by Excel.

Go back to your Regression worksheet. If you do not see your Regression tab, it is because it is
hidden from view. Use the left-arrow buttons at the lower-left corner of your screen to scroll the
worksheet tabs so that the first worksheets you were working with can be seen again.


Column D of the last table of the summary output presents the t-statistic values for tests of
significance of the intercept and slope parameters, β1 and β2 (cells D17:D18 below).

A B C D E F G H I
1 SUMMARY OUTPUT
2
3 Regression Statistics
4 Multiple R 0.620485472
5 R Square 0.385002221
6 Adjusted R Square 0.368818069
7 Standard Error 89.51700429
8 Observations 40
9
10 ANOVA
11 df SS MS F Significance F
12 Regression 1 190626.9788 190626.9788 23.78884107 1.94586E-05
13 Residual 38 304505.1742 8013.294058
14 Total 39 495132.153
15
16 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
17 Intercept 83.41600997 43.41016192 1.921577951 0.062182379 -4.463267721 171.2952877 -4.463267721 171.2952877
18 X Variable 1 10.20964255 2.093263461 4.877380554 1.94586E-05 5.972052202 14.4472328 5.972052202 14.4472328

3.4 THE p-VALUE

When reporting the outcome of statistical hypothesis tests, it has become standard practice to
report the p-value (an abbreviation for probability value) of the test. If we have the p-value of a
test, we can determine the outcome of the test by comparing p to the chosen level of significance,
α. This is an alternative to comparing the test statistic value to the critical value(s), or limit(s), of
the rejection region for a test.

3.4.1 The p-Value Rule

In order to explain the p-value decision rule for hypothesis tests, we first give a definition of the
p-value.

3.4.1a Definition of p-Value

How the p-value is computed depends on the alternative hypothesis of our test. If H1: βk > c, p
is the probability that a t random variable with N − 2 degrees of freedom is greater than or equal to the test statistic value t.

[Figure: t-distribution; p is the area to the right of the test statistic t]

If H1: βk < c, p is the probability that a t random variable with N − 2 degrees of freedom is less than or equal to the test statistic value t.

[Figure: t-distribution; p is the area to the left of the test statistic t]

If H1: βk ≠ c, p is the probability that a t random variable with N − 2 degrees of freedom is less than or equal to −|t| or greater
than or equal to |t|, where t is the test statistic value.

[Figure: t-distribution; p/2 is the area to the left of −|t| and p/2 is the area to the right of |t|]
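The three definitions can be checked numerically. The sketch below is only an illustration, not how Excel computes p-values: it integrates the t density with Simpson's rule over [0, |t|] and uses the symmetry of the distribution to obtain the tail areas. The helper names t_pdf, upper_tail and p_value, and the labels "greater", "less" and "two-sided" for the three alternatives, are our own.

```python
import math


def t_pdf(u: float, df: int) -> float:
    """Student's t density with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1.0 + u * u / df) ** (-(df + 1) / 2)


def upper_tail(x: float, df: int, n: int = 2000) -> float:
    """P(T >= x) for x >= 0: 0.5 minus the Simpson's-rule area from 0 to x."""
    h = x / n                          # n must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - s * h / 3


def p_value(t: float, df: int, alternative: str) -> float:
    tail = upper_tail(abs(t), df)      # P(T >= |t|), by symmetry
    if alternative == "greater":       # H1: beta_k > c, p = P(T >= t)
        return tail if t >= 0 else 1 - tail
    if alternative == "less":          # H1: beta_k < c, p = P(T <= t)
        return tail if t <= 0 else 1 - tail
    if alternative == "two-sided":     # H1: beta_k != c, p = P(|T| >= |t|)
        return 2 * tail
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


print(p_value(2.249904, 38, "greater"))
print(p_value(-2.288464, 38, "less"))
print(p_value(1.294458, 38, "two-sided"))
```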

3.4.1b Justification for the p-Value Rule

We can see that when the test statistic value t falls into the rejection region, its p-value
is less than, or equal to, the level of significance α.

For H1: βk > c: if t ≥ tc, t is in the rejection region and p ≤ α. The case illustrated below is
where t > tc and p < α; H0 is rejected.

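[Figure: right-tail test; the rejection region (reject H0) lies to the right of tc = t(1−α, N−2), and the test statistic t falls inside it]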

For H1: βk < c: if t ≤ tc, t is in the rejection region and p ≤ α. The case illustrated below is
where t < tc and p < α; H0 is rejected.

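[Figure: left-tail test; the rejection region (reject H0) lies to the left of tc = t(α, N−2), and the test statistic t falls inside it]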

For H1: βk ≠ c: if t ≤ −tc in the left tail of the distribution or t ≥ tc in the right tail of the
distribution, where tc = t(1−α/2, N−2), t is in the rejection region and p ≤ α.

The case illustrated below is where t > tc in the right tail of the distribution, and p < α; H0 is
rejected.

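[Figure: two-tail test; rejection regions (reject H0), each with area α/2, lie to the left of t(α/2, N−2) and to the right of t(1−α/2, N−2); the test statistic t falls in the right-tail region]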

The case illustrated below is where t < −tc in the left tail of the distribution, and p < α; H0 is
rejected.

[Figure: two-tail test; the test statistic t falls in the left-tail rejection region, to the left of t(α/2, N−2), with area p/2 beyond it]

We can thus compare the p-value of a test, p, to the chosen level of significance, α, and
determine the outcome of our hypothesis test: if p ≤ α, we reject H0 and accept H1; if p > α, we
do not reject H0. This is the p-value rule.
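The p-value rule and the critical-value rule always lead to the same conclusion. The sketch below checks this on the (t, tc, p, α) combinations reported in the worked examples of Sections 3.3 and 3.4.3; critical_value_rule and p_value_rule are hypothetical helper names.

```python
def critical_value_rule(tail: str, t: float, tc: float) -> bool:
    """Reject H0 when t falls in the rejection region (tc as stored in cell C8)."""
    if tail == "right":
        return t >= tc
    if tail == "left":
        return t <= tc
    return t <= -tc or t >= tc         # two-tail, with tc > 0


def p_value_rule(p: float, alpha: float) -> bool:
    """Reject H0 when p <= alpha."""
    return p <= alpha


# (tail, t, tc, p, alpha) as reported by the Excel templates
cases = [
    ("right", 2.249904, 2.428568, 0.015163, 0.01),
    ("right", 2.249904, 1.685954, 0.015163, 0.05),
    ("left", -2.288464, -2.428568, 0.013881, 0.01),
    ("left", -2.288464, -1.685954, 0.013881, 0.05),
    ("two", 1.294458, 2.024394, 0.203318, 0.05),
    ("two", 4.877381, 2.024394, 1.95e-05, 0.05),
]
for tail, t, tc, p, alpha in cases:
    same = critical_value_rule(tail, t, tc) == p_value_rule(p, alpha)
    print(tail, t, alpha, "agree" if same else "DISAGREE")
```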

3.4.2 The TDIST Function

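p-values are obtained in Excel by using the TDIST function. For hypothesis-testing purposes, the
syntax of the TDIST function is as follows:

=TDIST(ABS(t),m,tails)

t is the value of the computed test statistic, ABS is a mathematical function that returns the
absolute value of t (TDIST does not accept negative values), m is the degrees of freedom, and tails
specifies whether we are seeking the p-value for a one-tail test or a two-tail test. Set tails to 1 for a
one-tail test, and set tails to 2 for a two-tail test. With tails set to 1, TDIST returns the area to the
right of |t|; this is the one-tail p-value when t lies on the side of zero specified by H1 (t > 0 in a
right-tail test, t < 0 in a left-tail test), as in all the examples below. If t lies on the other side of
zero, the one-tail p-value is 1 minus the value returned by TDIST.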
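TDIST can be mimicked in Python to see what the ABS wrapper does. This sketch assumes integer degrees of freedom and evaluates P(|t(m)| > x) with the closed-form series in Abramowitz and Stegun (Section 26.7); tdist, one_tail_p and t_two_tail_prob are our own names, not Excel functions. The one_tail_p helper returns 1 minus the TDIST value when t lies on the opposite side of zero from the one specified by H1.

```python
import math


def t_two_tail_prob(x: float, df: int) -> float:
    """P(|T| > x) for integer df >= 1 and x >= 0 (Abramowitz and Stegun, Sec. 26.7)."""
    theta = math.atan(x / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= c * c * (2 * k - 1) / (2 * k)
            total += term
        return 1.0 - s * total
    total = 0.0
    if df > 1:
        term = total = 1.0
        for k in range(1, (df - 1) // 2):
            term *= c * c * (2 * k) / (2 * k + 1)
            total += term
    return 1.0 - (2.0 / math.pi) * (theta + s * c * total)


def tdist(x: float, df: int, tails: int) -> float:
    """Mimic Excel's TDIST(x, df, tails) for integer df."""
    if x < 0:
        raise ValueError("x must be >= 0; Excel's TDIST returns #NUM! otherwise")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    p_two = t_two_tail_prob(x, df)
    return p_two if tails == 2 else p_two / 2


def one_tail_p(t: float, df: int, right_tail: bool) -> float:
    """One-tail p-value built from =TDIST(ABS(t),m,1)."""
    p = tdist(abs(t), df, 1)                       # area beyond |t|
    on_h1_side = (t >= 0) if right_tail else (t <= 0)
    return p if on_h1_side else 1.0 - p


print(round(tdist(abs(2.249904), 38, 1), 6))   # right-tail test, Section 3.4.3a
print(round(tdist(abs(-2.288464), 38, 1), 6))  # left-tail test, Section 3.4.3b
print(round(tdist(abs(1.294458), 38, 2), 6))   # two-tail test, Section 3.4.3c
print(round(one_tail_p(-2.288464, 38, right_tail=True), 6))  # t on the other side
```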

Go back to your Right-Tail Tests and Left-Tail Tests worksheets and add the following at the
bottom of each template:

A B C
12 p-value = =TDIST(ABS(C10),C7,1)
13 Conclusion: =IF(C12<=C5,"Reject Ho","Do Not Reject Ho")

Go back to your Two-Tail Tests worksheet and add the following at the bottom of its template:

A B C
12 p-value = =TDIST(ABS(C10),C7,2)
13 Conclusion: =IF(C12<=C5,"Reject Ho","Do Not Reject Ho")

3.4.3 Examples of Hypothesis Tests Revisited

3.4.3a Right-Tail Test of an Economic Hypothesis from Section 3.3.1b

Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≤ 5.5
against the alternative hypothesis H1: β2 > 5.5 is exactly the same as testing H0: β2 = 5.5
against the alternative hypothesis H1: β2 > 5.5.

A B C
1 Data Input N = 40
2 bk = 10.20964
3 se(bk) = 2.093263
4 Ho: βk = 5.5
5 α = 0.01
6
7 Computed Values df or m = 38
8 tc = 2.428568
9
10 Right-Tail Test t-statistic = 2.249904
11 Conclusion: Do Not Reject Ho
12 p-value = 0.015163
13 Conclusion: Do Not Reject Ho

Let α = 0.05.
A B C
1 Data Input N = 40
2 bk = 10.20964
3 se(bk) = 2.093263
4 Ho: βk = 5.5
5 α = 0.05
6
7 Computed Values df or m = 38
8 tc = 1.685954
9
10 Right-Tail Test t-statistic = 2.249904
11 Conclusion: Reject Ho
12 p-value = 0.015163
13 Conclusion: Reject Ho

3.4.3b Left-Tail Test of an Economic Hypothesis from Section 3.3.2

Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≥ 15 against
the alternative hypothesis H1: β2 < 15 is exactly the same as testing H0: β2 = 15 against the
alternative hypothesis H1: β2 < 15.

A B C
1 Data Input N = 40
2 bk = 10.20964
3 se(bk) = 2.093263
4 Ho: βk = 15
5 α = 0.01
6
7 Computed Values df or m = 38
8 tc = -2.428568
9
10 Left-Tail Test t-statistic = -2.288464
11 Conclusion: Do Not Reject Ho
12 p-value = 0.013881
13 Conclusion: Do Not Reject Ho

Let α = 0.05.
A B C
1 Data Input N = 40
2 bk = 10.20964
3 se(bk) = 2.093263
4 Ho: βk = 15
5 α = 0.05
6
7 Computed Values df or m = 38
8 tc = -1.685954
9
10 Left-Tail Test t-statistic = -2.288464
11 Conclusion: Reject Ho
12 p-value = 0.013881
13 Conclusion: Reject Ho

3.4.3c Two-Tail Test of an Economic Hypothesis from Section 3.3.3a

Let α = 0.05; H0: β2 = 7.5 and H1: β2 ≠ 7.5.

A B C
1 Data Input N = 40
2 bk = 10.20964
3 se(bk) = 2.093263
4 Ho: βk = 7.5
5 α = 0.05
6
7 Computed Values df or m = 38
8 tc = 2.024394
9
10 Two-Tail Test t-statistic = 1.294458
11 Conclusion: Do Not Reject Ho
12 p-value = 0.203318
13 Conclusion: Do Not Reject Ho

3.4.3d Two-Tail Test of Significance from Section 3.3.3b

Let α = 0.05; H0: β2 = 0 and H1: β2 ≠ 0.



A B C
1 Data Input N = 40
2 bk = 10.20964
3 se(bk) = 2.093263
4 Ho: βk = 0
5 α = 0.05
6
7 Computed Values df or m = 38
8 tc = 2.024394
9
10 Two-Tail Test t-statistic = 4.877381
11 Conclusion: Reject Ho
12 p-value = 1.95E-05
13 Conclusion: Reject Ho

Note that the p-value for this test is very tiny. "1.95E-05" is standard scientific notation, which
means "1.95 times 10 to the power −5":

$$\text{1.95E-05} = 1.95 \times 10^{-5} = 1.95 \times \frac{1}{10^{5}} = 1.95 \times \frac{1}{100{,}000} = 0.0000195$$
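Python reads and writes the same E notation, which gives a quick way to convert between the two forms:

```python
import math

x = float("1.95E-05")                     # parse the notation Excel displays
print(f"{x:.7f}")                         # 0.0000195 (fixed-point form)
print(f"{x:.2E}")                         # 1.95E-05 (Excel-style scientific form)
print(math.isclose(x, 1.95 / 100_000))    # True: 1.95 x 1/100,000
```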

Also note that this p-value for the two-tail test of significance is reported in the regression
summary output generated by Excel.

Go back to your Regression worksheet. If you do not see your Regression tab, it is because it is
hidden from view. Use the left-arrow buttons at the lower-left corner of your screen to scroll the
worksheet tabs so that the first worksheets you were working with can be seen again.

Column E of the last table of the summary output presents the p-values for the two-tail
tests of significance of the intercept and slope parameters, β1 and β2 (cells E17:E18 below).

A B C D E F G H I
1 SUMMARY OUTPUT
2
3 Regression Statistics
4 Multiple R 0.620485472
5 R Square 0.385002221
6 Adjusted R Square 0.368818069
7 Standard Error 89.51700429
8 Observations 40
9
10 ANOVA
11 df SS MS F Significance F
12 Regression 1 190626.9788 190626.9788 23.78884107 1.94586E-05
13 Residual 38 304505.1742 8013.294058
14 Total 39 495132.153
15
16 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
17 Intercept 83.41600997 43.41016192 1.921577951 0.062182379 -4.463267721 171.2952877 -4.463267721 171.2952877
18 X Variable 1 10.20964255 2.093263461 4.877380554 1.94586E-05 5.972052202 14.4472328 5.972052202 14.4472328
CHAPTER 4

Prediction, Goodness-of-Fit, and Modeling Issues
CHAPTER OUTLINE
4.1 Least Squares Prediction
4.2 Measuring Goodness-of-Fit
4.2.1 Coefficient of Determination or R²
4.2.2 Correlation Analysis and R²
4.2.3 The Food Expenditure Example and the CORREL Function
4.3 The Effects of Scaling the Data
4.3.1 Changing the Scale of x
4.3.2 Changing the Scale of y
4.3.3 Changing the Scale of x and y
4.4 A Linear-Log Food Expenditure Model
4.4.1 Estimating the Model
4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship
4.5 Using Diagnostic Residual Plots
4.5.1 Random Residual Pattern
4.5.2 Heteroskedastic Residual Pattern
4.5.3 Detecting Model Specification Errors
4.6 Are the Regression Errors Normally Distributed?
4.6.1 Histogram of the Residuals
4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food Expenditure Model
4.7 Polynomial Models: An Empirical Example
4.7.1 Scatter Plot of Wheat Yield over Time
4.7.2 The Linear Equation Model
4.7.2a Estimating the Model
4.7.2b Residuals Plot
4.7.3 The Cubic Equation Model
4.7.3a Estimating the Model
4.7.3b Residuals Plot
4.8 Log-Linear Models
4.8.1 A Growth Model
4.8.2 A Wage Equation
4.8.3 Prediction
4.8.4 A Generalized R² Measure
4.8.5 Prediction Intervals
4.9 A Log-Log Model: Poultry Demand Equation
4.9.1 Estimating the Model
4.9.2 A Generalized R² Measure
4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship

In this chapter we continue to work with the simple linear regression model of weekly food
expenditure to make predictions, compute goodness-of-fit measures, and address modeling issues.
We also work with additional examples.

96 Chapter 4

4.1 LEAST SQUARES PREDICTION

A 100(1 − α)% prediction interval at value x0 of the explanatory variable is defined as:

$$\hat{y}_0 \pm t_c\,\mathrm{se}(f) \qquad (4.1)$$

where: ŷ0 = b1 + b2x0 is the least squares predictor, (4.2)

tc is the 100(1 − α/2)th percentile from the t-distribution with N − 2 degrees of freedom,

and se(f) is the standard error of the forecast.

The standard error of the forecast is given by:

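$$\mathrm{se}(f) = \sqrt{\widehat{\mathrm{var}}(f)} = \sqrt{\hat{\sigma}^2 + \frac{\hat{\sigma}^2}{N} + (x_0 - \bar{x})^2\,[\mathrm{se}(b_2)]^2} \qquad (4.3)$$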

where: σ̂² is the estimate of the error variance, or mean square residual (MS Residual),

N is the sample size,

x̄ is the sample mean of the x values,

and se(b2) is the standard error estimate for b2.

The lower limit (LL) and upper limit (UL) of the prediction interval are:

LL = ŷ0 − tc se(f) (4.4)

UL = ŷ0 + tc se(f) (4.5)
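The arithmetic of (4.2) to (4.5) can be reproduced in a few lines of Python. This sketch plugs in the rounded values reported in the Food Regression output, the sample mean of income, and tc = TINV(0.05, 38); note that the last term under the square root uses se(b2) squared, the estimated variance of b2.

```python
import math

# Values from the food expenditure regression output (rounded)
n, x0 = 40, 20.0
b1, b2 = 83.41601, 10.20964
se_b2 = 2.093263
sigma2_hat = 8013.294      # MS Residual
x_bar = 19.60475           # mean weekly income, in $100
tc = 2.024394              # TINV(0.05, 38)

y0_hat = b1 + b2 * x0                                            # (4.2)
var_f = sigma2_hat + sigma2_hat / n + (x0 - x_bar) ** 2 * se_b2 ** 2
se_f = math.sqrt(var_f)                                          # (4.3)
lower, upper = y0_hat - tc * se_f, y0_hat + tc * se_f            # (4.4), (4.5)
print(round(y0_hat, 4), round(se_f, 4), round(lower, 4), round(upper, 4))
```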

Before we create a template to compute prediction intervals, we quickly re-estimate the food
expenditure model; note that this time we also want to generate the residual output. We are
interested in the Predicted Y values generated in this output. Also, since we will use more than
one data set and run more than one regression in this chapter, we will choose to give our data and
regression worksheets more explicit names.

Open the Excel file food. Save it as POE Chapter 4.

Rename Sheet1 food data. Re-estimate the regression parameters using the Excel Regression
analysis tool as in Section 2.2.2. In the Regression dialog box, the Input Y Range should be
A2:A41, and the Input X Range should be B2:B41. Select New Worksheet Ply and name it
Food Regression, and check the box next to Residuals.
Prediction, Goodness-of-Fit, and Modeling Issues 97

[Figure: Regression dialog box, with New Worksheet Ply selected and the Residuals box checked]

Next, insert a new worksheet by selecting the Insert Worksheet tab at the lower-left corner of
your screen. Name it Prediction Interval.


Create the following template to construct prediction intervals. In the last column you will find the
numbers of the equations and the formatting options used, if any, in the template.

A B C
1 Data Input Sample Size= ='Food Regression'!B8
2 Confidence Level= (percentage, 0 decimal places)
3 x0 =
4 b1 = ='Food Regression'!B17
5 b2 = ='Food Regression'!B18
6 se(b2) = ='Food Regression'!C18
7 MS residual= ='Food Regression'!D13
8
9 Computed Values α= =1-C2
10 df or m= =C1-2
11 tc= =TINV(C9,C10)
12 predicted y0= =C4+C5*C3 (4.2)
13 x-bar= =AVERAGE('food data'!B2:B41)
14 se(f) = =SQRT(C7+C7/C1+((C3-C13)^2)*C6^2) (4.3)
15
16 Prediction Interval Lower Limit= =C12-C11*C14 (4.4)
17 Upper Limit= =C12+C11*C14 (4.5)

At x0 = 20, the 95% prediction interval for y0 is as follows (see also p. 134 of Principles of
Econometrics, 4e):

A B C
1 Data Input Sample Size = 40
2 Confidence Level = 95%
3 x0 = 20
4 b1 = 83.41601
5 b2 = 10.20964
6 se(b2) = 2.093263
7 MS residual = 8013.294
8
9 Computed Values α = 5%
10 df or m = 38
11 tc = 2.024394
12 predicted y0 = 287.6089
13 x-bar = 19.60475
14 se(f) = 90.63284
15
16 Prediction Interval Lower Limit = 104.1323
17 Upper Limit = 471.0854

4.2 MEASURING GOODNESS-OF-FIT

4.2.1 Coefficient of Determination or R²


The coefficient of determination, or R², is the proportion of variation in y explained by x within
the regression model:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \qquad (4.6)$$

where: SSR is the sum of squares due to the regression (SS Regression),

SST is the total sum of squares (SS Total),

and SSE is the sum of squared errors or sum of squared residuals (SS Residual).
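Both forms of (4.6) can be evaluated from the ANOVA sums of squares of the food expenditure regression; the values below are copied from the summary output.

```python
# Sums of squares from the ANOVA table of the food expenditure regression
ss_regression = 190626.9788   # SSR
ss_residual = 304505.1742     # SSE
ss_total = 495132.153         # SST

r2_ssr = ss_regression / ss_total          # R^2 = SSR/SST
r2_sse = 1 - ss_residual / ss_total        # R^2 = 1 - SSE/SST
print(round(r2_ssr, 6), round(r2_sse, 6))  # both 0.385002, the "R Square" value
```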

4.2.2 Correlation Analysis and R²


R² can be computed as the square of the sample correlation coefficient between the xi and yi values.
This result is valid only in simple regression models:

$$R^2 = r_{xy}^2 \qquad (4.7)$$

R² can also be computed as the square of the sample correlation coefficient between yi and
ŷi = b1 + b2xi. This result is valid not only in simple regression models but also in the multiple
regression models that will be introduced in Chapter 5.

$$R^2 = r_{y\hat{y}}^2 \qquad (4.8)$$

4.2.3 The Food Expenditure Example and the CORREL Function

We create a template to compute goodness-of-fit measures based on our estimated food
expenditure model.

Insert a new worksheet by selecting the Insert Worksheet tab at the lower-left corner of your
screen. Name it Correlation Analysis and R2.


Create the following template (in the last column, you will find the numbers of the equations used
in the template):

A B C
1 Data Input SS Residual= ='Food Regression'!C13
2 SS Total = ='Food Regression'!C14
3
4 Computed Values R²= =1-C1/C2 (4.6)
5 rxy= =CORREL('food data'!B2:B41,'food data'!A2:A41)
6 r²xy= =C5^2 (4.7)
7 ryy-hat= =CORREL('food data'!A2:A41,'Food Regression'!B25:B64)
8 r²yy-hat= =C7^2 (4.8)

The sample correlation coefficients in cells C5 and C7 are computed using the CORREL
statistical function. CORREL returns the correlation coefficient between two data sets. The
general syntax of this function is:

=CORREL(cell_range1, cell_range2)

In cell C5, we compute the correlation coefficient between the x and y values, which we find in the
food data worksheet. In cell C7, we compute the correlation coefficient between the y and ŷ values;
the latter are found in the Food Regression worksheet, under the column labeled "Predicted Y"
of the residual output.
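The sketch below mimics CORREL with a hand-written Pearson correlation (correl is our own helper, not an Excel or Python library function) and checks (4.6), (4.7) and (4.8) on a small hypothetical data set, not the food data.

```python
import math


def correl(a, b):
    """Pearson sample correlation coefficient, the quantity CORREL returns."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical toy data (not the food expenditure data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least squares fit of y on x
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b1 = y_bar - b2 * x_bar
y_hat = [b1 + b2 * xi for xi in x]

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - sse / sst                                  # (4.6)
print(r2, correl(x, y) ** 2, correl(y, y_hat) ** 2)  # (4.6), (4.7), (4.8) agree
```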

Here are the results you should get (see also p. 138 of Principles of Econometrics, 4e):

A B C
1 Data Input SS Residual = 304505.2
2 SS Total = 495132.2
3
4 Computed Values R² = 0.385002
5 rxy = 0.620485
6 r²xy = 0.385002
7 ryy-hat = 0.620485
8 r²yy-hat = 0.385002

Note that ryŷ and R² are actually reported in the summary output of your regression analysis, in
cells B4:B5 (ryŷ is labeled "Multiple R" and R² is called by its familiar name, "R
Square").
A B
1 SUMMARY OUTPUT
2
3 Regression Statistics
4 Multiple R 0.620485472
5 R Square 0.385002221
6 Adjusted R Square 0.368818069
7 Standard Error 89.51700429
8 Observations 40

4.3 THE EFFECTS OF SCALING THE DATA

In our food data worksheet, weekly food expenditure (y values) are recorded in dollars while
weekly income (x values) are recorded in units of $100.

Recall our estimated regression model. Below the estimated values for b1 and b2, we report their
estimated standard errors, se(b1) and se(b2):

$$\underset{(\mathrm{se})}{\hat{y}_i} = \underset{(43.41)}{83.42} + \underset{(2.09)}{10.21}\,x_i \qquad (4.9)$$

Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $100, weekly food expenditure is expected
to increase by 10.21 units, i.e. $10.21. The interpretation of the estimated intercept coefficient is
as follows: weekly food expenditure for a household with zero income is estimated at $83.42.

4.3.1 Changing the Scale of x

Let x* = 100x. We change the scale of measurement of our x values so that weekly income is
now recorded in dollars.

Go back to your food data worksheet. In cell D1, enter the column label x*=100x. In cell D2, enter
the formula =100*B2; copy it to cells D3:D41. Here is how your table should look (only the first
five values are shown below):
A B C D
1 food_exp income x*=100x
2 115.22 3.69 369
3 135.98 4.39 439
4 119.34 4.75 475
5 114.96 6.03 603
6 187.05 12.47 1247

We want to re-estimate the food expenditure model using our original y values and our re-scaled
x* values.

In the Regression dialog box, the Input Y Range should be A2:A41, and the Input X Range
should be D2:D41. Select New Worksheet Ply and name it Food Regression 100x (you do not
need to select Residuals).

[Figure: Regression dialog box, with New Worksheet Ply set to Food Regression 100x]

The results of your re-estimated regression model should be as reported below:

$$\underset{(\mathrm{se})}{\hat{y}_i} = \underset{(43.41)}{83.42} + \underset{(0.0209)}{0.1021}\,x_i^{*} \qquad (4.10)$$

Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $1, weekly food expenditure is expected to
increase by 0.1021 units, i.e. $0.1021 or 10.21 cents. Note that this is equivalent to saying that
as weekly income increases by $100, weekly food expenditure is expected to increase by $10.21;
rescaling the data does not affect the measurement of the underlying relationship.

4.3.2 Changing the Scale of y


Let y* = y/100. We change the scale of measurement of our y values so that weekly food
expenditure is now recorded in $100 units. We keep our x values at their original scale of
measurement, in which weekly income is also recorded in $100 units.

Go back to your food data worksheet. In cell E1, enter the column label y*=y/100. In cell E2, enter
the formula =A2/100; copy it to cells E3:E41. Here is how your table should look (only the first
five values are shown below):

A B C D E
1 food_exp income x*=100x y*=y/100
2 115.22 3.69 369 1.1522
3 135.98 4.39 439 1.3598
4 119.34 4.75 475 1.1934
5 114.96 6.03 603 1.1496
6 187.05 12.47 1247 1.8705

We want to re-estimate the food expenditure model using our original x values and our re-scaled
y* values.

In the Regression dialog box, the Input Y Range should be E2:E41, and the Input X Range
should be B2:B41. Select New Worksheet Ply and name it Food Regression divided by 100.

[Figure: Regression dialog box, with New Worksheet Ply set to Food Regression divided by 100]

The results of your re-estimated regression model should be as reported below:

$$\underset{(\mathrm{se})}{\hat{y}_i^{*}} = \underset{(0.4341)}{0.8342} + \underset{(0.0209)}{0.1021}\,x_i \qquad (4.11)$$

Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $100, weekly food expenditure is expected
to increase by 0.1021 of a $100 unit, i.e. $10.21. The interpretation of the estimated intercept
coefficient is as follows: weekly food expenditure for a household with zero income is estimated
at 0.8342 of a $100 unit, i.e. $83.42. Again, note that rescaling the data does not affect the
measurement of the underlying relationship.

4.3.3 Changing the Scale of x and y


Let x* = 4x and y* = 4y. We change the scale of measurement of our original x values and y
values so that food expenditure and income refer to a period of 4 weeks instead of 1. For
simplicity we will refer to monthly food expenditure and income values. Food expenditure (y
values) is still recorded in dollars, while income (x values) is still recorded in units of $100.

Go back to your food data worksheet. In cell F1, enter the column label x*=4x. In cell G1, enter the
column label y*=4y. In cell F2, enter the formula =4*B2. In cell G2, enter the formula =4*A2.
Copy the content of cells F2:G2 to cells F3:G41. Here is how your table should look (only the
first five values are shown below):

A B C D E F G
1 food_exp income x*=100x y*=y/100 x*=4x y*=4y
2 115.22 3.69 369 1.1522 14.76 460.88
3 135.98 4.39 439 1.3598 17.56 543.92
4 119.34 4.75 475 1.1934 19 477.36
5 114.96 6.03 603 1.1496 24.12 459.84
6 187.05 12.47 1247 1.8705 49.88 748.2

We want to re-estimate the food expenditure model using our newly rescaled x* and y* values.

In the Regression dialog box, the Input Y Range should be G2:G41, and the Input X Range
should be F2:F41. Select New Worksheet Ply and name it Regression 4x and 4y.

[Figure: Regression dialog box, with Input Y Range $G$2:$G$41, Input X Range $F$2:$F$41 and New Worksheet Ply set to Regression 4x and 4y]

The results of your re-estimated regression model should be as reported below:

$$\underset{(\mathrm{se})}{\hat{y}_i^{*}} = \underset{(173.64)}{333.66} + \underset{(2.09)}{10.21}\,x_i^{*} \qquad (4.12)$$

Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as monthly income increases by 1 unit, i.e. $100, monthly food expenditure is
expected to increase by 10.21 units, i.e. $10.21. The estimated monthly food expenditure for a
household with zero income is $333.66; this is 4 times the estimated weekly food expenditure for
a household with zero income (see equation (4.9)). Again, rescaling the data did not affect the
measurement of the underlying relationship.
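The scaling results of Sections 4.3.1 to 4.3.3 follow from the least squares formulas and can be checked on any data set. The sketch below uses a small hypothetical data set (not the food data) and a hand-written ols helper of our own.

```python
import math


def ols(x, y):
    """Least squares estimates (b1, b2) for the simple regression of y on x."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Hypothetical toy data: weekly income in $100 and weekly food expenditure in $
x = [3.0, 5.0, 8.0, 10.0, 14.0]
y = [110.0, 150.0, 190.0, 200.0, 260.0]

b1, b2 = ols(x, y)
b1_a, b2_a = ols([100 * v for v in x], y)                 # x* = 100x, Section 4.3.1
b1_b, b2_b = ols(x, [v / 100 for v in y])                 # y* = y/100, Section 4.3.2
b1_c, b2_c = ols([4 * v for v in x], [4 * v for v in y])  # x* = 4x, y* = 4y, Section 4.3.3

print(math.isclose(b1_a, b1), math.isclose(b2_a, b2 / 100))      # True True
print(math.isclose(b1_b, b1 / 100), math.isclose(b2_b, b2 / 100))  # True True
print(math.isclose(b1_c, 4 * b1), math.isclose(b2_c, b2))        # True True
print(round(4 * 83.41601, 2))  # 333.66, the intercept in (4.12)
```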

4.4 A LINEAR-LOG FOOD EXPENDITURE MODEL

In your food data worksheet, insert a column to the right of the income column B (see Section
1.4 for more details on how to do that). In cells C1:C2, enter the following column label and
formula.
C
1 ln(income)
2 =LN(B2)

Copy the content of cell C2 to cells C3:C41. Here is how your table should look (only the first
five values are shown below):

A B C
1 food_exp income ln(income)
2 115.22 3.69 1.305626458
3 135.98 4.39 1.479329227
4 119.34 4.75 1.558144618
5 114.96 6.03 1.796747011
6 187.05 12.47 2.52332576

4.4.1 Estimating the Model


We estimate the following linear-log model for food expenditure:

$$FOOD\_EXP = \beta_1 + \beta_2 \ln(INCOME) + e \qquad (4.13)$$

In the Regression dialog box, the Input Y Range should be A2:A41, and the Input X Range
should be C2:C41. Select New Worksheet Ply, name it Log-Linear Food Model and do check
the box next to Residuals. Finally select OK.

[Figure: Regression dialog box, with Input Y Range $A$2:$A$41, Input X Range $C$2:$C$41 and New Worksheet Ply set to Log-linear Food Model]

The result is (matching the one reported on p. 144 of Principles of Econometrics, 4e):

A B C D E F G H I
1 SUMMARY OUTPUT
2
3 Regression Statistics
4 Multiple R 0.597084978
5 R Square 0.356510471
6 Adjusted R Square 0.339576536
7 Standard Error 91.56711026
8 Observations 40
9
10 ANOVA
11 df SS MS F Significance F
12 Regression 1 176519.7971 176519.7971 21.05301996 4.75993E-05
13 Residual 38 318612.3559 8384.535682
14 Total 39 495132.153
15
16 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
17 Intercept -97.18641517 84.23744235 -1.153719919 0.255620028 -267.7162004 73.34337005 -267.7162004 73.34337005
18 X Variable 1 132.1658424 28.80461184 4.588357 4.75993E-05 73.85395 190.47773 73.85395 190.47773

Note that your summary output should be followed by a RESIDUAL OUTPUT table. This last
table contains a column of Predicted Y (fitted) values and a column of Residuals. We use
the fitted values in the next section.

A B C
22 RESIDUAL OUTPUT
23
24 Observation Predicted Y Residuals
25 1 75.37280548 39.84719452
26 2 98.33037827 37.64962173
27 3 108.7470808 10.5929192
28 4 140.2821671 -25.3221671
29 5 236.3110594 -49.2610594
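The LN column and the Predicted Y values can be reproduced in Python. This sketch uses the first five incomes and the coefficient estimates copied from the summary output; small differences in the last digits can arise because the coefficients are rounded.

```python
import math

# First five weekly incomes (in $100) from the food data worksheet
income = [3.69, 4.39, 4.75, 6.03, 12.47]
ln_income = [math.log(v) for v in income]       # like =LN(B2) in column C

# Least squares estimates from the Log-Linear Food Model summary output
b1, b2 = -97.18641517, 132.1658424
predicted = [b1 + b2 * v for v in ln_income]    # the "Predicted Y" column

for inc, li, yhat in zip(income, ln_income, predicted):
    print(f"{inc:6.2f} {li:.9f} {yhat:.4f}")
```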

4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship

Go back to your food data worksheet and select A2:B41. Select the Insert tab located next to the
Home tab. In the Charts group of commands select Scatter, and then Scatter with only
Markers.

The result is:


106 Chapter 4

[Figure: default scatter plot with food expenditure (0 to 700) on the horizontal axis, income (0 to 40) on the vertical axis, and the legend "Series1"]

You can see that our food expenditure values are on the horizontal axis and income values are on the vertical axis; we would like to swap them and edit our chart as we did in Section 2.1.

The result is (see also Figure 4.6 on p. 144 of Principles of Econometrics, 4e):

[Figure: scatter plot of food expenditure (vertical axis) against weekly income in $100 (horizontal axis, 0 to 40)]

Finally, we add the fitted linear-log relationship to our scatter plot. Right-click in the middle of the chart area of your scatter plot and select Select Data. In the Legend Entries (Series) window of the Select Data Source dialog box, select the Add button. In the Series name window, type Fitted Linear-Log Relationship. For the Series X values, select B2:B41 from the food data worksheet; for the Series Y values, select B25:B64 (the Predicted Y values) from the Log-Linear Food Model worksheet. Finally, select OK. The Fitted Linear-Log Relationship series has been added to your graph.


Before you close the Select Data Source dialog box, select Series1 and then Edit. Type the name Actual in the Series name window and select OK. In the Select Data Source window that reappears, select OK again.


Make sure your chart is selected so that the Chart Tools are visible. In the Layout tab, go to the Labels group of commands. Select the Legend button and choose either of the Overlay Legend options. Grab your legend with your cursor and move it to the upper left corner of your chart area.

Finally, we want to reformat our Fitted Linear-Log Relationship series. Select the plotted series in your chart area, right-click and select Format Data Series. A Format Data Series dialog box pops up. Select Line Color and Solid line, and change the line color to something different from the Actual series points. Select Marker Options, and change the Marker Type from Automatic to None. Select Close.


The result is (see also Figure 4.6 on p. 144 of Principles of Econometrics, 4e):
[Figure: scatter plot of food expenditure against weekly income in $100 (0 to 40), with legend entries "Actual" and "Fitted Linear-Log Relationship"]

4.5 USING DIAGNOSTIC RESIDUAL PLOTS

4.5.1 Random Residual Pattern

Consider the following simple linear regression model:

$$y = 1 + x + e \tag{4.14}$$

First, 300 pairs of $x_i$ and $e_i$ values are created using random number generators, in the same way we artificially generated variables in Sections 2.4 and 3.1.4. The variable x is simulated, using a random number generator, to be evenly, or uniformly, distributed between 0 and 10. The error term e is simulated to be uncorrelated, homoskedastic, and drawn from a standard normal distribution, or $e \sim N(0,1)$. We generate these simulated observations next.
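A Python sketch of the same data-generating process may help clarify what the Excel steps produce. This is an illustration only. Python's random module stands in for Excel's Random Number Generation tool, so these draws will not match yours, and the function name and seed value are arbitrary choices.

```python
import random


def simulate_random_residual_data(n=300, seed=12345):
    """Draw n observations from y = 1 + x + e, x ~ Uniform(0, 10), e ~ N(0, 1)."""
    rng = random.Random(seed)                       # seeded so the draw can be repeated
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]  # column A: uniform between 0 and 10
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]     # column B: standard normal errors
    y = [1.0 + xi + ei for xi, ei in zip(x, e)]     # column C: same as =1+A2+B2
    return x, e, y


x, e, y = simulate_random_residual_data()
print(len(y), min(x) >= 0.0, max(x) <= 10.0)
```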

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it Random Residual.


In cells A1:B1 of your Random Residual worksheet, enter the following column labels.

     A    B
1    x    e

Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.


The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.

A Random Number Generation dialog box pops up. Set the Number of Variables to 1 and the Number of Random Numbers to 300. Choose the Uniform distribution with parameters 0 and 10, so that x is uniformly distributed between 0 and 10. Select the Output Range in the Output options section, and specify it to be A2:A301 in your Random Residual worksheet. Finally, select OK.


Repeat the procedure to draw a random sample of 300 error terms from a standard normal distribution: choose the Normal distribution with Mean = 0 and Standard deviation = 1. Select the Output Range in the Output options section, and specify it to be B2:B301 in your Random Residual worksheet. Finally, select OK.


In cells C1:C2 of your Random Residual worksheet, enter the following column label and formula.

     C
1    y
2    =1+A2+B2

Select cell C2 and copy it to cells C3:C301. Here is how our worksheet looks (only the first five
values are shown below):
     A          B          C
1    x          e          y
2    4.405957   0.998193   6.40415
3    9.518723   1.011883   11.53061
4    3.821223   -0.0083    4.812922
5    2.649922   -0.43201   3.217908
6    3.976562   0.25586    5.232422

Note that you will have drawn different random samples and will therefore have obtained different sample values for y.

Next, we apply the least squares estimator to these simulated observations and compute the least
squares residuals.

In the Regression dialog box, the Input Y Range should be C2:C301 and the Input X Range should be A2:A301. Select New Worksheet Ply, name it Simulated Model 1, and check the box next to Residual Plots. Finally, select OK.
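What Simulated Model 1 should show can be sketched in Python by fitting the straight line to freshly simulated data and checking the residuals. This is a minimal illustration that assumes the same design as above: n = 300, x uniform on 0 to 10, and standard normal errors. The helper name ols_line and the seed are hypothetical choices, and the numbers will differ from any Excel draw.

```python
import random


def ols_line(x, y):
    """Least squares fit of y = b1 + b2*x; returns (b1, b2, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b2 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b1 = y_bar - b2 * x_bar
    residuals = [b - (b1 + b2 * a) for a, b in zip(x, y)]
    return b1, b2, residuals


rng = random.Random(2011)                             # arbitrary seed
x = [rng.uniform(0.0, 10.0) for _ in range(300)]
y = [1.0 + xi + rng.gauss(0.0, 1.0) for xi in x]      # correctly specified model
b1, b2, residuals = ols_line(x, y)

# Least squares residuals sum to zero and are uncorrelated with x, so a
# residual plot against x should show no systematic pattern here.
print(round(b1, 3), round(b2, 3), round(sum(residuals), 10))
```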


In addition to the Summary Output you now have a Residual Output table and a Residual Plot
in your new worksheet.

     A              B              C
22   RESIDUAL OUTPUT
23
24   Observation    Predicted Y    Residuals
25   1              5.373941992    1.030208394
26   2              10.5727117     0.957894741
27   3              4.779371302    0.033550994
28   4              3.58836801     -0.370460363
29   5              4.937323536    0.295098435

[Figure: "X Variable 1 Residual Plot" next to the table, residuals plotted against X Variable 1]



After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.7 on p. 146 of Principles of Econometrics, 4e):

[Figure: "Simulated Linear Model Residuals", residuals plotted against x (0 to 10)]

4.5.2 Heteroskedastic Residual Pattern

Go back to your food data worksheet, select your scatter plot of food expenditure-income data
points and fitted linear-log relationship and make a copy of it. Right-click in the middle of the
copy of your chart. Select Select Data. In the Legend Entries (Series) window of the Select
Data Source dialog box, select the Fitted Linear-Log Relationship series, and then the Remove
button.

Next, select the Actual series, and then the Edit button. In the Edit Series window, delete the old Series name and re-specify the Series Y values to be C25:C64 (the Residuals) from the Log-Linear Food Model worksheet. Finally, select OK twice.


The result is (see also Figure 4.8 on p. 146 of Principles of Econometrics, 4e):
[Figure: "Linear-Log Model Residuals", residuals plotted against income in $100 (0 to 40)]

4.5.3 Detecting Model Specification Errors

Consider the following quadratic relationship:

$$y = 15 - 4x^2 + e \tag{4.15}$$

First, 51 pairs of $x_i$ and $e_i$ values are created, in the same way we artificially generated variables in Sections 2.4, 3.1.4 and 4.5.1. The variable x takes 51 evenly spaced values, from 2.5 down to -2.5 in steps of 0.1. The error term e is simulated, using a random number generator, to be uncorrelated, homoskedastic, and drawn from a normal distribution with mean 0 and variance 4, or $e \sim N(0,4)$. We generate these simulated observations next.
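A short Python sketch of this experiment may make the point of the section concrete. When the true relationship is quadratic but a straight line is fitted, the omitted $-4x^2$ term ends up in the residuals. The x grid follows the worksheet formula =2.5-((A2-1)/10). Python's random module replaces Excel's generator, and the seed is an arbitrary assumption.

```python
import random

rng = random.Random(7)                                # arbitrary seed
x = [2.5 - (i - 1) / 10 for i in range(1, 52)]        # 2.5, 2.4, ..., -2.5
e = [rng.gauss(0.0, 2.0) for _ in x]                  # N(0, 4): standard deviation 2
y = [15 - 4 * xi ** 2 + ei for xi, ei in zip(x, e)]   # true model (4.15)

# Fit the misspecified straight line y = b1 + b2*x by least squares
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b2 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b1 = y_bar - b2 * x_bar
residuals = [b - (b1 + b2 * a) for a, b in zip(x, y)]

# Residuals are large and negative at both ends of the x range and positive
# in the middle: the inverted-U shape of the omitted quadratic term.
print(round(residuals[0], 2), round(residuals[25], 2), round(residuals[-1], 2))
```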

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it Specification Error Residual.


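In cells A1:C3 of your Specification Error Residual worksheet, enter the following column labels, observation numbers and formula. Column A counts the observations, and the formula in column B turns each count into a value of x.

     A    B                    C
1         x                    e
2    1    =2.5-((A2-1)/10)
3    2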

Select cells A2:A3 and move your cursor to the lower right corner of your selection until it turns into a skinny cross. Then left-click, hold, and drag it down to cell A52.


Copy cell B2 to cells B3:B52. Your table should look like the one below (only the first five values are shown).

     A    B      C
1         x      e
2    1    2.5
3    2    2.4
4    3    2.3
5    4    2.2
6    5    2.1

Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.

We draw one random sample of 51 error terms from a normal distribution with Mean 0 and Standard Deviation 2. Select the Output Range in the Output options section, and specify it to be C2:C52 in your Specification Error Residual worksheet. Finally, select OK.


In cells D1:D2 of your Specification Error Residual worksheet, enter the following column label and formula.

     D
1    y
2    =15-4*(B2^2)+C2

Select cell D2 and copy it to cells D3:D52. Here is how our worksheet looks (only the first five
values are shown below):
     A    B      C          D
1         x      e          y
2    1    2.5    2.723068   -7.27693
3    2    2.4    -0.50477   -8.54477
4    3    2.3    1.115236   -5.04476
5    4    2.2    2.916886   -1.44311
6    5    2.1    2.982706   0.342706

Note that you will have drawn different random samples and will therefore have obtained different sample values for y.

Next, we apply the least squares estimator to these simulated observations and compute the least
squares residuals.

In the Regression dialog box, the Input Y Range should be D2:D52 and the Input X Range should be B2:B52. Select New Worksheet Ply, name it Simulated Model 2, and check the box next to Residual Plots. Finally, select OK.


In addition to the Summary Output you now have a Residual Output table and a Residual Plot
in your new worksheet.
     A              B              C
22   RESIDUAL OUTPUT
23
24   Observation    Predicted Y    Residuals
25   1              5.957061537    -13.23399347
26   2              5.976552712    -14.52132
27   3              5.996043887    -11.04080784
28   4              6.015535062    -7.458649129
29   5              6.035026236    -5.692320172

[Figure: "X Variable 1 Residual Plot" next to the table]

After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.9 on p. 147 of Principles of Econometrics, 4e):
[Figure: "Misspecified Model Residuals", residuals (-20 to 15) plotted against x]

4.6 ARE THE REGRESSION ERRORS NORMALLY DISTRIBUTED?

Our analysis of normality of the regression errors will include a histogram of the residuals and the
Jarque-Bera test for normality.

4.6.1 Histogram of the Residuals

Go back to your Food Regression worksheet. If you do not see the Food Regression tab, it is probably scrolled out of view. Use either of the left-arrow buttons at the lower left corner of your screen until the first worksheets you worked with can be seen again. (If the worksheet you need to go back to is a recently created one, use the right-arrow buttons.)



Next to the column of Residuals in the residual output section of the worksheet, we will create a BIN column. In cell D24, type BIN. The bin values determine the range of residual values counted in each column of the histogram, and they must be given in ascending order. A residual is counted in a particular bin if it is less than or equal to that bin value and greater than the previous (lower) bin value.
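The bin rule can be expressed as a small Python function, which may help when checking the counts Excel reports. This sketch assumes the Histogram tool's usual behavior: each value goes to the first bin whose upper limit it does not exceed, and values above the last bin go to a final "More" row. The function name is hypothetical.

```python
def excel_histogram_counts(values, bins):
    """Count values per bin (a sketch of the Histogram tool's assignment rule).

    A value falls in bin k when bins[k-1] < value <= bins[k]; the first bin
    takes everything <= bins[0], and values above the last bin go to 'More'.
    """
    counts = [0] * (len(bins) + 1)        # last slot plays the role of the "More" row
    for v in values:
        for k, upper in enumerate(bins):
            if v <= upper:
                counts[k] += 1
                break
        else:
            counts[-1] += 1               # larger than every bin value
    return counts


bins = list(range(-225, 226, 25))         # -225, -200, ..., 225, as in D25:D43
print(excel_histogram_counts([-230, -225, -224, 0, 0.5, 300], bins))
```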

Fill in the bin values as shown below. All you need to do is enter the first two values, -225 and -200, and select cells D25:D26. Move your cursor to the lower right corner of your selection until it turns into a skinny cross, then left-click, hold, and drag it down to cell D43. Excel recognizes the series and completes it automatically.
     D
24   BIN
25   -225
26   -200
27   -175
28   -150
29   -125
30   -100
31   -75
32   -50
33   -25
34   0
35   25
36   50
37   75
38   100
39   125
40   150
41   175
42   200
43   225

Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.


A Histogram dialog box pops up. For the Input Range, specify C25:C64; for the Bin Range, specify D25:D43. The Input Range is the data Excel examines to determine how many values fall into each bin of the Bin Range. Check the New Worksheet Ply option and name it Residuals Histogram, then check the box next to Chart Output. Finally, select OK.

Select the columns in your chart area, right-click and select Format Data Series. The Series Options tab of the Format Data Series dialog box should be open. Select the Gap Width slider and move it to the far left, towards No Gap.

Go to the Border Color tab and select Solid line; choose a different Color if you like. Select Close.


Finally, delete the Legend, and increase the size of the Chart area (see Section 2.3.4 for more
details on that). The result should be very similar to Figure 4.10 on p. 148 of Principles of
Econometrics, 4e:
[Figure: "Histogram" of the residuals, with bins from -225 to 225 on the horizontal axis]

4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
When the residuals are normally distributed, the Jarque-Bera statistic (JB) approximately follows a chi-squared distribution with m = 2 degrees of freedom:

$$JB = \frac{N}{6}\left(S^2 + \frac{(K-3)^2}{4}\right) \sim \chi^2_{(m=2)} \tag{4.16}$$

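where $S = \tilde{\mu}_3/\tilde{\sigma}^3$ is a measure of skewness and $K = \tilde{\mu}_4/\tilde{\sigma}^4$ is a measure of kurtosis, with

$$\tilde{\sigma} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{e}_i - \bar{\hat{e}})^2}{N}} \tag{4.17}$$

$$\tilde{\mu}_3 = \frac{\sum_{i=1}^{N} (\hat{e}_i - \bar{\hat{e}})^3}{N} \tag{4.18}$$

$$\tilde{\mu}_4 = \frac{\sum_{i=1}^{N} (\hat{e}_i - \bar{\hat{e}})^4}{N} \tag{4.19}$$

and N is the sample size.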
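As a cross-check on the spreadsheet template, equations (4.16)-(4.19) translate directly into a few lines of Python. This minimal sketch assumes the residuals are already available as a list. The function name jarque_bera is hypothetical, and the sketch divides by N exactly as (4.17)-(4.19) do.

```python
import math


def jarque_bera(residuals):
    """Return (S, K, JB) computed as in equations (4.16)-(4.19)."""
    n = len(residuals)
    mean = sum(residuals) / n                          # zero for OLS residuals with an intercept
    dev = [r - mean for r in residuals]
    sigma = math.sqrt(sum(d ** 2 for d in dev) / n)    # (4.17)
    mu3 = sum(d ** 3 for d in dev) / n                 # (4.18)
    mu4 = sum(d ** 4 for d in dev) / n                 # (4.19)
    s = mu3 / sigma ** 3                               # skewness
    k = mu4 / sigma ** 4                               # kurtosis
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)           # (4.16)
    return s, k, jb


print(jarque_bera([-1.0, 1.0, -1.0, 1.0]))
```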

If the hypothesis of normally distributed residuals is true, there is a 100α percent chance that the computed JB statistic is equal to or greater than the chi-square critical value $\chi^2_{(1-\alpha,m)}$. If the computed JB statistic is equal to or greater than $\chi^2_{(1-\alpha,m)}$, this is evidence that our hypothesis of normally distributed errors is false, and we reject it.
[Figure: chi-squared distribution with m degrees of freedom; H0 is rejected in the region to the right of the critical value $\chi^2_{(1-\alpha,m)}$]

We will create a template for the Jarque-Bera test for normality. But before we do that, we need
to go back to our Food Regression worksheet to perform intermediate calculations.


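Before we compute the measure of skewness S and the measure of kurtosis K, note that the least squares residuals from a model with an intercept sum to zero, so $\bar{\hat{e}} = 0$. The numerators of equations (4.17)-(4.19), $\sum(\hat{e}_i - \bar{\hat{e}})^2$, $\sum(\hat{e}_i - \bar{\hat{e}})^3$ and $\sum(\hat{e}_i - \bar{\hat{e}})^4$, therefore simplify to $\sum \hat{e}_i^2$, $\sum \hat{e}_i^3$ and $\sum \hat{e}_i^4$.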

To the right of the residual output section, create the following table:

     F             G             H
24   Residuals^2   Residuals^3   Residuals^4
25   =C25^2        =C25^3        =C25^4

Copy cells F25:H25 to cells F26:H64. Your worksheet should now look like the one below (only
partly shown):
[Worksheet excerpt: columns F:H of rows 24 to 29, showing Residuals^2, Residuals^3 and Residuals^4 for the first five observations]

Now, we are ready to create our Jarque-Bera test template.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Jarque-Bera Tests.


Create the following template to perform Jarque-Bera tests:


     A                  B                      C
1    Data Input         N =                    ='Food Regression'!B8
2                       α =
3                       df or m =              2
4
5    Computed Values    σ-tilde =              =SQRT(SUM('Food Regression'!F25:F64)/C1)     (4.17)
6                       μ3-tilde =             =SUM('Food Regression'!G25:G64)/C1            (4.18)
7                       μ4-tilde =             =SUM('Food Regression'!H25:H64)/C1            (4.19)
8                       S =                    =C6/C5^3
9                       K =                    =C7/C5^4
10                      χ²-critical value =    =CHIINV(C2,C3)
11
12   Jarque-Bera Test   JB =                   =(C1/6)*(C8^2+((C9-3)^2)/4)                   (4.16)
13                      Conclusion =           =IF(C12>=C10,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
14                      p-value =              =CHIDIST(C12,C3)
15                      Conclusion =           =IF(C14<=C2,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")

The χ²-critical value is computed using the CHIINV statistical function. For our purpose, this function's syntax is:

=CHIINV(α,m)

where α is the level of significance of the Jarque-Bera test, and m is the degrees of freedom of the chi-squared distribution.

The p-value is computed using the CHIDIST statistical function. For our purpose, this function's syntax is:

=CHIDIST(χ²-value,m)

where χ²-value is the value of the test statistic (here, the computed JB statistic) for which we are computing the p-value, and m is the degrees of freedom of the chi-squared distribution.
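With m = 2 degrees of freedom, the chi-squared distribution has a closed form that makes CHIINV and CHIDIST easy to check by hand. Its right-tail probability is $P(\chi^2_{(2)} > x) = e^{-x/2}$, so the critical value is $-2\ln\alpha$. The Python sketch below uses these formulas, which hold only for m = 2. The function names are hypothetical. The inputs are the rounded S and K values reported for the food expenditure regression in the results that follow.

```python
import math


def chi2_critical_df2(alpha):
    """Right-tail critical value for chi-squared with 2 df (should agree with =CHIINV(alpha, 2))."""
    return -2.0 * math.log(alpha)


def chi2_pvalue_df2(stat):
    """Right-tail probability for chi-squared with 2 df (should agree with =CHIDIST(stat, 2))."""
    return math.exp(-stat / 2.0)


def jb_statistic(n, s, k):
    """Equation (4.16) from the sample size N, skewness S and kurtosis K."""
    return n / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)


alpha = 0.05
jb = jb_statistic(40, -0.097319, 2.9890333)   # rounded S and K, food regression
crit = chi2_critical_df2(alpha)
p = chi2_pvalue_df2(jb)
decision = "Reject" if jb >= crit else "Do not reject"
print(round(jb, 5), round(crit, 5), round(p, 5), decision)
```

Both decision rules in the template (JB against the critical value, and the p-value against α) give the same conclusion, because the p-value falls below α exactly when JB exceeds the critical value.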

At α = 0.05, the results of the Jarque-Bera test are (see p. 148 of Principles of Econometrics, 4e):
     A                  B                      C
1    Data Input         N =                    40
2                       α =                    0.05
3                       df or m =              2
4
5    Computed Values    σ-tilde =              87.250383
6                       μ3-tilde =             -64639.66
7                       μ4-tilde =             173220834
8                       S =                    -0.097319
9                       K =                    2.9890333
10                      χ²-critical value =    5.9914645
11
12   Jarque-Bera Test   JB =                   0.0633402
13                      Conclusion =           Do not reject the hypothesis of normally distributed errors
14                      p-value =              0.9688261
15                      Conclusion =           Do not reject the hypothesis of normally distributed errors

4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food
Expenditure Model

We first go back to our Log-Linear Food Model worksheet to perform intermediate calculations.

To the right of the residual output section, create the following table:

     F             G             H
24   Residuals^2   Residuals^3   Residuals^4
25   =C25^2        =C25^3        =C25^4

Copy cells F25:H25 to cells F26:H64. Your worksheet should now look like the one below (only
partly shown):
[Worksheet excerpt: columns F:H of rows 24 to 29, showing Residuals^2, Residuals^3 and Residuals^4 for the first five observations of the Log-Linear Food Model]

Now, we are ready to modify a few cell references in our Jarque-Bera test template.

Go to the Jarque-Bera Tests worksheet.

Replace all references to the Food Regression worksheet with references to the Log-Linear Food Model worksheet, as shown below.

     A                  B                      C
1    Data Input         N =                    ='Log-Linear Food Model'!B8
2                       α =
3                       df or m =              2
4
5    Computed Values    σ-tilde =              =SQRT(SUM('Log-Linear Food Model'!F25:F64)/C1)
6                       μ3-tilde =             =SUM('Log-Linear Food Model'!G25:G64)/C1
7                       μ4-tilde =             =SUM('Log-Linear Food Model'!H25:H64)/C1
8                       S =                    =C6/C5^3
9                       K =                    =C7/C5^4
10                      χ²-critical value =    =CHIINV(C2,C3)
11
12   Jarque-Bera Test   JB =                   =(C1/6)*(C8^2+((C9-3)^2)/4)
13                      Conclusion =           =IF(C12>=C10,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
14                      p-value =              =CHIDIST(C12,C3)
15                      Conclusion =           =IF(C14<=C2,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")

At α = 0.05, the results of the Jarque-Bera test are (see p. 149 of Principles of Econometrics, 4e):

     A                  B                      C
1    Data Input         N =                    40
2                       α =                    0.05
3                       df or m =              2
4
5    Computed Values    σ-tilde =              89.248579
6                       μ3-tilde =             99250.90
7                       μ4-tilde =             203335374
8                       S =                    0.1396145
9                       K =                    3.2048499
10                      χ²-critical value =    5.9914645
11
12   Jarque-Bera Test   JB =                   0.1998875
13                      Conclusion =           Do not reject the hypothesis of normally distributed errors
14                      p-value =              0.9048883
15                      Conclusion =           Do not reject the hypothesis of normally distributed errors

4.7 POLYNOMIAL MODELS: AN EMPIRICAL EXAMPLE

Open the Excel file wa-wheat. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE
Chapter 4 Excel file, name it wa-wheat data, and in it, copy the data set you just opened.


This data set gives average wheat yield for different regions of Australia for the period 1950-1997. Time is measured using the values 1, 2, ..., 48 in column E. We would like to plot the
yield data for the Greenough Shire area, reported in column D.

4.7.1 Scatter Plot of Wheat Yield over Time

Select the Insert tab located next to the Home tab. Select D2:E49. In the Charts group of
commands select Scatter, and then Scatter with only Markers.


The result is:


[Figure: default scatter plot with yield (0 to 2.5) on the horizontal axis, time (0 to 60) on the vertical axis, and the legend "Series1"]

You can see that our yield values are on the horizontal axis and our time values are on the vertical axis; we would like to swap them, as we did in Chapter 2 with our plot of the food expenditure data. Select the points on your plot, right-click and select Select Data.


A Select Data Source dialog box pops up. Select Edit.



In the Edit Series dialog box, highlight the text from the Series X values window. Press the
Delete key on your keyboard. Select E2:E49. Highlight and delete the text from the Series Y
values window. Select D2:D49. Select OK.


The Select Data Source dialog box reappears. Select OK again. You have just told Excel that the time values are the X values and the yield values are the Y values, not the other way around.

After editing your chart as you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.11 on p. 150 of Principles of Econometrics, 4e):

[Figure: wheat yield plotted against time (0 to 50)]

4.7.2 The Linear Equation Model

4.7.2a Estimating the Model

We start by estimating the following linear equation model:

$$YIELD_t = \beta_1 + \beta_2 TIME_t + e_t \tag{4.20}$$

In the Regression dialog box, the Input Y Range should be D2:D49 and the Input X Range should be E2:E49. Select New Worksheet Ply, name it Linear Equation Model, and check the box next to Residual Plots.


The results are (only part of the residual output section is shown below; the residual plot is not shown at all):

     A                      B
1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R             0.805849601
5    R Square               0.64939358
6    Adjusted R Square      0.641771701
7    Standard Error         0.218692234
8    Observations           48
9
10   ANOVA
11                          df    SS            MS            F             Significance F
12   Regression             1     4.074859899   4.074859899   85.20124832   4.87577E-12
13   Residual               46    2.200009496   0.047826293
14   Total                  47    6.274869395
15
16                          Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
17   Intercept              0.637777837    0.064130508      9.944999006   4.6492E-13    0.508689822   0.766865852
18   X Variable 1           0.021031942    0.002278539      9.230452221   4.87577E-12   0.016445482   0.025618402

(Columns H and I, Lower 95.0% and Upper 95.0%, repeat the 95% bounds.)

24   Observation    Predicted Y    Residuals
25   1              0.658809779    0.255290221
26   2              0.679841721    -0.007741721
27   3              0.700873663    0.018226337

The estimated linear equation model is (see also p. 150 of Principles of Econometrics, 4e):

$$\begin{aligned}\widehat{YIELD}_t &= 0.638 + 0.021\,TIME_t \\ (\text{se}) &\qquad (0.064) \quad\; (0.002)\end{aligned} \tag{4.21}$$
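The Predicted Y and Residuals columns follow directly from these estimates. As a quick check, the Python lines below recompute the first three rows of the residual output. They use the full-precision coefficients 0.637777837 and 0.021031942 from the Excel output and the first three Greenough Shire yields (0.9141, 0.6721 and 0.7191, at TIME = 1, 2 and 3). This is an illustration only, not an Excel step.

```python
# Full-precision coefficients from the Linear Equation Model output
b1, b2 = 0.637777837, 0.021031942

# First three Greenough Shire yields (column D), observed at TIME = 1, 2, 3
yields = [0.9141, 0.6721, 0.7191]

fitted = [b1 + b2 * t for t in range(1, len(yields) + 1)]   # Predicted Y
residuals = [y - f for y, f in zip(yields, fitted)]         # Residuals

for t, (f, r) in enumerate(zip(fitted, residuals), start=1):
    print(t, round(f, 9), round(r, 9))
```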

4.7.2b Residuals Plot

After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.12 on p. 150 of Principles of Econometrics, 4e):

[Figure: "Linear Yield Model Residuals Plot", residuals (-0.6 to 0.8) plotted against time (0 to 50)]

Note: to draw the horizontal axis below all the points, select the vertical axis on your chart, right-click, and select Format Axis. In the Format Axis dialog box, under the Axis Options panel, set Horizontal axis crosses to the Axis value -0.6. To draw a horizontal line at level 0 of the residual values, select the plot of residuals on your chart, right-click and select Add Trendline. Choose the Linear option, and Close. Because least squares residuals average zero and are uncorrelated with TIME, this linear trendline is the horizontal line at zero.
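A short Python sketch shows why the Linear trendline through the residuals is flat at zero. Least squares residuals have mean zero and are uncorrelated with the regressor, so refitting a straight line to them gives an intercept and slope of zero, up to rounding. For brevity, the example uses only the first five yields (0.9141, 0.6721, 0.7191, 0.7258, 0.7998). This is an assumption made purely for illustration, and the helper name is hypothetical.

```python
def ols_line(x, y):
    """Least squares intercept and slope of y = b1 + b2*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b2 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b2 * x_bar, b2


time = [1, 2, 3, 4, 5]
yields = [0.9141, 0.6721, 0.7191, 0.7258, 0.7998]

b1, b2 = ols_line(time, yields)
residuals = [y - (b1 + b2 * t) for t, y in zip(time, yields)]

# The linear trendline Excel would draw through the residuals
t1, t2 = ols_line(time, residuals)
print(round(t1, 12), round(t2, 12))
```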


4.7.3 The Cubic Equation Model

4.7.3a Estimating the Model

We start by estimating the following cubic equation model:

$$YIELD_t = \beta_1 + \beta_2 TIME_t^3 + e_t \tag{4.22}$$



Let TIMECUBE_t = TIME_t^3/1,000,000. The explanatory variable is redefined as the original explanatory variable cubed, and it is also rescaled (divided by 1,000,000) before the equation above is estimated.
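Rescaling a regressor by a constant does not change the fit. It only rescales the slope. The small Python sketch below illustrates this. It uses a hand-written least squares helper (`ols_simple`, a hypothetical name) and made-up yields instead of the wa-wheat data:

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope) of a least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical yields for TIME = 1..10 (illustration only, not wa-wheat data).
time = list(range(1, 11))
y = [0.91, 0.67, 0.72, 0.73, 0.80, 0.85, 0.88, 0.95, 1.02, 1.10]

time3 = [t ** 3 for t in time]                 # TIME cubed
timecube = [t ** 3 / 1_000_000 for t in time]  # TIMECUBE = TIME^3 / 1,000,000

a_raw, b_raw = ols_simple(time3, y)
a_scaled, b_scaled = ols_simple(timecube, y)

# The slope scales by 1,000,000; the intercept (and every fitted value) is unchanged.
print(math.isclose(b_scaled, b_raw * 1_000_000, rel_tol=1e-9))  # True
print(math.isclose(a_raw, a_scaled, rel_tol=1e-9))              # True
```

Dividing TIME^3 by 1,000,000 therefore multiplies the estimated slope by 1,000,000. This keeps the TIMECUBE coefficient in a convenient range.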

Go back to your wa-wheat data worksheet. In F1, enter the column label time3. In cell F2, enter the formula =(E2^3)/1000000; copy it to cells F3:F49. Here is how your table should look (only the first five values are shown below):

    D           E      F
1   greenough   time   time3
2   0.9141      1      0.000001
3   0.6721      2      0.000008
4   0.7191      3      0.000027
5   0.7258      4      0.000064
6   0.7998      5      0.000125

We want to re-estimate our wheat yield model using our original y values and our re-defined and
re-scaled x values.

In the Regression dialog box, the Input Y Range should be D2:D49, and the Input X Range should be F2:F49. Select New Worksheet Ply and name it Cubic Equation Model, and check the box next to Residual Plots.


The results are (only part of the residual output section is shown below):

    A                   B              C                D              E              F              G              H              I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R          0.866495734
5   R Square            0.750814858
6   Adjusted R Square   0.745397789
7   Standard Error      0.184367557
8   Observations        48
9
10  ANOVA
11                      df             SS               MS             F              Significance F
12  Regression          1              4.711265172      4.711265172    138.6016965    1.76303E-15
13  Residual            46             1.563604223      0.033991396
14  Total               47             6.274869395
15
16                      Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
17  Intercept           0.874116582    0.035630663      24.53270271    4.60223E-28    0.802395769    0.945837396    0.802395769    0.945837396
18  X Variable 1        9.68151584     0.822354527      11.77292217    1.76303E-15    8.026202058    11.33682962    8.026202058    11.33682962
19
20
21
22  RESIDUAL OUTPUT
23
24  Observation         Predicted Y    Residuals
25  1                   0.874126264    0.039973736
26  2                   0.874194034    -0.202094034
27  3                   0.874377983    -0.155277983

The estimated cubic equation model is (see also p. 151 of Principles of Econometrics, 4e):

$$\widehat{YIELD}_t = 0.874 + 9.68\,TIMECUBE_t \qquad (4.23)$$
$$(\text{se}) \qquad (0.036) \qquad (0.822)$$

4.7.3b Residuals Plot

When you choose the Residual Plots option in the Regression dialog box, Excel plots the residuals against the explanatory variable. In this case that variable is TIMECUBE, but we would like a plot of residuals against time instead. Select the data points in your chart, right-click and select Select Data. A Select Data Source dialog box pops up. Select Series1 and then Edit. In the Edit Series dialog box, change the Series X values reference to E2:E49. Finally, select OK twice.


After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.13 on p. 151 of Principles of Econometrics, 4e):

[Figure: Cubic Yield Model Residuals Plot, residuals plotted against Time]

4.8 LOG-LINEAR MODELS

4.8.1 A Growth Model

We would like to estimate the following growth model:

$$\ln(YIELD_t) = \beta_1 + \beta_2 TIME_t + e_t \qquad (4.24)$$

where y_t = ln(YIELD_t); that is, the dependent variable is redefined as the natural logarithm of the original dependent variable.

In your wa-wheat data worksheet, move your charts a little to the right if you like. In cell G1, enter the column label ln(greenough), and widen the column so the new label fits. In cell G2, enter the formula =ln(D2); copy it to cells G3:G49. Here is how your table should look (only the first five values are shown below):

    D           E      F        G
1   greenough   time   time3    ln(greenough)
2   0.9141      1      1E-06    -0.089815304
3   0.6721      2      8E-06    -0.39734814
4   0.7191      3      3E-05    -0.329754849
5   0.7258      4      6E-05    -0.320480784
6   0.7998      5      1E-04    -0.223393583

We want to re-estimate our wheat yield model using our original x values and our re-defined y
values.

In the Regression dialog box, the Input Y Range should be G2:G49, the Input X Range should
be E2:E49. Select New Worksheet Ply and name it Growth Model.


The result is:

    A                   B              C                D              E              F              G              H              I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R          0.785168587
5   R Square            0.61648971
6   Adjusted R Square   0.60815253
7   Standard Error      0.199164869
8   Observations        48
9
10  ANOVA
11                      df             SS               MS             F              Significance F
12  Regression          1              2.93313542       2.93313542     73.94463042    3.93229E-11
13  Residual            46             1.824665679      0.039666645
14  Total               47             4.757801099
15
16                      Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
17  Intercept           -0.343366453   0.058404196      -5.8791401     4.39317E-07    -0.460928001   -0.225804905   -0.460928001   -0.225804905
18  X Variable 1        0.017843872    0.002075084      8.599106374    3.93229E-11    0.013666943    0.022020801    0.013666943    0.022020801

The estimated growth model is (see also p. 153 of Principles of Econometrics, 4e):

$$\widehat{\ln(YIELD_t)} = -0.3434 + 0.0178\,TIME_t \qquad (4.25)$$
$$(\text{se}) \qquad (0.0584) \qquad (0.0021)$$

4.8.2 A Wage Equation

Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE
Chapter 4 Excel file, name it cps4_small data, and in it, copy the data set you just opened.


This data set gives information on hourly wages, years of education and other variables. Based on these data, we would like to estimate the following wage equation:

$$\ln(WAGE_i) = \beta_1 + \beta_2 EDUC_i + e_i \qquad (4.26)$$

where y_i = ln(WAGE_i); that is, the dependent variable is defined as the natural logarithm of the variable WAGE.

In cell M1 of the cps4_small data worksheet, enter the column label ln(wage). In cell M2, enter the formula =ln(A2); copy it to cells M3:M1001. Here is how your table should look (only the first five values are shown below; "?" marks a cell that is illegible in the source):

    A       B      C       D       E         F        G       H         I       J      K       L       M
1   wage    educ   exper   hrswk   married   female   metro   midwest   south   west   black   asian   ln(wage)
2   18.7    16     39      37      1         1        1       0         1       0      0       0       2.92852352
3   11.5    12     16      62      0         0        0       1         0       0      ?       0       2.44234704
4   15.04   16     13      40      1         0        1       0         0       1      ?       0       2.71071332
5   25.95   14     11      40      0         1        1       0         1       0      ?       0       3.25617151
6   24.03   12     51      40      1         0        1       0         0       0      0       0       3.17930305

We want to estimate our wage equation using our original x values and our re-defined y values.

In the Regression dialog box, the Input Y Range should be M2:M1001, the Input X Range
should be B2:B1001. Select New Worksheet Ply and name it Wage Equation.


The result is:



    A                   B              C                D              E              F              G              H              I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R          0.422142751
5   R Square            0.178204502
6   Adjusted R Square   0.17738106
7   Standard Error      0.526611364
8   Observations        1000
9
10  ANOVA
11                      df             SS               MS             F              Significance F
12  Regression          1              60.01584269      60.01584269    216.414051     1.74559E-44
13  Residual            998            276.7648898      0.277319529
14  Total               999            336.7807325
15
16                      Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
17  Intercept           1.609444466    0.086422944      18.622884      1.14645E-66    1.439852937    1.779035995    1.439852937    1.779035995
18  X Variable 1        0.090408247    0.006145615      14.71101802    1.74559E-44    0.078348438    0.102468056    0.078348438    0.102468056

The estimated wage equation is (see also p. 153 of Principles of Econometrics, 4e):

$$\widehat{\ln(WAGE_i)} = 1.6094 + 0.0904\,EDUC_i \qquad (4.27)$$
$$(\text{se}) \qquad (0.0864) \qquad (0.0061)$$

4.8.3 Prediction

For the natural logarithm the antilog is the exponential function, so a natural choice for prediction
in a log-linear model is:

$$\hat{y}_n = \exp(b_1 + b_2 x) \qquad (4.28)$$

An alternative and corrected predictor is:

$$\hat{y}_c = \exp(b_1 + b_2 x + \hat{\sigma}^2/2) \qquad (4.29)$$

where b1 and b2 are the estimated intercept and slope coefficients of the log-linear model, and σ̂² is the estimate of the error variance, i.e., the mean square residual (MS residual).
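Both predictors can be checked outside Excel. The minimal Python sketch below uses the unrounded Wage Equation estimates from the summary output above and x0 = 12, as in the prediction template:

```python
import math

# Unrounded Wage Equation estimates from the summary output above.
b1 = 1.609444466          # Intercept
b2 = 0.090408247          # X Variable 1 (EDUC)
sigma2_hat = 0.277319529  # MS Residual, the estimate of the error variance


def natural_predictor(x):
    """(4.28): y_n = exp(b1 + b2 * x)."""
    return math.exp(b1 + b2 * x)


def corrected_predictor(x):
    """(4.29): y_c = exp(b1 + b2 * x + sigma2_hat / 2)."""
    return math.exp(b1 + b2 * x + sigma2_hat / 2)


x0 = 12
print(round(natural_predictor(x0), 4))    # 14.7958
print(round(corrected_predictor(x0), 4))  # 16.9964
```

The corrected predictor is always the natural predictor multiplied by the constant exp(σ̂²/2) > 1. This is how the template's =C6*EXP(C4/2) computes it.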

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it Prediction in Log-Linear Model.

Create the following template to make predictions (the last column below gives the numbers of the equations used in the template):

    A                 B                          C
1   Data Input        x0 =                       12
2                     b1 =                       ='Wage Equation'!B17
3                     b2 =                       ='Wage Equation'!B18
4                     MS residual =              ='Wage Equation'!D13
5
6   Computed Values   natural predicted y0 =     =EXP(C2+C3*C1)       (4.28)
7                     corrected predicted y0 =   =C6*EXP(C4/2)        (4.29)

Here are the results you should get (see also p. 154 of Principles of Econometrics, 4e):

    A                 B                          C
1   Data Input        x0 =                       12
2                     b1 =                       1.609444
3                     b2 =                       0.090408
4                     MS residual =              0.27732
5
6   Computed Values   natural predicted y0 =     14.7958
7                     corrected predicted y0 =   16.99643

Next, we want to show graphically how the correction affects our prediction. Go to your
cps4_small data worksheet. Here are the formulas and labels you should enter (in the last row of
each of the tables below, you will find the numbers of the equations used):

    N      O
1   educ   yhatn
2   0      =EXP('Wage Equation'!$B$17+'Wage Equation'!$B$18*N2)
3   1      (4.28)

    P
1   yhatc
2   =O2*EXP('Wage Equation'!$D$13/2)
3   (4.29)

Select cells N2:N3 and move your cursor to the lower right corner of the selection until it turns into a skinny cross. Left-click, hold, and drag down to cell N23.

Select O2:P2 and copy their content to O3:P23. Here is how your table should look (only the first five values are shown below):

    N      O             P
1   educ   yhatn         yhatc
2   0      5.00003277    5.74370
3   1      5.473141211   6.28718
4   2      5.99101568    6.88208
5   3      6.557891994   7.53327
6   4      7.17840673    8.24607

Select the Insert tab next to the Home tab. Select N1:P23. In the Charts group of commands, select Scatter and then Scatter with only Markers.

The result is:

[Figure: scatter plot of yhatn and yhatc against educ]

Next, we would like to plot the actual values on the same chart. Select the points on your plot, right-click and select Select Data. A Select Data Source dialog box pops up. Select Add. In the Edit Series dialog box, enter earnings per hour as the Series name, B2:B1001 as the Series X values and A2:A1001 as the Series Y values. All of these ranges are on the cps4_small data worksheet. Select OK, and then OK again in the Select Data Source dialog box.


After editing your chart as in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.14 on p. 155 of Principles of Econometrics, 4e):

[Figure: yhatn, yhatc and earnings per hour plotted against years of education]

4.8.4 A Generalized R2 Measure


A generalized R² measure is the square of the sample correlation coefficient between y and ŷ_c, where ŷ_c are the corrected predicted y values:

$$R_g^2 = \left[\operatorname{corr}(y, \hat{y}_c)\right]^2 = r_{y\hat{y}_c}^2 \qquad (4.30)$$
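The minimal Python sketch of (4.30) below computes the same quantity as squaring Excel's CORREL. The helper name `generalized_r2` and the sample numbers are hypothetical:

```python
import math


def generalized_r2(y, y_c):
    """(4.30): squared sample correlation between y and the predictions y_c."""
    n = len(y)
    y_bar = sum(y) / n
    c_bar = sum(y_c) / n
    s_yc = sum((a - y_bar) * (c - c_bar) for a, c in zip(y, y_c))
    s_yy = sum((a - y_bar) ** 2 for a in y)
    s_cc = sum((c - c_bar) ** 2 for c in y_c)
    return (s_yc / math.sqrt(s_yy * s_cc)) ** 2


# Hypothetical actual values and natural predictions, for illustration only.
y = [3.1, 4.7, 2.2, 5.9, 4.0]
y_n = [2.8, 4.1, 2.9, 5.5, 3.6]
y_c = [1.2 * v for v in y_n]   # scaling by a positive constant, as exp(sigma2_hat/2) does

print(generalized_r2([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # 1.0
print(math.isclose(generalized_r2(y, y_n), generalized_r2(y, y_c)))  # True
```

Multiplying the predictions by the positive constant exp(σ̂²/2) leaves a correlation unchanged. So (4.30) gives the same value whether ŷ_n or ŷ_c is used.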

Make sure you are in your cps4_small data worksheet. We will compute the corrected predicted
y values in column Q, and next to it, we will compute the generalized R2.

Here are the formulas and labels you should enter (in the last row of each of the tables below, you
will find the numbers of the equations used):

    Q
1   corrected predicted y
2   =EXP('Wage Equation'!$B$17+'Wage Equation'!$B$18*B2)*EXP('Wage Equation'!$D$13/2)
3   (4.29)

Copy the content of cell Q2 to cells Q3:Q1001.

    R
1   generalized R2
2   =(CORREL(A2:A1001,Q2:Q1001))^2
3   (4.30)

The result is (see also p. 155 of Principles of Econometrics, 4e):

    Q                       R
1   corrected predicted y   generalized R2
2   24.40129449             0.185930705
3   16.99642785
4   24.40129449
5   20.36503968
6   16.99642785

4.8.5 Prediction Intervals

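The lower limit (LL) and upper limit (UL) of the prediction interval in a log-linear model are:

$$LL = \exp\left(\widehat{\ln(y)} - t_c\,\mathrm{se}(f)\right) \qquad (4.31)$$

$$UL = \exp\left(\widehat{\ln(y)} + t_c\,\mathrm{se}(f)\right) \qquad (4.32)$$

where the predicted value of ln(y) is b1 + b2x0.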
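The minimal Python sketch of (4.31) and (4.32) below builds se(f) from the forecast-error variance σ̂² + σ̂²/N + (x0 - x-bar)²·var(b2), with var(b2) = se(b2)². The inputs are assumptions taken from this section:

- the Wage Equation estimates;
- x0 = 12;
- the sample mean x-bar = 13.85;
- the critical value tc = 1.962344.

The x-bar and tc values are the ones displayed in the PI in Log-Linear Model worksheet. The function name is hypothetical.

```python
import math


def log_linear_prediction_interval(b1, b2, x0, sigma2_hat, n, x_bar, se_b2, t_c):
    """Sketch of (4.31)-(4.32); returns (se(f), LL, UL).

    Assumes var(f) = sigma2_hat + sigma2_hat/n + (x0 - x_bar)**2 * var(b2),
    with var(b2) = se(b2)**2.
    """
    ln_y_hat = b1 + b2 * x0  # predicted ln(y) at x0
    var_f = sigma2_hat + sigma2_hat / n + (x0 - x_bar) ** 2 * se_b2 ** 2
    se_f = math.sqrt(var_f)
    lower = math.exp(ln_y_hat - t_c * se_f)
    upper = math.exp(ln_y_hat + t_c * se_f)
    return se_f, lower, upper


se_f, lower, upper = log_linear_prediction_interval(
    b1=1.609444466, b2=0.090408247, x0=12,
    sigma2_hat=0.277319529, n=1000, x_bar=13.85,
    se_b2=0.006145615, t_c=1.962344,
)
print(round(se_f, 4), round(lower, 2), round(upper, 2))  # 0.527 5.26 41.62
```

Because this sketch squares se(b2), its se(f) is smaller than the 0.546471 in the worksheet results. Cell C14 of that template multiplies (x0 - x-bar)² by se(b2) itself.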

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it PI in Log-Linear Model.

Copy the template from the Prediction Interval worksheet to the PI in Log-Linear Model worksheet. If you cannot see that worksheet, it is hidden further to the left of your visible worksheets.

You only need to make a few modifications:

1. Get your regression results from the Wage Equation worksheet instead of the Food Regression worksheet.
2. Change x0 to 12.
3. Compute x-bar from the cps4_small data worksheet instead of the food data worksheet.
4. Take the antilogs of the interval limits using the EXP function.

These modifications are outlined in the table below.

    A                     B                       C
1   Data Input            Sample Size =           ='Wage Equation'!B8
2                         Confidence Level =      (enter as a percentage, 0 decimal places)
3                         x0 =                    12
4                         b1 =                    ='Wage Equation'!B17
5                         b2 =                    ='Wage Equation'!B18
6                         se(b2) =                ='Wage Equation'!C18
7                         MS residual =           ='Wage Equation'!D13
8
9   Computed Values       α =                     =1-C2
10                        df or m =               =C1-2
11                        tc =                    =TINV(C9,C10)
12                        predicted y0 =          =C4+C5*C3                           (4.2)
13                        x-bar =                 =AVERAGE('cps4_small data'!B2:B1001)
14                        se(f) =                 =SQRT(C7+C7/C1+((C3-C13)^2)*C6)     (4.3)
15
16  Prediction Interval   Lower Limit =           =EXP(C12-C11*C14)                   (4.31)
17                        Upper Limit =           =EXP(C12+C11*C14)                   (4.32)

Here are the results you should get (see also p. 155 of Principles of Econometrics, 4e):

    A                     B                       C
1   Data Input            Sample Size =           1000
2                         Confidence Level =      95%
3                         x0 =                    12
4                         b1 =                    1.609444
5                         b2 =                    0.090408
6                         se(b2) =                0.006146
7                         MS residual =           0.27732
8
9   Computed Values       α =                     5%
10                        df or m =               998
11                        tc =                    1.962344
12                        predicted y0 =          2.694343
13                        x-bar =                 13.85
14                        se(f) =                 0.546471
15
16  Prediction Interval   Lower Limit =           5.063106
17                        Upper Limit =           43.23744

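Note that the results above might differ from the ones in your textbook. Rounding explains part of the difference. The other part comes from the se(f) formula in cell C14. It multiplies (x0 - x-bar)^2 by se(b2) in cell C6, but the forecast-error variance requires var(b2) = se(b2)^2. Replacing *C6 with *C6^2 in that formula gives se(f) of about 0.527 and a narrower interval.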

Next, we want to show graphically how our prediction interval changes over the range of years of
education. Go to your cps4_small data worksheet. Here are the formulas and labels you should
enter (in the last row of each of the tables below, you will find the numbers of the equations used
in the template):

    S
1   lb_wage
2   =O2*EXP(-'PI in Log-Linear Model'!$C$11*'PI in Log-Linear Model'!$C$14)
3   (4.31)

    T
1   ub_wage
2   =O2*EXP('PI in Log-Linear Model'!$C$11*'PI in Log-Linear Model'!$C$14)
3   (4.32)

Select S2:T2 and copy their content to S3:T23. Here is how your table should look (only the first
five values are shown below):
    S          T
1   lb_wage    ub_wage
2   1.711005   14.61149
3   1.872902   15.99404
4   2.050118   17.50741
5   2.244103   19.16398
6   2.456442   20.9773

Select the whole plot area you completed in Section 4.8.3, which compares the natural and corrected predictors of wage (a replica of Figure 4.14 on p. 155 of Principles of Econometrics, 4e). Select Copy and then Paste. You now have two identical charts, and we will work with one of them. On that chart, we want to remove the yhatc series and add the lb_wage and ub_wage series instead.

Select the points on the chart, right-click and select Select Data. A Select Data Source dialog box pops up. Select the yhatc series, and then Remove. Then select Add. In the Edit Series dialog box, enter lb_wage as the Series name, N2:N23 as the Series X values and S2:S23 as the Series Y values. All of these ranges are on the cps4_small data worksheet. Select OK.


Select Add. In the Edit Series dialog box, enter ub_wage as the Series name, N2:N23 as the Series X values and T2:T23 as the Series Y values. All of these ranges are on the cps4_small data worksheet. Select OK, and then OK again in the Select Data Source dialog box.


After editing your chart as in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.15 on p. 156 of Principles of Econometrics, 4e):

[Figure: yhatn, earnings per hour, lb_wage and ub_wage plotted against Years of Education]



4.9 A LOG-LOG MODEL: POULTRY DEMAND EQUATION

Open the Excel file newbroiler. Excel opens the data set in Sheet 1 of a new Excel file. We want to keep all our Chapter 4 work in one file. So create a new worksheet in your POE Chapter 4 Excel file, name it newbroiler data, and copy the data set you just opened into it.


4.9.1 Estimating the Model

We estimate the following log-log model for poultry demand:

$$\ln(Q) = \beta_1 + \beta_2 \ln(P) + e \qquad (4.33)$$

where Q is the U.S. per capita consumption of chicken, in pounds, and P is the real price of chicken, for annual observations over the period 1950-2001.
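In a log-log model, the slope β2 is the elasticity of Q with respect to P. Fitted quantities at two prices therefore differ by the factor (P_a/P_b)^b2, whatever the intercept and the σ̂²/2 correction. The minimal Python sketch below uses the estimates reported in the Log-Log Model output that follows (b1 = 3.716943882, b2 = -1.121358001). Its σ̂² value is a hypothetical placeholder, because the Log-Log Model MS Residual is not reproduced in this section.

```python
import math

# Log-Log Model estimates (Intercept and X Variable 1, i.e. ln(P)).
b1 = 3.716943882
b2 = -1.121358001


def yhat_c(p, sigma2_hat):
    """Corrected prediction of Q at price p: exp(b1 + b2*ln(p) + sigma2_hat/2)."""
    return math.exp(b1 + b2 * math.log(p) + sigma2_hat / 2)


sigma2_hat = 0.05  # hypothetical placeholder; it cancels in the ratio below
ratio = yhat_c(0.9, sigma2_hat) / yhat_c(1.0, sigma2_hat)
print(round(ratio, 4))                                # 1.1254
print(math.isclose(ratio, 0.9 ** b2, rel_tol=1e-12))  # True
```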

In cells K1:L2 of your newbroiler data worksheet, enter the following column labels and formulas.

    K          L
1   ln(q)      ln(p)
2   =ln(B2)    =ln(D2)

Select K2:L2 and copy their content to K3:L53. Here is how your table should look (only the first five values are shown below):

    K          L
1   ln(q)      ln(p)
2   2.66026    1.059116
3   2.714695   1.030993
4   2.727853   1.014683
5   2.721295   0.982232
6   2.76001    0.872986

In the Regression dialog box, the Input Y Range should be K2:K53, and the Input X Range
should be L2:L53. Select New Worksheet Ply and name it Log-Log Model. Finally select OK.


The result is (matching the one reported on p. 157 of Principles of Econometrics, 4e):
'
    A              B              C                D              E              F              G              H              I
1   SUMMARY OUTPUT
2
...
16                 Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
17  Intercept      3.716943882    0.022359414      166.23619      2.94446E-70    3.672033677    3.761854086    3.672033677    3.761854086
18  X Variable 1   -1.121358001   0.048756431      -22.99918135   2.99987E-28    -1.219288174   -1.023427829   -1.219288174   -1.023427829

4.9.2 A Generalized R2 Measure

Make sure you are in your newbroiler data worksheet. We will compute the corrected predicted
y values in column M, and next to it, we will compute the generalized R2.

Here are the formulas and labels you should enter (in the last row of each of the tables below, you
will find the numbers of the equations used):

    M
1   corrected predicted y
2   =EXP('Log-Log Model'!$B$17+'Log-Log Model'!$B$18*L2)*EXP('Log-Log Model'!$D$13/2)
3   (4.29)

Copy the content of cell M2 to cells M3:M53.

    N
1   generalized R2
2   =(CORREL(B2:B53,M2:M53))^2
3   (4.30)

The result is (see also p. 157 of Principles of Econometrics, 4e):

    M                       N
1   corrected predicted y   generalized R2
2   12.63229707             0.88176
3   13.03700687
4   13.27763969
5   13.76970711
6   15.56421528

4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship

Enter the following labels and formulas in your newbroiler data worksheet (the last row of each table below gives the numbers of the equations used):

    O     P
1   p     yhatc
2   0.9   =EXP('Log-Log Model'!$B$17+'Log-Log Model'!$B$18*ln(O2))*EXP('Log-Log Model'!$D$13/2)
3   1.0   (4.29)

Select cells O2:O3 and move your cursor to the lower right corner of the selection until it turns into a skinny cross. Left-click, hold, and drag down to cell O22.

Select P2 and copy its content to P3:P22. Here is how your table should look (only the first five
values are shown below):
    O     P
1   p     yhatc
2   0.9   46.62103
3   1.0   41.42584
4   1.1   37.22676
5   1.2   33.76609
6   1.3   30.8674

Select the Insert tab next to the Home tab. Select O1:P22. In the Charts group of commands, select Scatter and then Scatter with only Markers.

The result is:

[Figure: scatter plot of yhatc against p]

Next, we would like to plot the actual values on the same chart. Select the points on your plot, right-click and select Select Data. A Select Data Source dialog box pops up. Select Add. In the Edit Series dialog box, enter actual values as the Series name, D2:D53 as the Series X values and B2:B53 as the Series Y values. All of these ranges are on the newbroiler data worksheet. Select OK, and then OK again in the Select Data Source dialog box.

After editing your chart as in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.16 on p. 157 of Principles of Econometrics, 4e):

[Figure: yhatc and actual values of chicken consumption plotted against Price of Chicken]
CHAPTER 5

The Multiple Linear Regression

CHAPTER OUTLINE
5.1 Least Squares Estimates Using the Hamburger Chain Data
5.2 Interval Estimation
5.3 Hypothesis Tests for a Single Coefficient
    5.3.1 Tests of Significance
    5.3.2 One-Tail Tests
        5.3.2a Left-Tail Test of Elastic Demand
        5.3.2b Right-Tail Test of Advertising Effectiveness
5.4 Polynomial Equations: Extending the Model for Burger Barn Sales
5.5 Interaction Variables
    5.5.1 Linear Models
    5.5.2 Log-Linear Models
5.6 Measuring Goodness-of-Fit

This chapter is a simple extension of the material covered in Chapters 2-4. Instead of only one
explanatory variable in the simple linear regression model, two or more explanatory variables will
be used in the multiple linear regression model.

5.1 LEAST SQUARES ESTIMATES USING THE HAMBURGER CHAIN DATA

Open the Excel file andy. Save your file as POE Chapter 5. Rename Sheet 1 as data.

We would like to estimate the following multiple linear regression model for Big Andy's Burger Barn hamburger chain:

$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + e \qquad (5.1)$$

where SALES represents monthly sales revenue in a given city (in $1000), PRICE represents a
price index in that city (in $), and ADVERT is monthly advertising expenditure in that city (in
$1000).


As we have done before, we will use the Excel Regression analysis tool. There are only two
things to note.

• First, because we have more than one explanatory variable, we will include the labels of
the variables in the input ranges we specify. Those labels will then be reported in the
summary output Excel produces, and we will be able to distinguish the different
estimated slope coefficients.
• Second, as long as the data on the explanatory variables are stored in adjacent columns, we only need to select the whole range. Excel treats each column as the observations on a separate explanatory variable.
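When several adjacent X columns are selected, the estimates the Regression tool reports are the solution of the least squares normal equations (X'X)b = X'y. Here X includes a column of ones for the intercept. The minimal pure-Python sketch below is not Excel's internal algorithm. It is checked on made-up, noise-free data generated from SALES = 118 - 8·PRICE + 2·ADVERT, so the coefficients are recovered exactly:

```python
def ols_multiple(y, *regressors):
    """Least squares with an intercept: solve (X'X) b = X'y.

    Each regressor is a list of observations (one "column" of X).
    Returns [b1, b2, ..., bK], where b1 is the intercept.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up, noise-free data generated from SALES = 118 - 8*PRICE + 2*ADVERT.
price = [5.0, 5.5, 6.0, 5.2, 6.3, 5.8]
advert = [1.0, 2.0, 0.5, 1.5, 2.5, 1.8]
sales = [118.0 - 8.0 * p + 2.0 * a for p, a in zip(price, advert)]

b = ols_multiple(sales, price, advert)
print([round(v, 6) for v in b])  # [118.0, -8.0, 2.0]
```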

In the Regression dialog box, the Input Y Range should be A1:A76 and the Input X Range should be B1:C76; check the box next to Labels. Finally, select New Worksheet Ply and name it Regression.


The result is (see also p. 175 in Principles of Econometrics, 4e):

    A                   B              C                D              E              F              G              H              I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R          0.66952055
5   R Square            0.448257766
6   Adjusted R Square   0.432931593
7   Standard Error      4.886124039
8   Observations        75
9
10  ANOVA
11                      df             SS               MS             F              Significance F
12  Regression          2              1396.538993      698.2694963    29.24785998    5.04086E-10
13  Residual            72             1718.942985      23.87420813
14  Total               74             3115.481978
15
16                      Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
17  Intercept           118.9136131    6.351637595      18.72172512    2.21429E-29    106.2518552    131.5753711    106.2518552    131.5753711
18  PRICE               -7.907854804   1.095993037      -7.215241826   4.42399E-10    -10.09267696   -5.723032645   -10.09267696   -5.723032645
19  ADVERT              1.862583787    0.683195483      2.726282349    0.008038199    0.500658501    3.224509073    0.500658501    3.224509073

5.2 INTERVAL ESTIMATION

Recall from Chapter 3 that the interval estimator of βk is defined as:

$$\left[\, b_k - t_c\,\mathrm{se}(b_k),\; b_k + t_c\,\mathrm{se}(b_k) \,\right] \qquad (5.2)$$

The one important thing to notice is that, in the multiple linear regression model, the critical value tc comes from a t-distribution with m = N - K degrees of freedom, where K is the number of parameters in the multiple linear regression model.
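The minimal Python sketch of (5.2) below uses the ADVERT coefficient. The critical value 1.993464 is assumed to be the two-tail 5% value for m = 75 - 3 = 72 degrees of freedom, for example as returned by =TINV(0.05,72):

```python
def interval_estimate(b_k, se_b_k, t_c):
    """(5.2): return (b_k - t_c*se(b_k), b_k + t_c*se(b_k))."""
    half_width = t_c * se_b_k
    return b_k - half_width, b_k + half_width


# ADVERT estimate and standard error from the Regression output;
# t_c assumed to be the two-tail 5% critical value for 72 degrees of freedom.
low, high = interval_estimate(1.862583787, 0.683195483, 1.993464)
print(round(low, 4), round(high, 4))  # 0.5007 3.2245
```

The limits match the Lower 95% and Upper 95% entries for ADVERT in the summary output to about six decimals. The small remaining difference comes from rounding t_c.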

To compute interval estimates, we could use the template we created in Chapter 3, making sure we specify the degrees of freedom correctly.

Instead, we use the interval estimates Excel has already generated in the regression summary
output.

The results of interest, reported on pp. 182-183 of Principles of Econometrics, 4e, are in the Lower 95% and Upper 95% columns below:

    A           B              C                D              E              F              G
16              Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept   118.9136131    6.351637595      18.72172512    2.21429E-29    106.2518552    131.5753711
18  PRICE       -7.907854804   1.095993037      -7.215241826   4.42399E-10    -10.09267696   -5.723032645
19  ADVERT      1.862583787    0.683195483      2.726282349    0.008038199    0.500658501    3.224509073

Recall that to obtain interval estimates other than the 95% ones, you only need to specify a different Confidence Level in the Regression dialog box (see Section 3.1.3c).

5.3 HYPOTHESIS TESTS FOR A SINGLE COEFFICIENT

As in Chapter 3, if the null hypothesis H0: βk = c is true, then the test statistic t = (bk - c)/se(bk) follows a t-distribution with m = N - K degrees of freedom:

$$t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-K)} \qquad (5.3)$$

Again, in the multiple linear regression model the relevant t-distribution has m = N - K degrees of freedom, where K is the number of parameters in the model.

5.3.1 Tests of Significance

Recall that when the null hypothesis of a test is that the parameter is zero, the test is called a test of significance. Results of two-tail tests of significance are in the t Stat and P-value columns of the Excel summary output, shown below (see also pp. 185-186 of Principles of Econometrics, 4e):

    A           B              C                D              E              F              G
16              Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept   118.9136131    6.351637595      18.72172512    2.21429E-29    106.2518552    131.5753711
18  PRICE       -7.907854804   1.095993037      -7.215241826   4.42399E-10    -10.09267696   -5.723032645
19  ADVERT      1.862583787    0.683195483      2.726282349    0.008038199    0.500658501    3.224509073

Note: you could also have used the Two-Tail Tests template you created in Chapter 3.

5.3.2 One-Tail Tests

5.3.2a Left-Tail Test of Elastic Demand

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the data tab. Name it Left-Tail Tests.


Open your POE Chapter 3 Excel file and go to the Left-Tail Tests worksheet. Copy its content
to the Left-Tail Tests worksheet you just created in your POE Chapter 5 Excel file.

You only need a few modifications to create the left-tail test template shown below:

1. Go back to each formula and delete the references to the POE Chapter 3 Excel file, [POE Chapter 3.xlsx]. The test is then computed from the regression results in your current Excel file, POE Chapter 5.
2. Insert a new row for K, underneath the first one.
3. Modify the degrees of freedom formula.

The modified template is:

    A                 B                 C
1   Data Input        N =               =Regression!B8
2                     K =               =Regression!B12+1
3                     bk =              =Regression!B18
4                     se(bk) =          =Regression!C18
5                     H0: βk =
6                     α =
7
8   Computed Values   df or m =         =C1-C2
9                     tc =              =-TINV(C6*2,C8)
10
11  Left-Tail Test    t-statistic =     =(C3-C5)/C4
12                    Conclusion:       =IF(C11<=C9,"Reject Ho","Do Not Reject Ho")
13                    p-value =         =TDIST(ABS(C11),C8,1)
14                    Conclusion:       =IF(C13<=C6,"Reject Ho","Do Not Reject Ho")

Let α = 0.05, H0: β2 ≥ 0 and H1: β2 < 0. The result is (p. 187 of Principles of Econometrics, 4e):

    A                 B                 C
1   Data Input        N =               75
2                     K =               3
3                     bk =              -7.907855
4                     se(bk) =          1.095993
5                     H0: βk =          0
6                     α =               0.05
7
8   Computed Values   df or m =         72
9                     tc =              -1.6662937
10
11  Left-Tail Test    t-statistic =     -7.2152418
12                    Conclusion:       Reject Ho
13                    p-value =         2.212E-10
14                    Conclusion:       Reject Ho

5.3.2b Right-Tail Test of Advertising Effectiveness

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the Left-Tail Tests tab. Name it Right-Tail Tests.


In your POE Chapter 3 Excel file, go to the Right-Tail Tests worksheet. Copy its content to the
Right-Tail Tests worksheet you just created in your POE Chapter 5 Excel file.

You will need to make just a few modifications to create the right-tail test template shown below.
First, go back to each formula and delete the references to the POE Chapter 3 Excel file, [POE
Chapter 3.xlsx]; this way the test will be computed from the regression results in your current
Excel file, POE Chapter 5. Next, change the references for bk and se(bk) to the ADVERT
coefficient estimates instead of the PRICE coefficient estimates. Also, insert a new row for K
underneath the first one. Finally, modify the degrees of freedom formula. All needed
changes are highlighted below:

   A                  B              C
1  Data Input         N=             =Regression!B8
2                     K=             =Regression!B12+1
3                     bk=            =Regression!B19
4                     se(bk)=        =Regression!C19
5                     Ho: βk=
6                     α=
7
8  Computed Values    df or m=       =C1-C2
9                     tc=            =TINV(C6*2,C8)
10
11 Right-Tail Test    t-statistic=   =(C3-C5)/C4
12                    Conclusion:    =IF(C11>=C9,"Reject Ho","Do Not Reject Ho")
13                    p-value=       =TDIST(ABS(C11),C8,1)
14                    Conclusion:    =IF(C13<=C6,"Reject Ho","Do Not Reject Ho")

Let $\alpha = 0.05$; $H_0: \beta_3 \le 1$ and $H_1: \beta_3 > 1$. The result is (see also p. 188 of Principles of
Econometrics, 4e):
   A                  B              C
1  Data Input         N=             75
2                     K=             3
3                     bk=            1.862583787
4                     se(bk)=        0.683195483
5                     Ho: βk=        1
6                     α=             0.05
7
8  Computed Values    df or m=       72
9                     tc=            1.666293697
10
11 Right-Tail Test    t-statistic=   1.262572438
12                    Conclusion:    Do Not Reject Ho
13                    p-value=       0.105408444
14                    Conclusion:    Do Not Reject Ho

5.4 POLYNOMIAL EQUATIONS: EXTENDING THE MODEL FOR BURGER BARN SALES

We estimate the following extended model for Big Andy's Burger Barn hamburger chain:

$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e$   (5.4)

Go back to your data worksheet. In D1, enter the column label ADVERT2. In cell D2, enter the
formula =C2^2; copy it to cells D3:D76. Here is how your table should look (only the first five
values are shown below):
   A       B       C        D
1  SALES   PRICE   ADVERT   ADVERT2
2  73.2    5.69    1.3      1.69
3  71.8    6.49    2.9      8.41
4  62.4    5.63    0.8      0.64
5  67.4    6.22    0.7      0.49
6  89.3    5.02    1.5      2.25
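To double-check the new column outside Excel, the short Python sketch below squares the five ADVERT values from the table, which is what =C2^2 does row by row. The list simply copies those five values; rounding to two decimals hides binary floating-point noise such as 1.6900000000000002.

```python
# First five ADVERT values from the table above (in $1000).
advert = [1.3, 2.9, 0.8, 0.7, 1.5]

# Equivalent of entering =C2^2 in D2 and copying it down column D.
advert2 = [round(a ** 2, 2) for a in advert]
print(advert2)
```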

In the Regression dialog box, the Input Y Range should be A1:A76, and the Input X Range
should be B1:D76. Check the box next to Labels. Select New Worksheet Ply and name it
Extended Model. Finally, select OK.


The result is (see also p. 193 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.712906125
R Square             0.508235142
Adjusted R Square    0.487456345
Standard Error       4.645283161
Observations         75

ANOVA
             df   SS            MS            F            Significance F
Regression   3    1583.397427   527.7991422   24.4593153   5.599E-11
Residual     71   1532.084551   21.57865565
Total        74   3115.481978

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   109.7190398    6.79904556       16.1374177     1.8703E-25    96.16212798    123.2759515
PRICE       -7.640000543   1.045938915      -7.3044424     3.23648E-10   -9.725543479   -5.554457608
ADVERT      12.15123398    3.556164048      3.416949784    0.0010516     5.060444353    19.2420236
ADVERT2     -2.767962762   0.940624059      -2.942687607   0.004391267   -4.643513842   -0.892411683

5.5 INTERACTION VARIABLES

5.5.1 Linear Models


Consider the following life-cycle model:

$PIZZA = \beta_1 + \beta_2 AGE + \beta_3 INCOME + e$   (5.5)

where PIZZA is annual expenditure on pizza, AGE is age, and INCOME is income of a random
sample of 40 individuals, age 18 and older.

Open the Excel file pizza4. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 5 in one file, create a new worksheet in your POE
Chapter 5 Excel file, rename it pizza4 data, and in it, copy the data set you just opened.


In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range
should be F1:G41. Check the box next to Labels. Select New Worksheet Ply and name it
Life-Cycle Model 1. Finally, select OK.

The result is (see also p. 196 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.573803829
R Square             0.329250834
Adjusted R Square    0.292994123
Standard Error       131.070099
Observations         40

ANOVA
             df   SS            MS            F             Significance F
Regression   2    312015.1787   156007.5894   9.081100278   0.000618533
Residual     37   635636.7213   17179.37085
Total        39   947651.9

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   342.8848279    72.34342         4.739682       3.14373E-05   196.3031373    489.4665184
income      1.832478934    0.464300741      3.946749963    0.000340943   0.891716278    2.773241589
age         -7.575556      2.317            -3.269571209   0.002332607   -12.27021864   -2.880893153

To account for an effect of income that depends on the age of the individual, we add the
interaction variable (AGE x INCOME) to the life-cycle model:

$PIZZA = \beta_1 + \beta_2 AGE + \beta_3 INCOME + \beta_4 (AGE \times INCOME) + e$   (5.6)

Go back to your pizza4 data worksheet. In Hl, enter the column label age x income. In cell H2,
enter the formula =F2*G2; copy it to cells H3:H41. Here is how your table should look (only the
first five values are shown below):
   H
1  age x income
2  487.5
3  1755
4  312
5  728
6  487.5

In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range
should be F1:H41. Check the box next to Labels. Select New Worksheet Ply and name it
Life-Cycle Model 2. Finally, select OK.

The result is (see also p. 196 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.622349295
R Square             0.387318645
Adjusted R Square    0.336262
Standard Error       126.996134
Observations         40

ANOVA
             df   SS          MS            F             Significance F
Regression   3    367043.25   122347.75     7.586037514   0.000468085
Residual     36   580608.65   16128.01806
Total        39   947651.9

               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      161.465432     120.6634096      1.338147434    0.189         -83.25130349   406.1821675
income         6.979902073    2.82276761       2.472716       0.01826628    1.255061055    12.70474309
age            -2.977423365   3.352100814      -0.888226      0.380315589   -9.775798869   3.820952139
age x income   -0.123239351   0.066718728      -1.847147792   0.072957528   -0.258551202   0.0120725

5.5.2 Log-Linear Models

Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 5 in one file, create a new worksheet in your POE
Chapter 5 Excel file, rename it cps4_small data, and in it, copy the data set you just opened.


Consider the following wage equation:

$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \gamma (EDUC \times EXPER) + e$   (5.7)

Go back to your cps4_small data worksheet. In cells M1:P2, enter the following column labels
and formulas.

   M          N      O       P
1  ln(wage)   educ   exper   educ x exper
2  =ln(A2)    =B2    =C2     =N2*O2

Copy the content of cells M2:P2 to cells M3:P1001. Here is how your table should look (only the
first five values are shown below):

   M          N      O       P
1  ln(wage)   educ   exper   educ x exper
2  2.928524   16     39      624
3  2.442347   12     16      192
4  2.710713   16     13      208
5 3.25·S:H16 14 11 154
6  3.179303   12     51      612
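The interaction column multiplies education by experience (columns N and O), not ln(wage). The small Python check below uses the educ and exper values of the first, second, fourth and fifth observations shown above and reproduces their educ x exper entries; the variable names are our own.

```python
# educ and exper for the first, second, fourth and fifth observations shown above.
educ = [16, 12, 14, 12]
exper = [39, 16, 11, 51]

# Equivalent of =N2*O2 copied down column P (education times experience).
educ_x_exper = [e * x for e, x in zip(educ, exper)]
print(educ_x_exper)
```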

In the Regression dialog box, the Input Y Range should be M1:M1001, and the Input X Range
should be N1:P1001. Check the box next to Labels. Select New Worksheet Ply and name it
Log-Linear Model w Interaction. Finally, select OK.


The result is (see also p. 197 of Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.44115987
R Square             0.194622031
Adjusted R Square    0.1921961
Standard Error       0.521847758
Observations         1000

ANOVA
             df    SS            MS            F             Significance F
Regression   3     65.54495019   21.84831673   80.22880786   1.7205E-46
Residual     996   271.2357823   0.272325083
Total        999   336.7807325

               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      1.392317989    0.206645         6.7377         2.7172E-11    0.986808989    1.797826989
educ           0.094938495    0.014624557      6.491712999    1.33643E-10   0.066239995    0.123636995
exper          0.006329514    0.00669851       0.944913664    0.344932118   -0.006815298   0.019474326
educ x exper   -3.64453E-05   0.00048377       -0.0753363     0.939962      -0.000985766   0.000912875

5.6 MEASURING GOODNESS-OF-FIT

The coefficient of determination $R^2$ is reported in the Excel regression summary output. For Big
Andy's Burger Barn multiple linear regression model of Section 5.1, it is the R Square entry below:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.66952055
R Square             0.4482577
Adjusted R Square    0.432931593
Standard Error       4.886124039
Observations         75
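R Square and Adjusted R Square can be recomputed from the ANOVA sums of squares as R² = SSR/SST and adjusted R² = 1 − (SSE/(N − K))/(SST/(N − 1)). The sketch below applies these formulas to the ANOVA figures of the extended model (5.4) reported earlier in this chapter (SSR = 1583.397427, SSE = 1532.084551, SST = 3115.481978, N = 75, K = 4); it is a cross-check of Excel's arithmetic, and the function name is our own.

```python
def r_squared(ssr, sse, sst, n, k):
    """Return (R^2, adjusted R^2) from the ANOVA sums of squares."""
    r2 = ssr / sst                                    # share of variation explained
    adj_r2 = 1.0 - (sse / (n - k)) / (sst / (n - 1))  # penalizes extra regressors
    return r2, adj_r2


r2, adj = r_squared(1583.397427, 1532.084551, 3115.481978, n=75, k=4)
print(round(r2, 6), round(adj, 6))
```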
CHAPTER 6

Further Inference in the Multiple Regression Model

CHAPTER OUTLINE
6.1 Testing the Effect of Advertising: the F-test
    6.1.1 The Logic of the Test
    6.1.2 The Unrestricted and Restricted Models
    6.1.3 Test Template
6.2 Testing the Significance of the Model
    6.2.1 Null and Alternative Hypotheses
    6.2.2 Test Template
    6.2.3 Excel Regression Output
6.3 The Relationship between t- and F-Tests
6.4 Testing Some Economic Hypotheses
    6.4.1 The Optimal Level of Advertising
    6.4.2 The Optimal Level of Advertising and Price
6.5 The Use of Nonsample Information
6.6 Model Specification
    6.6.1 Omitted Variables
    6.6.2 Irrelevant Variables
    6.6.3 The RESET Test
6.7 Poor Data, Collinearity and Insignificance
    6.7.1 Correlation Matrix
    6.7.2 The Car Mileage Model Example

In this chapter we continue to work with the multiple linear regression model of Big Andy's
Burger Barn hamburger chain to illustrate the F-test procedure. We also work with additional
examples to address nonsample information, model specification and collinearity issues.

6.1 TESTING THE EFFECT OF ADVERTISING: THE F-TEST

6.1.1 The Logic of the Test

In Chapters 3 and 5 we worked with t-tests for null hypotheses consisting of a single restriction
on one parameter $\beta_k$. An F-test is used when a null hypothesis consists of two or more
restrictions (a joint hypothesis), or of a single restriction involving two or more parameters.


An F-test is based on a comparison of the sum of squared errors from the original, unrestricted
model with the sum of squared errors from the model in which the null hypothesis is assumed to
be true and the restriction(s) it implies have been imposed. This latter model is
referred to as the restricted model.

If the null hypothesis is true, then the following F-statistic follows an F-distribution with $m_1 = J$
numerator degrees of freedom and $m_2 = N - K$ denominator degrees of freedom:

$F = \dfrac{(SSE_R - SSE_U)/J}{SSE_U/(N - K)} \sim F_{(m_1 = J,\; m_2 = N - K)}$   (6.1)

where $SSE_R$ is the sum of squared errors from the restricted model,

$SSE_U$ is the sum of squared errors from the unrestricted model,

$J$ is the number of restrictions in the null hypothesis,

$N$ is the sample size,

and $K$ is the number of parameters in the unrestricted model.

If the null hypothesis is not true, then the value of the computed F-statistic will tend to be
unusually large. We will reject the null hypothesis if $F \ge F_c$, where $F_c$ is the critical value
computed below.
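Equation (6.1) is simple arithmetic once the two sums of squared errors are known. The Python sketch below evaluates it; the example uses the rounded SSEs reported for the advertising test in Section 6.1.3 (SSE_R = 1896.391, SSE_U = 1532.084, J = 2, N = 75, K = 4), so it matches Excel's F-statistic of 8.44136 only to about three decimals. The function name is our own.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """F-statistic of equation (6.1) for J restrictions."""
    if sse_r < sse_u:
        raise ValueError("restricted SSE cannot be smaller than unrestricted SSE")
    numerator = (sse_r - sse_u) / j      # increase in SSE per restriction
    denominator = sse_u / (n - k)        # unrestricted estimate of the error variance
    return numerator / denominator


f = f_statistic(1896.391, 1532.084, j=2, n=75, k=4)
print(round(f, 4))
```

A restricted model can never fit better than the unrestricted one, so a negative numerator signals a data-entry mistake, which is why the sketch raises an error in that case.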

6.1.2 The Unrestricted and Restricted Models

We will use the Big Andy's Burger Barn model to illustrate the F-test procedure. We start by
specifying and estimating the unrestricted and restricted models.

Recall from Chapter 5, the following multiple linear regression model for Big Andy's Burger
Barn hamburger chain. This is the unrestricted model.

$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e$   (6.2)

where SALES represents monthly sales revenue in a given city (in $1000), PRICE represents a
price index in that city (in $), and ADVERT is monthly advertising expenditure in that city (in
$1000).

Suppose we wish to test the hypothesis that advertising has no effect on sales revenue
against the alternative that advertising does have an effect. The null and alternative hypotheses
are $H_0: \beta_3 = 0, \beta_4 = 0$ and $H_1: \beta_3 \ne 0$ or $\beta_4 \ne 0$ or both are nonzero. If we impose our null
hypothesis, or restriction, on equation (6.2), we obtain the following restricted model:

$SALES = \beta_1 + \beta_2 PRICE + e$   (6.3)

We would like to successively estimate the unrestricted model (6.2) and the restricted model
(6.3). First, open your Excel file andy. Save your file as POE Chapter 6. Rename Sheet 1 andy's
hamburger chain data.

In D1, enter the column label ADVERT2. In cell D2, enter the formula =C2^2; copy it to cells
D3:D76. Here is how your table should look (only the first five values are shown below):

   A       B       C        D
1  SALES   PRICE   ADVERT   ADVERT2
2  73.2    5.69    1.3      1.69
3  71.8    6.49    2.9      8.41
4  62.4    5.63    0.8      0.64
5  67.4    6.22    0.7      0.49
6  89.3    5.02    1.5      2.25

For the unrestricted model (6.2), the Input Y Range should be A1:A76, and the Input X Range
should be B1:D76. Check the box next to Labels. Select New Worksheet Ply and name it
Unrestricted Model. Finally, select OK.


The result is what you already obtained in Chapter 5:



SUMMARY OUTPUT

Regression Statistics
Multiple R           0.712906133
R Square             0.508235155
Adjusted R Square    0.4874564
Standard Error       4.645283021
Observations         75

ANOVA
             df   SS            MS            F             Significance F
Regression   3    1583.397408   527.799136    24.45931648   5.599E-11
Residual     71   1532.084459   21.57865435
Total        74   3115.481867

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   109.719036     6.799045455      16.13741763    1.87037E-25   96.16212457    123.2759474
price       -7.640000035   1.045938884      -7.304442117   3.23648E-10   -9.725542907   -5.554457162
advert      12.15123567    3.556163941      3.416950364    0.001051598   5.060446253    19.24202509
advert2     -2.767963089   0.940624031      -2.942688043   0.004392565   -4.643514112   -0.892412065

Go back to your andy's hamburger chain data worksheet. For the restricted model (6.3), the
Input Y Range should be A1:A76, and the Input X Range should be only the PRICE data,
B1:B76. Check the box next to Labels. Select New Worksheet Ply and name it Restricted
Model. Finally, select OK.

The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.62554053
R Square             0.391300955
Adjusted R Square    0.382962612
Standard Error       5.096857529
Observations         75

ANOVA
             df   SS            MS            F             Significance F
Regression   1    1219.09103    1219.09103    46.92790295   1.97078E-09
Residual     73   1896.390837   25.97795667
Total        74   3115.481867

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   121.9001736    6.526290698      18.67832421    1.5876E-29    108.8932951    134.907052
price       -7.829073515   1.142864644      -6.850394365   1.97078E-09   -10.10679943   -5.551347597

6.1.3 Test Template

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it F-test.


Create the F-test template as shown in the table below.

F-critical values are obtained in Excel by using the FINV function. The syntax of the FINV
function is as follows:

=FINV(α, m1, m2)

where α is the level of significance of the test, m1 is the numerator degrees of freedom and m2 is
the denominator degrees of freedom of the F-distribution.

p-values for F-statistics are obtained in Excel by using the FDIST function. For hypothesis-testing
purposes, the syntax of the FDIST function is as follows:

=FDIST(F-statistic, m1, m2)

   A                  B               C
1  Data Input         J=
2                     N=              ='Unrestricted Model'!B8
3                     K=              ='Unrestricted Model'!B12+1
4                     SSEu=           ='Unrestricted Model'!C13
5                     SSER=           ='Restricted Model'!C13
6                     α=
7
8  Computed Values    m1=             =C1
9                     m2=             =C2-C3
10                    Fc=             =FINV(C6,C8,C9)
11
12 F-test             F-statistic=    =((C5-C4)/C8)/(C4/C9)
13                    Conclusion=     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                    p-value=        =FDIST(C12,C8,C9)
15                    Conclusion=     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")

Note that the number of parameters K is equal to the Excel regression degrees of freedom plus
one (see cell C3 above).

With 2 restrictions in the null hypothesis $H_0: \beta_3 = 0, \beta_4 = 0$, and at $\alpha = 0.05$, the results of the
F-test are (see also p. 225 of Principles of Econometrics, 4e):

   A                  B               C
1  Data Input         J=              2
2                     N=              75
3                     K=              4
4                     SSEu=           1532.084
5                     SSER=           1896.391
6                     α=              0.05
7
8  Computed Values    m1=             2
9                     m2=             71
10                    Fc=             3.125764
11
12 F-test             F-statistic=    8.44136
13                    Conclusion=     Reject Ho
14                    p-value=        0.000514
15                    Conclusion=     Reject Ho

6.2 TESTING THE SIGNIFICANCE OF THE MODEL

6.2.1 Null and Alternative Hypotheses

For a general unrestricted multiple regression model with $K - 1$ explanatory variables and $K$
unknown coefficients, $y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \cdots + \beta_K x_{iK} + e_i$, the null and alternative
hypotheses of a test of significance of the model are:

$H_0: \beta_2 = 0, \beta_3 = 0, \ldots, \beta_K = 0$

$H_1$: At least one of the $\beta_k$ is nonzero for $k = 2, 3, \ldots, K$

Note that, in this one case, in which we are testing the null hypothesis that all the model
parameters are zero, except the intercept, the sum of squared errors from the restricted model is
equal to the total sum of squares from the unrestricted model: SSER = SSTu.
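Because SSE_R = SST_U in this one case, and J = K − 1, dividing the numerator and denominator of (6.1) by SST shows that the overall-significance F-statistic depends on R² alone: F = (R²/(K − 1)) / ((1 − R²)/(N − K)). A quick Python check with the unrestricted model's R Square (0.508235155, N = 75, K = 4) recovers Excel's F of about 24.4593; the function name is our own.

```python
def overall_f_from_r2(r2, n, k):
    """Overall-significance F: J = K - 1 restrictions and SSE_R equal to SST."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


print(round(overall_f_from_r2(0.508235155, n=75, k=4), 4))
```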

6.2.2 Test Template

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Test of Significance of Model.

Copy the template from your F-test worksheet into your new worksheet. You just need to modify
the reference in cell C5, as highlighted below, to obtain a template for a test of the overall
significance of the regression model.

   A                  B               C
1  Data Input         J=
2                     N=              ='Unrestricted Model'!B8
3                     K=              ='Unrestricted Model'!B12+1
4                     SSEu=           ='Unrestricted Model'!C13
5                     SSER=           ='Unrestricted Model'!C14
6                     α=
7
8  Computed Values    m1=             =C1
9                     m2=             =C2-C3
10                    Fc=             =FINV(C6,C8,C9)
11
12 F-test             F-statistic=    =((C5-C4)/C8)/(C4/C9)
13                    Conclusion=     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                    p-value=        =FDIST(C12,C8,C9)
15                    Conclusion=     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")

For the unrestricted model (6.2), $SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e$,
the null and alternative hypotheses of a test of significance of the model are:

$H_0: \beta_2 = 0, \beta_3 = 0, \beta_4 = 0$

$H_1$: At least one of $\beta_2$, $\beta_3$ or $\beta_4$ is nonzero

The null hypothesis above contains three restrictions. With $J = 3$, at $\alpha = 0.05$, the results
of the test of significance of model (6.2) are (see also pp. 226-227 of Principles of Econometrics,
4e):
   A                  B               C
1  Data Input         J=              3
2                     N=              75
3                     K=              4
4                     SSEu=           1532.084
5                     SSER=           3115.482
6                     α=              0.05
7
8  Computed Values    m1=             3
9                     m2=             71
10                    Fc=             2.733047
11
12 F-test             F-statistic=    24.45932
13                    Conclusion=     Reject Ho
14                    p-value=        5.6E-11
15                    Conclusion=     Reject Ho

6.2.3 Excel Regression Output

For the test of significance of a model, since SSER = SSTu, there is no need to estimate a
restricted model: all the information needed to compute the F-statistic is available from the
regression analysis of the unrestricted model. This is why the F-statistic of the test of significance
of a model and its p-value are found in the Excel summary output (see your Unrestricted Model
worksheet):

   A            B    C             D            E             F
11              df   SS            MS           F             Significance F
12 Regression   3    1583.397408   527.799136   24.45931648   5.599E-11
13 Residual     71   1532.084459   21.57865435
14 Total        74   3115.481867

6.3 THE RELATIONSHIP BETWEEN t- AND F-TESTS

Reconsider the following multiple linear regression model for Big Andy's Burger Barn
hamburger chain. This is the unrestricted model.

$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e$   (6.2)

Suppose we wish to test the hypothesis that changes in price have no effect on sales revenue
against the alternative that changes in price do have an effect. The null and alternative hypotheses
are $H_0: \beta_2 = 0$ and $H_1: \beta_2 \ne 0$. If we impose our null hypothesis, or restriction, on equation (6.2),
we obtain the following restricted model:

$SALES = \beta_1 + \beta_3 ADVERT + \beta_4 ADVERT^2 + e$   (6.4)

Go back to your andy's hamburger chain data worksheet. In the Regression dialog box, the
Input Y Range should be A1:A76, and the Input X Range should be C1:D76. Check the box
next to Labels. Select Output Range and specify it to be cell A1 in your Restricted Model
worksheet: you can place your cursor in the Output Range window and move it to that cell to do
that, or type 'Restricted Model'!A1 in the Output Range window. Finally, select OK.


Excel informs you that the output range will overwrite existing data. Do select OK to overwrite
the data in the specified range.


The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.372404526
R Square             0.138685131
Adjusted R Square    0.114759718
Standard Error       6.1048829
Observations         75

ANOVA
             df   SS            MS            F             Significance F
Regression   2    432.0710103   216.0355051   5.796561616   0.004632556
Residual     72   2683.410857   37.26959521
Total        74   3115.481867

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   64.84148981    3.827012492      16.943109      7.87896E-27   57.21247994    72.47049968
advert      14.24915942    4.6582829        3.058886559    0.003118901   4.963042303    23.53527653
advert2     -3.365894266   1.231488631      -2.7331915     0.007887268   -5.82082195    -0.910966582

Go back to your F-test worksheet. With 1 restriction, at $\alpha = 0.05$, the result is (see also p. 227
in Principles of Econometrics, 4e):
   A                  B               C
1  Data Input         J=              1
2                     N=              75
3                     K=              4
4                     SSEu=           1532.084
5                     SSER=           2683.411
6                     α=              0.05
7
8  Computed Values    m1=             1
9                     m2=             71
10                    Fc=             3.97581
11
12 F-test             F-statistic=    53.35487
13                    Conclusion=     Reject Ho
14                    p-value=        3.24E-10
15                    Conclusion=     Reject Ho

Note that we used a t-test in Chapter 5 (Section 5.3.1) for this same test of significance of $\beta_2$.
When testing a single "equality" null hypothesis (a single restriction) against a "not equal to"
alternative hypothesis, either a t-test or an F-test can be used, and the test outcomes will be
identical.

If you go back to your Unrestricted Model worksheet and look at the p-value for b2, you should
find that it is exactly the same as the one computed in your F-test template. We highlight both
results below:

F-test worksheet:

   A        B              C
12 F-test   F-statistic=   53.35487
13          Conclusion=    Reject Ho
14          p-value=       3.24E-10
15          Conclusion=    Reject Ho

Unrestricted Model worksheet:

   A           B              C                D              E
16             Coefficients   Standard Error   t Stat         P-value
17 Intercept   109.719036     6.799045455      16.13741763    1.87037E-25
18 price       -7.640000035   1.045938884      -7.304442117   3.23648E-10
19 advert      12.15123567    3.556163941      3.416950364    0.001051598
20 advert2     -2.767963089   0.940624031      -2.942688043   0.004392565

As explained on pp. 227-228 of Principles of Econometrics, 4e, note that the F-statistic is the
square of the t-statistic: $F = 53.355 = t^2 = (-7.3044)^2$.
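A two-line numerical check of this equivalence, using the figures reported above. The last line also illustrates that, with one numerator degree of freedom, the F critical value is the square of the two-tail t critical value (about 1.994 for 71 degrees of freedom).

```python
import math

t_price = -7.304442117   # t Stat for price, Unrestricted Model worksheet
f_stat = 53.35487        # F-statistic, F-test worksheet (J = 1)
f_crit = 3.97581         # Fc = FINV(0.05, 1, 71)

print(t_price ** 2)        # about 53.3549, the F-statistic
print(math.sqrt(f_crit))   # about 1.994, the two-tail t critical value for 71 df
```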

6.4 TESTING SOME ECONOMIC HYPOTHESES

6.4.1 The Optimal Level of Advertising


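For this test, $H_0: \beta_3 + 3.8\beta_4 = 1$, and, as explained on p. 229 of Principles of Econometrics, 4e, the
restricted model is:

$SALES - ADVERT = \beta_1 + \beta_2 PRICE + \beta_4 (ADVERT^2 - 3.8\,ADVERT) + e$   (6.5)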

Go back to your andy's hamburger chain data worksheet. Because Excel requires the explanatory
variables to be in adjacent columns, insert a new column to the right of the PRICE data column.
In C1, enter the column label x*. In C2, enter the formula =E2-3.8*D2; copy it to cells C3:C76.
In F1, enter the column label y*. In F2, enter the formula =A2-D2; copy it to cells F3:F76.

Here is how your table should look (only the first five values are shown below):

   A       B       C       D        E         F
1  SALES   PRICE   x*      ADVERT   ADVERT2   y*
2  73.2    5.69    -3.25   1.3      1.69      71.9
3  71.8    6.49    -2.61   2.9      8.41      68.9
4  62.4    5.63    -2.4    0.8      0.64      61.6
5  67.4    6.22    -2.17   0.7      0.49      66.7
6  89.3    5.02    -3.45   1.5      2.25      87.8
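The two new columns follow from the restriction: substituting $\beta_3 = 1 - 3.8\beta_4$ into (6.2) and moving ADVERT to the left-hand side leaves y* = SALES − ADVERT as the dependent variable and x* = ADVERT² − 3.8·ADVERT as the regressor attached to $\beta_4$. The Python sketch below rebuilds both columns from the first five observations (SALES and ADVERT as listed in the table); values are rounded to two decimals to hide floating-point noise.

```python
# First five observations of Big Andy's data: (SALES, ADVERT).
rows = [(73.2, 1.3), (71.8, 2.9), (62.4, 0.8), (67.4, 0.7), (89.3, 1.5)]

# x* = ADVERT^2 - 3.8*ADVERT, i.e. =E2-3.8*D2 once ADVERT is in D and ADVERT2 in E.
x_star = [round(a ** 2 - 3.8 * a, 2) for _, a in rows]

# y* = SALES - ADVERT, i.e. =A2-D2.
y_star = [round(s - a, 2) for s, a in rows]

print(x_star)
print(y_star)
```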

For the restricted model (6.5), the Input Y Range should be F1:F76, and the Input X Range
should be B1:C76. Check the box next to Labels. Select Output Range and specify it to be cell
A1 in your Restricted Model worksheet: you can place your cursor in the Output Range
window and move it to that cell to do that, or type 'Restricted Model'!A1 in the Output Range
window. Finally, select OK.

Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.693339057
R Square             0.480719048
Adjusted R Square    0.466294577
Standard Error       4.64322439
Observations         75

ANOVA
             df   SS            MS            F          Significance F
Regression   2    1437.013271   718.5066355   33.32664   5.68E-11
Residual     72   1552.286357   21.55953273
Total        74   2989.299628

            Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept   110.3589599    6.7638034        16.31610996   6.84193E-26   96.87556446    123.8423554
PRICE       -7.60310422    1.0447803        -7.27722771   3.396E-10     -9.685835675   -5.52037277
x*          -2.87651491    0.9334956        -3.08144457   0.002917717   -4.737404337   -1.0156255

Go back to your F-test worksheet. With 1 restriction, at $\alpha = 0.05$, the result is (see also p. 229 in
Principles of Econometrics, 4e):
   A                  B               C
1  Data Input         J=              1
2                     N=              75
3                     K=              4
4                     SSEu=           1532.085
5                     SSER=           1552.286
6                     α=              0.05
7
8  Computed Values    m1=             1
9                     m2=             71
10                    Fc=             3.97581
11
12 F-test             F-statistic=    0.936194
13                    Conclusion=     Do Not Reject Ho
14                    p-value=        0.336543
15                    Conclusion=     Do Not Reject Ho

6.4.2 The Optimal Level of Advertising and Price

For this test, the restricted model is:

$(SALES - ADVERT - 78.1) = \beta_2 (PRICE - 6) + \beta_4 (ADVERT^2 - 3.8\,ADVERT + 3.61) + e$   (6.6)
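The constants 78.1 and 3.61 in (6.6) come from imposing both restrictions of the joint null hypothesis tested here, $\beta_3 + 3.8\beta_4 = 1$ and $\beta_1 + 6\beta_2 + 1.9\beta_3 + 3.61\beta_4 = 80$ (see Principles of Econometrics, 4e). Solving the two restrictions for $\beta_3$ and $\beta_1$ and substituting into (6.2) gives:

```latex
\begin{aligned}
\beta_3 &= 1 - 3.8\beta_4, \qquad
\beta_1 = 80 - 6\beta_2 - 1.9\beta_3 - 3.61\beta_4 = 78.1 - 6\beta_2 + 3.61\beta_4 \\
SALES &= (78.1 - 6\beta_2 + 3.61\beta_4) + \beta_2 PRICE + (1 - 3.8\beta_4)\,ADVERT + \beta_4 ADVERT^2 + e \\
SALES - ADVERT - 78.1 &= \beta_2 (PRICE - 6) + \beta_4 (ADVERT^2 - 3.8\,ADVERT + 3.61) + e
\end{aligned}
```

Both $\beta_1$ and $\beta_3$ drop out, which is why model (6.6) is estimated with the Constant is Zero option.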

Go back to your andy's hamburger chain data worksheet. In cells G1:I2, enter the following
column labels and formulas.

   G               H       I
1  y**             x1**    x2**
2  =A2-D2-78.1     =B2-6   =E2-3.8*D2+3.61

Copy the content of cells G2:I2 to cells G3:I76. Here is how your table should look (only the
first five values are shown below):
   G       H       I
1  y**     x1**    x2**
2  -6.2    -0.31   0.36
3  -9.2    0.49    1
4  -16.5   -0.37   1.21
5  -11.4   0.22    1.44
6  9.7     -0.98   0.16

For the restricted model (6.6), notice that there is no intercept, so you will need to select the
Constant is Zero option in the Regression dialog box. The Input Y Range should be G1:G76,
and the Input X Range should be H1:I76. Check the boxes next to Labels and Constant is Zero.
Select Output Range and specify it to be cell A1 in your Restricted Model worksheet: you can
place your cursor in the Output Range window and move it to that cell to do that, or type
'Restricted Model'!A1 in the Output Range window. Finally, select OK.


Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.699423441
R Square             0.489193159
Adjusted R Square    0.466497175
Standard Error       4.937778213
Observations         75

ANOVA
             df   SS            MS            F             Significance F
Regression   2    1704.54978    852.27489     34.95558173   2.46249E-11
Residual     73   1779.860719   24.38165368
Total        75   3484.410495

            Coefficients   Standard Error   t Stat        P-value      Lower 95%      Upper 95%
Intercept   0              #N/A             #N/A          #N/A         #N/A           #N/A
x1**        -6.19          1.01             -6.12761579   4.11E-08     -8.21          -4.18
x2**        -5.0861671     0.679983611      -7.4798377    1.33E-10     -6.441372408   -3.7309618

Go back to your F-test worksheet. With 2 restrictions, at $\alpha = 0.05$, the result is (see also p. 231
in Principles of Econometrics, 4e):
    A                  B                C
1   Data Input         J =              2
2                      N =              75
3                      K =              4
4                      SSEu =           1532.085
5                      SSER =           1779.861
6                      α =              0.05
7
8   Computed Values    m1 =             2
9                      m2 =             71
10                     Fc =             3.125764
11
12  F-test             F-statistic =    5.741233
13                     Conclusion =     Reject Ho
14                     p-value =        0.004885
15                     Conclusion =     Reject Ho
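The F-test worksheet computes F = [(SSER - SSEu)/J] / [SSEu/(N - K)] and compares it with an F(J, N - K) distribution. As a cross-check outside Excel, here is a minimal standard-library Python sketch. The function names are our own, and the p-value and critical value come from the regularized incomplete beta function (evaluated with the usual continued-fraction method) rather than from Excel's built-in F functions, so the last digits may differ slightly.

```python
import math


def f_statistic(sse_r, sse_u, j, n, k):
    """F = [(SSE_R - SSE_U)/J] / [SSE_U/(N - K)]."""
    return ((sse_r - sse_u) / j) / (sse_u / (n - k))


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last correction factor is ~1: converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_pvalue(f, df1, df2):
    """Right-tail probability P(F > f) for an F(df1, df2) random variable."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """Critical value Fc with P(F > Fc) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_pvalue(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_pvalue(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Inputs from the F-test worksheet above: J = 2, N = 75, K = 4.
F = f_statistic(1779.861, 1532.085, j=2, n=75, k=4)
print(round(F, 4), round(f_pvalue(F, 2, 71), 6), round(f_critical(0.05, 2, 71), 6))
```

Because the SSE inputs are rounded to three decimals, the sketch agrees with the worksheet to about four significant digits.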

6.5 THE USE OF NONSAMPLE INFORMATION

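Consider the following unrestricted demand model for beer:

$$\ln(Q) = \beta_1 + \beta_2 \ln(PB) + \beta_3 \ln(PL) + \beta_4 \ln(PR) + \beta_5 \ln(I) + e \tag{6.7}$$

where Q is the quantity demanded, PB is the price of beer, PL is the price of liquor, PR is the
price of all other remaining goods and services, and I is income. The data for this model were
collected over a period of 30 years from a randomly selected household.

The assumption that economic agents do not suffer from "money illusion" can be imposed on the
demand model through the restriction $\beta_2 + \beta_3 + \beta_4 + \beta_5 = 0$. This leads to the
following restricted demand model for beer (see pp. 231-232 in Principles of Econometrics, 4e for
more details):

$$\ln(Q) = \beta_1 + \beta_2 \ln\left(\frac{PB}{PR}\right) + \beta_3 \ln\left(\frac{PL}{PR}\right) + \beta_5 \ln\left(\frac{I}{PR}\right) + e \tag{6.8}$$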

Below we estimate the restricted model (6.8).

Open the Excel file beer. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, name it beer data, and in it, copy the data set you just opened.


In cells F1:I2 of your beer data worksheet, enter the following column labels and formulas.

    F          G             H             I
1   y*         x1*           x2*           x3*
2   =ln(A2)    =ln(B2/D2)    =ln(C2/D2)    =ln(E2/D2)

Copy the content of cells F2:I2 to cells F3:I31. Here is how your table should look (only the first
five values are shown):
    F          G          H          I
1   y*         x1*        x2*        x3*
2   4.403054   0.472253   1.834382   10.02579
3   4.041295   1.220257   2.391088   10.58768
4   4.160444   0.979322   2.126509   10.33316
5   4.180522   1.05315    2.258981   10.49711
6   4.160444   0.757095   1.951287   10.15131
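The formulas in F2:I2 take logs after dividing each price and income by PR. The short Python sketch below mirrors those four columns; the function name restricted_beer_columns and the observation used are our own, hypothetical choices (not taken from the beer data set). The check at the end illustrates the "no money illusion" idea: multiplying all prices and income by the same factor leaves x1*, x2* and x3* unchanged.

```python
import math


def restricted_beer_columns(q, pb, pl, pr, income):
    """Return (y*, x1*, x2*, x3*) as built in columns F:I of the beer data sheet."""
    y_star = math.log(q)              # =ln(A2)
    x1_star = math.log(pb / pr)       # =ln(B2/D2)
    x2_star = math.log(pl / pr)       # =ln(C2/D2)
    x3_star = math.log(income / pr)   # =ln(E2/D2)
    return y_star, x1_star, x2_star, x3_star


# Hypothetical observation, then the same one with all prices and income doubled.
row = restricted_beer_columns(q=80.0, pb=2.0, pl=7.0, pr=1.0, income=25000.0)
scaled = restricted_beer_columns(q=80.0, pb=4.0, pl=14.0, pr=2.0, income=50000.0)
print(row[1:] == scaled[1:])  # prints True: the regressors do not change
```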

In the Regression dialog box, the Input Y Range should be F1:F31, and the Input X Range
should be G1:I31. Check the box next to Labels. Select New Worksheet Ply and name it
Restricted Beer Demand Model. Finally, select OK.

The result is (see also p. 232 in Principles of Econometrics, 4e):

    A                    B               C                D              E              F              G
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.898859761
5   R Square             0.80794887
6   Adjusted R Square    0.785789124
7   Standard Error       0.061675593
8   Observations         30
9
10  ANOVA
11                       df              SS               MS             F              Significance F
12  Regression           3               0.416070592      0.138690197    36.46020486    1.83399E-09
13  Residual             26              0.098900847      0.003803879
14  Total                29              0.514971439
15
16                       Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept            -4.797797376    3.71390504       -1.291847079   0.207775913    -12.43183844   2.836243691
18  x1*                  -1.299386484    0.165737623      -7.840021241   2.57799E-08    -1.640065044   -0.958707924
19  x2*                  0.186815879     0.284383258      0.656915882    0.517008126    -0.397742275   0.771374032
20  x3*                  0.945828579     0.427046831      2.214812313    0.035742225    0.068021255    1.823635904
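Because the money-illusion restriction $\beta_2 + \beta_3 + \beta_4 + \beta_5 = 0$ was used to eliminate $\beta_4$, the coefficient on ln(PR), Excel does not report an estimate for it. It can be recovered from the estimates on x1*, x2* and x3* (which correspond to $\beta_2$, $\beta_3$ and $\beta_5$), rounded here to four decimals:

$$b_4^{*} = -\left(b_2^{*} + b_3^{*} + b_5^{*}\right) = -(-1.2994 + 0.1868 + 0.9458) = 0.1668$$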

6.6 MODEL SPECIFICATION

6.6.1 Omitted Variables

Consider the following family income model:

$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + e_i \tag{6.9}$$

where FAMINC is the annual family income of married couples in which both husband and wife
work, HEDU is the husband's years of education, and WEDU is the wife's years of education.

If we incorrectly omit the relevant variable WEDU (wife's education) from the family income
model, it becomes:

$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + e_i \tag{6.10}$$

If we add the omitted relevant variable KL6 (number of children less than 6 years old) to the
family income model, it becomes:

$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + e_i \tag{6.11}$$

You can estimate models (6.9)-(6.11) using the edu_inc data set. Below, we show you how to
obtain the correlation matrix reported in Table 6.1 of Principles of Econometrics, 4e (p. 235).

Open the Excel file edu_inc. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, name it education and income data, and in it, copy the data set you just
opened.


Select the Data tab, in the middle of the tab list at the top of your screen. In the Analysis group of
commands, at the far right, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Correlation (you may need to use the scroll bar
to the right of the Analysis Tools list to find it), then select OK.


A Correlation dialog box pops up. Specify the Input Range to be A1:F429. Select Grouped By
Columns, since the data on each variable are stored in columns. Check the box next to Labels in
first row. Select New Worksheet Ply and name it Correlation Matrix. Finally, select OK.


The result is:


    A          B          C          D            E          F          G
1              FAMINC     HE         WE           KL6        XTRA_X5    XTRA_X6
2   FAMINC     1
3   HE         0.354684   1
4   WE         0.362328   0.594343   1
5   KL6        -0.07195   0.104877   0.12934      1
6   XTRA_X5    0.289817   0.835468   0.517798     0.148742   1
7   XTRA_X6    0.351365   0.820563   [illegible]  0.159522   0.900206   1
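Excel's Correlation tool reports pairwise (Pearson) sample correlations and fills only the lower triangle. For readers who want to check a cell by hand, here is a minimal standard-library sketch; pearson and correlation_matrix are our own helper names, and the tiny data set is made up for illustration (it is not the edu_inc data).

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Lower-triangular correlation matrix, laid out like Excel's Correlation tool."""
    names = list(columns)
    return {(r, c): pearson(columns[r], columns[c])
            for i, r in enumerate(names) for c in names[:i + 1]}


# Tiny made-up example.
cols = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 3, 2, 1]}
for (r, c), v in correlation_matrix(cols).items():
    print(f"{r} {c} {v:.4f}")
```

Applied to the six edu_inc columns, the same computation should reproduce the matrix above up to display rounding.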

6.6.2 Irrelevant Variables

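To see the effect of irrelevant variables, we can add two artificially generated variables X5 and X6
to the family income model (6.11):

$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \beta_5 X_{5i} + \beta_6 X_{6i} + e_i \tag{6.12}$$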

You can estimate model (6.12) using the edu_inc.xls data set. Below, we will show you how the
variables X5 and X6 were generated.

Variables X5 and X6 were constructed so that they are correlated with HEDU and WEDU, but
they are not expected to influence family income. Specifically, they were defined as follows:

$$X_{5i} = HEDU_i + 2N(0,1) \tag{6.13}$$

$$X_{6i} = X_{5i} + WEDU_i + N(0,1) \tag{6.14}$$

where N(0,1) are random numbers from a normal distribution with mean 0 and standard
deviation 1, generated the way we generated our random samples in Section 2.4.4 and Section
3.1.4.

Go back to your education and income data worksheet. In cells H1:N2, enter the following
column labels and formulas. The last row of the table gives the numbers of the equations used in
the formulas.

    H               I               J      K      L      M            N
1   N(0,1) for x5   N(0,1) for x6   HEDU   WEDU   KL6    X5           X6
2                                   =B2    =C2    =D2    =J2+2*H2     =M2+K2+I2
                                                         (6.13)       (6.14)

Note that we copy the values of the HEDU, WEDU and KL6 variables in columns J-L. The
reason we are doing this is that we need to have the columns of explanatory variables next to one
another to be able to use the Excel regression analysis tool.

In columns H-I, we generate samples of random numbers from a normal distribution with
mean 0 and standard deviation 1.

Select the Data tab, in the middle of the tab list at the top of your screen. In the Analysis group of
commands, at the far right, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Random Number Generation (you may need to
use the scroll bar to the right of the Analysis Tools list to find it), then select OK.

A Random Number Generation dialog box pops up. We need two sets of random numbers, one for
our X5 variable and one for our X6 variable, so we enter 2 in the Number of Variables window.
We want as many data points as there are observations in the data set, so we enter 428 in the
Number of Random Numbers window. Select Normal in the Distribution window; the Parameters
should be Mean equal to 0 and Standard deviation equal to 1. Select Output Range and set it to
H2:I429. Finally, select OK.

After you copy the content of cells J2:N2 to cells J3:N429, your table should look like the one
below (only the first five values are shown):

    H               I               J      K      L      M          N
1   N(0,1) for x5   N(0,1) for x6   HE     WE     KL6    X5         X6
2   1.167550181     0.204716        12     12     1      14.3351    26.53982
3   0.241263933     0.08421011      9      12     0      9.482528   21.56674
4   -0.723794074    0.54994871      12     12     1      10.55241   23.10236
5   0.459443648     0.55153258      10     12     0      10.91889   23.47042
6   1.790540409     -0.5618233      12     14     1      15.58108   29.01926
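Equations (6.13) and (6.14) can be mirrored in a few lines of Python. This sketch uses random.Random(seed).gauss(0, 1) as a stand-in for Excel's Random Number Generation tool, so its draws will not match Excel's (or ours); the seed argument plays a role similar to the dialog's optional Random Seed box. The function name make_irrelevant_vars is our own.

```python
import random


def make_irrelevant_vars(hedu, wedu, seed=12345):
    """Build X5 = HEDU + 2*N(0,1) and X6 = X5 + WEDU + N(0,1), eqs. (6.13)-(6.14)."""
    rng = random.Random(seed)
    z5 = [rng.gauss(0.0, 1.0) for _ in hedu]   # plays the role of column H
    z6 = [rng.gauss(0.0, 1.0) for _ in hedu]   # plays the role of column I
    x5 = [h + 2.0 * z for h, z in zip(hedu, z5)]
    x6 = [a + w + z for a, w, z in zip(x5, wedu, z6)]
    return z5, z6, x5, x6


# First five HEDU and WEDU values from the worksheet table above.
hedu = [12, 9, 12, 10, 12]
wedu = [12, 12, 12, 12, 14]
z5, z6, x5, x6 = make_irrelevant_vars(hedu, wedu)
for row in zip(hedu, wedu, x5, x6):
    print(row)
```

Fixing the seed makes the draws reproducible, which helps if you want to rerun the irrelevant-variable regression and obtain the same estimates.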

In the Regression dialog box, the Input Y Range should be A1:A429, and the Input X Range
should be J1:N429. Check the box next to Labels. Select New Worksheet Ply and name it
Irrelevant Variable Model. Finally, select OK.


Note: our random samples differ from the ones recorded in the edu_inc data set, which is why our
estimated equation also differs from the one reported on p. 236 of Principles of Econometrics, 4e.
Your parameter estimates for equation (6.12) will differ from ours as well, because your random
numbers will differ from those above.

Our regression analysis results are:

    A                    B               C                D              E              F              G
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.421302759
5   R Square             0.177496015
6   Adjusted R Square    0.167750707
7   Standard Error       40247.24063
8   Observations         428
9
10  ANOVA
11                       df              SS               MS             F              Significance F
12  Regression           5               1.47515E+11      29502937711    18.21348455    2.23E-16
13  Residual             422             6.83573E+11      1619840378
14  Total                427             8.31087E+11
15
16                       Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept            -7682.625152    11189.32523      -0.686602894   0.4927         -29676.38312   14311.13281
18  HE                   3035.184489     1234.183         2.4592645      0.0143225      609.2711683    5461.09781
19  WE                   4097.602729     2248.859881      1.822080052    0.069150476    -322.7591451   8517.964604
20  KL6                  -14275.19941    5016.721707      -2.84552348    0.0046498      -24136.07405   -4414.324769
21  X5                   -487.0443       2234.009772      -0.2180133     0.827524       -4878.217      3904.128447
22  X6                   665.267784      1945.482493      0.341955116    0.732554863    -3158.775105   4489.310673

6.6.3 The RESET Test


Let (b1, b2, b3, b4) be the least squares estimates of the family income model (6.11); the
predicted values of family income are:

$$\widehat{FAMINC}_i = b_1 + b_2 HEDU_i + b_3 WEDU_i + b_4 KL6_i \tag{6.15}$$

Consider the following two artificial models and their associated tests for misspecification.
We use an F-test for both, even though a t-test could be used for RESET test 1.

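$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \gamma_1 \widehat{FAMINC}_i^{\,2} + e_i \tag{6.16}$$

RESET test 1: $H_0: \gamma_1 = 0$, $H_1: \gamma_1 \neq 0$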


Unrestricted model: equation (6.16)
Restricted model: equation (6.11)

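$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \gamma_1 \widehat{FAMINC}_i^{\,2} + \gamma_2 \widehat{FAMINC}_i^{\,3} + e_i \tag{6.17}$$

RESET test 2: $H_0: \gamma_1 = \gamma_2 = 0$, $H_1: \gamma_1 \neq 0$ and/or $\gamma_2 \neq 0$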


Unrestricted model: equation (6.17)
Restricted model: equation (6.11)

Go back to your education and income data worksheet, from which we first estimate the
restricted model (6.11). In the Regression dialog box, the Input Y Range should be A1:A429,
and the Input X Range should be B1:D429. Check the box next to Labels. Select Output Range
and set it to cell A1 of your Restricted Model worksheet: either place your cursor in the Output
Range window and then click that cell, or type 'Restricted Model'!A1 in the Output Range
window. Finally, select OK.


Excel warns that the output range will overwrite existing data; press OK to overwrite the data in
the specified range. The result is:

    A                    B               C                D              E              F              G
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.420919613
5   R Square             0.177173321
6   Adjusted R Square    0.171351434
7   Standard Error       40160.0814
8   Observations         428
9
10  ANOVA
11                       df              SS               MS             F              Significance F
12  Regression           3               1.47247E+11      49082167249    30.43228498    [illegible]
13  Residual             424             6.83841E+11      1612832138
14  Total                427             8.31087E+11
15
16                       Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept            -7755.33133     11162.93447      -0.6947394     0.487599098    -29696.912     14186.24934
18  HE                   3211.525676     796.7026365      4.031021775    6.584E-05      1645.547195    4777.504158
19  WE                   4776.907489     1061.163         4.501574       8.72701E-06    2691.111012    6862.703965
20  KL6                  -14310.92035    5003.928369      -2.85993709    0.004446558    -24146.51492   -4475.325719

Go back to your education and income data worksheet. In cells P1:Q4 and R1:W2, enter the column
labels and formulas shown in the tables below.

    P       Q
1   b1 =    ='Restricted Model'!B17
2   b2 =    ='Restricted Model'!B18
3   b3 =    ='Restricted Model'!B19
4   b4 =    ='Restricted Model'!B20

In the last row of the table you will find the numbers of the equations used in the formulas, if any.

    R                                          S       T       U      V         W
1   yhat                                       HEDU    WEDU    KL6    yhat2     yhat3
2   =($Q$1+$Q$2*J2+$Q$3*K2+$Q$4*L2)/10000      =J2     =K2     =L2    =R2^2     =R2^3
    (6.15)

Again, note that we copy the values of the HEDU, WEDU and KL6 variables into columns S-U
because we need adjacent columns of explanatory variables. Also, in cell R2, the division by
10,000 rescales the predicted values so that their squares and cubes stay within a manageable range.

Copy the content of cells R2:W2 to cells R3:W429. Here is how your table should look (only the
first five values are shown below):

    P       Q           R          S      T      U      V            W
1   b1 =    -7755.33    yhat       HEDU   WEDU   KL6    yhat2        yhat3
2   b2 =    3211.526    7.379495   12     12     1      54.4569411   401.8647041
3   b3 =    4776.907    7.847129   9      12     0      61.5774329   483.2060575
4   b4 =    -14310.9    7.379495   12     12     1      54.4569411   401.8647041
5                       8.168282   10     12     0      66.7208232   544.9944674
6                       8.334876   12     14     1      69.4701601   579.025179
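The formulas in R2:W2 compute the predicted values (6.15), divide them by 10,000, and then square and cube them. Here is a Python sketch of the same steps; it uses the rounded estimates shown in Q1:Q4, so the last digits can differ slightly from the worksheet, which works at full precision. The function name reset_columns is our own.

```python
def reset_columns(hedu, wedu, kl6, b, scale=10_000.0):
    """Return (yhat, yhat2, yhat3) for one observation, as in columns R, V and W."""
    b1, b2, b3, b4 = b
    yhat = (b1 + b2 * hedu + b3 * wedu + b4 * kl6) / scale  # eq. (6.15), rescaled
    return yhat, yhat ** 2, yhat ** 3


b = (-7755.33, 3211.526, 4776.907, -14310.9)  # rounded values from Q1:Q4
for obs in [(12, 12, 1), (9, 12, 0), (10, 12, 0)]:
    print(obs, [round(v, 4) for v in reset_columns(*obs, b)])
```

Dividing by 10,000 only changes the units of $\gamma_1$ and $\gamma_2$ (yhat2 is scaled by $10^{-8}$ and yhat3 by $10^{-12}$), so the fitted values, the SSE and the RESET F-statistics are the same as without rescaling.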

We are now ready to estimate equation (6.16) and subsequently run the RESET test 1.

In the Regression dialog box, the Input Y Range should be A1:A429, and the Input X Range
should be S1:V429. Check the box next to Labels. Select Output Range and set it to cell
A1 of your Unrestricted Model worksheet: either place your cursor in the Output Range
window and then click that cell, or type 'Unrestricted Model'!A1 in the Output
Range window. Finally, select OK.


Excel warns that the output range will overwrite existing data; press OK to overwrite the data in
the specified range. The result is:

    A                    B               C                D              E              F              G
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.434339837
5   R Square             0.188651094
6   Adjusted R Square    0.180978764
7   Standard Error       39926.10772
8   Observations         428
9
10  ANOVA
11                       df              SS               MS             F              Significance F
12  Regression           4               1.56786E+11      39196383356    24.58850071    2.56531E-18
13  Residual             423             6.74302E+11      1594094077
14  Total                427             8.31087E+11
15
16                       Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept            87242.98975     40389.39055      2.1600471      0.031330115    7854.091549    166631.8879
18  HEDU                 -2381.466072    2419.691817      -0.984202226   0.325578691    -7137.583081   2374.650937
19  WEDU                 -4235.109481    3832.139371      -1.105155      0.2697         -11767.51634   3297.297379
20  KL6                  10887.33808     11439.27632      0.951750619    0.341766941    -11597.56332   33372.23948
21  yhat2                993.6799334     406.211069       2.446218078    0.014842729    195.2371077    1792.122759

Go back to your F-test worksheet. With 1 restriction and α = 0.05, the results of RESET test 1 are
(see also p. 239 of Principles of Econometrics, 4e):

    A                  B                C
1   Data Input         J =              1
2                      N =              428
3                      K =              5
4                      SSEu =           6.74E+11
5                      SSER =           6.84E+11
6                      α =              0.05
7
8   Computed Values    m1 =             1
9                      m2 =             423
10                     Fc =             3.863536
11
12  F-test             F-statistic =    5.983983
13                     Conclusion =     Reject Ho
14                     p-value =        0.014843
15                     Conclusion =     Reject Ho
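With a single restriction, the F-statistic equals the square of the t-statistic on yhat2, which is why a t-test could also be used for RESET test 1. A quick arithmetic check in Python, using the residual SS values and the t Stat from the two Excel outputs above (the SS values are rounded to six significant digits, so the agreement is approximate):

```python
sse_r = 6.83841e11     # residual SS, restricted model (6.11)
sse_u = 6.74302e11     # residual SS, unrestricted model (6.16)
j, n, k = 1, 428, 5
f_stat = ((sse_r - sse_u) / j) / (sse_u / (n - k))
t_yhat2 = 2.446218078  # t Stat on yhat2 in the unrestricted output
print(round(f_stat, 3), round(t_yhat2 ** 2, 3))
```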

Next, we estimate equation (6.17) and subsequently run the RESET test 2.

Go back to your education and income data worksheet. From there, go to the Regression dialog
box. The Input Y Range should be A1:A429, and the Input X Range should be S1:W429.
Check the box next to Labels. Select Output Range and set it to cell A1 of your
Unrestricted Model worksheet: either place your cursor in the Output Range window and then
click that cell, or type 'Unrestricted Model'!A1 in the Output Range window.
Finally, select OK.

Excel warns that the output range will overwrite existing data; press OK to overwrite the data in
the specified range. The result is:

    A                    B               C                D              E              F              G
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.434939912
5   R Square             0.189172727
6   Adjusted R Square    0.179565768
7   Standard Error       39960.53362
8   Observations         428
9
10  ANOVA
11                       df              SS               MS             F              Significance F
12  Regression           5               1.57219E+11      31443811188    19.69121988    1.19924E-17
13  Residual             422             6.73868E+11      1596844247
14  Total                427             8.31087E+11
15
16                       Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept            150186.5287     127386.8411      1.178979927    0.239070463    -100205.2101   400578.2674
18  HEDU                 -8451.558269    11896.91874      -0.710279518   0.477923139    -31840.08825   14936.97172
19  WEDU                 -13016.36262    17284.10885      -0.753082657   0.451820157    -46990.0292    20957.30396
20  KL6                  37410.41301     52175.36886      0.717012909    0.473762785    -65145.55912   139966.3851
21  yhat2                3234.757003     4320.3           0.748734529    0.454431339    -5257.228233   11726.74225
22  yhat3                -85.69352811    164.4649942      -0.52104418    0.602609294    -408.9661322   237.579076

With 2 restrictions and α = 0.05, the results of RESET test 2 are (see also p. 239 of Principles of
Econometrics, 4e):

    A                  B                C
1   Data Input         J =              2
2                      N =              428
3                      K =              5
4                      SSEu =           6.74E+11
5                      SSER =           6.84E+11
6                      α =              0.05
7
8   Computed Values    m1 =             2
9                      m2 =             422
10                     Fc =             3.0171
11
12  F-test             F-statistic =    3.122582
13                     Conclusion =     Reject Ho
14                     p-value =        0.045063
15                     Conclusion =     Reject Ho



6.7 POOR DATA, COLLINEARITY, AND INSIGNIFICANCE

Open the Excel file car. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, name it cars data, and in it, copy the data set you just opened.


6.7.1 Correlation Matrix

Select the Data tab, in the middle of the tab list at the top of your screen. In the Analysis group of
commands, at the far right, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Correlation (you may need to use the scroll bar
to the right of the Analysis Tools list to find it), then select OK.

A Correlation dialog box pops up. Specify the Input Range to be A1:D393. Select Grouped By
Columns, since this is how the data are stored. Check the box next to Labels in first row. Select
Output Range and set it to F1. Finally, select OK.

The result is:

    F      G          H          I          J
1          MPG        CYL        ENG        WGT
2   MPG    1
3   CYL    -0.77762   1
4   ENG    -0.80513   0.950823   1
5   WGT    -0.83224   0.897527   0.932994   1

6.7.2 The Car Mileage Model Example

We first consider the following car mileage model:

$$MPG = \beta_1 + \beta_2 CYL + e \tag{6.18}$$

where MPG is miles per gallon and CYL is the number of cylinders.

In the Regression dialog box, the Input Y Range should be A1:A393, and the Input X Range
should be B1:B393. Check the box next to Labels. Select New Worksheet Ply and name it Car
Mileage Model. Finally, select OK.


The result is (see also p. 242 in Principles of Econometrics, 4e):

    A                    B               C                D              E              F              G
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.777617509
5   R Square             0.60468899
6   Adjusted R Square    0.603675372
7   Standard Error       4.913589267
8   Observations         392
9
10  ANOVA
11                       df              SS               MS             F              Significance F
12  Regression           1               14403.08286      14403.08286    596.5649839    1.31138E-80
13  Residual             390             9415.910199      24.14335948
14  Total                391             23818.99306
15
16                       Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept            42.9155052      0.834866841      51.4040121     8.1E-176       41.27410251    44.55690789
18  CYL                  -3.558078341    0.145675537      -24.42467981   1.31138E-80    -3.844485952   -3.271670729

Now, consider the following unrestricted model:

$$MPG = \beta_1 + \beta_2 CYL + \beta_3 ENG + \beta_4 WGT + e \tag{6.19}$$

where ENG is the engine displacement in cubic inches and WGT is vehicle weight in pounds.

In the Regression dialog box, the Input Y Range should be A1:A393, and the Input X Range
should be B1:D393. Check the box next to Labels. Select Output Range and set it to cell
A1 of your Unrestricted Model worksheet: either place your cursor in the Output Range
window and then click that cell, or type 'Unrestricted Model'!A1 in the Output
Range window. Finally, select OK.


Excel warns that the output range will overwrite existing data; press OK to overwrite the data in
the specified range. The result is (see also p. 242 in Principles of Econometrics, 4e):

    A                    B               C                D              E              F              G
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.836237128
5   R Square             0.699292534
6   Adjusted R Square    0.69696747
7   Standard Error       4.296530924
8   Observations         392
9
10  ANOVA
11                       df              SS               MS             F              Significance F
12  Regression           3               16656.444        5552.148001    300.7635141    7.5854E-101
13  Residual             388             7162.549057      18.46017798
14  Total                391             23818.99306
15
16                       Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept            44.37096115     1.480685053      29.96650844    5.3199E-103    41.459791      47.2821313
18  CYL                  -0.267796717    0.413067315      -0.648312588   0.517166276    -1.079927094   0.544333601
19  ENG                  -0.01267396     0.008250068      -1.53622488    0.125298269    -0.028894392   0.003546473
20  WGT                  -0.005707884    0.000713919      -7.995142549   1.50112E-14    -0.007111518   -0.00430425

To test the null hypothesis $H_0: \beta_2 = \beta_3 = 0$ against the alternative $H_1: \beta_2 \neq 0$
and/or $\beta_3 \neq 0$, we need to estimate the following restricted model:

$$MPG = \beta_1 + \beta_4 WGT + e \tag{6.20}$$

Go back to your cars data worksheet, and then to the Regression dialog box. For the restricted
model (6.20), the Input Y Range should be A1:A393, and the Input X Range should be
D1:D393 (the WGT column). Check the box next to Labels. Select Output Range and set it to
cell A1 of your Restricted Model worksheet: either place your cursor in the Output Range
window and then click that cell, or type 'Restricted Model'!A1 in the Output Range window.
Finally, select OK.

Excel warns that the output range will overwrite existing data; press OK to overwrite the data in
the specified range. With 2 restrictions in the null hypothesis $H_0: \beta_2 = \beta_3 = 0$ and
α = 0.05, the results of the F-test are (see also p. 242 of Principles of Econometrics, 4e):

    A                  B                C
1   Data Input         J =              2
2                      N =              392
3                      K =              4
4                      SSEu =           7162.549
5                      SSER =           7321.234
6                      α =              0.05
7
8   Computed Values    m1 =             2
9                      m2 =             388
10                     Fc =             3.018982
11
12  F-test             F-statistic =    4.298024
13                     Conclusion =     Reject Ho
14                     p-value =        0.014248
15                     Conclusion =     Reject Ho
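Substituting the inputs from the table into the F-statistic formula reproduces the worksheet result (intermediate values rounded):

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(7321.234 - 7162.549)/2}{7162.549/388} = \frac{79.3425}{18.4602} \approx 4.298$$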
CHAPTER 7

Using Indicator Variables

CHAPTER OUTLINE
7.1 Indicator Variables: The University Effect on House Prices Example
7.2 Applying Indicator Variables
    7.2.1 Interactions Between Qualitative Factors
    7.2.2 Qualitative Factors with Several Categories
    7.2.3 Testing the Equivalence of Two Regressions
7.3 Log-Linear Models: A Wage Equation Example
7.4 The Linear Probability Model: A Marketing Example
7.5 The Difference Estimator: The Project STAR Example
7.6 The Differences-in-Differences Estimator: The Effect of Minimum Wage Change Example

This chapter considers the use of indicator variables to add more flexibility to the regression
model. We work with different examples to illustrate the use of this tool.

7.1 INDICATOR VARIABLES: THE UNIVERSITY EFFECT ON HOUSE PRICES EXAMPLE

Consider the following house price equation:

$$PRICE = \beta_1 + \delta_1 UTOWN + \beta_2 SQFT + \gamma(SQFT \times UTOWN) + \beta_3 AGE + \delta_2 POOL + \delta_3 FPLACE + e \tag{7.1}$$

where PRICE is house price in $1000s, SQFT is the number of hundreds of square feet of living area,
and AGE is the age of the house in years. Three dummy variables indicate the house
location (UTOWN = 1 for homes near the university, 0 otherwise), whether the house has a pool
(POOL = 1 if a pool is present, 0 otherwise) and whether the house has a fireplace (FPLACE =
1 if a fireplace is present, 0 otherwise).

Open the Excel file utown. Save your file as POE Chapter 7. Rename Sheet 1 utown data. In
cell G1 of your utown data worksheet, enter the column label sqft x utown. In cell G2, enter the
formula =B2*D2; copy it to cells G3:G1001. Here is how your table should look (only the first
five values are shown):

    A         B       C     D       E      F        G
1   price     sqft    age   utown   pool   fplace   sqft x utown
2   205.452   23.46   6     0       0      1        0
3   185.328   20.03   5     0       0      1        0
4   248.422   27.77   6     0       0      0        0
5   154.69    20.17   1     0       0      0        0
6   221.801   26.45   0     0       0      1        0

In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range
should be B1:G1001. Check the box next to Labels. Select New Worksheet Ply and name it
house price equation. Finally, select OK.


The result is (see also Table 7.2 on p. 264 of Principles of Econometrics, 4e):

    A                    B               C                D              E              F              G
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.933043369
5   R Square             0.870569928
6   Adjusted R Square    0.869787873
7   Standard Error       15.22521141
8   Observations         1000
9
10  ANOVA
11                       df              SS               MS             F              Significance F
12  Regression           6               1548261.73       258043.6216    1113.182743    0
13  Residual             993             230184.413       231.8070624
14  Total                999             1778446.143
15
16                       Coefficients    Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17  Intercept            24.49998329     6.191721216      3.956893801    8.13332E-05    12.34962326    36.65034333
18  sqft                 7.612176613     0.245176458      31.0477469     1.8674E-148    7.131053169    8.093300056
19  age                  -0.190086388    0.051204606      -3.712290812   0.000216812    -0.290568043   -0.089604732
20  utown                27.45295601     8.42258204       3.259446555    0.001154208    10.9248533     43.98105872
21  pool                 4.377164078     1.196691689      3.65772104     0.000267836    2.028829359    6.725498798
22  fplace               1.64917557      0.971956791      1.696758113    0.090055792    -0.258149475   3.556500614
23  sqft x utown         1.29940476      0.332047741      3.913307036    9.72454E-05    0.647808951    1.95100057
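Because of the interaction term in (7.1), the effect of house size on price depends on location: $\partial PRICE/\partial SQFT = \beta_2 + \gamma\, UTOWN$. A short Python sketch evaluating this at the estimates above (the variable and function names are our own):

```python
# Estimates from the house price equation output above.
b_sqft = 7.612176613          # coefficient on sqft
g_sqft_utown = 1.29940476     # coefficient on sqft x utown


def price_per_100_sqft(utown):
    """Marginal effect of SQFT on PRICE ($1000s per 100 sq ft) for UTOWN = 0 or 1."""
    return b_sqft + g_sqft_utown * utown


for utown in (0, 1):
    print(utown, round(price_per_100_sqft(utown), 4))
```

It prints 7.6122 for UTOWN = 0 and 8.9116 for UTOWN = 1. Since PRICE is measured in $1000s and SQFT in hundreds of square feet, an extra 100 square feet adds about $7,612 to the price of a house away from the university and about $8,912 near it.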

7.2 APPLYING INDICATOR VARIABLES

7.2.1 Interactions Between Qualitative Factors

Consider the following unrestricted wage equation:

$$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 BLACK + \delta_2 FEMALE + \gamma(BLACK \times FEMALE) + e \tag{7.2}$$

where WAGE is hourly wage and EDUC is years of education. BLACK and FEMALE are dummy
variables for race and gender.

Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 7 in one file, create a new worksheet in your POE
Chapter 7 Excel file, name it cps4_small data, and in it, copy the data set you just opened.

In cells M1:P2 of your cps4_small data worksheet, enter the following column labels and
formulas.

    M      N       O        P
1   educ   black   female   black x female
2   =B2    =K2     =F2      =N2*O2

Note that we first copy the values of EDUC, BLACK and FEMALE into columns M to O. Next, we
create our interaction variable in column P; this way we end up with contiguous columns of
explanatory variables.

Copy the content of cells M2:P2 to cells M3:P1001. Here is how your table should look (only the
first five values are shown below):

   M      N       O        P
1  educ   black   female   black x female
2  16     0       1        0
3  12     0       0        0
4  16     1       0        0
5  14     1       1        1
6  12     0       0        0
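The interaction column is simply the row-by-row product of the two indicator columns, so it equals 1 only for workers who are both black and female. As an illustration outside Excel, here is a minimal Python sketch of what the formula =N2*O2 does when it is copied down; the rows are hypothetical stand-ins for columns M to O, and the function name is our own.

```python
# Hypothetical rows of (educ, black, female); not read from cps4_small.
rows = [
    (16, 0, 1),
    (12, 0, 0),
    (16, 1, 0),
    (14, 1, 1),
    (12, 0, 0),
]


def add_interaction(rows):
    """Append black * female to each row, mirroring the Excel formula =N2*O2."""
    return [(educ, black, female, black * female) for educ, black, female in rows]


for row in add_interaction(rows):
    print(row)
```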

In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range
should be M1:P1001. Check the box next to Labels. Select New Worksheet Ply and name it
Unrestricted Model. Finally, select OK.

[Regression dialog box: Input Y Range $A$1:$A$1001; Input X Range $M$1:$P$1001; Labels checked; New Worksheet Ply: Unrestricted Model]

The result is (see also Table 7.3 on p. 265 of Principles of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.4570
5  R Square            0.2089
6  Adjusted R Square   0.2057
7  Standard Error      11.4389
8  Observations        1000

10 ANOVA
11                 df    SS            MS          F         Significance F
12 Regression      4     34370.7606    8592.6902   65.6688   2.53E-49
13 Residual        995   130194.6671   130.8489
14 Total           999   164565.4278

16                 Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       -5.2812        1.9005           -2.7789   0.00556    -9.0105     -1.5518     -9.0105       -1.5518
18 educ            2.0704         0.1349           15.3501   6.91E-48   1.8057      2.3351      1.8057        2.3351
19 black           -4.1691        1.7747           -2.3492   0.0190     -7.6517     -0.6865     -7.6517       -0.6865
20 female          -4.7846        0.7734           -6.1863   5.98E-10   -6.3023     -3.2669     -6.3023       -3.2669
21 black x female  3.8443         2.3277           1.6516    0.0989     -0.7234     8.4120      -0.7234       8.4120

To test the hypothesis that neither race nor gender affects wages ($H_0: \delta_1 = 0, \delta_2 = 0, \gamma = 0$), the
restricted model is:

$$WAGE = \beta_1 + \beta_2 EDUC + e \qquad (7.3)$$

Go back to your cps4_small data worksheet. From there, go to the Regression dialog box. The
Input Y Range should be A1:A1001, and the Input X Range should be B1:B1001. Check the
box next to Labels. Select New Worksheet Ply and name it Restricted Model. Finally, select
OK.

[Regression dialog box: Input Y Range $A$1:$A$1001; Input X Range $B$1:$B$1001; Labels checked; New Worksheet Ply: Restricted Model]

The result is (see also p. 266 of Principles of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.4183
5  R Square            0.1750
6  Adjusted R Square   0.1741
7  Standard Error      11.6638
8  Observations        1000

10 ANOVA
11                 df    SS            MS           F          Significance F
12 Regression      1     28794.2878    28794.2878   211.6554   1.25E-43
13 Residual        998   135771.1399   136.0432
14 Total           999   164565.4278

16                 Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       -6.7103        1.9142           -3.5056   0.000476   -10.4666    -2.9541     -10.4666      -2.9541
18 educ            1.9803         0.1361           14.5484   1.25E-43   1.7132      2.2474      1.7132        2.2474

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, and
rename it F-test.

Open your POE Chapter 6 Excel file, go to its F-test worksheet, and copy its contents into the
worksheet you just created in your POE Chapter 7 Excel file. Then edit each formula to delete the
reference to the POE Chapter 6 Excel file, [POE Chapter 6.xlsx]; this way the F-statistic will be
computed from the regression results in your current Excel file, POE Chapter 7. Your F-test
template should look like the one below:

   A                B                  C
1  Data Input       J =
2                   N =                ='Unrestricted Model'!B8
3                   K =                ='Unrestricted Model'!B12+1
4                   SSEu =             ='Unrestricted Model'!C13
5                   SSER =             ='Restricted Model'!C13
6                   α =
7
8  Computed Values  m1 =               =C1
9                   m2 =               =C2-C3
10                  Fc =               =FINV(C6,C8,C9)
11
12 F-test           F-statistic =      =((C5-C4)/C8)/(C4/C9)
13                  Conclusion =       =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                  p-value =          =FDIST(C12,C8,C9)
15                  Conclusion =       =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")

Note that the extension of your POE Chapter 6 Excel file may differ from .xlsx if you chose to
save your file in a different format.

With 3 restrictions, at α = 0.01, the results of the F-test are (see also p. 266 in Principles of
Econometrics, 4e):
   A                B                C
1  Data Input       J =              3
2                   N =              1000
3                   K =              5
4                   SSEu =           130194.6671
5                   SSER =           135771.1399
6                   α =              0.01
7
8  Computed Values  m1 =             3
9                   m2 =             995
10                  Fc =             3.8013
11
12 F-test           F-statistic =    14.2059
13                  Conclusion =     Reject Ho
14                  p-value =        4.53E-09
15                  Conclusion =     Reject Ho
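The template implements F = ((SSE_R - SSE_U)/J) / (SSE_U/(N - K)). As a sketch of a cross-check outside Excel, the Python function below (a hypothetical helper, not part of the workbook) recomputes cell C12 from the inputs in cells C1 to C5. The critical value is copied from the FINV result for this test rather than recomputed, since Python's standard library has no F distribution.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """F-statistic for J linear restrictions, as in cell C12 of the F-test worksheet."""
    m1 = j       # numerator degrees of freedom (cell C8)
    m2 = n - k   # denominator degrees of freedom (cell C9)
    return ((sse_r - sse_u) / m1) / (sse_u / m2)


# Inputs for H0: delta1 = 0, delta2 = 0, gamma = 0
f = f_statistic(sse_r=135771.1399, sse_u=130194.6671, j=3, n=1000, k=5)
fc = 3.8013  # critical value reported by FINV(0.01, 3, 995)
print(round(f, 4))                                     # 14.2059
print("Reject Ho" if f >= fc else "Do Not Reject Ho")  # Reject Ho
```

The same function applies unchanged to the later tests in this chapter; only J, K and the two SSE values change.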

7.2.2 Qualitative Factors with Several Categories

Consider the following unrestricted wage equation:

$$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 BLACK + \delta_2 FEMALE + \gamma (BLACK \times FEMALE) + \delta_3 SOUTH + \delta_4 MIDWEST + \delta_5 WEST + e \qquad (7.4)$$

where SOUTH, MIDWEST and WEST are region dummy variables.
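Each worker lives in exactly one region, so only three of the four possible regional indicators enter (7.4). The omitted region serves as the base, and adding its indicator as well would make the four regional dummies sum to the intercept column (the dummy variable trap). The sketch below assumes the omitted base region is the Northeast and starts from hypothetical region labels; in cps4_small the three dummy columns are already provided.

```python
REGIONS = ("south", "midwest", "west")  # dummies that enter (7.4)
BASE = "northeast"                      # assumed omitted (base) category


def region_dummies(region):
    """Return the (SOUTH, MIDWEST, WEST) indicators for one worker."""
    if region not in REGIONS and region != BASE:
        raise ValueError(f"unknown region: {region!r}")
    return tuple(int(region == r) for r in REGIONS)


for label in ("south", "west", "northeast"):
    print(label, region_dummies(label))
```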

Go back to your cps4_small data worksheet. In cells Q1:S2, enter the following column labels
and formulas.

   Q       R         S
1  south   midwest   west
2  =I2     =H2       =J2

Note that all we are doing is copying the values of SOUTH, MIDWEST and WEST into columns
Q to S so as to create columns of explanatory variables next to one another.

Copy the content of cells Q2:S2 to cells Q3:S1001. Here is how your table should look (only the
first five values are shown below):

   Q       R         S
1  south   midwest   west
2  1       0         0
3  0       1         0
4  0       0         1
5  1       0         0
6  0       0         0

In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range
should be M1:S1001. Check the box next to Labels. Select Output Range and specify it to be
cell A1 in your Unrestricted Model worksheet: you can place your cursor in the Output Range
window and move it to that cell, or type 'Unrestricted Model'!A1 in the Output
Range window. Finally, select OK.

[Regression dialog box: Input Y Range $A$1:$A$1001; Input X Range $M$1:$S$1001; Labels checked; Output Range: 'Unrestricted Model'!$A$1]

Excel warns you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range.


The result is (see also Table 7.4 on p. 267 of Principles of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.4679
5  R Square            0.2189
6  Adjusted R Square   0.2134
7  Standard Error      11.3834
8  Observations        1000

10 ANOVA
11                 df    SS            MS          F         Significance F
12 Regression      7     36021.1930    5145.8847   39.7118   2.32E-49
13 Residual        992   128544.2347   129.5809
14 Total           999   164565.4278

16                 Coefficients   Standard Error   t Stat    P-value     Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       -4.8062        2.0287           -2.3691   0.0180      -8.7872     -0.8252     -8.7872       -0.8252
18 educ            2.0712         0.1345           15.4031   3.685E-48   1.8074      2.3351      1.8074        2.3351
19 black           -3.9055        1.7863           -2.1864   0.0290      -7.4107     -0.4002     -7.4107       -0.4002
20 female          -4.7441        0.7698           -6.1625   1.04E-09    -6.2548     -3.2334     -6.2548       -3.2334
21 black x female  3.6250         2.3184           1.5636    0.118       -0.9245     8.1745      -0.9245       8.1745
22 south           -0.4499        1.0250           -0.4389   0.661       -2.4614     1.5616      -2.4614       1.5616
23 midwest         -2.6084        1.0596           -2.4616   0.0140      -4.6878     -0.5290     -4.6878       -0.5290
24 west            0.9866         1.0598           0.9309    0.352       -1.0931     3.0664      -1.0931       3.0664

To test the hypothesis that there are no regional differences ($H_0: \delta_3 = 0, \delta_4 = 0, \delta_5 = 0$), the
restricted model is our old unrestricted model (7.2).

Go back to your cps4_small data worksheet. From there, go to the Regression dialog box. The
Input Y Range should be A1:A1001, and the Input X Range should be M1:P1001. Check the
box next to Labels. Select Output Range and specify it to be cell A1 in your Restricted Model
worksheet: you can place your cursor in the Output Range window and move it to that cell, or
type 'Restricted Model'!A1 in the Output Range window. Finally, select OK.

[Regression dialog box: Input Y Range $A$1:$A$1001; Input X Range $M$1:$P$1001; Labels checked; Output Range: 'Restricted Model'!$A$1]

Excel warns you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range. With 3 restrictions, at α = 0.01, the results of the F-test are (see
also p. 268 in Principles of Econometrics, 4e):

   A                B                C
1  Data Input       J =              3
2                   N =              1000
3                   K =              8
4                   SSEu =           128544.2347
5                   SSER =           130194.6671
6                   α =              0.01
7
8  Computed Values  m1 =             3
9                   m2 =             992
10                  Fc =             3.8014
11
12 F-test           F-statistic =    4.2456
13                  Conclusion =     Reject Ho
14                  p-value =        0.00543
15                  Conclusion =     Reject Ho

7.2.3 Testing the Equivalence of Two Regressions

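Consider the following unrestricted wage equation:

$$WAGE = \beta_1 + \beta_2 EDUC + \delta_1 BLACK + \delta_2 FEMALE + \gamma (BLACK \times FEMALE) + \theta_1 SOUTH + \theta_2 (EDUC \times SOUTH) + \theta_3 (BLACK \times SOUTH) + \theta_4 (FEMALE \times SOUTH) + \theta_5 (BLACK \times FEMALE \times SOUTH) + e \qquad (7.5)$$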

Go back to your cps4_small data worksheet. Insert four columns to the left of the midwest
column R (see Section 1.4 for more details on how to do that). In your new cells R1:U2, enter the
following column labels and formulas.

   R              S               T                U
1  educ x south   black x south   female x south   black x female x south
2  =M2*Q2         =N2*Q2          =O2*Q2           =P2*Q2

Copy the content of cells R2:U2 to cells R3:U1001. Here is how your table should look (only the
first five values are shown below):

   R              S               T                U
1  educ x south   black x south   female x south   black x female x south
2  16             0               1                0
3  0              0               0                0
4  0              0               0                0
5  14             1               1                1
6  0              0               0                0

In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range
should be M1:U1001. Check the box next to Labels. Select Output Range and specify it to be
cell A1 in your Unrestricted Model worksheet: you can place your cursor in the Output Range
window and move it to that cell, or type 'Unrestricted Model'!A1 in the Output
Range window. Finally, select OK.

[Regression dialog box: Input Y Range $A$1:$A$1001; Input X Range $M$1:$U$1001; Labels checked; Output Range: 'Unrestricted Model'!$A$1]

Excel warns you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range. The result is (see also column (1) in Table 7.5, p. 269 of Principles
of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.4584
5  R Square            0.2101
6  Adjusted R Square   0.2030
7  Standard Error      11.4585
8  Observations        1000

10 ANOVA
11                          df    SS            MS          F         Significance F
12 Regression               9     34581.0189    3842.3354   29.2644   2.00E-45
13 Residual                 990   129984.4089   131.2974
14 Total                    999   164565.4278

16                          Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept                -6.6056        2.3366           -2.8270   0.00479    -11.1909    -2.0203     -11.1909      -2.0203
18 educ                     2.1726         0.1665           13.0512   4.84E-36   1.8459      2.4992      1.8459        2.4992
19 black                    -5.0894        2.6431           -1.9256   0.0544     -10.2760    0.0973      -10.2760      0.0973
20 female                   -5.0051        0.8990           -5.5673   3.33E-08   -6.7693     -3.2409     -6.7693       -3.2409
21 black x female           5.3056         3.4973           1.5171    0.130      -1.5573     12.1685     -1.5573       12.1685
22 south                    3.9439         4.0485           0.9742    0.330      -4.0006     11.8884     -4.0006       11.8884
23 educ x south             -0.3085        0.2857           -1.0798   0.280      -0.8693     0.2522      -0.8693       0.2522
24 black x south            1.7044         3.6333           0.4691    0.639      -5.4255     8.8343      -5.4255       8.8343
25 female x south           0.9011         1.7727           0.5083    0.611      -2.5775     4.3797      -2.5775       4.3797
26 black x female x south   -2.9358        4.7876           -0.6132   0.540      -12.3309    6.4593      -12.3309      6.4593

To test the hypothesis that there is no difference in the wage equation between the southern
region and the rest of the country ($H_0: \theta_1 = \theta_2 = \theta_3 = \theta_4 = \theta_5 = 0$), the restricted model is
model (7.2). With 5 restrictions, at α = 0.01, the results of the F-test are (see also
p. 270 in Principles of Econometrics, 4e):

   A                B                C
1  Data Input       J =              5
2                   N =              1000
3                   K =              10
4                   SSEu =           129984.4089
5                   SSER =           130194.6671
6                   α =              0.01
7
8  Computed Values  m1 =             5
9                   m2 =             990
10                  Fc =             3.0357
11
12 F-test           F-statistic =    0.3203
13                  Conclusion =     Do Not Reject Ho
14                  p-value =        0.901
15                  Conclusion =     Do Not Reject Ho

Note that, as explained on pp. 268-269 of Principles of Econometrics, 4e, estimating (7.5) is
equivalent to estimating (7.2) twice: once for the southern workers and again for workers in the
rest of the country.
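The equivalence of the coefficient estimates can be checked numerically. The sketch below uses a pure-Python OLS solver (our own helper, based on the normal equations) and a small hypothetical data set in which EDUC and FEMALE are the only regressors. The fully interacted model reproduces the rest-of-country coefficients directly, and it reproduces the southern coefficients as the sum of each base coefficient and its SOUTH interaction. Only point estimates are compared; standard errors are not computed.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows that already include the constant term; the system
    is solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Hypothetical data, not cps4_small: (educ, female, south, wage)
data = [
    (12, 0, 0, 15.0), (16, 1, 0, 22.5), (14, 0, 0, 19.0), (18, 1, 0, 30.0), (12, 1, 0, 13.5),
    (12, 1, 1, 12.0), (16, 0, 1, 24.0), (14, 1, 1, 16.5), (10, 0, 1, 11.0), (18, 0, 1, 28.5),
]


def base(educ, female):
    return [1.0, educ, female]


# Fully interacted model: [1, educ, female, south, educ*south, female*south]
X_full = [base(e, f) + [s * v for v in base(e, f)] for e, f, s, _ in data]
b_full = ols(X_full, [w for *_, w in data])


def subgroup(south):
    """Fit the base model using only workers with SOUTH == south."""
    rows = [(e, f, w) for e, f, s, w in data if s == south]
    return ols([base(e, f) for e, f, _ in rows], [w for *_, w in rows])


b_rest, b_south = subgroup(0), subgroup(1)
# Southern coefficients implied by the interacted model: base + interaction terms
implied_south = [b + d for b, d in zip(b_full[:3], b_full[3:])]
print([round(v, 4) for v in b_rest], [round(v, 4) for v in b_full[:3]])
print([round(v, 4) for v in b_south], [round(v, 4) for v in implied_south])
```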

We first sort our data according to the workers' region (the south indicator). Go back to your
cps4_small data worksheet and go to the Data tab on the ribbon at the top of your
screen. In the Sort & Filter group of commands, select the Sort button.


Alternatively, you can select the Sort & Filter button in the Editing group of commands on
the Home tab and, on the drop-down menu, select Custom Sort.

A Sort dialog box opens. Select the box next to My data has headers. Select the south dummy
variable column in the Sort by window. Values should be selected in the Sort On window and
Smallest to Largest in the Order window. Finally, select OK.

[Sort dialog box: Sort by south; Sort On: Values; Order: Smallest to Largest]

In the Regression dialog box, for the non-southern workers' wage equation, the Input Y Range
should be A1:A705, and the Input X Range should be M1:P705. Check the box next to Labels.
Select New Worksheet Ply and name it Non-South Wage Equation. Finally, select OK.
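The row limits 705 (here) and 297 (for the southern workers later) follow from the group sizes: row 1 holds the labels, so n observations occupy rows 2 through n + 1. A small Python sketch of that bookkeeping, using the 704 non-southern and 296 southern workers reported in the two regression outputs; the helper name is illustrative only.

```python
def block_range(first_col, last_col, n_obs):
    """Excel range covering the label row (row 1) plus n_obs data rows."""
    return f"{first_col}1:{last_col}{n_obs + 1}"


# Group sizes reported in the two regression outputs
n_rest, n_south = 704, 296
print(block_range("A", "A", n_rest), block_range("M", "P", n_rest))    # A1:A705 M1:P705
print(block_range("A", "A", n_south), block_range("M", "P", n_south))  # A1:A297 M1:P297
```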

[Regression dialog box: Input Y Range $A$1:$A$705; Input X Range $M$1:$P$705; Labels checked; New Worksheet Ply: Non-South Wage Equation]

The result is (see also column (2) in Table 7.5, p. 269 of Principles of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.4706
5  R Square            0.2215
6  Adjusted R Square   0.2170
7  Standard Error      11.2894
8  Observations        704

10 ANOVA
11                 df    SS            MS          F         Significance F
12 Regression      4     25346.00835   6336.5021   49.7170   7.79E-37
13 Residual        699   89088.46154   127.4514
14 Total           703   114434.4699

16                 Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       -6.6056        2.3022           -2.8693   0.00424    -11.1255    -2.0856     -11.1255      -2.0856
18 educ            2.1726         0.1640           13.2467   6.78E-36   1.8505      2.4946      1.8505        2.4946
19 black           -5.0894        2.5941           -1.9619   0.0501     -10.2021    0.0234      -10.2021      0.0234
20 female          -5.0051        0.8837           -5.6635   2.33E-08   -6.7441     -3.2660     -6.7441       -3.2660
21 black x female  5.3056         3.4457           1.5398    0.124      -1.4595     12.0707     -1.4595       12.0707

Go back to your cps4_small data worksheet, and then to the Sort dialog box. Change the Order
to Largest to Smallest, and select OK.

[Sort dialog box: Sort by south; Sort On: Values; Order: Largest to Smallest; My data has headers checked]
We re-sort the data so that the southern workers occupy the rows directly below the variable
labels; this lets us keep the labels when we run the regression for the southern workers.

In the Regression dialog box, for the southern workers' wage equation, the Input Y Range
should be A1:A297, and the Input X Range should be M1:P297. Check the box next to Labels.
Select New Worksheet Ply and name it South Wage Equation. Finally, select OK.

[Regression dialog box: Input Y Range $A$1:$A$297; Input X Range $M$1:$P$297; Labels checked; New Worksheet Ply: South Wage Equation]

The result is (see also column (3) in Table 7.5, p. 269 of Principles of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.4292
5  R Square            0.1842
6  Adjusted R Square   0.1730
7  Standard Error      11.8548
8  Observations        296

10 ANOVA
11                 df    SS           MS          F         Significance F
12 Regression      4     9234.2601    2308.5650   16.4269   3.79E-12
13 Residual        291   40895.9474   140.5359
14 Total           295   50130.2075

16                 Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       -2.6617        3.4204           -0.7782   0.437      -9.3935     4.0702      -9.3935       4.0702
18 educ            1.8640         0.2403           7.7580    1.47E-13   1.3911      2.3369      1.3911        2.3369
19 black           -3.3850        2.5793           -1.3124   0.190      -8.4613     1.6914      -8.4613       1.6914
20 female          -4.1040        1.5806           -2.5964   0.00989    -7.2149     -0.9931     -7.2149       -0.9931
21 black x female  2.3697         3.3827           0.7005    0.484      -4.2880     9.0275      -4.2880       9.0275

7.3 LOG-LINEAR MODELS: A WAGE EQUATION EXAMPLE

Consider the following wage equation:

$$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \delta FEMALE + e \qquad (7.6)$$

Go back to your cps4_small data worksheet. In cells X1:Z2, enter the following column labels
and formulas.

   X          Y      Z
1  ln(wage)   educ   female
2  =ln(A2)    =M2    =O2

Copy the content of cells X2:Z2 to cells X3:Z1001. Here is how your table should look (only the
first five values are shown below):

   X          Y      Z
1  ln(wage)   educ   female
2  2.9295     16     1
3  3.2562     14     1
4  3.7670     16     1
5  2.9565     12     0
6  2.6397     14     1

In the Regression dialog box, the Input Y Range should be X1:X1001, and the Input X Range
should be Y1:Z1001. Check the box next to Labels. Select New Worksheet Ply and name it
Log-Linear Model with Dummy (dummy variable is another name for an indicator
variable). Finally, select OK.

[Regression dialog box: Input Y Range $X$1:$X$1001; Input X Range $Y$1:$Z$1001; Labels checked; New Worksheet Ply: Log-Linear Model with Dummy]

The result is (see also p. 272 of Principles of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.4705
5  R Square            0.2213
6  Adjusted R Square   0.2198
7  Standard Error      0.5129
8  Observations        1000

10 ANOVA
11                 df    SS         MS        F          Significance F
12 Regression      2     74.5421    37.2710   141.7000   6.88E-55
13 Residual        997   262.2386   0.2630
14 Total           999   336.7807

16                 Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       1.6539         0.0844           19.6006   1.30E-72   1.4883      1.8194      1.4883        1.8194
18 educ            0.0962         0.0060           15.9443   3.76E-51   0.0844      0.1081      0.0844        0.1081
19 female          -0.2432        0.0327           -7.4315   2.31E-13   -0.3074     -0.1790     -0.3074       -0.1790


7.4 THE LINEAR PROBABILITY MODEL: A MARKETING EXAMPLE

Consider the following equation:

$$COKE = \beta_1 + \beta_2 PRATIO + \beta_3 DISP\_COKE + \beta_4 DISP\_PEPSI + e \qquad (7.7)$$



where
COKE = 1 if Coke is chosen; 0 if Pepsi is chosen
DISP_COKE = 1 if a store display is present for Coke; 0 if a store display is absent for Coke
DISP_PEPSI = 1 if a store display is present for Pepsi; 0 if a store display is absent for Pepsi

and PRATIO is the relative price of Coke to Pepsi.

Open the Excel file coke. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 7 in one file, create a new worksheet in your POE
Chapter 7 Excel file, rename it coke data, and in it, copy the data set you just opened.


In the Regression dialog box, the Input Y Range should be A1:A1141, and the Input X Range
should be D1:F1141. Check the box next to Labels. Select New Worksheet Ply and name it
Marketing Model. Finally, select OK.

[Regression dialog box: Input Y Range $A$1:$A$1141; Input X Range $D$1:$F$1141; Labels checked; New Worksheet Ply: Marketing Model]

The result is (see also p. 275 of Principles of Econometrics, 4e):

16                 Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       0.8902         0.0655           13.5942   4.15E-39   0.7617      1.0187      0.7617        1.0187
18 disp_pepsi      -0.1657        0.0356           -4.6535   3.64E-06   -0.2355     -0.0958     -0.2355       -0.0958
19 disp_coke       0.0772         0.0344           2.2440    0.0250     0.0097      0.1447      0.0097        0.1447
20 pratio          -0.4009        0.0613           -6.5341   9.64E-11   -0.5212     -0.2805     -0.5212       -0.2805
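Because (7.7) is a linear probability model, a fitted value is read directly as an estimated probability that Coke is chosen. A short Python sketch using the estimates from the Marketing Model output, rounded to four decimals; the function name and the scenario values are hypothetical.

```python
# Estimates from the Marketing Model output, rounded to four decimals
B_CONST, B_PRATIO, B_DISP_COKE, B_DISP_PEPSI = 0.8902, -0.4009, 0.0772, -0.1657


def prob_coke(pratio, disp_coke, disp_pepsi):
    """Fitted probability of choosing Coke from the linear probability model."""
    return (B_CONST + B_PRATIO * pratio
            + B_DISP_COKE * disp_coke + B_DISP_PEPSI * disp_pepsi)


# Hypothetical scenario: equal prices, Coke display present, no Pepsi display
print(round(prob_coke(pratio=1.0, disp_coke=1, disp_pepsi=0), 4))  # 0.5665
```

Nothing in the linear form keeps the fitted value between 0 and 1: a large enough PRATIO pushes it below zero, which is a known limitation of the linear probability model.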

7.5 THE DIFFERENCE ESTIMATOR: THE PROJECT STAR EXAMPLE

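Consider the following equations:

$$TOTALSCORE = \beta_1 + \beta_2 SMALL + e \qquad (7.8)$$

$$TOTALSCORE = \beta_1 + \beta_2 SMALL + \beta_3 TCHEXPER + e \qquad (7.9)$$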

where
SMALL = 1 if the student was assigned to a small class; 0 otherwise

and TOTALSCORE is the combined reading and math achievement score; TCHEXPER is the
teacher's years of experience.
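With SMALL as the only regressor in (7.8), the OLS slope equals the difference between the mean TOTALSCORE of small-class and regular-class students, and the intercept equals the regular-class mean; this is why the slope estimate is called the difference estimator. A Python sketch with hypothetical scores (not the star data); the helper names are our own.

```python
def simple_ols(x, y):
    """Intercept and slope of a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def group_mean(x, y, value):
    """Mean of y for the observations with x == value."""
    ys = [yi for xi, yi in zip(x, y) if xi == value]
    return sum(ys) / len(ys)


# Hypothetical data: SMALL indicator and TOTALSCORE
small = [1, 0, 1, 0, 0, 1]
score = [930.0, 905.0, 925.0, 915.0, 910.0, 935.0]

b1, b2 = simple_ols(small, score)
print(b1, group_mean(small, score, 0))                                 # 910.0 910.0
print(b2, group_mean(small, score, 1) - group_mean(small, score, 0))  # 20.0 20.0
```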

Open the Excel file star. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 7 in one file, create a new worksheet in your POE
Chapter 7 Excel file, rename it star data, and in it, copy the data set you just opened.


We first sort our star data so that we can easily select the subset of regular and small classes
only, that is, the classes without a teacher aide. Go to the Data tab on the ribbon at the top of
your screen. Select all star data. In the Sort & Filter group of commands, select
the Sort button.

A Sort dialog box opens. Select the box next to My data has headers. Select the aide variable
column in the Sort by window. Values should be selected in the Sort On window and Smallest
to Largest in the Order window, so that the classes without an aide (aide = 0) come first. Finally,
select OK.

[Sort dialog box: Sort by aide; Sort On: Values; Order: Smallest to Largest; My data has headers checked]

For model (7.8), the Input Y Range should be H1:H3744, and the Input X Range should be
Q1:Q3744. Check the box next to Labels. Select New Worksheet Ply and name it Star Model
1. Finally, select OK.

[Regression dialog box: Input Y Range $H$1:$H$3744; Input X Range $Q$1:$Q$3744; Labels checked; New Worksheet Ply: Star Model 1]

The result is (see also column (1) in Table 7.7, p. 280 of Principles of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.0925
5  R Square            0.0086
6  Adjusted R Square   0.0083
7  Standard Error      74.6507
8  Observations        3743

10 ANOVA
11                 df     SS            MS            F         Significance F
12 Regression      1      179850.2664   179850.2664   32.2733   1.44E-08
13 Residual        3741   20847551.44   5572.7216
14 Total           3742   21027401.71

16                 Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       918.0429       1.6672           550.6638   0          914.7743    921.3115    914.7743      921.3115
18 small           13.8990        2.4466           5.6810     1.44E-08   9.1022      18.6958     9.1022        18.6958

Go back to your sorted star data worksheet. In cells T1:U2, enter the following column labels
and formulas.

   T       U
1  small   tchexper
2  =Q2     =D2

Note that all we are doing is copying the values of SMALL and TCHEXPER into columns T and U so
as to create columns of explanatory variables next to one another.

Copy the content of cells T2:U2 to cells T3:U5787. Here is how your table should look (only the
first five values are shown below):

   T       U
1  small   tchexper
2  1       3
3  0       12
4  1       7
5  1       4
6  0       6

For model (7.9), the Input Y Range should be H1:H3744, and the Input X Range should be
T1:U3744. Check the box next to Labels. Select New Worksheet Ply and name it Star Model 2.
Finally, select OK.

[Regression dialog box: Input Y Range $H$1:$H$3744; Input X Range $T$1:$U$3744; Labels checked; New Worksheet Ply: Star Model 2]

The result is (see also column (2) in Table 7.7, p. 280 of Principles of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.1279
5  R Square            0.0163
6  Adjusted R Square   0.0158
7  Standard Error      74.3666
8  Observations        3743

10 ANOVA
11                 df     SS            MS            F         Significance F
12 Regression      2      343722.0648   171861.0324   31.0757   4.12E-14
13 Residual        3740   20683679.64   5530.3956
14 Total           3742   21027401.71

16                 Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       907.5643       2.5424           356.9696   0          902.5797    912.5490    902.5797      912.5490
18 small           13.9833        2.4373           5.7372     1.04E-08   9.2046      18.7619     9.2046        18.7619
19 tchexper        1.1555         0.2123           5.4434     5.56E-08   0.7393      1.5717      0.7393        1.5717

Next, consider the following equation:

$$SMALL = \beta_1 + \beta_2 BOY + \beta_3 WHITE\_ASIAN + \beta_4 TCHEXPER + \beta_5 FREELUNCH + e \qquad (7.10)$$

where
BOY = 1 if the student is male; 0 if the student is female
WHITE_ASIAN = 1 if the student is white or Asian; 0 otherwise
FREELUNCH = 1 if free lunch is provided; 0 if free lunch is not provided

Go back to your sorted star data worksheet. In cells V1:Y2, enter the following column labels
and formulas.

   V      W             X          Y
1  boy    white_asian   tchexper   freelunch
2  =I2    =J2           =D2        =N2

Note that all we are doing is copying the values of our explanatory variables into columns V to Y
so that they are next to one another.

Copy the content of cells V2:Y2 to cells V3:Y5787. Here is how your table should look (only the
first four rows of values are shown below):

   V      W             X          Y
1  boy    white_asian   tchexper   freelunch
2  0      0             3          1
3  1      1             12         0
4  1      1             7          0
5  0      1             4          1

For model (7.10), the Input Y Range should be T1:T3744, and the Input X Range should be
V1:Y3744. Check the box next to Labels. Select New Worksheet Ply and name it Check
Random Assignment Model. Finally, select OK.

[Regression dialog box: Input Y Range $T$1:$T$3744; Input X Range $V$1:$Y$3744; Labels checked; New Worksheet Ply: Check Random Assignment Model]

The result is (see also p. 281 of Principles of Econometrics, 4e):

1  SUMMARY OUTPUT

3  Regression Statistics
4  Multiple R          0.0079
5  R Square            6.31549E-05
6  Adjusted R Square   -0.0010
7  Standard Error      0.4990
8  Observations        3743

10 ANOVA
11                 df     SS         MS       F        Significance F
12 Regression      4      0.0588     0.0147   0.0590   0.994
13 Residual        3738   930.9297   0.2490
14 Total           3742   930.9885

16                 Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
17 Intercept       0.4665         0.0252           18.5425   1.69E-73   0.4171      0.5158      0.4171        0.5158
18 boy             0.0014         0.0163           0.0863    0.931      -0.0306     0.0334      -0.0306       0.0334
19 white_asian     0.0044         0.0196           0.2248    0.822      -0.0340     0.0428      -0.0340       0.0428
20 tchexper        -0.0006        0.0014           -0.4188   0.675      -0.0034     0.0022      -0.0034       0.0022
21 freelunch       -0.0009        0.0182           -0.0487   0.961      -0.0366     0.0348      -0.0366       0.0348
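The F-statistic in the ANOVA table tests whether all four slope coefficients are zero, and it can be recovered from R Square as F = (R^2/(K - 1)) / ((1 - R^2)/(N - K)). A Python sketch reproducing it from the printed R Square (6.31549E-05), N = 3743 and K = 5; the function name is our own.

```python
def overall_f(r2, n, k):
    """F for H0: all slope coefficients are zero; k counts the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


f = overall_f(r2=6.31549e-05, n=3743, k=5)
print(round(f, 4))  # 0.059
```

The result agrees with the F of about 0.0590 in the output. With a Significance F of about 0.99, these student and teacher characteristics do not help predict SMALL, which is what one would expect if students were assigned to class types at random.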

7.6 THE DIFFERENCES-IN-DIFFERENCES ESTIMATOR: THE EFFECT OF MINIMUM WAGE CHANGE EXAMPLE

Consider the following equation:

$$FTE = \beta_1 + \beta_2 NJ + \beta_3 D + \delta (NJ \times D) + e \qquad (7.11)$$

where
NJ = 1 if the observation is from New Jersey; 0 if the observation is from Pennsylvania
D = 1 if the observation is from November; 0 if the observation is from February

and FTE is the number of full-time-equivalent employees.

In equation (7.12), we add explanatory variables to the ones included in (7.11):

$$FTE = \beta_1 + \beta_2 NJ + \beta_3 D + \delta (NJ \times D) + \beta_4 KFC + \beta_5 ROYS + \beta_6 WENDYS + \beta_7 CO\_OWNED + e \qquad (7.12)$$

where
KFC = 1 for a Kentucky Fried Chicken restaurant; 0 otherwise
ROYS = 1 for a Roy Rogers restaurant; 0 otherwise
WENDYS = 1 for a Wendy's restaurant; 0 otherwise
CO_OWNED = 1 for a company-owned restaurant; 0 for a franchise-owned restaurant

Equation (7.13) adds the following explanatory variables to those included in (7.12):

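FTE = β1 + β2NJ + β3D + δ(D × NJ) + β4KFC + β5ROYS + β6WENDYS + β7CO_OWNED + β8SOUTHJ + β9CENTRALJ + β10PA1 + e    (7.13)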

where SOUTHJ = 1 if the restaurant is located in Southern New Jersey, and 0 otherwise
CENTRALJ = 1 if the restaurant is located in Central New Jersey, and 0 otherwise
PA1 = 1 if the restaurant is located in the Northeast suburbs of Philadelphia, PA, and 0 otherwise

Open the Excel file njmin3. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 7 in one file, create a new worksheet in your POE
Chapter 7 Excel file, rename it njmin3 data, and in it, copy the data set you just opened.

Using Indicator Variables 199

In cells O1:X2, enter the following column labels and formulas.

    O     P     Q       R     S      T        U          V        W          X
1   nj    d     d_nj    kfc   roys   wendys   co-owned   southj   centralj   pa1
2   =G2   =L2   =M2     =I2   =J2    =K2      =A2        =B2      =C2        =D2

Note that all we are doing is copying the values of our explanatory variables into columns O-X so that they are next to one another; the Regression tool needs the Input X Range to be a single block of adjacent columns.

Copy the content of cells O2:X2 to cells O3:X821. Here is how your table should look (only the first five rows are shown below):

[Table: columns O-X of the njmin3 data worksheet, headed nj, d, d_nj, kfc, roys, wendys, co-owned, southj, centralj, pa1, showing the 0/1 values of the first five observations]

We first sort our data by fte because some fte values are missing, and the corresponding observations cannot be used to estimate our regression model. Go to the Data tab in the middle of the tab list at the top of your screen. Select all your njmin3 data. In the Sort & Filter group of commands, select the Sort button.

A Sort dialog box opens. Select the box next to My data has headers. Select the fte variable
column in the Sort by window. Values should be selected in the Sort On window and Largest
to Smallest in the Order window. Finally, select OK.
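Excel places blank cells at the bottom of a sorted range whichever sort order is chosen, which is why the usable fte observations end up in one contiguous block at the top. The sketch below mimics that behavior in Python on a few hypothetical fte values, with `None` standing in for a blank cell; it is only an illustration, not part of the Excel workflow.

```python
# Hypothetical fte values; None stands for a blank (missing) cell.
fte = [40.5, None, 13.75, 8.5, None, 34.0]

# Sort largest to smallest, keeping missing values at the bottom,
# which is where Excel puts blank cells in either sort order.
ordered = sorted(fte, key=lambda v: (v is None, -(v if v is not None else 0.0)))
usable = [v for v in ordered if v is not None]

print(ordered)      # [40.5, 34.0, 13.75, 8.5, None, None]
print(len(usable))  # 4
```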


For model (7.11), the Input Y Range should be N1:N795, and the Input X Range should be O1:Q795. Check the box next to Labels. Select New Worksheet Ply and name it Minimum Wage Model 1. Finally select OK.

The result is (see also column (1) in Table 7.9 p. 285 of Principles of Econometrics, 4e):

    A                    B               C               D               E               F               G               H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.086030674
5   R Square             0.007401277
6   Adjusted R Square    0.003631915
7   Standard Error       9.405618976
8   Observations         794
9
10  ANOVA
11                       df              SS              MS              F               Significance F
12  Regression           3               521.1164632     173.7054877     1.963535584     0.117982119
13  Residual             790             69887.87797     88.46566832
14  Total                793             70408.99444
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17  Intercept            23.33116883     1.07186976      21.76679152     1.16E-82        21.22711921     25.43521846     21.22711921     25.43521846
18  nj                   -2.891760731    1.193523676     -2.422876721    0.015621992     -5.234613508    -0.548907955    -5.234613508    -0.548907955
19  d                    -2.165584416    1.515852752     -1.428624523    0.153507425     -5.141159932    0.809991101     -5.141159932    0.809991101
20  d_nj                 2.753605783     1.688409131     1.630888        0.103312578     -0.56069296     6.067904526     -0.56069296     6.067904526

Go back to your njmin3 data worksheet. For model (7.12), the Input Y Range should be N1:N795, and the Input X Range should be O1:U795. Check the box next to Labels. Select New Worksheet Ply and name it Minimum Wage Model 2. Finally select OK.


The result is (see also column (2) in Table 7.9 p. 285 of Principles of Econometrics, 4e):

    A                    B               C               D               E               F               G               H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.443204145
5   R Square             0.196429914
6   Adjusted R Square    0.189273438
7   Standard Error       8.484273858
8   Observations         794
9
10  ANOVA
11                       df              SS              MS              F               Significance F
12  Regression           7               13830.43275     1975.776108     27.44785259     7.72E-34
13  Residual             786             56578.56168     71.9829029
14  Total                793             70408.99444
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17  Intercept            25.95117614     1.038223461     24.99575199     6.20E-102       23.91315732     27.98919496     23.91315732     27.98919496
18  nj                   -2.376608094    1.079192119     -2.2022104      0.02794         -4.49504784     -0.258168347    -4.49504784     -0.258168347
19  d                    -2.223565041    1.367692339     -1.625778677    0.104397595     -4.908326875    0.461196793     -4.908326875    0.461196793
20  d_nj                 2.845066555     1.523336497     1.867652239     0.06218         -0.145226613    5.835359724     -0.145226613    5.835359724
21  kfc                  -10.45338971    0.848956906     -12.31321593    5.52E-32        -12.1198808     -8.786898614    -12.1198808     -8.786898614
22  roys                 -1.624999072    0.859797951     -1.889977836    0.059128621     -3.312770992    0.062772848     -3.312770992    0.062772848
23  wendys               -1.063708623    0.92915025      -1.14481875     0.252632        -2.887618181    0.760200934     -2.887618181    0.760200934
24  co-owned             -1.16854545     0.716166246     -1.631667866    0.103150035     -2.574370247    0.237279347     -2.574370247    0.237279347

Go back to your njmin3 data worksheet. For model (7.13), the Input Y Range should be N1:N795, and the Input X Range should be O1:X795. Check the box next to Labels. Select New Worksheet Ply and name it Minimum Wage Model 3. Finally select OK.


The result is (see also column (3) in Table 7.9 p. 285 of Principles of Econometrics, 4e):

    A                    B               C               D               E               F               G               H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.470527732
5   R Square             0.221396346
6   Adjusted R Square    0.211452494
7   Standard Error       8.367416913
8   Observations         794
9
10  ANOVA
11                       df              SS              MS              F               Significance F
12  Regression           10              15588.29412     1558.829412     22.26464497     7.06E-37
13  Residual             783             54820.70032     70.0136658
14  Total                793             70408.99444
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17  Intercept            25.3205127      1.210903135     20.91043617     1.74E-77        22.94351194     27.69751346     22.94351194     27.69751346
18  nj                   -0.907963605    1.271741824     -0.713952776    0.475469143     -3.404390609    1.588463399     -3.404390609    1.588463399
19  d                    -2.211850952    1.348859584     -1.639793333    0.101449806     -4.859659985    0.43595808      -4.859659985    0.43595808
20  d_nj                 2.81490803      1.502382        1.873630464     0.061353627     -0.134264555    5.764080614     -0.134264555    5.764080614
21  kfc                  -10.05800173    0.844671089     -11.90759558    3.63754E-30     -11.71608962    -8.399913837    -11.71608962    -8.399913837
22  roys                 -1.693392595    0.85918373      -1.97093187     0.049083476     -3.379968772    -0.006816418    -3.379968772    -0.006816418
23  wendys               -1.064951933    0.920638        -1.156753612    0.2477          -2.872163664    0.742259798     -2.872163664    0.742259798
24  co-owned             -0.716309731    0.718990484     -0.996271505    0.319426023     -2.127686808    0.695067346     -2.127686808    0.695067346
25  southj               -3.70176069     0.779953191     -4.7461319      2.46E-06        -5.232807456    -2.170713923    -5.232807456    -2.170713923
26  centralj             0.007883354     0.897494        0.0087837       0.992993914     -1.753894947    1.769661655     -1.753894947    1.769661655
27  pa1                  0.923861954     1.384927728     0.667083152     0.5049          -1.794748784    3.642472692     -1.794748784    3.642472692

Finally, consider the following equation:

ΔFTE = β3 + δNJ + Δe    (7.14)

where ΔFTE is the change in the number of full-time-equivalent employees.
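With ΔFTE as the dependent variable and NJ as the only regressor, the least squares intercept should equal the mean change for the Pennsylvania stores, and the slope δ should equal the difference between the New Jersey and Pennsylvania mean changes. The Python sketch below checks this with hypothetical store-level changes (not the njmin3 data).

```python
from statistics import mean

# Hypothetical store-level changes in FTE (November minus February); not the njmin3 data.
nj = [0, 0, 0, 1, 1, 1, 1]
dfte = [-2.0, -1.0, -3.0, 1.0, 0.5, 0.0, 2.5]

# Simple least squares of dfte on nj
xbar, ybar = mean(nj), mean(dfte)
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(nj, dfte))
         / sum((x - xbar) ** 2 for x in nj))
intercept = ybar - slope * xbar

# Group means of the change
pa_mean = mean(y for x, y in zip(nj, dfte) if x == 0)
nj_mean = mean(y for x, y in zip(nj, dfte) if x == 1)

print(round(intercept, 6), pa_mean)        # -2.0 -2.0
print(round(slope, 6), nj_mean - pa_mean)  # 3.0 3.0
```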

Go back to your njmin3 data worksheet, select all your data, and then go to the Sort dialog box. Change the variable in the Sort by window to demp, and select OK.

For model (7.14), the Input Y Range should be F1:F769, and the Input X Range should be G1:G769. Check the box next to Labels. Select New Worksheet Ply and name it Minimum Wage Model 4. Finally select OK.

The result is (see p. 287 of Principles of Econometrics, 4e):


    A                    B               C               D               E               F               G               H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.120992685
5   R Square             0.01463923
6   Adjusted R Square    0.013352858
7   Standard Error       8.956041229
8   Observations         768
9
10  ANOVA
11                       df              SS              MS              F               Significance F
12  Regression           1               912.8173828     912.8173828     11.38024818     0.00078
13  Residual             766             61441.37667     80.2106745
14  Total                767             62354.19405
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17  Intercept            -2.283333333    0.7312577       -3.122474225    0.001860817     -3.718840264    -0.847826403    -3.718840264    -0.847826403
18  nj                   2.75            0.815186215     3.373462344     0.00078         1.149735887     4.350264113     1.149735887     4.350264113
CHAPTER 8

Heteroskedasticity

Chapter Outline
8.1 The Nature of Heteroskedasticity
8.2 Detecting Heteroskedasticity
    8.2.1 Residual Plots
    8.2.2 Lagrange Multiplier Tests
        8.2.2a Using the Lagrange Multiplier or Breusch-Pagan Test
        8.2.2b Using the White Test
    8.2.3 The Goldfeld-Quandt Test
        8.2.3a The Logic of the Test
        8.2.3b Test Template
        8.2.3c Wage Equation Example
        8.2.3d Food Expenditure Example
8.3 Heteroskedasticity-Consistent Standard Errors or the White Standard Errors
8.4 Generalized Least Squares: Known Form of Variance
    8.4.1 Variance Proportional to x: Food Expenditure Example
    8.4.2 Grouped Data: Wage Equation Example
        8.4.2a Separate Wage Equations for Metropolitan and Rural Areas
        8.4.2b GLS Wage Equation
8.5 Generalized Least Squares: Unknown Form of Variance

This chapter is concerned with the nature of heteroskedasticity, tests for heteroskedasticity, as
well as generalized least squares estimation for heteroskedastic models.

8.1 THE NATURE OF HETEROSKEDASTICITY

Let us re-consider our food expenditure model first introduced in Chapter 2:

y = β1 + β2x + e    (8.1)

where y is weekly food expenditure in dollars and x is weekly income in units of $100 for a random sample of 40 three-person households.

Open the Excel file food. Save your file as POE Chapter 8. Rename sheet 1 food data.

In this section we illustrate the nature of heteroskedasticity by re-estimating (8.1) and plotting the estimated regression line along with the food expenditure data.

Heteroskedasticity 205

In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range should be B1:B41. Check the box next to Labels. Select New Worksheet Ply and name it Food Expenditure Equation. Check the boxes next to Residual Plots and Line Fit Plots. Finally select OK.

The regression analysis results are (see also p. 300 in Principles of Econometrics, 4e):

A I B I c I D I E I F G I H I I
i--'.!-i S UMMARY OLJTPUif
Y. I
3 I R:epession Statfatics
�'!_ Multiple R 0.620485472
�R Square 0.385002221
G AdjLISterl R Sqllare 0.368818069·
Vl Standa rd Error 89.51700429
m o bs eo:valion� 40


1 0 !ANOVA
11 df SS MS F Sig_nifiuam;e F
1 2 Regr1€lssion 1 190_626.97&8 190626·. 9788 �3.788841:07 1,945�6E-OS:
_ l
i 3 Residu�I JB 304505.1742 301329-4058
"14 Total 39 495132.153
15
16 Coefficients Standam Error l Sfaf P-value· Lower95% U;ee_er 95% Lovi'er 95.0% Upper 95. O"�
H lnteccept 83.41600997 43.41016192 1-921577951 G.062182379 -4.463267721 111_295wn -4.463267721 1711.2%28771
r- -
1 8 in-come 10-2096425 2.09.3263461 4.8773805!;4 1.94586E-05 5.9720522021 14.4472328 5.972052202 14!.4472328

After editing the income Line Fit Plot (see Section 2.3.4 for more details on how to do that), you should obtain a replica of Figure 8.2 p. 301 in Principles of Econometrics, 4e:
206 Chapter 8

[Figure 8.2: income Line Fit Plot showing food_exp and Predicted food_exp against income, where x = weekly income in $100]

8.2 DETECTING HETEROSKEDASTICITY

8.2.1 Residual Plots

By checking the box next to Line Fit Plots in the Regression dialog box, you were able to obtain a replica of Figure 8.2 p. 301 in Principles of Econometrics, 4e (Section 8.1). If you go back to your Food Expenditure Equation worksheet, you will find the plot of the residuals against income, which was generated following your selection of Residual Plots in the Regression dialog box.

[Figure: income Residual Plot, showing the least squares residuals plotted against income]

8.2.2 Lagrange Multiplier Tests

8.2.2a Using the Lagrange Multiplier or Breusch-Pagan Test

Logic of the Test:

Consider the following general heteroskedasticity assumption for the food expenditure model:

var(y_i) = σ_i² = E(e_i²) = h(α1 + α2x_i)    (8.2)

Consequently, the null and alternative hypotheses for a test for heteroskedasticity based on the variance function (8.2) are H0: α2 = 0 and H1: α2 ≠ 0.

To obtain a test statistic we consider the linear variance function in (8.3):

ê_i² = α1 + α2x_i + v_i    (8.3)

where ê_i² are the squares of the least squares residuals from model (8.1).

When H0 is true, the sample size N multiplied by the R² goodness-of-fit statistic from (8.3) has, in large samples, an approximate chi-square distribution with m = S − 1 degrees of freedom, where S is the number of parameters in (8.3):

χ² = N × R² ~ χ²(m = S − 1)    (8.4)

Because a large R² value provides evidence against the null hypothesis, the rejection region for the statistic in (8.4) is the right tail of the distribution. Thus, at significance level α, we reject H0 and conclude that heteroskedasticity exists when the computed χ²-statistic is greater than the chi-square critical value χ²(1−α, S−1).

[Figure: χ²(m) distribution, with the rejection region in the right tail beyond the critical value χ²(1−α, S−1)]

Note that we have used a test based on a chi-square distribution before, in Section 4.6.2 (for the
Jarque-Bera test for Normality).
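The steps of the test (estimate (8.1), square the residuals, regress them on income as in (8.3), and compare N × R² with a χ²(1) critical value) can be sketched in Python as follows. The data are hypothetical, `simple_ols` is an illustrative helper, and the χ²(1) right-tail probability uses the identity P(χ²(1) > c) = erfc(√(c/2)), which holds only for one degree of freedom.

```python
import math


def simple_ols(x, y):
    """Return (intercept, slope, R-squared) for a least squares fit of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    sst = sum((b - ybar) ** 2 for b in y)
    return intercept, slope, 1.0 - sse / sst


# Hypothetical data whose scatter widens as x grows (not the food data)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.3, 7.2, 11.0, 10.1, 15.8, 13.0, 21.5, 16.9]

b1, b2, _ = simple_ols(x, y)                                 # mean equation, as in (8.1)
ehat2 = [(yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)]   # squared residuals
_, _, r2_aux = simple_ols(x, ehat2)                          # variance function, as in (8.3)
lm = len(x) * r2_aux                                         # N x R^2 with S - 1 = 1 df
p_value = math.erfc(math.sqrt(lm / 2.0))                     # P(chi-square(1) > lm)
print(round(lm, 4), round(p_value, 4))
```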

Estimating the Linear Variance Function:

Go back to your food expenditure equation worksheet, if you are not there already. In cells
D24:D25 enter the following column label and formula.

    D
24  residuals^2
25  =C25^2

Copy the content of cell D25 to cells D26:D64. Here is how your table should look (only the first
five values are shown below):
    D
24  residuals^2
25  34.4521
26  59.9642
27  158.0506
28  901.2097
29  560.7542

In the Regression dialog box, the Input Y Range should be D24:D64, from your food expenditure equation worksheet, and the Input X Range should be B1:B41, from your food data worksheet. Check the box next to Labels. Select New Worksheet Ply and name it Variance Function. Finally select OK.

The result is:

    A                    B               C               D               E               F               G               H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.42966336
5   R Square             0.184610603
6   Adjusted R Square    0.163152987
7   Standard Error       9946.640922
8   Observations         40
9
10  ANOVA
11                       df              SS              MS              F               Significance F
12  Regression           1               851193027       851193027       8.60350028      0.005659104
13  Residual             38              3759555294      98935665.64
14  Total                39              4610748321
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17  Intercept            -5762.367573    4823.500254     -1.194644401    0.2396295       -15527.03325    4002.298108     -15527.03325    4002.298108
18  income               682.2324637     232.5920105     2.933172392     0.005659104     211.3745591     1153.090368     211.3745591     1153.090368

Test Template:

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Lagrange Multiplier Test.


Create the following template for Lagrange multiplier tests:

    A                          B                         C
1   Data Input                 N =                       ='Variance Function'!B8
2                              S =                       ='Variance Function'!B12+1
3                              R² =                      ='Variance Function'!B5
4                              α =
5
6   Computed Values            m =                       =C2-1
7                              χ²-critical value =       =CHIINV(C4,C6)
8
9   Lagrange Multiplier Test   χ² =                      =C1*C3
10                             Conclusion =              =IF(C9>=C7,"Reject Ho","Do Not Reject Ho")
11                             p-value =                 =CHIDIST(C9,C6)
12                             Conclusion =              =IF(C11<=C4,"Reject Ho","Do Not Reject Ho")

At α = 0.05, the result of the test is (see also p. 306 in Principles of Econometrics, 4e):

    A                          B                         C
1   Data Input                 N =                       40
2                              S =                       2
3                              R² =                      0.184611
4                              α =                       0.05
5
6   Computed Values            m =                       1
7                              χ²-critical value =       3.841459
8
9   Lagrange Multiplier Test   χ² =                      7.384424
10                             Conclusion =              Reject Ho
11                             p-value =                 0.006579
12                             Conclusion =              Reject Ho
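For readers who want to check the template outside Excel, here is a Python sketch of the same calculations, assuming only the standard library. `chi2_sf` and `chi2_crit` are hypothetical helpers that play the roles of CHIDIST and CHIINV: the first sums the power series for the regularized incomplete gamma function, and the second inverts it by bisection. With N = 40, R² = 0.184611 and S = 2 it should reproduce the values above to the displayed precision.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), like Excel's CHIDIST(x, df).

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for the moderate values used here.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return 1.0 - lower


def chi2_crit(alpha, df, hi=200.0):
    """Value c with P(X > c) = alpha, like Excel's CHIINV(alpha, df), by bisection."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lm_test(n, r2, s, alpha=0.05):
    """Mirror of the template: returns (chi2, critical value, p-value, conclusion)."""
    m = s - 1
    stat = n * r2
    crit = chi2_crit(alpha, m)
    p = chi2_sf(stat, m)
    conclusion = "Reject Ho" if stat >= crit else "Do Not Reject Ho"
    return stat, crit, p, conclusion


print(lm_test(40, 0.184611, 2))
```

The same function covers the White version of the test by passing S = 3.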

8.2.2b Using the White Test

For the White version of the test, we base the test statistic on the following variance function:

ê_i² = α1 + α2x_i + α3x_i² + v_i    (8.5)

where ê_i² are the squares of the least squares residuals from model (8.1).

Go back to your food data worksheet. In cell C1, enter the column label x2. In cell C2, type the formula =B2^2 and copy it to cells C3:C41. Here is how your table should look (only the first five values are shown below):
    C
1   x2
2   13.6161
3   19.2721
4   22.5625
5   36.3609
6   155.5009

In the Regression dialog box, the Input Y Range should be D24:D64, from your food expenditure equation worksheet, and the Input X Range should be B1:C41, from your food data worksheet. Check the box next to Labels. Select Output Range and specify it to be cell A1 in your Variance Function worksheet: you can place your cursor in the Output Range window and move it to that cell to do that, or type 'Variance Function'!A1 in the Output Range window. Finally, select OK.

Excel informs you that the output range will overwrite existing data. Select OK to overwrite the data in the specified range. The result is:

    A                    B               C               D               E               F               G               H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.434599748
5   R Square             0.188876941
6   Adjusted R Square    0.145032451
7   Standard Error       10053.75321
8   Observations         40
9
10  ANOVA
11                       df              SS              MS              F               Significance F
12  Regression           2               870864038.9     435432019.4     4.307883213     0.020801988
13  Residual             37              3739884282      101077953.6
14  Total                39              4610748321
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17  Intercept            -2908.785889    8100.107691     -0.359104595    0.721558112     -19321.16292    13503.59114     -19321.16292    13503.59114
18  income               291.7463539     915.8460198     0.318553935     0.751856075     -1563.933934    2147.426642     -1563.933934    2147.426642
19  x2                   11.16526724     25.30952489     0.441148828     0.661672455     -40.11670094    62.44723543     -40.11670094    62.44723543

At α = 0.05, the result of the test is (see also p. 306 in Principles of Econometrics, 4e):

    A                          B                         C
1   Data Input                 N =                       40
2                              S =                       3
3                              R² =                      0.188877
4                              α =                       0.05
5
6   Computed Values            m =                       2
7                              χ²-critical value =       5.991465
8
9   Lagrange Multiplier Test   χ² =                      7.555078
10                             Conclusion =              Reject Ho
11                             p-value =                 0.022879
12                             Conclusion =              Reject Ho
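The White version only changes the auxiliary regression: the squared residuals are regressed on income and income squared, so S = 3 and the statistic has 2 degrees of freedom. A Python sketch on hypothetical data follows; `ols` is an illustrative helper that solves the normal equations by Gauss-Jordan elimination, and the p-value uses the fact that a χ²(2) right-tail probability is exactly exp(−c/2).

```python
import math


def ols(X, y):
    """Least squares coefficients and R-squared; each row of X starts with 1.0."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination on the augmented normal equations [X'X | X'y]
    a = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                factor = a[r][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    b = [a[i][k] for i in range(k)]
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return b, 1.0 - sse / sst


# Hypothetical data (not the food data); x plays the role of income.
x = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0, 21.0]
y = [60.0, 75.0, 88.0, 120.0, 95.0, 160.0, 130.0, 210.0, 150.0, 260.0]

b_mean, _ = ols([[1.0, xi] for xi in x], y)                    # mean equation, as in (8.1)
ehat2 = [(yi - b_mean[0] - b_mean[1] * xi) ** 2 for xi, yi in zip(x, y)]
_, r2_white = ols([[1.0, xi, xi * xi] for xi in x], ehat2)      # variance function (8.5)
chi2_white = len(x) * r2_white                                  # N x R^2, S - 1 = 2 df
p_white = math.exp(-chi2_white / 2.0)                           # P(chi-square(2) > chi2_white)
print(round(chi2_white, 4), round(p_white, 4))
```

Plugging the food-example values N = 40 and R² = 0.188877 into exp(−N × R²/2) gives about 0.022879, the p-value shown above.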

8.2.3 The Goldfeld-Quandt Test

8.2.3a The Logic of the Test

Consider the right-tail hypothesis test H0: σ1² = σ2² against H1: σ1² > σ2², where σ1² is the error variance of the subsample 1 model and σ2² is the error variance of the subsample 2 model. If H0 is true, then the following F-statistic follows an F-distribution with m1 = N1 − K1 numerator degrees of freedom and m2 = N2 − K2 denominator degrees of freedom:

F = σ̂1²/σ̂2² ~ F(m1, m2)    (8.6)

where σ̂1² is the estimated error variance from the subsample 1 model with K1 parameters and N1 observations, and σ̂2² is the estimated error variance from the subsample 2 model with K2 parameters and N2 observations.

If H0 is not true, then the value of the computed F-statistic will tend to be unusually large. We will reject the null hypothesis if F > Fc, where Fc is the critical value shown below.

[Figure: F-distribution with the right-tail rejection region beyond the critical value Fc]

The right-tail Goldfeld-Quandt Test is similar to the F-test from Section 6.1.

For a two-tail hypothesis test, H1: σ1² ≠ σ2². If H0 is not true, then the value of the computed F-statistic will tend to be unusually large or unusually small. We will reject the null hypothesis if F < FLc or F > FUc, where FLc and FUc are the lower and upper critical values shown below. Note that in this case, α/2 of the probability is in each tail of the distribution.

[Figure: F-distribution with rejection regions below FLc and above FUc, each with probability α/2]

8.2.3b Test Template

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Goldfeld-Quandt Test.


Create the Goldfeld-Quandt test template as shown in the table below.

    A                       B                   C
1   Data Input              N1 =                ='Subsample 1 Model'!B8
2                           K1 =                ='Subsample 1 Model'!B12+1
3                           MS Residual 1 =     ='Subsample 1 Model'!D13
4                           N2 =                ='Subsample 2 Model'!B8
5                           K2 =                ='Subsample 2 Model'!B12+1
6                           MS Residual 2 =     ='Subsample 2 Model'!D13
7                           α =
8
9   Computed Values         m1 =                =C1-C2
10                          m2 =                =C4-C5
11                          F-statistic =       =C3/C6
12  Goldfeld-Quandt test
13  Right-tail              Fc =                =FINV(C7,C9,C10)
14                          Conclusion =        =IF(C11>=C13,"Reject Ho","Do Not Reject Ho")
15
16  Two-tail                FLc =               =FINV(1-C7/2,C9,C10)
17                          FUc =               =FINV(C7/2,C9,C10)
18                          Conclusion =        =IF(OR(C11<=C16,C11>=C17),"Reject Ho","Do Not Reject Ho")

Cells C16:C17 are where the lower and upper critical values of the two-tail Goldfeld-Quandt test are computed. Recall that, in this case, α/2 of the probability is in each tail of the distribution. The FINV function, on the other hand, returns the value Fc such that P(F(m1,m2) > Fc) = α, where α is the probability we give it. So, to get the correct upper critical value, we divide the specified α value by 2 in the FINV function (see cell C17). Further note that because FINV returns the F-critical value with the specified probability to its right, for the lower critical value we must give it the probability to the right of that value, which is 1 − α/2; that is what we specify in cell C16.
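The tail conventions described above can be checked outside Excel. The Python sketch below, assuming only the standard library, computes the right-tail F probability from the regularized incomplete beta function (evaluated with a standard continued-fraction method) and inverts it by bisection. `f_sf` and `f_crit` are hypothetical names, and `f_crit(p, m1, m2)` follows FINV's convention that p is the probability to the right of the returned value. The inputs in the usage lines are the wage-equation values reported later in this section.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, m1, m2):
    """P(F > f) for F ~ F(m1, m2), the right-tail probability."""
    if f <= 0.0:
        return 1.0
    return betai(m2 / 2.0, m1 / 2.0, m2 / (m2 + m1 * f))


def f_crit(p_right, m1, m2):
    """Value c with P(F > c) = p_right, the same convention as Excel's FINV."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, m1, m2) > p_right:   # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, m1, m2) > p_right:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha = 0.05
m1, m2 = 808 - 3, 192 - 3              # N1 - K1 and N2 - K2
f_stat = 31.8237318 / 15.24298659       # MS Residual 1 / MS Residual 2
fc = f_crit(alpha, m1, m2)              # like =FINV(C7,C9,C10)
flc = f_crit(1 - alpha / 2, m1, m2)     # like =FINV(1-C7/2,C9,C10)
fuc = f_crit(alpha / 2, m1, m2)         # like =FINV(C7/2,C9,C10)
print(round(f_stat, 6), round(fc, 6), round(flc, 6), round(fuc, 6))
```

Asking for the lower critical value with probability 1 − α/2 to its right, rather than α/2, is exactly the adjustment made in cell C16.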

8.2.3c Wage Equation Example

Wage Equation with the METRO Indicator Variable:

Consider the following wage equation:

WAGE = β1 + β2EDUC + β3EXPER + β4METRO + e    (8.7)

where WAGE is hourly wage, EDUC is years of education, EXPER is years of experience, and
METRO is an indicator variable equal to 1 for workers who live in a metropolitan area and 0 for
workers who live in a rural area.

Open the Excel file cps2. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 8 in one file, create a new worksheet in your POE
Chapter 8 Excel file, rename it cps2 data, and in it, copy the data set you just opened.


Insert a column to the left of the female column D (see Section 1.4 for more details on how to do that). In your new cell D1, enter the column label metro. In cell D2, enter the formula =K2; copy it to cells D3:D1001. Now we have a table where the explanatory variables of interest to us are in columns next to each other.
    A       B       C       D
1   wage    educ    exper   metro
2   2.03    13      2       0
3   2.07    12      7       1
4   2.12    12      35      1
5   2.54    16      20      1
6   2.68    12      24      1

In the Regression dialog box, the Input Y Range should be A1:A1001, and the Input X Range should be B1:D1001. Check the box next to Labels. Select New Worksheet Ply and name it Wage Equation. Finally select OK.


The result is (see also p. 307 in Principles of Econometrics, 4e):


    A                    B               C               D               E               F               G               H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.5166267
5   R Square             0.266903124
6   Adjusted R Square    0.264695001
7   Standard Error       5.3564897
8   Observations         1000
9
10  ANOVA
11                       df              SS              MS              F               Significance F
12  Regression           3               10404.28343     3468.094475     120.8733        …
13  Residual             996             28577.21397     28.6919819
14  Total                999             38981.4974
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17  Intercept            -9.913984216    1.075662517     -9.216630734    1.77326E-19     -12.02480904    -7.803159397    -12.02480904    -7.803159397
18  educ                 1.233963998     0.069961261     17.63781812     8.42514E-61     1.096675616     1.37125238      1.096675616     1.37125238
19  exper                0.133243681     0.015231619     8.747834543     9.13789E-18     0.103353935     0.163133426     0.103353935     0.163133426
20  metro                1.5241          0.4311          3.5355          0.000425795     0.6782          2.3701          0.6782          2.3701

Separate Wage Equations for Metropolitan and Rural Areas:

We estimate the following equation (8.8) twice: once for workers living in a metropolitan area and again for workers living in a rural area.

WAGE = β1 + β2EDUC + β3EXPER + e    (8.8)

We first sort our data according to the area of residence of the workers and then successively
estimate (8.8) with metropolitan area observations only (subsample 1), and with rural area
observations only (subsample 2).

Go back to your cps2 data worksheet. Select the whole worksheet by left-clicking on the upper-left corner of the worksheet; your cursor should turn into a fat cross.

Select the Sort & Filter button in the Editing group of commands on the Home tab. On the drop-down menu, select Custom Sort.


A Sort dialog box opens. Select the box next to My data has headers. Select the metro indicator
variable column in the Sort by window. Values should be selected in the Sort on window. Select
Largest to Smallest in the Order window. Finally, select OK.

In the Regression dialog box, for the metropolitan area wage equation, the Input Y Range should be A1:A809, and the Input X Range should be B1:C809. Check the box next to Labels. Select New Worksheet Ply and name it Subsample 1 Model. Finally select OK.


The result is:

    A                    B               C               D               E               F               G               H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.508117361
5   R Square             0.258183252
6   Adjusted R Square    0.256340229
7   Standard Error       5.6412528
8   Observations         808
9
10  ANOVA
11                       df              SS              MS              F               Significance F
12  Regression           2               8916.171611     4458.085806     140.0868331     6.22867E-53
13  Residual             805             25618.1041      31.8237318
14  Total                807             34534.27571
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17  Intercept            -9.052478207    1.189456082     -7.610603153    7.6E-14         -11.38727966    -6.717676756    -11.38727966    -6.717676756
18  educ                 1.281714419     0.079762684     16.06909843     1.3E-50         1.125147033     1.438281806     1.125147033     1.438281806
19  exper                0.134559682     0.017947584     7.497370149     1.71985E-13     0.099330096     0.169789269     0.099330096     0.169789269

Go back to your cps2 data worksheet, and then to the Sort dialog box. Change the Order to
Smallest to Largest, and select OK.


In the Regression dialog box, for the rural area wage equation, the Input Y Range should be A1:A193, and the Input X Range should be B1:C193. Check the box next to Labels. Select New Worksheet Ply and name it Subsample 2 Model. Finally select OK.


The result is:

    A                    B               C               D               E               F               G               H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.508673076
5   R Square             0.258748298
6   Adjusted R Square    0.250904365
7   Standard Error       3.9042268
8   Observations         192
9
10  ANOVA
11                       df              SS              MS              F               Significance F
12  Regression           2               1005.642618     502.8213091     32.98705973     5.14943E-13
13  Residual             189             2880.924466     15.24298659
14  Total                191             3886.567084
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17  Intercept            -6.165854725    1.898510693     -3.247732418    0.001376545     -9.910847494    -2.420861956    -9.910847494    -2.420861956
18  educ                 0.955585383     0.133189909     7.174607953     1.6011E-11      0.692855629     1.218315137     0.692855629     1.218315137
19  exper                0.125973719     0.024771        5.085538445     8.79E-07        0.077110627     0.174836811     0.077110627     0.174836811

At α = 0.05, the result of the Goldfeld-Quandt test for the wage equation example is (see also p. 308 in Principles of Econometrics, 4e):
    A                       B                   C
1   Data Input              N1 =                808
2                           K1 =                3
3                           MS Residual 1 =     31.82373
4                           N2 =                192
5                           K2 =                3
6                           MS Residual 2 =     15.24299
7                           α =                 0.05
8
9   Computed Values         m1 =                805
10                          m2 =                189
11                          F-statistic =       2.087762
12  Goldfeld-Quandt test
13  Right-tail              Fc =                1.215033
14                          Conclusion =        Reject Ho
15
16  Two-tail                FLc =               0.805198
17                          FUc =               1.26173
18                          Conclusion =        Reject Ho

8.2.3d Food Expenditure Example

We would like to test the hypothesis that the error variance increases as income increases. This is
a right-tail hypothesis test where $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 > \sigma_2^2$. To get the estimated error
variances and test this hypothesis, we split our sample into two equal subsamples of 20
observations each, and successively estimate (8.1) with higher income observations only
(subsample 1), and with lower income observations only (subsample 2).
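The sort-and-split step can be sketched in Python as follows. This is only an illustration, assuming the data are held as (food_exp, income) pairs; `split_by_income` is a hypothetical helper, and the four pairs are illustrative values from the sorted tables in this section.

```python
def split_by_income(rows, n_high):
    """Sort (food_exp, income) pairs by income, largest first, then split.

    The first n_high rows form subsample 1 (higher income) and the remaining
    rows form subsample 2 (lower income), as in the Excel sort-and-select steps.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return ordered[:n_high], ordered[n_high:]


# Illustrative (food_exp, income) pairs; with the full data set, n_high would be 20
rows = [(115.22, 3.69), (375.73, 33.4), (135.98, 4.39), (587.65, 28.62)]
high, low = split_by_income(rows, n_high=2)
print("subsample 1:", high)
print("subsample 2:", low)
```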

Go back to your food data worksheet. Place your cursor in any cell of the income data column B.
Go to the Data tab in the middle of your tab list. In the Sort & Filter group of commands select
the Sort Largest to Smallest button. Your data set should be sorted in descending order of
income values as shown below.

   A          B
1  food_exp   income
2  375.73     33.4
3  257.96     29.4
4  587.65     28.62
5  438.29     27.15
6  482.55     27.14

In the Regression dialog box, for the higher income food expenditure model, the Input Y Range
should be A1:A21, and the Input X Range should be B1:B21. Check the box next to Labels.
Uncheck the box next to Constant is Zero. Select Output Range and specify it to be cell A1 in
your Subsample 1 Model worksheet: you can place your cursor in the Output Range window
and then click that cell, or type 'Subsample 1 Model'!A1 in the Output Range
window. Finally, select OK.

Excel informs you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range.

The result is:

1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.41248222
5   R Square             0.170141582
6   Adjusted R Square    0.124038337
7   Standard Error       113.6746495
8   Observations         20
9
10  ANOVA
11               df    SS            MS            F             Significance F
12  Regression   1     47687.68234   47687.68234   3.690446964   0.070707092
13  Residual     18    232594.6668   12921.92593
14  Total        19    280282.3492
15
16               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
17  Intercept    -24.9146229    184.924846       -0.134728369   0.894321737   -413.4273071   363.5980612
18  income       14.26400003    7.425092131      1.921053608    0.070707092   -1.335539657   29.86353971
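The Regression Statistics and ANOVA entries come from the usual least squares formulas for a simple regression. The sketch below is a hypothetical helper, not Excel's own code: it computes b1, b2 and the MS Residual (SSE / (N - 2)), whose square root is the Standard Error reported in cell B7. The data are made up.

```python
def simple_ols(x, y):
    """Least squares fit of y = b1 + b2*x.

    Returns (b1, b2, ms_resid), where ms_resid = SSE / (N - 2) is the
    'MS Residual' entry; its square root is the 'Standard Error'.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, sse / (n - 2)


# Made-up data, for illustration only
b1, b2, ms_resid = simple_ols([1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.0])
print(b1, b2, ms_resid, ms_resid ** 0.5)
```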

Go back to your food data worksheet. Place your cursor in any cell of the income data column B.
Go to the Data tab in the middle of your tab list. In the Sort & Filter group of commands select
the Sort Smallest to Largest button. Your data set should be sorted in ascending order of
income values as shown below.

   A          B
1  food_exp   income
2  115.22     3.69
3  135.98     4.39
4  119.34     4.75
5  114.96     6.03
6  187.05     12.47

In the Regression dialog box, for the lower income food expenditure model, the Input Y Range
should be A1:A21, and the Input X Range should be B1:B21. Check the box next to Labels.
Select Output Range and specify it to be cell A1 in your Subsample 2 Model worksheet: you
can place your cursor in the Output Range window and then click that cell, or type
'Subsample 2 Model'!A1 in the Output Range window. Finally, select OK.

Excel informs you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range. The result is:

1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.734079402
5   R Square             0.538872568
6   Adjusted R Square    0.513254378
7   Standard Error       59.78939939
8   Observations         20
9
10  ANOVA
11               df    SS            MS            F             Significance F
12  Regression   1     75194.48762   75194.48762   21.03476299   0.000229013
13  Residual     18    64345.90103   3574.77228
14  Total        19    139540.3887
15
16               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
17  Intercept    72.96173971    38.83435406      1.878793699    0.076566246   -8.626210517   154.5496899
18  income       11.50037916    2.50751389       4.586367079    0.000229013   6.232287967    16.76847035

At α = 0.05, the result of the Goldfeld-Quandt test for the food expenditure example is (see also
p. 309 in Principles of Econometrics, 4e):

    A                      B                  C
1   Data Input             N1 =               20
2                          K1 =               2
3                          MS Residual 1 =    12921.93
4                          N2 =               20
5                          K2 =               2
6                          MS Residual 2 =    3574.772
7                          α =                0.05
8
9   Computed Values        m1 =               18
10                         m2 =               18
11                         F-statistic =      3.614755
12  Goldfeld-Quandt test
13  Right-tail             Fc =               2.217197
14                         Conclusion =       Reject H0

8.3 HETEROSKEDASTICITY-CONSISTENT STANDARD ERRORS OR THE WHITE STANDARD ERRORS

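The White standard error estimator is given by:

$$\text{White se}(b_2) = \sqrt{\widehat{\operatorname{var}}(b_2)} = \sqrt{\frac{\sum_{i=1}^{N}(x_i-\bar{x})^2\,\hat{e}_i^2}{\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right]^2}} \qquad (8.9)$$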

Go back to your food data worksheet, if you are not there already. In cells D1:E2 and G1:H2,
enter the following column labels and formulas.

   D                 E
1  x-bar =           =AVERAGE(B2:B41)
2  White se(b2) =    =SQRT(SUMPRODUCT(G2:G41,H2:H41)/SUM(G2:G41)^2)

   G                 H
1  (xi - x-bar)^2    residuals^2
2  =(B2-$E$1)^2      ='Food Expenditure Equation'!C25^2

Note that we are using the SUMPRODUCT mathematical function to compute the White
standard error. The general syntax of the SUMPRODUCT function is as follows:

=SUMPRODUCT(cell_range_in_column1,cell_range_in_column2)

where the cell range in both column 1 and column 2 must specify an identical number of rows.
For each row, the value from column 1 is multiplied by the value from column 2, and all products
are then summed.
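The same computation can be sketched in Python, to make the SUMPRODUCT logic of (8.9) explicit. This is a minimal illustration with a hypothetical function name; column G and column H are rebuilt as lists, and the toy inputs are chosen so the arithmetic is easy to check by hand.

```python
import math


def white_se_b2(x, residuals):
    """White heteroskedasticity-consistent standard error of b2, as in (8.9).

    Mirrors =SQRT(SUMPRODUCT(G2:G41,H2:H41)/SUM(G2:G41)^2), where column G
    holds (x_i - x_bar)^2 and column H holds the squared residuals.
    """
    x_bar = sum(x) / len(x)
    g = [(xi - x_bar) ** 2 for xi in x]     # column G
    h = [e ** 2 for e in residuals]          # column H
    sumproduct = sum(gi * hi for gi, hi in zip(g, h))
    return math.sqrt(sumproduct / sum(g) ** 2)


# Toy inputs: G = [1, 0, 1], H = [1, 4, 1], so the result is sqrt(2 / 4)
print(white_se_b2([1.0, 2.0, 3.0], [1.0, -2.0, 1.0]))
```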

Copy the content of cells G2:H2 to cells G3:H41. Here is how your table should look (only the
first five values are shown below):

   D                 E           F   G                 H
1  x-bar =           19.60475        (xi - x-bar)^2    residuals^2
2  White se(b2) =    1.76327         253.279269        34.452084
3                                    231.488619        59.9642
4                                    220.663599        158.05055
5                                    184.273839        901.20972
6                                    50.904658         560.75419

The estimated White se(b2) above differs slightly from the value reported on p. 310 of Principles
of Econometrics, 4e. The reason is that the value reported in Principles of Econometrics, 4e
was computed using the following modified White standard error estimator:

$$\text{modified White se}(b_2) = \sqrt{\widehat{\operatorname{var}}(b_2)} = \sqrt{\frac{N}{N-2}\cdot\frac{\sum_{i=1}^{N}(x_i-\bar{x})^2\,\hat{e}_i^2}{\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right]^2}} \qquad (8.10)$$

The source of this adjustment follows from the discussion on pp. 64-65 of Principles of
Econometrics, 4e. Namely, the expected value of the sum of squared regression errors is
$E\left[\sum e_i^2\right] = N\sigma^2$. However, the expected value of the sum of the squared least squares
residuals is $E\left[\sum \hat{e}_i^2\right] = (N-2)\sigma^2$. The squared least squares residuals are therefore smaller,
on average, than the squared regression errors, and the factor N/(N - 2) offsets this.
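Because (8.10) differs from (8.9) only by the factor N/(N - 2) inside the square root, the modified value can be obtained by rescaling the unmodified one. The sketch below, with a hypothetical function name, shows this for N = 40 and an unmodified White se(b2) of 1.76327.

```python
import math


def modified_white_se(white_se, n):
    """Apply the N / (N - 2) degrees-of-freedom adjustment of (8.10)."""
    return white_se * math.sqrt(n / (n - 2))


adjusted = modified_white_se(1.76327, 40)
print(round(adjusted, 6))
```

This is the same adjustment that the worksheet formula applies through the factor (E3/(E3-2)).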

In cells D3:E4 of your food data worksheet, add the following column labels and formulas.

   D                          E
3  N =                        =COUNT(B2:B41)
4  Modified White se(b2) =    =SQRT((SUMPRODUCT(G2:G41,H2:H41)/SUM(G2:G41)^2)*(E3/(E3-2)))

The estimated modified White se(b2) should be equal to the value reported on p. 310 of
Principles of Econometrics, 4e:

   D                          E
3  N =                        40
4  Modified White se(b2) =    1.809077

8.4 GENERALIZED LEAST SQUARES: KNOWN FORM OF VARIANCE

8.4.1 Variance Proportional to x: Food Expenditure Example


Consider the following heteroskedasticity assumption for the food expenditure model:

$$\operatorname{var}(e_i) = \sigma_i^2 = \sigma^2 x_i \qquad (8.11)$$

Given assumption (8.11), the following food expenditure model has homoskedastic errors:

$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^* \qquad (8.12)$$

where the transformed dependent and explanatory variables are defined as $y_i^* = y_i/\sqrt{x_i}$,
$x_{i1}^* = 1/\sqrt{x_i}$, and $x_{i2}^* = \sqrt{x_i}$.
Note that model (8.12) does not have an intercept.
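The transformation in (8.12) is easy to check for a single observation. This minimal Python sketch, using a hypothetical function name, divides by the square root of income. It uses the first food observation (food_exp = 115.22, income = 3.69) and should reproduce the first transformed row computed in the worksheet.

```python
import math


def transform_proportional_to_x(y, x):
    """GLS transformation under var(e_i) = sigma^2 * x_i, assumption (8.11).

    Returns (y*, x1*, x2*) = (y/sqrt(x), 1/sqrt(x), sqrt(x)), the same
    quantities as =A2/SQRT(B2), =1/SQRT(B2) and =SQRT(B2).
    """
    root = math.sqrt(x)
    return y / root, 1.0 / root, root


y_star, x1_star, x2_star = transform_proportional_to_x(115.22, 3.69)
print(y_star, x1_star, x2_star)
```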

Below, we first calculate the transformed dependent and explanatory variables, and then use the
Excel Regression analysis tool to get the generalized least squares estimate of model (8.12).

Go back to your food data worksheet. In cells J1:L2 enter the following column labels and
formulas.
   J               K              L
1  y*              x1*            x2*
2  =A2/SQRT(B2)    =1/SQRT(B2)    =SQRT(B2)

Copy the content of cells J2:L2 to cells J3:L41. Here is how your table should look (only the
first five values are shown below):

   J          K          L
1  y*         x1*        x2*
2  59.98114   0.520579   1.920937
3  64.89971   0.477274   2.095233
4  54.75696   0.458831   2.179449
5  46.81533   0.407231   2.455606
6  52.96933   0.283183   3.531289

In the Regression dialog box, the Input Y Range should be J1:J41, and the Input X Range
should be K1:L41. Check the box next to Labels and next to Constant is Zero. Select New
Worksheet Ply and name it GLS Food Expenditure Equation. Finally select OK.
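Checking Constant is Zero makes the Regression tool fit y* = b1·x1* + b2·x2* with no intercept. As a minimal sketch (hypothetical function name, made-up data), the two coefficients solve a 2x2 system of normal equations. Applied to columns J:L, b1 and b2 would be the GLS estimates of β1 and β2.

```python
def ols_no_intercept(x1, x2, y):
    """Least squares for y = b1*x1 + b2*x2 + e with no intercept.

    Solves the 2x2 normal equations
        s11*b1 + s12*b2 = s1y
        s12*b1 + s22*b2 = s2y
    by Cramer's rule.
    """
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Made-up data generated exactly by y = 1*x1 + 2*x2
b1, b2 = ols_no_intercept([1, 1, 1, 1], [1, 2, 3, 4], [3, 5, 7, 9])
print(b1, b2)
```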
The result is (see also p. 313 in Principles of Econometrics, 4e):

1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.962446484
5   R Square             0.926303234
6   Adjusted R Square    0.898048056
7   Standard Error       18.75005582
8   Observations         40
9
10  ANOVA
11               df    SS            MS            F             Significance F
12  Regression   2     167916.5405   83958.27027   238.8132136   7.06526E-22
13  Residual     38    13359.45454   351.5645932
14  Total        40    181275.9951
15
16               Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
17  Intercept    0              #N/A             #N/A          #N/A          #N/A           #N/A
18  x1*          78.68408203    23.78872165      3.307621263   0.00206413    30.52633316    126.8418309
19  x2*          10.4510089     1.385891227      7.541002276   4.6137E-09    7.645418811    13.25659899

8.4.2 Grouped Data: Wage Equation Example

8.4.2a Separate Wage Equations for Metropolitan and Rural Areas

If we assume that the error variances in the metropolitan and rural areas are different, instead of
estimating equation (8.7), we can estimate the following equation (8.13) twice: once for workers
living in a metropolitan area and again for workers living in a rural area.

$$WAGE = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + e \qquad (8.13)$$

We have already done this in Section 8.2.3c.

Now, if the assumption that the effect of education and experience on wages is the same for
metropolitan and rural areas is true, then better estimates can be obtained by combining both
subsets of data and applying a generalized least squares estimator to the complete set of data, with
recognition given to the existence of heteroskedasticity. That is what we do next.

8.4.2b GLS Wage Equation

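Given the assumption that the error variances in the metropolitan and rural areas are different, the
following wage model has homoskedastic errors:

$$\frac{WAGE_i}{\hat{\sigma}_i} = \beta_1\left(\frac{1}{\hat{\sigma}_i}\right) + \beta_2\left(\frac{EDUC_i}{\hat{\sigma}_i}\right) + \beta_3\left(\frac{EXPER_i}{\hat{\sigma}_i}\right) + \delta\left(\frac{METRO_i}{\hat{\sigma}_i}\right) + \frac{e_i}{\hat{\sigma}_i} \qquad (8.14)$$

where $\hat{\sigma}_i = \hat{\sigma}_M$ for metropolitan area observations, and $\hat{\sigma}_i = \hat{\sigma}_R$ for rural area observations; $\hat{\sigma}_M$ is
the estimated standard error from (8.8) using metropolitan area observations only (subsample 1
model), and $\hat{\sigma}_R$ is the estimated standard error from (8.8) using rural area observations only
(subsample 2 model).

Note that model (8.14) does not have an intercept.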

Go back to your cps2 data worksheet. In cells M1:N2 and P1:T2, enter the following column
labels and formulas.

   M                N                          O   P                          Q
1  σ-hat metro =    ='Subsample 1 Model'!B7        y*                         x1*
2  σ-hat rural =    ='Subsample 2 Model'!B7        =A2/IF(D2=1,$N$1,$N$2)     =1/IF(D2=1,$N$1,$N$2)

   R                          S                          T
1  x2*                        x3*                        x4*
2  =B2/IF(D2=1,$N$1,$N$2)     =C2/IF(D2=1,$N$1,$N$2)     =D2/IF(D2=1,$N$1,$N$2)
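The IF formulas above pick a group-specific σ-hat and divide every variable by it. A minimal Python sketch of the same rule follows; the function name is hypothetical. The example worker (wage 2.03, educ 13, exper 2, rural) is an assumption inferred from the first transformed row of the worksheet, so treat it as illustrative.

```python
def transform_grouped(wage, educ, exper, metro, sigma_metro, sigma_rural):
    """Divide each variable by the sigma-hat of the observation's group.

    Corresponds to columns P:T: y*, x1* = 1/sigma, x2* = educ/sigma,
    x3* = exper/sigma and x4* = metro/sigma, with sigma chosen by
    IF(metro = 1, sigma_metro, sigma_rural).
    """
    sigma = sigma_metro if metro == 1 else sigma_rural
    return (wage / sigma, 1.0 / sigma, educ / sigma, exper / sigma,
            metro / sigma)


# Illustrative rural worker (assumed values, see the lead-in)
row = transform_grouped(2.03, 13, 2, 0, sigma_metro=5.641253,
                        sigma_rural=3.904227)
print([round(v, 6) for v in row])
```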

Copy the content of cells P2:T2 to cells P3:T1001. Here is how your table should look (only the
first five values are shown below):

   M                N          O   P             Q          R          S          T
1  σ-hat metro =    5.641253       y*            x1*        x2*        x3*        x4*
2  σ-hat rural =    3.904227       0.519949      0.256133   3.329725   0.512265   0
3                                  0.809379      0.256133   3.329725   0.256133   0
4                                  [illegible]   0.256133   3.073592   3.84199    0
5                                  [illegible]   0.256133   3.073592   8.196245   0
6                                  0.942568      0.256133   3.073592   1.280663   0

In the Regression dialog box, the Input Y Range should be P1:P1001, and the Input X Range
should be Q1:T1001. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it GLS Wage Equation. Finally select OK.
The result is (see also p. 315 of Principles of Econometrics, 4e):

1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.897416094
5   R Square             0.805355646
6   Adjusted R Square    0.801376535
7   Standard Error       1.001216522
8   Observations         1000
9
10  ANOVA
11               df     SS            MS            F            Significance F
12  Regression   4      4131.057618   1032.764404   1030.25622   0
13  Residual     996    998.4247867   1.002434525
14  Total        1000   5129.482405
15
16               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
17  Intercept    0              #N/A             #N/A           #N/A          #N/A           #N/A
18  x1*          -9.39836166    1.019672608      -9.217038477   1.75706E-19   -11.39931476   -7.39740856
19  x2*          1.195720589    0.068507954      17.45374644    1.00115E-59   1.061284103    1.330157075
20  x3*          0.132208766    0.014548502      9.087443443    5.34461E-19   0.103659533    0.160758
21  x4*          1.538803242    0.346285576      4.443740504    9.83178E-06   0.859270231    2.218336253

8.5 GENERALIZED LEAST SQUARES: UNKNOWN FORM OF VARIANCE

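Consider the following more general heteroskedasticity assumption for the food expenditure
model:

$$\operatorname{var}(e_i) = \sigma_i^2 = \exp(\alpha_1 + \alpha_2 \ln x_i) \qquad (8.15)$$

Given assumption (8.15), the following food expenditure model has homoskedastic errors:

$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^* \qquad (8.16)$$

where the transformed dependent and explanatory variables are defined as $y_i^* = y_i/\hat{\sigma}_i$,
$x_{i1}^* = 1/\hat{\sigma}_i$, and $x_{i2}^* = x_i/\hat{\sigma}_i$.

Note that $\hat{\sigma}_i^2 = \exp(\hat{\alpha}_1 + \hat{\alpha}_2 \ln x_i)$, where $\hat{\alpha}_1$ and $\hat{\alpha}_2$ are the least squares estimates of (8.17):

$$\ln(\hat{e}_i^2) = \alpha_1 + \alpha_2 \ln x_i + v_i \qquad (8.17)$$

and $\hat{e}_i^2$ are the squares of the least squares residuals from model (8.1).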

Below, we first estimate (8.17). We will then calculate the transformed dependent and
explanatory variables and use the Excel Regression analysis tool to get the generalized least
squares estimate of model (8.16).

Again, note that model (8.16) does not have an intercept.

Go back to your food data worksheet. In cells N1:O2 enter the following column labels and
formulas.
   N              O
1  ln(e-hat^2)    ln(x)
2  =LN(H2)        =LN(B2)

Copy the content of cells N2:O2 to cells N3:O41. Here is how your table should look (only the
first five values are shown below):
   N              O
1  ln(e-hat^2)    ln(x)
2  3.539569       1.305626
3  4.093748       1.479329
4  5.062915       1.558145
5  6.803738       1.796747
6  6.329283       2.523326

In the Regression dialog box, the Input Y Range should be N1:N41, and the Input X Range
should be O1:O41. Check the box next to Labels. Make sure the box next to Constant is Zero is
not checked. Select New Worksheet Ply and name it Log-Log Variance Function. Finally
select OK.

The result is (see also p. 317 in Principles of Econometrics, 4e):

1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.572361254
5   R Square             0.327597405
6   Adjusted R Square    0.3099026
7   Standard Error       1.720854391
8   Observations         40
9
10  ANOVA
11               df    SS            MS            F             Significance F
12  Regression   1     54.82554      54.82554      18.51376169   0.000113872
13  Residual     38    112.5309137   2.961339835
14  Total        39    167.3564537
15
16               Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
17  Intercept    0.93779654     1.583105245      0.592377381   0.557106301   -2.267032452   4.142625532
18  ln(x)        2.329238594    0.541335668      4.302762111   0.000113872   1.233361837    3.425115351

Go back to your food data worksheet. In cells Q1:R2 and T1:W2, enter the following column
labels and formulas.

   Q          R
1  α1-hat =   ='Log-Log Variance Function'!B17
2  α2-hat =   ='Log-Log Variance Function'!B18

   T                            U          V         W
1  σ-hat                        y*         x1*       x2*
2  =SQRT(EXP($R$1+$R$2*O2))     =A2/T2     =1/T2     =B2/T2
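Column T turns the fitted log-log variance function into a standard deviation, and columns U:W divide by it. The minimal sketch below mirrors those formulas for one observation; the function names are hypothetical. The α-hat values are the estimates rounded to six decimals, so the results match the worksheet only to about five significant digits.

```python
import math


def sigma_hat(alpha1, alpha2, x):
    """sigma-hat_i = sqrt(exp(alpha1 + alpha2 * ln(x_i))), as in column T."""
    return math.sqrt(math.exp(alpha1 + alpha2 * math.log(x)))


def transform_unknown_form(y, x, alpha1, alpha2):
    """Return (sigma-hat, y*, x1*, x2*) = (s, y/s, 1/s, x/s) for one observation."""
    s = sigma_hat(alpha1, alpha2, x)
    return s, y / s, 1.0 / s, x / s


# First food observation: food_exp = 115.22, income = 3.69
s, y_star, x1_star, x2_star = transform_unknown_form(115.22, 3.69, 0.937797, 2.329239)
print(s, y_star, x1_star, x2_star)
```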

Copy the content of cells T2:W2 to cells T3:W41. Here is how your table should look (only the
first five values are shown below):

   Q          R          S   T          U          V          W
1  α1-hat =   0.937797       σ-hat      y*         x1*        x2*
2  α2-hat =   2.329239       7.311555   15.75862   0.13677    0.504681
3                            8.950896   15.19178   0.111721   0.490454
4                            9.811386   12.16342   0.101922   0.484131
5                            12.95426   8.874302   0.077195   0.465484
6                            30.19306   6.195132   0.03312    0.413009

In the Regression dialog box, the Input Y Range should be U1:U41, and the Input X Range
should be V1:W41. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it GLS Food Expenditure Equation 2. Finally select OK.

The result is (see also p. 318 in Principles of Econometrics, 4e):

1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.975880711
5   R Square             0.952343116
6   Adjusted R Square    0.924773245
7   Standard Error       1.54673969
8   Observations         40
9
10  ANOVA
11               df    SS            MS            F             Significance F
12  Regression   2     1816.712874   908.3564372   379.6836      2.19348E-25
13  Residual     38    90.91133935   2.392403667
14  Total        40    1907.624214
15
16               Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
17  Intercept    0              #N/A             #N/A          #N/A          #N/A           #N/A
18  x1*          76.05379039    9.713489955      7.829708039   1.90952E-09   56.38985818    95.7177226
19  x2*          10.63349158    0.971514281      10.94527563   2.61541E-13   8.666763754    12.6002194

CHAPTER 9

Regression with Time Series Data: Stationary Variables

Chapter Outline
9.1 Finite Distributed Lags
    9.1.1 US Economic Time Series
    9.1.2 An Example: Okun's Law
9.2 Serial Correlation
    9.2.1 Serial Correlation in Output Growth
        9.2.1a Scatter Diagram for Gt and Gt-1
        9.2.1b Correlogram for G
    9.2.2 Serially Correlated Errors
        9.2.2a Australian Economic Time Series
        9.2.2b A Phillips Curve
        9.2.2c Correlogram for Residuals
9.3 Lagrange Multiplier Tests for Serially Correlated Errors
    9.3.1 t-Test Version
    9.3.2 T × R² Version
9.4 Estimation with Serially Correlated Errors
    9.4.1 Generalized Least Squares Estimation of an AR(1) Error Model
        9.4.1a The Prais-Winsten Estimator
        9.4.1b The Cochrane-Orcutt Estimator
    9.4.2 Autoregressive Distributed Lag (ARDL) Model
9.5 Forecasting
    9.5.1 Using an Autoregressive (AR) Model
    9.5.2 Using an Exponential Smoothing Model
9.6 Multiplier Analysis

This chapter is concerned with the nature of autocorrelation, generalized least squares estimation
of AR(1) models, and tests for autocorrelation. Forecasting, finite distributed lag models, and
autoregressive distributed lag models are also introduced.

9.1 FINITE DISTRIBUTED LAGS

9.1.1 US Economic Time Series

Open the Excel file okun. Save your file as POE Chapter 9. Rename sheet 1 okun data.

Below we plot the time series of some important economic variables for the US economy as in
Figure 9.4 on p. 345 of Principles of Econometrics, 4e.

In cells C2:C3, enter the following labels and formulas.

   C
2  du
3  =B3-B2
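Copying =B3-B2 down column C produces first differences, which is why the du column has one fewer observation than the level series. A one-line Python sketch, with a hypothetical function name and made-up unemployment rates, shows the same idea.

```python
def first_difference(series):
    """Return x_t - x_(t-1) for t = 2, ..., T, like =B3-B2 copied down column C."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical unemployment rates, for illustration only
du = first_difference([7.3, 7.2, 7.0, 7.0])
print(du)
```

Floating-point subtraction means the printed values may show tiny rounding residue such as -0.09999999999999964.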

Copy the content of cell C3 to cells C4:C99. Here is how your table should look (only the first
five values are shown below):

   C
2  du
3  -0.1
4  -0.2
5  0
6  0.2
7  -0.2

Select the Insert tab located next to the Home tab. Select C3:C99. In the Charts group of
commands select Line, and Line with Markers.
After editing, the result is (see also Figure 9.4(a) p. 345 in Principles of Econometrics, 4e):

[Chart: Change in the U.S. Unemployment Rate Time Series, 1985 Q3 to 2009 Q3]

To plot the percentage change in U.S. GDP, select cells A3:A99. After editing, the result is (see also
Figure 9.4(b) p. 345 in Principles of Econometrics, 4e):

[Chart: Time Series: U.S. GDP Growth, 1985 Q3 to 2009 Q3]

9.1.2 An Example: Okun's Law

Consider the following finite distributed lag model:

$$DU_t = \alpha + \beta_0 G_t + \beta_1 G_{t-1} + \cdots + \beta_q G_{t-q} + e_t \qquad (9.1)$$

where DU is the change in the U.S. unemployment rate and G is the percentage change in Gross
Domestic Product (GDP) from quarter 2, 1985 to quarter 3, 2009; t = 1, ..., T, where T = 98.

In cells D4:G5 and H3:J4 of your okun data worksheet enter the following labels and formulas.

   D     E      F      G      H      I      J
3                             g      gt-1   gt-2
4  g     gt-1   gt-2   gt-3   =A4    =A3    =A2
5  =A5   =A4    =A3    =A2
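Each worksheet row of lagged values is just the current growth rate followed by its q predecessors, so the first q observations drop out. The sketch below uses a hypothetical function name. It builds the same rows in Python, using the growth rates in A2:A7 as read off the lag table that follows.

```python
def lagged_columns(g, q):
    """Rows (g_t, g_(t-1), ..., g_(t-q)) for every t that has q earlier values.

    With q = 3, the first row matches D5:G5 (=A5, =A4, =A3, =A2); the first
    q observations are lost because their lags fall outside the sample.
    """
    return [[g[t - j] for j in range(q + 1)] for t in range(q, len(g))]


g = [1.4, 2, 1.4, 1.5, 0.9, 1.5]   # A2:A7
rows = lagged_columns(g, q=3)
for row in rows:
    print(row)
```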

Copy the content of cells D5:G5 to cells D6:G99 and that of cells H4:J4 to cells H5:J99. Here is
how your table should look (only the first five values are shown below):

   D     E      F      G      H     I      J
3                             g     gt-1   gt-2
4  g     gt-1   gt-2   gt-3   1.4   2      1.4
5  1.5   1.4    2      1.4    1.5   1.4    2
6  0.9   1.5    1.4    2      0.9   1.5    1.4
7  1.5   0.9    1.5    1.4    1.5   0.9    1.5
8  1.2   1.5    0.9    1.5    1.2   1.5    0.9
9  1.5   1.2    1.5    0.9    1.5   1.2    1.5

In the Regression dialog box, the Input Y Range should be C4:C99, and the Input X Range
should be D4:G99. Check the box next to Labels. Select New Worksheet Ply and name it
Okun's Law Lag Model q=3. Finally select OK.
The result is (see also Table 9.2 p. 346 in Principles of Econometrics, 4e, for Lag Length q = 3):

1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.807716384
5   R Square             0.652405757
6   Adjusted R Square    0.636957124
7   Standard Error       0.174329325
8   Observations         95
9
10  ANOVA
11               df   SS            MS            F             Significance F
12  Regression   4    5.133677887   1.283419472   42.23064622   6.76928E-20
13  Residual     90   2.735164218   0.030390714
14  Total        94   7.868842105
15
16               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
17  Intercept    0.580974603    0.053889266      10.78089658    6.91581E-18   0.473914173    0.688035034
18  g            -0.202052639   0.033013144      -6.120369474   2.38823E-08   -0.26763901    -0.136466267
19  gt-1         -0.164535169   0.03581752       -4.5937064     1.41082E-05   -0.235692922   -0.093377416
20  gt-2         -0.071555993   0.035304286      -2.026835881   0.045638315   -0.141694117   -0.001417869
21  gt-3         0.003303021    0.036260343      0.09109184     0.927622053   -0.068734477   0.07534052

Go back to your okun data worksheet. In the Regression dialog box, the Input Y Range should
be C3:C99, and the Input X Range should be H3:J99. Check the box next to Labels. Select
New Worksheet Ply and name it Okun's Law Lag Model q = 2. Finally select OK.

The result is (see also Table 9.2 p. 346 in Principles of Econometrics, 4e, for Lag Length q = 2):

1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.808669257
5   R Square             0.653945967
6   Adjusted R Square    0.642661596
7   Standard Error       0.172599939
8   Observations         96
9
10  ANOVA
11               df   SS            MS            F             Significance F
12  Regression   3    5.179252055   1.726417352   57.95147879   3.95428E-21
13  Residual     92   2.740747944   0.029790739
14  Total        95   7.92
15
16               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
17  Intercept    0.583556112    0.047211917      12.36035632    2.9455E-21    0.489789173    0.677323052
18  g            -0.202021645   0.032383181      -6.238474373   1.33092E-08   -0.266337437   -0.137705854
19  gt-1         -0.16532688    0.033536844      -4.929708945   3.63113E-06   -0.231933946   -0.098719815
20  gt-2         -0.070013485   0.03309997       -2.115212952   0.037114442   -0.135752881   -0.00427409

9.2 SERIAL CORRELATION

9.2.1 Serial Correlation in Output Growth

9.2.1a Scatter Diagram for Gt and Gt-1

Go back to your okun data worksheet.

In cells K2:L3 enter the following labels and formulas.

   K     L
2  g     gt-1
3  =A3   =A2

Copy the content of cells K3:L3 to cells K4:L99. Here is how your table should look (only the
first five values are shown below):
   K     L
2  g     gt-1
3  2     1.4
4  1.4   2
5  1.5   1.4
6  0.9   1.5
7  1.5   0.9

Select K2:L99. Select the Insert tab located next to the Home tab. In the Charts group of
commands select Scatter, and Scatter with only Markers.
After editing, the result is (see also Figure 9.5 p. 348 in Principles of Econometrics, 4e):

[Chart: Scatter Diagram for gt and gt-1]

9.2.1b Correlogram for G

Let $r_k$ be the correlation between $G_t$ and $G_{t-k}$; in other words, it is the correlation between
growth rates that are k periods apart. The null and alternative hypotheses for a test of
autocorrelation are $H_0: \rho_k = 0$ and $H_1: \rho_k \neq 0$. When $H_0$ is true, the product of the square root of
the sample size and the estimated correlation $r_k$ has an approximate standard normal distribution:

$$Z = \sqrt{T}\, r_k \overset{a}{\sim} N(0,1) \qquad (9.2)$$

Consequently, $r_k$ is significantly different from zero at the 5% significance level if $\sqrt{T}\, r_k \ge 1.96$ or
$\sqrt{T}\, r_k \le -1.96$. Alternatively, we can say that $r_k$ is significantly different from zero at the 5%
significance level if $r_k \ge 1.96/\sqrt{T}$ or $r_k \le -1.96/\sqrt{T}$. By drawing the values $\pm 1.96/\sqrt{T}$ as
bounds on a graph that illustrates the magnitude of each $r_k$, we can see at a glance which
correlations are significant.
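The test in (9.2) reduces to comparing each $r_k$ with the bound $1.96/\sqrt{T}$. The minimal sketch below, with hypothetical function names, computes that bound for T = 98. It then checks two example correlations, one of which is the lag-1 value reported in the table later in this section.

```python
import math


def correlogram_bound(t_obs, z=1.96):
    """Approximate 5% significance bound z / sqrt(T) for each r_k."""
    return z / math.sqrt(t_obs)


def is_significant(r_k, t_obs, z=1.96):
    """Two-sided test of H0: rho_k = 0 using sqrt(T) * r_k ~ N(0, 1), as in (9.2)."""
    return abs(math.sqrt(t_obs) * r_k) >= z


bound = correlogram_bound(98)   # T = 98 growth observations
print(round(bound, 5), is_significant(0.495758, 98), is_significant(0.1, 98))
```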

In cells M1:N13 and O1:P2 of your okun data worksheet enter the following column labels and
formulas.

    M     N                          O       P
1   lag   rk                         LB      UB
2   1     =CORREL(A3:A99,A2:A98)     =-P2    =1.96/SQRT(COUNT($A$2:$A$99))
3   2     =CORREL(A4:A99,A2:A97)
4   3     =CORREL(A5:A99,A2:A96)
5   4     =CORREL(A6:A99,A2:A95)
6   5     =CORREL(A7:A99,A2:A94)
7   6     =CORREL(A8:A99,A2:A93)
8   7     =CORREL(A9:A99,A2:A92)
9   8     =CORREL(A10:A99,A2:A91)
10  9     =CORREL(A11:A99,A2:A90)
11  10    =CORREL(A12:A99,A2:A89)
12  11    =CORREL(A13:A99,A2:A88)
13  12    =CORREL(A14:A99,A2:A87)

Copy the content of cells O2:P2 to cells O3:P13. Here is how your table should look (see also
reported correlations up to four lags on p. 349 in Principles of Econometrics, 4e):

   M     N          O          P
1  lag   rk         LB         UB
2  1     0.495758   -0.19799   0.19799
3  2     0.425994   -0.19799   0.19799
4  3     0.115331   -0.19799   0.19799
5  4     0.253069   -0.19799   0.19799

Note that your Excel results differ slightly from the ones reported in Principles of Econometrics,
4e. By using the Excel function CORREL, we constrained ourselves to computing the
autocorrelations using (T - k) observations in both the numerator and the denominator of the
correlation coefficient. This alternative, mentioned on p. 349 of your textbook, leads to larger
estimates in finite samples and is given by

$$r_k = \frac{\sum_{t=k+1}^{T}(g_t-\bar{g})(g_{t-k}-\bar{g})}{\sum_{t=k+1}^{T}(g_t-\bar{g})^2} \qquad (9.3)$$

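Excel's CORREL is a Pearson correlation of the T - k overlapping pairs, each with its own mean and variance. It is therefore close to, though not literally the same as, (9.3). The sketch below computes both so they can be compared. The function names are hypothetical and the short series is made up so that the difference is easy to see.

```python
def r_k_eq93(g, k):
    """Autocorrelation as in (9.3): sums over t = k+1, ..., T with the full-sample mean."""
    g_bar = sum(g) / len(g)
    num = sum((g[t] - g_bar) * (g[t - k] - g_bar) for t in range(k, len(g)))
    den = sum((g[t] - g_bar) ** 2 for t in range(k, len(g)))
    return num / den


def r_k_correl(g, k):
    """Pearson correlation of the T - k pairs (g_t, g_(t-k)), as CORREL computes it."""
    a, b = g[k:], g[:-k]
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    var_a = sum((u - a_bar) ** 2 for u in a)
    var_b = sum((v - b_bar) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


g = [1.0, 2.0, 3.0, 4.0, 5.0]
print(r_k_eq93(g, 1), r_k_correl(g, 1))
```

With the 98 growth observations, the two definitions give much closer values than in this deliberately trending toy series.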
Select cells O1:O13. Go to the Insert tab, select Line in the Charts group of commands, and
Line again in the Line options list.

You should get the following chart:


[Chart: line chart of the LB series for lags 1 to 12]

We would like to add our upper bound values and our correlation coefficient values to it. Right
click anywhere in your chart area and choose Select Data on the list of options that pops up. In
the Select Data Source dialog box, select Add. In the Edit Series dialog box, the Series name is
the one found in cell P1 and the Series values are from P2:P13. Select OK.

The Select Data Source dialog box reappears. Select Add again. Type Correlation for the
Series name. In the Edit Series dialog box, the Series values are from N2:N13. Select OK.
The Select Data Source dialog box reappears again. Select OK one more time.

The result is:


[Chart: LB, UB, and Correlation series plotted as lines]

On your chart, select the Correlation series, right-click and select Change Series Chart Type in
the menu of options. Select Clustered Column in the Column group of chart types. Select OK.

Here is how your chart should look:

[Chart: Correlation series as columns, with the LB and UB bounds as lines]

Select the horizontal axis in your chart, right-click and select Format Axis in the menu of
options. In the Axis Options panel of the Format Axis dialog box, change the Axis labels
location to Low. Select Close.

After you add axis titles and delete the legend, your chart should look similar to Figure 9.6 on p.
350 of Principles of Econometrics, 4e:

[Chart: correlogram for G, lags 1 to 12 (horizontal axis: Lag)]

9.2.2 Serially Correlated Errors

9.2.2a Australian Economic Time Series

Open the Excel file phillips_aus. Excel opens the data set in Sheet 1 of a new Excel file. Since
we would like to save all our work from Chapter 9 in one file, create a new worksheet in your
POE Chapter 9 Excel file, rename it phillips_aus data, and in it, copy the data set you just
opened.

Below we plot the time series of some important economic variables for the Australian economy
as in Figure 9.7 on p. 352 of Principles of Econometrics, 4e.
Select the Insert tab located next to the Home tab. Select A1:A92. In the Charts group of commands select Line, and Line again.

After editing, the result is (see also Figure 9.7(a) p. 352 in Principles of Econometrics, 4e):

[Chart: Time Series for Australian Price Inflation]

In cells C2:C3, enter the following label and formula.

     C
2    du
3    =B3-B2

Copy the content of cell C3 to cells C4:C92. Here is how your table should look (only the first five values are shown below):

     C
2    du
3    -0.1
4    -0.2
5    -0.1
6    -0.4
7    0

To plot the time series for the quarterly change in the Australian unemployment rate select cells
C2:C92. After editing, the result is (see also Figure 9.7(b) p. 352 in Principles of Econometrics,
4e):
[Chart: time series of the quarterly change in the Australian unemployment rate (du)]

9.2.2b A Phillips Curve

Consider the following Phillips curve model:

$$INF_t = \beta_1 + \beta_2 DU_t + e_t \qquad (9.4)$$

where DU is the change in the Australian unemployment rate and INF is the inflation rate, from quarter 2, 1987 to quarter 3, 2009; t = 1, ..., T, where T = 90 observations.

In the Regression dialog box, the Input Y Range should be A2:A92, and the Input X Range should be C2:C92. Check the boxes next to Labels and Residuals. Select New Worksheet Ply and name it Phillips Curve Model. Finally select OK.

[Screenshot: Regression dialog box with Input Y Range $A$2:$A$92, Input X Range $C$2:$C$92, Labels and Residuals checked, New Worksheet Ply: Phillips Curve Model]

The result is (see also p. 352 in Principles of Econometrics, 4e):

1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.2382
5    R Square             0.0568
6    Adjusted R Square    0.0460
7    Standard Error       0.6220
8    Observations         90
10   ANOVA
11                  df    SS         MS        F        Significance F
12   Regression     1     2.0483     2.0483    5.2947   0.0238
13   Residual       88    34.0445    0.3869
14   Total          89    36.0929
16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0.7776         0.0658           11.8135   7.53E-20   0.6468      0.9084
18   du             -0.5279        0.2294           -2.3010   0.0238     -0.9838     -0.0720
22   RESIDUAL OUTPUT
24   Observation    Predicted      Residuals
25   1              0.8304         0.6696
26   2              0.8832         0.8168
27   3              0.8304         0.9696
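If you want to check the Regression tool's numbers outside Excel, the following sketch computes the main entries of the output above (coefficients, standard errors, t Stat, R Square and the residuals) for a model with an intercept such as (9.4). It is a minimal pure-Python illustration written for this guide, not Excel's own algorithm; the helper names `invert` and `ols` are our own, and the toy data are made up.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(y, X):
    """Least squares fit of y on the columns of X (each row of X starts with 1.0).

    Returns coefficients, standard errors, t statistics, R Square and residuals.
    """
    T, K = len(y), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(T)) for j in range(K)] for i in range(K)]
    xty = [sum(X[t][i] * y[t] for t in range(T)) for i in range(K)]
    xtx_inv = invert(xtx)
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(K)) for i in range(K)]
    resid = [y[t] - sum(X[t][k] * b[k] for k in range(K)) for t in range(T)]
    sse = sum(e * e for e in resid)
    ybar = sum(y) / T
    sst = sum((v - ybar) ** 2 for v in y)
    s2 = sse / (T - K)  # the MS Residual entry of the ANOVA table
    se = [math.sqrt(s2 * xtx_inv[k][k]) for k in range(K)]
    t_stats = [b[k] / se[k] for k in range(K)]
    return b, se, t_stats, 1.0 - sse / sst, resid


# Toy data (not the phillips_aus data): y is roughly 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.9, 5.1, 7.0, 8.9]
b, se, t_stats, r2, resid = ols(y, [[1.0, xt] for xt in x])
print(b, r2)
```

With the worksheet data, `y` would hold the inflation values and `X` the rows `[1.0, du]` built from column C; the Standard Error entry of the output is the square root of `s2` inside `ols`.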

9.2.2c Correlogram for Residuals

In cells E22:F34 and G22:H23 of your Phillips Curve Model worksheet enter the following column labels and formulas.

     E     F                              G         H
22   lag   rk                             LB        UB
23   1     =CORREL(C26:C114,C25:C113)     =-H23     =1.96/SQRT($B$8)
24   2     =CORREL(C27:C114,C25:C112)
25   3     =CORREL(C28:C114,C25:C111)
26   4     =CORREL(C29:C114,C25:C110)
27   5     =CORREL(C30:C114,C25:C109)
28   6     =CORREL(C31:C114,C25:C108)
29   7     =CORREL(C32:C114,C25:C107)
30   8     =CORREL(C33:C114,C25:C106)
31   9     =CORREL(C34:C114,C25:C105)
32   10    =CORREL(C35:C114,C25:C104)
33   11    =CORREL(C36:C114,C25:C103)
34   12    =CORREL(C37:C114,C25:C102)

Copy the content of cells G23:H23 to cells G24:H34. Here is how your table should look (see also the correlations reported up to five lags on p. 353 in Principles of Econometrics, 4e):
     E     F         G          H
22   lag   rk        LB         UB
23   1     0.5529    -0.2066    0.2066
24   2     0.4640    -0.2066    0.2066
25   3     0.4484    -0.2066    0.2066
26   4     0.4469    -0.2066    0.2066
27   5     0.3667    -0.2066    0.2066

Again, note that your Excel results differ slightly from those reported in Principles of Econometrics, 4e (see Section 9.2.1b for more details).

Proceed as in Section 9.2.1b to get the following correlogram for residuals (see also Figure 9.8 p. 353 in Principles of Econometrics, 4e):

[Chart: correlogram for the residuals, with the Correlation columns between the LB and UB lines; horizontal axis: lag (1 to 12)]
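Why do the CORREL-based values differ from the textbook's? Each CORREL call in column F centers its two sub-ranges on their own means, whereas the textbook's sample autocorrelation (as we understand Section 9.2.1b) uses the mean of the full series in both the numerator and the denominator. The sketch below computes both versions plus the ±1.96/√T bounds of columns G and H; the function names are ours and the toy series is made up, chosen so that the two definitions differ sharply.

```python
import math


def correl(a, b):
    """Pearson correlation of two equal-length lists, as Excel's CORREL computes it."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def excel_style_rk(e, k):
    """Mimics =CORREL(C(25+k):C114, C25:C(114-k)): each sub-range uses its own mean."""
    return correl(e[k:], e[:-k])


def full_mean_rk(e, k):
    """Autocorrelation with the full-sample mean in both numerator and denominator."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - k] - m) for t in range(k, len(e)))
    den = sum((v - m) ** 2 for v in e)
    return num / den


def significance_bounds(T):
    """The LB and UB columns: -1.96/sqrt(T) and +1.96/sqrt(T)."""
    ub = 1.96 / math.sqrt(T)
    return -ub, ub


toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # a trending toy series, not the residuals
print(excel_style_rk(toy, 1), full_mean_rk(toy, 1), significance_bounds(90))
```

For a short trending series the two definitions disagree a lot; for the 90 residuals of the Phillips curve model the gap is small, which is why the Excel values differ only slightly from the textbook.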

9.3 LAGRANGE MULTIPLIER TESTS FOR SERIALLY CORRELATED ERRORS

9.3.1 t-Test Version

Reconsider the Phillips curve model (9.4), restated below in a general form:

$$y_t = \beta_1 + \beta_2 x_t + e_t \qquad (9.5)$$

Assume the error $e_t$ follows a first-order autoregressive AR(1) model:

$$e_t = \rho e_{t-1} + v_t \qquad (9.6)$$

Substituting (9.6) into (9.5) yields:

$$y_t = \beta_1 + \beta_2 x_t + \rho e_{t-1} + v_t \qquad (9.7)$$

One way to test the null hypothesis $H_0: \rho = 0$ is to use a t- or F-test of the significance of the coefficient of $\hat{e}_{t-1}$ in (9.8):

$$y_t = \beta_1 + \beta_2 x_t + \rho \hat{e}_{t-1} + \hat{v}_t \qquad (9.8)$$

where the $\hat{e}_{t-1}$ are the lagged least squares residuals from the Phillips curve model (9.4).

The estimation of (9.8) requires a value for $\hat{e}_0$. Two common ways of dealing with the unavailability of $\hat{e}_0$ are (i) to delete the first observation and hence use a total of 89 observations, and (ii) to set $\hat{e}_0 = 0$ and use all 90 observations.

Below, we walk you through the t-test for (ii) only.

In cells D2:D4 of your phillips_aus data worksheet, enter the following column label, value and formula.

     D
2    et-1
3    0
4    ='Phillips Curve Model'!C25

Copy the content of cell D4 to cells D5:D92. Here is how your table should look (only the first five values are shown below):

     D
2    et-1
3    0
4    0.669592
5    0.816806
6    0.969592
7    0.811233

In the Regression dialog box, the Input Y Range should be A2:A92, and the Input X Range
should be C2:D92. Check the box next to Labels. Select New Worksheet Ply and name it t-test
Version of LM Test. Finally select OK.

[Screenshot: Regression dialog box with Input Y Range $A$2:$A$92, Input X Range $C$2:$D$92, Labels checked, New Worksheet Ply: t-test Version of LM Test]

The result of the t-test is the et-1 row of the output below (see also p. 354 in Principles of Econometrics, 4e):

16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0.7755         0.0551           14.0663   3.99E-24   0.6659      0.8850
18   du             -0.6794        0.1937           -3.5078   0.0007     -1.0643     -0.2944
19   et-1           0.5588         0.0901           6.2020    1.82E-08   0.3797      0.7379
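The same t-test version of the LM test, with alternative (ii) (setting $\hat{e}_0 = 0$), can be sketched in plain Python. This is an illustration under our own assumptions: the helper names are hypothetical, and the data are simulated with AR(1) errors (ρ = 0.8) rather than taken from phillips_aus, so only the logic, not the numbers, should be compared with the output above.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(y, X):
    """Coefficients, standard errors and residuals; each row of X starts with 1.0."""
    T, K = len(y), len(X[0])
    xtx_inv = invert([[sum(row[i] * row[j] for row in X) for j in range(K)] for i in range(K)])
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(K)]
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(K)) for i in range(K)]
    resid = [yt - sum(bk * xk for bk, xk in zip(b, row)) for row, yt in zip(X, y)]
    s2 = sum(e * e for e in resid) / (T - K)
    se = [math.sqrt(s2 * xtx_inv[k][k]) for k in range(K)]
    return b, se, resid


def lm_t_stat(y, x):
    """t Stat on the lagged residual in (9.8), alternative (ii): e_0 = 0, all T observations."""
    _, _, e = ols(y, [[1.0, xt] for xt in x])  # residuals of the original model
    e_lag = [0.0] + e[:-1]                     # the et-1 column, starting with e_0 = 0
    b_aux, se_aux, _ = ols(y, [[1.0, xt, el] for xt, el in zip(x, e_lag)])
    return b_aux[2] / se_aux[2]


# Simulated example: y = 1 + 0.5x + e, with AR(1) errors e_t = 0.8 e_{t-1} + v_t.
rng = random.Random(7)
T = 200
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
errors, prev = [], 0.0
for _ in range(T):
    prev = 0.8 * prev + rng.gauss(0.0, 1.0)
    errors.append(prev)
y = [1.0 + 0.5 * xt + et for xt, et in zip(x, errors)]
t_stat = lm_t_stat(y, x)
print(round(t_stat, 2))  # expected to be well above 1.96, so H0: rho = 0 is rejected
```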

9.3.2 T × R² Version

The T × R² version of the Lagrange multiplier test is the one we worked with in Section 8.2.2a.

Consider the following auxiliary regression:

$$\hat{e}_t = \gamma_1 + \gamma_2 x_t + \rho \hat{e}_{t-1} + \hat{v}_t \qquad (9.9)$$

where the $\hat{e}_t$ are the least squares residuals and the $\hat{e}_{t-1}$ the lagged least squares residuals from the Phillips curve model (9.4).

Once again, the estimation of (9.9) requires a value for $\hat{e}_0$. Two common ways of dealing with the unavailability of $\hat{e}_0$ are (iii) to delete the first observation and hence use a total of 89 observations, and (iv) to set $\hat{e}_0 = 0$ and use all 90 observations.

Below, we walk you through alternative (iv) only.

Go back to your phillips_aus data worksheet. From there go to the Regression dialog box. The Input Y Range should be C24:C114 from the Phillips Curve Model worksheet. The Input X Range should be C2:D92 from the phillips_aus data worksheet. Check the box next to Labels. Select New Worksheet Ply and name it Auxiliary Regression. Finally select OK.

[Screenshot: Regression dialog box with Input Y Range $C$24:$C$114, Input X Range $C$2:$D$92, Labels checked, New Worksheet Ply: Auxiliary Regression]

The results we use for the Lagrange multiplier test are R Square (cell B5) and Observations (cell B8):

1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.5537
5    R Square             0.3066
6    Adjusted R Square    0.2906
7    Standard Error       0.5209
8    Observations         90
10   ANOVA
11                  df    SS         MS        F         Significance F
12   Regression     2     10.4374    5.2187    19.2327   1.21E-07
13   Residual       87    23.6071    0.2713
14   Total          89    34.0445

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Lagrange Multiplier Test.


In it copy the Lagrange multiplier test template you created in Chapter 8.

The degrees of freedom for the Lagrange multiplier test are equal to the number of hypotheses being tested, that is, the number of parameters in the null hypothesis. When we used the test for heteroskedasticity based on the variance function (Section 8.2.2a), the degrees of freedom also corresponded to the number of parameters in the auxiliary regression minus one. This is no longer the case: the auxiliary regression (9.9) has three parameters, but the null hypothesis $H_0: \rho = 0$ involves only one. We modify our template accordingly, so that m is entered directly.

Additionally, we replace the reference [POE Chapter 8.xlsx]Variance Function by Auxiliary Regression.

     A                          B                        C
1    Data Input                 N =                      ='Auxiliary Regression'!B8
2                               R² =                     ='Auxiliary Regression'!B5
3                               α =
4                               m =
5
6    Computed Values            χ²-critical value =      =CHIINV(C3,C4)
7
8    Lagrange Multiplier Test   χ² =                     =C1*C2
9                               Conclusion =             =IF(C8>=C6,"Reject Ho","Do Not Reject Ho")
10                              p-value =                =CHIDIST(C8,C4)
11                              Conclusion =             =IF(C10<=C3,"Reject Ho","Do Not Reject Ho")

At α = 0.05 and with m = 1, the result of the test is (see also p. 355 in Principles of Econometrics, 4e):

     A                          B                        C
1    Data Input                 N =                      90
2                               R² =                     0.306582
3                               α =                      0.05
4                               m =                      1
5
6    Computed Values            χ²-critical value =      3.841459
7
8    Lagrange Multiplier Test   χ² =                     27.59235
9                               Conclusion =             Reject Ho
10                              p-value =                1.50E-07
11                              Conclusion =             Reject Ho
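The template's arithmetic can also be checked with a few lines of Python. The sketch below assumes m = 1, which lets it use the identity P(χ²₁ > s) = erfc(√(s/2)) from the standard library in place of CHIDIST, and finds the CHIINV-style critical value by bisection; for other values of m you would need a general chi-square routine. The function names are our own.

```python
import math


def chi2_1_sf(s):
    """P(X > s) for X ~ chi-square with 1 degree of freedom (the role of CHIDIST(s, 1))."""
    return math.erfc(math.sqrt(s / 2.0))


def chi2_1_critical(alpha, lo=0.0, hi=100.0):
    """Value c with P(X > c) = alpha for 1 degree of freedom (the role of CHIINV(alpha, 1))."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_1_sf(mid) > alpha:
            lo = mid  # tail probability still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0


def lm_test(T, r2, alpha=0.05):
    """T x R^2 LM test with m = 1, mirroring cells C6 to C11 of the template."""
    stat = T * r2
    critical = chi2_1_critical(alpha)
    p_value = chi2_1_sf(stat)
    conclusion = "Reject Ho" if stat >= critical else "Do Not Reject Ho"
    return stat, critical, p_value, conclusion


print(lm_test(90, 0.306582))
```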

9.4 ESTIMATION WITH SERIALLY CORRELATED ERRORS

9.4.1 Generalized Least Squares Estimation of an AR(1) Error Model

9.4.1a The Prais-Winsten Estimator

Reconsider the Phillips curve model (9.4), where the error $e_t$ is assumed to follow the first-order autoregressive AR(1) model (9.6). The following transformed Phillips curve model (9.10) has an error term $v_t$ that is homoskedastic and uncorrelated over time (see Appendix 9C, pp. 397-399 in Principles of Econometrics, 4e for more details):

$$y_t^* = \beta_1 x_{t1}^* + \beta_2 x_{t2}^* + v_t \qquad (9.10)$$

where $y_t^* = y_t - \hat{\rho}\,y_{t-1}$, $x_{t1}^* = 1 - \hat{\rho}$ and $x_{t2}^* = x_t - \hat{\rho}\,x_{t-1}$ for $t = 2, 3, \ldots, T$. For $t = 1$: $y_1^* = \sqrt{1-\hat{\rho}^2}\,y_1$, $x_{11}^* = \sqrt{1-\hat{\rho}^2}$ and $x_{12}^* = \sqrt{1-\hat{\rho}^2}\,x_1$. Finally, $\hat{\rho}$ is the least squares estimate of (9.11):

$$\hat{e}_t = \rho\,\hat{e}_{t-1} + \hat{v}_t \qquad (9.11)$$

where the $\hat{e}_t$ are the least squares residuals from the Phillips curve model (9.4).

The process of first estimating model (9.4), then using the least squares residuals $\hat{e}_t$ from (9.4) to estimate model (9.11), and finally using the least squares estimate $\hat{\rho}$ of (9.11) to transform the dependent and independent variables and estimate model (9.10), is similar to what we did in Section 8.5.

Note that neither model (9.10) nor model (9.11) has an intercept.
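The transformation in (9.10) can be written as a short function. This sketch (our own helper, assuming y and x are plain Python lists ordered in time) keeps the first observation with the $\sqrt{1-\hat{\rho}^2}$ scaling, as the Prais-Winsten estimator does. The check uses the first five inflation and du values of the phillips_aus data worksheet and the value ρ̂ = 0.549882 obtained in this section.

```python
import math


def prais_winsten_transform(y, x, rho):
    """Return (y_star, x1_star, x2_star) for model (9.10), keeping t = 1."""
    c = math.sqrt(1.0 - rho ** 2)
    y_star = [c * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    x1_star = [c] + [1.0 - rho] * (len(y) - 1)
    x2_star = [c * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    return y_star, x1_star, x2_star


# First five inflation and du values (cells A3:A7 and C3:C7) and rho-hat (cell G1).
inf = [1.5, 1.7, 1.8, 1.8, 1.7]
du = [-0.1, -0.2, -0.1, -0.4, 0.0]
ys, x1s, x2s = prais_winsten_transform(inf, du, 0.549882)
print([round(v, 6) for v in ys])
```

Dropping the first element of each returned list gives the T - 1 observations used by the Cochrane-Orcutt variant described below.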

Getting an Estimate ρ̂ of the Autocorrelation Coefficient

We already went through the first step of the process and estimated model (9.4) in Section 9.2.2b. The data we need to estimate (9.11) are in our Phillips Curve Model worksheet. From there, go to the Regression dialog box. The Input Y Range should be C26:C114, and the Input X Range should be D4:D92 from the phillips_aus data worksheet. Uncheck the box next to Labels. Check the box next to Constant is Zero. Select New Worksheet Ply and name it AR(1) Error Model. Finally select OK.

Note that we lose one observation because there is no residual $\hat{e}_0$ (that is, for t - 1 = 0) to pair with the first residual $\hat{e}_1$.
[Screenshot: Regression dialog box with Input Y Range $C$26:$C$114, Input X Range $D$4:$D$92 on the phillips_aus data worksheet, Labels unchecked, Constant is Zero checked, New Worksheet Ply: AR(1) Error Model]
The result is:

1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.5529
5    R Square             0.3057
6    Adjusted R Square    0.2944
7    Standard Error       0.5148
8    Observations         89
10   ANOVA
11                  df    SS         MS         F         Significance F
12   Regression     1     10.2711    10.2711    38.7507   1.55E-08
13   Residual       88    23.3250    0.2651
14   Total          89    33.5962
16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0              #N/A             #N/A      #N/A       #N/A        #N/A
18   X Variable 1   0.5499         0.0883           6.2250    1.60E-08   0.3743      0.7254
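The AR(1) Error Model regression is a least squares regression through the origin, so its coefficient has a closed form: $\hat{\rho} = \sum_{t=2}^{T}\hat{e}_t\hat{e}_{t-1} / \sum_{t=2}^{T}\hat{e}_{t-1}^2$. The sketch below (our own helper) computes it together with a standard error that uses T - 2 residual degrees of freedom, matching the 88 residual df shown above for 89 usable observations; the toy residual series is made up.

```python
import math


def rho_hat_through_origin(e):
    """Regress e_t on e_{t-1} with no intercept (Constant is Zero), using t = 2, ..., T."""
    pairs = [(e[t], e[t - 1]) for t in range(1, len(e))]
    n = len(pairs)                       # T - 1 usable observations
    den = sum(lag * lag for _, lag in pairs)
    rho = sum(cur * lag for cur, lag in pairs) / den
    sse = sum((cur - rho * lag) ** 2 for cur, lag in pairs)
    se = math.sqrt(sse / (n - 1) / den)  # n - 1 residual degrees of freedom
    return rho, se


toy = [1.0, 0.5, 0.25, 0.125]  # made-up residuals that halve every period
print(rho_hat_through_origin(toy))
```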

Getting the GLS Estimates b1 and b2

Go back to your phillips_aus data worksheet. In cells F1:G1 and I2:K4, enter the following labels and formulas.

     F         G
1    ρ-hat =   ='AR(1) Error Model'!B18

     I                        J                  K
2    y*                       x1*                x2*
3    =SQRT(1-G1^2)*A3         =SQRT(1-G1^2)      =SQRT(1-G1^2)*C3
4    =A4-$G$1*A3              =1-$G$1            =C4-$G$1*C3

Copy the content of cells I4:K4 to cells I5:K92. Here is how your table should look (only the first five values are shown below):

     F         G           I           J           K
1    ρ-hat =   0.549882
2                          y*          x1*         x2*
3                          1.252864    0.835243    -0.083524
4                          0.875178    0.450118    -0.145012
5                          0.865201    0.450118    0.009976
6                          0.810213    0.450118    -0.345012
7                          0.710213    0.450118    0.219953

In the Regression dialog box, the Input Y Range should be I2:I92, and the Input X Range
should be J2:K92. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it Prais-Winsten estimates. Finally select OK.

[Screenshot: Regression dialog box with Input Y Range $I$2:$I$92, Input X Range $J$2:$K$92, Labels and Constant is Zero checked, New Worksheet Ply: Prais-Winsten estimates]
The result is:

1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.6151
5    R Square             0.3783
6    Adjusted R Square    0.3597
7    Standard Error       0.5168
8    Observations         90
10   ANOVA
11                  df    SS         MS         F         Significance F
12   Regression     2     14.3003    7.1502     26.7733   8.09E-10
13   Residual       88    23.5016    0.2671
14   Total          90    37.8019
16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0              #N/A             #N/A      #N/A       #N/A        #N/A
18   x1*            0.7858         0.1196           6.5726    3.37E-09   0.5482      1.0234
19   x2*            -0.6994        0.2428           -2.8806   0.0050     -1.1820     -0.2169

Note that these results do not match those in equation (9.45) in Principles of Econometrics, 4e, p. 362. What we have described is a simple two-step estimation process, sometimes called the Prais-Winsten estimator. However, there are advantages to "iterating" this procedure, as we describe in the following section.
9.4.1b The Cochrane-Orcutt Estimator

Reconsider the GLS model (9.10). Another way to estimate it is to omit the transformation of the first observation used by the Prais-Winsten estimator and to base the estimation on the remaining T - 1 observations only. We then repeat the GLS estimation process outlined in Section 9.4.1a, each time re-estimating ρ̂ from the residuals implied by the latest estimates, until the least squares estimates b1 and b2 from model (9.10) no longer change. This iterative procedure is known as the Cochrane-Orcutt estimator. Below we walk you through the first two iterations of this process.

Note that the omission of the first observation is not, in general, a good strategy. It simplifies the
calculations, however, so we will use this trick. You might want to test your understanding by
extending the iterative process we describe below to include the first observation.

First Iteration

Go back to your phillips_aus data worksheet. In cells M2:O4 enter the following labels and formulas.

     M                            N        O
2    Cochrane-Orcutt estimator
3    y*                           x1*      x2*
4    =I4                          =J4      =K4

Copy the content of cells M4:O4 to cells M5:O92. Here is how your table should look (only the first five values are shown below):

     M                            N           O
2    Cochrane-Orcutt estimator
3    y*                           x1*         x2*
4    0.875178                     0.450118    -0.145012
5    0.865201                     0.450118    0.009976
6    0.810213                     0.450118    -0.345012
7    0.710213                     0.450118    0.219953
8    0.965201                     0.450118    -0.6

In the Regression dialog box, the Input Y Range should be M3:M92, and the Input X Range should be N3:O92. Check the boxes next to Labels and Constant is Zero. Select New Worksheet Ply and name it Cochrane-Orcutt estimates. Finally select OK.

[Screenshot: Regression dialog box with Input Y Range $M$3:$M$92, Input X Range $N$3:$O$92, Labels and Constant is Zero checked, New Worksheet Ply: Cochrane-Orcutt estimates]
The result is:
1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.5997
5    R Square             0.3597
6    Adjusted R Square    0.3408
7    Standard Error       0.5164
8    Observations         89
10   ANOVA
11                  df    SS         MS         F         Significance F
12   Regression     2     13.0316    6.5158     24.4337   3.96E-09
13   Residual       87    23.2006    0.2667
14   Total          89    36.2322
16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0              #N/A             #N/A      #N/A       #N/A        #N/A
18   x1*            0.7611         0.1217           6.2524    1.46E-08   0.5191      1.0030
19   x2*            -0.6917        0.2427           -2.8495   0.0055     -1.1741     -0.2092

Second Iteration

Go back to your phillips_aus data worksheet. In cells Q1:R2 and T3:U4, enter the following labels and formulas.

     Q       R
1    b1 =    ='Cochrane-Orcutt estimates'!B18
2    b2 =    ='Cochrane-Orcutt estimates'!B19

     T                          U
3    e-hatt                     e-hatt-1
4    =A4-$R$1-$R$2*C4           =A3-$R$1-$R$2*C3

Copy the content of cells T4:U4 to cells T5:U92. Here is how your table should look (only the first five values are shown below):

     Q       R            T           U
1    b1 =    0.761084
2    b2 =    -0.691679
3                         e-hatt      e-hatt-1
4                         0.800580    0.669748
5                         0.969748    0.800580
6                         0.762244    0.969748
7                         0.938916    0.762244
8                         0.723909    0.938916

In the Regression dialog box, the Input Y Range should be T3:T92, and the Input X Range should be U3:U92. Check the boxes next to Labels and Constant is Zero. Select Output Range and specify it to be cell A1 in your AR(1) Error Model worksheet: either click in the Output Range box and then select that cell, or type 'AR(1) Error Model'!A1 in the Output Range box. Finally, select OK.

[Screenshot: Regression dialog box with Input Y Range $T$3:$T$92, Input X Range $U$3:$U$92, Labels and Constant is Zero checked, and Output Range set to cell $A$1 of the worksheet estimated earlier]

Excel warns you that the output range will overwrite existing data. Select OK to overwrite the data in the specified range.

[Dialog: Microsoft Office Excel, "Regression - Output range will overwrite existing data. Press OK to overwrite data in range"]

The result is:

1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.5602
5    R Square             0.3138
6    Adjusted R Square    0.3024
7    Standard Error       0.5134
8    Observations         89
10   ANOVA
11                  df    SS         MS         F         Significance F
12   Regression     1     10.6083    10.6083    40.2405   9.75E-09
13   Residual       88    23.1987    0.2636
14   Total          89    33.8070
16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0              #N/A             #N/A      #N/A       #N/A        #N/A
18   e-hatt-1       0.5573         0.0878           6.3435    9.43E-09   0.3827      0.7318

Go back to your phillips_aus data worksheet. Notice that your ρ-hat value in cell G1 has been updated, and so have your transformed dependent and independent variables in columns M-O.

In the Regression dialog box, the Input Y Range should be M3:M92, and the Input X Range should be N3:O92. Check the boxes next to Labels and Constant is Zero. Select Output Range and specify it to be cell A1 in your Cochrane-Orcutt estimates worksheet: either click in the Output Range box and then select that cell, or type 'Cochrane-Orcutt estimates'!A1 in the Output Range box. Finally, select OK.

[Screenshot: Regression dialog box with Input Y Range $M$3:$M$92, Input X Range $N$3:$O$92, Labels and Constant is Zero checked, Output Range: 'Cochrane-Orcutt estimates'!$A$1]

Excel again warns you that the output range will overwrite existing data. Select OK to overwrite the data in the specified range. The result is:

1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.5946
5    R Square             0.3535
6    Adjusted R Square    0.3346
7    Standard Error       0.5164
8    Observations         89
10   ANOVA
11                  df    SS         MS         F         Significance F
12   Regression     2     12.6856    6.3428     23.7868   5.99E-09
13   Residual       87    23.1987    0.2667
14   Total          89    35.8843
16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0              #N/A             #N/A      #N/A       #N/A        #N/A
18   x1*            0.7609         0.1237           6.1487    2.30E-08   0.5149      1.0068
19   x2*            -0.6943        0.2429           -2.8582   0.0053     -1.1772     -0.2115

We have run two additional iterations. The third and fourth iterations give estimates that are identical to four decimal places. We can thus say that after a total of four iterations, we obtain the following stable estimates (see also p. 362 in Principles of Econometrics, 4e):

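$$INF_t = \underset{(0.1238)}{0.7609} - \underset{(0.2429)}{0.6944}\,DU_t + e_t \qquad (9.12)$$

$$e_t = \underset{(0.088)}{0.557}\,e_{t-1} + v_t \qquad (9.13)$$

Standard errors (se) are shown in parentheses below the estimates.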

The table below reports the Cochrane-Orcutt estimates obtained for each iteration:

Iteration    1           2           3           4
ρ̂ =          0.5499      0.5573      0.5574      0.5574
b1 =         0.76108     0.76087     0.76087     0.76087
b2 =         -0.69168    -0.69434    -0.69439    -0.69439
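The whole Cochrane-Orcutt loop can be summarized in code. The following is a minimal sketch under our own assumptions (pure Python, hypothetical function names, a convergence tolerance of 1e-4 on b1 and b2, and simulated data rather than phillips_aus): each pass computes residuals from the current b1 and b2 (like columns T and U), re-estimates ρ̂ through the origin, rebuilds the transformed variables for t = 2, ..., T (like columns M to O), and re-runs the no-intercept regression.

```python
import random


def ls_two_regressors(y, a, b):
    """No-intercept least squares of y on regressors a and b, via the 2x2 normal equations."""
    saa = sum(u * u for u in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * w for u, w in zip(a, y))
    sby = sum(v * w for v, w in zip(b, y))
    det = saa * sbb - sab * sab
    return (sbb * say - sab * sby) / det, (saa * sby - sab * say) / det


def cochrane_orcutt(y, x, tol=1e-4, max_iter=50):
    """Iterate until b1 and b2 change by less than tol; returns (b1, b2, rho_hat, iterations)."""
    T = len(y)
    b1, b2 = ls_two_regressors(y, [1.0] * T, x)  # a column of ones: ordinary least squares
    rho = 0.0
    for iteration in range(1, max_iter + 1):
        e = [yt - b1 - b2 * xt for yt, xt in zip(y, x)]  # residuals from current b1, b2
        rho = (sum(e[t] * e[t - 1] for t in range(1, T))
               / sum(e[t - 1] ** 2 for t in range(1, T)))  # regression through the origin
        y_star = [y[t] - rho * y[t - 1] for t in range(1, T)]
        x1_star = [1.0 - rho] * (T - 1)
        x2_star = [x[t] - rho * x[t - 1] for t in range(1, T)]
        new_b1, new_b2 = ls_two_regressors(y_star, x1_star, x2_star)
        converged = abs(new_b1 - b1) < tol and abs(new_b2 - b2) < tol
        b1, b2 = new_b1, new_b2
        if converged:
            return b1, b2, rho, iteration
    return b1, b2, rho, max_iter


# Simulated data: y = 1 - 0.5x + e, with AR(1) errors e_t = 0.6 e_{t-1} + v_t.
rng = random.Random(42)
T = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
errors, prev = [], 0.0
for _ in range(T):
    prev = 0.6 * prev + rng.gauss(0.0, 0.5)
    errors.append(prev)
y = [1.0 - 0.5 * xt + et for xt, et in zip(x, errors)]
b1, b2, rho, iterations = cochrane_orcutt(y, x)
print(round(b1, 3), round(b2, 3), round(rho, 3), iterations)
```

As in the table above, the estimates typically settle after a handful of passes; the stopping rule here is our choice, not part of the worksheet procedure.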

9.4.2 Autoregressive Distributed Lag (ARDL) Model


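We consider the following autoregressive distributed lag ARDL(1,1) and ARDL(1,0) models:

$$INF_t = \delta + \theta_1 INF_{t-1} + \delta_0 DU_t + \delta_1 DU_{t-1} + v_t \qquad (9.14)$$

$$INF_t = \delta + \theta_1 INF_{t-1} + \delta_0 DU_t + v_t \qquad (9.15)$$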

In cells W3:Y4 and Z2:AA3 of your phillips_aus data worksheet enter the following labels and
formulas.
     W          X        Y          Z          AA
2                                   inft-1     dut
3    inft-1     dut      dut-1      =A2        =C3
4    =A3        =C4      =C3

Copy the content of cells W4:Y4 to cells W5:Y92 and the content of cells Z3:AA3 to cells
Z4:AA92. Here is how your table should look (only the first five values are shown below):

     W          X        Y          Z          AA
2                                   inft-1     dut
3    inft-1     dut      dut-1      2          -0.1
4    1.5        -0.2     -0.1       1.5        -0.2
5    1.7        -0.1     -0.2       1.7        -0.1
6    1.8        -0.4     -0.1       1.8        -0.4
7    1.8        0        -0.4       1.8        0
8    1.7        -0.6     0          1.7        -0.6

In the Regression dialog box, the Input Y Range should be A3:A92, and the Input X Range should be W3:Y92. Check the box next to Labels. Select New Worksheet Ply and name it ARDL(1,1) Phillips Curve Model. Finally select OK.

[Screenshot: Regression dialog box with Input Y Range $A$3:$A$92, Input X Range $W$3:$Y$92, Labels checked, New Worksheet Ply: ARDL(1,1) Phillips Curve Model]

The result is (see also p. 364 and p. 365 in Principles of Econometrics, 4e):

1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.5907
5    R Square             0.3489
6    Adjusted R Square    0.3260
7    Standard Error       0.5221
8    Observations         89
10   ANOVA
11                  df    SS         MS         F         Significance F
12   Regression     3     12.4166    4.1389     15.1849   5.37E-08
13   Residual       85    23.1681    0.2726
14   Total          88    35.5847
16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0.3336         0.0899           3.7110    0.0004     0.1549      0.5124
18   inft-1         0.5593         0.0908           6.1596    2.34E-08   0.3787      0.7398
19   dut            -0.6882        0.2499           -2.7542   0.0072     -1.1850     -0.1914
20   dut-1          0.3199         0.2575           1.2425    0.2175     -0.1920     0.8317
-

Go back to your phillips_aus data worksheet.

In the Regression dialog box, the Input Y Range should be A2:A92, and the Input X Range should be Z2:AA92. Check the box next to Labels. Select New Worksheet Ply and name it ARDL(1,0) Phillips Curve Model. Finally select OK.

[Screenshot: Regression dialog box with Input Y Range $A$2:$A$92, Input X Range $Z$2:$AA$92, Labels checked, New Worksheet Ply: ARDL(1,0) Phillips Curve Model]

The result is (see also p. 364 and p. 365 in Principles of Econometrics, 4e):

1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.5886
5    R Square             0.3464
6    Adjusted R Square    0.3314
7    Standard Error       0.5207
8    Observations         90
10   ANOVA
11                  df    SS         MS         F         Significance F
12   Regression     2     12.5024    6.2512     23.0538   9.25E-09
13   Residual       87    23.5905    0.2712
14   Total          89    36.0929
16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0.3548         0.0876           4.0501    0.0001     0.1807      0.5289
18   inft-1         0.5282         0.0851           6.2091    1.77E-08   0.3592      0.6973
19   dut            -0.4909        0.1921           -2.5546   0.0124     -0.8728     -0.1089
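The lagged columns W:Y and Z:AA simply shift the series by one period. The sketch below builds the same design matrices in Python and solves the least squares normal equations; the helper names are ours, and the check uses noise-free artificial data generated from known ARDL(1,1) coefficients, so least squares should recover them up to rounding error.

```python
import random


def solve(a, rhs):
    """Solve a b = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for i in range(n):
            if i != col:
                factor = m[i][col]
                m[i] = [vi - factor * vc for vi, vc in zip(m[i], m[col])]
    return [row[n] for row in m]


def least_squares(y, X):
    """Coefficients from the normal equations X'X b = X'y."""
    K = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(K)] for i in range(K)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(K)]
    return solve(xtx, xty)


def ardl_design(inf, du, include_du_lag=True):
    """Targets inf_t and rows [1, inf_t-1, du_t(, du_t-1)] for t = 2, ..., T.

    include_du_lag=True gives ARDL(1,1) as in (9.14); False gives ARDL(1,0) as in (9.15).
    """
    y, X = [], []
    for t in range(1, len(inf)):
        row = [1.0, inf[t - 1], du[t]]
        if include_du_lag:
            row.append(du[t - 1])
        y.append(inf[t])
        X.append(row)
    return y, X


# Noise-free artificial data from known coefficients (not the phillips_aus data).
rng = random.Random(3)
du = [rng.uniform(-0.5, 0.5) for _ in range(40)]
inf = [2.0]
for t in range(1, 40):
    inf.append(0.3 + 0.5 * inf[t - 1] - 0.7 * du[t] + 0.3 * du[t - 1])
y, X = ardl_design(inf, du)
coefs = least_squares(y, X)
print([round(c, 6) for c in coefs])  # approximately [0.3, 0.5, -0.7, 0.3]
```

As in the worksheet, one observation is lost because the first period has no lagged value.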

9.5 FORECASTING

9.5.1 Using an Autoregressive Model

Consider the following autoregressive model of order p = 2:

$$G_t = \delta + \theta_1 G_{t-1} + \theta_2 G_{t-2} + v_t \qquad (9.16)$$

where G is the percentage change in Gross Domestic Product (GDP) from quarter 2, 1985 to quarter 3, 2009; t = 1, ..., T, where T = 96.

Go back to your okun data worksheet.

In the Regression dialog box, the Input Y Range should be H3:H99, and the Input X Range should be I3:J99. Check the box next to Labels. Select New Worksheet Ply and name it AR(2) Model. Finally select OK.

[Screenshot: Regression dialog box with Input Y Range $H$3:$H$99, Input X Range $I$3:$J$99, Labels checked, New Worksheet Ply: AR(2) Model]

The result is (see also p. 371 in Principles of Econometrics, 4e):

1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.5391
5    R Square             0.2907
6    Adjusted R Square    0.2754
7    Standard Error       0.5527
8    Observations         96
10   ANOVA
11                  df    SS         MS         F         Significance F
12   Regression     2     11.6418    5.8209     19.0559   1.16E-07
13   Residual       93    28.4081    0.3055
14   Total          95    40.0499
16                  Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
17   Intercept      0.4657         0.1433           3.2510    0.0016     0.1812      0.7502
18   gt-1           0.3770         0.1000           3.7692    0.0003     0.1784      0.5756
19   gt-2           0.2462         0.1029           2.3938    0.0187     0.0420      0.4505

Once estimated, the AR(2) model in (9.16) can be used to forecast GDP growth into the future.

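Let $G_T$ be the last sample observation; the forecasts for GDP growth 1 quarter into the future ($G_{T+1}$), 2 quarters into the future ($G_{T+2}$), and 3 quarters into the future ($G_{T+3}$) are given by (for more details see pp. 372-373 in Principles of Econometrics, 4e):

$$\hat{G}_{T+1} = \hat{\delta} + \hat{\theta}_1 G_T + \hat{\theta}_2 G_{T-1} \qquad (9.17)$$

$$\hat{G}_{T+2} = \hat{\delta} + \hat{\theta}_1 \hat{G}_{T+1} + \hat{\theta}_2 G_T \qquad (9.18)$$

$$\hat{G}_{T+3} = \hat{\delta} + \hat{\theta}_1 \hat{G}_{T+2} + \hat{\theta}_2 \hat{G}_{T+1} \qquad (9.19)$$

The estimated standard errors of the forecast errors for GDP growth 1 quarter into the future ($\hat{\sigma}_1$), 2 quarters into the future ($\hat{\sigma}_2$), and 3 quarters into the future ($\hat{\sigma}_3$) are given by (for more details see p. 374 in Principles of Econometrics, 4e):

$$\hat{\sigma}_1 = \hat{\sigma}_v \qquad (9.20)$$

$$\hat{\sigma}_2 = \hat{\sigma}_v \sqrt{1 + \hat{\theta}_1^2} \qquad (9.21)$$

$$\hat{\sigma}_3 = \hat{\sigma}_v \sqrt{(\hat{\theta}_1^2 + \hat{\theta}_2)^2 + \hat{\theta}_1^2 + 1} \qquad (9.22)$$

where $\hat{\sigma}_v$ is the estimated standard error of the regression model (9.16).

Finally, the lower limit (LL) and upper limit (UL) of a forecast interval for $G_{T+j}$ are given by:

$$LL = \hat{G}_{T+j} - t_c\,\hat{\sigma}_j \qquad (9.23)$$

$$UL = \hat{G}_{T+j} + t_c\,\hat{\sigma}_j \qquad (9.24)$$

where $t_c$ is the $100(1 - \alpha/2)$th percentile of the t-distribution with T - K degrees of freedom, and K is the number of parameters in the autoregressive model (9.16).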

Below we create a forecast interval template, similar to the prediction interval template we
created in Section 4.1.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Forecast Interval.


Create the following template to construct forecast intervals. In the bordering columns and rows you will find the numbers of the equations and the formatting options used, if any, in the template.

     A                  B                        C
1    Data Input         Sample Size T =          ='AR(2) Model'!B8
2                       Confidence Level =       (percentage, 0 decimal places)
3                       K =                      ='AR(2) Model'!B12+1
4                       σ-hatv =                 ='AR(2) Model'!B7
5                       δ-hat =                  ='AR(2) Model'!B17
6                       θ1-hat =                 ='AR(2) Model'!B18
7                       θ2-hat =                 ='AR(2) Model'!B19
8                       yT-1 =                   ='okun data'!H98
9                       yT =                     ='okun data'!H99
10
11   Computed Values    α =                      =1-C2
12                      df or m =                =C1-C3
13                      tc =                     =TINV(C11,C12)
14
15   Forecast
16                      y-hatT+1 =               =C5+C6*C9+C7*C8        (9.17)
17                      y-hatT+2 =               =C5+C6*C16+C7*C9       (9.18)
18                      y-hatT+3 =               =C5+C6*C17+C7*C16      (9.19)

     D                                           E                  F
14                                               Forecast Interval
15   σ-hatj                                      Lower Limit        Upper Limit
16   =C4                               (9.20)    =C16-C13*D16       =C16+C13*D16
17   =C4*SQRT(1+C6^2)                  (9.21)    =C17-C13*D17       =C17+C13*D17
18   =C4*SQRT((C6^2+C7)^2+C6^2+1)      (9.22)    =C18-C13*D18       =C18+C13*D18
                                                 (9.23)             (9.24)

Here are the results you should get (see also Table 9.7 p. 374 in Principles of Econometrics, 4e):

     A                  B                    C           D          E                 F
1    Data Input         Sample Size T =      96
2                       Confidence Level =   95%
3                       K =                  3
4                       σ-hatv =             0.552688
5                       δ-hat =              0.465726
6                       θ1-hat =             0.377001
7                       θ2-hat =             0.246239
8                       yT-1 =               -0.2
9                       yT =                 0.8
10
11   Computed Values    α =                  0.05
12                      df or m =            93
13                      tc =                 1.985802
14                                                                  Forecast Interval
15   Forecast                                            σ-hatj     Lower Limit       Upper Limit
16                      y-hatT+1 =           0.7181      0.5527     -0.3794           1.8156
17                      y-hatT+2 =           0.9334      0.5907     -0.2395           2.1064
18                      y-hatT+3 =           0.9945      0.6285     -0.2535           2.2424
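The template's calculations can be reproduced in a few lines of Python. This sketch (our own function) takes $t_c$ as an input, here the TINV value shown above, because the Python standard library has no t-distribution quantile function; the other inputs are the values displayed in the Forecast Interval worksheet.

```python
import math


def ar2_forecast_intervals(delta, theta1, theta2, g_prev, g_last, sigma_v, t_c):
    """Forecasts (9.17)-(9.19), standard errors (9.20)-(9.22) and limits (9.23)-(9.24)."""
    f1 = delta + theta1 * g_last + theta2 * g_prev  # (9.17)
    f2 = delta + theta1 * f1 + theta2 * g_last      # (9.18)
    f3 = delta + theta1 * f2 + theta2 * f1          # (9.19)
    s1 = sigma_v                                                             # (9.20)
    s2 = sigma_v * math.sqrt(1 + theta1 ** 2)                                # (9.21)
    s3 = sigma_v * math.sqrt((theta1 ** 2 + theta2) ** 2 + theta1 ** 2 + 1)  # (9.22)
    rows = []
    for f, s in ((f1, s1), (f2, s2), (f3, s3)):
        rows.append((f, s, f - t_c * s, f + t_c * s))  # (9.23) and (9.24)
    return rows


# Inputs as displayed in cells C4:C9 and C13 of the Forecast Interval worksheet.
rows = ar2_forecast_intervals(0.46572617, 0.37700148, 0.2462394, -0.2, 0.8,
                              0.55268751, 1.98580177)
for row in rows:
    print([round(v, 4) for v in row])
```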

9.5.2 Using an Exponential Smoothing Model

Go back to your okun data worksheet.

In cells Rl: US, enter the following labels and formulas.

     R          S                   T                                  U
1    T =        =COUNT(A2:A99)
2    T/2 =      =S1/2               ghat α=0.38                        ghat α=0.8
3    ghat1 =    =AVERAGE(A2:A50)    =S4*A2+(1-S4)*S3                   =S5*A2+(1-S5)*S3
4    α =        0.38                =$S$4*A3+(1-$S$4)*T3               =$S$5*A3+(1-$S$5)*U3
5    α =        0.8

Copy the content of cells T4:U4 to cells T5:U99. Here is how your table should look (only the
first five values are shown below):

     R          S             T              U
1    T =        98
2    T/2 =      49            ghat α=0.38    ghat α=0.8
3    ghat1 =    1.43877551    1.42404082     1.4077551
4    α =        0.38          1.64290531     1.88155102
5    α =        0.8           1.55060129     1.4963102
6                             1.5313728      1.49926204
7                             1.29145114     1.01985241
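The smoothing recursion in columns T and U can be sketched in Python as a cross-check. This is a minimal illustration under stated assumptions: the function names are hypothetical, the toy series is made up rather than taken from the okun data, and the starting value follows the worksheet's choice of averaging the first half of the sample (T/2 = 49 observations, cells A2:A50).

```python
def initial_forecast(series):
    """Average of the first half of the sample, like =AVERAGE(A2:A50) when T = 98."""
    half = len(series) // 2
    return sum(series[:half]) / half


def exponential_smoothing(series, alpha, ghat1):
    """One-step-ahead forecasts ghat_2, ..., ghat_{T+1}.

    Each step applies ghat_{t+1} = alpha * g_t + (1 - alpha) * ghat_t,
    the same recursion as the formulas copied down columns T and U.
    """
    forecasts = []
    ghat = ghat1
    for g in series:
        ghat = alpha * g + (1 - alpha) * ghat
        forecasts.append(ghat)
    return forecasts


# Hypothetical toy series, not the okun data.
toy = [1.0, 3.0, 2.0, 4.0]
start = initial_forecast(toy)  # mean of the first half, [1.0, 3.0]
print(exponential_smoothing(toy, 0.5, start))
```

A larger α puts more weight on the most recent observation, which is why the α = 0.8 column tracks the actual series more closely than the α = 0.38 column.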

Select cells T2:T99. Then select the Insert tab, located next to the Home tab, and in the Charts
group of commands select Line, and Line again.

After adding the US GDP series (actual, cells A3:A99) and editing, the result is (see also Figure
9.12(a) p. 377 in Principles of Econometrics, 4e):

[Figure: Exponential Smoothed Forecasts with α = 0.38. Series: ghat α=0.38 and g-actual, plotted over observations 1 to 97.]

To plot the forecasts with α = 0.8, select cells U2:U99 instead and again add the US GDP series
(actual, cells A3:A99). After editing, the result is (see also Figure 9.12(b) p. 377 in Principles of
Econometrics, 4e):

[Figure: Exponential Smoothed Forecasts with α = 0.8. Series: ghat α=0.8 and g-actual, plotted over observations 1 to 97.]

9.6 MULTIPLIER ANALYSIS

We consider the following ARDL(1,1) model to describe Okun's law:

$$DU_t = \delta + \theta_1 DU_{t-1} + \delta_0 G_t + \delta_1 G_{t-1} + v_t \qquad (9.25)$$

In cells W3:Y4 of your okun data worksheet enter the following labels and formulas.

     W          X       Y
3    du_t-1     g_t     g_t-1
4    =C3        =A4     =A3

Copy the content of cells W4:Y4 to cells W5:Y99. Here is how your table should look (only the
first five values are shown below):

     W          X       Y
3    du_t-1     g_t     g_t-1
4    -0.1       1.4     2
5    -0.2       1.5     1.4
6    0          0.9     1.5
7    0.2        1.5     0.9
8    -0.2       1.2     1.5

In the Regression dialog box, the Input Y Range should be C3:C99, and the Input X Range
should be W3:Y99. Check the box next to Labels. Select New Worksheet Ply and name it
ARDL(1,1) Okun's Law Model. Finally select OK.


The result is (see also p. 381 in Principles of Econometrics, 4e):

     A                    B
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R           0.833127
5    R Square             0.694101
6    Adjusted R Square    0.684126
7    Standard Error       0.162277
8    Observations         96

10   ANOVA
11                 df     SS           MS           F            Significance F
12   Regression    3      5.497276     1.832425     69.584126    1.40186E-23
13   Residual      92     2.422724     0.026334
14   Total         95     7.92

16                 Coefficients   Standard Error   t Stat       P-value       Lower 95%    Upper 95%
17   Intercept     0.378010       0.057840         6.535473     3.47005E-09   0.263136     0.492885
18   du_t-1        0.350116       0.084573         4.139807     7.59455E-05   0.182147     0.518085
19   g_t           -0.184084      0.030698         -5.996538    3.9124E-08    -0.245054    -0.123115
20   g_t-1         -0.099155      0.036824         -2.692648    0.008423      -0.172292    -0.026019

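Estimates from (9.25) can be used to compute estimates of the impact multiplier and the delay
multipliers for the first 7 quarters (see pp. 66-72 in Principles of Econometrics, 4e for more
details):

$$\hat{\beta}_0 = \hat{\delta}_0 \qquad (9.26)$$

$$\hat{\beta}_1 = \hat{\delta}_1 + \hat{\beta}_0 \hat{\theta}_1 \qquad (9.27)$$

$$\hat{\beta}_j = \hat{\beta}_{j-1} \hat{\theta}_1, \quad j \geq 2 \qquad (9.28)$$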

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Multipliers.

Create the following table to compute the lag weights. In the last column you will find the
numbers of the equations used in the table.

     A                  B           C
1    Data Input         δ0-hat =    ='ARDL(1,1) Okun''s Law Model'!B19
2                       δ1-hat =    ='ARDL(1,1) Okun''s Law Model'!B20
3                       θ1-hat =    ='ARDL(1,1) Okun''s Law Model'!B18
4
5    Computed Values    j           βj-hat
6                       0           =C1                    (9.26)
7                       1           =C2+C6*C3              (9.27)
8                       2           =C7*$C$3               (9.28)

Copy the content of cell C8 to cells C9:C13.

The result is (see also p. 381 in Principles of Econometrics, 4e):

     A                  B           C
1    Data Input         δ0-hat =    -0.184084286
2                       δ1-hat =    -0.099155204
3                       θ1-hat =    0.35011576
4
5    Computed Values    j           βj-hat
6                       0           -0.184084286
7                       1           -0.163606014
8                       2           -0.057281044
9                       3           -0.020054996
10                      4           -0.00702157
11                      5           -0.002458362
12                      6           -0.000860711
13                      7           -0.000301349
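The lag weights in column C follow a simple recursion, sketched below in Python as a cross-check. The function name `delay_multipliers` is hypothetical; the inputs are the three estimates in cells C1:C3, and `max_lag=7` matches rows 6 to 13 of the table.

```python
def delay_multipliers(delta0, delta1, theta1, max_lag=7):
    """Lag weights beta_0, ..., beta_max_lag implied by the ARDL(1,1) model (9.25).

    Hypothetical helper mirroring the worksheet formulas (9.26)-(9.28).
    """
    betas = [delta0]                           # (9.26): impact multiplier
    betas.append(delta1 + betas[0] * theta1)   # (9.27)
    for _ in range(2, max_lag + 1):
        betas.append(betas[-1] * theta1)       # (9.28): beta_j = theta1 * beta_{j-1}
    return betas


weights = delay_multipliers(-0.184084286, -0.099155204, 0.35011576)
for j, b in enumerate(weights):
    print(j, round(b, 9))
```

Because |θ1-hat| < 1, each weight after the first lag is a fixed fraction of the previous one, so the multipliers decay geometrically toward zero.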

Select cells B5:C13. Go to the Insert tab, select the Scatter button in the Charts group of
commands, and select Scatter with Straight Lines on the menu of Scatter chart types.

After editing, the result is (see also Figure 9.13 p. 381 in Principles of Econometrics, 4e):

[Figure: Delay Multipliers Okun's Law ARDL(1,1) Model, lag weights plotted for lags 0 to 7.]
CHAPTER 10

Random Regressors and Moment-Based Estimation

CHAPTER OUTLINE
10.1 OLS Estimation of a Wage Equation
10.2 Instrumental Variables Estimation of the Wage Equation
  10.2.1 With a Single Instrument
    10.2.1a First Stage Equation for EDUC
    10.2.1b Stage 2 Least Squares Estimates
  10.2.2 With a Surplus Instrument
    10.2.2a First Stage Equation for EDUC
    10.2.2b Stage 2 Least Squares Estimates
10.3 Specification Tests for the Wage Equation
  10.3.1 The Hausman Test
  10.3.2 Testing Surplus Moment Conditions

10.1 OLS ESTIMATION OF A WAGE EQUATION

Consider the following wage equation:

$$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + e \qquad (10.1)$$

where WAGE is hourly wage, EDUC is years of education and EXPER is years of work
experience.

Open the Excel file mroz. Save your file as POE Chapter 10 Excel file. Rename sheet 1 mroz
data.

In your mroz data worksheet enter the following labels and formulas.

     AA          AB      AC      AD
1    ln(wage)    educ    exper   exper2
2    =LN(M2)     =L2     =Y2     =AC2^2

The data set includes information on working and non-working women. lfp is a dummy variable
which identifies labor force participation: it is set to 1 if a woman is in the labor force and 0
otherwise (you can find this variable in column G). We will use data on working women only,
which occupy rows 2 to 429.

Copy the content of cells AA2:AD2 to cells AA3:AD429. Here is how your table should look
(only the first five values are shown below):

     AA          AB      AC      AD
1    ln(wage)    educ    exper   exper2
2    1.210154    12      14      196
3    0.328512    12      5       25
4    1.514138    12      15      225
5    0.092123    12      6       36
6    1.524272    14      7       49

In the Regression dialog box, the Input Y Range should be AA1:AA429, and the Input X
Range should be AB1:AD429. Check the box next to Labels. Select New Worksheet Ply and
name it OLS Wage Equation. Finally select OK.


The regression analysis results are (see also p. 407 in Principles of Econometrics, 4e):

     A                    B
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R           0.396006
5    R Square             0.156820
6    Adjusted R Square    0.150854
7    Standard Error       0.666420
8    Observations         428

10   ANOVA
11                 df     SS            MS           F            Significance F
12   Regression    3      35.022296     11.674099    26.286153    1.30177E-15
13   Residual      424    188.305144    0.444116
14   Total         427    223.32744

16                 Coefficients   Standard Error   t Stat       P-value       Lower 95%    Upper 95%
17   Intercept     -0.522041      0.198632         -2.628179    0.008896      -0.912467    -0.131614
18   educ          0.107490       0.014146         7.598332     1.93993E-13   0.079683     0.135296
19   exper         0.041567       0.013175         3.154906     0.001720      0.015670     0.067463
20   exper2        -0.000811      0.000393         -2.062834    0.039737      -0.001584    -0.000038

10.2 INSTRUMENTAL VARIABLES ESTIMATION BASED ON THE 2SLS ESTIMATION PROCEDURE: ONE INSTRUMENT

The instrumental variables estimators, derived using the method of moments, are also called two-
stage least squares (2SLS) estimators, because they can be obtained using two least squares
regressions.
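The equivalence between the method-of-moments IV estimator and the two least squares regressions can be checked in the simplest case: one endogenous regressor, one instrument and an intercept. The Python sketch below is a simplified illustration, not the wage model with its extra exogenous regressors; the function names and the toy data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Sample covariance; the 1/(n-1) factor cancels in the ratios below."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_fit(x, y):
    """Intercept and slope of a simple least squares regression of y on x."""
    slope = covariance(x, y) / covariance(x, x)
    return mean(y) - slope * mean(x), slope


def iv_slope(z, x, y):
    """Method-of-moments IV estimator with instrument z: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


def two_stage_slope(z, x, y):
    """2SLS: regress x on z, then regress y on the fitted values x_hat."""
    a, b = ols_fit(z, x)              # first stage
    x_hat = [a + b * zi for zi in z]  # predicted values of the endogenous regressor
    return ols_fit(x_hat, y)[1]       # second stage slope


# Hypothetical toy data: z is the instrument, x the endogenous regressor.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.0, 3.0, 5.0, 4.0, 6.0]
y = [1.0, 3.0, 4.0, 4.0, 7.0]
print(iv_slope(z, x, y), two_stage_slope(z, x, y))
```

Both functions return the same slope, which is why the Excel procedure below produces valid IV point estimates; the second-stage standard errors are a separate issue and need the correction discussed later in this section.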

10.2.1 With a Single Instrument

10.2.1a First Stage Equation for EDUC

In the case of a multiple linear regression model with one instrument, the first stage equation has
the random regressor or endogenous variable as the dependent variable, and the instrumental
variable plus all the exogenous variables as the explanatory variables.

Let MOTHEREDUC be our instrumental variable; the first stage equation for EDUC is:

$$EDUC = \beta_1 + \beta_2 EXPER + \beta_3 EXPER^2 + \beta_4 MOTHEREDUC + v \qquad (10.2)$$

Go back to your mroz data worksheet and enter the following label and formula.

AE
1 mothereduc
2 =U2

Copy the content of cell AE2 to cells AE3:AE429. Here is how your table should look (only the
first five values are shown below):

     AE
1    mothereduc
2    12
3    7
4    12
5    7
6    12

In the Regression dialog box, the Input Y Range should be AB1:AB429, and the Input X
Range should be AC1:AE429. Check the boxes next to Labels and Residuals. Select New
Worksheet Ply and name it 1st Stage Eq. for EDUC 1 IV. Finally select OK.


The result is (see also p. 415 in Principles of Econometrics, 4e):

     A                    B
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R           0.390761
5    R Square             0.152694
6    Adjusted R Square    0.146699
7    Standard Error       2.111100
8    Observations         428

10   ANOVA
11                 df     SS             MS            F            Significance F
12   Regression    3      340.537834     113.512611    25.469866    3.61726E-15
13   Residual      424    1889.658428    4.456742
14   Total         427    2230.196262

16                 Coefficients   Standard Error   t Stat       P-value       Lower 95%    Upper 95%
17   Intercept     9.775103       0.423889         23.060545    7.5742E-77    8.941918     10.608287
18   exper         0.048862       0.041669         1.172603     0.241613      -0.033043    0.130766
19   exper2        -0.001281      0.001245         -1.029046    0.304045      -0.003728    0.001166
20   mothereduc    0.267691       0.031130         8.599183     1.56823E-16   0.206503     0.328879

24   RESIDUAL OUTPUT

26   Observation   Predicted educ   Residuals
27   1             13.420365        -1.420365
28   2             11.861219        0.138781
29   3             13.432075        -1.432075

10.2.1b Stage 2 Least Squares Estimates

In stage 2, we obtain the predicted values of our endogenous variable, in this case EDUC, from
the estimated first stage equation (10.2) and insert them in the original linear regression model
(10.1) to replace the EDUC values. Then, we estimate the resulting equation (10.3) by least
squares:

$$\ln(WAGE) = \beta_1 + \beta_2 \widehat{EDUC} + \beta_3 EXPER + \beta_4 EXPER^2 + e^{*} \qquad (10.3)$$

where EDUC-hat are the predicted years of education from the estimated first stage equation (10.2).

Go back to your mroz data worksheet and enter the following labels and formulas.

     AG                                     AH      AI
1    educ-hat                               exper   exper2
2    ='1st Stage Eq. for EDUC 1 IV'!B27     =AC2    =AH2^2

Copy the content of cells AG2:AI2 to cells AG3:AI429. Here is how your table should look
(only the first five values are shown below):

     AG          AH       AI
1    educ-hat    exper    exper2
2    13.42036    14       196
3    11.86122    5        25
4    13.43208    15       225
5    11.89599    6        36
6    13.26665    7        49

In the Regression dialog box, the Input Y Range should be AA1:AA429, and the Input X
Range should be AG1:AI429. Check the box next to Labels. Uncheck the box next to Residuals.
Select New Worksheet Ply and name it Stage 2 LS Estimates 1 IV. Finally select OK.


The result is (see also p. 415 in Principles of Econometrics, 4e):

     A                    B
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R           0.213515
5    R Square             0.045589
6    Adjusted R Square    0.038836
7    Standard Error       0.709016
8    Observations         428

10   ANOVA
11                 df     SS            MS           F            Significance F
12   Regression    3      10.181204     3.393735     6.750959     0.000186
13   Residual      424    213.146236    0.502703
14   Total         427    223.32744

16                 Coefficients   Standard Error   t Stat       P-value       Lower 95%    Upper 95%
17   Intercept     0.198186       0.493343         0.401721     0.6881        -0.771516    1.167888
18   educ-hat      0.049263       0.039056         1.261335     0.2079        -0.027505    0.126031
19   exper         0.044856       0.014164         3.166802     0.001653      0.017015     0.072697
20   exper2        -0.000922      0.000424         -2.174868    0.030192      -0.001755    -0.000089

Note that while using this two-stage least squares approach yields proper instrumental variables
estimates, the accompanying standard errors are not correct (see also p. 412 in Principles of
Econometrics, 4e).

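The correct standard error of the instrumental variable estimator of βk is estimated using equation
(10.4) below:

$$se(\hat{\beta}_k)_{IV} = \frac{\hat{\sigma}_{IV}}{\hat{\sigma}_{stage\ 2\ LS}} \, se(\hat{\beta}_k)_{stage\ 2\ LS} \qquad (10.4)$$

where

$$\hat{\sigma}_{IV} = \sqrt{\frac{\sum_{i=1}^{N} \hat{e}_{IV,i}^{2}}{N-K}} \qquad (10.5)$$

and

$$\hat{e}_{IV,i} = \ln(WAGE_i) - \hat{\beta}_1 - \hat{\beta}_2 EDUC_i - \hat{\beta}_3 EXPER_i - \hat{\beta}_4 EXPER_i^2 \qquad (10.6)$$

In equation (10.6), β1-hat, β2-hat, β3-hat and β4-hat are the least squares estimates from equation (10.3).
Note that the residuals in (10.6) are computed with the actual EDUC values, not the predicted ones.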

Go back to your mroz data worksheet and enter the following labels and formulas. In the last
column, you will find the numbers of the equations used, if any.

     AK                                          AL
1    wage equation IV estimates using 2SLS, 1 instrument
2    β1-hat_IV = β1-hat_stage 2 LS =             ='Stage 2 LS Estimates 1 IV'!B17
3    β2-hat_IV = β2-hat_stage 2 LS =             ='Stage 2 LS Estimates 1 IV'!B18
4    β3-hat_IV = β3-hat_stage 2 LS =             ='Stage 2 LS Estimates 1 IV'!B19
5    β4-hat_IV = β4-hat_stage 2 LS =             ='Stage 2 LS Estimates 1 IV'!B20

     AN
1    e-hat_IV^2
2    =(AA2-$AL$2-$AL$3*AB2-$AL$4*AC2-$AL$5*AD2)^2          (10.6)

Copy the content of cell AN2 to cells AN3:AN429.

     AK                            AL
7    N =                           ='Stage 2 LS Estimates 1 IV'!B8
8    K =                           ='Stage 2 LS Estimates 1 IV'!B12+1
9    σ-hat_stage 2 LS =            ='Stage 2 LS Estimates 1 IV'!B7
10   se(β1-hat)_stage 2 LS =       ='Stage 2 LS Estimates 1 IV'!C17
11   se(β2-hat)_stage 2 LS =       ='Stage 2 LS Estimates 1 IV'!C18
12   se(β3-hat)_stage 2 LS =       ='Stage 2 LS Estimates 1 IV'!C19
13   se(β4-hat)_stage 2 LS =       ='Stage 2 LS Estimates 1 IV'!C20
14   σ-hat_IV =                    =SQRT(SUM(AN2:AN429)/(AL7-AL8))       (10.5)
15   se(β1-hat)_IV =               =(AL14/AL9)*AL10                      (10.4)
16   se(β2-hat)_IV =               =(AL14/AL9)*AL11                      (10.4)
17   se(β3-hat)_IV =               =(AL14/AL9)*AL12                      (10.4)
18   se(β4-hat)_IV =               =(AL14/AL9)*AL13                      (10.4)

The result is (see also the standard error estimates on p. 415 in Principles of Econometrics, 4e;
only the first five values of column AN are shown):

     AK                                          AL            AM    AN
1    wage equation IV estimates using 2SLS, 1 instrument                 e-hat_IV^2
2    β1-hat_IV = β1-hat_stage 2 LS =             0.19818608          0.000699
3    β2-hat_IV = β2-hat_stage 2 LS =             0.04926295          0.438319
4    β3-hat_IV = β3-hat_stage 2 LS =             0.04485585          0.067302
5    β4-hat_IV = β4-hat_stage 2 LS =             -0.00092208         0.870785
6                                                                    0.135126
7    N =                                         428
8    K =                                         4
9    σ-hat_stage 2 LS =                          0.70901579
10   se(β1-hat)_stage 2 LS =                     0.49334266
11   se(β2-hat)_stage 2 LS =                     0.0390562
12   se(β3-hat)_stage 2 LS =                     0.0141644
13   se(β4-hat)_stage 2 LS =                     0.00042397
14   σ-hat_IV =                                  0.67960355
15   se(β1-hat)_IV =                             0.47287723
16   se(β2-hat)_IV =                             0.03743603
17   se(β3-hat)_IV =                             0.01357682
18   se(β4-hat)_IV =                             0.00040638
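The correction in (10.4)-(10.6) can be sketched in Python as a cross-check of the worksheet. This is a minimal illustration: the function names are hypothetical, and the final call simply reuses the values shown in cells AL9:AL14 rather than recomputing them from the 428 observations.

```python
import math


def iv_residuals(ln_wage, educ, exper, betas):
    """(10.6): residuals using the ACTUAL educ values, not the first-stage predictions."""
    b1, b2, b3, b4 = betas
    return [w - b1 - b2 * e - b3 * x - b4 * x ** 2 for w, e, x in zip(ln_wage, educ, exper)]


def sigma_iv(residuals, k):
    """(10.5): square root of the sum of squared IV residuals divided by N - K."""
    return math.sqrt(sum(r * r for r in residuals) / (len(residuals) - k))


def corrected_se(sigma_iv_hat, sigma_stage2, se_stage2):
    """(10.4): rescale each stage 2 standard error by sigma_IV / sigma_stage2."""
    return [sigma_iv_hat / sigma_stage2 * s for s in se_stage2]


# Values read from cells AL9:AL14 of the worksheet above.
se_iv = corrected_se(0.67960355, 0.70901579, [0.49334266, 0.0390562, 0.0141644, 0.00042397])
print([round(s, 8) for s in se_iv])
```

Only the error-variance estimate changes between the stage 2 output and the correct IV standard errors, which is why a single ratio rescales every coefficient's standard error.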

10.2.2 With a Surplus Instrument

10.2.2a First Stage Equation for EDUC

In the case of the multiple linear regression model with two instruments, the reduced form
equation has the random regressor or endogenous variable as the dependent variable, and the two
instrumental variables plus all the exogenous variables as explanatory variables. In Section 10.2.1
we used "mother's education" as an instrument; let us add "father's education" as an additional
instrumental variable. The first stage equation for EDUC is:

$$EDUC = \beta_1 + \beta_2 EXPER + \beta_3 EXPER^2 + \beta_4 MOTHEREDUC + \beta_5 FATHEREDUC + v \qquad (10.7)$$

Go back to your mroz data worksheet and enter the following label and formula.

AF
1 fathereduc
2 =V2

Copy the content of cell AF2 to cells AF3:AF429. Here is how your table should look (only the
first five values are shown below):
     AF
1    fathereduc
2    7
3    7
4    7
5    7
6    14

In the Regression dialog box, the Input Y Range should be AB1:AB429, and the Input X
Range should be AC1:AF429. Check the boxes next to Labels and Residuals. Select New
Worksheet Ply and name it 1st Stage Eq. for EDUC 2 IV. Finally select OK.


The result is (see also Table 10.1 p. 416 in Principles of Econometrics, 4e):

     A                    B
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R           0.459859
5    R Square             0.211471
6    Adjusted R Square    0.204014
7    Standard Error       2.038967
8    Observations         428

10   ANOVA
11                 df     SS             MS            F            Significance F
12   Regression    4      471.620998     117.905250    28.360413    5.87297E-21
13   Residual      423    1758.575264    4.157388
14   Total         427    2230.196262

16                 Coefficients   Standard Error   t Stat       P-value       Lower 95%    Upper 95%
17   Intercept     9.102640       0.426561         21.339579    4.09847E-69   8.264195     9.941084
18   exper         0.045225       0.040251         1.123593     0.2618        -0.033891    0.124342
19   exper2        -0.001009      0.001203         -0.838572    0.402183      -0.003374    0.001356
20   mothereduc    0.157597       0.035894         4.390609     1.429E-05     0.087044     0.228150
21   fathereduc    0.189548       0.033756         5.615173     3.56151E-08   0.123197     0.255900

25   RESIDUAL OUTPUT

27   Observation   Predicted educ   Residuals
28   1             12.756017        -0.756017
29   2             11.733558        0.266442
30   3             12.771979        -0.771979

10.2.2b Stage 2 Least Squares Estimates

We obtain the predicted values EDUC-hat from the estimated first stage equation (10.7) and insert
them in the original multiple linear regression model (10.1) to replace the EDUC values. Then,
we estimate the resulting equation (10.8) by least squares.

$$\ln(WAGE) = \beta_1 + \beta_2 \widehat{EDUC} + \beta_3 EXPER + \beta_4 EXPER^2 + e^{*} \qquad (10.8)$$

Go back to your mroz data worksheet and enter the following labels and formulas.

     AP                                     AQ      AR
1    educ-hat                               exper   exper2
2    ='1st Stage Eq. for EDUC 2 IV'!B28     =AH2    =AQ2^2

Copy the content of cells AP2:AR2 to cells AP3:AR429. Here is how your table should look
(only the first five values are shown below):

     AP          AQ       AR
1    educ-hat    exper    exper2
2    12.75602    14       196
3    11.73356    5        25
4    12.77198    15       225
5    11.76768    6        36
6    13.91461    7        49

In the Regression dialog box, the Input Y Range should be AA1:AA429, and the Input X
Range should be AP1:AR429. Check the box next to Labels. Uncheck the box next to
Residuals. Select New Worksheet Ply and name it Stage 2 LS Estimates 2 IV. Finally select
OK.


The result is (see also p. 416 in Principles of Econometrics, 4e):

     A                    B
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R           0.223120
5    R Square             0.049783
6    Adjusted R Square    0.043059
7    Standard Error       0.707456
8    Observations         428

10   ANOVA
11                 df     SS            MS           F            Significance F
12   Regression    3      11.117828     3.705943     7.404564     7.7E-05
13   Residual      424    212.209612    0.500494
14   Total         427    223.32744

16                 Coefficients   Standard Error   t Stat       P-value       Lower 95%    Upper 95%
17   Intercept     0.048100       0.419756         0.114591     0.908824      -0.776962    0.873163
18   educ-hat      0.061397       0.032962         1.862629     0.063206      -0.003393    0.126187
19   exper         0.044170       0.014084         3.136129     0.00183       0.016487     0.071854
20   exper2        -0.000899      0.000421         -2.134408    0.033382      -0.001727    -0.000071

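While using the two-stage least squares approach yields proper instrumental variables estimates,
the accompanying standard errors are not correct. The correct standard error of the instrumental
variable estimator of βk is estimated using equations (10.4)-(10.6), restated next:

$$se(\hat{\beta}_k)_{IV} = \frac{\hat{\sigma}_{IV}}{\hat{\sigma}_{stage\ 2\ LS}} \, se(\hat{\beta}_k)_{stage\ 2\ LS} \qquad (10.4)$$

where

$$\hat{\sigma}_{IV} = \sqrt{\frac{\sum_{i=1}^{N} \hat{e}_{IV,i}^{2}}{N-K}} \qquad (10.5)$$

$$\hat{e}_{IV,i} = \ln(WAGE_i) - \hat{\beta}_1 - \hat{\beta}_2 EDUC_i - \hat{\beta}_3 EXPER_i - \hat{\beta}_4 EXPER_i^2 \qquad (10.6)$$

In equation (10.6), β1-hat, β2-hat, β3-hat and β4-hat are the least squares estimates from equation (10.8).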

Go back to your mroz data worksheet and enter the following labels and formulas. In the last
column, you will find the numbers of the equations used, if any. This is identical to what you
have done in Section 10.2.1b, except that you now retrieve the information needed from your
Stage 2 LS Estimates 2 IV worksheet instead of your Stage 2 LS Estimates 1 IV worksheet.

     AT                                          AU
1    wage equation IV estimates using 2SLS, 2 instruments
2    β1-hat_IV = β1-hat_stage 2 LS =             ='Stage 2 LS Estimates 2 IV'!B17
3    β2-hat_IV = β2-hat_stage 2 LS =             ='Stage 2 LS Estimates 2 IV'!B18
4    β3-hat_IV = β3-hat_stage 2 LS =             ='Stage 2 LS Estimates 2 IV'!B19
5    β4-hat_IV = β4-hat_stage 2 LS =             ='Stage 2 LS Estimates 2 IV'!B20

     AW
1    e-hat_IV^2
2    =(AA2-$AU$2-$AU$3*AB2-$AU$4*AC2-$AU$5*AD2)^2          (10.6)

Copy the content of cell AW2 to cells AW3:AW429.

     AT                            AU
7    N =                           ='Stage 2 LS Estimates 2 IV'!B8
8    K =                           ='Stage 2 LS Estimates 2 IV'!B12+1
9    σ-hat_stage 2 LS =            ='Stage 2 LS Estimates 2 IV'!B7
10   se(β1-hat)_stage 2 LS =       ='Stage 2 LS Estimates 2 IV'!C17
11   se(β2-hat)_stage 2 LS =       ='Stage 2 LS Estimates 2 IV'!C18
12   se(β3-hat)_stage 2 LS =       ='Stage 2 LS Estimates 2 IV'!C19
13   se(β4-hat)_stage 2 LS =       ='Stage 2 LS Estimates 2 IV'!C20
14   σ-hat_IV =                    =SQRT(SUM(AW2:AW429)/(AU7-AU8))       (10.5)
15   se(β1-hat)_IV =               =(AU14/AU9)*AU10                      (10.4)
16   se(β2-hat)_IV =               =(AU14/AU9)*AU11                      (10.4)
17   se(β3-hat)_IV =               =(AU14/AU9)*AU12                      (10.4)
18   se(β4-hat)_IV =               =(AU14/AU9)*AU13                      (10.4)

The result is (see also p. 416 in Principles of Econometrics, 4e; only the first five values of
column AW are shown):

     AT                                          AU            AV    AW
1    wage equation IV estimates using 2SLS, 2 instruments                e-hat_IV^2
2    β1-hat_IV = β1-hat_stage 2 LS =             0.0481003           0.000285
3    β2-hat_IV = β2-hat_stage 2 LS =             0.06139663          0.428665
4    β3-hat_IV = β3-hat_stage 2 LS =             0.04417039          0.072356
5    β4-hat_IV = β4-hat_stage 2 LS =             -0.00089897         0.856358
6                                                                    0.123535
7    N =                                         428
8    K =                                         4
9    σ-hat_stage 2 LS =                          0.70745627
10   se(β1-hat)_stage 2 LS =                     0.41975647
11   se(β2-hat)_stage 2 LS =                     0.03296236
12   se(β3-hat)_stage 2 LS =                     0.01408437
13   se(β4-hat)_stage 2 LS =                     0.00042118
14   σ-hat_IV =                                  0.6747117
15   se(β1-hat)_IV =                             0.40032808
16   se(β2-hat)_IV =                             0.0314367
17   se(β3-hat)_IV =                             0.01343248
18   se(β4-hat)_IV =                             0.00040169

10.3 SPECIFICATION TESTS FOR THE WAGE EQUATION

10.3.1 The Hausman Test

Let us revisit the first stage equation (10.7) from Section 10.2.2a:

$$EDUC = \beta_1 + \beta_2 EXPER + \beta_3 EXPER^2 + \beta_4 MOTHEREDUC + \beta_5 FATHEREDUC + v \qquad (10.7)$$

We obtain the residuals v-hat from the estimated reduced form equation (10.7) and insert v-hat in the
original wage equation (10.1) as an additional explanatory variable. We estimate the resulting
equation (10.9) by least squares:

$$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + \delta \hat{v} + e \qquad (10.9)$$

Go back to your mroz data worksheet and enter the following labels and formulas.

     AY      AZ       BA        BB
1    educ    exper    exper2    v-hat
2    =AB2    =AC2     =AZ2^2    ='1st Stage Eq. for EDUC 2 IV'!C28

Copy the content of cells AY2:BB2 to cells AY3:BB429. Here is how your table should look
(only the first five values are shown below):

     AY      AZ       BA        BB
1    educ    exper    exper2    v-hat
2    12      14       196       -0.756017
3    12      5        25        0.266442
4    12      15       225       -0.771979
5    12      6        36        0.232317
6    14      7        49        0.085385

In the Regression dialog box, the Input Y Range should be AA1:AA429, and the Input X
Range should be AY1:BB429. Check the box next to Labels. Select New Worksheet Ply and
name it Hausman Test for Wage Equation. Finally select OK.


The result is (see also Table 10.2 on p. 422 of Principles of Econometrics, 4e):

     A             B              C                D            E             F            G
16                 Coefficients   Standard Error   t Stat       P-value       Lower 95%    Upper 95%
17   Intercept     0.048100       0.394575         0.121904     0.903033      -0.727472    0.823673
18   educ          0.061397       0.030985         1.981499     0.048182      0.000493     0.122300
19   exper         0.044170       0.013239         3.336272     0.000924      0.018147     0.070194
20   exper2        -0.000899      0.000396         -2.27062     0.023672      -0.001677    -0.000121
21   v-hat         0.058167       0.034807         1.671105     0.095441      -0.010250    0.126583

The p-value of interest is the one for the t-test on the coefficient of v-hat (row 21 of the output
above), 0.0954. The coefficient of the reduced form residuals is therefore significant at the 10%
level of significance using a two-tail test. While this is not strong evidence of the endogeneity of
education, it is sufficient cause for concern to consider using instrumental variables estimation.

10.3.2 Testing Surplus Moment Conditions

For the wage equation (10.1), restated below, if we use MOTHEREDUC and FATHEREDUC as
instruments there is one surplus moment condition.

$$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 EXPER^2 + e \qquad (10.1)$$

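We obtain the residuals e-hat_IV from the IV estimates for equation (10.1), as we did in Section 10.2.2:

$$\hat{e}_{IV} = \ln(WAGE) - \hat{\beta}_1 - \hat{\beta}_2 EDUC - \hat{\beta}_3 EXPER - \hat{\beta}_4 EXPER^2 \qquad (10.6)$$

We then regress the residuals e-hat_IV on all available exogenous and instrumental variables. In other
words, we estimate the following equation:

$$\hat{e}_{IV} = \gamma_1 + \gamma_2 EXPER + \gamma_3 EXPER^2 + \gamma_4 MOTHEREDUC + \gamma_5 FATHEREDUC + error \qquad (10.10)$$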

Finally, we use the R² from the estimated equation (10.10) to compute a Lagrange multiplier test statistic.

Go back to your mroz data worksheet and enter the following label and formula.

     BD
1    e-hat_IV
2    =AA2-$AU$2-$AU$3*AY2-$AU$4*AZ2-$AU$5*BA2

Copy the content of cell BD2 to cells BD3:BD429. Here is how your table should look (only the
first five values are shown below):
     BD
1    e-hat_IV
2    -0.016894
3    -0.654726
4    0.268991
5    -0.925396
6    0.351476

In the Regression dialog box, the Input Y Range should be BD1:BD429, and the Input X
Range should be AC1:AF429. Check the box next to Labels. Select New Worksheet ply and
name it IV Residuals Regression. Finally select OK.


The result is:

     A                    B
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R           0.029721
5    R Square             0.000883
6    Adjusted R Square    -0.008565
7    Standard Error       0.675210
8    Observations         428

10   ANOVA
11                 df     SS            MS           F            Significance F
12   Regression    4      0.170503      0.042626     0.093496     0.984495
13   Residual      423    192.849512    0.455909
14   Total         427    193.020015

16                 Coefficients   Standard Error   t Stat       P-value       Lower 95%    Upper 95%
17   Intercept     0.010964       0.141257         0.077618     0.938169      -0.266689    0.288617
18   exper         -1.83148E-05   0.013329         -0.001374    0.998903      -0.026218    0.026181
19   exper2        7.34E-07       0.000398         0.001842     0.998531      -0.000783    0.000784
20   mothereduc    -0.006607      0.011886         -0.555804    0.578639      -0.029970    0.016757
21   fathereduc    0.005782       0.011179         0.517263     0.605243      -0.016190    0.027755

The results used for the Lagrange multiplier test are the number of observations (cell B8) and
R Square (cell B5) in the above summary output.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Lagrange Multiplier Test.


Copy into it the Lagrange multiplier test template you created in Chapter 9.

Replace every reference to [POE Chapter 9.xlsx]Auxiliary Regression with IV Residuals Regression.

     A                          B                      C
1    Data Input                 N =                    ='IV Residuals Regression'!B8
2                               R² =                   ='IV Residuals Regression'!B5
3                               α =
4                               m =
5
6    Computed Values            χ²-critical value =    =CHIINV(C3,C4)
7
8    Lagrange Multiplier Test   χ² =                   =C1*C2
9                               Conclusion =           =IF(C8>=C6,"Reject Ho","Do Not Reject Ho")
10                              p-value =              =CHIDIST(C8,C4)
11                              Conclusion =           =IF(C10<=C3,"Reject Ho","Do Not Reject Ho")

At α = 0.05 and with m = 1, the result of the test, found in the Lagrange Multiplier Test
worksheet, is (see also pp. 422-423 in Principles of Econometrics, 4e):

     A                          B                      C
1    Data Input                 N =                    428
2                               R² =                   0.000883
3                               α =                    0.05
4                               m =                    1
5
6    Computed Values            χ²-critical value =    3.841459
7
8    Lagrange Multiplier Test   χ² =                   0.378071
9                               Conclusion =           Do Not Reject Ho
10                              p-value =              0.538637
11                              Conclusion =           Do Not Reject Ho

Note that the difference between the χ²-statistic value reported above and the one reported on p.
422 of Principles of Econometrics, 4e is due to rounding differences.
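The decision rules in the template can be reproduced in Python as a cross-check. This is a minimal sketch: the function name is hypothetical, the critical value 3.841459 is the CHIINV(0.05, 1) result shown above, and R² is back-computed as 0.378071 / 428 from the reported statistic. For a single surplus moment condition (m = 1) the chi-square p-value has a closed form using `math.erfc`, so no statistics library is needed.

```python
import math


def lm_surplus_test(n, r_squared, alpha=0.05, chi2_crit=3.841459):
    """LM test of one surplus moment condition: LM = N * R^2 compared with chi-square(1).

    Hypothetical helper. chi2_crit defaults to CHIINV(0.05, 1) as computed in the
    worksheet; with m = 1, P(chi-square(1) > LM) = erfc(sqrt(LM / 2)).
    """
    lm = n * r_squared
    p_value = math.erfc(math.sqrt(lm / 2))
    return {
        "LM": lm,
        "reject_by_critical_value": lm >= chi2_crit,
        "p_value": p_value,
        "reject_by_p_value": p_value <= alpha,
    }


result = lm_surplus_test(428, 0.378071 / 428)
print(result)
```

Both rules lead to the same conclusion, as in cells C9 and C11: the surplus instrument is not rejected. The closed form only holds for m = 1; with more surplus instruments a general chi-square routine is needed.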
CHAPTER 11

Simultaneous Equations Models

CHAPTER OUTLINE
11.1 Supply and Demand Model for Truffles
  11.1.1 The Reduced Form Equations
    11.1.1a Reduced Form Equation for Q
    11.1.1b Reduced Form Equation for P
  11.1.2 The Structural Equations or Stage 2 Least Squares Estimates
    11.1.2a 2SLS Estimates for Truffle Demand
    11.1.2b 2SLS Estimates for Truffle Supply
11.2 Supply and Demand Model for the Fulton Fish Market
  11.2.1 The Reduced Form Equations
    11.2.1a Reduced Form Equation for lnQ
    11.2.1b Reduced Form Equation for lnP
  11.2.2 The Structural Equations or Stage 2 Least Squares Estimates
    11.2.2a 2SLS Estimates for Fulton Fish Demand

In this chapter, we estimate simultaneous equations models, in which two or more dependent
variables are determined jointly. Ordinary least squares is not an appropriate estimation method
for these equations: because the jointly determined variables are correlated with the equation
errors, the least squares estimators are biased and inconsistent. For example, to explain both the
price and the quantity of a good, we need both supply and demand equations, which work together
to determine price and quantity jointly.

11.1 SUPPLY AND DEMAND MODEL FOR TRUFFLES

Consider the following supply and demand model for truffles:

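Demand: $$Q_i = \alpha_1 + \alpha_2 P_i + \alpha_3 PS_i + \alpha_4 DI_i + e_i^d \tag{11.1}$$

Supply: $$Q_i = \beta_1 + \beta_2 P_i + \beta_3 PF_i + e_i^s \tag{11.2}$$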

where Q is the quantity of truffles traded in a particular French marketplace, indexed by i and measured in ounces. P is the market price of truffles and PS is the market price of a substitute for real truffles, both measured in $ per ounce. DI is per capita monthly disposable income of local residents, measured in $1,000, and PF is the hourly rental rate ($) for a truffle-finding pig.


11.1.1 The Reduced Form Equations

Consider the following reduced form equations for the supply and demand model for truffles:

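$$Q_i = \pi_{11} + \pi_{21} PS_i + \pi_{31} DI_i + \pi_{41} PF_i + v_{i1} \tag{11.3}$$

$$P_i = \pi_{12} + \pi_{22} PS_i + \pi_{32} DI_i + \pi_{42} PF_i + v_{i2} \tag{11.4}$$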

Open the Excel file truffles. Save your file as POE Chapter 11. Rename sheet 1 truffles data.

11.1.1a Reduced Form Equation for Q

We first estimate the reduced form equation for Q (equation (11.3)).

In the Regression dialog box, the Input Y Range should be B1:B31, and the Input X Range should be C1:E31. Check the box next to Labels. Select New Worksheet Ply and name it Truffles Reduced Form Eq. for Q. Finally select OK.
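The Regression tool estimates the coefficients by least squares. To show what it computes, here is a minimal pure-Python sketch that solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination. The function name `ols` and the toy data are hypothetical. The sketch omits the standard errors and the ANOVA table that Excel also reports.

```python
def ols(y, columns):
    """Least squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    # Design matrix rows: a leading 1 for the intercept, then each regressor.
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
print(ols(y, [x1, x2]))
```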


The result is (see also Table 11.2a p. 456 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.835096
R Square              0.697386
Adjusted R Square     0.662469
Standard Error        2.630084
Observations          30

ANOVA
             df    SS            MS            F            Significance F
Regression    3    430.382632    143.460877    19.972688    6.33E-07
Residual     26    186.754176    7.182853
Total        29    617.136808

             Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept    7.895100       3.243421         2.434189     0.022099     1.228152      14.562048
ps           0.656402       0.142538         4.605115     9.63E-05     0.363412      0.949392
di           2.167156       0.700475         3.093843     0.004580     0.727312      3.607000
pf           -0.506982      0.121262         -4.180897    0.000291     -0.756239     -0.257726

11.1.1b Reduced Form Equation for P

Next, we estimate the reduced form equation for P (equation (11.4)).

Go back to your truffles data worksheet.

In the Regression dialog box, the Input Y Range should be A1:A31, and the Input X Range should be C1:E31. Check the boxes next to Labels and Residuals. Select New Worksheet Ply and name it Truffles Reduced Form Eq. for P. Finally select OK.


The result is (see also Table 11.2b p. 456 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.942700
R Square              0.888683
Adjusted R Square     0.875839
Standard Error        6.597486
Observations          30

ANOVA
             df    SS             MS            F            Significance F
Regression    3    9034.775536    3011.591845   69.189345    1.60E-12
Residual     26    1131.697193    43.526815
Total        29    10166.472729

             Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept    -32.512420     7.984235         -4.072077    0.000387     -48.924251    -16.100590
ps           1.708147       0.350881         4.868172     4.76E-05     0.986902      2.429393
di           7.602492       1.724336         4.408940     0.000160     4.058069      11.146915
pf           1.353906       0.298506         4.535603     0.000115     0.740317      1.967494

RESIDUAL OUTPUT

Observation   Predicted p   Residuals
1             31.83041      -2.19041
2             40.46577      -0.23577
3             38.50108      -3.79108

11.1.2 The Structural Equations or Stage 2 Least Squares Estimates


We obtain the predicted values $\hat{P}_i$ from the estimated reduced form equation (11.4) and insert them in the structural demand and supply equations (11.1) and (11.2) in place of the $P_i$ values. Then we estimate the resulting equations (11.5) and (11.6) by least squares:

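Demand: $$Q_i = \alpha_1 + \alpha_2 \hat{P}_i + \alpha_3 PS_i + \alpha_4 DI_i + e_i^{d*} \tag{11.5}$$

Supply: $$Q_i = \beta_1 + \beta_2 \hat{P}_i + \beta_3 PF_i + e_i^{s*} \tag{11.6}$$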
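The two stages can be illustrated with simulated data. The sketch below uses a hypothetical market, not the truffle data. Demand is Q = 10 - P + x + e_d and supply is Q = 2 + P + z + e_s, where x shifts only demand (playing the role of PS and DI) and z shifts only supply (playing the role of PF). Price is determined jointly with quantity, so it is correlated with e_d. As a result, least squares applied directly to the demand equation is inconsistent, while replacing P by its stage 1 fitted value recovers the true slope of -1 in large samples. The helper names `ols` and `predict`, the seed, and the sample size are illustrative choices.

```python
import random


def ols(y, columns):
    """Least squares coefficients [intercept, b1, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, columns):
    """Fitted values from an intercept-first coefficient list."""
    return [coefs[0] + sum(b * col[i] for b, col in zip(coefs[1:], columns))
            for i in range(len(columns[0]))]


rng = random.Random(12345)
N = 5000
x = [rng.gauss(0, 1) for _ in range(N)]    # demand shifter
z = [rng.gauss(0, 1) for _ in range(N)]    # supply shifter (the instrument)
ed = [rng.gauss(0, 1) for _ in range(N)]   # demand error
es = [rng.gauss(0, 1) for _ in range(N)]   # supply error

# Solving demand = supply gives the market-clearing price, then the quantity.
price = [(8 + x[i] - z[i] + ed[i] - es[i]) / 2 for i in range(N)]
quantity = [2 + price[i] + z[i] + es[i] for i in range(N)]

ols_slope = ols(quantity, [price, x])[1]            # biased: price correlated with ed
price_hat = predict(ols(price, [x, z]), [x, z])     # stage 1: reduced form for P
tsls_slope = ols(quantity, [price_hat, x])[1]       # stage 2: P replaced by P-hat
print(f"true -1, OLS {ols_slope:.3f}, 2SLS {tsls_slope:.3f}")
```

In this design, the OLS slope settles near -1/3 rather than -1. The 2SLS slope is close to the true value.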

11.1.2a 2SLS Estimates for Truffle Demand

Go back to your truffles data worksheet and enter the following labels and formulas.

G H I
1 p-hat ps di
2 ='Truffles Reduced Form Eq. for P'!B27 =C2 =D2

Copy the content of cells G2:I2 to cells G3:I31. Here is how your table should look (only the
first five values are shown below):

   G          H       I
1  p-hat      ps      di
2  31.83041   19.97   2.103
3  40.46577   18.04   2.043
4  38.50108   22.36   1.87
5  39.03302   20.87   1.525
6  40.44901   19.79   2.709

In the Regression dialog box, the Input Y Range should be B1:B31, and the Input X Range should be G1:I31. Check the box next to Labels. Uncheck the box next to Residuals. Select New Worksheet Ply and name it Stage 2 LS Demand for Truffles. Finally select OK.


The result is (see also Table 11.3a on p. 456 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.835096
R Square              0.697386
Adjusted R Square     0.662469
Standard Error        2.630084
Observations          30

ANOVA
             df    SS            MS            F            Significance F
Regression    3    430.382632    143.460877    19.972688    6.33E-07
Residual     26    186.754176    7.182853
Total        29    617.136808

             Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept    -4.279473      3.013834         -1.419943    0.167505     -10.474497    1.915551
p-hat        -0.374459      0.089564         -4.180897    0.000291     -0.558561     -0.190357
ps           1.296033       0.193094         6.711915     4.02E-07     0.899122      1.692945
di           5.013979       1.241414         4.038924     0.000422     2.462215      7.565743
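The demand equation (11.1) has one endogenous right-hand-side variable and excludes exactly one exogenous variable (PF), so it is exactly identified. In that case, the 2SLS estimate of α2 should equal the ratio of the PF coefficients in the two reduced form equations, π41/π42 (indirect least squares). A quick arithmetic check using the rounded estimates reported above:

```python
pi41 = -0.506982          # PF coefficient, Truffles Reduced Form Eq. for Q
pi42 = 1.353906           # PF coefficient, Truffles Reduced Form Eq. for P
alpha2_2sls = -0.374459   # p-hat coefficient, Stage 2 LS Demand for Truffles

alpha2_ils = pi41 / pi42  # indirect least squares estimate of alpha2
print(f"ILS {alpha2_ils:.6f} vs 2SLS {alpha2_2sls:.6f}")
```

The two agree up to the rounding of the displayed coefficients.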

Note that while this two-step least squares approach yields the correct coefficient estimates, the accompanying standard errors are not correct. Excel computes $\hat{\sigma}$ from stage 2 residuals that use $\hat{P}_i$ rather than the actual $P_i$. The correct standard error of the estimator of $\alpha_k$ is computed using equations (10.4) and (10.5), restated below:

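$$se(\hat{\alpha}_k)_{IV} = \frac{\hat{\sigma}_{IV}}{\hat{\sigma}_{stage\,2\,LS}}\, se(\hat{\alpha}_k)_{stage\,2\,LS} \tag{10.4}$$

where $$\hat{\sigma}_{IV} = \sqrt{\frac{\sum_{i=1}^{N} \hat{e}_{IV,i}^2}{N-K}} \tag{10.5}$$

and $$\hat{e}_{IV,i} = Q_i - \hat{\alpha}_1 - \hat{\alpha}_2 P_i - \hat{\alpha}_3 PS_i - \hat{\alpha}_4 DI_i \tag{11.7}$$

where $\hat{\alpha}_1$, $\hat{\alpha}_2$, $\hat{\alpha}_3$ and $\hat{\alpha}_4$ are the least squares estimates from equation (11.5).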
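Equations (10.4), (10.5) and (11.7) amount to recomputing $\hat{\sigma}$ from residuals that use the actual $P_i$, and then rescaling each stage 2 standard error by the ratio of the two sigmas. A minimal sketch of that arithmetic, with hypothetical function names and toy inputs:

```python
import math


def sigma_iv(iv_residuals, k):
    """Equation (10.5): sqrt(sum of squared IV residuals / (N - K))."""
    n = len(iv_residuals)
    return math.sqrt(sum(e * e for e in iv_residuals) / (n - k))


def se_iv(se_stage2, sigma_hat_iv, sigma_hat_stage2):
    """Equation (10.4): rescale each stage 2 standard error by the sigma ratio."""
    return [s * sigma_hat_iv / sigma_hat_stage2 for s in se_stage2]


# Toy residuals, computed with the actual P as in equation (11.7), not p-hat.
residuals = [1.0, -1.0, 2.0, -2.0]
s_iv = sigma_iv(residuals, k=2)     # sqrt(10 / 2) = sqrt(5)
corrected = se_iv([0.5, 0.1], s_iv, sigma_hat_stage2=2.0)
print(s_iv, corrected)
```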

Go back to your truffles data worksheet and enter the following labels and formulas. In the last column, you will find the numbers of the equations used, if any.

   K                                    L
1  Demand for Truffles, structural equation or IV estimates using 2 SLS
2  α1-hat_IV = α1-hat_stage 2 LS =      ='Stage 2 LS Demand for Truffles'!B17
3  α2-hat_IV = α2-hat_stage 2 LS =      ='Stage 2 LS Demand for Truffles'!B18
4  α3-hat_IV = α3-hat_stage 2 LS =      ='Stage 2 LS Demand for Truffles'!B19
5  α4-hat_IV = α4-hat_stage 2 LS =      ='Stage 2 LS Demand for Truffles'!B20

   N
1  e-hat_IV²
2  =(B2-$L$2-$L$3*A2-$L$4*C2-$L$5*D2)^2

Copy the content of cell N2 to cells N3:N31.



   K                                    L
7  N =                                  ='Stage 2 LS Demand for Truffles'!B8
8  K =                                  ='Stage 2 LS Demand for Truffles'!B12+1
9  σ-hat_stage 2 LS =                   ='Stage 2 LS Demand for Truffles'!B7
10 se(α1-hat)_stage 2 LS =              ='Stage 2 LS Demand for Truffles'!C17
11 se(α2-hat)_stage 2 LS =              ='Stage 2 LS Demand for Truffles'!C18
12 se(α3-hat)_stage 2 LS =              ='Stage 2 LS Demand for Truffles'!C19
13 se(α4-hat)_stage 2 LS =              ='Stage 2 LS Demand for Truffles'!C20
14 σ-hat_IV =                           =SQRT(SUM(N2:N31)/(L7-L8))     (10.5)
15 se(α1-hat)_IV =                      =(L14/L9)*L10                  (10.4)
16 se(α2-hat)_IV =                      =(L14/L9)*L11                  (10.4)
17 se(α3-hat)_IV =                      =(L14/L9)*L12                  (10.4)
18 se(α4-hat)_IV =                      =(L14/L9)*L13                  (10.4)

The result is (see also the standard error estimates in Table 11.3a on p. 456 of Principles of Econometrics, 4e):

   K                                    L            N
1  Demand for Truffles, structural equation or IV estimates using 2 SLS      e-hat_IV²
2  α1-hat_IV = α1-hat_stage 2 LS =      -4.27947     1.340364
3  α2-hat_IV = α2-hat_stage 2 LS =      -0.37446     1.537591
4  α3-hat_IV = α3-hat_stage 2 LS =      1.296033     2.156431
5  α4-hat_IV = α4-hat_stage 2 LS =      5.013979     4.967461
6                                                    57.50169
7  N =                                  30           65.8819
8  K =                                  4            23.92123
9  σ-hat_stage 2 LS =                   2.630084     12.05736
10 se(α1-hat)_stage 2 LS =              3.013834     17.75874
11 se(α2-hat)_stage 2 LS =              0.089564     7.303551
12 se(α3-hat)_stage 2 LS =              0.193094     30.53484
13 se(α4-hat)_stage 2 LS =              1.241414     5.98552
14 σ-hat_IV =                           4.83799      6.781699
15 se(α1-hat)_IV =                      5.543885     0.616518
16 se(α2-hat)_IV =                      0.164752     3.568511
17 se(α3-hat)_IV =                      0.355193     13.7012
18 se(α4-hat)_IV =                      2.283556     32.17105

11.1.2b 2SLS Estimates for Truffle Supply

In your truffles data worksheet, enter the following labels and formulas.

   P                                          Q
1  p-hat                                      pf
2  ='Truffles Reduced Form Eq. for P'!B27     =E2

Copy the content of cells P2:Q2 to cells P3:Q31. Here is how your table should look (only the
first five values are shown below):

   P          Q
1  p-hat      pf
2  31.83041   10.52
3  40.46577   19.67
4  38.50108   13.74
5  39.03302   17.95
6  40.44901   13.71

In the Regression dialog box, the Input Y Range should be B1:B31, and the Input X Range should be P1:Q31. Check the box next to Labels. Select New Worksheet Ply and name it Stage 2 LS Supply for Truffles. Finally select OK.


The result is (see also Table 11.3b on p. 457 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.832088
R Square              0.692371
Adjusted R Square     0.669584
Standard Error        2.651687
Observations          30

ANOVA
             df    SS            MS            F            Significance F
Regression    2    427.287788    213.643894    30.384066    1.23E-07
Residual     27    189.849020    7.031445
Total        29    617.136808

             Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept    20.032803      2.165698         9.250043     7.36E-10     15.589157     24.476449
p-hat        0.337982       0.044124         7.659880     3.07E-08     0.247447      0.428516
pf           -1.000909      0.146127         -6.849565    2.33E-07     -1.300738     -0.701081

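Again, note that while this two-step least squares approach yields the correct coefficient estimates, the accompanying standard errors are not correct. The correct standard error of the estimator of $\beta_k$ is computed using equations (10.4) and (10.5), restated below:

$$se(\hat{\beta}_k)_{IV} = \frac{\hat{\sigma}_{IV}}{\hat{\sigma}_{stage\,2\,LS}}\, se(\hat{\beta}_k)_{stage\,2\,LS} \tag{10.4}$$

where $$\hat{\sigma}_{IV} = \sqrt{\frac{\sum_{i=1}^{N} \hat{e}_{IV,i}^2}{N-K}} \tag{10.5}$$

and $$\hat{e}_{IV,i} = Q_i - \hat{\beta}_1 - \hat{\beta}_2 P_i - \hat{\beta}_3 PF_i \tag{11.8}$$

where $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$ are the least squares estimates from equation (11.6).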

Go back to your truffles data worksheet and enter the following labels and formulas. In the last
column, you will find the numbers of the equations used, if any.

   S                                    T
1  Supply for Truffles, structural equation or IV estimates using 2 SLS
2  β1-hat_IV = β1-hat_stage 2 LS =      ='Stage 2 LS Supply for Truffles'!B17
3  β2-hat_IV = β2-hat_stage 2 LS =      ='Stage 2 LS Supply for Truffles'!B18
4  β3-hat_IV = β3-hat_stage 2 LS =      ='Stage 2 LS Supply for Truffles'!B19

   V
1  e-hat_IV²
2  =(B2-$T$2-$T$3*A2-$T$4*E2)^2

Copy the content of cell V2 to cells V3:V31.

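   S                                    T
6  N =                                  ='Stage 2 LS Supply for Truffles'!B8
7  K =                                  ='Stage 2 LS Supply for Truffles'!B12+1
8  σ-hat_stage 2 LS =                   ='Stage 2 LS Supply for Truffles'!B7
9  se(β1-hat)_stage 2 LS =              ='Stage 2 LS Supply for Truffles'!C17
10 se(β2-hat)_stage 2 LS =              ='Stage 2 LS Supply for Truffles'!C18
11 se(β3-hat)_stage 2 LS =              ='Stage 2 LS Supply for Truffles'!C19
12 σ-hat_IV =                           =SQRT(SUM(V2:V31)/(T6-T7))     (10.5)
13 se(β1-hat)_IV =                      =(T12/T8)*T9                   (10.4)
14 se(β2-hat)_IV =                      =(T12/T8)*T10                  (10.4)
15 se(β3-hat)_IV =                      =(T12/T8)*T11                  (10.4)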

The result is (see also the standard error estimates in Table 11.3b on p. 457 of Principles of Econometrics, 4e):

   S                                    T
1  Supply for Truffles, structural equation or IV estimates using 2 SLS
2  β1-hat_IV = β1-hat_stage 2 LS =      20.0328
3  β2-hat_IV = β2-hat_stage 2 LS =      0.337982
4  β3-hat_IV = β3-hat_stage 2 LS =      -1.00091
5
6  N =                                  30
7  K =                                  3
8  σ-hat_stage 2 LS =                   2.651687
9  se(β1-hat)_stage 2 LS =              2.165698
10 se(β2-hat)_stage 2 LS =              0.044124
11 se(β3-hat)_stage 2 LS =              0.146127
12 σ-hat_IV =                           1.497585
13 se(β1-hat)_IV =                      1.223115
14 se(β2-hat)_IV =                      0.02492
15 se(β3-hat)_IV =                      0.082528
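As a numerical check of equation (10.4), the IV standard errors in T13:T15 can be reproduced from the displayed values in T8:T12. Agreement is therefore only up to rounding:

```python
sigma_stage2 = 2.651687                      # T8
se_stage2 = [2.165698, 0.044124, 0.146127]   # T9:T11
sigma_iv = 1.497585                          # T12, equation (10.5)

# Equation (10.4), as in cells T13:T15.
se_iv = [s * sigma_iv / sigma_stage2 for s in se_stage2]
print([round(s, 6) for s in se_iv])
```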

11.2 SUPPLY AND DEMAND MODEL FOR THE FULTON FISH MARKET

Consider the following supply and demand model for the Fulton fish market:

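Demand: $$\ln(QUAN_t) = \alpha_1 + \alpha_2 \ln(PRICE_t) + \alpha_3 MON_t + \alpha_4 TUE_t + \alpha_5 WED_t + \alpha_6 THU_t + e_t^d \tag{11.9}$$

Supply: $$\ln(QUAN_t) = \beta_1 + \beta_2 \ln(PRICE_t) + \beta_3 STORMY_t + e_t^s \tag{11.10}$$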

where QUAN is the quantity of fish sold, in pounds, and PRICE is the average daily price per
pound. The subscript "t" is used to index daily observations collected over the period December
2, 1991 to May 8, 1992. MON, TUE, WED, THU are dummy variables for the days of the week;
they capture the day-to-day shifts in demand. STORMY is a dummy variable indicating stormy
weather during the previous 3 days; this variable is important in the supply equation because
stormy weather makes fishing more difficult, reducing the supply of fish brought to market.

11.2.1 The Reduced Form Equations

Consider the following reduced form equations for the supply and demand model for the Fulton
fish market:
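$$\ln(QUAN_t) = \pi_{11} + \pi_{21} MON_t + \pi_{31} TUE_t + \pi_{41} WED_t + \pi_{51} THU_t + \pi_{61} STORMY_t + v_{t1} \tag{11.11}$$

$$\ln(PRICE_t) = \pi_{12} + \pi_{22} MON_t + \pi_{32} TUE_t + \pi_{42} WED_t + \pi_{52} THU_t + \pi_{62} STORMY_t + v_{t2} \tag{11.12}$$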

Open the Excel file fultonfish. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 11 in one file, create a new worksheet in your POE
Chapter 11 Excel file, rename it fultonfish data, and in it, copy the data set you just opened.


11.2.1a Reduced Form Equation for lnQ

We first estimate the reduced form equation for ln(QUAN) (equation (11.11)).

In the Regression dialog box, the Input Y Range should be D1:D112, and the Input X Range should be E1:I112. Check the box next to Labels. Select New Worksheet Ply and name it Fish Reduced Form Eq. for lnQ. Finally select OK.


The result is (see also Table 11.4a p. 459 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.439740
R Square              0.193372
Adjusted R Square     0.154961
Standard Error        0.681790
Observations          111

ANOVA
             df    SS           MS          F           Significance F
Regression     5   11.700632    2.340126    5.034295    0.000356
Residual     105   48.807885    0.464837
Total        110   60.508517

             Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept    8.810063       0.147024         59.922455    5.07E-83     8.518541      9.101585
mon          0.101005       0.206503         0.489123     0.625775     -0.308451     0.510462
tue          -0.484706      0.201145         -2.409725    0.017704     -0.883541     -0.085871
wed          -0.553114      0.205806         -2.687550    0.008371     -0.961189     -0.145039
thu          0.053693       0.201048         0.267065     0.789943     -0.344949     0.452336
stormy       -0.387772      0.143731         -2.697911    0.008132     -0.672763     -0.102781

11.2.1b Reduced Form Equation for lnP

Next, we estimate the reduced form equation for ln(PRICE) (equation (11.12)).

Go back to your fultonfish data worksheet.

In the Regression dialog box, the Input Y Range should be B1:B112, and the Input X Range should be E1:I112. Check the boxes next to Labels and Residuals. Select New Worksheet Ply and name it Fish Reduced Form Eq. for lnP. Finally select OK.


The result is (see also Table 11.4b p. 459 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.422953
R Square              0.178889
Adjusted R Square     0.139787
Standard Error        0.354235
Observations          111

ANOVA
             df    SS           MS          F           Significance F
Regression     5   2.870480     0.574096    4.575107    0.000816
Residual     105   13.175664    0.125483
Total        110   16.046144

             Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept    -0.271705      0.076389         -3.556867    0.000565     -0.423171     -0.120240
mon          -0.112922      0.107292         -1.052480    0.294996     -0.325662     0.099817
tue          -0.041149      0.104509         -0.393741    0.694571     -0.248371     0.166072
wed          -0.011825      0.106930         -0.110586    0.912156     -0.223847     0.200197
thu          0.0496         (remaining entries illegible in the source)
stormy       0.346406       0.074678         4.638681     1.02E-05     0.198334      0.494477

RESIDUAL OUTPUT

Observation   Predicted lprice   Residuals
1             -0.03822           -0.39226
2             0.03355            -0.03355
3             -0.28353           0.35585

Next, we use the results of the estimated reduced form equation for ln(PRICE) to test the
significance of the daily dummy variables.

Equation (11.12), restated below, is our unrestricted model (for a review of the F-test, see Section
6.1):
$$\ln(PRICE_t) = \pi_{12} + \pi_{22} MON_t + \pi_{32} TUE_t + \pi_{42} WED_t + \pi_{52} THU_t + \pi_{62} STORMY_t + v_{t2} \tag{11.12}$$

Our restricted model is equation (11.13) below:

$$\ln(PRICE_t) = \pi_{12} + \pi_{62} STORMY_t + v_{t2} \tag{11.13}$$

Go back to your fultonfish data worksheet. In the Regression dialog box, the Input Y Range should be B1:B112, and the Input X Range should be I1:I112. Check the box next to Labels. Uncheck the box next to Residuals. Select New Worksheet Ply and name it Restricted Model. Finally select OK.

The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.399417
R Square              0.159534
Adjusted R Square     0.151823
Standard Error        0.351748
Observations          111

ANOVA
             df    SS           MS          F            Significance F
Regression     1   2.559904     2.559904    20.689945    1.41E-05
Residual     109   13.486240    0.123727
Total        110   16.046144

             Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept    -0.290333      0.039575         -7.336321    4.13E-11     -0.368769     -0.211897
stormy       0.335262       0.073706         4.548620     1.41E-05     0.189179      0.481346

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it F-test.


In it copy the F-test template you created in Chapter 6.

Replace the following reference: [POE Chapter 6.xlsx]Unrestricted Model by Fish Reduced Form Eq. for lnP. Also delete the [POE Chapter 6.xlsx] prefix from the Restricted Model references to obtain the following modified template:

   A                 B                C
1  Data Input        J =
2                    N =              ='Fish Reduced Form Eq. for lnP'!B8
3                    K =              ='Fish Reduced Form Eq. for lnP'!B12+1
4                    SSE_U =          ='Fish Reduced Form Eq. for lnP'!C13
5                    SSE_R =          ='Restricted Model'!C13
6                    α =
7
8  Computed Values   m1 =             =C1
9                    m2 =             =C2-C3
10                   Fc =             =FINV(C6,C8,C9)
11
12 F-test            F-statistic =    =((C5-C4)/C8)/(C4/C9)
13                   Conclusion =     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                   p-value =        =FDIST(C12,C8,C9)
15                   Conclusion =     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")

With J = 4 restrictions and $\alpha = 0.05$, the results of the F-test are (see also p. 460 of Principles of Econometrics, 4e):

   A                 B                C
1  Data Input        J =              4
2                    N =              111
3                    K =              6
4                    SSE_U =          13.17566
5                    SSE_R =          13.48624
6                    α =              0.05
7
8  Computed Values   m1 =             4
9                    m2 =             105
10                   Fc =             2.45821
11
12 F-test            F-statistic =    0.618763
13                   Conclusion =     Do Not Reject Ho
14                   p-value =        0.65011
15                   Conclusion =     Do Not Reject Ho
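The F-statistic formula in C12 can be reproduced by hand. This minimal sketch uses the rounded SSE values shown above, so it matches Excel only to about four decimals. Python's standard library has no F distribution, so the sketch does not reproduce the FDIST p-value, and the critical value is copied from cell C10. The function name `f_statistic` is hypothetical.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """F = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K)), as in cell C12."""
    return ((sse_r - sse_u) / j) / (sse_u / (n - k))


f = f_statistic(sse_r=13.48624, sse_u=13.17566, j=4, n=111, k=6)
f_critical = 2.45821   # Fc from cell C10, =FINV(0.05, 4, 105)
decision = "Reject Ho" if f >= f_critical else "Do Not Reject Ho"
print(f"F = {f:.4f}, Fc = {f_critical}, {decision}")
```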

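The joint F-test of significance of the daily dummy variables has a p-value of 0.65, so we cannot reject the null hypothesis that all these coefficients are zero.

This means that, in this case, the supply equation is not identified in practice. The daily dummy variables are the variables excluded from the supply equation that would identify it, but they have no significant effect on ln(PRICE) in the reduced form. We therefore do not report estimates for the supply equation in the next section (for more details, see Section 11.7.2, pp. 458-460 in Principles of Econometrics, 4e).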

11.2.2 The Structural Equations or Stage 2 Least Squares Estimates

We obtain the predicted values $\widehat{\ln(PRICE_t)}$ from the estimated reduced form equation (11.12) and insert them in the structural demand equation (11.9) in place of the $\ln(PRICE_t)$ values.

Then we estimate the resulting equation (11.14) by least squares:

Demand: $$\ln(QUAN_t) = \alpha_1 + \alpha_2 \widehat{\ln(PRICE_t)} + \alpha_3 MON_t + \alpha_4 TUE_t + \alpha_5 WED_t + \alpha_6 THU_t + e_t^{d*} \tag{11.14}$$

11.2.2a 2SLS Estimates for Fulton Fish Demand

Go back to your fultonfish data worksheet and enter the following labels and formulas.

   Q                                          R      S      T      U
1  lnp-hat                                    mon    tue    wed    thu
2  ='Fish Reduced Form Eq. for lnP'!B29       =E2    =F2    =G2    =H2

Copy the content of cells Q2:U2 to cells Q3:U112. Here is how your table should look (only the
first five values are shown below):

   Q          R      S      T      U
1  lnp-hat    mon    tue    wed    thu
2  -0.03822   1      0      0      0
3  0.033551   0      1      0      0
4  -0.28353   0      0      1      0
5  0.1243     0      0      0      1
6  0.0747     0      0      0      0

In the Regression dialog box, the Input Y Range should be D1:D112, and the Input X Range should be Q1:U112. Check the box next to Labels. Select New Worksheet Ply and name it Stage 2 LS Demand for Fish. Finally select OK.


The result is (see also Table 11.5 on p. 460 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.439740
R Square              0.193372
Adjusted R Square     0.154961
Standard Error        0.681790
Observations          111

ANOVA
             df    SS           MS          F           Significance F
Regression     5   11.700632    2.340126    5.034295    0.000356
Residual     105   48.807885    0.464837
Total        110   60.508517

             Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept    8.505911       0.160846         52.882243    1.67E-77     8.186983      8.824840
lnp-hat      -1.119417      0.414920         -2.697911    0.008132     -1.942126     -0.296707
mon          -0.025402      0.207897         -0.122186    0.902965     -0.437624     0.386819
tue          -0.530769      0.201340         -2.636185    0.009655     -0.929989     -0.131549
wed          -0.566351      0.205942         -2.750044    0.007018     -0.974697     -0.158005
thu          0.109267       0.202101         0.540656     0.589890     -0.291462     0.509997
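While this two-step least squares approach yields the correct coefficient estimates, the accompanying standard errors are not correct. The correct standard error of the estimator of $\alpha_k$ is computed using equations (10.4) and (10.5), restated below:

$$se(\hat{\alpha}_k)_{IV} = \frac{\hat{\sigma}_{IV}}{\hat{\sigma}_{stage\,2\,LS}}\, se(\hat{\alpha}_k)_{stage\,2\,LS} \tag{10.4}$$

where $$\hat{\sigma}_{IV} = \sqrt{\frac{\sum_{t=1}^{N} \hat{e}_{IV,t}^2}{N-K}} \tag{10.5}$$

and $$\hat{e}_{IV,t} = \ln(QUAN_t) - \hat{\alpha}_1 - \hat{\alpha}_2 \ln(PRICE_t) - \hat{\alpha}_3 MON_t - \hat{\alpha}_4 TUE_t - \hat{\alpha}_5 WED_t - \hat{\alpha}_6 THU_t \tag{11.15}$$

where $\hat{\alpha}_1$, $\hat{\alpha}_2$, $\hat{\alpha}_3$, $\hat{\alpha}_4$, $\hat{\alpha}_5$ and $\hat{\alpha}_6$ are the least squares estimates from equation (11.14).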

Go back to your fultonfish data worksheet and enter the following labels and formulas. In the
last column, you will find the numbers of the equations used, if any.

   W                                    X
1  Demand for Fish, structural equation or IV estimates using 2 SLS
2  α1-hat_IV = α1-hat_stage 2 LS =      ='Stage 2 LS Demand for Fish'!B17
3  α2-hat_IV = α2-hat_stage 2 LS =      ='Stage 2 LS Demand for Fish'!B18
4  α3-hat_IV = α3-hat_stage 2 LS =      ='Stage 2 LS Demand for Fish'!B19
5  α4-hat_IV = α4-hat_stage 2 LS =      ='Stage 2 LS Demand for Fish'!B20
6  α5-hat_IV = α5-hat_stage 2 LS =      ='Stage 2 LS Demand for Fish'!B21
7  α6-hat_IV = α6-hat_stage 2 LS =      ='Stage 2 LS Demand for Fish'!B22

   Z
1  e-hat_IV²
2  =(D2-$X$2-$X$3*B2-$X$4*E2-$X$5*F2-$X$6*G2-$X$7*H2)^2

Copy the content of cell Z2 to cells Z3:Z112.

   W                                    X
9  N =                                  ='Stage 2 LS Demand for Fish'!B8
10 K =                                  ='Stage 2 LS Demand for Fish'!B12+1
11 σ-hat_stage 2 LS =                   ='Stage 2 LS Demand for Fish'!B7
12 se(α1-hat)_stage 2 LS =              ='Stage 2 LS Demand for Fish'!C17
13 se(α2-hat)_stage 2 LS =              ='Stage 2 LS Demand for Fish'!C18
14 se(α3-hat)_stage 2 LS =              ='Stage 2 LS Demand for Fish'!C19
15 se(α4-hat)_stage 2 LS =              ='Stage 2 LS Demand for Fish'!C20
16 se(α5-hat)_stage 2 LS =              ='Stage 2 LS Demand for Fish'!C21
17 se(α6-hat)_stage 2 LS =              ='Stage 2 LS Demand for Fish'!C22
18 σ-hat_IV =                           =SQRT(SUM(Z2:Z112)/(X9-X10))   (10.5)
19 se(α1-hat)_IV =                      =(X18/X11)*X12                 (10.4)
20 se(α2-hat)_IV =                      =(X18/X11)*X13                 (10.4)
21 se(α3-hat)_IV =                      =(X18/X11)*X14                 (10.4)
22 se(α4-hat)_IV =                      =(X18/X11)*X15                 (10.4)
23 se(α5-hat)_IV =                      =(X18/X11)*X16                 (10.4)
24 se(α6-hat)_IV =                      =(X18/X11)*X17                 (10.4)

The result is (see also the standard error estimates in Table 11.5 on p. 460 of Principles of Econometrics, 4e):

   W                                    X
1  Demand for Fish, structural equation or IV estimates using 2 SLS
2  α1-hat_IV = α1-hat_stage 2 LS =      8.50591
3  α2-hat_IV = α2-hat_stage 2 LS =      -1.11942
4  α3-hat_IV = α3-hat_stage 2 LS =      -0.0254
5  α4-hat_IV = α4-hat_stage 2 LS =      -0.53077
6  α5-hat_IV = α5-hat_stage 2 LS =      -0.56635
7  α6-hat_IV = α6-hat_stage 2 LS =      0.109267
8
9  N =                                  111
10 K =                                  6
11 σ-hat_stage 2 LS =                   0.68179
12 se(α1-hat)_stage 2 LS =              0.160846
13 se(α2-hat)_stage 2 LS =              0.41492
14 se(α3-hat)_stage 2 LS =              0.207897
15 se(α4-hat)_stage 2 LS =              0.20134
16 se(α5-hat)_stage 2 LS =              0.205942
17 se(α6-hat)_stage 2 LS =              0.202101
18 σ-hat_IV =                           0.704342
19 se(α1-hat)_IV =                      0.166167
20 se(α2-hat)_IV =                      0.428645
21 se(α3-hat)_IV =                      0.214774
22 se(α4-hat)_IV =                      0.208
23 se(α5-hat)_IV =                      0.212755
24 se(α6-hat)_IV =                      0.208787
CHAPTER 12

Nonstationary Time-Series Data and Cointegration

CHAPTER OUTLINE
12.1 Stationary and Nonstationary Variables
12.1.1 US Economic Time Series
12.1.2 Simulated Data
12.2 Spurious Regressions
12.3 Unit Root Tests for Stationarity
12.4 Cointegration

12.1 STATIONARY AND NONSTATIONARY VARIABLES

12.1.1 US Economic Time Series

Open the Excel file usa. Save your file as POE Chapter 12. Rename sheet 1 usa data.

Below we plot the time series of some important economic variables for the US economy, as in Figure 12.1 on p. 476 of Principles of Econometrics, 4e.

In cells E2:H3, enter the following labels and formulas.

   E         F         G         H
2  Δgdp      Δinf      ΔF        ΔB
3  =A3-A2    =B3-B2    =C3-C2    =D3-D2

Copy the content of cells E3:H3 to cells E4:H105. Here is how your table should look (only the
first five values are shown below):


   E       F       G       H
2  Δgdp    Δinf    ΔF      ΔB
3  98.9    0.56    0.87    1.45
4  59.7    0.8     0.83    0
5  53      0.58    -2.12   -1.54
6  83.2    -1      -0.79   -0.42
7  58.5    -1.27   -0.56   -0.92
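The formulas in E3:H3 compute first differences, Δy_t = y_t - y_{t-1}, so each differenced series has one fewer observation than the original. The same operation as a short Python sketch, using hypothetical numbers rather than the usa data:

```python
def first_difference(series):
    """Return [y[1]-y[0], y[2]-y[1], ...], like =A3-A2 copied down a column."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


gdp = [100.0, 104.5, 103.0, 108.0]   # hypothetical values, not the usa data
print(first_difference(gdp))
```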

Select the Insert tab located next to the Home tab. Select A1:A105. In the Charts group of commands select Line, and Line again.


After editing, the result is (see also Figure 12.1(a) p. 476 in Principles of Econometrics, 4e):

[Chart: US GDP 1984q1-2009q4]

To plot the change in the US GDP series, select cells E2:E105. After editing, the result is (see also Figure 12.1(b) p. 476 in Principles of Econometrics, 4e):

[Chart: ΔUS GDP 1984q1-2009q4]

You can proceed similarly to replicate any of the other plots from Figure 12.1 p. 476 of Principles of Econometrics, 4e.

12.1.2 Simulated Data

Consider the following autoregressive model of order 1 (AR(1) model):

$$y_t = \rho y_{t-1} + v_t \tag{12.1}$$

where $\rho = 0.7$ and the $v_t$ are independent $N(0, 1)$ random errors.

Below, we generate our $v_t$ and $y_t$ values, similarly to the way we generated random samples in Sections 6.6.2, 3.1.4 and 2.4.4.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it simulated data.


In cells A1:E3 of your simulated data worksheet, enter the following labels, values and formulas. In the last column, you will find the numbers of the equations used, if any.

   A       B      C   D      E
1  y0 =    0          v_t    y_t
2  ρ =     0.7               =B2*B1+D2        (12.1)
3                            =$B$2*E2+D3      (12.1)

In column D, we generate a sample of 500 random numbers from a normal distribution with
mean 0 and standard deviation 1.

Select the Data tab, in the middle of the tab list at the top of your screen. In the Analysis group of commands, at the far right, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.


A Random Number Generation dialog box pops up. We need to generate one set of random
numbers for our random errors, so we specify 1 in the Number of Variables window. We would
like to generate 500 random numbers, so we specify 500 in the Number of Random Numbers
window. We select Normal in the Distribution window; the selected Parameters should be
Mean equal to 0, and Standard deviation equal to 1. Select Output Range and specify it to be
D2:D501. Finally, select OK.


After you copy the content of cell E3 to cells E4:E501, here is how your table should look (only the first five values are shown below):

     A      B      C    D           E
1    y0 =   0           vt          yt
2    ρ =    0.7         -0.86857    -0.86857
3                       -0.70454    -1.31254
4                       -0.34472    -1.2635
5                       0.914442    0.029994
6                       0.171311    0.192307

Note: you will obtain a different random sample than the one we obtained, so your v_t and y_t values will differ from the ones reported above.
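Outside Excel, the same recursion can be checked with a few lines of code. The sketch below is a minimal Python version, assuming NumPy is available; the function name simulate_ar1 and the seed are illustrative choices, and NumPy's draws will not reproduce the numbers generated by Excel's Random Number Generation tool.

```python
import numpy as np


def simulate_ar1(rho=0.7, n=500, y0=0.0, seed=12345):
    """Simulate y_t = rho * y_{t-1} + v_t with v_t ~ N(0, 1), as in equation (12.1).

    Returns (v, y): v plays the role of column D, y the role of column E.
    """
    rng = np.random.default_rng(seed)           # stands in for Excel's Random Number Generation tool
    v = rng.normal(loc=0.0, scale=1.0, size=n)  # D2:D501
    y = np.empty(n)
    prev = y0                                   # cell B1 (y0 = 0)
    for t in range(n):
        y[t] = rho * prev + v[t]                # E2: =B2*B1+D2, then =$B$2*E2+D3, ...
        prev = y[t]
    return v, y


v, y = simulate_ar1()
print(np.round(y[:5], 5))  # first five y_t values; they will not match Excel's draws
```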

Select the Insert tab located next to the Home tab. Select E1:E501. In the Charts group of commands select Line, and Line again.

After editing, the result is (see also Figure 12.2(a) p. 479 in Principles of Econometrics, 4e):

[Figure: line plot of the simulated AR(1) series y_t, observations 1 to 500]

Again, note that since you obtain a different random sample than ours, your plot will differ from the one shown above. For the same reason, both our plot and yours differ from Figure 12.2(a) on p. 479 of Principles of Econometrics, 4e.

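Algebraically, it can be shown that, for the AR(1) model (12.1), the mean, variance and covariance of the time series y_t are:

$$E(y_t) = 0 \qquad (12.2)$$

$$\operatorname{var}(y_t) = \frac{\sigma_v^2}{1-\rho^2} \qquad (12.3)$$

$$\operatorname{cov}(y_t, y_{t+s}) = \frac{\sigma_v^2 \rho^s}{1-\rho^2}, \quad s > 0 \qquad (12.4)$$

Let s = 3. For our specific example, where ρ = 0.7 and the v_t are independent N(0, 1) random errors (so that σ_v^2 = 1), the mean, variance and covariance for the time series y_t should then be:

$$E(y_t) = 0 \qquad (12.5)$$

$$\operatorname{var}(y_t) = 1^2/(1 - 0.7^2) = 1.96 \qquad (12.6)$$

$$\operatorname{cov}(y_t, y_{t+3}) = 1^2 \times 0.7^3/(1 - 0.7^2) = 0.67 \qquad (12.7)$$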
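As a check on (12.5)-(12.7), the following Python/NumPy sketch computes the theoretical moments and compares them with the sample moments of one long simulated path. The function name ar1_moments, the seed and the sample length are illustrative assumptions; the sample values will only be close to, not equal to, the theoretical ones.

```python
import numpy as np


def ar1_moments(rho, sigma_v=1.0, s=3):
    """Mean, variance and lag-s covariance of a stationary AR(1), equations (12.2)-(12.4)."""
    var = sigma_v**2 / (1.0 - rho**2)
    return 0.0, var, rho**s * var


mean, var, cov3 = ar1_moments(0.7)
print(round(var, 2), round(cov3, 2))  # 1.96 0.67, as in (12.6) and (12.7)

# Sample counterparts from one long simulated path; the first 1,000 values are
# dropped so that the arbitrary starting value has no noticeable effect.
rng = np.random.default_rng(1)
burn, n = 1_000, 200_000
v = rng.normal(size=burn + n)
y = np.empty_like(v)
y[0] = v[0]
for t in range(1, len(v)):
    y[t] = 0.7 * y[t - 1] + v[t]
y = y[burn:]
ybar = y.mean()
sample_cov3 = np.mean((y[3:] - ybar) * (y[:-3] - ybar))
print(ybar, y.var(), sample_cov3)  # close to 0, 1.96 and 0.67
```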

The AR(1) model in (12.1) is a classic example of a stationary process with a zero mean. AR(1) models fluctuating around a nonzero mean and AR(1) models fluctuating around a linear trend are extensions to (12.1).

The special case where ρ = 1 in equation (12.1) leads to a random walk model. Extensions of the random walk model are the random walk with drift and the random walk with a deterministic trend. In contrast to stationary AR(1) models, random walk models are nonstationary.

Examples of all those models are illustrated in Figures 12.2(b)-(f) on p. 479 of Principles of
Econometrics, 4e. You too can consider all those additional models by proceeding as we did
above.
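The sketch below, again in Python with NumPy, generates these variants from a single recursion y_t = α + δt + ρy_{t-1} + v_t. The function name and the parameter values (α = 0.1, δ = 0.01, and so on) are assumptions chosen for illustration, not the values used for Figure 12.2 in Principles of Econometrics, 4e.

```python
import numpy as np


def simulate_series(n=500, alpha=0.0, rho=1.0, delta=0.0, seed=0):
    """Simulate y_t = alpha + delta*t + rho*y_{t-1} + v_t, v_t ~ N(0, 1), y_0 = 0.

    rho = 1 gives random walks (pure, with drift alpha, with drift and trend delta*t);
    |rho| < 1 gives the stationary AR(1) variants.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)
    y = np.empty(n)
    prev = 0.0
    for t in range(1, n + 1):
        prev = alpha + delta * t + rho * prev + v[t - 1]
        y[t - 1] = prev
    return y


# Illustrative parameter values (assumptions, not taken from the textbook)
random_walk = simulate_series()
rw_drift = simulate_series(alpha=0.1)
rw_drift_trend = simulate_series(alpha=0.1, delta=0.01)
ar1_nonzero_mean = simulate_series(alpha=1.0, rho=0.7)  # fluctuates around 1/(1 - 0.7)
```

Because all four series share the seed, they share the same v_t draws, which makes the effect of each added term easy to compare in a plot.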

12.2 SPURIOUS REGRESSIONS

Two independent random walk series, rw1 and rw2, were generated similarly to the way we generated our AR(1) time series in Section 12.1.2. The data set is named spurious.

Open the Excel file spurious. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 12 in one file, create a new worksheet in your POE
Chapter 12 Excel file, rename it spurious data, and in it, copy the data set you just opened.


Select the Insert tab located next to the Home tab. Select A1:B701. In the Charts group of commands select Line, and Line again.

After editing, the result is (see also Figure 12.3(a) p. 483 in Principles of Econometrics, 4e):

[Figure: Time Series, rw1 and rw2 plotted against observation number]

Select the Insert tab located next to the Home tab. Select A1:B701. This time, in the Charts group of commands select Scatter, and Scatter with only Markers.

After editing (refer back to Section 2.1 if needed), the result is (see also Figure 12.3(b) p. 483 in Principles of Econometrics, 4e):

[Figure: Scatter Plot of rw1 against rw2]
These time series were generated independently and, in truth, have no relation to one another, yet
when we plot them, as we have done above, we see a positive relationship between them.

Next, we estimate a simple regression of series one (rw1) on series two (rw2):

$$rw1_t = \beta_1 + \beta_2\, rw2_t + e_t \qquad (12.8)$$

In the Regression dialog box, the Input Y Range should be A1:A701, and the Input X Range should be B1:B701. Check the box next to Labels. Select New Worksheet Ply and name it Spurious Regression. Finally select OK.


The result is (see also p. 482 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.83960906
R Square             0.704943374
Adjusted R Square    0.704520657
Standard Error       8.557267989
Observations         700

ANOVA
              df     SS            MS            F             Significance F
Regression    1      122116.5568   122116.5568   1667.647606   3.5686E-187
Residual      698    51112.33113   73.22683543
Total         699    173228.8879

              Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept     17.81804111    0.620477603      28.71665471   2.4603E-120   16.59981498   19.03626723
rw2           0.84204116     0.020619645      40.83684128   3.5686E-187   0.801557201   0.88252513

This result suggests that the simple linear regression model fits the data well (R2 = 0.70), and the
estimated slope is highly significant (tiny p-value). These results are, however, completely
meaningless, or spurious. The apparent significance of the relationship is false.
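To see how easily independent random walks produce a "significant" regression, here is a minimal Python/NumPy sketch that simulates two independent random walks of length 700 and fits equation (12.8) by least squares. The data are hypothetical draws, not the spurious data set, so the estimates will differ from the Excel output; the point is that a large t-statistic and a sizable R² often appear even though the true slope is zero. The helper name simple_ols is an assumption.

```python
import numpy as np


def simple_ols(y, x):
    """OLS of y on a constant and x; returns (b1, b2, t-stat of b2, R^2)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    s2 = resid @ resid / (n - k)                 # residual MS in Excel's ANOVA table
    se_b2 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef[0], coef[1], coef[1] / se_b2, r2


# Two independent random walks of length 700 (hypothetical draws)
rng = np.random.default_rng(2024)
rw1 = np.cumsum(rng.normal(size=700))
rw2 = np.cumsum(rng.normal(size=700))
b1, b2, t_b2, r2 = simple_ols(rw1, rw2)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, t = {t_b2:.2f}, R^2 = {r2:.2f}")
```

Re-running with other seeds is a quick way to see how often |t| exceeds 2 when the series are unrelated.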

12.3 UNIT ROOT TESTS FOR STATIONARITY

The Federal Funds rate (Ft) and the 3-year Bond rate (Bt) series exhibit wandering behavior, so
we suspect that they may be nonstationary variables. In addition, the series fluctuate around a
nonzero mean, so the appropriate Dickey-Fuller test equation is the one that includes a constant
term. Finally, following the procedures described in Sections 9.3 and 9.4 of Principles of
Econometrics, 4e, we find that the inclusion of one lagged difference term is sufficient to
eliminate autocorrelation in the residuals in both cases. The extended test equations are thus as
follows:

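$$\Delta F_t = \alpha + \gamma F_{t-1} + a_1 \Delta F_{t-1} + v_t \qquad (12.9)$$

$$\Delta B_t = \alpha + \gamma B_{t-1} + a_1 \Delta B_{t-1} + v_t \qquad (12.10)$$

The null and alternative hypotheses of the unit root test for stationarity are:

$$H_0: \gamma = 0 \qquad H_1: \gamma < 0$$

where the null hypothesis is that the series is nonstationary.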

Go back to your usa data worksheet. In cells J3:P4 enter the following labels and formulas.

     J          K        L          M    N          O        P
3    ΔFt        Ft-1     ΔFt-1           ΔBt        Bt-1     ΔBt-1
4    =C4-C3     =C3      =C3-C2          =D4-D3     =D3      =D3-D2

Copy the content of cells J4:L4 to cells J5:L105, and the content of cells N4:P4 to cells N5:P105. Here is how your table should look (only the first five values are shown below):

     J        K        L        M    N        O        P
3    ΔFt      Ft-1     ΔFt-1         ΔBt      Bt-1     ΔBt-1
4    0.83     10.56    0.87          0        12.64    1.45
5    -2.12    11.39    0.83          -1.54    12.64    0
6    -0.79    9.27     -2.12         -0.42    11.1     -1.54
7    -0.56    8.48     -0.79         -0.92    10.68    -0.42
8    -0.02    7.92     -0.56         -0.47    9.76     -0.92
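The worksheet formulas in columns J to P simply difference and lag the levels series. A minimal Python/NumPy sketch of the same bookkeeping follows; the function name dickey_fuller_columns is hypothetical, and the six input values are the Federal Funds rates (C2:C7) implied by the first rows of columns J to L.

```python
import numpy as np


def dickey_fuller_columns(levels):
    """Build (Δy_t, y_{t-1}, Δy_{t-1}) the way columns J:L (or N:P) are built.

    The first usable row needs two earlier levels, just as J4:L4 use C2, C3 and C4.
    """
    y = np.asarray(levels, dtype=float)
    dy = np.diff(y)          # Δy_t = y_t - y_{t-1}
    return dy[1:], y[1:-1], dy[:-1]


F = [9.69, 10.56, 11.39, 9.27, 8.48, 7.92]   # C2:C7
dF, F_lag, dF_lag = dickey_fuller_columns(F)
print(np.round(dF, 2), np.round(F_lag, 2), np.round(dF_lag, 2))
```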

In the Regression dialog box, the Input Y Range should be J3:J105, and the Input X Range
should be K3:L105. Check the box next to Labels. Select New Worksheet Ply and name it
Dickey-Fuller Test for F. Finally select OK.


The result of estimated equation (12.9) is (see also p. 487 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.582724854
R Square             0.339568256
Adjusted R Square    0.326226201
Standard Error       0.445349931
Observations         102

ANOVA
              df     SS            MS            F             Significance F
Regression    2      10.09571579   5.047857897   25.45091022   1.20638E-09
Residual      99     19.6353195    0.198336561
Total         101    29.73103529

              Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept     0.172522121    0.100233301      1.721205623   0.088337136   -0.026362489   0.371406731
Ft-1          -0.04462129    0.017814175      -2.50481925   0.013883951   -0.079968478   -0.009274102
ΔFt-1         0.561058175    0.080982746      6.928119927   4.36E-10      0.400370842    0.721745508

The t-statistic (also referred to as τ-statistic in the context of a Dickey-Fuller testing procedure; for more details on that refer to Section 12.3 of Principles of Econometrics, 4e) is -2.505, and the 5% critical value for tau, τc, is -2.86 (value found in Table 12.2 on p. 486 of Principles of Econometrics, 4e). In this case, since -2.505 > -2.86, we do not reject the null hypothesis that the series is nonstationary. In other words, there is insufficient evidence to suggest that F_t is stationary.
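The regression behind (12.9) and (12.10) and the decision rule can be sketched in Python with NumPy as follows. This is a minimal illustration, assuming NumPy: adf_tau and decide are hypothetical helper names, the series is simulated rather than the usa data, and -2.86 is the 5% critical value quoted above. Passing the 104 observations of F_t with lags=1 should give a regression with 102 observations and 99 residual degrees of freedom, as in the Excel output.

```python
import numpy as np


def adf_tau(levels, lags=1, constant=True):
    """Augmented Dickey-Fuller regression
        Δy_t = [α] + γ y_{t-1} + a_1 Δy_{t-1} + ... + a_p Δy_{t-p} + v_t
    estimated by OLS; returns (gamma_hat, tau) with tau = gamma_hat / se(gamma_hat)."""
    y = np.asarray(levels, dtype=float)
    dy = np.diff(y)
    lhs = dy[lags:]                                              # Δy_t (column J)
    cols = [y[lags:-1]]                                          # y_{t-1} (column K)
    cols += [dy[lags - i:len(dy) - i] for i in range(1, lags + 1)]  # Δy_{t-i} (column L)
    if constant:
        cols.insert(0, np.ones(len(lhs)))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ coef
    s2 = resid @ resid / (len(lhs) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    j = 1 if constant else 0                                     # position of gamma in coef
    return coef[j], coef[j] / np.sqrt(cov[j, j])


def decide(tau, critical=-2.86):
    """5% decision rule for a test equation that includes a constant."""
    return "reject H0 (stationary)" if tau < critical else "do not reject H0 (nonstationary)"


# Hypothetical stationary series: y_t = 0.5 y_{t-1} + v_t
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + rng.normal()
gamma_hat, tau = adf_tau(y)
print(round(gamma_hat, 3), round(tau, 2), decide(tau))
```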

Go back to your usa data worksheet.

In the Regression dialog box, the Input Y Range should be N3:N105, and the Input X Range should be O3:P105. Check the box next to Labels. Select New Worksheet Ply and name it Dickey-Fuller Test for B. Finally select OK.


The result of estimated equation (12.10) is (see also p. 487 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.379402286
R Square             0.143946095
Adjusted R Square    0.126652077
Standard Error       0.502618036
Observations         102

ANOVA
              df     SS            MS            F             Significance F
Regression    2      4.205427073   2.102713537   8.32346147    0.000455832
Residual      99     25.0098641    0.25262489
Total         101    29.21529118

              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept     0.23687297     0.129173089      1.833764081    0.069693252   -0.019434455   0.493180396
Bt-1          -0.056241169   0.020808115      -2.702847917   0.008091462   -0.097528981   -0.014953357
ΔBt-1         0.290307786    0.089606852      3.239794507    0.001629198   0.112508357    0.468107216

The t-statistic (also referred to as τ-statistic in the context of a Dickey-Fuller testing procedure) is -2.703, and the 5% critical value for tau, τc, is -2.86 (value found in Table 12.2 on p. 486 of Principles of Econometrics, 4e). In this case again, since -2.703 > -2.86, we do not reject the null hypothesis that the series is nonstationary. In other words, there is insufficient evidence to suggest that B_t is stationary.

Since we found insufficient evidence to suggest that F_t and B_t are stationary, we are interested in determining whether these series can be made stationary by taking their first difference. If we can establish that this is the case, then these series would be integrated of order 1, or I(1). In general, the order of integration of a series is the minimum number of times it must be differenced to make it stationary.

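Because the series ΔF_t and ΔB_t appear to fluctuate around zero, to test the first difference of the Federal Funds rate (ΔF_t = F_t - F_{t-1}) and the first difference of the Bond rate (ΔB_t = B_t - B_{t-1}) for stationarity, we use the following test equations, which include no constant term:

$$\Delta(\Delta F)_t = \gamma \Delta F_{t-1} + v_t \qquad (12.11)$$

$$\Delta(\Delta B)_t = \gamma \Delta B_{t-1} + v_t \qquad (12.12)$$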

Go back to your usa data worksheet. In cells M3:M4 and Q3:Q4 enter the following labels and
formulas.
     M          Q
3    d(ΔF)t     d(ΔB)t
4    =G4-G3     =H4-H3

Copy the content of cell M4 to cells M5:M105, and the content of cell Q4 to cells Q5:Q105. Here is how your table should look (only the first five values are shown below):

     M          Q
3    d(ΔF)t     d(ΔB)t
4    -0.04      -1.45
5    -2.95      -1.54
6    1.33       1.12
7    0.23       -0.5
8    0.54       0.45

In the Regression dialog box, the Input Y Range should be M3:M105, and the Input X Range
should be L3:L105. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it Dickey-Fuller Test for changeF. Finally select OK.


The result of estimated equation (12.11) is (see also p. 488 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.47920942
R Square             0.229641658
Adjusted R Square    0.219740678
Standard Error       0.457601805
Observations         102

ANOVA
              df     SS            MS            F             Significance F
Regression    1      6.304559392   6.304559392   30.10781804   3.09269E-07
Residual      101    21.14934061   0.209399412
Total         102    27.4539

              Coefficients   Standard Error   t Stat         P-value      Lower 95%      Upper 95%
Intercept     0              #N/A             #N/A           #N/A         #N/A           #N/A
ΔFt-1         -0.446986047   0.081461861      -5.487059143   3.0409E-07   -0.608584461   -0.285387633

The t-statistic (also referred to as τ-statistic in the context of a Dickey-Fuller testing procedure) is -5.487, and the 5% critical value for tau, τc, is -1.94 (value found in Table 12.2 on p. 486 of Principles of Econometrics, 4e). In this case, since -5.487 < -1.94, we reject the null hypothesis that the series ΔF_t is nonstationary and accept the alternative that it is stationary.
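Equations (12.11) and (12.12) have no intercept, which is why Constant is Zero is checked. A minimal Python/NumPy sketch of this regression through the origin follows; df_tau_no_constant is a hypothetical name and the input is simulated white noise rather than ΔF_t. The residual variance uses n - 1 degrees of freedom, matching the 101 residual degrees of freedom in the Excel output.

```python
import numpy as np


def df_tau_no_constant(x):
    """Dickey-Fuller test without constant or lagged differences:
        Δx_t = γ x_{t-1} + v_t   (x is itself a first difference, e.g. ΔF_t).
    Returns (gamma_hat, tau) from OLS through the origin."""
    x = np.asarray(x, dtype=float)
    lhs = np.diff(x)          # Δ(Δy)_t, the d(ΔF)t column
    lag = x[:-1]              # Δy_{t-1}, column L
    sxx = lag @ lag
    gamma = (lag @ lhs) / sxx
    resid = lhs - gamma * lag
    s2 = resid @ resid / (len(lhs) - 1)   # one estimated coefficient
    return gamma, gamma / np.sqrt(s2 / sxx)


# Hypothetical check: the differences of a random walk are white noise, so
# gamma should be near -1 and tau far below the 5% critical value of -1.94.
rng = np.random.default_rng(7)
walk = np.cumsum(rng.normal(size=300))
gamma_hat, tau = df_tau_no_constant(np.diff(walk))
print(round(gamma_hat, 2), round(tau, 2), tau < -1.94)
```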

Go back to your usa data worksheet.

In the Regression dialog box, the Input Y Range should be Q3:Q105, and the Input X Range
should be P3:P105. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it Dickey-Fuller Test for changeB. Finally select OK.


The result of estimated equation (12.12) is (see also p. 488 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.606293811
R Square             0.367592185
Adjusted R Square    0.357691195
Standard Error       0.522402752
Observations         102

ANOVA
              df     SS            MS            F             Significance F
Regression    1      16.02143188   16.02143188   58.7070713    1.20231E-11
Residual      101    27.56336812   0.272904635
Total         102    43.5848

              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept     0              #N/A             #N/A           #N/A          #N/A           #N/A
ΔBt-1         -0.70179559    0.091593563      -7.662053993   1.14557E-11   -0.883492774   -0.520098406

The t-statistic (also referred to as τ-statistic in the context of a Dickey-Fuller testing procedure) is -7.662, and the 5% critical value for tau, τc, is -1.94 (value found in Table 12.2 on p. 486 of Principles of Econometrics, 4e). In this case, since -7.662 < -1.94, we reject the null hypothesis that the series ΔB_t is nonstationary and accept the alternative that it is stationary.

These results imply that while the Federal Funds rate (F_t) and the Bond rate (B_t) are nonstationary, their first differences, ΔF_t and ΔB_t, are stationary. We say that the series F_t and B_t are integrated of order 1, I(1).

12.4 COINTEGRATION

As a general rule, nonstationary time-series variables should not be used in regression models, to avoid the problem of spurious regression. However, there is an exception to this rule. If y_t and x_t are nonstationary I(1) variables and their difference, or any linear combination of them such as e_t = y_t - β1 - β2x_t, is a stationary I(0) process, then y_t and x_t are said to be cointegrated. In other words, in this case, there is a fundamental relationship between these two variables, and an estimated regression between them is valid and not spurious.

We have already established in Section 12.3 that F_t and B_t are nonstationary. Now, we would like to test whether these series are cointegrated. The test for cointegration is a test of the stationarity of the residuals ê_t = B_t - b1 - b2F_t, where b1 and b2 are the least squares estimates of the regression of B_t on F_t.

Below we first estimate the regression of B_t on F_t:

$$B_t = \beta_1 + \beta_2 F_t + e_t \qquad (12.13)$$

Go back to your usa data worksheet.

In the Regression dialog box, the Input Y Range should be D1:D105, and the Input X Range should be C1:C105. Check the boxes next to Labels and Residuals; uncheck the box next to Constant is Zero. Select New Worksheet Ply and name it Regression of B on F. Finally select OK.

The result of estimated equation (12.13) is (see also p. 489 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.945824892
R Square             0.894584726
Adjusted R Square    0.893551243
Standard Error       0.81018017
Observations         104

ANOVA
              df     SS            MS            F             Significance F
Regression    1      568.1739601   568.1739601   865.6017148   1.22562E-51
Residual      102    66.95197449   0.656391907
Total         103    635.1259346

              Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept     1.139829659    0.174083328      6.547609544   2.39926E-09   0.794536213   1.485123105
F             0.914411397    0.031080112      29.42111002   1.22562E-51   0.852764144   0.976058651


RESIDUAL OUTPUT

Observation   Predicted B   Residuals
1             10.0004761    1.189523901
2             10.79601401   1.843985985
3             11.55497547   1.085024526

The test for stationarity of the residuals is based on the test equation (12.14), which follows. This is the augmented Dickey-Fuller version of the test equation (12.7) found on p. 489 of Principles of Econometrics, 4e. It includes one lagged term Δê_{t-1} to correct for autocorrelation.

$$\Delta\hat{e}_t = \gamma \hat{e}_{t-1} + a_1 \Delta\hat{e}_{t-1} + v_t \qquad (12.14)$$

Go back to your usa data worksheet.

In cells S1:V4 enter the following labels and formulas.

     S                               T          U           V
1    e-hatt
2    ='Regression of B on F'!C25
3    ='Regression of B on F'!C26     Δe-hatt    e-hatt-1    Δe-hatt-1
4    ='Regression of B on F'!C27     =S4-S3     =S3         =S3-S2

Copy the content of cells S4:V4 to cells S5:V105. Here is how your table should look (only the first five values are shown below):

     S           T          U           V
1    e-hatt
2    1.189524
3    1.843986    Δe-hatt    e-hatt-1    Δe-hatt-1
4    1.085025    -0.75896   1.843986    0.654462
5    1.483577    0.398552   1.085025    -0.75896
6    1.785962    0.302385   1.483577    0.398552
7    1.378032    -0.40793   1.785962    0.302385
8    0.92632     -0.45171   1.378032    -0.40793

In the Regression dialog box, the Input Y Range should be T3:T105, and the Input X Range should be U3:V105. Check the boxes next to Labels and Constant is Zero; uncheck the box next to Residuals. Select New Worksheet Ply and name it Cointegration Test. Finally select OK.


The result of estimated equation (12.14) is (see also p. 489 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.410996241
R Square             0.16891791
Adjusted R Square    0.150607089
Standard Error       0.417281263
Observations         102

ANOVA
              df     SS            MS            F             Significance F
Regression    2      3.539073191   1.769536595   10.16252859   9.67856E-05
Residual      100    17.41236524   0.174123652
Total         102    20.95143843

              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept     0              #N/A             #N/A           #N/A          #N/A           #N/A
e-hatt-1      -0.224509324   0.053503858      -4.196133318   5.88749E-05   -0.330659451   -0.118359196
Δe-hatt-1     0.254044805    0.093700632      2.711236981    0.00789       0.068145425    0.439944185

The t-statistic (also referred to as τ-statistic in the context of a Dickey-Fuller testing procedure) is -4.196, and the 5% critical value for tau, τc, is -3.37 (value found in Table 12.4 on p. 489 of Principles of Econometrics, 4e). In this case, since -4.196 < -3.37, we reject the null hypothesis that the residuals are nonstationary and accept the alternative that they are stationary. This implies that the Bond rate and the Federal Funds rate are cointegrated. In other words, the regression relationship between them, estimated above, is valid.
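The two regressions used in this section, the cointegrating regression (12.13) and the residual test (12.14), can be chained in a short Python/NumPy sketch of this two-step procedure (often called the Engle-Granger test). The helper name engle_granger and the simulated data are assumptions for illustration; -3.37 is the 5% critical value quoted above.

```python
import numpy as np


def engle_granger(y, x, lags=1):
    """Step 1: OLS of y on a constant and x (like the Regression of B on F sheet).
    Step 2: Δê_t = γ ê_{t-1} + a_1 Δê_{t-1} + ... + v_t without a constant, as in (12.14).
    Returns (b1, b2, tau)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b                                    # least squares residuals ê_t
    de = np.diff(e)
    lhs = de[lags:]                                  # Δê_t (column T)
    Z = np.column_stack([e[lags:-1]] + [de[lags - i:len(de) - i] for i in range(1, lags + 1)])
    g, *_ = np.linalg.lstsq(Z, lhs, rcond=None)
    u = lhs - Z @ g
    s2 = u @ u / (len(lhs) - Z.shape[1])
    se_gamma = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[0, 0])
    return b[0], b[1], g[0] / se_gamma


# Hypothetical cointegrated pair: x is a random walk, y = 1 + 0.9 x + stationary noise
rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(size=500))
y = 1.0 + 0.9 * x + rng.normal(size=500)
b1, b2, tau = engle_granger(y, x)
print(round(b2, 3), round(tau, 2), "cointegrated" if tau < -3.37 else "not cointegrated")
```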
CHAPTER 13

Vector Error Correction and Vector Autoregressive Models

CHAPTER OUTLINE
13.1 Estimating a VEC Model
13.1.1 Test for Cointegration
13.1.2 The VEC Model
13.2 Estimating a VAR Model
13.2.1 Test for Cointegration
13.2.2 The VAR Model
13.3 Impulse Responses Functions
13.3.1 The Univariate Case
13.3.2 The Bivariate Case

13.1 ESTIMATING A VEC MODEL

Open the Excel file gdp. Save your file as POE Chapter 13. Rename sheet 1 gdp data.

Insert a new column to the left of the column labeled usa. In your new cells A1:A5, enter the following label and values.

     A
1    q*-year
2    q1-1970
3    q2-1970
4    q3-1970
5    q4-1970

Select cells A2:A5 and move your cursor to the lower right corner of your selection until it turns into a skinny cross; then left-click, hold, and drag it down to cell A125.

Excel recognizes the series and automatically completes it for you. Here is how your table should
look (only the last five values are shown below):

     A
121  q4-1999
122  q1-2000
123  q2-2000
124  q3-2000
125  q4-2000
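Excel's fill handle recognizes the quarter pattern. If you prefer to build the labels programmatically, a small Python sketch that produces the same 124 labels is shown below; the function name quarter_labels is hypothetical.

```python
def quarter_labels(start_year, end_year):
    """Labels 'q1-1970', 'q2-1970', ..., 'q4-2000', matching cells A2:A125."""
    return [f"q{q}-{year}" for year in range(start_year, end_year + 1) for q in range(1, 5)]


labels = quarter_labels(1970, 2000)
print(len(labels), labels[0], labels[-1])  # 124 q1-1970 q4-2000
```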

Next, we plot the time series of the quarterly real GDP of Australia and the United States for the
sample period 1970 to 2000.

Select the Insert tab located next to the Home tab. Select A1:C125. In the Charts group of commands select Line, and Line again.

After editing, the result is (see also Figure 13.1 p. 502 in Principles of Econometrics, 4e):

[Figure: Real Gross Domestic Product (GDP), Australia and United States, q1-1970 to q4-2000]

It appears from the figure above that both series are nonstationary and possibly cointegrated. Formal unit root tests of the series confirm that they are indeed nonstationary.

13.1.1 Test for Cointegration

We proceed as in Section 12.4 to check for cointegration.

We first estimate the regression of Australia's GDP (A) on the United States' GDP (U); the intercept term is omitted because it has no economic meaning:

$$A_t = \beta U_t + e_t \qquad (13.1)$$

In the Regression dialog box, the Input Y Range should be C1:C125, and the Input X Range should be B1:B125. Check the boxes next to Labels, Constant is Zero, and Residuals. Select New Worksheet Ply and name it Regression of ausGDP on usaGDP. Finally select OK.


The result of the estimated equation (13.1) is (see also p. 502 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.999826204
R Square             0.999652439
Adjusted R Square    0.991
Standard Error       1.21937474
Observations         124

ANOVA
              df     SS            MS            F             Significance F
Regression    1      526014.2115   526014.2115   353771.6996   4.4122E-213
Residual      123    182.885596    1.48687476
Total         124    526197.0971

              Coefficients   Standard Error   t Stat        P-value     Lower 95%    Upper 95%
Intercept     0              #N/A             #N/A          #N/A        #N/A         #N/A
usa           0.985349542    0.001656642      594.7871045   1.3E-214    0.98207032   0.988628764


RESIDUAL OUTPUT

Observation   Predicted aus   Residuals
1             37.73997333     0.4955267
2             37.8112141      0.94388589
3             38.14652559     0.624073405

Select the Insert tab located next to the Home tab. Select cells C25:C148 from your Regression of ausGDP on usaGDP worksheet. In the Charts group of commands select Line, and Line again.

After editing, the result is (see also Figure 13.2 p. 502 in Principles of Econometrics, 4e):

[Figure: Residuals derived from the cointegrating relationship]

To compute the first order autocorrelation use the CORREL function as you have done in Chapter 9. In cells E24:E25 of your Regression of ausGDP on usaGDP worksheet, enter the following label and formula.

     E
24   r1
25   =CORREL(C26:C148,C25:C147)

The result is:

r1
0.871647553

Again, note that your Excel results differ slightly from the one reported in Principles of Econometrics, 4e (see Section 9.2.1b for more details on that).
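CORREL correlates the overlapping sub-series e_2..e_T and e_1..e_{T-1}, each centred on its own mean, whereas the usual sample autocorrelation r1 uses the full-sample mean and the full-sample sum of squares in the denominator. The Python/NumPy sketch below computes both; we assume this difference in centring and scaling is what the note above refers to, so treat it as an explanation to check against Section 9.2.1b rather than a quotation of it. The function names are hypothetical.

```python
import numpy as np


def correl_lag1(e):
    """What =CORREL(C26:C148,C25:C147) computes: the Pearson correlation of
    (e_2..e_T) with (e_1..e_{T-1}), each sub-series centred on its own mean."""
    e = np.asarray(e, dtype=float)
    return np.corrcoef(e[1:], e[:-1])[0, 1]


def r1_full_sample(e):
    """Sample autocorrelation using the full-sample mean and sum of squares:
    sum_{t=2}^{T} (e_t - ebar)(e_{t-1} - ebar) / sum_{t=1}^{T} (e_t - ebar)^2."""
    e = np.asarray(e, dtype=float)
    d = e - e.mean()
    return (d[1:] @ d[:-1]) / (d @ d)


e = [1.0, 2.0, 3.0, 4.0, 5.0]
print(correl_lag1(e), r1_full_sample(e))  # close to 1.0 and 0.4
```

On a short trending series the two definitions differ a lot; on 124 residuals the difference is typically small, which is consistent with the slight discrepancy mentioned above.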

The test for stationarity of the residuals is based on the Dickey-Fuller test equation (12.7) found on p. 489 of Principles of Econometrics, 4e. It is restated below:

$$\Delta\hat{e}_t = \gamma \hat{e}_{t-1} + v_t \qquad (13.2)$$

Go back to your gdp data worksheet.

In cells E1:G3 enter the following labels and formulas.

     E                                          F          G
1    e-hatt
2    ='Regression of ausGDP on usaGDP'!C25      Δe-hatt    e-hatt-1
3    ='Regression of ausGDP on usaGDP'!C26      =E3-E2     =E2

Copy the content of cells E3:G3 to cells E4:G125. Here is how your table should look (only the first five values are shown below):

     E           F           G
1    e-hatt
2    0.495527    Δe-hatt     e-hatt-1
3    0.943886    0.448359    0.495527
4    0.624073    -0.31981    0.943886
5    1.156798    0.532725    0.624073
6    0.777263    -0.37954    1.156798
7                -0.14074    0.777263

In the Regression dialog box, the Input Y Range should be F2:F125, and the Input X Range should be G2:G125. Check the boxes next to Labels and Constant is Zero; uncheck the box next to Residuals. Select New Worksheet Ply and name it Cointegration Test for GDPs. Finally select OK.

The result of estimated equation (13.2) is (see also p. 502 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.253071206
R Square             0.064045035
Adjusted R Square    0.055848314
Standard Error       0.598499617
Observations         123

ANOVA
              df     SS            MS            F             Significance F
Regression    1      2.990323011   2.990323011   8.34815202    0.004576407
Residual      122    43.70061853   0.358201792
Total         123    46.69094154

              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept     0              #N/A             #N/A           #N/A          #N/A           #N/A
e-hatt-1      -0.127936484   0.044279146      -2.889316878   0.004570296   -0.215591475   -0.040281493

The t-statistic (also referred to as τ-statistic in the context of a Dickey-Fuller testing procedure) is -2.889, and the 5% critical value for tau, τc, is -2.76 (value found in Table 12.4 on p. 489 of Principles of Econometrics, 4e). In this case, since -2.889 < -2.76, we reject the null hypothesis that the residuals are nonstationary and accept the alternative that they are stationary. This implies that Australia's GDP and the United States' GDP are cointegrated. In other words, the regression relationship between them, estimated above, is valid.

According to the estimated equation (13.1), if the United States' GDP increases by one unit, the
GDP of Australia would increase by 0.985 of a unit. But the Australian economy may not
respond fully by this amount within the quarter. To ascertain how much it will respond within a
quarter, we estimate the vector error correction model.

13.1.2 The VEC Model

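The vector error correction model (VEC model) for Australia's GDP (A_t) and the United States' GDP (U_t) is as follows:

$$\Delta A_t = \alpha_{10} + \alpha_{11}\hat{e}_{t-1} + v_t^A \qquad (13.3)$$

$$\Delta U_t = \alpha_{20} + \alpha_{21}\hat{e}_{t-1} + v_t^U \qquad (13.4)$$

where ê_{t-1} are the lagged residuals from estimated equation (13.1).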
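The two-step VEC estimation can be summarized in a short Python/NumPy sketch: fit (13.1) through the origin, form the residuals, and regress ΔA_t and ΔU_t on a constant and ê_{t-1}. The function name estimate_vec and the simulated data are illustrative assumptions; with the gdp data you would pass the aus and usa columns instead.

```python
import numpy as np


def estimate_vec(A, U):
    """Step 1: A_t = beta * U_t + e_t without intercept, equation (13.1).
    Step 2: regress ΔA_t and ΔU_t on a constant and ê_{t-1}, equations (13.3) and (13.4).
    Returns (beta_hat, (alpha10, alpha11), (alpha20, alpha21))."""
    A = np.asarray(A, dtype=float)
    U = np.asarray(U, dtype=float)
    beta = (U @ A) / (U @ U)                              # OLS through the origin
    e = A - beta * U                                      # ê_t, column E of gdp data
    Z = np.column_stack([np.ones(len(A) - 1), e[:-1]])    # constant and ê_{t-1}
    coef_A, *_ = np.linalg.lstsq(Z, np.diff(A), rcond=None)  # ΔA_t, column J
    coef_U, *_ = np.linalg.lstsq(Z, np.diff(U), rcond=None)  # ΔU_t, column I
    return beta, tuple(coef_A), tuple(coef_U)


# Hypothetical data: U is a random walk with drift and A tracks 0.98 U with an AR(1) gap
rng = np.random.default_rng(11)
n = 500
U = 40 + np.cumsum(0.5 + rng.normal(size=n))
gap = np.zeros(n)
for t in range(1, n):
    gap[t] = 0.5 * gap[t - 1] + rng.normal()
A = 0.98 * U + gap
beta, (a10, a11), (a20, a21) = estimate_vec(A, U)
print(round(beta, 3), round(a11, 3), round(a21, 3))
```

In this simulated setup only A adjusts to the gap, so α11 comes out clearly negative while α21 stays near zero, the same kind of pattern the two Excel regressions below are designed to reveal.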

Go back to your gdp data worksheet.

In cells I2:J3 enter the following labels and formulas.

     I          J
2    Δusa       Δaus
3    =B3-B2     =C3-C2

Copy the content of cells I3:J3 to cells I4:J125. Here is how your table should look (only the first five values are shown below):

     I           J
2    Δusa        Δaus
3    0.0723      0.5196
4    0.340297    0.015499
5    -0.4146     0.124199
6    1.062401    0.667301
7    0.222099    0.078103

In the Regression dialog box, the Input Y Range should be J2:J125, and the Input X Range
should be G2:G125. Check the box next to Labels; uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it VEC Model Eq. for ausGDP. Finally select OK.

The result of estimated equation (13.3) is (see also p. 503 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.1855624
R Square             0.0344334
Adjusted R Square    0.0264535
Standard Error       0.6408786
Observations         123

ANOVA
              df     SS            MS            F             Significance F
Regression    1      1.772289218   1.772289218   4.315022349   0.039892949
Residual      121    49.69777165   0.410725386
Total         122    51.47006087

              Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept     0.491705874    0.057909469      8.490940752   6.12439E-14   0.377058807    0.606352942
e-hatt-1      -0.098702599   0.0475157        -2.077263     0.039892949   -0.192772639   -0.004633

Go back to your gdp data worksheet.

In the Regression dialog box, the Input Y Range should be I2:I125, and the Input X Range
should be G2:G125. Check the box next to Labels; uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it VEC Model Eq. for usaGDP. Finally select OK.

The result of estimated equation (13.4) is (see also p. 503 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.07162
R Square              0.005129323
Adjusted R Square     -0.003093
Standard Error        0.516568014
Observations          123

ANOVA
              df    SS            MS            F        Significance F
Regression     1    0.1664693     0.1664693     0.6238   0.431165838
Residual     121    32.2879441    0.2668425
Total        122    32.45441342

              Coefficients    Standard Error   t Stat        P-value        Lower 95%       Upper 95%
Intercept     0.509834284     0.046677         10.923        9.50767E-20    0.417475195     0.602293373
e-hat_t-1     0.030250241     0.038299159      0.789840853   0.431165838    -0.045573046    0.106073527

13.2 ESTIMATING A VAR MODEL

Open the Excel file fred. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 13 in one file, create a new worksheet in your POE
Chapter 13 Excel file, rename it fred data, and copy into it the data set you just opened.

Insert a new column to the left of column A. In your new cells A1:A5, enter the
following label and values.

     A
1    q*-year
2    q1-1960
3    q2-1960
4    q3-1960
5    q4-1960

Select cells A2:A5, move your cursor to the lower right corner of your selection until it turns into
a skinny cross; left-click, hold it and drag it down to cell A201.

Excel recognizes the series and automatically completes it for you. Here is how your table should
look (only the last five values are shown below):
     A
197  q4-2008
198  q1-2009
199  q2-2009
200  q3-2009
201  q4-2009

Next, we plot the time series of the quarterly log of Real Personal Disposable Income (denoted as
Y, or ly in your Excel file) and the log of Real Personal Consumption Expenditure (denoted as C,
or lc in your Excel file) for the US economy over the period 1960:1 to 2009:4.

Select the Insert tab located next to the Home tab. Select A1:C201. In the Charts group of
commands select Line, and Line again.

After editing, the result is (see also Figure 13.3 p. 504 in Principles of Econometrics, 4e):

[Figure: Real Personal Disposable Income and Real Personal Consumption Expenditure (in log), 1960:1 to 2009:4]

It appears from the figure above that both series are nonstationary.

13.2.1 Test for Cointegration

We proceed as in Section 12.4 to check for cointegration.

We first estimate the regression of the log of Real Personal Consumption Expenditure ($C$) on the
log of Real Personal Disposable Income ($Y$) for the US economy:

$$C_t = \beta_1 + \beta_2 Y_t + e_t \qquad (13.5)$$

In the Regression dialog box, the Input Y Range should be B1:B201, and the Input X Range
should be C1:C201. Check the boxes next to Labels and Residuals. Select New Worksheet Ply
and name it Regression of C on Y. Finally select OK.

The result of the estimated equation (13.5) is (see also p. 503 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.999198794
R Square              0.998398229
Adjusted R Square     0.998390139
Standard Error        0.019680429
Observations          200

ANOVA
              df    SS            MS            F             Significance F
Regression     1    47.80108067   47.80108067   123415.1893   1.02061E-278
Residual     198    0.076689215   0.000387319
Total        199    47.87776988

              Coefficients    Standard Error   t Stat        P-value        Lower 95%       Upper 95%
Intercept     -0.404162766    0.0250534        -16.1320464   6.24675E-38    -0.453568527    -0.354757006
ly            1.035287621     0.002946977      351.305       1.0206E-278    1.029476131     1.041099111

RESIDUAL OUTPUT

Observation   Predicted lc   Residuals
1             7.441662       0.037355
2             7.447259       0.044331
3             7.441153       0.039469

The test for stationarity of the residuals is based on the Dickey-Fuller test equation (12.7) found
on p. 489 of Principles of Econometrics, 4e. It is restated below (and includes the extra term
$\Delta\hat{e}_{t-1}$):

$$\Delta\hat{e}_t = \gamma\hat{e}_{t-1} + a_1\Delta\hat{e}_{t-1} + v_t \qquad (13.6)$$

Go back to your fred data worksheet.

In cells E1:H4 enter the following labels and formulas.

     E                               F           G            H
1    e-hat_t
2    ='Regression of C on Y'!C25     Δe-hat_t
3    ='Regression of C on Y'!C26     =E3-E2      e-hat_t-1    Δe-hat_t-1
4                                                =E3          =F3

Copy the content of cells E3:F3 to cells E4:F201 and cells G4:H4 to cells G5:H201. Here is
how your table should look (only the first few values are shown below):

     E           F            G            H
1    e-hat_t
2    0.037355    Δe-hat_t
3    0.044331    0.006976     e-hat_t-1    Δe-hat_t-1
4    0.039469    -0.004862    0.044331     0.006976
5    0.041443    0.001975     0.039469     -0.004862
6    0.031783    -0.009661    0.041443     0.001975
7    0.03107     -0.000713    0.031783     -0.009661
8    0.021871    -0.009198    0.03107      -0.000713

In the Regression dialog box, the Input Y Range should be F3:F201, and the Input X Range
should be G3:H201. Check the boxes next to Labels and Constant is Zero; uncheck the box
next to Residuals. Select New Worksheet Ply and name it Cointegration Test for C and Y.
Finally select OK.

The result of estimated equation (13.6) is (see also p. 503 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.388877197
R Square              0.151225474
Adjusted R Square     0.141792951
Standard Error        0.008189208
Observations          198

ANOVA
              df    SS            MS            F             Significance F
Regression     2    0.002341922   0.001170961   17.46058111   1.05786E-07
Residual     196    0.013144373   6.70631E-05
Total        198    0.015486295

              Coefficients    Standard Error   t Stat        P-value        Lower 95%       Upper 95%
Intercept     0               #N/A             #N/A          #N/A           #N/A            #N/A
e-hat_t-1     -0.087647619    0.030508415      -2.87289975   0.004515       -0.147814521    -0.027480717
Δe-hat_t-1    -0.299406355    0.067160619      -4.45805428   1.39045E-05    -0.431856576    -0.166956134

The t-statistic (referred to as the τ-statistic in the context of a Dickey-Fuller test) is
-2.873, and the 5% critical value for tau, τc, is -3.37 (value found in Table 12.4 on p. 489 of
Principles of Econometrics, 4e). Since -2.873 > -3.37, we cannot reject the null hypothesis that
the residuals are nonstationary, which indicates that the relationship between C (i.e., ln(RPCE))
and Y (i.e., ln(RPDI)) is spurious. That is, we find no evidence of cointegration. Thus we do not
apply a VEC model to examine the dynamic relationship between the log of Real Personal
Disposable Income Y and the log of Real Personal Consumption Expenditure C. Instead we
estimate a VAR model for the set of I(0) variables {ΔY_t, ΔC_t}.
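The two-step procedure above (OLS of C on Y, then the Dickey-Fuller regression (13.6) on the residuals, with no constant) can be sketched in Python as follows. This is a minimal NumPy illustration on simulated, cointegrated series; the helper names are hypothetical, and the critical value -3.37 is the 5% value quoted above, not something the code computes.

```python
import numpy as np

def ols_t(y, X):
    """OLS coefficients, t-statistics and residuals."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se, resid

def engle_granger_tau(c, y):
    """Step 1: OLS of c on a constant and y, as in (13.5).
    Step 2: regress Δe_t on e_{t-1} and Δe_{t-1} with no constant, as in (13.6).
    Returns the tau (t) statistic on e_{t-1}."""
    X1 = np.column_stack([np.ones_like(y), y])
    _, _, e = ols_t(c, X1)
    de = np.diff(e)                            # Δe_t for t = 2..T
    dep = de[1:]                               # Δe_t for t = 3..T (T - 2 observations)
    X2 = np.column_stack([e[1:-1], de[:-1]])   # e_{t-1} and Δe_{t-1}
    _, t_stats, _ = ols_t(dep, X2)
    return t_stats[0]

# Simulated example: y is a random walk and c is cointegrated with it
rng = np.random.default_rng(1)
T = 500
y = np.cumsum(rng.normal(size=T))
c = 0.2 + 1.0 * y + rng.normal(scale=0.5, size=T)
tau = engle_granger_tau(c, y)
TAU_CRIT_5PCT = -3.37                          # 5% critical value quoted in the text
print(round(tau, 2), "cointegrated" if tau < TAU_CRIT_5PCT else "no evidence of cointegration")
```

With 200 observations, as in the fred data, step 2 uses 198 observations, which matches the Observations count in the Excel output.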

13.2.2 The VAR Model

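The vector autoregressive model (VAR model) for the log of US Real Personal Disposable
Income ($Y_t$) and the log of US Real Personal Consumption Expenditure ($C_t$) is as follows. For
illustrative purposes, the order of the lag in this example has been restricted to 1.

$$\Delta Y_t = \beta_{10} + \beta_{11}\Delta C_{t-1} + \beta_{12}\Delta Y_{t-1} + v_t^Y \qquad (13.7)$$

$$\Delta C_t = \beta_{20} + \beta_{21}\Delta C_{t-1} + \beta_{22}\Delta Y_{t-1} + v_t^C \qquad (13.8)$$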
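Because both equations of the VAR have the same right-hand-side variables, the system can be estimated one equation at a time by OLS, which is what the two Excel regressions in this section do. A minimal NumPy sketch on simulated data follows. The simulated coefficients are chosen to resemble the estimates reported in this section, and `var1_ols` is a hypothetical helper.

```python
import numpy as np

def var1_ols(dc, dy):
    """Equation-by-equation OLS for a VAR(1) in (ΔC, ΔY).
    Both equations use the same regressors: a constant, ΔC_{t-1} and ΔY_{t-1}."""
    X = np.column_stack([np.ones(len(dc) - 1), dc[:-1], dy[:-1]])
    coef_y, *_ = np.linalg.lstsq(X, dy[1:], rcond=None)   # ΔY_t equation, (13.7)
    coef_c, *_ = np.linalg.lstsq(X, dc[1:], rcond=None)   # ΔC_t equation, (13.8)
    return coef_y, coef_c

# Simulated stationary VAR(1), started at zero
rng = np.random.default_rng(2)
n = 10000
dc = np.zeros(n)
dy = np.zeros(n)
for t in range(1, n):
    dc[t] = 0.005 + 0.216 * dc[t - 1] + 0.149 * dy[t - 1] + rng.normal(scale=0.007)
    dy[t] = 0.006 + 0.475 * dc[t - 1] - 0.217 * dy[t - 1] + rng.normal(scale=0.009)

coef_y, coef_c = var1_ols(dc, dy)
print("ΔY eq [const, ΔC(t-1), ΔY(t-1)]:", coef_y.round(3))
print("ΔC eq [const, ΔC(t-1), ΔY(t-1)]:", coef_c.round(3))
```

The regressor columns are ordered as in K:L of the worksheet (ΔC_t-1, then ΔY_t-1), so the coefficients line up with the rows of the Excel output.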

Go back to your fred data worksheet.

In cells I3:L4 enter the following labels and formulas.

     I          J          K          L
3    ΔC_t       ΔY_t       ΔC_t-1     ΔY_t-1
4    =B4-B3     =C4-C3     =B3-B2     =C3-C2

Copy the content of cells I4:L4 to cells I5:L201. Here is how your table should look (only the
first five values are shown below):

     I           J           K           L
3    ΔC_t        ΔY_t        ΔC_t-1      ΔY_t-1
4    -0.003968   0.000864    0.01573     0.005466
5    0.001343    -0.00061    -0.003968   0.000864
6    -0.00028    0.009061    0.001343    -0.00061
7    0.01477     0.014955    -0.00028    0.009061
8    0.004839    0.013559    0.01477     0.014955

In the Regression dialog box, the Input Y Range should be J3:J201, and the Input X Range
should be K3:L201. Check the box next to Labels; uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it VAR Model Eq. for Y. Finally select OK.

The result of estimated equation (13.7) is (see also p. 504 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.334387691
R Square              0.111815128
Adjusted R Square     0.10270554
Standard Error        0.008561528
Observations          198

ANOVA
              df    SS            MS            F             Significance F
Regression     2    0.001799428   0.000899714   12.27444346   9.52969E-06
Residual     195    0.014293454   7.32998E-05
Total        197    0.016092881

              Coefficients    Standard Error   t Stat        P-value        Lower 95%       Upper 95%
Intercept     0.006036673     0.000986078      6.1219        4.98959E-09    0.004091927     0.00798142
ΔC_t-1        0.475427604     0.097320409      4.8852        2.15226E-06    0.283480071     0.667375137
ΔY_t-1        -0.217167947    0.075172994      -2.8889       0.004303069    -0.365424427    -0.068911466

Go back to your fred data worksheet.

In the Regression dialog box, the Input Y Range should be I3:I201, and the Input X Range
should be K3:L201. Check the box next to Labels; uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it VAR Model Eq. for C. Finally select OK.

The result of estimated equation (13.8) is (see also p. 504 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.3471124
R Square              0.120487027
Adjusted R Square     0.111466381
Standard Error        0.006575419
Observations          198

ANOVA
              df    SS            MS            F         Significance F
Regression     2    0.001154993   0.000577497   13.3571   3.66117E-06
Residual     195    0.008431046   4.32351E-05
Total        197    0.009586039

              Coefficients    Standard Error   t Stat        P-value        Lower 95%       Upper 95%
Intercept     0.005277615     0.000757327      6.96874       4.80599E-11    0.003784012     0.006771218
ΔC_t-1        0.215606801     0.07474856       2.884427484   0.0044         0.068187392     0.363026211
ΔY_t-1        0.149379832     0.05773431       2.5874        0.010398562    0.035515994     0.26324367

13.3 IMPULSE RESPONSE FUNCTIONS

13.3.1 The Univariate Case

Consider the following autoregressive model of order 1 (AR(1) model):

$$y_t = \rho y_{t-1} + v_t \qquad (13.9)$$

where $\rho = 0.9$, $y_0 = 0$, $v_1 = 1$ and $v_t = 0$ for $t > 1$. Note: because $y_0 = 0$ and $v_1 = 1$, $y_1 = 1$;
because $v_t = 0$ for $t > 1$, $y_t = \rho y_{t-1}$ for $t > 1$.
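The worksheet recursion in column D can be written in a few lines of Python. This is a minimal sketch, and the function name is illustrative:

```python
def ar1_impulse_response(rho, periods):
    """y_t = rho * y_{t-1} + v_t with y_0 = 0, v_1 = 1 and v_t = 0 for t > 1."""
    y = [1.0]                       # y_1 = rho * 0 + 1
    for _ in range(periods - 1):
        y.append(rho * y[-1])       # y_t = rho * y_{t-1} for t > 1
    return y

irf = ar1_impulse_response(0.9, 30)     # the 30 values in cells D2:D31
print([round(v, 4) for v in irf[:5]])   # [1.0, 0.9, 0.81, 0.729, 0.6561]
```

Equivalently, $y_t = \rho^{t-1}$, so the response decays geometrically toward zero whenever $|\rho| < 1$.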

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it simulated data.


In cells A1:D3 enter the following labels, values and formula.

     A     B     C   D
1    ρ =   0.9       y_t
2                    1
3                    =$B$1*D2

Copy the content of cell D3 to cells D4:D31. Here is how your table should look (only the first
five values are shown below):

     A     B     C   D
1    ρ =   0.9       y_t
2                    1
3                    0.9
4                    0.81
5                    0.729
6                    0.6561

Select the Insert tab located next to the Home tab. Select D1:D31. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 13.4 p. 505 in Principles of Econometrics, 4e):

[Figure: Impulse responses for AR(1) model, periods 1 to 30]

13.3.2 The Bivariate Case

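Consider the following bivariate VAR system of stationary variables:

$$y_t = \delta_{10} + \delta_{11} y_{t-1} + \delta_{12} x_{t-1} + v_t^y \qquad (13.10)$$

$$x_t = \delta_{20} + \delta_{21} y_{t-1} + \delta_{22} x_{t-1} + v_t^x \qquad (13.11)$$

where the errors $v_t^y$ and $v_t^x$ are independent of each other (contemporaneously uncorrelated);
$v^y \sim N(0, \sigma_y^2)$ and $v^x \sim N(0, \sigma_x^2)$.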

In this case, there are two possible shocks to the system: one to y and the other to x. Thus we are
interested in four impulse response functions: the effect of a shock to y on the time paths of y
and x, and the effect of a shock to x on the time paths of y and x.

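First, let us consider what happens when there is a one standard deviation shock to y, so that
$v_1^y = \sigma_y$ and $v_t^y = 0$ for $t > 1$; assume $v_t^x = 0$ for all $t$.

We further assume the following numerical values: $y_0 = x_0 = 0$, $\sigma_y = 1$, $\delta_{10} = \delta_{20} = 0$,
$\delta_{11} = 0.7$ and $\delta_{12} = 0.2$, $\delta_{21} = 0.3$ and $\delta_{22} = 0.6$.

Note: this implies $y_1 = 1$ and $x_1 = 0$. For $t > 1$, $y_t$ and $x_t$ are given by equations (13.12) and
(13.13):

$$y_t = \delta_{10} + \delta_{11} y_{t-1} + \delta_{12} x_{t-1} \qquad (13.12)$$

$$x_t = \delta_{20} + \delta_{21} y_{t-1} + \delta_{22} x_{t-1} \qquad (13.13)$$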
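Columns J:K of the worksheet (and, for the shock to x discussed later in this section, columns N:O) simply iterate (13.12)-(13.13) from a starting pair $(y_1, x_1)$. A minimal Python sketch follows, using the δ values assumed above. `var_impulse_response` is a hypothetical helper, and the shock-to-x case uses $y_1 = 0$ and $x_1 = \sigma_x = 2$, as in the text that follows.

```python
def var_impulse_response(delta, y1, x1, periods):
    """Iterate (13.12)-(13.13) from (y_1, x_1); delta = (d10, d11, d12, d20, d21, d22)."""
    d10, d11, d12, d20, d21, d22 = delta
    ys, xs = [y1], [x1]
    for _ in range(periods - 1):
        y_prev, x_prev = ys[-1], xs[-1]
        ys.append(d10 + d11 * y_prev + d12 * x_prev)   # (13.12)
        xs.append(d20 + d21 * y_prev + d22 * x_prev)   # (13.13)
    return ys, xs

delta = (0.0, 0.7, 0.2, 0.0, 0.3, 0.6)                   # cells G1:G6
y_y, x_y = var_impulse_response(delta, 1.0, 0.0, 30)     # shock to y (sigma_y = 1)
y_x, x_x = var_impulse_response(delta, 0.0, 2.0, 30)     # shock to x (sigma_x = 2)
print([round(v, 4) for v in y_y[:5]])                    # [1.0, 0.7, 0.55, 0.463, 0.4039]
```

Both updates use the previous period's pair, which mirrors the relative references J2/K2 in the formulas in row 3.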

In cells F1:G6 enter the following labels and values.

     F       G
1    δ10 =   0
2    δ11 =   0.7
3    δ12 =   0.2
4    δ20 =   0
5    δ21 =   0.3
6    δ22 =   0.6

In cells I1:K3 enter the following labels, values and formulas. In the last row, you will find the
numbers of the equations used, if any.

     I             J                          K
1    Shock to y:   y_t                        x_t
2                  1                          0
3                  =$G$1+$G$2*J2+$G$3*K2      =$G$4+$G$5*J2+$G$6*K2
                   (13.12)                    (13.13)

Copy the content of cells J3:K3 to cells J4:K31. Here is how your table should look (only the
first five values are shown below):

     F       G     H   I             J        K
1    δ10 =   0         Shock to y:   y_t      x_t
2    δ11 =   0.7                     1        0
3    δ12 =   0.2                     0.7      0.3
4    δ20 =   0                       0.55     0.39
5    δ21 =   0.3                     0.463    0.399
6    δ22 =   0.6                     0.4039   0.3783

Select the Insert tab located next to the Home tab. Select J1:J31. In the Charts group of
commands select Line, and Line again.
After editing, the result is (see also Figure 13.5 p. 507 in Principles of Econometrics, 4e). We also
show the response of x to y, plotted by selecting cells K1:K31.

[Figure: Response of y to y (left panel) and Response of x to y (right panel), periods 1 to 30]

Note that the figures above look slightly different from the ones found in Principles of
Econometrics, 4e. The difference arises because in the above figures we did not plot
the y_0 and x_0 values, but started instead with y_1 and x_1.

Next, let us consider what happens when there is a one standard deviation shock to x, so that
$v_1^x = \sigma_x$ and $v_t^x = 0$ for $t > 1$; assume $v_t^y = 0$ for all $t$.

We further assume the following numerical value: $\sigma_x = 2$. Note: this implies $y_1 = 0$ and $x_1 = 2$.
For $t > 1$, $y_t$ and $x_t$ are given by equations (13.12) and (13.13), restated below:

$$y_t = \delta_{10} + \delta_{11} y_{t-1} + \delta_{12} x_{t-1} \qquad (13.12)$$

$$x_t = \delta_{20} + \delta_{21} y_{t-1} + \delta_{22} x_{t-1} \qquad (13.13)$$

In cells M1:O3 enter the following labels, values and formulas. In the last row, you will find the
numbers of the equations used, if any.

     M             N                          O
1    Shock to x:   y_t                        x_t
2                  0                          2
3                  =$G$1+$G$2*N2+$G$3*O2      =$G$4+$G$5*N2+$G$6*O2
                   (13.12)                    (13.13)

Copy the content of cells N3:O3 to cells N4:O31. Here is how your table should look (only the
first five values are shown below):

     M             N        O
1    Shock to x:   y_t      x_t
2                  0        2
3                  0.4      1.2
4                  0.52     0.84
5                  0.532    0.66
6                  0.5044   0.5556

We plot the response of y to x by selecting cells N1:N31, and the response of x to x by selecting
cells O1:O31. After editing, the result is (see also Figure 13.5 p. 507 in Principles of
Econometrics, 4e):

[Figure: Response of y to x (left panel) and Response of x to x (right panel), periods 1 to 30]
CHAPTER 14

Time-Varying Volatility and ARCH Models

CHAPTER OUTLINE
14.1 Time-Varying Volatility
  14.1.1 Returns Data
  14.1.2 Simulated Data
14.2 Testing and Forecasting
  14.2.1 Testing for ARCH Effects
    14.2.1a Time Series and Histogram
    14.2.1b Lagrange Multiplier Test
  14.2.2 Forecasting Volatility
14.3 Extensions
  14.3.1 The GARCH Model
  14.3.2 The T-GARCH Model
  14.3.3 The GARCH-In-Mean Model

14.1 TIME-VARYING VOLATILITY

14.1.1 Returns Data

Open the Excel file returns. Save your file as POE Chapter 14. Rename Sheet 1 as returns data.

Insert a new column to the left of the column labeled nasdaq. In your new cells A1:A13, enter
the following label and values.

     A
1    m*-year
2    m1-1988
3    m2-1988
4    m3-1988
5    m4-1988
6    m5-1988
7    m6-1988
8    m7-1988
9    m8-1988
10   m9-1988
11   m10-1988
12   m11-1988
13   m12-1988

Select cells A2:A13, move your cursor to the lower right corner of your selection until it turns
into a skinny cross, left-click, hold it and drag it down to cell A272.

Excel recognizes the series and automatically completes it for you. Here is how your table should
look (only the last five values are shown below):

     A
268  m3-2010
269  m4-2010
270  m5-2010
271  m6-2010
272  m7-2010

Next, we plot the time series of the monthly returns to the United States Nasdaq stock price index
(NASDAQ).

Select the Insert tab located next to the Home tab. Select A1:B272. In the Charts group of
commands select Line, and Line again.

After editing, the result is (see also Figure 14.1(a) p. 520 in Principles of Econometrics, 4e):

[Figure: United States: Nasdaq (monthly returns)]

The values of this series change rapidly from period to period in an apparently unpredictable
manner; we say the series is volatile. Furthermore, there are periods when large changes are
followed by further large changes and periods when small changes are followed by further small
changes. In this case, the series is said to display time-varying volatility as well as "clustering" of
changes.

Next, we plot a histogram of the returns.

We proceed as we have done before in Section 4.6.1. First, we create a BIN column. In cell G1,
type BIN. The bin values will determine the range of values for each column of the histogram.
The bin values have to be given in ascending order. A value is counted in a particular bin if it is
greater than the previous bin value and less than or equal to this bin value.
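The bin rule can be made concrete with a minimal Python sketch. It assumes the behavior described above (a value lands in the first bin whose value it does not exceed) plus the extra "More" row that Excel's Histogram output adds for values above the last bin. `excel_bin_counts` is a hypothetical name.

```python
import bisect

def excel_bin_counts(values, bins):
    """With bins in ascending order, value v goes to the first bin b_i with v <= b_i
    (so b_{i-1} < v <= b_i); values above the last bin go to an extra 'More' count."""
    counts = [0] * (len(bins) + 1)                  # last slot is 'More'
    for v in values:
        counts[bisect.bisect_left(bins, v)] += 1    # index of first bin >= v
    return counts

bins = [-30 + 2.5 * i for i in range(25)]           # the 25 bin values in G2:G26
counts = excel_bin_counts([-31, -30, -29, 0, 30, 31], bins)
print(counts[0], counts[1], counts[12], counts[-1])   # 2 1 1 1
```

Note that the first bin also collects every value below -30, so a value of -31 is counted in the -30 bin.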

Note that econometric packages such as SAS or Stata automate the choice of the number and
width of bins. Thus the figures they produce might differ slightly from ours.

Fill in the bin values from -30 to 30 in steps of 2.5 in cells G2:G26. All you need to do is enter
the first two values, -30 and -27.5, select cells G2:G3, move your cursor to the lower right corner
of your selection until it turns into a skinny cross, left-click, hold it and drag it down to cell G26.
Excel recognizes the series and automatically completes it for you.

Select the Data tab, in the middle of your tab list. In the Analysis group of commands at the far
right of the ribbon, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
bar to the right of the Analysis Tools window to find it), then select OK.

A Histogram dialog box pops up. For the Input Range, specify B2:B272; for the Bin Range,
specify G2:G26. The Input Range indicates the data set Excel will look at to determine how
many values are counted in each bin of the Bin Range. Check the New Worksheet Ply option
and name it US Nasdaq Histogram; check the box next to Chart Output. Finally, select OK.

Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width slider
and move it to the far left, towards No Gap. Select Close.

Go to the Border Color tab, select Solid line, and choose a different Color if you like.
Select Close.

Finally, delete the Legend, and increase the size of the Chart area (see Section 2.3.4 for more
details on that). After editing, the result is (see Figure 14.2(a) p. 521 in Principles of
Econometrics, 4e):
[Figure: United States: Nasdaq, histogram of monthly returns (frequency by bin, -30 to 30)]

We would like to draw a normal distribution on top of this histogram so we can better assess
whether or not the returns display normal properties.

Go back to your returns data worksheet. In cells I1:J4, enter the following labels and formulas.

     I                      J
1    Nasdaq
2    sample mean =          =AVERAGE(B2:B272)
3    sample variance =      =VAR(B2:B272)
4    standard deviation =   =SQRT(J3)

In cells L1:M3, enter the following labels, values and formula.

     L           M
1    Mid-point   NormalNasdaq
2    -31.25      =NORMDIST(L2,$J$2,$J$4,FALSE)
3    -28.75

In column L, we specify the mid-point or mid-value of the bins or class intervals we used to
construct the US Nasdaq histogram. In column M, we compute the normal distribution values
corresponding to those mid-point values, where the normal distribution is specified to have a
mean and variance corresponding to the sample mean and variance of the monthly returns of US
Nasdaq.
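NORMDIST(x, mean, standard_dev, FALSE) returns the normal probability density at x. A minimal Python equivalent for the mid-points in column L is sketched below, using the sample mean and standard deviation reported in J2 and J4. The function name `normdist_pdf` is ours, not Excel's.

```python
import math

def normdist_pdf(x, mean, sd):
    """Normal density at x, the value NORMDIST(x, mean, sd, FALSE) returns."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

mean, sd = 0.708548, 6.808318                        # J2 and J4
midpoints = [-31.25 + 2.5 * i for i in range(25)]    # L2:L26
density = [normdist_pdf(m, mean, sd) for m in midpoints]
print(f"{density[2]:.5E}")                           # density at -26.25
```

The densities are much smaller than the bin counts, which is why the steps below put this series on a secondary axis. Alternatively, multiplying each density by n × bin width (271 × 2.5 here) converts it to the expected count per bin under normality.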

Select cells L2:L3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross, left-click, hold it and drag it down to cell L26. Excel recognizes the series and
automatically completes it for you.

Copy cell M2 to cells M3:M26. Here is how your table should look (only the first five values are
shown below):

     I                      J          K   L           M
1    Nasdaq                                Mid-point   NormalNasdaq
2    sample mean =          0.708548       -31.25      9.62135E-07
3    sample variance =      46.35319       -28.75      5.041E-06
4    standard deviation =   6.808318       -26.25      2.30812E-05
5                                          -23.75      9.2349E-05
6                                          -21.25      0.00032288

Go back to your US Nasdaq Histogram worksheet. Select the histogram, right-click and choose
Select Data from the list of options that pops up. In the Select Data Source dialog box, select Add.
In the Edit Series dialog box, specify the Series values to be M2:M26 from the returns data
worksheet (='returns data'!$M$2:$M$26). Finally select OK. The Select Data Source dialog box
reappears. Select OK one more time.

The series you just added is barely visible, at the bottom of your plot area. Select it, right-click
and select Change Series Chart Type. In the Change Chart Type dialog box, select Line. In
the list of Line charts, select Line again. Finally select OK.


Your series is now a little more visible, still at the bottom of your plot area. Select it again,
right-click and select Format Data Series this time. In the Series Options, select Secondary
Axis.
In the Line Style options, select Smoothed Line. Finally, select Close.

Select the right vertical axis, right-click and select Format Axis. Select the Axis Options tab,
and specify a Fixed Minimum of 0.0 and a Fixed Maximum of 0.09. Finally select Close.


The result is (see also Figure 14.2(a) p. 521 in Principles of Econometrics, 4e):

[Figure: United States: Nasdaq, histogram of monthly returns with normal density overlay (frequency on the left axis, density from 0 to 0.09 on the right axis)]

Note that, compared with the normal distribution, there are more observations around the mean
and in the tails. Distributions with these properties (more peaked around the mean and with
relatively fat tails) are said to be leptokurtic.
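A numerical complement to this visual check is the sample excess kurtosis, which is 0 for a normal distribution and positive for leptokurtic data. The minimal Python sketch below uses simple moment estimators. Excel's KURT function applies a small-sample adjustment, so its value will differ slightly; `excess_kurtosis` is an illustrative name.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n     # second central moment
    m4 = sum((v - mean) ** 4 for v in data) / n     # fourth central moment
    return m4 / m2 ** 2 - 3

print(excess_kurtosis([-1, 1, -1, 1]))            # -2.0: no tails at all
print(round(excess_kurtosis([0] * 8 + [-3, 3]), 6))   # 2.0: peaked with fat tails
```

Applied to the Nasdaq returns in B2:B272, a clearly positive value would be consistent with the leptokurtic shape seen in the histogram.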

You can proceed similarly to plot the time series and histograms of the monthly returns to the
Australian All Ordinaries stock price index (ALLORDS), the Japanese Nikkei stock price index
(NIKKEI), and the United Kingdom FTSE stock price index (FTSE). They are shown in Figures
14.1(b)-(d) and 14.2(b)-(d) on pp. 520-521 of Principles of Econometrics, 4e.

14.1.2 Simulated Data

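Consider the following ARCH(1) model:

$$y_t = \beta_0 + e_t \qquad (14.1a)$$

$$e_t \mid I_{t-1} \sim N(0, h_t) \qquad (14.1b)$$

$$h_t = \alpha_0 + \alpha_1 e_{t-1}^2, \quad \alpha_0 > 0, \; 0 \le \alpha_1 < 1 \qquad (14.1c)$$

where $\beta_0 = 0$, $\alpha_0 = 1$ and $\alpha_1 = 0$. Note: these values imply $h_t = 1$, which means that
$\operatorname{var}(e_t \mid I_{t-1})$ is constant and not time varying.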
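The Excel steps in this section (random normal draws in column D, then $y_t = \beta_0 + e_t$ in column E) can also be written as a short loop. Below is a minimal NumPy sketch of (14.1a)-(14.1c). It assumes a pre-sample error $e_0 = 0$, which the text does not state. With $\alpha_1 = 0$ it reproduces the constant-variance case; $\alpha_1 = 0.8$ is a purely hypothetical value chosen to show time-varying volatility.

```python
import numpy as np

def simulate_arch1(beta0, alpha0, alpha1, n, rng):
    """y_t = beta0 + e_t, e_t | I_{t-1} ~ N(0, h_t), h_t = alpha0 + alpha1 * e_{t-1}**2.
    Assumes a pre-sample error e_0 = 0, so h_1 = alpha0."""
    e = np.zeros(n)
    h = np.zeros(n)
    e_prev = 0.0
    for t in range(n):
        h[t] = alpha0 + alpha1 * e_prev ** 2
        e[t] = np.sqrt(h[t]) * rng.standard_normal()
        e_prev = e[t]
    return beta0 + e, h

rng = np.random.default_rng(3)
y_const, h_const = simulate_arch1(0.0, 1.0, 0.0, 200, rng)   # the case in the text: h_t = 1
y_arch, h_arch = simulate_arch1(0.0, 1.0, 0.8, 200, rng)     # hypothetical alpha1 = 0.8
print(h_const[:3], h_arch[:3])
```

When $\alpha_1 > 0$, a large $e_{t-1}$ raises $h_t$, so large shocks tend to be followed by large shocks, which is the volatility clustering seen in the Nasdaq returns.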

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it simulated data.


In cells A1:E3 enter the following labels, values and formula.

     A      B    C   D     E
1    β0 =   0        e_t   y_t
2    α0 =   1              =$B$1+D2
3    α1 =   0

In column D, we generate a sample of 200 random numbers from a normal distribution with
mean 0 and standard deviation 1.

Select the Data tab, in the middle of your tab list located on top of your screen. In the Analysis
group of commands, at the far right, select Data Analysis. The Data Analysis dialog box pops
up. In it, select Random Number Generation (you might need to use the scroll bar to the right
of the Analysis Tools window to find it), then select OK.

A Random Number Generation dialog box pops up. We need to generate one set of random
numbers for our random errors, so we specify 1 in the Number of Variables window. We would
like to generate 200 random numbers, so we specify 200 in the Number of Random Numbers
window. We select Normal in the Distribution window; the selected Parameters should be
Mean equal to 0, and Standard deviation equal to 1. Select Output Range and specify it to be
D2:D201. Finally, we select OK.

After you copy the content of cell E2 to cells E3:E201, here is how your table should look (only
the first five values are shown below):

     A      B    C   D           E
1    β0 =   0        e_t         y_t
2    α0 =   1        1.196313    1.196313
3    α1 =   0        0.021504    0.021504
4                    0.535423    0.535423
5                    -0.09043    -0.09043
6                    -0.13262    -0.13262

Note: you will obtain a different random sample than the one we obtained, so your e_t and y_t
values will differ from the ones reported above.

Select the Insert tab located next to the Home tab. Select E1:E201. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.
After editing, the result is (see also Figure 14.3(a) p. 522 in Principles of Econometrics, 4e):

[Figure: Simulated Data: h_t = 1, observations 1 to 200]

Next, we standardize the simulated data we just generated. That is, for each observation we
subtract the sample mean and divide by the sample standard deviation.
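The standardization in column J can be sketched in Python as follows. `statistics.stdev` uses the n - 1 divisor, matching Excel's VAR, so the result has sample mean 0 and sample standard deviation 1 up to rounding. `standardize` is an illustrative name.

```python
import statistics

def standardize(values):
    """(y - sample mean) / sample standard deviation, like =(E2-$H$2)/$H$4."""
    m = statistics.mean(values)
    s = statistics.stdev(values)        # n - 1 divisor, as with SQRT(VAR(...))
    return [(v - m) / s for v in values]

z = standardize([1.0, 2.0, 3.0, 4.0])
print([round(v, 4) for v in z])         # [-1.1619, -0.3873, 0.3873, 1.1619]
```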

In cells G1:J4, enter the following labels and formulas.

     G                      H                    I   J
1                           y_t                      standardized y_t
2    sample mean =          =AVERAGE(E2:E201)        =(E2-$H$2)/$H$4
3    sample variance =      =VAR(E2:E201)
4    standard deviation =   =SQRT(H3)

After you copy the content of cell J2 to cells J3:J201, here is how your table should look (only
the first five values are shown below):

     G                      H          I   J
1                           y_t            standardized y_t
2    sample mean =          0.005703       1.309548
3    sample variance =      0.826602       0.01738
4    standard deviation =   0.909177       0.582637
5                                          -0.105735
6                                          -0.152136

Remember that your numbers are going to be different from ours since you are working with a
different random sample.

To plot the histogram of the standardized y_t, we proceed as we did in Section 14.1.1.

In cells L1:L3, enter the following label, value and formula.

     L
1    BIN
2    -4
3    =L2+1/3

Copy the content of cell L3 to cells L4:L26. Here is how your table should look (only the first
five values are shown below):

     L
1    BIN
2    -4
3    -3.66667
4    -3.33333
5    -3
6    -2.66667

In the Histogram dialog box, the Input Range should be J2:J201, and the Bin Range should be
L2:L26. Check the New Worksheet Ply option and name it Simulated Data Histogram; check
the box next to Chart Output. Finally, select OK.


Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap. In the Border Color tab, select Solid line, and
change the Color to black. Finally select Close.


After editing, the result is (see Figure 14.4(a) p. 522 in Principles of Econometrics, 4e):

[Figure: Simulated Data Histogram]

Note: we obtain a different histogram from the one illustrated in Figure 14.4(a) because ours is based
on a different random sample. For the same reason, yours will differ from both ours and the
textbook's.

We would like to draw a normal distribution on top of this histogram so we can better assess
whether or not the ARCH(1) model with constant variance displays normal properties.

Go back to your simulated data worksheet. In cells N1:O4, enter the following labels and
formulas.

   N                        O
1                           standardized yt
2  sample mean =            =AVERAGE(J2:J201)
3  sample variance =        =VAR(J2:J201)
4  standard deviation =     =SQRT(O3)

You should find that the sample mean of the standardized yt is 0 (up to rounding error) and the variance is 1.

In cells Q1:R3, enter the following labels and formulas.

   Q                  R
1  Mid-point          Standard Normal
2  =L2-0.5*(L3-L2)    =NORMDIST(Q2,0,1,FALSE)
3  =(L2+L3)/2

In column Q, we specify the mid-point or mid-value of the bins or class intervals we used to
construct the simulated data histogram. In column R, we compute the standard normal density
values corresponding to those mid-point values, where the normal distribution is specified to have
mean 0 and variance 1.
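The mid-points in column Q and the densities in column R can be sketched in Python as follows. NORMDIST(x, 0, 1, FALSE) returns the standard normal density, which the sketch computes directly from its formula. `normal_pdf` and `bin_midpoints` are hypothetical names. The sketch assumes the same 25 bin limits as L2:L26.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density; the same quantity as NORMDIST(x, mu, sigma, FALSE)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bin_midpoints(bins):
    """Mid-points as built in column Q."""
    first = bins[0] - 0.5 * (bins[1] - bins[0])               # =L2-0.5*(L3-L2)
    rest = [(a + b) / 2.0 for a, b in zip(bins, bins[1:])]    # =(L2+L3)/2, ...
    return [first] + rest

bins = [-4 + k / 3 for k in range(25)]      # L2:L26
mids = bin_midpoints(bins)                  # Q2:Q26
dens = [normal_pdf(m) for m in mids]        # R2:R26
print(round(mids[2], 5), round(dens[2], 9))  # -3.5 0.000872683
```

The first mid-point sits half a bin width below -4 because, for the Histogram tool, the first bin collects every value up to and including -4.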

Copy the content of cell Q3 to cells Q4:Q26, and copy the content of cell R2 to cells R3:R26.

Here is how your table should look (only the first five values are shown below):
   N                               O                 Q           R
1                                  standardized yt   Mid-point   Standard Normal
2  sample mean =                   -2.77556E-18      -4.16667    6.7763E-05
3  sample variance =               1                 -3.83333    0.00025707
4  sample standard deviation =     1                 -3.5        0.000872683
5                                                    -3.16667    0.00265098
6                                                    -2.83333    0.0072061

Go back to your Simulated Data Histogram worksheet. Select the histogram, right-click and
choose Select Data on the list of options that pops up. In the Select Data Source dialog box,
select Add. In the Edit Series dialog box, specify the Series values to be R2:R26 from the
simulated data worksheet. Finally select OK. The Select Data Source dialog box reappears.
Select OK one more time.


The series you just added is barely visible, at the bottom of your plot area. Select it, right-click
and select Change Series Chart Type. In the Change Chart Type dialog box, select Line. In
the list of Line charts, select Line again. Finally select OK.


Your series is now a little bit more visible, still at the bottom of your plot area. Select it again,
right-click and select Format Data Series this time. In the Series Options, select Secondary
Axis.
In the Line Style options, select Smoothed Line. Finally, select Close.

Select the right vertical axis, right-click and select Format Axis. Select the Axis Options tab, and
specify Fixed Minimum at 0.0 and Fixed Maximum at 0.85. Finally select Close.


The result is (see also Figure 14.4(a) p. 522 in Principles of Econometrics, 4e):

[Figure: Simulated Data Histogram]

The bottom panels in Figures 14.3 and 14.4 of Principles of Econometrics, 4e (p. 522) illustrate
the case of a time-varying variance. It would be much more complicated to generate such a
series in Excel; we will not investigate this problem at this point.

14.2 TESTING AND FORECASTING

14.2.1 Testing for ARCH Effects

Open the Excel file byd. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 14 in one file, create a new worksheet in your POE
Chapter 14 Excel file, rename it byd data, and in it, copy the data set you just opened.
14.2.1a Time Series and Histogram

Select the Insert tab located next to the Home tab. Select A1:A501. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.

After editing, the result is (see also top panel of Figure 14.5 p. 524 in Principles of Econometrics,
4e):

[Figure: BYD Lighting]

To plot the histogram of returns for BYD Lighting, we first create our BIN column.

In cells D1:D3, enter the following label and values.

D
1 BIN
2 -8
3 -7.5

Select cells D2:D3 and move your cursor to the lower right corner of your selection until it turns into
a skinny cross. Left-click, hold, and drag down to cell D34; Excel recognizes
the series and automatically completes it for you.

In the Histogram dialog box, the Input Range should be A2:A501, and the Bin Range
should be D2:D34. Check the New Worksheet Ply option and name it BYD Lighting
Histogram; check the box next to Chart Output. Finally, select OK.


Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap. In the Border Color tab, select Solid line, and
change the Color to black. Finally select Close.


After editing, the result is (see lower panel of Figure 14.5 p. 524 in Principles of Econometrics,
4e):

[Figure: BYD Lighting Histogram]

14.2.1b Lagrange Multiplier Test

We first estimate the following mean equation:

$$ r_t = \beta_0 + e_t \qquad (14.2) $$

where r_t is the monthly return on shares of BYD, the hypothetical company BrightenYourDay
Lighting.

Note that equation (14.2) is equivalent to equation (14.3):

$$ r_t = \beta_0 x_t + e_t \qquad (14.3) $$

where x_t = 1 for all t.
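Why does (14.3) reproduce (14.2)? With no constant, least squares gives b0 = Σ x_t r_t / Σ x_t², and with x_t = 1 this is just the sample mean of r_t. Here is a minimal Python check, using made-up returns and a hypothetical helper name:

```python
import statistics

def ols_through_origin(y, x):
    """Slope of y on x with no intercept (Excel's 'Constant is Zero'):
    b = sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Made-up returns, for illustration only
r = [0.5, 1.5, 2.0, 0.0]
x = [1.0] * len(r)                 # column B: x_t = 1
b0 = ols_through_origin(r, x)
print(b0, statistics.mean(r))      # 1.0 1.0
```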

We estimate equation (14.3) instead of (14.2). First, we create our explanatory variable x. In cells
B1:B2 of your byd data worksheet, enter the following label and value.

   B
1  x
2  1

Copy the content of cell B2 to cells B3:B501. Here is how your table should look (only the first
five values are shown below):

   B
1  x
2  1
3  1
4  1
5  1
6  1

In the Regression dialog box, the Input Y Range should be A1:A501, and the Input X Range
should be B1:B501. Check the boxes next to Labels, Constant is Zero and Residuals. Select
New Worksheet Ply and name it Mean Equation for BYD. Finally select OK.


The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.6734
R Square            0.4534
Adjusted R Square   0.4514
Standard Error      1.1850
Observations        500

ANOVA
             df    SS          MS          F          Significance F
Regression   1     581.3592    581.3592    413.9900   1.98E-67
Residual     499   700.7373    1.4043
Total        500   1282.0965

             Coefficients   Standard Error   t Stat   P-value    Lower 95%   Upper 95%
Intercept    0              #N/A             #N/A     #N/A       #N/A        #N/A
x            1.0783         0.0530           20.347   1.98E-67   0.9742      1.1824

RESIDUAL OUTPUT

Observation   Predicted r   Residuals
1             1.07829422    -1.07829422
2             1.07829422    -1.30548722
3             1.07829422    0.27254878

A Lagrange multiplier (LM) test (used previously in Sections 9.3.2 and 8.2.2a) is used to test for
the presence of ARCH effects.

To test for first-order ARCH, we first consider the following auxiliary regression:

$$ \hat{e}_t^2 = \gamma_0 + \gamma_1 \hat{e}_{t-1}^2 + v_t \qquad (14.4) $$

where ê_t² are the squared residuals and ê_{t-1}² are the lagged squared residuals from model (14.2)
or (14.3).

The null and alternative hypotheses for a test of the presence of ARCH effects based on the
auxiliary regression (14.4) are H0: γ1 = 0 and H1: γ1 ≠ 0.

When H0 is true, there are no ARCH effects. In that case, the sample size (T - q), where q is the order of
the lag, multiplied by the R² goodness-of-fit statistic from (14.4), has a chi-square distribution
with m = S - 1 degrees of freedom, where S is the number of parameters in (14.4). Note that
m = S - 1 = q:

$$ LM = (T - q)R^2 \sim \chi^2_{(m = S - 1 = q)} \qquad (14.5) $$

In cells D25:E26 of your Mean Equation for BYD worksheet, enter the following labels and
formulas:

    D              E
25  Residuals^2    Residuals^2 t-1
26  =C26^2         =C25^2

Copy the content of cells D26:E26 to cells D27:E524. Here is how your table should look (only
the first five values are shown below):

    D              E
25  Residuals^2    Residuals^2 t-1
26  1.704296882    1.162718425
27  0.074282837    1.704296882
28  0.000743368    0.074282837
29  1.025754158    0.000743368
30  0.363482116    1.025754158
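To cross-check the auxiliary regression (14.4), you can compute the R² directly from the residuals. With a single regressor plus an intercept, R² equals the squared sample correlation between ê_t² and ê_{t-1}². The Python sketch below handles first-order ARCH (q = 1) only, and `arch1_aux_r2` is a hypothetical name.

```python
def arch1_aux_r2(resid):
    """R^2 of the auxiliary regression (14.4) for q = 1:
    e_t^2 regressed on a constant and e_{t-1}^2.
    Returns (R^2, number of observations used, i.e. T - 1)."""
    e2 = [e * e for e in resid]          # squared residuals
    y, x = e2[1:], e2[:-1]               # column D and its lag in column E
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # With one regressor plus an intercept, R^2 is the squared correlation.
    return sxy * sxy / (sxx * syy), n

print(arch1_aux_r2([1.0, 1.0, 2.0, 1.0]))  # (0.25, 3)
```

Applied to the 500 residuals in column C, it should give the R² and the 499 observations reported in the Auxiliary Regression output.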

In the Regression dialog box, the Input Y Range should be D25:D524, and the Input X Range
should be E25:E524. Check the box next to Labels. Uncheck the boxes next to Constant is Zero
and Residuals. Select New Worksheet Ply and name it Auxiliary Regression. Finally select
OK.


The result is (see also p. 523 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.3529
R Square            0.1246
Adjusted R Square   0.1228
Standard Error      2.4500
Observations        499

ANOVA
             df    SS          MS         F         Significance F
Regression   1     424.5013    424.5013   70.7198   4.39E-15
Residual     497   2983.2863   6.0026
Total        498   3407.7876

                  Coefficients   Standard Error   t Stat   P-value    Lower 95%   Upper 95%
Intercept         0.9083         0.1244           7.301    1.14E-12   0.6638      1.1527
Residuals^2 t-1   0.3531         0.0420           8.410    4.39E-15   0.2706      0.4356

The results we are going to use for the Lagrange multiplier test are highlighted in the above table.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Lagrange Multiplier Test.


In it, copy the Lagrange multiplier test template you created in Chapter 8.

Replace the following reference: [POE Chapter 8.xlsx]Variance Function by Auxiliary Regression.

    A                          B                      C
1   Data Input                 N =                    ='Auxiliary Regression'!B8
2                              S =                    ='Auxiliary Regression'!B12+1
3                              R2 =                   ='Auxiliary Regression'!B5
4                              α =                    0.05
5
6   Computed Values            m =                    =C2-1
7                              χ2-critical value =    =CHIINV(C4,C6)
8
9   Lagrange Multiplier Test   χ2 =                   =C1*C3
10                             Conclusion =           =IF(C9>=C7,"Reject Ho","Do Not Reject Ho")
11                             p-value =              =CHIDIST(C9,C6)
12                             Conclusion =           =IF(C11<=C4,"Reject Ho","Do Not Reject Ho")

At α = 0.05, the result of the test is (see also p. 524 in Principles of Econometrics, 4e):

    A                          B                      C
1   Data Input                 N =                    499
2                              S =                    2
3                              R2 =                   0.124568
4                              α =                    0.05
5
6   Computed Values            m =                    1
7                              χ2-critical value =    3.841459
8
9   Lagrange Multiplier Test   χ2 =                   62.1595
10                             Conclusion =           Reject Ho
11                             p-value =              3.17E-15
12                             Conclusion =           Reject Ho

The value of the Lagrange multiplier statistic reported in Principles of Econometrics, 4e, p. 524,
is LM = (T - 1)R² = 499 × 0.124 = 61.876. Our calculation is slightly different because more
decimal places are used for R².
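You can also check the LM statistic and its p-value in a few lines of Python. The sketch relies on the fact that, for one degree of freedom, the chi-square upper tail equals erfc(sqrt(x/2)). It therefore reproduces CHIDIST(x, 1) but is not a general chi-square routine. The function names are hypothetical. With R² rounded to six decimals (0.124568), it gives 62.1594 rather than the worksheet's 62.1595, because the worksheet uses R² with more decimal places.

```python
import math

def chi2_sf_df1(x):
    """P(X > x) for a chi-square with 1 degree of freedom, i.e. CHIDIST(x, 1).
    Uses P(X > x) = erfc(sqrt(x / 2)), valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def arch1_lm_test(r2, n_used, alpha=0.05):
    """LM = (T - q) R^2 with q = 1, compared with a chi-square(1) distribution."""
    lm = n_used * r2
    p_value = chi2_sf_df1(lm)
    return lm, p_value, p_value <= alpha

lm, p, reject = arch1_lm_test(0.124568, 499)   # R^2 and T - q from the worksheet
print(round(lm, 4), p, reject)                 # 62.1594, about 3.2e-15, True
```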

14.2.2 Forecasting Volatility

Equation (14.6) shows the results from estimating an ARCH(1) model applied to the monthly
returns from buying shares in the company BrightenYourDayLighting. These results are obtained
using econometrics software such as EViews, Stata or GRETL; computer manuals for those can
be found at http://www.principlesofeconometrics.com/. The estimated mean of the series is
described in (14.6a), while the estimated variance is given in (14.6b):

$$ \hat{r}_t = \hat{\beta}_0 = 1.063 \qquad (14.6a) $$

$$ \hat{h}_t = \hat{\alpha}_0 + \hat{\alpha}_1 \hat{e}_{t-1}^2 = 0.642 + 0.569\,\hat{e}_{t-1}^2 \qquad (14.6b) $$


We can use the estimated model to forecast next period's return r_{t+1} and the conditional volatility
h_{t+1}. For our case study of investing in BrightenYourDayLighting, the forecast return and
volatility are:

$$ \hat{r}_{t+1} = \hat{\beta}_0 = 1.063 \qquad (14.7a) $$

$$ \hat{h}_{t+1} = \hat{\alpha}_0 + \hat{\alpha}_1 (r_t - \hat{\beta}_0)^2 = 0.642 + 0.569\,(r_t - 1.063)^2 \qquad (14.7b) $$

Go back to your byd data worksheet.

In cells F1:I4 enter the following labels, values and formula. In the last column, you will find the
numbers of the equations used, if any.

   F            G        H    I
1  ARCH(1)
2  β0-hat =     1.063         ht+1-hat
3  α0-hat =     0.642         =$G$3+$G$4*((A2-$G$2)^2)     (14.7b)
4  α1-hat =     0.569

Copy the content of cell I3 to cells I4:I501. Here is how your table should look (only the first five
values are shown below):

   F            G        H    I
1  ARCH(1)
2  β0-hat =     1.063         ht+1-hat
3  α0-hat =     0.642         1.284952
4  α1-hat =     0.569         1.589156
5                             0.689144
6                             0.643031
7                             1.20816
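Applying equation (14.7b) down column I amounts to the Python sketch below. It is a hypothetical helper, with the estimates from (14.6) as defaults. The first two returns, 0 and -0.227193, are those implied by the mean-equation residual output (fitted value 1.07829422 plus residuals -1.07829422 and -1.30548722). The first two forecasts should therefore match cells I3 and I4.

```python
def arch1_forecasts(returns, b0=1.063, a0=0.642, a1=0.569):
    """One-step-ahead variance forecasts from (14.7b):
    h_{t+1} = a0 + a1 * (r_t - b0)^2, one forecast per observed return,
    as in column I (=$G$3+$G$4*((A2-$G$2)^2) copied down)."""
    return [a0 + a1 * (r - b0) ** 2 for r in returns]

print(arch1_forecasts([0.0, -0.227193]))  # about [1.284952, 1.589156]
```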

Select the Insert tab located next to the Home tab. Select I2:I501. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.

After editing, the result is (see also Figure 14.6 p. 525 in Principles of Econometrics, 4e):

[Figure: BYD Lighting, ht+1-hat]

14.3 EXTENSIONS

14.3.1 The GARCH Model

The GARCH model, or generalized ARCH model, captures long lagged effects with few
parameters. The general GARCH(p,q) model has p lagged h terms and q lagged e² terms. The
conditional variance function of a GARCH(1,1) model is given by:

$$ h_t = \delta + \alpha_1 e_{t-1}^2 + \beta_1 h_{t-1} \qquad (14.8) $$

where α1 + β1 < 1.

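The returns to shares in our BrightenYourDayLighting example have been re-estimated under the
new GARCH(1,1) model:

$$ \hat{r}_t = 1.049 \qquad (14.9a) $$

$$ \hat{h}_t = 0.401 + 0.492\,\hat{e}_{t-1}^2 + 0.238\,\hat{h}_{t-1} \qquad (14.9b) $$

We use the estimated GARCH(1,1) model to forecast next period's return r_{t+1} and the
conditional volatility h_{t+1}:

$$ \hat{r}_{t+1} = \hat{\beta}_0 = 1.049 \qquad (14.10a) $$

$$ \hat{h}_{t+1} = \hat{\delta} + \hat{\alpha}_1 (r_t - \hat{\beta}_0)^2 + \hat{\beta}_1 \hat{h}_t = 0.401 + 0.492\,(r_t - 1.049)^2 + 0.238\,\hat{h}_t \qquad (14.10b) $$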

In cells K1:N5 enter the following labels, values and formulas. In the last column, you will find
the numbers of the equations used, if any. Note that for our first ht+1 value, there is no ht value
available, hence the shortened version of equation (14.10b) in cell N3.

   K            L        M    N
1  GARCH(1,1)
2  β0-hat =     1.049         ht+1-hat
3  δ-hat =      0.401         =$L$3+$L$4*((A2-$L$2)^2)              (14.10b)
4  α1-hat =     0.492         =$L$3+$L$4*((A3-$L$2)^2)+$L$5*N3      (14.10b)
5  β1-hat =     0.238

Copy the content of cell N4 to cells N5:N501. Here is how your table should look (only the first
five values are shown below):

   K            L        M    N
1  GARCH(1,1)
2  β0-hat =     1.049         ht+1-hat
3  δ-hat =      0.401         0.942397
4  α1-hat =     0.492         1.426595
5  β1-hat =     0.238         0.785355
6                             0.589488
7                             1.017197
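The recursion in column N can be sketched in Python the same way. As in cell N3, the first forecast drops the β1·h_t term because no earlier forecast exists. The returns used below (0 and -0.227193) are those implied by the first two rows of the mean-equation residual output. The function name is hypothetical.

```python
def garch11_forecasts(returns, b0=1.049, delta=0.401, a1=0.492, b1=0.238):
    """Variance forecasts from (14.10b):
    h_{t+1} = delta + a1*(r_t - b0)^2 + b1*h_t.
    The first forecast omits b1*h_t, matching the shortened formula in N3."""
    h = []
    for r in returns:
        value = delta + a1 * (r - b0) ** 2
        if h:                      # from N4 onward, add b1 times the previous forecast
            value += b1 * h[-1]
        h.append(value)
    return h

print(garch11_forecasts([0.0, -0.227193]))  # about [0.942397, 1.426595]
```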

Select the Insert tab located next to the Home tab. Select N2:N501. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.

After editing, the result is (see also Figure 14.7b p. 527 in Principles of Econometrics, 4e):

[Figure: GARCH(1,1), ht-hat]

14.3.2 The T-GARCH Model

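In the T-GARCH version of the model, the specification of the conditional variance is:

$$ h_t = \delta + \alpha_1 e_{t-1}^2 + \gamma d_{t-1} e_{t-1}^2 + \beta_1 h_{t-1}, \qquad d_t = \begin{cases} 1 & e_t < 0 \\ 0 & e_t \ge 0 \end{cases} \qquad (14.11) $$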

The returns to shares in our BrightenYourDayLighting example have been re-estimated with a
T-GARCH(1,1) specification:

$$ \hat{r}_t = 0.994 \qquad (14.12a) $$

$$ \hat{h}_t = 0.356 + 0.263\,\hat{e}_{t-1}^2 + 0.492\,d_{t-1}\hat{e}_{t-1}^2 + 0.287\,\hat{h}_{t-1} \qquad (14.12b) $$

We use the estimated T-GARCH(1,1) model to forecast the conditional volatility.

In cells P1:U6 enter the following labels, values and formulas. In the last column, you will find
the numbers of the equations used, if any. Note that for our first ht value, there is no ht-1 value
available, hence the shortened version of equation (14.12b) in cell U3.

   P              Q       R    S           T
1  T-GARCH(1,1)                et          dt
2  β0-hat =       0.994        =A2-$Q$2    =IF(S2<0,1,0)
3  δ-hat =        0.356
4  α1-hat =       0.263
5  γ-hat =        0.492
6  β1-hat =       0.287

   U
1
2  ht-hat
3  =$Q$3+$Q$4*(S2^2)+$Q$5*T2*(S2^2)                 (14.12b)
4  =$Q$3+$Q$4*(S3^2)+$Q$5*T3*(S3^2)+$Q$6*U3         (14.12b)

Copy the content of cells S2:T2 to cells S3:T501 and copy the content of cell U4 to cells
U5:U501. Here is how your table should look (only the first five values are shown below):

   P              Q       R    S           T    U
1  T-GARCH(1,1)                et          dt
2  β0-hat =       0.994        -0.994      1    ht-hat
3  δ-hat =        0.356        -1.22119    1    1.101967
4  α1-hat =       0.263        0.356843    0    1.798205
5  γ-hat =        0.492        0.111559    0    0.905575
6  β1-hat =       0.287        -0.9285     1    0.619173
7                              0.687189    0    1.184599
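Columns S to U can be mirrored by the following Python sketch. It uses a hypothetical function name, with the estimates in (14.12b) as defaults. The indicator d is 1 when the previous residual is negative, so negative shocks add γ·e² on top of α1·e². The returns 0 and -0.227193 are again those implied by the mean-equation residual output.

```python
def tgarch11_variances(returns, b0=0.994, delta=0.356, a1=0.263, gamma=0.492, b1=0.287):
    """Conditional variances from (14.12b), mirroring columns S:U:
    e = r - b0 (column S), d = 1 if e < 0 else 0 (column T),
    h = delta + a1*e^2 + gamma*d*e^2 + b1*h_prev (column U),
    with the b1 term omitted for the first value (cell U3)."""
    h = []
    for r in returns:
        e = r - b0
        d = 1 if e < 0 else 0
        value = delta + (a1 + gamma * d) * e * e
        if h:
            value += b1 * h[-1]
        h.append(value)
    return h

print(tgarch11_variances([0.0, -0.227193]))  # about [1.101967, 1.798205]
```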

Select the Insert tab located next to the Home tab. Select U2:U501. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.

After editing, the result is (see also Figure 14.7d p. 527 in Principles of Econometrics, 4e):

[Figure: T-GARCH(1,1), ht-hat]

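14.3.3 The GARCH-in-Mean Model

The aim of a GARCH-in-mean model is to use risk to explain returns. The equations of a
T-GARCH-in-mean model are shown below:

$$ y_t = \beta_0 + \theta h_t + e_t \qquad (14.13a) $$

$$ e_t \mid I_{t-1} \sim N(0, h_t) \qquad (14.13b) $$

$$ h_t = \delta + \alpha_1 e_{t-1}^2 + \gamma d_{t-1} e_{t-1}^2 + \beta_1 h_{t-1} \qquad (14.13c) $$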

The returns to shares in our BrightenYourDayLighting example have been re-estimated as a
T-GARCH-in-mean model. The results are:

$$ \hat{r}_t = 0.818 + 0.196\,\hat{h}_t \qquad (14.14a) $$

$$ \hat{h}_t = 0.370 + 0.295\,\hat{e}_{t-1}^2 + 0.321\,d_{t-1}\hat{e}_{t-1}^2 + 0.278\,\hat{h}_{t-1} \qquad (14.14b) $$

We use the estimated GARCH-in-mean model to forecast the conditional return and volatility.

In cells W1:AC7 enter the following labels, values and formulas. In the last row, you will find
the numbers of the equations used, if any. Note that for our first ht value, there is no ht-1 value
available, hence the shortened version of equation (14.14b) in cell AC3.

   W                X       Y    Z                     AA
1  GARCH-in-mean                 et                    dt
2  β0-hat =         0.818        =A2-$X$2              =IF(Z2<0,1,0)
3  θ-hat =          0.196        =A3-$X$2-$X$3*AC3
4  δ-hat =          0.370
5  α1-hat =         0.295
6  γ-hat =          0.321
7  β1-hat =         0.278

   AB                  AC
1
2  E(rt)               ht-hat
3  =$X$2+$X$3*AC3      =$X$4+$X$5*(Z2^2)+$X$6*AA2*(Z2^2)
4                      =$X$4+$X$5*(Z3^2)+$X$6*AA3*(Z3^2)+$X$7*AC3
   (14.14a)            (14.14b)

Copy the content of cell Z3 to cells Z4:Z501, copy the content of cell AA2 to cells AA3:AA501,
copy the content of cell AB3 to cells AB4:AB501, and copy the content of cell AC4 to cells
AC5:AC501. Here is how your table should look (only the first five values are shown below):

   W                X       Z          AA   AB          AC
1  GARCH-in-mean            et         dt
2  β0-hat =         0.818   -0.818     1    E(rt)       ht-hat
3  θ-hat =          0.196   -1.1985    1    0.971307    0.78218
4  δ-hat =          0.37    0.244278   0    1.106565    1.47227
5  α1-hat =         0.295   0.131368   0    0.974191    0.796894
6  γ-hat =          0.321   -0.86944   1    0.934939    0.596628
7  β1-hat =         0.278   0.666892   0    1.014297    1.001513

Select the Insert tab located next to the Home tab. Select AB2:AB501. In the Charts group of
commands select Scatter, and Scatter with Smooth Lines.
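Because E(r_t) depends on h_t and the residual e_t depends on E(r_t), the GARCH-in-mean columns must be filled in order, row by row. A Python sketch of the same recursion follows. The function name is hypothetical, and the defaults are the estimates in (14.14a) and (14.14b). The three returns are those implied by the mean-equation residual output (fitted value plus residual). Outputs start with the second period, as in rows 3 and 4 of columns AB and AC.

```python
def tgarch_in_mean(returns, b0=0.818, theta=0.196, delta=0.370,
                   a1=0.295, gamma=0.321, b1=0.278):
    """Expected returns and conditional variances from (14.14a) and (14.14b),
    mirroring columns Z:AC. Outputs start with the second period (row 3)."""
    e = returns[0] - b0            # Z2: no h is available for the first period
    h_prev = None
    expected, variances = [], []
    for r in returns[1:]:
        d = 1 if e < 0 else 0                      # AA: d_{t-1}
        h = delta + (a1 + gamma * d) * e * e       # AC3 (shortened form)
        if h_prev is not None:
            h += b1 * h_prev                       # AC4 onward add b1*h_{t-1}
        mean_r = b0 + theta * h                    # AB: E(r_t), eq. (14.14a)
        e = r - mean_r                             # Z3 onward: e_t = r_t - b0 - theta*h_t
        expected.append(mean_r)
        variances.append(h)
        h_prev = h
    return expected, variances

exp_r, h = tgarch_in_mean([0.0, -0.227193, 1.350843])
print(exp_r, h)   # about [0.971307, 1.106565] and [0.78218, 1.47227]
```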

After editing, the result is (see also Figure 14.7e p. 527 in Principles of Econometrics, 4e):

[Figure: GARCH-in-Mean: E(rt)]

Select cells AC2:AC501 and plot a Scatter with Smooth Lines again.

After editing, the result is (see also Figure 14.7f p. 527 in Principles of Econometrics, 4e):

[Figure: GARCH-in-mean: ht]
CHAPTER 15

Panel Data Models

Chapter Outline
15.1 Pooled Least Squares Estimates of Wage Equation
15.2 The Fixed Effects Model
  15.2.1 Estimates of Wage Equation for Small N
    15.2.1a The Least Squares Dummy Variable Estimator for Small N
    15.2.1b The Fixed Effects Estimator: Estimates of Wage Equation for N = 10
  15.2.2 Fixed Effects Estimates of Wage Equation from Complete Panel
15.3 The Random Effects Model
  15.3.1 Testing for Random Effects
  15.3.2 Random Effects Estimation of the Wage Equation
15.4 Sets of Regression Equations
  15.4.1 Estimation: Equal Coefficients, Equal Error Variances
  15.4.2 Estimation: Different Coefficients, Equal Error Variances
  15.4.3 Estimation: Different Coefficients, Different Error Variances
  15.4.4 Seemingly Unrelated Regressions: Testing for Contemporaneous Correlation

15.1 POOLED LEAST SQUARES ESTIMATES OF WAGE EQUATION

Open the Excel file nls_panel. Save your file as POE Chapter 15. Rename sheet 1 nls panel
data. The nls panel data contains information on a sample of N = 716 women who were
interviewed over T = 5 years: 1982, 1983, 1985, 1987 and 1988.

We consider the following wage equation:

$$ \ln(WAGE)_{it} = \beta_1 + \beta_2 EDUC_i + \beta_3 EXPER_{it} + \beta_4 EXPER_{it}^2 + \beta_5 TENURE_{it} + \beta_6 TENURE_{it}^2 + \beta_7 BLACK_i + \beta_8 SOUTH_{it} + \beta_9 UNION_{it} + e_{it} \qquad (15.1) $$

where EDUC measures years of education, EXPER measures total labor force experience, and its
square is measured by EXPER2. TENURE measures tenure in current job, and its square is
measured by TENURE2. BLACK, SOUTH and UNION are indicator variables.

In cells T1:AB2, enter the following labels and formulas.

356 Chapter 15

   T        U      V       W        X        Y         Z       AA      AB
1  lnwage   educ   exper   exper2   tenure   tenure2   black   south   union
2  =C2      =F2    =O2     =P2      =Q2      =R2       =M2     =L2     =N2

Copy the content of cells T2:AB2 to cells T3:AB3581. Here is how your table should look (only
the first five values are shown below):

   T          U      V          W          X          Y          Z       AA      AB
1  lnwage     educ   exper      exper2     tenure     tenure2    black   south   union
2  1.808289   12     7.666667   58.77777   7.666667   58.77777   1       0       1
3  1.863417   12     8.583333   73.67361   8.583333   73.67361   1       0       1
4  1.789367   12     10.17949   103.622    1.833333   3.361111   1       0       1
5  1.84653    12     12.17949   148.3399   3.75       14.0625    1       0       1
6  1.856449   12     13.62179   185.5533   5.25       27.5625    1       0       1

In the Regression dialog box, the Input Y Range should be T1:T3581, and the Input X Range
should be U1:AB3581. Check the boxes next to Labels and Residuals. Select New Worksheet Ply
and name it Pooled LS Wage Equation. Finally select OK.


The result is (see also Table 15.2 p. 543 in Principles of Econometrics, 4e):
Panel Data Models 357

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.5706
R Square            0.3256
Adjusted R Square   0.3241
Standard Error      0.3820
Observations        3580

ANOVA
             df     SS         MS        F          Significance F
Regression   8      251.5350   31.4419   215.4958   1.1658E-290
Residual     3571   521.0262   0.1459
Total        3579   772.5612

             Coefficients   Standard Error   t Stat    P-value    Lower 95%   Upper 95%
Intercept    0.4766         0.0562           8.487     3.1E-17    0.3665      0.5867
educ         0.0714         0.0027           26.567    4.6E-142   0.0662      0.0767
exper        0.0557         0.0086           6.470     1.1E-10    0.0388      0.0726
exper2       -0.0011        0.0004           -3.176    0.0015     -0.0019     -0.0004
tenure       0.0150         0.0044           3.394     0.0007     0.0063      0.0236
tenure2      -0.0005        0.0003           -1.886    0.0594     -0.0010     0.0000
black        -0.1167        0.0157           -7.426    1.4E-13    -0.1475     -0.0859
south        -0.1060        0.0142           -7.465    1.0E-13    -0.1338     -0.0782
union        0.1322         0.0150           8.839     1.5E-18    0.1029      0.1616

RESIDUAL OUTPUT

Observation   Predicted lnwage   Residuals
1             1.7951             0.0132
2             1.8356             0.0279
3             1.8232             -0.0339
_

15.2 THE FIXED EFFECTS MODEL

15.2.1 Estimates of Wage Equation for Small N

15.2.1a The Least Squares Dummy Variable Estimator for Small N

Again, we consider a wage equation model, but this time we are working with N = 10 women,
over the same period of T = 5 years. Furthermore, we assume that all behavioral differences
between individuals and over time are captured by the intercept. Assuming equal variances of the
error terms across individuals, this model can follow the dummy variable format of (15.2):

$$ \ln(WAGE)_{it} = \beta_{1,1} D_{1i} + \beta_{1,2} D_{2i} + \cdots + \beta_{1,10} D_{10i} + \beta_2 EXPER_{it} + \beta_3 EXPER_{it}^2 + \beta_4 TENURE_{it} + \beta_5 TENURE_{it}^2 + \beta_6 UNION_{it} + e_{it} \qquad (15.2) $$

where D_{ki}, k = 1, ..., 10, are 10 dummy variables defined as D_{ki} = 1 if i = k and D_{ki} = 0
otherwise.

In cells AD1:AR2, enter the following labels and formulas:

   AD              AE              AF              AG              AH
1  d1              d2              d3              d4              d5
2  =IF(A2=1,1,0)   =IF(A2=2,1,0)   =IF(A2=3,1,0)   =IF(A2=4,1,0)   =IF(A2=5,1,0)

   AI              AJ              AK              AL              AM
1  d6              d7              d8              d9              d10
2  =IF(A2=6,1,0)   =IF(A2=7,1,0)   =IF(A2=8,1,0)   =IF(A2=9,1,0)   =IF(A2=10,1,0)

   AN      AO       AP       AQ        AR
1  exper   exper2   tenure   tenure2   union
2  =O2     =P2      =Q2      =R2       =N2

Below cells AD1:AM1, we assign values to the dummy variables. In cell AD2, Excel is
instructed to look at the value contained in cell A2. If this value is equal to 1, i.e. if we are
looking at information regarding individual 1, then the dummy variable d1 is assigned the value
1, and 0 otherwise. In cell AE2, Excel is again instructed to look at the value contained in cell
A2. This time, if this value is equal to 2, i.e. if we are looking at information regarding individual
2, then the dummy variable d2 is assigned the value 1, and 0 otherwise. Values of 1 and 0 are assigned
similarly to dummy variables d3 to d10 (for more details on the IF function, see Section 3.1.4e).
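The IF formulas in columns AD:AM build one indicator per woman. Below is a Python sketch of the same idea, assuming individual ids 1 to 10 as in column A; the helper name is hypothetical. Because the ten indicators sum to one in every row, they are perfectly collinear with an intercept. This is why the regression below excludes the intercept by checking Constant is Zero.

```python
def make_dummies(ids, n_individuals=10):
    """Rows of indicators d1..dN, like =IF(A2=k,1,0) in columns AD:AM:
    d_k = 1 when the row's individual id equals k, else 0."""
    return [[1 if i == k else 0 for k in range(1, n_individuals + 1)] for i in ids]

rows = make_dummies([1, 1, 2, 10])   # ids as in column A (illustrative)
print(rows[0])   # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```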

Copy the content of cells AD2:AR2 to cells AD3:AR51. Here is how your table should look
(only the first five values are shown below):

   AD  AE  AF  AG  AH  AI  AJ  AK  AL  AM   AN         AO         AP         AQ         AR
1  d1  d2  d3  d4  d5  d6  d7  d8  d9  d10  exper      exper2     tenure     tenure2    union
2  1   0   0   0   0   0   0   0   0   0    7.666667   58.77777   7.666667   58.77777   1
3  1   0   0   0   0   0   0   0   0   0    8.583333   73.67361   8.583333   73.67361   1
4  1   0   0   0   0   0   0   0   0   0    10.17949   103.622    1.833333   3.361111   1
5  1   0   0   0   0   0   0   0   0   0    12.17949   148.3399   3.75       14.0625    1
6  1   0   0   0   0   0   0   0   0   0    13.62179   185.5533   5.25       27.5625    1

In the Regression dialog box, the Input Y Range should be C1:C51, and the Input X Range
should be AD1:AR51. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it LS Dummy Variable Wage Equation. Finally select OK.


The result is (see also Table 15.3 p. 545 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9946
R Square            0.9893
Adjusted R Square   0.9847
Standard Error      0.2761
Observations        50

ANOVA
             df    SS         MS        F          Significance F
Regression   15    245.7872   16.3858   215.0216   3.64E-29
Residual     35    2.6672     0.0762
Total        50    248.4544

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept    0              #N/A             #N/A      #N/A      #N/A        #N/A
d1           0.1519         1.0967           0.1385    0.8906    -2.0746     2.3784
d2           0.1869         1.0715           0.1744    0.8625    -1.9883     2.3621
d3           -0.0630        1.3509           -0.0467   0.9630    -2.8055     2.6795
d4           0.1856         1.3435           0.1382    0.8909    -2.5418     2.9131
d5           0.9390         1.0978           0.8554    0.3982    -1.2896     3.1676
d6           0.7945         1.1118           0.7146    0.4796    -1.4625     3.0515
d7           0.5812         1.2359           0.4703    0.6411    -1.9278     3.0902
d8           0.5379         1.0975           0.4901    0.6271    -1.6901     2.7660
d9           0.4183         1.0840           0.3859    0.7019    -1.7824     2.6191
d10          0.6146         1.0902           0.5637    0.5765    -1.5986     2.8277
exper        0.2380         0.1878           1.2676    0.2133    -0.1432     0.6192
exper2       -0.0082        0.0079           -1.0358   0.3074    -0.0242     0.0079
tenure       -0.0124        0.0341           -0.3617   0.7197    -0.0817     0.0570
tenure2      0.0023         0.0027           0.8541    0.3989    -0.0032     0.0078
union        0.1135         0.1509           0.7526    0.4567    -0.1927     0.4198

We test the following hypothesis using an F-test:

$$H_0: \beta_{1,1} = \beta_{1,2} = \cdots = \beta_{1,N} \qquad (15.3)$$
$$H_1: \text{the } \beta_{1,i} \text{ are not all equal}$$

Our unrestricted model is equation (15.2). In the restricted model, equation (15.4) below, all the
intercept parameters are equal.

$$\ln(WAGE)_{it} = \beta_1 + \beta_2 EXPER_{it} + \beta_3 EXPER^2_{it} + \beta_4 TENURE_{it} + \beta_5 TENURE^2_{it} + \beta_6 UNION_{it} + e_{it} \qquad (15.4)$$

Go back to your nls panel data worksheet.

In the Regression dialog box, the Input Y Range should be C1:C51, and the Input X Range
should be AN1:AR51. Check the box next to Labels. Uncheck the box next to Constant is Zero.
Select New Worksheet Ply and name it Restricted Model. Finally, select OK.

[Regression dialog box: Input Y Range $C$1:$C$51; Input X Range $AN$1:$AR$51; Labels checked; Constant is Zero unchecked; New Worksheet Ply: Restricted Model]

The result of this pooled regression model is (see also Table 15.4 p. 546 in Principles of
Econometrics, 4e):

     A                   B              C                D           E          F            G
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R          0.458574
5    R Square            0.210290
6    Adjusted R Square   0.120551
7    Standard Error      0.353633
8    Observations        50

10   ANOVA
11                       df             SS               MS          F          Significance F
12   Regression          5              1.46524          0.293048    2.34334    0.0569
13   Residual            44             5.50247          0.125056
14   Total               49             6.96771

16                       Coefficients   Standard Error   t Stat      P-value    Lower 95%    Upper 95%
17   Intercept           0.620852       1.01721          0.610349    0.544770   -1.42920     2.67090
18   exper               0.194749       0.173044         1.12543     0.266508   -0.153998    0.543496
19   exper2              -0.00486475    0.00707411       -0.687682   0.495      -0.0191217   0.00939221
20   tenure              0.00136664     0.0375007        0.0364431   0.971094   -0.0742110   0.0769442
21   tenure2             -0.000869269   0.00234308       -0.370994   0.712423   -0.00559145  0.00385290
22   union               -0.0175417     0.102435         -0.171246   0.8648     -0.223987    0.188903

Columns H and I (Lower 95.0% and Upper 95.0%) repeat the values in columns F and G.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it F-test.

In it, copy the F-test template you created in Chapter 6.

Replace the references to [POE Chapter 6.xlsx]Unrestricted Model with LS Dummy Variable
Wage Equation, and delete all other references to [POE Chapter 6.xlsx]. Also, in cell B2,
change the notation for the total number of observations from N to NT. NT is the
number of cross-sectional units (N) times the number of time periods (T) in the panel data.
Finally, because our LS Dummy Variable Wage Equation model has no intercept, the
total number of parameters K in the model equals the regression degrees of freedom in
the summary output, so delete the +1 in cell C3 of your template.

     A                  B                C
1    Data Input         J =
2                       NT =             ='LS Dummy Variable Wage Equation'!B8
3                       K =              ='LS Dummy Variable Wage Equation'!B12
4                       SSEU =           ='LS Dummy Variable Wage Equation'!C13
5                       SSER =           ='Restricted Model'!C13
6                       α =
8    Computed Values    m1 =             =C1
9                       m2 =             =C2-C3
10                      Fc =             =FINV(C6,C8,C9)
12   F-test             F-statistic =    =((C5-C4)/C8)/(C4/C9)
13                      Conclusion =     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                      p-value =        =FDIST(C12,C8,C9)
15                      Conclusion =     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")

With 9 joint null hypotheses and α = 0.05, the results of the F-test are as follows (see also p. 546 of
Principles of Econometrics, 4e):

     A                  B                C
1    Data Input         J =              9
2                       NT =             50
3                       K =              15
4                       SSEU =           2.66719
5                       SSER =           5.502466
6                       α =              0.05
8    Computed Values    m1 =             9
9                       m2 =             35
10                      Fc =             2.160829
12   F-test             F-statistic =    4.133967
13                      Conclusion =     Reject Ho
14                      p-value =        0.0011
15                      Conclusion =     Reject Ho
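You can cross-check the F-test worksheet outside Excel. The Python sketch below uses only the standard library and reproduces the F-statistic and its degrees of freedom from the same inputs. The function name `panel_f_test` and its `n_lost` argument are illustrative choices, not part of the book's template. If SciPy is installed, `scipy.stats.f.ppf(1 - alpha, m1, m2)` and `scipy.stats.f.sf(f, m1, m2)` do the same jobs as FINV and FDIST.

```python
def panel_f_test(sse_r, sse_u, j, nt, k, n_lost=0):
    """Return (F, m1, m2) for J joint restrictions, mirroring the F-test worksheet.

    sse_r, sse_u : sums of squared residuals, restricted and unrestricted models
    j            : number of restrictions (numerator degrees of freedom m1)
    nt, k        : observations and parameters in the unrestricted model
    n_lost       : illustrative extra argument for degrees of freedom used up by
                   demeaning (0 for the dummy variable model)
    """
    m1 = j
    m2 = nt - k - n_lost
    f_stat = ((sse_r - sse_u) / m1) / (sse_u / m2)
    return f_stat, m1, m2


# N = 10 subsample: pooled (restricted) model vs. LS dummy variable model
f, m1, m2 = panel_f_test(sse_r=5.502466, sse_u=2.66719, j=9, nt=50, k=15)
print(f"F = {f:.4f}, m1 = {m1}, m2 = {m2}")
```

The `n_lost` argument matters for the deviation-from-the-mean version used later with the complete panel. There, the call `panel_f_test(643.8680907, 108.7985212, j=715, nt=3580, k=6, n_lost=716)` gives m2 = 2858, the same value as cell C9 of the modified template.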

15.2.1b The Fixed Effects Estimator: Estimates of Wage Equation for N = 10

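We consider the following wage equation:

$$\widetilde{\ln(WAGE)}_{it} = \beta_2 \widetilde{EXPER}_{it} + \beta_3 \widetilde{EXPER^2}_{it} + \beta_4 \widetilde{TENURE}_{it} + \beta_5 \widetilde{TENURE^2}_{it} + \beta_6 \widetilde{UNION}_{it} + \tilde{e}_{it} \qquad (15.5)$$

where the variables are in deviation-from-the-mean form: for any variable $x$, $\tilde{x}_{it} = x_{it} - \bar{x}_i$, and $\bar{x}_i$ is the mean of individual $i$'s $T$ observations.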

Go back to your nls panel data worksheet, where we will first transform our data into deviation-from-the-mean
form.

In cells AT1:AY2, enter the following labels and formulas.

     AT       AU      AV       AW       AX        AY
1    lnwage   exper   exper2   tenure   tenure2   union
2    =C2      =O2     =P2      =Q2      =R2       =N2

Copy the content of cells AT2:AY2 to cells AT3:AY51.

The result is:



     AT         AU         AV         AW         AX         AY
1    lnwage     exper      exper2     tenure     tenure2    union
2    1.808289   7.666667   58.77778   7.666667   58.77778   1
3    1.863417   8.583333   73.67361   8.583333   73.67361   1
4    1.789367   10.17949   103.622    1.833333   3.361111   1
5    1.84653    12.17949   148.3399   3.75       14.0625    1
6    1.856449   13.62179   185.5533   5.25       27.5625    1

In cells BA1:BA6 and BB1:BF1, enter the following labels and formulas.

     BA                  BB         BC          BD          BE           BF
1    lnwagebar           experbar   exper2bar   tenurebar   tenure2bar   unionbar
2    =AVERAGE(AT2:AT6)
3    =AVERAGE(AT2:AT6)
4    =AVERAGE(AT2:AT6)
5    =AVERAGE(AT2:AT6)
6    =AVERAGE(AT2:AT6)

Copy the content of cells BA2:BA6 to cells BB2:BF6.

The result is:


     BA          BB         BC          BD          BE           BF
1    lnwagebar   experbar   exper2bar   tenurebar   tenure2bar   unionbar
2    1.8328104   10.44615   113.99332   5.4166666   35.4874978   1
3    1.8328104   10.44615   113.99332   5.4166666   35.4874978   1
4    1.8328104   10.44615   113.99332   5.4166666   35.4874978   1
5    1.8328104   10.44615   113.99332   5.4166666   35.4874978   1
6    1.8328104   10.44615   113.99332   5.4166666   35.4874978   1

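Select cells BA2:BF6 and move your cursor to the lower right corner of the selection until it turns
into a skinny cross, as shown below. Left-click, hold, and drag down to cell BF51. Each formula in a
five-row block refers to the same five rows, so every block copied down averages the next
individual's five observations.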

[Screenshot: cells BA2:BF6 selected, with the cursor shown as a skinny cross at the lower right corner of the selection]

Here is how your table should look (only the last five values are shown below):

     BA          BB         BC          BD          BE          BF
47   2.276607    13.19102   179.63768   2.1333334   8.647222    0
48   2.276607    13.19102   179.63768   2.1333334   8.647222    0
49   2.276607    13.19102   179.63768   2.1333334   8.647222    0
50   2.276607    13.19102   179.63768   2.1333334   8.647222    0
51   2.276607    13.19102   179.63768   2.1333334   8.647222    0

In cells BH1:BM2, enter the following labels and formula.

     BH         BI        BJ         BK        BL         BM
1    lnwaged    experd    exper2d    tenured   tenure2d   uniond
2    =AT2-BA2

Copy the content of cell BH2 to cells BI2:BM2 and then copy the content of cells BH2:BM2 to
cells BH3:BM51. Here is how your table should look (only the first five values are shown
below):
     BH         BI         BJ         BK         BL         BM
1    lnwaged    experd     exper2d    tenured    tenure2d   uniond
2    -0.02452   -2.77949   -55.2155   2.25       23.29027   0
3    0.030607   -1.86282   -40.3197   3.166666   38.18611   0
4    -0.04344   -0.26666   -10.3713   -3.58333   -32.1264   0
5    0.01372    1.733336   34.3466    -1.66667   -21.425    0
6    0.023639   3.175636   71.5600    -0.16667   -7.925     0
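The block-averaging steps above perform the within (deviation-from-the-mean) transformation. The minimal Python sketch below does the same thing under the layout the worksheet assumes: rows sorted by individual, with exactly T consecutive rows per individual. The function name `within_transform` is illustrative only.

```python
def within_transform(values, t):
    """Deviation-from-the-mean form: subtract each individual's time mean.

    Assumes rows are sorted by individual with exactly t consecutive rows
    per individual, as in the nls panel data worksheet.
    """
    if t <= 0 or len(values) % t != 0:
        raise ValueError("len(values) must be a positive multiple of t")
    deviations = []
    for start in range(0, len(values), t):
        block = values[start:start + t]              # one individual's T observations
        mean = sum(block) / t                        # like =AVERAGE(AT2:AT6)
        deviations.extend(x - mean for x in block)   # like =AT2-BA2
    return deviations


# lnwage for the first individual (rows 2-6 of column AT)
lnwage_1 = [1.808289, 1.863417, 1.789367, 1.84653, 1.856449]
print([round(d, 5) for d in within_transform(lnwage_1, t=5)])
```

The deviations for each individual sum to zero. This is also why the deviation regression is estimated with Constant is Zero checked.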

In the Regression dialog box, the Input Y Range should be BH1:BH51, and the Input X Range
should be BI1:BM51. Check the boxes next to Labels and Constant is Zero. Select New
Worksheet Ply and name it Fixed Effects Wage Equation. Finally select OK.

[Regression dialog box: Input Y Range $BH$1:$BH$51; Input X Range $BI$1:$BM$51; Labels and Constant is Zero checked; New Worksheet Ply: Fixed Effects Wage Equation]

The result is (see also Table 15.6 p. 549 in Principles of Econometrics, 4e):

     A                   B              C                D           E          F             G
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R          0.470144
5    R Square            0.221035
6    Adjusted R Square   0.129572
7    Standard Error      0.243456
8    Observations        50

10   ANOVA
11                       df             SS               MS          F          Significance F
12   Regression          5              0.756829         0.151366    2.55380    0.0410121
13   Residual            45             2.66719          0.0592709
14   Total               50             3.42402

16                       Coefficients   Standard Error   t Stat      P-value    Lower 95%     Upper 95%
17   Intercept           0              #N/A             #N/A        #N/A       #N/A          #N/A
18   experd              0.237999       0.165586         1.43731     0.157546   -0.0955083    0.571505
19   exper2d             -0.00818817    0.00697140       -1.17454    0.2464     -0.0222293    0.00585294
20   tenured             -0.0123501     0.0301116        -0.410143   0.683647   -0.0729979    0.0482978
21   tenure2d            0.00229615     0.00237100       0.968433    0.3380     -0.00247928   0.00707158
22   uniond              0.113543       0.133049         0.853399    0.397958   -0.154430     0.381517

Columns H and I (Lower 95.0% and Upper 95.0%) repeat the values in columns F and G.

Note that the sum of squared least squares residuals from (15.5), SSE = 2.66719, is the same as that
from (15.2). Furthermore, the least squares estimates of the $\beta_k$ parameters from
(15.5), shown in the table above, are identical to the least squares estimates from the dummy
variable fixed effects model (15.2) shown in Section 15.2. The standard errors of those
coefficient estimates are slightly different, though. This is because the estimate of the error
variance above uses

$$\hat\sigma^2_{e,WRONG} = \frac{SSE}{NT-K}$$

when it should use

$$\hat\sigma^2_{e,CORRECT} = \frac{SSE}{NT-N-K}$$

The calculation of $\hat\sigma^2_{e,WRONG}$ ignores the loss of N = 10 degrees of freedom from correcting the variables by their
sample means. So, if we multiply the standard error estimates of the coefficients from the table above
by the correction factor $\sqrt{(NT-K)/(NT-N-K)}$, the resulting standard errors will be correct and identical
to those obtained in Section 15.2:

$$\hat\sigma_{e,WRONG} \times \sqrt{\frac{NT-K}{NT-N-K}} = \sqrt{\frac{SSE}{NT-K}} \times \sqrt{\frac{NT-K}{NT-N-K}} = \sqrt{\frac{SSE}{NT-N-K}} = \hat\sigma_{e,CORRECT} \qquad (15.6)$$

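With N = 10, T = 5 and K = 5, the correction factor for the standard error estimates from the
table above is:

$$\sqrt{\frac{NT-K}{NT-N-K}} = \sqrt{\frac{50-5}{50-10-5}} = \sqrt{\frac{45}{35}} \approx 1.1339 \qquad (15.7)$$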

We use (15.7) below.

Select cells D16:D22, right click and select Insert in the menu of options that pops up. In the
Insert window, select Shift cells right and then OK.

[Insert dialog box, with Shift cells right selected]

In your new cells D16:D18, enter the following label and formula:

D
16 Correct SE
17
18 =SQRT(45/35)*C18

Copy the content of cell D18 to cells D19:D22. The result is:

     A           B              C                D
16               Coefficients   Standard Error   Correct SE
17   Intercept   0              #N/A
18   experd      0.237999       0.165586         0.187757
19   exper2d     -0.00818817    0.00697140       0.00790482
20   tenured     -0.0123501     0.0301116        0.0341433
21   tenure2d    0.00229615     0.00237100       0.00268846
22   uniond      0.113543       0.133049         0.150863

Note that the t-statistics, p-values and confidence interval estimates would also need
correction.
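The correction factor in (15.6) and (15.7) is a single rescaling, so it is easy to check. The Python sketch below assumes only the formula given in the text. The function name `corrected_se` is illustrative.

```python
import math


def corrected_se(se_reported, nt, n, k):
    """Rescale a standard error computed with NT - K residual degrees of freedom
    to NT - N - K, the degrees of freedom left after the N individual means
    are removed (equations (15.6)-(15.7))."""
    return se_reported * math.sqrt((nt - k) / (nt - n - k))


# experd from the N = 10 fixed effects output: reported SE 0.165585769
print(round(corrected_se(0.165585769, nt=50, n=10, k=5), 6))
```

The same function applies to the complete panel in Section 15.2.2, with NT = 3580, N = 716 and K = 6. There the factor is the square root of 3574/2858.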

15.2.2 Fixed Effects Estimates of Wage Equation from Complete Panel

We consider the following wage equation:

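$$\widetilde{\ln(WAGE)}_{it} = \beta_2 \widetilde{EXPER}_{it} + \beta_3 \widetilde{EXPER^2}_{it} + \beta_4 \widetilde{TENURE}_{it} + \beta_5 \widetilde{TENURE^2}_{it} + \beta_6 \widetilde{SOUTH}_{it} + \beta_7 \widetilde{UNION}_{it} + \tilde{e}_{it} \qquad (15.8)$$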

where variables are in deviation from the mean form.

Go back to your nls panel data worksheet, where we will transform our data into deviation-from-the-mean
form.

In cells BO1:BU2, enter the following labels and formulas.

     BO       BP      BQ       BR       BS        BT      BU
1    lnwage   exper   exper2   tenure   tenure2   south   union
2    =C2      =O2     =P2      =Q2      =R2       =L2     =N2

Copy the content of cells BO2:BU2 to cells BO3:BU3581.



The result is (only the last five rows are shown below):


       BO         BP         BQ         BR         BS         BT   BU
3577   1.609438   12.4359    154.6516   3.083333   9.506944   1    0
3578   1.459441   13.4359    180.5233   4.083333   16.67361   1    0
3579   1.427116   15.4359    238.2669   6.083333   37.00695   1    0
3580   1.494368   17.4359    304.0105   8.166667   66.69445   1    0
3581   1.341422   18.85897   355.6609   9.583333   91.84027   1    0

In cells BW1:BW6 and BX1:CC1, enter the following labels and formulas.

     BW                  BX         BY          BZ          CA           CB         CC
1    lnwagebar           experbar   exper2bar   tenurebar   tenure2bar   southbar   unionbar
2    =AVERAGE(BO2:BO6)
3    =AVERAGE(BO2:BO6)
4    =AVERAGE(BO2:BO6)
5    =AVERAGE(BO2:BO6)
6    =AVERAGE(BO2:BO6)

Copy the content of cells BW2:BW6 to cells BX2:CC6.

The result is:


     BW          BX         BY          BZ          CA           CB         CC
1    lnwagebar   experbar   exper2bar   tenurebar   tenure2bar   southbar   unionbar
2    1.8328104   10.44615   113.99332   5.4166666   35.4874978   0          1
3    1.8328104   10.44615   113.99332   5.4166666   35.4874978   0          1
4    1.8328104   10.44615   113.99332   5.4166666   35.4874978   0          1
5    1.8328104   10.44615   113.99332   5.4166666   35.4874978   0          1
6    1.8328104   10.44615   113.99332   5.4166666   35.4874978   0          1

Select cells BW2:CC6 and move your cursor to the lower right corner of the selection until it turns
into a skinny cross, as shown below. Left-click, hold, and drag down to cell CC3581.

[Screenshot: cells BW2:CC6 selected, with the cursor shown as a skinny cross at the lower right corner of the selection]

Here is how your table should look (only the last five values are shown below):

       BW         BX         BY          BZ          CA           CB   CC
3577   1.466357   15.52051   246.62264   6.1999998   44.3444452   1    0
3578   1.466357   15.52051   246.62264   6.1999998   44.3444452   1    0
3579   1.466357   15.52051   246.62264   6.1999998   44.3444452   1    0
3580   1.466357   15.52051   246.62264   6.1999998   44.3444452   1    0
3581   1.466357   15.52051   246.62264   6.1999998   44.3444452   1    0

In cells CE1:CK2, enter the following labels and formula.

     CE         CF       CG        CH        CI         CJ       CK
1    lnwaged    experd   exper2d   tenured   tenure2d   southd   uniond
2    =BO2-BW2

Copy the content of cell CE2 to cells CF2:CK2 and then copy the content of cells CE2:CK2 to
cells CE3:CK3581. Here is how your table should look (only the last five values are shown
below):
       CE         CF         CG         CH         CI         CJ   CK
3577   0.143081   -3.08461   -91.971    -3.11667   -34.8375   0    0
3578   -0.00692   -2.08461   -66.0993   -2.11667   -27.6708   0    0
3579   -0.03924   -0.08461   -8.35573   -0.11667   -7.3375    0    0
3580   0.028011   1.915385   57.38786   1.966667   22.35      0    0
3581   -0.12494   3.338457   109.0383   3.383333   47.49583   0    0

In the Regression dialog box, the Input Y Range should be CE1:CE3581, and the Input X
Range should be CF1:CK3581. Check the boxes next to Labels and Constant is Zero. Select
New Worksheet Ply and name it Fixed Effects Wage Equation All. Finally select OK.

[Regression dialog box: Input Y Range $CE$1:$CE$3581; Input X Range $CF$1:$CK$3581; Labels and Constant is Zero checked; New Worksheet Ply: Fixed Effects Wage Equation All]

The result is (see also Table 15.7 p. 550 in Principles of Econometrics, 4e):


     A                   B              C                D           E             F              G
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R          0.378119
5    R Square            0.142974
6    Adjusted R Square   0.141495
7    Standard Error      0.174475
8    Observations        3580

10   ANOVA
11                       df             SS               MS          F             Significance F
12   Regression          6              18.1504          3.02507     99.3726       6.026E-116
13   Residual            3574           108.799          0.0304417
14   Total               3580           126.949

16                       Coefficients   Standard Error   t Stat      P-value       Lower 95%      Upper 95%
17   Intercept           0              #N/A             #N/A        #N/A          #N/A           #N/A
18   experd              0.0410832      0.00591988       6.93987     4.64003E-12   0.0294765      0.0526899
19   exper2d             -0.000409052   0.000244425      -1.67353    0.0943114     -0.000888279   7.01750E-05
20   tenured             0.0139089      0.00293118       4.74518     2.15E-06      0.00816200     0.0196559
21   tenure2d            -0.000896227   0.000184088      -4.86846    1.17E-06      -0.00125716    -0.000535298
22   southd              -0.0163224     0.0323259        -0.504933   0.613536      -0.0797014     0.0470566
23   uniond              0.0636972      0.0127463        4.99731     6.09E-07      0.0387065      0.0886880

Columns H and I (Lower 95.0% and Upper 95.0%) repeat the values in columns F and G.

With N = 716, T = 5 and K = 6, the correction factor for the standard
error estimates from the table above is:

$$\sqrt{\frac{NT-K}{NT-N-K}} = \sqrt{\frac{3580-6}{3580-716-6}} = \sqrt{\frac{3574}{2858}} \approx 1.1183 \qquad (15.9)$$

We use (15.9) below.

Select cells D16:D23, right click and select Insert in the menu of options that pops up. In the
Insert window, select Shift cells right and then OK.

[Insert dialog box, with Shift cells right selected]

In your new cells D16:D23, enter the following label and formula:

     D
16   Correct SE
17
18   =SQRT(3574/2858)*C18

Copy the content of cell D18 to cells D19:D23. The result is:

     A           B              C                D
16               Coefficients   Standard Error   Correct SE
17   Intercept   0              #N/A
18   experd      0.0410832      0.00591988       0.00662001
19   exper2d     -0.000409052   0.000244425      0.000273333
20   tenured     0.0139089      0.00293118       0.00327784
21   tenure2d    -0.000896227   0.000184088      0.000205861
22   southd      -0.0163224     0.0323259        0.0361490
23   uniond      0.0636972      0.0127463        0.0142538

Note that the t-statistics, p-values and confidence interval estimates would also need
correction.

Next, we test the following hypothesis using an F-test:

$$H_0: \beta_{1,1} = \beta_{1,2} = \cdots = \beta_{1,N} \qquad (15.10)$$
$$H_1: \text{the } \beta_{1,i} \text{ are not all equal}$$

where N = 716.

Our unrestricted model is equation (15.8). In the restricted model, equation (15.11) below, all the
intercept parameters are equal.

$$\ln(WAGE)_{it} = \beta_1 + \beta_2 EXPER_{it} + \beta_3 EXPER^2_{it} + \beta_4 TENURE_{it} + \beta_5 TENURE^2_{it} + \beta_6 SOUTH_{it} + \beta_7 UNION_{it} + e_{it} \qquad (15.11)$$

Go back to your nls panel data worksheet.

In the Regression dialog box, the Input Y Range should be BO1:BO3581, and the Input X
Range should be BP1:BU3581. Check the box next to Labels. Uncheck the box next to
Constant is Zero. Select Output Range and set it to cell A1 of your Restricted Model
worksheet. Finally, select OK.

[Regression dialog box: Input Y Range $BO$1:$BO$3581; Input X Range $BP$1:$BU$3581; Labels checked; Constant is Zero unchecked; Output Range: 'Restricted Model'!$A$1]

Excel informs you that the output range will overwrite existing data. Select OK to overwrite
the data in the specified range.

[Microsoft Office Excel message: "Regression - Output range will overwrite existing data. Press OK to overwrite data in range." Buttons: OK, Cancel, Help]

The result is:

     A                   B              C                D           E             F              G
1    SUMMARY OUTPUT

3    Regression Statistics
4    Multiple R          0.408142
5    R Square            0.166580
6    Adjusted R Square   0.165180
7    Standard Error      0.424504
8    Observations        3580

10   ANOVA
11                       df             SS               MS          F             Significance F
12   Regression          6              128.693          21.4489     119.026       1.61E-137
13   Residual            3573           643.868          0.180204
14   Total               3579           772.561

16                       Coefficients   Standard Error   t Stat      P-value       Lower 95%      Upper 95%
17   Intercept           1.28493        0.0520946        24.6660     3.55E-124     1.18279        1.38707
18   exper               0.0783667      0.00951146       8.23919     2.41E-16      0.0597183      0.0970152
19   exper2              -0.00200995    0.000399301      -5.03366    5.04964E-07   -0.00279283    -0.00122707
20   tenure              0.0120621      0.00489672       2.46331     0.0138128     0.00246151     0.0216628
21   tenure2             -0.000243385   0.000286174      -0.850480   0.395115      -0.000804465   0.000317695
22   south               -0.195957      0.0145236        -13.4923    1.67616E-40   -0.224433      -0.167482
23   union               0.109774       0.0163558        6.71165     2.22914E-11   0.0777066      0.141842

Columns H and I (Lower 95.0% and Upper 95.0%) repeat the values in columns F and G.

In your F-test worksheet replace the reference LS Dummy Variable Wage Equation by Fixed
Effects Wage Equation All. Also, the denominator degrees of freedom, m2, in cell C9, needs to
be corrected to account for the loss of N = 716 degrees of freedom from correcting the
variables by their sample means.

     A                  B                C
1    Data Input         J =
2                       NT =             ='Fixed Effects Wage Equation All'!B8
3                       K =              ='Fixed Effects Wage Equation All'!B12
4                       SSEU =           ='Fixed Effects Wage Equation All'!C13
5                       SSER =           ='Restricted Model'!C13
6                       α =
8    Computed Values    m1 =             =C1
9                       m2 =             =C2-C3-716
10                      Fc =             =FINV(C6,C8,C9)
12   F-test             F-statistic =    =((C5-C4)/C8)/(C4/C9)
13                      Conclusion =     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                      p-value =        =FDIST(C12,C8,C9)
15                      Conclusion =     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")

With 715 joint null hypotheses and α = 0.01, the results of the F-test are:

     A                  B                C
1    Data Input         J =              715
2                       NT =             3580
3                       K =              6
4                       SSEU =           108.7985
5                       SSER =           643.8681
6                       α =              0.01
8    Computed Values    m1 =             715
9                       m2 =             2858
10                      Fc =             1.144628
12   F-test             F-statistic =    19.65819
13                      Conclusion =     Reject Ho
14                      p-value =        0
15                      Conclusion =     Reject Ho

Thus we reject the null hypothesis of no fixed effect differences between these women; it is
proper to include individual effects in the model.

15.3 THE RANDOM EFFECTS MODEL

15.3.1 Testing for Random Effects

In the random effects model we again assume that all individual differences are captured by the
intercept parameters, but we also recognize that the individuals in our sample were randomly
selected, and thus we treat the individual differences as random rather than fixed, as we did in the
fixed effects dummy variable model.

Below we re-consider the wage equation of Section 15.2.2 and treat individual differences
between the 716 women as random effects:

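$$\ln(WAGE)_{it} = \beta_1 + \beta_2 EDUC_i + \beta_3 EXPER_{it} + \beta_4 EXPER^2_{it} + \beta_5 TENURE_{it} + \beta_6 TENURE^2_{it} + \beta_7 BLACK_i + \beta_8 SOUTH_{it} + \beta_9 UNION_{it} + v_{it} \qquad (15.12)$$

where $\beta_1$ is a fixed population parameter and $v_{it} = e_{it} + u_i$. The $u_i$ are random individual
differences, or random effects. Because the component $u_i$ is common to all time periods,
the errors $v_{it}$ are correlated over time for a given individual but otherwise uncorrelated. The
correlation is given by:

$$\rho = \operatorname{corr}(v_{it}, v_{is}) = \frac{\operatorname{cov}(v_{it}, v_{is})}{\sqrt{\operatorname{var}(v_{it})\operatorname{var}(v_{is})}} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}, \quad t \neq s \qquad (15.13)$$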

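We test for the presence of random effects by testing the null hypothesis $H_0: \sigma_u^2 = 0$ against the
alternative hypothesis $H_1: \sigma_u^2 > 0$. If the null hypothesis is true, i.e. there are no random effects,
then the Lagrange multiplier test statistic (15.14) is distributed as a $\chi^2_{(1)}$ random variable in large
samples:

$$LM = \sqrt{\frac{NT}{2(T-1)}} \left( \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} \hat e_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat e_{it}^2} - 1 \right) \qquad (15.14)$$

where the $\hat e_{it}$ are the estimated residuals from model (15.15) below. Model (15.12) reduces to (15.15) when
there is no need for a random effects model:

$$\ln(WAGE)_{it} = \beta_1 + \beta_2 EDUC_i + \beta_3 EXPER_{it} + \beta_4 EXPER^2_{it} + \beta_5 TENURE_{it} + \beta_6 TENURE^2_{it} + \beta_7 BLACK_i + \beta_8 SOUTH_{it} + \beta_9 UNION_{it} + e_{it} \qquad (15.15)$$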

You will recognize model (15.15) as model (15.1) from Section 15.1.

Go back to your nls panel data worksheet.

Insert four columns to the left of column C labeled lwage.

In your new cells C1:F2, enter the following labels and formulas:

     C                                D                    E                        F
1    e-hat_it                         Σe-hat_it over t     (Σe-hat_it over t)^2     e-hat2_it
2    ='Pooled LS Wage Equation'!C32   =SUM(C2:C6)          =D2^2                    =C2^2

Copy the content of cell C2 to cells C3:C3581 and the content of cell F2 to cells F3:F3581. Here
is how your table should look (only the first five values are shown below):

     C          F
1    e-hat_it   e-hat2_it
2    0.01318    0.000174
3    0.027884   0.000778
4    -0.03388   0.001148
5    -0.06024   0.003629
6    -0.10381   0.010777

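Select cells D2:E6 and move your cursor to the lower right corner of the selection until it turns into
a skinny cross, as shown below. Left-click, hold, and drag down to cell E3581. Only row 2 of each
five-row block contains formulas, so this places one sum (and its square) per individual in the
first row of that individual's block.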

[Screenshot: cells D2:E6 selected, with D2 = -0.15686248, E2 = 0.024605838 and D3:E6 empty; the cursor is shown as a skinny cross at the lower right corner of the selection]

Here is how your table should look (only the last five values are shown below):

       D              E
3577   -2.07038349    4.286488
3578
3579
3580
3581

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Lagrange Multiplier Test.

In it copy the simplified Lagrange multiplier test template you created in Chapter 9.

In cell B2, replace R² with T. Delete the content of cells C1:C2. In cell C8, enter the formula
for the Lagrange multiplier statistic given by equation (15.14).

     A                          B                     C
1    Data Input                 N =
2                               T =
3                               α =
4                               m =
6    Computed Values            χ²-critical value =   =CHIINV(C3,C4)
8    Lagrange Multiplier Test   χ² =                  =SQRT((C1*C2)/(2*(C2-1)))*((SUM('nls panel data'!E2:E3581)/SUM('nls panel data'!F2:F3581))-1)
9                               Conclusion =          =IF(C8>=C6,"Reject Ho","Do Not Reject Ho")
10                              p-value =             =CHIDIST(C8,C4)
11                              Conclusion =          =IF(C10<=C3,"Reject Ho","Do Not Reject Ho")

At α = 0.05, with T = 5, N = 716 and m = 1, the result of the test is (see also p. 556 in
Principles of Econometrics, 4e):

     A                          B                     C
1    Data Input                 N =                   716
2                               T =                   5
3                               α =                   0.05
4                               m =                   1
6    Computed Values            χ²-critical value =   3.841459
8    Lagrange Multiplier Test   χ² =                  62.12314
9                               Conclusion =          Reject Ho
10                              p-value =             3.23E-15
11                              Conclusion =          Reject Ho
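As a cross-check of cell C8, the sketch below computes statistic (15.14) from a residual vector ordered by individual. It uses the same form as the worksheet formula. The function name and the toy residuals are hypothetical and are not the wage data. The intuition: when each individual's residuals share a common component, the squared within-individual sums exceed the plain sum of squares. The ratio then rises above 1, and LM becomes large, as with the value 62.12 here.

```python
import math


def lm_random_effects(residuals, t):
    """LM statistic in the form used by the Lagrange Multiplier Test worksheet,
    equation (15.14). Residuals must be ordered by individual, t rows each."""
    nt = len(residuals)
    if t < 2 or nt % t != 0:
        raise ValueError("need t >= 2 and len(residuals) a multiple of t")
    # sum over i of (sum over t of e_it)^2  -> SUM of column E in the worksheet
    sum_sq_of_sums = sum(sum(residuals[s:s + t]) ** 2 for s in range(0, nt, t))
    # sum over i and t of e_it^2           -> SUM of column F in the worksheet
    sum_of_sq = sum(e * e for e in residuals)
    return math.sqrt(nt / (2 * (t - 1))) * (sum_sq_of_sums / sum_of_sq - 1)


# Toy residuals (hypothetical): N = 2 individuals, T = 2 periods each
print(lm_random_effects([1.0, 1.0, -1.0, -1.0], t=2))
```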

15.3.2 Random Effects Estimation of the Wage Equation

Estimation of the random effects model is done via generalized least squares (GLS). As was the
case when we had heteroskedasticity or autocorrelation, we obtain the GLS estimator in the
random effects model by applying least squares to a transformed model. The transformed model
is:

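$$\ln(WAGE)^*_{it} = \beta_1 x^*_{1,it} + \beta_2 EDUC^*_{it} + \beta_3 EXPER^*_{it} + \beta_4 (EXPER^2)^*_{it} + \beta_5 TENURE^*_{it} + \beta_6 (TENURE^2)^*_{it} + \beta_7 BLACK^*_{it} + \beta_8 SOUTH^*_{it} + \beta_9 UNION^*_{it} + v^*_{it} \qquad (15.16)$$

where the transformed variables are $x^*_{1,it} = 1 - \alpha$ and $x^*_{it} = x_{it} - \alpha\bar{x}_i$ for all other variables.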

The transformation parameter α is defined as:

$$\alpha = 1 - \frac{\sigma_e}{\sqrt{T\sigma_u^2 + \sigma_e^2}} \qquad (15.17)$$

Least squares is applied to (15.16), with $\sigma_e^2$ and $\sigma_u^2$ in (15.17) replaced by the estimates $\hat\sigma_e^2$ and $\hat\sigma_u^2$.
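The transformation in (15.16) and (15.17) is a partial demeaning. The Python sketch below shows it under the same row layout as the worksheet (rows sorted by individual, T rows each). The function names are illustrative, and the variance values in the example are arbitrary, not the estimates obtained in this chapter. Two limiting cases make the role of α clear. When $\sigma_u^2 = 0$, α = 0 and the data are unchanged, which gives pooled least squares. As α approaches 1, the transformation approaches the deviation-from-the-mean (fixed effects) form.

```python
import math


def re_alpha(sigma_e2, sigma_u2, t):
    """Transformation parameter of equation (15.17):
    alpha = 1 - sigma_e / sqrt(T * sigma_u^2 + sigma_e^2)."""
    return 1.0 - math.sqrt(sigma_e2) / math.sqrt(t * sigma_u2 + sigma_e2)


def re_transform(values, alpha, t):
    """Partial demeaning x*_it = x_it - alpha * xbar_i, with rows sorted by
    individual and t rows per individual. Applied to a column of ones it
    returns 1 - alpha, the transformed intercept variable x*_1,it."""
    if len(values) % t != 0:
        raise ValueError("len(values) must be a multiple of t")
    out = []
    for start in range(0, len(values), t):
        block = values[start:start + t]
        xbar = sum(block) / t
        out.extend(x - alpha * xbar for x in block)
    return out


# Arbitrary illustrative variances, not estimates from the wage data
a = re_alpha(sigma_e2=1.0, sigma_u2=3.0, t=1)
print(a, re_transform([1.0, 1.0], a, t=2))
```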

Below, we first obtain the estimates $\hat\sigma_e^2$ and $\hat\sigma_u^2$.

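The regression error variance $\sigma_e^2$ comes from the fixed effects wage equation estimated in Section
15.2.2, restated below:

$$\widetilde{\ln(WAGE)}_{it} = \beta_2 \widetilde{EXPER}_{it} + \beta_3 \widetilde{EXPER^2}_{it} + \beta_4 \widetilde{TENURE}_{it} + \beta_5 \widetilde{TENURE^2}_{it} + \beta_6 \widetilde{SOUTH}_{it} + \beta_7 \widetilde{UNION}_{it} + \tilde{e}_{it} \qquad (15.18)$$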

where variables are in deviation from the mean form.

The wage equation (15.18) is referred to as the deviation (DV) regression as it uses variables in
deviation from the mean form.

Reconsider the following correction factor, whose square root was used in Section 15.2.2:

$$\frac{NT - K_{DV}}{NT - N - K_{DV}} \qquad (15.19)$$

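A consistent estimator of $\sigma_e^2$ is obtained by multiplying the estimate of the error variance from
(15.18) by the correction factor (15.19) (see also Appendix 15B p. 583 in Principles of
Econometrics, 4e):

$$\hat\sigma^2_{e,DV,WRONG} \times \frac{NT-K_{DV}}{NT-N-K_{DV}} = \frac{SSE_{DV}}{NT-K_{DV}} \times \frac{NT-K_{DV}}{NT-N-K_{DV}} = \frac{SSE_{DV}}{NT-N-K_{DV}} = \hat\sigma^2_{e,DV,CORRECT}$$

where $\hat\sigma_{e,DV,WRONG}$ is the estimated standard error of the regression, $SSE_{DV}$ is the sum of squared least squares
residuals and $K_{DV}$ is the number of parameters in model (15.18). $K_{DV}$ = 6, because all six are
slope parameters, so $K_{DV}$ is also referred to as $K_{slopes}$.

With N = 716, T = 5 and $K_{DV}$ = 6, the correction factor is 3574/2858. Because we apply it to the standard error of the
regression rather than to the error variance, we use its square root:

$$\sqrt{\frac{NT-K_{DV}}{NT-N-K_{DV}}} = \sqrt{\frac{3580-6}{3580-716-6}} = \sqrt{\frac{3574}{2858}} \qquad (15.20)$$

We use (15.20) below.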

Go back to your Fixed Effects Wage Equation All worksheet.

In cells C6:C7, enter the following label and formula.



     C
6    Correct SE
7    =SQRT(3574/2858)*B7

The result is (see also p. 557 in Principles of Econometrics, 4e):

     C
6    Correct SE
7    0.195110

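Next, we obtain an estimate of $\sigma_u^2$ from the regression error variance of the following model:

$$\overline{\ln(WAGE)}_i = \beta_1 + \beta_2 EDUC_i + \beta_3 \overline{EXPER}_i + \beta_4 \overline{EXPER^2}_i + \beta_5 \overline{TENURE}_i + \beta_6 \overline{TENURE^2}_i + \beta_7 BLACK_i + \beta_8 \overline{SOUTH}_i + \beta_9 \overline{UNION}_i + u_i + \bar e_i \qquad (15.21)$$

where the $\bar x_i$ are time-averaged observations, $\bar x_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}$, and $\bar v_i = u_i + \bar e_i$.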

Equation (15.21) is referred to as the between estimator (BE) regression as it uses variation
between individuals as a basis for estimating the regression parameters.

The error variance in (15.21) and its estimate are given by:

$$\operatorname{var}(u_i + \bar e_i) = \sigma_u^2 + \frac{\sigma_e^2}{T}, \qquad \hat\sigma^2_{\bar v} = \frac{SSE_{BE}}{N - K_{BE}} \qquad (15.22)$$

where $\hat\sigma^2_{\bar v}$ is the mean square residual, $SSE_{BE}$ is the sum of squared least squares residuals and $K_{BE}$ is the
number of parameters in model (15.21). $K_{BE}$ = 9: the intercept and 8 slope parameters.

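With estimate (15.22) in hand, we can estimate $\sigma_u^2$ as follows (see also Appendix 15B p. 583 in Principles
of Econometrics, 4e):

$$\hat\sigma_u^2 = \hat\sigma^2_{\bar v} - \frac{\hat\sigma_e^2}{T} = \frac{SSE_{BE}}{N-K_{BE}} - \frac{1}{T}\left(\frac{SSE_{DV}}{NT-N-K_{slopes}}\right) = \hat\sigma^2_{\bar v} - \frac{1}{T}\hat\sigma^2_{e,DV,CORRECT} \qquad (15.23)$$

where T = 5 years.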
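Equation (15.23) combines the between regression with the corrected deviation regression. The Python sketch below spells out the arithmetic. The function name is illustrative. $SSE_{DV}$ = 108.7985212 comes from the Fixed Effects Wage Equation All output above. The between-regression SSE is not reported in this section, so the sketch uses a clearly marked placeholder. Because (15.23) is a difference, the estimate can come out negative in some samples, so check its sign before using it in (15.17).

```python
import math


def sigma_u2_hat(sse_be, n, k_be, sse_dv, t, k_slopes):
    """Equation (15.23):
    sigma_u^2 = SSE_BE/(N - K_BE) - (1/T) * SSE_DV/(NT - N - K_slopes)."""
    sigma_vbar2 = sse_be / (n - k_be)                    # (15.22), between regression
    sigma_e2_correct = sse_dv / (n * t - n - k_slopes)   # corrected DV error variance
    return sigma_vbar2 - sigma_e2_correct / t


# sigma_e from the corrected deviation regression (compare with cell C7 above)
print(math.sqrt(108.7985212 / (3580 - 716 - 6)))

# PLACEHOLDER value for SSE_BE: NOT the between regression result
sse_be_placeholder = 100.0
print(sigma_u2_hat(sse_be_placeholder, n=716, k_be=9,
                   sse_dv=108.7985212, t=5, k_slopes=6))
```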

Go back to your nls panel data worksheet.

In cells CQ1:CQ6 and CR1:CY1, enter the following labels and formulas.

     CQ                CR        CS         CT          CU          CV           CW         CX         CY
1    lnwagebar         educbar   experbar   exper2bar   tenurebar   tenure2bar   blackbar   southbar   unionbar
2    =AVERAGE(X2:X6)
3    =AVERAGE(X2:X6)
4    =AVERAGE(X2:X6)
5    =AVERAGE(X2:X6)
6    =AVERAGE(X2:X6)

Copy the content of cells CQ2:CQ6 to cells CR2:CY6.

Here is how your table should look (only the first five values are shown below):

     CQ          CR        CS         CT          CU          CV           CW         CX         CY
1    lnwagebar   educbar   experbar   exper2bar   tenurebar   tenure2bar   blackbar   southbar   unionbar
2    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
3    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
4    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
5    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1
6    1.8328104   12        10.44615   113.99332   5.4166666   35.4874978   1          0          1

Select cells CQ2:CY6 and move your cursor to the lower right corner of the selection until it turns
into a skinny cross, as shown below. Left-click, hold, and drag down to cell CY3581.

[Screenshot: cells CQ2:CY6 selected, with the cursor shown as a skinny cross at the lower right corner of the selection]

Here is how your table should look (only the last five values are shown):

       CQ         CR   CS         CT          CU          CV           CW   CX   CY
3577   1.466357   12   15.52051   246.62264   6.1999998   44.3444452   0    1    0
3578   1.466357   12   15.52051   246.62264   6.1999998   44.3444452   0    1    0
3579   1.466357   12   15.52051   246.62264   6.1999998   44.3444452   0    1    0
3580   1.466357   12   15.52051   246.62264   6.1999998   44.3444452   0    1    0
3581   1.466357   12   15.52051   246.62264   6.1999998   44.3444452   0    1    0

Select cells CQ1:CY3581. Right-click, select Copy.



Place your cursor in cell DA1. Right-click and select Paste Special. In the Paste Special dialog box
that pops up, select Values. Finally, select OK.

Here is how your table should look (only the first five values are shown below):

     DA        DB        DC         DD          DE          DF           DG         DH         DI
1    lnwagebar educbar   experbar   exper2bar   tenurebar   tenure2bar   blackbar   southbar   unionbar
2    1.83281   12        10.44615   113.9933    5.416667    35.4875      1          0          1
3    1.83281   12        10.44615   113.9933    5.416667    35.4875      1          0          1
4    1.83281   12        10.44615   113.9933    5.416667    35.4875      1          0          1
5    1.83281   12        10.44615   113.9933    5.416667    35.4875      1          0          1
6    1.83281   12        10.44615   113.9933    5.416667    35.4875      1          0          1

Go to the Insert tab at the upper left corner of your screen. In the Tables group of commands, select Table. A Create Table dialog box pops up. The data for your table are in cells DA1:DI3581. Select My table has headers. Finally, select OK.


Here is how your table should look (only the first five values are shown):

[Screenshot: cells DA1:DI6, now displayed as an Excel table.]

The Table Tools contextual tab, with its Design tab, appears. In the Tools group of commands of the Design tab, select Remove Duplicates. This deletes rows that are duplicated across the selected columns. All columns of your table should be selected; if not, select the Select All button. Finally, select OK.

Excel informs you that 2864 duplicate values were found and removed, and 716 unique values
remain. Those are the 716 time-averaged observations we need to run model (15.21). Select OK.
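The two spreadsheet steps above (repeating each person's time average on every row, then collapsing the duplicated rows) can be sketched in Python as a cross-check. This is a minimal illustration on a made-up three-person panel; `time_averages` and `remove_duplicates` are hypothetical helper names, not Excel features or anything defined in the book.

```python
def time_averages(ids, values):
    """Return, for every row, the mean of `values` over all rows with the same id."""
    totals, counts = {}, {}
    for pid, v in zip(ids, values):
        totals[pid] = totals.get(pid, 0.0) + v
        counts[pid] = counts.get(pid, 0) + 1
    return [totals[pid] / counts[pid] for pid in ids]


def remove_duplicates(rows):
    """Keep the first occurrence of each row, in order (like Remove Duplicates)."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical panel: 3 people observed for T = 2 periods each
ids = [1, 1, 2, 2, 3, 3]
lnwage = [1.0, 2.0, 0.5, 1.5, 3.0, 3.0]
educ = [12, 12, 16, 16, 10, 10]

rows = list(zip(time_averages(ids, lnwage), time_averages(ids, educ)))
between = remove_duplicates(rows)
print(between)  # [(1.5, 12.0), (1.0, 16.0), (3.0, 10.0)]
```

Because duplicates are detected on values alone, two different people with identical averages on every variable would collapse into a single row; grouping on the person identifier avoids that. In the nls data it evidently did not happen, since exactly 716 rows remain.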


Here is how your table should look (only the first five values are shown below):

[Screenshot: cells DA1:DI6 after the duplicate rows are removed.]

Left-click anywhere in your table. Go back to the Tools group of commands and select Convert to Range. When asked, confirm that you want to convert the table back to a normal range by selecting Yes.


Here is how your table should look (only the first five values are shown below):

[Screenshot: the first rows of the converted range; columns DD (exper2bar) and DE (tenurebar) are visible.]

In the Regression dialog box, the Input Y Range should be DA1:DA717, and the Input X Range should be DB1:DI717. Check the box next to Labels. Select New Worksheet Ply and name it Between Wage Equation. Finally, select OK.

The result is:

     A                    B              C                D              E              F              G
1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.604539637
5    R Square             0.365468173
6    Adjusted R Square    0.35828818
7    Standard Error       0.340422217
8    Observations         716
10   ANOVA
11                        df             SS               MS             F              Significance F
12   Regression           8              47.19014976      5.89876872     50.90091369    5.4524E-65
13   Residual             707            81.93231089      0.115887285
14   Total                715            129.1224606
16                        Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17   Intercept            0.416688577    0.135761818      3.069261906    0.002227968    0.150144005    0.68323315
18   educbar              0.070772317    0.005387371      13.13670748    2.0E-35        0.060195157    0.081349478
19   experbar             0.06619243     0.023455392      2.822056097    0.004905678    0.020141874    0.112242989
20   exper2bar            -0.001606476   0.000999826      -1.606754494   0.108554128    -0.003569461   0.000356509
21   tenurebar            0.016558044    0.012201636      1.357034712    0.175203258    -0.007397734   0.040513822
22   tenure2bar           -0.000494785   0.000702808      -0.70401       0.481657249    -0.001874627   0.000885056
23   blackbar             -0.121550603   0.031660135      -3.839231951   0.000134485    -0.183709738   -0.059391468
24   southbar             -0.105317325   0.029100454      -3.619095569   0.000316717    -0.162450975   -0.048183676
25   unionbar             0.155735498    0.035460749      4.391771186    1.29639E-05    0.086114522    0.225356474

In cells D3:E3, enter the following label and formula.

     D            E
3    σ²u-hat =    =D13-('Fixed Effects Wage Equation All'!C7^2)/5

The result is (see also p. 557 in Principles of Econometrics, 4e):

     D            E
3    σ²u-hat =    0.108273673

We now have the ingredients we need for our transformation parameter α and thus for estimating model (15.16).

Go back to your nls panel data worksheet.



In cells DK1:DL1, enter the following label and formula.

     DK     DL
1    α =    =1-('Fixed Effects Wage Equation All'!C7/SQRT((5*'Between Wage Equation'!E3)+('Fixed Effects Wage Equation All'!C7^2)))

The result is (see also p. 557 in Principles of Econometrics, 4e):

     DK     DL
1    α =    0.743683
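As a cross-check of cells E3 and DL1, here is a small Python sketch. It assumes the standard relation σ²u-hat = (between-regression MS Residual) − σ²e-hat/T for cell E3, and uses the α formula entered in DL1. The fixed effects error variance is not printed in this section, so the value 0.03806806 below is backed out from the two numbers shown above (0.115887285 and 0.108273673); treat it as an assumption, not a number from the book.

```python
import math


def random_effects_alpha(ms_between, sigma_e2, T):
    """Return (sigma_u^2, alpha) for a balanced panel with T time periods.

    ms_between: residual mean square (MS) of the between regression
    sigma_e2:   error variance estimated from the fixed effects model
    """
    sigma_u2 = ms_between - sigma_e2 / T          # assumed between-variance relation
    alpha = 1 - math.sqrt(sigma_e2) / math.sqrt(T * sigma_u2 + sigma_e2)  # as in DL1
    return sigma_u2, alpha


# MS Residual of the between regression (cell D13) and T = 5 years
sigma_u2, alpha = random_effects_alpha(0.115887285, 0.03806806, 5)
print(round(sigma_u2, 9), round(alpha, 6))  # 0.108273673 0.743683
```

Note that T·σ²u-hat + σ²e-hat equals T times the between MS Residual, so the denominator of α could equally be written as the square root of 5 × 0.115887285.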

In cells DN1:DP2 and DQ1:DW1, enter the following labels and formulas.

     DN              DO          DP
1    lnwage*         x1*         educ*
2    =X2-$DL$1*CQ2   =1-$DL$1    =Y2-$DL$1*CR2

     DQ       DR        DS        DT         DU       DV       DW
1    exper*   exper2*   tenure*   tenure2*   black*   south*   union*

Copy the content of cell DP2 to cells DQ2:DW2, and then copy the content of cells DN2:DW2
to cells DN3:DW3581.
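Columns DN:DW apply the random effects (partial demeaning) transformation: each variable minus α times that person's time average, with the intercept column becoming 1 − α. A minimal Python sketch of the same arithmetic, using the rounded α from cell DL1 (the function name is a hypothetical helper):

```python
ALPHA = 0.743683  # value shown in cell DL1 (rounded)


def quasi_demean(values, means, alpha=ALPHA):
    """Random effects transform: x* = x - alpha * xbar for each row."""
    return [x - alpha * xbar for x, xbar in zip(values, means)]


intercept_star = 1 - ALPHA                   # column x1*, identical in every row
educ_star = quasi_demean([12], [12])[0]      # first person: educ = educbar = 12
print(round(intercept_star, 6), round(educ_star, 6))
```

The sheet shows educ* = 3.075805 rather than 3.075804 because DL1 holds the unrounded α.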

Here is how your table should look (only the first five values are shown below):

[Screenshot: first five rows of DN1:DW6; in every row x1* = 0.256317 and educ* = 3.075805.]

In the Regression dialog box, the Input Y Range should be DN1:DN3581, and the Input X Range should be DO1:DW3581. Check the boxes next to Labels and Constant is Zero. Select New Worksheet Ply and name it Random Effects Wage Equation. Finally, select OK.


The result is (see also Table 15.9 p. 556 in Principles of Econometrics, 4e):

     A                    B              C                D              E              F              G
1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.93171973
5    R Square             0.868101666
6    Adjusted R Square    0.8678
7    Standard Error       0.195504389
8    Observations         3580
10   ANOVA
11                        df             SS               MS             F              Significance F
12   Regression           9              898.3263827      99.81404252    2611.431402    0
13   Residual             3571           136.4906409      0.038221966
14   Total                3580           1034.817024
16                        Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17   Intercept            0              #N/A             #N/A           #N/A           #N/A           #N/A
18   x1*                  0.5339294      0.079882787      6.683910526    2.68828E-11    0.377308934    0.690549866
19   educ*                0.073253564    0.005330756      13.741684      6.53E-42       0.062801933    0.083705196
20   exper*               0.043616994    0.006357584      6.860623854    8.05E-12       0.031152133    0.056081854
21   exper2*              -0.000560959   0.000262607      -2.136113712   0.032731953    -0.001075835   -4.60837E-05
22   tenure*              0.014154128    0.003166561      4.469874232    8.06E-06       0.007945679    0.020362576
23   tenure2*             -0.000755342   0.000194726      -3.878999975   0.000106777    -0.001137128   -0.000373557
24   black*               -0.116736587   0.030208705      -3.864335975   0.000113       -0.175964635   -0.057508539
25   south*               -0.081811707   0.02241094       -3.650525396   0.000265453    -0.125751234   -0.037872179
26   union*               0.080235324    0.0132132        6.072356904    1.39249E-09    0.054329129    0.106141518

15.4 SETS OF REGRESSION EQUATIONS

Open the Excel file grunfeld2. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 15 in one file, create a new worksheet in your POE
Chapter 15 Excel file, rename it grunfeld2 data, and in it, copy the data set you just opened.


We consider a model for describing gross firm investment for N = 2 firms, General Electric (GE) and Westinghouse (WE), over a period of T = 20 years.

15.4.1 Estimation: Equal Coefficients, Equal Error Variances


If we assume that these two firms have similar investment behavior over time, their investment
equation can be specified as:

$$INV_{it} = \beta_1 + \beta_2 V_{it} + \beta_3 K_{it} + e_{it} \qquad (15.24)$$

where t = 1, ..., 20; i = GE or WE; and $\operatorname{var}(e_{GE,t}) = \sigma^2_{GE} = \sigma^2_{WE} = \operatorname{var}(e_{WE,t})$.
$INV_{it}$ denotes gross investment of the ith firm in the tth period of time. $V_{it}$ denotes the stock market value of firm i at the beginning of year t and is used as a proxy for expected profits. $K_{it}$ denotes the actual capital stock of firm i at the beginning of year t and is used as a proxy for permanent desired capital stock.

If, in addition, we assume the errors are uncorrelated, both over time for each firm and between firms, then equation (15.24) can be estimated with the General Electric and Westinghouse data pooled together using the least squares regression technique, as in Section 15.1 for our wage equation example.

In cells G1:I1, enter the following labels.

     G      H    I
1    inv    v    k

Copy the content of cells A2:C21 to cells G2:I21, and the content of cells D2:F21 to cells
G22:I41. Here is how your table should look (only the first five values are shown below):

     G       H         I
1    inv     v         k
2    33.1    1170.6    97.8
3    45      2015.8    104.4
4    77.2    2803.3    118
5    44.6    2039.7    156.2
6    48.1    2256.2    172.6

In the Regression dialog box, the Input Y Range should be G1:G41, and the Input X Range should be H1:I41. Check the box next to Labels. Select New Worksheet Ply and name it Pooled LS Investment Model. Finally, select OK.


The result is (see also Table 15.11 p. 564 in Principles of Econometrics, 4e):

     A                    B              C                D              E              F              G
1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.899873334
5    R Square             0.80977202
6    Adjusted R Square    0.799489423
7    Standard Error       21.15771
8    Observations         40
10   ANOVA
11                        df             SS               MS             F              Significance F
12   Regression           2              70506.221        35253.1105     78.75172745    4.64E-14
13   Residual             37             16563.00288      447.6487265
14   Total                39             87069.22388
16                        Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17   Intercept            17.87200128    7.024080507      2.544390154    0.015252924    3.639862407    32.10414015
18   v                    0.015192638    0.00619624       2.451913329    0.019050853    0.002637868    0.027747409
19   k                    0.143579159    0.018600986      7.718900416    3.19392E-09    0.105889981    0.181268336

15.4.2 Estimation: Different Coefficients, Equal Error Variances

Using the indicator (dummy) variable format from Chapter 7:

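$$INV_{it} = \beta_{1,GE} + \delta_1 D_i + \beta_{2,GE} V_{it} + \delta_2 (D_i \times V_{it}) + \beta_{3,GE} K_{it} + \delta_3 (D_i \times K_{it}) + e_{it} \qquad (15.25)$$

where $\operatorname{var}(e_{GE,t}) = \operatorname{var}(e_{WE,t})$, and $D_i$ is a dummy variable equal to 1 for Westinghouse observations and 0 for General Electric observations.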

Equation (15.25) is estimated using the pooled set of General Electric and Westinghouse data.

Go back to your grunfeld2 data worksheet.

Insert a column to the left of the column labeled v and one to the left of the column labeled k.

In your new cells H1:L2, enter the following labels and formulas for the dummy variables.

     H    J         L
1    d    dxv       dxk
2         =H2*I2    =H2*K2

Enter value 0 in cells H2:H21, and value 1 in cells H22:H41. Copy the content of cell J2 to cells
J3:J41, and the content of cell L2 to cells L3:L41. Here is how your table should look (only the
first five values are shown below):

     H    I         J      K        L
1    d    v         dxv    k        dxk
2    0    1170.6    0      97.8     0
3    0    2015.8    0      104.4    0
4    0    2803.3    0      118      0
5    0    2039.7    0      156.2    0
6    0    2256.2    0      172.6    0
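The stacking and dummy-variable steps can be sketched in Python as follows. Rows are (inv, v, k) tuples; the GE rows are the first five shown above, while the single WE row is a made-up placeholder, not a value from grunfeld2. The function name is a hypothetical helper.

```python
def stack_with_dummies(ge_rows, we_rows):
    """Stack GE rows above WE rows and add the columns d, dxv and dxk.

    Input rows are (inv, v, k); output rows are (inv, d, v, dxv, k, dxk),
    the same order as columns G:L in the worksheet.
    """
    stacked = []
    for d, rows in ((0, ge_rows), (1, we_rows)):
        for inv, v, k in rows:
            stacked.append((inv, d, v, d * v, k, d * k))
    return stacked


ge = [(33.1, 1170.6, 97.8), (45.0, 2015.8, 104.4), (77.2, 2803.3, 118.0),
      (44.6, 2039.7, 156.2), (48.1, 2256.2, 172.6)]
we = [(10.0, 200.0, 2.0)]  # hypothetical Westinghouse row, for illustration only

data = stack_with_dummies(ge, we)
for row in data:
    print(row)
```

Multiplying by d zeroes the interaction columns for GE and copies v and k for WE, which is exactly what the J and L formulas do.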

In the Regression dialog box, the Input Y Range should be G1:G41, and the Input X Range should be H1:L41. Check the box next to Labels. Select New Worksheet Ply and name it Dummy Variable Model. Finally, select OK.


The result is (see also Table 15.12 p. 565 in Principles of Econometrics, 4e):

     A                    B              C                D              E              F              G
1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.909857235
5    R Square             0.82784019
6    Adjusted R Square    0.802522568
7    Standard Error       20.99707349
8    Observations         40
10   ANOVA
11                        df             SS               MS             F              Significance F
12   Regression           5              72079.40264      14415.88053    32.69818434    4.6E-12
13   Residual             34             14989.82123      440.8770951
14   Total                39             87069.22388
16                        Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17   Intercept            -9.956308498   23.62636432      -0.421406712   0.676110456    -57.97085739   38.05824039
18   d                    9.446920615    28.80535028      0.327957151    0.744955154    -49.092594     67.98643523
19   v                    0.026551189    0.011722048      2.265064058    0.029996268    0.002729122    0.050373257
20   dxv                  0.02634293     0.034352676      0.76683781     0.448470172    -0.043470106   0.096155966
21   k                    0.151693875    0.019356449      7.836865       4.016E-09      0.112356839    0.191030911
22   dxk                  -0.05928736    0.116946429      -0.50696165    0.615454094    -0.296951096   0.178376377

15.4.3 Estimation: Different Coefficients, Different Error Variances

If we assume that these two firms have distinct investment behaviors, fixed over time, their
separate regressions can be specified as:

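$$INV_{it} = \beta_{1,i} + \beta_{2,i} V_{it} + \beta_{3,i} K_{it} + e_{it} \qquad (15.26)$$

where t = 1, ..., 20; i = GE or WE; and $\operatorname{var}(e_{GE,t}) = \sigma^2_{GE} \neq \sigma^2_{WE} = \operatorname{var}(e_{WE,t})$.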

If, in addition, we assume there is no contemporaneous correlation, then equation (15.26) can be estimated twice, first with General Electric data and then with Westinghouse data, using the least squares regression technique. Equation (15.26) can be written equivalently as the set of equations (15.26a) and (15.26b):

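$$INV_{GE,t} = \beta_{1,GE} + \beta_{2,GE} V_{GE,t} + \beta_{3,GE} K_{GE,t} + e_{GE,t} \qquad (15.26a)$$

$$INV_{WE,t} = \beta_{1,WE} + \beta_{2,WE} V_{WE,t} + \beta_{3,WE} K_{WE,t} + e_{WE,t} \qquad (15.26b)$$

Note that the dummy variable format equation (15.25) becomes equations (15.27a) and (15.27b):

$$INV_{GE,t} = \beta_{1,GE} + \beta_{2,GE} V_{GE,t} + \beta_{3,GE} K_{GE,t} + e_{GE,t} \qquad (15.27a)$$

$$INV_{WE,t} = (\beta_{1,GE} + \delta_1) + (\beta_{2,GE} + \delta_2) V_{WE,t} + (\beta_{3,GE} + \delta_3) K_{WE,t} + e_{WE,t} \qquad (15.27b)$$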

The least squares estimates of $\beta_{k,GE}$ from (15.26a) will be equal to the least squares estimates of $\beta_{k,GE}$ from (15.27a), k = 1, 2, 3. The least squares estimates of $\beta_{k,WE}$ from (15.26b) will be equal to the least squares estimates of $(\beta_{k,GE} + \delta_k)$ from (15.27b), k = 1, 2, 3. The difference will be in the standard errors, because model (15.26) allows the error variances to differ for the two firms, while model (15.25) assumes that the variance of the error term is constant across firms.
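A quick arithmetic check of this equivalence uses coefficients read from the Excel outputs in this section, rounded to four decimals: adding the dummy-model shifts (on d, dxv and dxk) to the GE coefficients should reproduce the separate WE regression. Only the point estimates are compared; as noted above, the standard errors differ.

```python
# (intercept, v, k), rounded to four decimals from the Excel outputs
ge = (-9.9563, 0.0266, 0.1517)     # GE coefficients (dummy model = separate GE regression)
delta = (9.4469, 0.0263, -0.0593)  # coefficients on d, dxv and dxk
we = (-0.5094, 0.0529, 0.0924)     # separate WE regression


def implied_we(ge, delta):
    """Westinghouse coefficients implied by (15.27b): beta_k,GE + delta_k."""
    return tuple(b + d for b, d in zip(ge, delta))


implied = implied_we(ge, delta)
print([round(x, 4) for x in implied])  # [-0.5094, 0.0529, 0.0924]
print(all(abs(a - b) < 2e-4 for a, b in zip(implied, we)))  # True
```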

Go back to your grunfeld2 data worksheet.

In the Regression dialog box, the Input Y Range should be A1:A21, and the Input X Range should be B1:C21. Check the boxes next to Labels and Residuals. Select New Worksheet Ply and name it GE Investment Equation. Finally, select OK.


The result is (see also Table 15.13 p. 566 in Principles of Econometrics, 4e):

     A                    B              C                D              E              F              G
1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.839825405
5    R Square             0.70530671
6    Adjusted R Square    0.670636911
7    Standard Error       27.88272414
8    Observations         20
10   ANOVA
11                        df             SS               MS             F              Significance F
12   Regression           2              31632.0322       15816.0161     20.34354783    3.08789E-05
13   Residual             17             13216.58719      777.4463053
14   Total                19             44848.61939
16                        Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17   Intercept            -9.956308498   31.374245        -0.317340144   0.754849862    -76.15018584   56.23756885
18   v_ge                 0.026551189    0.015566104      1.705705521    0.106265091    -0.006290419   0.059392797
19   k_ge                 0.151693875    0.025704083      5.901547865    1.74E-05       0.097463001    0.205924746

23   RESIDUAL OUTPUT
25   Observation   Predicted inv_ge   Residuals
26   1             35.96017645        -2.860176453
27   2             59.40242101        -14.40242101
28   3             82.37452197        -5.174521973

Go back to your grunfeld2 data worksheet.

In the Regression dialog box, the Input Y Range should be D1:D21, and the Input X Range should be E1:F21. Check the boxes next to Labels and Residuals. Select New Worksheet Ply and name it WE Investment Equation. Finally, select OK.

The result is (see also Table 15.13 p. 566 in Principles of Econometrics, 4e):

     A                    B              C                D              E              F              G
1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.8628129
5    R Square             0.744446101
6    Adjusted R Square    0.714380936
7    Standard Error       10.21312317
8    Observations         20
10   ANOVA
11                        df             SS               MS             F              Significance F
12   Regression           2              5165.552838      2582.776419    24.76108513    9.19605E-06
13   Residual             17             1773.234044      104.3078849
14   Total                19             6938.786882
16                        Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17   Intercept            -0.509387883   8.015289229      -0.063552028   0.950068222    -17.42016981   16.40139404
18   v_we                 0.05289412     0.015706502      3.367657536    0.003654766    0.019756297    0.086031942
19   k_we                 0.092406515    0.056098975      1.647205047    0.117874246    -0.025951976   0.210765006

23   RESIDUAL OUTPUT
25   Observation   Predicted inv_we   Residuals
26   1             9.786167743        3.143832257
27   2             26.85790303        -0.95790303

Below we use the Goldfeld-Quandt test to test the null hypothesis $H_0: \sigma^2_{GE} = \sigma^2_{WE}$.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Goldfeld-Quandt Test.


In it copy the Goldfeld-Quandt test template you created in Chapter 8.

Replace the following reference: [POE Chapter 8.xlsx]Subsample 1 Model by GE Investment Equation, and replace [POE Chapter 8.xlsx]Subsample 2 Model by WE Investment Equation.

     A                      B                   C
1    Data Input             N1 =                ='GE Investment Equation'!B8
2                           K1 =                ='GE Investment Equation'!B12+1
3                           MS Residual 1 =     ='GE Investment Equation'!D13
4                           N2 =                ='WE Investment Equation'!B8
5                           K2 =                ='WE Investment Equation'!B12+1
6                           MS Residual 2 =     ='WE Investment Equation'!D13
7                           α =
8
9    Computed Values        m1 =                =C1-C2
10                          m2 =                =C4-C5
11                          F-statistic =       =C3/C6
12   Goldfeld-Quandt test
13                          Right-tail Fc =     =FINV(C7,C9,C10)
14                          Conclusion =        =IF(C11>=C13,"Reject Ho","Do Not Reject Ho")
15
16                          Two-tail FLc =      =FINV(1-C7/2,C9,C10)
17                          FUc =               =FINV(C7/2,C9,C10)
18                          Conclusion =        =IF(OR(C11<=C16,C11>=C17),"Reject Ho","Do Not Reject Ho")

At α = 0.05, the result of the Goldfeld-Quandt test is (see also p. 566 in Principles of Econometrics, 4e):

     A                      B                   C
1    Data Input             N1 =                20
2                           K1 =                3
3                           MS Residual 1 =     777.4463
4                           N2 =                20
5                           K2 =                3
6                           MS Residual 2 =     104.3079
7                           α =                 0.05
9    Computed Values        m1 =                17
10                          m2 =                17
11                          F-statistic =       7.45338
12   Goldfeld-Quandt test
13                          Right-tail Fc =     2.271893
14                          Conclusion =        Reject Ho
16                          Two-tail FLc =      0.374069
17                          FUc =               2.6733
18                          Conclusion =        Reject Ho
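The template's logic can be mirrored in a few lines of Python. The critical values are passed in rather than computed, because Python's standard library has no inverse F function; if SciPy is available, `scipy.stats.f.ppf(1 - alpha, m1, m2)` plays the role of Excel's `FINV(alpha, m1, m2)`. The numbers in the usage line are the mean square residuals and critical values reported in this section; the function name is a hypothetical helper.

```python
def goldfeld_quandt(n1, k1, ms1, n2, k2, ms2, f_right, f_low, f_up):
    """Mirror rows 9-18 of the template.

    f_right: right-tail critical value, FINV(alpha, m1, m2)
    f_low, f_up: two-tail critical values, FINV(1 - alpha/2, ...) and FINV(alpha/2, ...)
    """
    m1, m2 = n1 - k1, n2 - k2
    f_stat = ms1 / ms2
    right_tail = "Reject Ho" if f_stat >= f_right else "Do Not Reject Ho"
    two_tail = "Reject Ho" if (f_stat <= f_low or f_stat >= f_up) else "Do Not Reject Ho"
    return m1, m2, f_stat, right_tail, two_tail


# alpha = 0.05 with (17, 17) degrees of freedom
result = goldfeld_quandt(20, 3, 777.4463, 20, 3, 104.3079,
                         f_right=2.271893, f_low=0.374069, f_up=2.6733)
print(result)
```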

15.4.4 Seemingly Unrelated Regressions: Testing for Contemporaneous Correlation

Again, we consider a model for describing gross firm investment for General Electric (GE) and
Westinghouse (WE), over a period of T = 20 years. These two firms have distinct investment
behaviors that are fixed over time.

This time we assume that the variances of the error terms are different across firms, (15.28), and
the error terms across firms, at the same point in time, are correlated, (15.29):

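$$\operatorname{var}(e_{GE,t}) = \sigma^2_{GE}, \qquad \operatorname{var}(e_{WE,t}) = \sigma^2_{WE} \qquad (15.28)$$

$$\operatorname{cov}(e_{GE,t}, e_{WE,t}) = \sigma_{GE,WE} \qquad (15.29)$$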

Correlation like (15.29) is called contemporaneous correlation. To account for it, a dummy variable model, rather than separate investment equations, has to be estimated. As we saw in Section 15.4.2, a dummy variable model like (15.25) implies that $\operatorname{var}(e_{GE,t}) = \operatorname{var}(e_{WE,t})$. So the dummy variable model will have to (1) correct for the heteroskedasticity implied by (15.28) and (2) account for the contemporaneous correlation between the errors of GE and WE implied by (15.29). This is what a seemingly unrelated regressions (SUR) model does.

We would like to test whether or not $\sigma_{GE,WE} = 0$ to determine if we need a SUR model. To carry out such a test we compute the squared correlation:

$$r^2_{GE,WE} = \frac{\hat\sigma^2_{GE,WE}}{\hat\sigma^2_{GE}\,\hat\sigma^2_{WE}} \qquad (15.30)$$

where $\hat\sigma^2_{GE}$ and $\hat\sigma^2_{WE}$ are the mean square residuals from the GE and WE investment equations.

The estimated covariance is computed from (see also p. 567 in Principles of Econometrics, 4e):

$$\hat\sigma_{GE,WE} = \frac{1}{17}\sum_{t=1}^{20} \hat e_{GE,t}\,\hat e_{WE,t} \qquad (15.31)$$
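Formulas (15.30) and (15.31) can be sketched as one small Python function that takes the two residual columns (C26:C45 in each regression sheet). It divides every sum of squares or cross products by T − 3 = 17, which matches the MS Residual cells and the 1/17 in the worksheet formula below. The residual vectors in the usage line are made up so that the answer is easy to verify by hand; the function name is a hypothetical helper.

```python
import math


def contemporaneous_correlation(res_ge, res_we, k=3):
    """Return (sigma_ge_we, r2, r) from two equal-length residual vectors."""
    T = len(res_ge)
    cov = sum(a * b for a, b in zip(res_ge, res_we)) / (T - k)   # (15.31)
    s2_ge = sum(a * a for a in res_ge) / (T - k)                  # MS Residual, GE
    s2_we = sum(b * b for b in res_we) / (T - k)                  # MS Residual, WE
    r2 = cov ** 2 / (s2_ge * s2_we)                               # (15.30)
    return cov, r2, math.sqrt(r2)


# Made-up residuals: the WE residuals are exactly twice the GE ones, so r = 1
print(contemporaneous_correlation([1.0, -1.0, 2.0, -2.0], [2.0, -2.0, 4.0, -4.0]))
# (20.0, 1.0, 1.0)
```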
Go back to your grunfeld2 data worksheet.

In cells N1:O5, enter the following labels and formulas. In the last column, you will find the numbers of the equations used, if any.

     N                  O
1    σ-hatGE,WE =       =(1/17)*SUMPRODUCT('GE Investment Equation'!C26:C45,'WE Investment Equation'!C26:C45)    (15.31)
2    σ²-hatGE =         ='GE Investment Equation'!D13
3    σ²-hatWE =         ='WE Investment Equation'!D13
4    r²GE,WE =          =(O1^2)/(O2*O3)                                                                          (15.30)
5    rGE,WE =           =SQRT(O4)

The result is (see also p. 569 in Principles of Econometrics, 4e):

     N                  O
1    σ-hatGE,WE =       207.5871
2    σ²-hatGE =         777.4463
3    σ²-hatWE =         104.3079
4    r²GE,WE =          0.53139
5    rGE,WE =           0.728965

The correlation $r_{GE,WE} = 0.729$ indicates a strong contemporaneous correlation between the errors of the GE and WE investment equations. To check the statistical significance of $r^2_{GE,WE}$, we can test the null hypothesis $H_0: \sigma_{GE,WE} = 0$. If $\sigma_{GE,WE} = 0$, then $LM = T r^2_{GE,WE}$ is a Lagrange multiplier test statistic that is distributed as a $\chi^2_{(1)}$ random variable in large samples.
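Using the summary numbers reported above, the LM statistic and its decision can be reproduced with Python's standard library. For one degree of freedom, the χ² critical value is the square of a standard normal quantile and the upper-tail probability is `erfc(sqrt(LM/2))`; this shortcut only applies when m = 1. The function name is a hypothetical helper.

```python
import math
from statistics import NormalDist


def lm_test_one_df(T, r2, alpha=0.05):
    """LM = T * r^2 compared with a chi-square(1) distribution."""
    lm = T * r2
    critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # plays the role of CHIINV(alpha, 1)
    p_value = math.erfc(math.sqrt(lm / 2))                 # plays the role of CHIDIST(lm, 1)
    decision = "Reject Ho" if lm >= critical else "Do Not Reject Ho"
    return lm, critical, p_value, decision


r2 = 207.5871 ** 2 / (777.4463 * 104.3079)   # (15.30) with the values in O1:O3
print(lm_test_one_df(20, r2))
```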
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it Lagrange Multiplier Test simple.


In it copy the simplified Lagrange multiplier test template you created in Chapter 9.

In cell B1, replace N by T. In cell B2, replace R² by r²GE,WE. Delete the content of cells C1:C2. In cell C2, we get the value of r²GE,WE from our grunfeld2 data worksheet, as shown in the table below.

     A                          B                       C
1    Data Input                 T =
2                               r²GE,WE =               ='grunfeld2 data'!O4
3                               α =
4                               m =
5
6    Computed Values            χ²-critical value =     =CHIINV(C3,C4)
7
8    Lagrange Multiplier Test   T×r²GE,WE =             =C1*C2
9                               Conclusion =            =IF(C8>=C6,"Reject Ho","Do Not Reject Ho")
10                              p-value =               =CHIDIST(C8,C4)
11                              Conclusion =            =IF(C10<=C3,"Reject Ho","Do Not Reject Ho")

At α = 0.05, with T = 20 and m = 1, the result of the test is (see also p. 569 in Principles of Econometrics, 4e):

     A                          B                       C
1    Data Input                 T =                     20
2                               r²GE,WE =               0.53139
3                               α =                     0.05
4                               m =                     1
6    Computed Values            χ²-critical value =     3.841459
8    Lagrange Multiplier Test   T×r²GE,WE =             10.6278
9                               Conclusion =            Reject Ho
10                              p-value =               0.001114
11                              Conclusion =            Reject Ho

To implement the SUR estimation, use one of the econometric software programs listed at www.principlesofeconometrics.com.
CHAPTER 16

Qualitative and Limited Dependent Variable Models

CHAPTER OUTLINE
16.1 Least Squares Fitted Linear Probability Model
16.2 Limited Dependent Variables
16.2.1 Censored Data
16.2.2 Simulated Data

16.1 LEAST SQUARES FITTED LINEAR PROBABILITY MODEL

Open the Excel file transport. Save your file as POE Chapter 16. Rename sheet 1 transport
data.

We consider a model for explaining individuals' choices between driving (private transportation)
and taking the bus (public transportation) when commuting to work, assuming that these are the
only two alternatives:
$$y = \beta_1 + \beta_2 x + e \qquad (16.1)$$

where the dependent variable y is a dummy variable representing an individual's choice:

$$y = \begin{cases} 1 & \text{individual drives to work} \\ 0 & \text{individual takes bus to work} \end{cases} \qquad (16.2)$$

and the explanatory variable x is defined as:

$$x = (\text{commuting time by bus} - \text{commuting time by car}) \qquad (16.3)$$

If the probability that an individual drives to work is p, then P[y = 1] = p. It follows that the probability that a person uses public transportation is P[y = 0] = 1 - p. The probability function for such a binary random variable is:

$$f(y) = p^y (1-p)^{1-y}, \quad y = 0, 1 \qquad (16.4)$$


where p is the probability that y takes the value 1. This discrete random variable has expected value E[y] = p and variance var(y) = p(1 - p).
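The mean and variance follow from (16.4) by summing over the two possible outcomes. A tiny Python check of that arithmetic (the probability 0.3 is an arbitrary example value, and the function names are hypothetical helpers):

```python
def bernoulli_pmf(y, p):
    """f(y) = p**y * (1 - p)**(1 - y) for y in {0, 1}, as in (16.4)."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return p ** y * (1 - p) ** (1 - y)


def bernoulli_moments(p):
    """Compute E[y] and var(y) by summing over y = 0, 1."""
    mean = sum(y * bernoulli_pmf(y, p) for y in (0, 1))
    var = sum((y - mean) ** 2 * bernoulli_pmf(y, p) for y in (0, 1))
    return mean, var


mean, var = bernoulli_moments(0.3)
print(mean, var)  # mean equals p and var equals p * (1 - p), up to floating-point rounding
```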

A priori we expect that as x increases, and commuting time by bus increases relative to
commuting time by car, an individual would be more inclined to drive. That is, we expect a
positive relationship between x and p, the probability that an individual will drive to work:

$$E(y) = p = \beta_1 + \beta_2 x \qquad (16.5)$$

In the Regression dialog box, the Input Y Range should be D1:D22, and the Input X Range should be C1:C22. Check the boxes next to Labels and Residuals. Select New Worksheet Ply and name it LS Linear Probability Model. Finally, select OK.


The result is:

     A                    B              C                D              E              F              G
1    SUMMARY OUTPUT
3    Regression Statistics
4    Multiple R           0.781873104
5    R Square             0.61132555
6    Adjusted R Square    0.590869
7    Standard Error       0.327342874
8    Observations         21
10   ANOVA
11                        df             SS               MS             F              Significance F
12   Regression           1              3.202181455      3.202181455    29.8840983     2.8342E-05
13   Residual             19             2.035913784      0.107153357
14   Total                20             5.238095238
16                        Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
17   Intercept            0.484795068    0.07144941       6.785151347    1.76E-06       0.335249732    0.634340404
18   x (C1 label)         0.007030992    0.001286164      5.466635007    2.8342E-05     0.004339019    0.009722965

We are particularly interested in the predicted probabilities of automobile transportation being chosen. As the residual output below shows, using least squares to estimate model (16.1) yields values of p that are less than 0 or greater than 1, values that do not make sense as probabilities.

     A              B                 C
22   RESIDUAL OUTPUT
24   Observation    Predicted auto    Residuals
25   1              0.143791971       -0.143791971
26   2              0.656351265       -0.656351265
27   3              1.066961201       -0.066961201
28   4              0.311832673       -0.311832673
29   5              0.262615731       -0.262615731
30   6              1.124615311       -0.124615311
31   7              0.851109721       0.148890279
32   8              -0.131822882      0.131822882
33   9              0.365268209       -0.365268209
34   10             0.122698996       -0.122698996
35   11             -0.152915867      0.152915867
36   12             0.945325023       0.054674977
37   13             0.175431434       0.824568566
38   14             0.435578126       -0.435578126
39   15             0.847594225       0.152405775
40   16             0.712599213       0.287400787
41   17             0.050279789       -0.050279789
42   18             0.723848785       0.276151215
43   19             0.680959736       0.319040264
44   20             -0.02776424       0.02776424
45   21             0.835641567       0.164358433
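To see at a glance which fitted values fall outside [0, 1], the Predicted auto column can be scanned programmatically. The list below is that column, rounded to four decimals; the function name is a hypothetical helper.

```python
fitted = [0.1438, 0.6564, 1.0670, 0.3118, 0.2626, 1.1246, 0.8511, -0.1318,
          0.3653, 0.1227, -0.1529, 0.9453, 0.1754, 0.4356, 0.8476, 0.7126,
          0.0503, 0.7238, 0.6810, -0.0278, 0.8356]


def outside_unit_interval(values):
    """Return 1-based observation numbers whose fitted value is < 0 or > 1."""
    return [i for i, p in enumerate(values, start=1) if p < 0 or p > 1]


print(outside_unit_interval(fitted))  # [3, 6, 8, 11, 20]
```

Five of the 21 fitted "probabilities" are impossible: two exceed 1 and three are negative.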

The underlying feature that causes this problem is that the linear probability model (16.1) implicitly assumes that as x increases, the probability of driving increases at a constant rate. However, since 0 ≤ p ≤ 1, a constant rate of increase is impossible. To overcome this problem, a nonlinear probit or logit model must be used. These estimation options are available in econometric software packages such as those listed at www.principlesofeconometrics.com.
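With the estimated line from the Excel output above, p-hat = 0.484795 + 0.007031x (coefficients rounded), the x values at which the fitted probability leaves [0, 1] can be computed directly; any x beyond them produces an impossible probability. A short Python sketch:

```python
b1, b2 = 0.484795, 0.007031  # intercept and slope from the regression output, rounded


def fitted_probability(x):
    """Linear probability model prediction p-hat = b1 + b2 * x."""
    return b1 + b2 * x


x_zero = -b1 / b2        # p-hat = 0 here
x_one = (1 - b1) / b2    # p-hat = 1 here
print(round(x_zero, 2), round(x_one, 2))
```

Because the slope is the same everywhere, the line must cross both boundaries somewhere; a probit or logit curve flattens out instead.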

16.2 LIMITED DEPENDENT VARIABLES

16.2.1 Censored Data

Open the Excel file mroz. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 16 in one file, create a new worksheet in your POE
Chapter 16 Excel file, rename it mroz data, and in it, copy the data set you just opened.


To plot the histogram of the wife's hours of work in 1975 (hours), we proceed as we have done
previously in Chapters 4 and 14.

In cells AA1:AA3, enter the following label and values.

AA
1 BIN
2 0
3 200

Select cells AA2:AA3 and move your cursor to the lower right corner of your selection until it turns into a skinny cross, as shown below. Left-click, hold, and drag down to cell AA27. Excel recognizes the series and automatically completes it for you.

[Screenshot: the series dragged down to AA27; cells AA23:AA27 contain 4200, 4400, 4600, 4800 and 5000.]

In the Histogram dialog box, the Input Range should be H2:H754, and the Bin Range should
be AA2:AA27. Check the New Worksheet Ply option and name it Censored Data Histogram;
check the box next to Chart Output. Finally, select OK.
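The BIN column supplies upper bin limits. Assuming the Analysis ToolPak's usual convention, each bin counts the values greater than the previous limit and less than or equal to its own limit, and an extra More bin counts anything above the last limit; under that rule the bin labeled 0 collects every woman who worked zero hours. A hedged Python sketch of the counting rule, with made-up hours:

```python
from bisect import bisect_left


def excel_style_histogram(values, bins):
    """Count values per bin: bin i holds values in (bins[i-1], bins[i]].

    The first bin holds values <= bins[0]; the extra last slot ("More")
    holds values greater than bins[-1].
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1   # index of the first limit >= v
    return counts


bins = list(range(0, 5001, 200))                     # 0, 200, ..., 5000, as in AA2:AA27
hours = [0, 0, 0, 150, 200, 201, 4999, 5000, 5010]   # hypothetical data
counts = excel_style_histogram(hours, bins)
print(counts[:3], counts[25:])  # [3, 2, 1] [2, 1]
```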


Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap. In the Border Color tab, select Solid line, and
change the Color to black. Finally select Close.


After editing, the result is shown below (see Figure 16.3, p. 614 in Principles of Econometrics, 4e):

[Figure: Censored Data Histogram. Horizontal axis: Wife's Hours of Work in 1975 (0 to 5000); vertical axis: 0 to 300.]

The histogram shows the large fraction of women who did not enter the labor force. This is an
example of censored data, meaning that a substantial fraction of the observations on the
dependent variable take a limit value, which is zero in the case of market hours worked by
married women.
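As a cross-check outside Excel, the sketch below mimics how the Histogram tool assigns observations to the bins in AA2:AA27. It assumes Excel's documented rule: each bin counts values greater than the previous bin boundary and less than or equal to its own, and values above the last bin go to a "More" bin. The `hours` list is hypothetical and is not the mroz data. With real data, the first count is the number of women who reported zero hours, which is the spike at the censoring point.

```python
import bisect

def excel_histogram(values, bins):
    """Count values into bins the way Excel's Histogram tool is documented to:
    bin k holds values v with bins[k-1] < v <= bins[k]; the first bin holds
    v <= bins[0]; the extra last slot plays the role of Excel's 'More' bin."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left finds the first bin boundary that is >= v
        counts[bisect.bisect_left(bins, v)] += 1
    return counts

bins = list(range(0, 5001, 200))                      # 0, 200, ..., 5000 (cells AA2:AA27)
hours = [0, 0, 0, 1610, 1656, 0, 1980, 456, 0, 4950]  # hypothetical hours, not the mroz data
counts = excel_histogram(hours, bins)
share_at_zero = counts[0] / len(hours)
print(counts[0], share_at_zero)                       # 5 0.5
```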

16.2.2 Simulated Data

Consider the following index or latent variable model:

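$$y_i^* = \beta_1 + \beta_2 x_i + e_i = -9 + 1 \cdot x_i + e_i \qquad (16.6)$$

where the $x_i$ are uniformly distributed over the interval [0, 20] and the $e_i$ are normally distributed with
mean 0 and standard deviation 4.

The observed $y_i$ are defined as:

$$y_i = \begin{cases} 0 & \text{if } y_i^* \le 0 \\ y_i^* & \text{if } y_i^* > 0 \end{cases} \qquad (16.7)$$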

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Rename it simulated data.


In cells A1:E2 enter the following labels and formulas.

     A            B    C    D                  E
1    y*           x    e    y                  E(y*)
2    =-9+B2+C2              =IF(A2<=0,0,A2)    =-9+B2

In column B we generate a sample of 200 random values that are uniformly distributed over the
interval [0,20] and in column C we generate a sample of 200 random values from a normal
distribution with mean 0 and standard deviation 4. We proceed as we have done before in
Chapters 14, 12 and 3.

We first use the Random Number Generation dialog box to generate our x values. We need to
generate one set of random numbers for our x values, so we specify 1 in the Number of
Variables window. We would like to generate 200 random numbers, so we specify 200 in the
Number of Random Numbers window. We select Uniform in the Distribution window; the
selected range should be Between 0 and 20. Select Output Range and specify it to be B2:B201.
Finally, we select OK.


Next, we use the Random Number Generation dialog box to generate our e values. We need to
generate one set of random numbers for our e values, so we specify 1 in the Number of
Variables window. We would like to generate 200 random numbers, so we specify 200 in the
Number of Random Numbers window. We select Normal in the Distribution window; the
selected Parameters should be Mean equal to 0, and Standard deviation equal to 4. Select
Output Range and specify it to be C2:C201. Finally, we select OK.


After you copy the content of cell A2 to cells A3:A201, and the content of cells D2:E2 to cells
D3:E201, here is how your table should look (only the first five values are shown below):
     A           B           C           D           E
1    y*          x           e           y           E(y*)
2    1.780206    8.465224    2.314982    1.780206    -0.534776
3    11.94971    15.71398    5.235725    11.94971    6.71398
4    13.0952     14.20209    7.893104    13.0952     5.202094
5    -7.16142    5.369427    -3.53085    0           -3.630573
6    14.77688    19.96582    3.811065    14.77688    10.96582

Note: you will obtain different random samples than the ones we obtained for x and e, so your
y*, y and E(y*) values should also be different than the ones reported above.
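Here is a minimal Python sketch of the same simulation. It assumes the standard-library `random` module in place of Excel's Random Number Generation tool; the seed 12345 and the function name `simulate_censored` are our own choices. Each tuple mirrors one worksheet row (y*, x, e, y, E(y*)). As with the worksheet, the draws will not match yours or ours.

```python
import random

def simulate_censored(n=200, beta1=-9.0, beta2=1.0, sigma=4.0, seed=12345):
    """Return rows (y_star, x, e, y, e_y_star), like columns A:E of the worksheet."""
    rng = random.Random(seed)              # fixed seed: reproducible, but not Excel's draws
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 20.0)         # Uniform between 0 and 20 (column B)
        e = rng.gauss(0.0, sigma)          # Normal, mean 0, standard deviation 4 (column C)
        y_star = beta1 + beta2 * x + e     # latent y*, equation (16.6) (column A)
        y = y_star if y_star > 0 else 0.0  # =IF(A2<=0,0,A2), equation (16.7) (column D)
        rows.append((y_star, x, e, y, beta1 + beta2 * x))  # last item: E(y*) (column E)
    return rows

rows = simulate_censored()
share_censored = sum(1 for row in rows if row[3] == 0.0) / len(rows)
print(f"share of y censored at zero: {share_censored:.2f}")
```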

Next, we plot the uncensored sample data and the latent regression function as we have done
before, in Chapter 2 for example. We choose a Scatter with only Markers chart type for the
uncensored sample data series, where the x-axis values are B2:B201 and the y-axis values are
A2:A201. The latent regression function is plotted using a Scatter with Smooth Lines chart
type, where the x-axis values are B2:B201 and the y-axis values are E2:E201.

The result is shown below (see also Figure 16.4, p. 616 in Principles of Econometrics, 4e):

[Figure: y* and latent regression function. Horizontal axis: x (0 to 20); vertical axis: -20 to 20.]

The latent or uncensored data y* are scattered along the latent regression function. If we observed
these data we could estimate the parameters using the least squares principle, by fitting a line
through the center of the data.

However, we do not observe all the latent data. What we can do is estimate the parameters of our
regression, using the least squares principle, by fitting a line through the center of the observed
or censored data, which is what we do next.

In the Regression dialog box, the Input Y Range should be D1:D201, and the Input X Range
should be B1:B201. Check the box next to Labels. Uncheck the box next to Residuals. Select
New Worksheet Ply and name it LS Fitted Censored Data Model. Finally select OK.
The result is:

1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R           0.742330341
5    R Square             0.551054335
6    Adjusted R Square    0.548786933
7    Standard Error       3.167307796
8    Observations         200
9
10   ANOVA
11                df     SS             MS             F              Significance F
12   Regression   1      2438.071126    2438.071126    243.0333267    2.80475E-36
13   Residual     198    1986.304057    10.03183867
14   Total        199    4424.375184
15
16                Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17   Intercept    -2.451600913    0.463559917       -5.288638697    3.246E-07      -3.365749149    -1.537452677    -3.365749149    -1.537452677
18   x            0.597924573     0.038354249       15.58952619     2.80475E-36    0.522289325     0.673559821     0.522289325     0.673559821

The estimated regression function in the table above gives different parameter estimates from the
ones reported in equation (16.32a) on p. 616 in Principles of Econometrics, 4e because it is based
on a different sample of censored data. Your estimated regression function will also be different
from ours and from that of Principles of Econometrics, 4e for the same reason.
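To see why the censored-data line misses the latent function, the sketch below regenerates a censored sample and fits least squares to all 200 observations. It uses plain Python instead of Excel's Regression tool; the seed and the helper `ols` are hypothetical choices of ours. In samples like this the slope typically comes out well below 1 and the intercept well above -9, as in the table above.

```python
import random

def ols(x, y):
    """Simple least squares fit of y = b1 + b2*x; returns (b1, b2)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2

rng = random.Random(2011)                                # hypothetical seed
x = [rng.uniform(0, 20) for _ in range(200)]
y = [max(0.0, -9 + xi + rng.gauss(0, 4)) for xi in x]    # censored at zero
b1, b2 = ols(x, y)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")                   # compare with beta1 = -9, beta2 = 1
```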

Go back to your simulated data worksheet.

In cells F1:F2 enter the following label and formula.

     F
1    LS Fitted
2    ='LS Fitted Censored Data Model'!$B$17+'LS Fitted Censored Data Model'!$B$18*'simulated data'!B2

Copy the content of cell F2 to cells F3:F201. Here is how your table should look (only
the first five values are shown below):

     F
1    LS Fitted
2    2.609965
3    6.944174
4    6.04018
5    0.758911
6    9.486453

Note: since you are working with different random samples than the ones we are working with,
your LS fitted values should also be different than the ones reported above.
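The F2 formula simply computes $b_1 + b_2 x$. The Python sketch below plugs in the coefficients from our LS Fitted Censored Data Model sheet; substitute the values in your own cells B17 and B18. Up to rounding, it reproduces the first five fitted values shown above. The constant names are our own.

```python
# Coefficients read from our 'LS Fitted Censored Data Model' sheet
# (cells B17 and B18); replace them with the values from your own sheet.
B17_INTERCEPT = -2.451600913
B18_SLOPE = 0.597924573

def ls_fitted(x):
    """Same arithmetic as the F2 formula: $B$17 + $B$18 * x."""
    return B17_INTERCEPT + B18_SLOPE * x

for x in (8.465224, 15.71398, 14.20209, 5.369427, 19.96582):  # our first five x values
    print(f"{ls_fitted(x):.6f}")
```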

Next, we plot the censored sample data, its least squares fitted regression function, as well as the
latent regression function we plotted earlier. We choose a Scatter with only Markers chart type
for the censored sample data series, where the x-axis values are B2:B201 and the y-axis values
are D2:D201. The regression functions are plotted using a Scatter with Smooth Lines chart type.
For the least squares fitted regression function, based on the censored sample data, the x-axis
values are B2:B201 and the y-axis values are F2:F201. For the latent regression function, as
plotted earlier, the x-axis values are B2:B201 and the y-axis values are E2:E201.

The result is shown below (see also Figure 16.5, p. 617 in Principles of Econometrics, 4e):

[Figure: y and latent & LS fitted regressions. Series: E(y*) and Fitted LS; horizontal axis: x (0 to 20).]

Note that the least squares principle fails to estimate $\beta_1 = -9$ and $\beta_2 = 1$ because the observed
data do not fall along the underlying regression function $E(y^*) = \beta_1 + \beta_2 x = -9 + x$.

Finally, we can estimate the parameters of our regression, using the least squares principle, by
fitting a line through the center of only the positive sample data, which is what we do next.

In cells H1:I2 enter the following labels and formulas.

Copy the content of cells H2:I2 to cells H3:I201.

Copy the content of cells H1:I1 to cells K1:L1.

Next, select cells H2:I201. Right-click, select Copy. Place your cursor in cell K2. Right-click,
select Paste Special. In the Paste Special dialog box that pops up, select Values. Finally, select
OK.


Here is how your table should look (only the first five values are shown below):

     K           L
1    y           x
2    1.780206    8.465224
3    11.94971    15.71398
4    13.0952     14.20209
5    0           5.369427
6    14.77688    19.96582

Your cells K2:L201 should still be selected. If not, select them.

Select the Data tab in the middle of your tab list. In the Sort & Filter group of commands, select
the Sort Largest to Smallest option.


Here is how your table should look (only the first five values are shown below):

     K              L
1    y              x
2    19.15563       18.06635
3    [illegible]    19.72045
4    15.50975       [illegible]
5    [illegible]    17.97295
6    14.77688       19.96582

In the Regression dialog box, select only your positive y-values in column K and their
corresponding x-values in column L. Our Input Y Range is K1:K112, and our Input X Range is
L1:L112; yours will be different because you have a different sample of data. Check the box
next to Labels. Select New Worksheet Ply and name it LS Fitted Positive Data Model. Finally
select OK.
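The sort-then-select step amounts to keeping only the rows with y > 0. The helper below, `positive_subsample` (a hypothetical name), does this for the first five rows of our sample. It also reports the row range to type into the Regression dialog box, assuming the labels sit in row 1. With our full sample, 111 positive values give K1:K112.

```python
def positive_subsample(y, x):
    """Sort (y, x) pairs by y, largest first (like Sort Largest to Smallest),
    then keep only the pairs with y > 0."""
    pairs = sorted(zip(y, x), key=lambda pair: pair[0], reverse=True)
    kept = [(yi, xi) for yi, xi in pairs if yi > 0]
    return [yi for yi, _ in kept], [xi for _, xi in kept]

y = [1.780206, 11.94971, 13.0952, 0.0, 14.77688]       # first five y values of our sample
x = [8.465224, 15.71398, 14.20209, 5.369427, 19.96582]
y_pos, x_pos = positive_subsample(y, x)
last_row = len(y_pos) + 1                              # row 1 holds the labels
print(y_pos)
print(f"Input Y Range: K1:K{last_row}, Input X Range: L1:L{last_row}")
```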
The result is:

1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R           0.574162303
5    R Square             0.32966235
6    Adjusted R Square    0.323512463
7    Standard Error       3.520766127
8    Observations         111
9
10   ANOVA
11                df     SS             MS             F              Significance F
12   Regression   1      664.4718542    664.4718542    53.60462167    4.45216E-11
13   Residual     109    1351.141559    12.39579412
14   Total        110    2015.613414
15
16                Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
17   Intercept    -2.600786231    1.35100075        -1.925081264    0.056825344    -5.278425714    0.076853253    -5.278425714    0.076853253
18   x            0.659064183     0.090017427       7.321517716     4.45216E-11    0.480652576     0.837475791    0.480652576     0.837475791

Again, the estimated regression function in the table above gives different parameter estimates
from the ones reported in equation (16.32b) on p. 616 in Principles of Econometrics, 4e because it
is based on a different sample of positive data. Your estimated regression function will also be
different from ours and from that of Principles of Econometrics, 4e for the same reason.

Note that once again the least squares principle fails to estimate $\beta_1 = -9$ and $\beta_2 = 1$. If the
dependent variable is censored, having a lower limit and/or an upper limit, then the least squares
estimators of the regression parameters are biased and inconsistent. In this case we can apply an
alternative estimation procedure, which is called Tobit in honor of James Tobin, winner of the
1981 Nobel Prize in Economics, who first studied this model. The Tobit estimation procedure is
available in standard econometric software.
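For orientation only, here is a sketch of the standard Tobit log-likelihood for censoring at zero. Each zero contributes $P(y^* \le 0) = \Phi(-(\beta_1 + \beta_2 x)/\sigma)$, and each positive y contributes the normal density of $y^*$ at that value. The sketch only evaluates the likelihood, at the true parameters and at rough censored-LS values; it does not maximize it, which is what Tobit routines in econometric software do. The seed, the parameter values plugged in and the helper names are our own assumptions.

```python
import math
import random

def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF Phi(z), written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tobit_loglik(beta1, beta2, sigma, x, y):
    """Tobit log-likelihood with left-censoring at zero."""
    ll = 0.0
    for xi, yi in zip(x, y):
        index = beta1 + beta2 * xi
        if yi <= 0.0:
            # censored observation: P(y* <= 0) = Phi(-index / sigma)
            ll += math.log(norm_cdf(-index / sigma))
        else:
            # uncensored observation: density of y* evaluated at yi
            ll += norm_logpdf((yi - index) / sigma) - math.log(sigma)
    return ll

rng = random.Random(2011)                                # hypothetical seed
x = [rng.uniform(0, 20) for _ in range(200)]
y = [max(0.0, -9 + xi + rng.gauss(0, 4)) for xi in x]

ll_true = tobit_loglik(-9.0, 1.0, 4.0, x, y)
ll_ls = tobit_loglik(-2.45, 0.60, 3.17, x, y)            # roughly the censored-data LS values
print(ll_true > ll_ls)
```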
APPENDIX A

Review of Math Essentials

CHAPTER OUTLINE
A.1 Mathematical Operations
A.1.1 Exponents
A.1.2 Scientific Notation
A.1.3 Logarithms and the Number e
A.2 Percentages

A.1 MATHEMATICAL OPERATIONS

If you have not done so, read Chapter 1 in this manual. Do it now.

The basic arithmetic operations are described in Section 1.3.1. Here we explain the use of some
Excel functions that may help in computations. Open Excel and save the workbook as Appendix
A. Rename Sheet 1 as math functions. In cell A1 type the label x to name the column. Enter the
values 1, 5, -3, 3 in cells A2:A5.


Many mathematical functions are built into Excel. These are easy to access with a few clicks.
Suppose we want to find the sum $\sum_{i=1}^{4} x_i$. In cell A8 type the label sum x. Click in cell B8 to
make it the active cell. Locate the Insert Function icon, to the left of the formula bar.


A click opens a dialog box. In the search box you can enter the term you are seeking. Type sum
and click Go. The recommended function is called SUM. The command format is shown below
the function window, and the very important Help on this function link is given in the lower left
corner.

Click on OK.


Several changes occur. First, "=SUM()" appears in the formula bar; the summation command
is awaiting a range of values to add up. Second, the command is mirrored in the active cell B8.

Click on A2 and drag the mouse down to A5. As you do so the argument in the SUM function
changes to A2:A5.

Click the OK button in the dialog box. The result, 6, appears in cell B8.
If the Function Argument dialog box happens to be in the way of numbers you want to select,
click on the oddly but aptly named collapse dialog button. It will temporarily reduce the dialog
box and allow you to drag it out of the way. After you are done selecting data click the restore
button to return the dialog box to full size.

A more direct approach is to type a formula, beginning with an equal sign, in an active cell. To
illustrate, compute $\sum_{i=1}^{4} x_i^2$. In cell A9 enter sum x^2, where the caret (^) is a way to indicate a power.
In cell B9 enter =sum and a drop-down list of functions appears.


Select SUMSQ and a definition is provided.


Double-click SUMSQ and the function enters B9. Specify the function arguments by filling in
the range A2:A5. Don't forget the closing parenthesis. Then press Enter to obtain the sum of
squared values.
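For comparison, here are the same two sums in Python. The list simply holds the values typed into A2:A5; this is a sketch, not an Excel feature.

```python
x = [1, 5, -3, 3]                    # cells A2:A5
sum_x = sum(x)                       # like =SUM(A2:A5)
sum_x2 = sum(v * v for v in x)       # like =SUMSQ(A2:A5)
print(sum_x, sum_x2)                 # 6 44
```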

The trick is to know what functions are available. The key tool here is the Help button, found
in the upper right corner of the window.

In the resulting Help window you can find resources for functions and many other tasks. If you
do not see what you are seeking, enter a short phrase or keyword into the Search window and
press Enter.


Click on Function reference. There are sections for Math and trigonometry and Statistical
functions.


Click on Math and trigonometry.


Click on Math and trigonometry functions (reference). Below is a very abbreviated list (copied
directly from the Excel Help) of some useful functions.

Function     Description
ABS          Returns the absolute value of a number
EXP          Returns e raised to the power of a given number
LN           Returns the natural logarithm of a number
PI           Returns the value of pi
POWER        Returns the result of a number raised to a power
ROUND        Rounds a number to a specified number of digits
SQRT         Returns a positive square root
SUM          Adds its arguments
SUMSQ        Returns the sum of the squares of the arguments
SUMX2MY2     Returns the sum of the difference of squares of corresponding values in two arrays
SUMX2PY2     Returns the sum of the sum of squares of corresponding values in two arrays
SUMXMY2      Returns the sum of squares of differences of corresponding values in two arrays

Click on SUMX2PY2. The resulting help window includes an equation, so that you can quickly
see that the function computes the sum of the squares of corresponding values in two arrays.

SUMX2PY2 function

Returns the sum of the sum of squares of corresponding values in two arrays. The sum of the sum
of squares is a common term in many statistical calculations.

Syntax

SUMX2PY2(array_x,array_y)

Array_x is the first array or range of values.

Array_y is the second array or range of values.

Remarks

• The arguments should be either numbers or names, arrays, or references that contain numbers.

• If an array or reference argument contains text, logical values, or empty cells, those values are ignored;
however, cells with the value zero are included.

• If array_x and array_y have a different number of values, SUMX2PY2 returns the #N/A error value.

• The equation for the sum of the sum of squares is:

$$\text{SUMX2PY2} = \sum (x^2 + y^2)$$

The formula tells all.
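The three SUMX... functions differ only in how each pair (x, y) is combined before summing. The Python sketch below shows that arithmetic and assumes purely numeric inputs. It does not reproduce Excel's rule of skipping text, logical values and empty cells. The function names mirror Excel's but are our own, and the `ys` values are hypothetical.

```python
def _check(xs, ys):
    """Excel returns #N/A for unequal lengths; this sketch raises instead."""
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same number of values")

def sumx2py2(xs, ys):
    """Sum of x^2 + y^2 over corresponding pairs (like SUMX2PY2)."""
    _check(xs, ys)
    return sum(x * x + y * y for x, y in zip(xs, ys))

def sumx2my2(xs, ys):
    """Sum of x^2 - y^2 over corresponding pairs (like SUMX2MY2)."""
    _check(xs, ys)
    return sum(x * x - y * y for x, y in zip(xs, ys))

def sumxmy2(xs, ys):
    """Sum of (x - y)^2 over corresponding pairs (like SUMXMY2)."""
    _check(xs, ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys))

xs = [1, 5, -3, 3]   # cells A2:A5
ys = [2, 2, 2, 2]    # hypothetical second array
print(sumx2py2(xs, ys), sumx2my2(xs, ys), sumxmy2(xs, ys))  # 60 28 36
```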

A.1.1 Exponents

The notation $x^n$ means take x to the nth power (see p. 635 in Principles of Econometrics, 4e).
The function POWER achieves this in Excel. We will use this function to raise each value in the
array x to the power -3. Note that $x^{-3} = 1/x^3$ as long as x is not zero.

Close the Excel help window. In cell B1 enter x^-3. In B2 enter =POWER(A2,-3) and press
Enter. Select cell B2. Move the cursor to the lower right corner of B2 until it turns into a skinny
cross. Drag the cross down to cell B5 and release. Cells B2:B5 contain the calculated values.
     A     B
1    x     x^-3
2    1     1
3    5     0.008
4    -3    -0.037037
5    3     0.037037

Instead of using the POWER function we could have entered =A2^-3 into cell B2 and pressed
Enter, then dragged the formula down to achieve the same result.
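Here is the same calculation in Python, using the `**` operator as the counterpart of POWER or ^. As noted above, x must not be zero; Python raises ZeroDivisionError for `0 ** -3`.

```python
x = [1, 5, -3, 3]                            # cells A2:A5
x_to_minus_3 = [v ** -3 for v in x]          # like =POWER(A2,-3) or =A2^-3
print([round(v, 6) for v in x_to_minus_3])   # [1.0, 0.008, -0.037037, 0.037037]
```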

A.1.2 Scientific Notation

Very large or very small numbers can be expressed as a number between 1 and 10 times a power
of 10. For example, 0.00000034 is $3.4 \times 10^{-7}$, which Excel writes as 3.4E-07. In cell A10 enter small x
and in B10 enter .00000034. Right-click on cell B10 and select Format Cells from the menu.


Select Scientific with 2 Decimal places and then click OK. The number is now represented in
scientific notation, 3.40E-07.
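Python's format specification offers the same display choice. In this sketch, `.2E` corresponds to the Scientific category with 2 decimal places. As in Excel, only the displayed text changes, not the stored value.

```python
small_x = 0.00000034
print(f"{small_x:.2E}")   # 3.40E-07
print(small_x == 3.4e-7)  # True: the stored value is unchanged
```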

A.1.3 Logarithms and the Number e

In cell D1 enter the label y and in E1 enter the label ln(y). In D2:D8 enter powers of 10 starting
with 1 and ending with 1,000,000. In cell E2 enter the formula =LN(D2) and press Enter. The
function LN is the natural logarithm. All the logarithms in Principles of Econometrics, 4e are
natural logarithms, rather than those to the base 10, or some other base.


Move the cursor to the lower right corner of E2 and drag the formula down to E8.

     D          E
1    y          ln(y)
2    1          0
3    10         2.302585
4    100        4.60517
5    1000       6.907755
6    10000      9.21034
7    100000     11.51293
8    1000000    13.81551

Logarithms are very useful in econometrics. The properties of logarithms are discussed on p. 636
of Principles of Econometrics, 4e. For example, $z = \ln(y^{0.5}) = 0.5 \times \ln(y)$. In cell F1 enter the
label z=0.5ln(y). In cell F2 enter the formula =0.5*E2. Copy the formula from cell F2 down to
cells F3:F8. We now have z, a variable in logarithmic form. Next, we would like to convert z
back into a non-logarithmic form (this is called taking the antilogarithm). To do that we use the
exponential function. In G1 enter the label exp(z). In G2 enter the formula =EXP(F2), and then
press Enter. Copy that formula down to G3:G8. Now, compare columns D and G. The values in
G2:G8 are the square roots of the values in D2:D8. Of course we could have simply used the
SQRT function to do this calculation, but the point here was to demonstrate operations with
logarithms.

     D          E           F             G
1    y          ln(y)       z=0.5ln(y)    exp(z)
2    1          0           0             1
3    10         2.302585    1.151293      3.162278
4    100        4.60517     2.302585      10
5    1000       6.907755    3.453878      31.62278
6    10000      9.21034     4.60517       100
7    100000     11.51293    5.756463      316.2278
8    1000000    13.81551    6.907755      1000
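The round trip from y to z = 0.5 ln(y) and back through exp can be checked in a few lines of Python. In this sketch `math.log` is the natural logarithm, like LN, and the last column printed is `math.sqrt(y)` for comparison.

```python
import math

ys = [10 ** k for k in range(7)]        # 1, 10, ..., 1,000,000 as in D2:D8
for y in ys:
    ln_y = math.log(y)                  # column E: =LN(D2)
    z = 0.5 * ln_y                      # column F: =0.5*E2
    back = math.exp(z)                  # column G: =EXP(F2)
    print(y, round(ln_y, 6), round(z, 6), round(back, 4), round(math.sqrt(y), 4))
```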

To illustrate another point, in cell I1 enter the label x, again. Excel doesn't mind. In cells I2 and
I3 enter the values 1 and 2. Highlight these two cells and drag the sequence down to I11. You
will find you have the sequence of numbers 1 through 10. In J1 enter the label y=.03*EXP(x),
then in J2 enter the formula =.03*EXP(I2) and press Enter. Copy the formula from cell J2 to
cells J3:J11. The values you obtain quickly go from being small to much larger.

     I     J
1    x     y=.03exp(x)
2    1     0.081548
3    2     0.221672
4    3     0.602566
5    4     1.637945
6    5     4.452395
7    6     12.10286
8    7     32.89899
9    8     89.42874
10   9     243.0925
11   10    660.794

Highlight I1:J11 (all the cells from these two columns, including labels). Click on the ribbon tab
Insert and then select Scatter charts.

From the drop-down menu choose the one showing curvy lines. A graph is superimposed on the
worksheet showing the plotted relationship with a title (since you included the header row with
text labels in your cell selection).

[Figure: scatter chart titled y=.03exp(x); horizontal axis 0 to 15, vertical axis 0 to 800.]

Select the figure and drag it off to the side. Click the column J header to select
column J. Select the Home tab. In the Cells group of commands, select Insert and
then Insert Sheet Columns.

Now column J is empty and column K contains the y values. In the new J1 enter the label ln(y),
in J2 enter the formula =LN(K2), then press Enter. Copy the formula from cell J2 to cells J3:J11.
Now graph the relationship between x and ln(y). As you can see, it is a straight line.
     I     J           K
1    x     ln(y)       y=.03exp(x)
2    1     -2.50656    0.081548455
3    2     -1.50656    0.221671683
4    3     -0.50656    0.602566108
5    4     0.493442    1.637944501
6    5     1.493442    4.452394773
7    6     2.493442    12.1028638
8    7     3.493442    32.89899475
9    8     4.493442    89.42873961
10   9     5.493442    243.0925178
11   10    6.493442    660.7939738

[Figure: ln(y) plotted against x, a straight line.]

For econometric analysis the ability to convert "curved" relationships to straight lines is
sometimes very important.

A.2 PERCENTAGES

While we have understood percentages since grade school, let us consider them again. In
particular we should keep the distinction between a percentage change and its decimal form clear.
In the Appendix A workbook label a new worksheet percentages. In A1 enter the label y. In
A2:A7 enter values 1.01, 1.05, 1.10, 1.15, 1.20, and 1.25. If the y-value changes from $y_0$ to $y_1$
then the percentage change is

$$\%\Delta y = \frac{y_1 - y_0}{y_0} \times 100$$

For each of the values in A2:A7 compute the percentage change from the value $y_0 = 1$. This
choice of $y_0$ implies that the percentage change equation becomes $\%\Delta y = (y_1 - 1) \times 100$.

In B1 enter the label pct chg. In B2 enter the formula =100*(A2-1), then press Enter. Place your
cursor on the lower right corner of B2 to form a skinny cross, then drag it down to B7.

     A       B
1    y       pct chg
2    1.01    1
3    1.05    5
4    1.10    10
5    1.15    15
6    1.20    20
7    1.25    25
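The percentage-change formula is easy to express as a function. The helper name `pct_change` is our own, and $y_0 = 1$ as in the worksheet. The sketch also prints the decimal equivalent next to the percentage.

```python
def pct_change(y0, y1):
    """Percentage change from y0 to y1: 100 * (y1 - y0) / y0."""
    return 100.0 * (y1 - y0) / y0

for y1 in (1.01, 1.05, 1.10, 1.15, 1.20, 1.25):
    decimal = (y1 - 1.0) / 1.0          # decimal equivalent
    print(y1, round(decimal, 4), round(pct_change(1.0, y1), 4))
```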

There is a 10% change from $y_0 = 1$ to $y_1 = 1.10$. The value

$$\frac{y_1 - y_0}{y_0} = \frac{1.10 - 1}{1} = 0.10$$

is the decimal equivalent of the percentage, but the percentage itself is multiplied by 100:

percentage change = 100 × decimal equivalent.

A convenient mathematical approximation is that when x is "small", $\ln(1 + x) \approx x$. In C1
enter the label ln(y). In C2 enter the formula =LN(A2). Copy this formula to C3:C7.

     A       B          C
1    y       pct chg    ln(y)
2    1.01    1          0.00995
3    1.05    5          0.04879
4    1.10    10         0.09531
5    1.15    15         0.139762
6    1.20    20         0.182322
7    1.25    25         0.223144

You can see that the approximation works pretty well for the first few cases. In Principles of
Econometrics, 4e, p. 638, it is shown that this trick with logarithms can be used to approximate
percentage changes when the change is small:

$$\%\Delta y \approx 100\,(\ln(y_1) - \ln(y_0))$$


Use this approximation to compute the percentage changes in column D. In D1 enter the label
approx pct chg. Since $y_0 = 1$, $\ln(y_0) = 0$, and the approximate percentage change equation
reduces to $\%\Delta y \approx 100 \ln(y_1)$. In D2 enter the formula =100*(LN(A2)) and press Enter. Drag this
formula down to D3:D7.


The default in Excel is to report many decimals.


     A       B          C           D
1    y       pct chg    ln(y)       approx pct chg
2    1.01    1          0.00995     0.995033085
3    1.05    5          0.04879     4.879016417
4    1.10    10         0.09531     9.53101798
5    1.15    15         0.139762    13.97619424
6    1.20    20         0.182322    18.23215568
7    1.25    25         0.223144    22.31435513

Highlight C2:D7, right-click and select Format Cells. Choose the Number format with 4
Decimal places. Click OK.


Compare the results in columns B and D. For the first few values of y the approximation is pretty
good, but when y = 1.10 the approximation error is already about ½ percentage point (10 versus 9.5310).

     A       B          C         D
1    y       pct chg    ln(y)     approx pct chg
2    1.01    1          0.0100    0.9950
3    1.05    5          0.0488    4.8790
4    1.10    10         0.0953    9.5310
5    1.15    15         0.1398    13.9762
6    1.20    20         0.1823    18.2322
7    1.25    25         0.2231    22.3144

Use the approximation $\%\Delta y \approx 100(\ln(y_1) - \ln(y_0))$ only for small changes in y.
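To quantify how fast the log approximation deteriorates, the sketch below computes the exact and approximate percentage changes side by side, along with their difference. The helper names are our own.

```python
import math

def exact_pct(y0, y1):
    """Exact percentage change: 100 * (y1 - y0) / y0."""
    return 100.0 * (y1 - y0) / y0

def approx_pct(y0, y1):
    """Log approximation: 100 * (ln(y1) - ln(y0))."""
    return 100.0 * (math.log(y1) - math.log(y0))

for y1 in (1.01, 1.05, 1.10, 1.15, 1.20, 1.25):
    exact = exact_pct(1.0, y1)
    approx = approx_pct(1.0, y1)
    print(f"{y1:.2f}  {exact:6.2f}  {approx:8.4f}  error {exact - approx:.4f}")
```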
APPENDIX B

Review of Probability Concepts

CHAPTER OUTLINE
B.1 Binomial Probabilities
B.1.1 Computing Binomial Probabilities Directly
B.1.2 Computing Binomial Probabilities Using BINOMDIST
B.2 The Normal Distributions
B.2.1 The STANDARDIZE Function
B.2.2 The NORMSDIST Function
B.2.3 The NORMSINV Function
B.2.4 The NORMDIST Function
B.2.5 The NORMINV Function
B.2.6 A Template for Normal Distribution Probability Calculations
B.3 Distributions Related to the Normal
B.3.1 The Chi-Square Distribution
B.3.2 The t-Distribution
B.3.3 The F-Distribution

Excel has a number of functions for computing probabilities. In this chapter we will show you
how to work with the probability function of a binomial random variable and how to compute
probabilities involving normal random variables.

B.1 BINOMIAL PROBABILITIES

A binomial experiment consists of a fixed number of trials, n. On each independent trial the
outcome is success or failure, with the probability of success, p, being the same for each trial. The
random variable X is the number of successes in n trials, so x = 0, 1, ..., n. For this discrete
random variable, the probability that X = x is given by the probability function:

$$P(X = x) = f(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n$$

We can compute these probabilities two ways: the hard way and the easy way.


B.1.1 Computing Binomial Probabilities Directly

Excel has a number of mathematical functions that make computation of formulas straightforward. Assume there are n = 5 trials, that the probability of success is p = 0.3, and that we want the probability of x = 3 successes. What we must compute is:

$$P(X = 3) = f(3) = \frac{5!}{3!\,(5-3)!}\, 0.3^{3} (1-0.3)^{5-3}$$

Open Excel and save the workbook as Appendix B. Rename Sheet1 binomial. Make cell A1 active by clicking it.

Eventually you will learn many shortcuts in Excel, but should you forget how to compute some
mathematical or statistical quantity, there is an Insert Function fx button to the right of the cell
reference window.


Click the Insert Function button and select Math & Trig in the Or select a category window. Next, scroll down the list of functions in the Select a function window. Select FACT; this function returns the factorial of a number.

[Screenshot: Insert Function dialog with the Math & Trig category selected and FACT highlighted. FACT(number) returns the factorial of a number, equal to 1*2*3*...*Number.]



Click OK. Enter 5 in the Number window of the Function Arguments dialog box that opens
up. Excel determines that 5! = 120. Click Cancel.

[Screenshot: Function Arguments dialog for FACT with Number = 5; the result is 120.]

Your cell A1 should still be active. Click Insert Function again. This time search for the term factorial and click Go.

[Screenshot: Insert Function dialog after searching for factorial; the Select a function list shows FACT, FACTDOUBLE and MULTINOMIAL, with FACT selected.]

The function FACT should be selected in the list that appears in the Select a function window. Click OK. The Function Arguments dialog box shown above appears again. Click Cancel. Rather than working through the dialog box, you can also type formulas directly: in cell A1 type the label P[X=3], and in B1 type the following formula:

=(FACT(5)/(FACT(3)*FACT(2)))*(0.3^3)*(0.7^2)

Your screen should look like the one below:



[Screenshot: the binomial worksheet with the label P[X=3] in A1 and the formula =(FACT(5)/(FACT(3)*FACT(2)))*(0.3^3)*(0.7^2) being entered in B1.]

Note that we have used parentheses to group operations. Press Enter. The result is 0.1323.
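The same factorial calculation can be sketched in Python, mirroring the FACT-based Excel formula term by term. This is only an illustration, assuming Python 3; `binomial_prob_direct` is a hypothetical name.

```python
import math


def binomial_prob_direct(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p), built from factorials like the Excel formula."""
    coeff = math.factorial(n) / (math.factorial(x) * math.factorial(n - x))
    return coeff * p**x * (1 - p) ** (n - x)


# Same calculation as =(FACT(5)/(FACT(3)*FACT(2)))*(0.3^3)*(0.7^2)
print(round(binomial_prob_direct(3, 5, 0.3), 4))  # 0.1323
```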

B.1.2 Computing Binomial Probabilities Using BINOMDIST

Make cell B6 active. Click on the Insert Function button. Select Statistical in the Or select a
category window of the Insert Function dialog box. Next, scroll down the list of functions in the
Select a function window, and select BINOMDIST. Select OK.

[Screenshot: Insert Function dialog with the Statistical category selected and BINOMDIST highlighted. The form of the command is summarized at the bottom of the dialog: BINOMDIST(number_s,trials,probability_s,cumulative), which returns the individual term binomial distribution probability.]

The Function Arguments dialog box pops up.

The Excel function BINOMDIST can be used to find either the cumulative probability, P(X ≤ x), or the probability function, P(X = x), for a binomial random variable. The syntax for the function is:

BINOMDIST(number, trials, probability, cumulative)


where:

• number is the number of successes in n trials,
• trials is the number of independent trials (n),
• probability is p, the probability of success on any one trial,
• cumulative is a logical value. If set equal to 1 (TRUE), the cumulative probability P(X ≤ x) is returned; if set to 0 (FALSE), the probability P(X = x) is returned.

Note that Excel defines each argument for which it is prompting you. In the middle portion of the screen shot shown below, you can find the definition of the Cumulative argument; it is the argument being defined because the cursor is in the Cumulative window. Using the values n = 5, p = 0.3 and x = 3, and setting Cumulative to 0, we obtain the probability 0.1323, as above.

[Screenshot: Function Arguments dialog for BINOMDIST with Number_s = 3, Trials = 5, Probability_s = 0.3 and Cumulative = 0 (FALSE). Excel's description reads: "Cumulative is a logical value: for the cumulative distribution function, use TRUE; for the probability mass function, use FALSE." Formula result = 0.1323.]

Press Cancel. Next, we will set up a "template" that will allow you to compute any binomial probability with a simple click or two. In A3:A7 enter labels for the number of successes x (value in B3) in n trials (B4), with probability p of success on each independent trial (B5), and for the two probabilities. In B6 we will compute the probability that X = x, and in B7 the probability that X ≤ x.

     A                B
1    P[X=3]           0.1323
2
3    successes x
4    trials n
5    probability p
6    P[X=x]
7    P[X<=x]

Enter values x = 3, n = 5, and p = 0.3 in cells B3:B5.

Make cell B6 active again. Access the BINOMDIST function via the Insert Function button as you have just done, or type the function directly in cell B6. Either way, this time, instead of specifying the values of the arguments x, n and p, specify the locations (cell references) where Excel can find those values. Repeat the exercise in cell B7, but this time set Cumulative to 1.

     A                B
1    P[X=3]           =(FACT(5)/(FACT(3)*FACT(2)))*(0.3^3)*(0.7^2)
2
3    successes x
4    trials n
5    probability p
6    P[X=x]           =BINOMDIST(B3,B4,B5,0)
7    P[X<=x]          =BINOMDIST(B3,B4,B5,1)

In this book we will use "templates" a great deal. These templates are Excel worksheets with cell addresses in the formulas, so that by changing a numerical value (say in B3) we can compute an alternative probability. It is very instructive to look at the formulas to check exactly how the commands are structured. Select the Formulas tab on the Excel ribbon, and then go to the Formula Auditing group of commands.

[Screenshot: the Formulas tab of the Excel ribbon, with the Formula Auditing group (including Show Formulas) on the right.]

Select Show Formulas. You can switch between the numerical values, shown below, and the
formulas shown above.

     A                B
1    P[X=3]           0.1323
2
3    successes x      3
4    trials n         5
5    probability p    0.3
6    P[X=x]           0.1323
7    P[X<=x]          0.96922

Next time you need to compute a binomial probability you can call up the function BINOMDIST, or you can open your Appendix B workbook, go to the binomial worksheet and enter values into the template. For example, use the template to compute the probabilities for 5 successes in 10 trials when the probability of success is 0.7. Here are the results you should get:

     A                B
1    P[X=3]           0.1323
2
3    successes x      5
4    trials n         10
5    probability p    0.7
6    P[X=x]           0.102919345
7    P[X<=x]          0.150268333

So that this template is perfectly general, delete the entries in the first row and, in cell A1, enter the label Computing binomial probabilities. Save your workbook.

     A                                    B
1    Computing binomial probabilities
2
3    successes x                          5
4    trials n                             10
5    probability p                        0.7
6    P[X=x]                               0.102919345
7    P[X<=x]                              0.150268333
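As a cross-check on the template, a minimal Python sketch of BINOMDIST's two modes might look like this. It assumes Python 3.8+ for `math.comb`; the function names are hypothetical, and the cumulative value is simply the sum of the point probabilities for 0, 1, ..., x.

```python
import math


def binom_pmf(x: int, n: int, p: float) -> float:
    """Analogue of BINOMDIST(x, n, p, 0): P(X = x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x: int, n: int, p: float) -> float:
    """Analogue of BINOMDIST(x, n, p, 1): P(X <= x)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


def binomial_template(x: int, n: int, p: float) -> dict:
    """Mimic cells B6 and B7 of the binomial worksheet."""
    return {"P[X=x]": binom_pmf(x, n, p), "P[X<=x]": binom_cdf(x, n, p)}


print(binomial_template(3, 5, 0.3))   # roughly 0.1323 and 0.96922
print(binomial_template(5, 10, 0.7))  # roughly 0.102919345 and 0.150268333
```

Summing the point probabilities is fine for the small n used here; for very large n a dedicated library routine would be the better choice.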

B.2 THE NORMAL DISTRIBUTIONS

Excel provides several functions related to the normal and standard normal distributions.

B.2.1 The STANDARDIZE Function

The STANDARDIZE function computes the Z value for given values of X, µ and σ. That is, it computes:

$$Z = \frac{X - \mu}{\sigma}$$

The format of this function is:

STANDARDIZE(X, µ, σ)

For example, for µ = 3 and σ = 3, if we wanted to find the Z value corresponding to X = 6, we would enter =STANDARDIZE(6,3,3) in a cell, and the value computed would be 1.0.

B.2.2 The NORMSDIST Function

The NORMSDIST function computes the area, or cumulative probability, less than a given Z value. Geometrically, the cumulative probability is the area under the standard normal probability density function to the left of the given value. In many statistics books the cumulative distribution function of a standard normal random variable is denoted by the special symbol Φ. Then,

$$P(Z \le z) = \Phi(z)$$

Standard normal probabilities are contained in Table 1, Appendix E of Principles of Econometrics, 4e.

[Figure: Standard Normal Distribution, plotted for z from -4 to 4. Example: P(Z ≤ 1.73) = Φ(1.73) = 0.9582.]

Instead of a table in the book we will use the function in Excel. The format of this function is:

NORMSDIST(Z)

If we wanted to find the area below a Z value of 1.0, we would enter =NORMSDIST(1.0) in a cell, and the value computed would be 0.8413.

B.2.3 The NORMSINV Function

The NORMSINV function computes the Z value, z_c, corresponding to a given cumulative area under the normal curve. The format of this function is:

NORMSINV(prob)

where prob is the area under the standard normal curve less than z_c. That is, prob = P(Z < z_c). If we wanted to find the z_c value corresponding to a cumulative area of 0.10, we would enter =NORMSINV(.10) in a cell, and the value computed would be -1.2816.
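Python's standard library offers the same two calculations through `statistics.NormalDist` (Python 3.8+). The sketch below, with hypothetical function names chosen to echo the Excel functions, reproduces the NORMSDIST and NORMSINV examples; it is an illustration, not Excel's own algorithm.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def normsdist(z: float) -> float:
    """Analogue of NORMSDIST(z): P(Z <= z) = Phi(z)."""
    return STD_NORMAL.cdf(z)


def normsinv(prob: float) -> float:
    """Analogue of NORMSINV(prob): the z_c with P(Z <= z_c) = prob."""
    return STD_NORMAL.inv_cdf(prob)


print(round(normsdist(1.0), 4))   # 0.8413
print(round(normsdist(1.73), 4))  # 0.9582
print(round(normsinv(0.10), 4))   # -1.2816
```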

B.2.4 The NORMDIST Function

The NORMDIST function computes the area, or probability, less than a given X value, or the value of the normal pdf, for given values of the distribution mean µ and standard deviation σ. The format of this function is:

NORMDIST(X, µ, σ, CUMULATIVE)

Let X ~ N(µ, σ²). Then the function NORMDIST will compute:

$$\text{NORMDIST}(x, \mu, \sigma, \text{CUMULATIVE}) = \begin{cases} P(X \le x) & \text{if CUMULATIVE} = 1 \\[4pt] f(x) = \dfrac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left[-\dfrac{(x-\mu)^{2}}{2\sigma^{2}}\right] & \text{if CUMULATIVE} = 0 \end{cases}$$

CUMULATIVE is a logical value; TRUE can be replaced by 1 and FALSE by 0. If we wanted to find the area below an X value of 6, we would enter =NORMDIST(6,3,3,1) in a cell, and the value computed would be 0.8413.

B.2.5 The NORMINV Function

The NORMINV function computes the x value corresponding to a cumulative area under the
normal curve. The format of this function is:

NORMINV(prob, µ, σ)

where prob is the area under the normal curve less than x. That is, prob = P(X < x). To compute the value of x such that 0.10 of the probability is to its left, enter =NORMINV(.10,3,3) in a cell, yielding -0.8447.
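A similar sketch for a general normal distribution, again using `statistics.NormalDist` (Python 3.8+). The names `standardize`, `normdist`, `norminv` and `normal_pdf` are hypothetical; `normal_pdf` writes out the density formula above so it can be compared with the library's `pdf` method.

```python
import math
from statistics import NormalDist


def standardize(x: float, mu: float, sigma: float) -> float:
    """Analogue of STANDARDIZE(x, mu, sigma): Z = (x - mu) / sigma."""
    return (x - mu) / sigma


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """The density written out as in the NORMDIST formula (CUMULATIVE = 0)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def normdist(x: float, mu: float, sigma: float, cumulative) -> float:
    """Analogue of NORMDIST(x, mu, sigma, cumulative); 1/True gives the CDF, 0/False the pdf."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(x) if cumulative else dist.pdf(x)


def norminv(prob: float, mu: float, sigma: float) -> float:
    """Analogue of NORMINV(prob, mu, sigma): the x with P(X <= x) = prob."""
    return NormalDist(mu, sigma).inv_cdf(prob)


print(standardize(6, 3, 3))            # 1.0
print(round(normdist(6, 3, 3, 1), 4))  # 0.8413
print(round(norminv(0.10, 3, 3), 4))   # -0.8447
print(abs(normal_pdf(4, 3, 3) - normdist(4, 3, 3, 0)) < 1e-12)  # True
```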

B.2.6 A Template for Normal Distribution Probability Calculations

Rename your Sheet 2 normal and build a template for normal probabilities by entering the
formulas shown below. The highlighted cells require user input. The formulas in the other cells
do the computations.

     A                              B
1    Normal Probabilities
2
3    mean
4    standard_dev
5
6    Left-tail probability
7    a
8    P(X<=a)                        =NORMDIST(B7,B3,B4,1)
9
10   Right-tail probability
11   b
12   P(X>=b)                        =1-NORMDIST(B11,B3,B4,1)
13
14   Interval probability
15   a
16   b
17   P(a<=X<=b)                     =NORMDIST(B16,B3,B4,1)-NORMDIST(B15,B3,B4,1)
18
19   Inverse probability
20   Left-tail probability
21   Critical value, or quantile    =NORMINV(B20,B3,B4)

Using X ~ N(µ = 3, σ² = 9), the above template would produce the following results:

     A                              B
1    Normal Probabilities
2
3    mean                           3
4    standard_dev                   3
5
6    Left-tail probability
7    a                              6
8    P(X<=a)                        0.841344746
9
10   Right-tail probability
11   b                              6
12   P(X>=b)                        0.158655254
13
14   Interval probability
15   a                              4
16   b                              6
17   P(a<=X<=b)                     0.210786086
18
19   Inverse probability
20   Left-tail probability          0.95
21   Critical value, or quantile    7.934560881


Note that the quantile 7.93 is the value that cuts off the top 5% of the distribution.

The template works equally well for standard normal calculations. For example,

     A                              B
1    Normal Probabilities
2
3    mean                           0
4    standard_dev                   1
5
6    Left-tail probability
7    a                              2
8    P(X<=a)                        0.977249868
9
10   Right-tail probability
11   b                              2
12   P(X>=b)                        0.022750132
13
14   Interval probability
15   a                              1.5
16   b                              2.5
17   P(a<=X<=b)                     0.060597536
18
19   Inverse probability
20   Left-tail probability          0.95
21   Critical value, or quantile    1.644853627

It might be a useful exercise for you to compute these normal probabilities using Table 1 in Appendix E of Principles of Econometrics, 4e.
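The normal template can also be sketched as a single Python function, assuming Python 3.8+ and `statistics.NormalDist`. The argument names are hypothetical stand-ins for the user-input cells B3, B4, B7, B11, B15, B16 and B20; the interval probability is computed as a difference of two cumulative probabilities.

```python
from statistics import NormalDist


def normal_template(mean, sd, a_left, b_right, a, b, left_tail_prob):
    """Mimic the computed cells of the normal worksheet."""
    dist = NormalDist(mean, sd)
    return {
        "P(X<=a)": dist.cdf(a_left),               # like =NORMDIST(B7,B3,B4,1)
        "P(X>=b)": 1.0 - dist.cdf(b_right),        # like =1-NORMDIST(B11,B3,B4,1)
        "P(a<=X<=b)": dist.cdf(b) - dist.cdf(a),   # difference of two CDF values
        "quantile": dist.inv_cdf(left_tail_prob),  # like =NORMINV(B20,B3,B4)
    }


print(normal_template(3, 3, 6, 6, 4, 6, 0.95))      # X ~ N(3, 9)
print(normal_template(0, 1, 2, 2, 1.5, 2.5, 0.95))  # standard normal
```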

B.3 DISTRIBUTIONS RELATED TO THE NORMAL

The chi-square distribution, the t-distribution and the F-distribution are related to the normal
distribution. For each we will make a few remarks and then provide a template for probability
calculations.

B.3.1 The Chi-Square Distribution

If Z_1 is a standard normal random variable with mean 0 and variance 1, then Z_1² has a chi-square distribution with one degree of freedom. If Z_1, Z_2, ..., Z_m are independent N(0,1) random variables then:

$$V = \sum_{i=1}^{m} Z_i^{2} \sim \chi^{2}_{(m)}$$

This notation means that V has a chi-square distribution with m degrees of freedom. The expected value of V is E(V) = m. The variance of V is var(V) = 2m. The 90th, 95th and 99th percentiles, and some others, are given in Table 3, Appendix E of Principles of Econometrics, 4e.

The template we will create next will make calculations to answer the following two questions:

1. For any value v > 0, what is the probability that a chi-square random variable will be greater than v?
2. What is the "critical value" for the percentile p? That is, what is the value c such that P(V < c) = p?

To answer the first type of question we use the Excel function CHIDIST. The format of the function is:
CHIDIST(x, df)

Here x is the value of the chi-square variable and df is its degrees of freedom. The CHIDIST function returns the probability in the right tail of the distribution, the probability that V > x. To calculate the probability that V < x, use 1-CHIDIST(x, df).

To answer the second question we use the function CHIINV. The format of the command is:

CHIINV(probability, df)

where probability is the right-tail probability and df is the degrees of freedom. To find the 95th percentile use CHIINV(.05, df).

Rename Sheet 3 chi-square and create the following template.

     A                          B
1    Chi-square probabilities
2
3    value
4    df
5
6    P[V<=value]                =1-CHIDIST(B3,B4)
7    P[V>value]                 =CHIDIST(B3,B4)
8
9    Cumulative Percentile
10   Critical value             =CHIINV(1-B9,B4)

The calculations are illustrated below for a chi-square distribution with 5 degrees of freedom and the value 7.7. We find that $P(\chi^{2}_{(5)} > 7.7) = 0.173563$, and that the 95th percentile value is 11.0705.

     A                          B
1    Chi-square probabilities
2
3    value                      7.7
4    df                         5
5
6    P[V<=value]                0.826437
7    P[V>value]                 0.173563
8
9    Cumulative Percentile      0.95
10   Critical value             11.0705
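Outside Excel, the same two quantities can be computed with a library such as SciPy; to keep the example self-contained, the sketch below uses only Python's standard library and assumes an integer number of degrees of freedom. It uses the closed-form chi-square right-tail probabilities for even and odd df and finds the critical value by bisection, so it is an illustration rather than a replacement for CHIDIST and CHIINV. The function names are hypothetical.

```python
import math


def chi2_right_tail(x: float, df: int) -> float:
    """Analogue of CHIDIST(x, df): P(V > x) for V ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail, erfc(sqrt(x/2)), then add
    # sqrt(2/pi) * exp(-x/2) * x^((2i-1)/2) / (1*3*...*(2i-1)) for i = 1..(df-1)/2
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # the i = 1 term
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)  # turn term i into term i + 1
    return total


def chi2_quantile(p: float, df: int) -> float:
    """Analogue of CHIINV(1 - p, df): the c with P(V <= c) = p, found by bisection."""
    target = 1.0 - p  # CHIINV expects the right-tail area
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_right_tail(hi, df) > target:
        hi *= 2.0  # widen the bracket until it contains c
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_right_tail(mid, df) > target:
            lo = mid  # right tail still too large: c lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0


value, df = 7.7, 5
print(round(1 - chi2_right_tail(value, df), 6))  # P[V<=value]
print(round(chi2_right_tail(value, df), 6))      # P[V>value]
print(round(chi2_quantile(0.95, df), 4))         # critical value
```

The printed values should agree with the worksheet to about five decimals.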

B.3.2 The t-Distribution

A t probability density function is bell-shaped and centered at zero, like the normal distribution. Its variance depends upon its degrees of freedom parameter m, and is equal to m/(m - 2) for m > 2. We denote the t-distribution with m degrees of freedom as t(m). As m → ∞ the t-distribution converges to the standard normal N(0,1). The function used to compute t-probabilities is TDIST. The function used to compute critical values is TINV.

The format of the TDIST function is:

TDIST(x, df, tails)

In this function x is the value of the t-random variable and x > 0. The df is the degrees of
freedom parameter, and tails takes the value either 1 or 2.

TDIST(x, df, 1) computes P(t(df) > x); this is the right-tail probability.

TDIST(x, df, 2) computes P(t(df) < -x) + P(t(df) > x); this is the two-tail probability.

To compute left-tail probabilities, P(t(df) < x) for x > 0, use 1-TDIST(x,df,1). To compute probabilities for negative x values we use the symmetry of the distribution. For example,

$$P(t_{(df)} < -x) = P(t_{(df)} > x) \quad \text{and} \quad P(t_{(df)} > -x) = P(t_{(df)} < x)$$

Critical values are computed using TINV with format:

TINV(probability, df)

where probability is the two-tail probability. This function computes the value t_c such that

$$P(t_{(df)} < -t_c) + P(t_{(df)} > t_c) = \text{probability}$$

Insert a new sheet, name it t-distribution, and create the following template for basic probability
calculations.

     A                          B
1    t-distribution
2
3    df
4
5    value > 0
6    P[t<=value]                =1-TDIST(B5,B3,1)
7    P[t>value]                 =TDIST(B5,B3,1)
8
9    value < 0
10   P[t<=value]                =TDIST(-B9,B3,1)
11   P[t>value]                 =1-TDIST(-B9,B3,1)
12
13   cumulative percentile
14   critical value             =TINV(2*(1-B13),B3)

The calculations are illustrated below for a t-distribution with 5 degrees of freedom, a positive
t = 2.3, a negative t = -1.5 and the 95th percentile.

     A                          B
1    t-distribution
2
3    df                         5
4
5    value > 0                  2.3
6    P[t<=value]                0.96511377
7    P[t>value]                 0.03488623
8
9    value < 0                  -1.5
10   P[t<=value]                0.09695184
11   P[t>value]                 0.90304816
12
13   cumulative percentile      0.95
14   critical value             2.01504837
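A standard-library Python sketch of the t calculations follows. It is an illustration only: the right-tail probability is obtained by integrating the t density numerically with composite Simpson's rule (an approach chosen here for simplicity, not how Excel computes TDIST), and the percentile is found by bisection. The function names are hypothetical.

```python
import math


def t_pdf(t: float, df: float) -> float:
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + t * t / df) ** (-(df + 1) / 2)


def tdist_one_tail(x: float, df: float, n: int = 2000) -> float:
    """Analogue of TDIST(x, df, 1) for x >= 0: P(t > x) = 0.5 - integral of pdf over [0, x].
    Uses composite Simpson's rule with n (even) sub-intervals."""
    h = x / n
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - s * h / 3


def t_cdf(x: float, df: float) -> float:
    """P(t <= x), using symmetry for negative x as in template rows 6 and 10."""
    return 1.0 - tdist_one_tail(x, df) if x >= 0 else tdist_one_tail(-x, df)


def t_quantile(p: float, df: float) -> float:
    """Analogue of TINV(2*(1-p), df): the t_c with P(t <= t_c) = p."""
    if p < 0.5:
        return -t_quantile(1.0 - p, df)  # symmetry about zero
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


df = 5
print(round(t_cdf(2.3, df), 8), round(tdist_one_tail(2.3, df), 8))
print(round(t_cdf(-1.5, df), 8), round(1 - t_cdf(-1.5, df), 8))
print(round(t_quantile(0.95, df), 8))
```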

B.3.3 The F-Distribution

The F-distribution is used in a variety of hypothesis testing situations. Its shape is controlled by
two degrees of freedom parameters called the numerator degrees of freedom and the
denominator degrees of freedom. Probabilities are computed using the Excel function FDIST.
Critical values are computed using FINV. The formats for these functions are:

FDIST(x, df1, df2)



This function computes the probability that an F-random variable, with numerator degrees of freedom df1 and denominator degrees of freedom df2, is greater than x, P(F > x).

The FINV function computes the critical value F_c so that P(F > F_c) = α. The format of the function is:
FINV(probability, df1, df2)

Here probability is the right-tail probability, and df1 and df2 are the numerator and denominator degrees of freedom.

Insert a new sheet, name it F-distribution, and create the following template to compute
cumulative probabilities, right-tail probabilities and percentile critical values.

     A                              B
1    F-distribution probabilities
2
3    value
4    df_numerator
5    df_denominator
6
7    P[F<=value]                    =1-FDIST(B3,B4,B5)
8    P[F>value]                     =FDIST(B3,B4,B5)
9
10   cumulative percentile
11   critical value                 =FINV(1-B10,B4,B5)

To illustrate, let the numerator degrees of freedom equal 2, the denominator degrees of freedom
equal 10 and the F-random variable value equal 3.2. Finally, let us find the 95th percentile value.

     A                              B
1    F-distribution probabilities
2
3    value                          3.2
4    df_numerator                   2
5    df_denominator                 10
6
7    P[F<=value]                    0.915709
8    P[F>value]                     0.084291
9
10   cumulative percentile          0.95
11   critical value                 4.102821
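For the F-distribution, a standard-library Python sketch can use the identity P(F > x) = I_w(df2/2, df1/2) with w = df2/(df2 + df1·x), where I is the regularized incomplete beta function. The continued-fraction evaluation of I below follows the well-known modified Lentz algorithm (as presented in Numerical Recipes), and the percentile is again found by bisection. The function names are hypothetical, and this is not how Excel implements FDIST or FINV.

```python
import math

TINY = 1e-300


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def fdist(x: float, df1: float, df2: float) -> float:
    """Analogue of FDIST(x, df1, df2): P(F > x)."""
    if x <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * x))


def f_quantile(p: float, df1: float, df2: float) -> float:
    """Analogue of FINV(1 - p, df1, df2): the F_c with P(F <= F_c) = p (bisection)."""
    target = 1.0 - p
    lo, hi = 0.0, 1.0
    while fdist(hi, df1, df2) > target:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if fdist(mid, df1, df2) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


value, df1, df2 = 3.2, 2, 10
print(round(1 - fdist(value, df1, df2), 6), round(fdist(value, df1, df2), 6))
print(round(f_quantile(0.95, df1, df2), 6))
```

With df1 = 2 the tail has the closed form (df2/(df2 + 2x))^(df2/2), which gives an easy check on the continued fraction.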
APPENDIX C

Review of Statistical Inference

CHAPTER OUTLINE
C.1 Examining a Sample of Data
C.2 Estimating Population Parameters
C.2.1 Creating Random Samples
C.2.2 Estimating a Population Mean
C.2.3 Estimating a Population Variance
C.2.4 Standard Error of the Sample Mean
C.3 The Central Limit Theorem
C.4 Interval Estimation
C.4.1 Interval Estimation with σ² Unknown
C.4.2 Interval Estimation with the Hip Data
C.5 Hypothesis Tests About a Population Mean
C.5.1 An Example
C.5.2 The p-value
C.5.3 A Template for Hypothesis Tests
C.6 Other Useful Tests
C.6.1 Simulating Data
C.6.2 Testing a Population Variance
C.6.3 Testing Two Population Means
C.6.4 Testing Two Population Variances
C.7 Testing Population Normality
C.7.1 A Histogram
C.7.2 The Jarque-Bera Test

C.1 EXAMINING A SAMPLE OF DATA

When faced with a new set of data observations, or a data set, it is wise to "look" at the data graphically and to look at its summary statistics. To illustrate, open the Excel file hip. You will find a single list of numbers with the label y in cell A1. Examining the definition file hip.def we find that the variable y is the hip width of 50 individuals; we also find some basic summary statistics that we will recompute. Save the workbook as Appendix C. Rename the worksheet hip data.

Select the Data tab, in the middle of your tab list. In the Analysis group of commands at the far right of the ribbon, select Data Analysis.

If the Data Analysis tool does not appear on the ribbon, you need to load it first.


Select the Office Button in the upper left corner of your screen, Excel Options at the bottom of the Office Button tasks panel, Add-Ins in the Excel Options dialog box, Excel Add-ins in the Manage window at the bottom of the Excel Options dialog box, and then Go.

[Screenshot: Excel Options dialog with Add-Ins selected in the left panel and Excel Add-ins chosen in the Manage box.]

In the Add-Ins dialog box, check the box in front of Analysis ToolPak. Select OK.

[Screenshot: Add-Ins dialog with the Analysis ToolPak box checked.]

Now Data Analysis should be available in the Analysis group of commands. Select it. From the dialog box choose Descriptive Statistics and click OK.

[Screenshot: Data Analysis dialog listing the Analysis Tools (Anova: Two-Factor Without Replication, Correlation, Covariance, Descriptive Statistics, Exponential Smoothing, F-Test Two-Sample for Variances, Fourier Analysis, Histogram, Moving Average, Random Number Generation), with Descriptive Statistics selected.]

In the dialog box that results, specify the input range of the data to be A1:A51, indicate that the data are grouped in Columns, and indicate that there are Labels in First Row. Under Output options choose New Worksheet Ply and assign the name hip data summary stats. Finally, check the box next to Summary statistics so that the statistics will actually be transferred to the new worksheet. Click OK.

[Screenshot: Descriptive Statistics dialog with Input Range A1:A51, Grouped By Columns, Labels in First Row checked, New Worksheet Ply named hip data summary stats, and Summary statistics checked.]

The summary statistics are squeezed into two narrow columns, so the labels are not fully visible; the output should still be highlighted. Back on the Home tab, go to the Cells group of commands. Select Format and then AutoFit Column Width.

[Screenshot: the hip data summary stats worksheet with narrow columns, and the Home tab Format menu open with AutoFit Column Width selected.]

This makes the columns fully visible. The values reported and brief explanations are given next.

y
Mean                  17.15819992     ȳ = Σy_i/N
Standard Error        0.255550251     se(ȳ) = σ̂/√N
Median                17.0850005      50th percentile
Mode                  16.4            most frequent value
Standard Deviation    1.807013155     σ̂
Sample Variance       3.265296541     σ̂² = Σ(y_i − ȳ)²/(N − 1)
Kurtosis              -0.610148853    measure of peakedness
Skewness              -0.014256196    measure of symmetry
Range                 6.87            max − min
Minimum               13.53           minimum value
Maximum               20.4            maximum value
Sum                   857.909996      Σy_i
Count                 50              N

The values of Kurtosis and Skewness are calculated by Excel using slightly different formulas than those used in Principles of Econometrics, 4e, p. 702. In fact, the statistic reported by Excel is often called "excess" Kurtosis: it measures Kurtosis minus 3, which is the Kurtosis value for the normal distribution. Apart from the minus 3, the formulas are equivalent in large samples. We will take this opportunity to show the calculations explicitly. The formulas are used in Section C.7.2 of this manual as part of a test for normality.

Copy the hip data to a new worksheet named statistics calculations. In cells G2:G18 enter the labels shown below.

     G
2    Mean
3    Standard Error
4    Median
5    Mode
6    Standard Deviation
7    Sample Variance
8    Kurtosis
9    Skewness
10   Range
11   Minimum
12   Maximum
13   Sum
14   Count
15   Sigma tilde
16   Mu2
17   Mu3
18   Mu4

• In H14 enter the formula =count(A2:A51) to obtain the sample size N = 50.
• In H2 enter the formula =sum(A2:A51)/H14 to obtain the sample mean ȳ = 17.1582.
• In B1 enter the label y-ybar. In B2 enter the formula =A2-$H$2. Copy this formula to B3:B51.
• In C1:E1 enter the labels (y-ybar)^2, (y-ybar)^3 and (y-ybar)^4. In C2 enter =B2^2, in D2 enter =B2^3, and in E2 enter =B2^4. This creates the square, cube and fourth power of the difference between the value of y and the sample mean ȳ.
• Highlight C2:E2 and move your cursor to the lower right corner of your selection until a skinny cross is formed. Left-click, hold, and drag down to cell E51.

     A        B
1    y        y-ybar
2    14.96    -2.1982
3    17.34    0.1818
4    16.4     -0.7582

The first five rows of the result are:

     A        B          C            D            E
1    y        y-ybar     (y-ybar)^2   (y-ybar)^3   (y-ybar)^4
2    14.96    -2.1982    4.832083     -10.62188    23.34903
3    17.34    0.1818     0.033051     0.006009     0.001092
4    16.4     -0.7582    0.574867     -0.435864    0.330472
5    19.33    2.1718     4.716716     10.24376     22.24741
6    17.69    0.5318     0.282812     0.1504       0.079983

In cells H2:H18 complete entering the formulas as shown below.

     G                     H
2    Mean                  =SUM(A2:A51)/H14
3    Standard Error        =H6/SQRT(H14)
4    Median                =MEDIAN(A2:A51)
5    Mode                  =MODE(A2:A51)
6    Standard Deviation    =SQRT(H7)
7    Sample Variance       =SUM(C2:C51)/(H14-1)
8    Kurtosis              =H18/(H15^4)
9    Skewness              =H17/(H15^3)
10   Range                 =H12-H11
11   Minimum               =MIN(A2:A51)
12   Maximum               =MAX(A2:A51)
13   Sum                   =SUM(A2:A51)
14   Count                 =COUNT(A2:A51)
15   Sigma tilde           =SQRT(H16)
16   Mu2                   =SUM(C2:C51)/H14
17   Mu3                   =SUM(D2:D51)/H14
18   Mu4                   =SUM(E2:E51)/H14

The numerical values are shown below. Note that the values match Excel's descriptive statistics except for the Skewness and Kurtosis, which are computed using the formulas in the middle of p. 702 of Principles of Econometrics, 4e. The value of excess Kurtosis = Kurtosis − 3 = −0.66847, which is close to the value Excel reports for Kurtosis automatically when computing descriptive statistics. In large samples, the calculation using our approach and that of Excel will converge to the same value.

     G                     H
2    Mean                  17.15819992
3    Standard Error        0.255550251
4    Median                17.0850005
5    Mode                  16.4
6    Standard Deviation    1.807013155
7    Sample Variance       3.265296541
8    Kurtosis              2.331534288
9    Skewness              -0.0138249
10   Range                 6.87
11   Minimum               13.53
12   Maximum               20.4
13   Sum                   857.909996
14   Count                 50
15   Sigma tilde           1.788851757
16   Mu2                   3.19999061
17   Mu3                   -0.07913797
18   Mu4                   23.874771
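The worksheet calculations can be sketched in Python as well. The function below (a hypothetical name) follows the formulas in column H and, for comparison, also computes the sample-adjusted skewness and excess kurtosis formulas documented for Excel's SKEW and KURT functions, which we assume are what the Descriptive Statistics tool reports. It is run on a toy data set because the hip data are not reproduced here.

```python
import math


def describe(y):
    """Summary statistics as in the statistics calculations worksheet (needs len(y) >= 4)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    sample_var = sum(d ** 2 for d in dev) / (n - 1)  # Sample Variance
    sd = math.sqrt(sample_var)                       # Standard Deviation
    mu2 = sum(d ** 2 for d in dev) / n               # Mu2, Mu3, Mu4 divide by N
    mu3 = sum(d ** 3 for d in dev) / n
    mu4 = sum(d ** 4 for d in dev) / n
    sigma_tilde = math.sqrt(mu2)
    # Sample-adjusted formulas documented for Excel's SKEW and KURT (KURT is excess kurtosis)
    z3 = sum((d / sd) ** 3 for d in dev)
    z4 = sum((d / sd) ** 4 for d in dev)
    excel_skew = n / ((n - 1) * (n - 2)) * z3
    excel_kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                  - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Standard Deviation": sd,
        "Sample Variance": sample_var,
        "Sigma tilde": sigma_tilde,
        "Mu2": mu2,
        "Mu3": mu3,
        "Mu4": mu4,
        "Skewness": mu3 / sigma_tilde ** 3,
        "Kurtosis": mu4 / sigma_tilde ** 4,
        "Excel SKEW": excel_skew,
        "Excel KURT": excel_kurt,
    }


stats = describe([1, 2, 3, 4, 5])  # toy data, not the hip data
for name, val in stats.items():
    print(f"{name:20s} {val: .6f}")
```

For the toy data the Principles of Econometrics Kurtosis is 1.7 while Excel's excess measure is −1.2, so the two differ by more than the constant 3 in a sample this small, which is consistent with the remark that they agree only in large samples.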

C.2 ESTIMATING POPULATION PARAMETERS

Consider a population of interest. We wish to examine a characteristic which we denote by the random variable Y. The population parameters E(Y) = µ and var(Y) = σ² provide summary information about the location of the center of the probability density function of Y and its spread. To estimate these parameters, assume we have a random sample Y_1, Y_2, ..., Y_N. The estimators of the population mean and variance are the sample mean and sample variance

$$\bar{Y} = \frac{\sum Y_i}{N}, \qquad \hat{\sigma}^{2} = \frac{\sum (Y_i - \bar{Y})^{2}}{N-1}$$

These estimators are random variables because their values change from one sample of values to another. In order to illustrate this we will carry out a "simulation" experiment, creating data such as that in Table C.2 of Principles of Econometrics, 4e, p. 696. We will create 10 samples of random data from a normal population with mean µ = 17 and variance σ² = 6.25.

C.2.1 Creating Random Samples

Label a new worksheet ten samples. In cells B1:K1 place the labels h1, ..., h10 as names of the random samples we draw next.

Select Data Analysis then Random Number Generation, and OK.

[Screenshot: Data Analysis dialog with Random Number Generation selected in the Analysis Tools list.]
l ---

Create 10 samples of 40 observations each based on the normal distribution with Mean = 17 and Standard deviation = 2.5. Recall that the standard deviation is the square root of the variance. For σ² = 6.25 this means σ = √6.25 = 2.5. Creating numbers that behave randomly is a science, and how they are created is beyond the scope of this work. It is quite a fascinating subject and many web sites provide introductions. See, for example, http://www.random.org/ or visit Wikipedia http://en.wikipedia.org/wiki/Random_number. We specify a Random Seed = 12345. The actual value of the seed does not matter, but odd numbers with 5 to 7 digits are frequently chosen. If you do not include a random seed value, Excel will create its own value based on the time and date, and each time you generate a set of random values you will obtain different values. This is an exciting possible approach, and one that we use at various points in this manual. However, at this point we will use a seed value so that you can follow our steps and replicate our values. The values will be placed in the cells B2:K41. Click OK.
[Screenshot: Random Number Generation dialog with Distribution = Normal, Standard deviation = 2.5, Random Seed = 12345 and Output Range = B2:K41.]

The 10 columns of numbers we obtain represent values we might have collected from a
population. We have N = 40 observations in each sample. The first few rows should look as
shown below.

     B          C          D          E          F          G          H          I          J          K
1    h1         h2         h3         h4         h5         h6         h7         h8         h9         h10
2    …          17.53583   18.99207   …          …          18.41487   19.47136   14.5651    20.02987   18.41455
3    16.72703   16.69642   17.78928   18.05322   15.62357   16.55638   18.10231   15.77295   19.06652   …
4    15.77338   15.34563   …          15.78947   13.49482   15.83066   17.98444   13.92702   12.63854   19.53194
5    17.95701   13.20390   10.34170   21.08038   …          18.24109   14.91513   17.87418   …          19.70842
6    16.74914   16.52      18.19651   18.92167   16.2352    20.00413   12.78505   15.15303   13.63347   17.30009
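In Python, the role of the Random Seed can be illustrated with `random.Random`, a generator object seeded here with 12345. This sketch assumes only reproducibility matters: Python's generator is different from Excel's, so the numbers will not match the worksheet values above. The function name `draw_samples` is hypothetical.

```python
import random


def draw_samples(n_samples=10, n_obs=40, mean=17.0, sd=2.5, seed=12345):
    """Return n_samples lists of n_obs normal draws, like columns h1..h10."""
    rng = random.Random(seed)  # private generator, seeded like Random Seed = 12345
    return [[rng.gauss(mean, sd) for _ in range(n_obs)] for _ in range(n_samples)]


samples = draw_samples()
again = draw_samples()            # same seed: identical values
fresh = draw_samples(seed=54321)  # different seed: different values
print(len(samples), len(samples[0]))       # 10 40
print(samples == again, samples == fresh)  # True False
```

Omitting the seed (for example `random.Random()`) plays the same role as leaving Excel's Random Seed box empty: each run produces new values.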



C.2.2 Estimating a Population Mean

The values in sample h1 are values of the random variable Y, namely y_1, y_2, ..., y_40. Using these 40 values we compute the sample mean ȳ = Σy_i/40. In cell A43 enter the label ybar. In B43 enter the formula to calculate the sample mean, =average(B2:B41).

[Worksheet ten samples: cell A43 holds the label ybar and cell B43 the formula =average(B2:B41)]

Press Enter. The value computed is 16.99915. Move your cursor to the lower right corner of
cell B43 until it turns into a skinny cross. Drag this horizontally to K43 to copy the formula. The
sample averages of the 10 samples are shown below.

Row 43 of the ten samples worksheet (columns B to K):

           h1        h2        h3        h4        h5        h6        h7        h8        h9        h10
ybar       16.99915  16.39387  17.08115  17.14914  17.26004  17.1576   16.71567  16.93504  16.53494  16.71281

Different samples yield different sample means, and the sample mean is an estimator of the
population parameter µ. As shown in Principles of Econometrics, 4e, p. 697, the sample mean is
an unbiased estimator because $E(\bar{Y}) = \mu$. This property says that if we take many samples from
this population, the average value of the sample mean will equal the true value µ. Our illustration
has only constructed 10 samples, which is not enough to qualify as "many samples," but we can
still compute the average of these 10 values of ybar to illustrate this property. In cell M42 put the
label average, and in M43 the formula =average(B43:K43).


Press Enter. The resulting average is 16.894, which is not exactly µ = 17. Repeating the experiment
with 1000 samples, the average of the sample means is 16.9902, which is very close to the true
mean 17.
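The repeated-sampling argument can also be sketched in Python. This is an illustration under stated assumptions: Python's generator replaces Excel's (so the printed averages will not equal 16.894 or 16.9902), and `average_of_sample_means` is a hypothetical helper.

```python
import random
import statistics

def average_of_sample_means(n_samples, n_obs=40, mu=17.0, sigma=2.5, seed=12345):
    """Draw n_samples samples from a normal(mu, sigma) population and average their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        means.append(statistics.fmean(sample))  # like =AVERAGE(B2:B41)
    return statistics.fmean(means)              # like =AVERAGE(B43:K43)

print(round(average_of_sample_means(10), 4))
print(round(average_of_sample_means(1000), 4))  # typically very close to 17
```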

C.2.3 Estimating a Population Variance

The estimator of $\sigma^2$ is $\hat{\sigma}^2 = \sum (y_i - \bar{y})^2/(N-1)$. This too is an unbiased estimator. For each of the
10 samples h1 to h10 compute the sample variance using the var function. Enter the label sighat^2
in A44. In B44 enter the formula =var(B2:B41). Press Enter. Copy this formula across to K44.
Review of Statistical Inference 439

In M44 compute the average of the 10 variance estimates by entering the formula
=average(B44:K44). The result is shown below.

           h1        h2        h3        h4        h5        h6        h7        h8        h9        h10       average
ybar       16.99915  16.39387  17.08115  17.14914  17.26004  17.1576   16.71567  16.93504  16.53494  16.71281  16.894
sighat^2   6.457193  5.74982   7.329992  3.713185  6.886413  7.62167   6.840028  5.797105  9.319506  5.741701  6.545661

The average of the 10 variance estimates is 6.545661. If we repeat this for 1000 samples, the
average value of $\hat{\sigma}^2$ is 6.252163, which is very close to the true value 6.25.
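A similar Python sketch checks the unbiasedness of $\hat{\sigma}^2$. It assumes Python's generator in place of Excel's and uses `statistics.variance`, which, like Excel's VAR, divides by N - 1. The helper name is hypothetical.

```python
import random
import statistics

def average_of_sample_variances(n_samples, n_obs=40, mu=17.0, sigma=2.5, seed=12345):
    """Average the unbiased variance estimates from n_samples simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        # statistics.variance divides by N - 1, as Excel's VAR does
        estimates.append(statistics.variance(sample))
    return statistics.fmean(estimates)

print(round(average_of_sample_variances(1000), 4))  # compare with sigma^2 = 6.25
```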

C.2.4 Standard Error of the Sample Mean

The variance of the sample mean is $\mathrm{var}(\bar{Y}) = \sigma^2/N = 6.25/40 = 0.15625$. This value
indicates how much the sample mean $\bar{Y}$ varies from sample to sample. In the worksheet ten
samples, label cell N42 variance. In N43 enter the formula =var(B43:K43). This is the sampling
variation of $\bar{Y}$. In the 10 samples h1 to h10 the sampling variation is 0.084495. For 1000 samples
we obtain a calculated variance of the sample mean of 0.144631. Sampling variation is harder to
capture than the average value in a simulation experiment. As the number of samples grows, the
calculated variance of $\bar{Y}$ will approach 0.15625.
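The gap between the simulated and theoretical variance of $\bar{Y}$ can be explored with the Python sketch below. Python's generator again stands in for Excel's, so the simulated values will differ from 0.084495 and 0.144631; `variance_of_sample_means` is a hypothetical helper.

```python
import random
import statistics

def variance_of_sample_means(n_samples, n_obs=40, mu=17.0, sigma=2.5, seed=12345):
    """Sampling variance of ybar across n_samples simulated samples (like =VAR(B43:K43))."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n_obs)])
             for _ in range(n_samples)]
    return statistics.variance(means)

theoretical = 2.5 ** 2 / 40  # sigma^2 / N = 6.25 / 40
print(theoretical)                              # 0.15625
print(round(variance_of_sample_means(10), 4))   # noisy with only 10 samples
print(round(variance_of_sample_means(1000), 4)) # closer to 0.15625
```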

The value of the variance of $\bar{Y}$ is usually unknown because $\sigma^2$ is unknown. The estimated
variance is $\widehat{\mathrm{var}}(\bar{Y}) = \hat{\sigma}^2/N$. The square root of $\widehat{\mathrm{var}}(\bar{Y})$ is called the standard error of the
mean, or sometimes the standard error of the estimate. It is written $\mathrm{se}(\bar{Y}) = \hat{\sigma}/\sqrt{N}$.
The standard error of the mean is a very important component of hypothesis tests and confidence
intervals. It is reported automatically when we use Descriptive Statistics in the Data Analysis
tool. Let us add it to our ten samples worksheet. In cell A45 enter the label N, for sample size. In
B45 enter the function =count(B2:B41) and press Enter. This counts the sample size N = 40.
Copy this formula to C45:K45.

In A46 enter the label Std error. In B46 enter =SQRT(B44/B45). Because B44 contains the
estimated variance, the formula takes the square root of $\hat{\sigma}^2/N$, which is $\mathrm{se}(\bar{Y}) = \hat{\sigma}/\sqrt{N}$. Copy
this formula to C46:K46. The calculated values should look as shown below.

           h1        h2        h3        h4        h5        h6        h7        h8        h9        h10       average   variance
ybar       16.99915  16.39387  17.08115  17.14914  17.26004  17.1576   16.71567  16.93504  16.53494  16.71281  16.894    0.084495
sighat^2   6.457193  5.74982   7.329992  3.713185  6.886413  7.62167   6.840028  5.797105  9.319506  5.741701  6.545661
N          40        40        40        40        40        40        40        40        40        40
Std error  0.401783  0.379138  0.428077  0.30468   0.414922  0.436511  0.413522  0.380694  0.482688  0.37887
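The standard error in row 46 is plain arithmetic. The short Python sketch below (the function name is hypothetical) reproduces it for sample h1 from $\hat{\sigma}^2 = 6.457193$ and N = 40.

```python
import math

def standard_error(sample_variance, n):
    """se(ybar) = sqrt(sighat^2 / N), as computed by =SQRT(B44/B45)."""
    return math.sqrt(sample_variance / n)

print(round(standard_error(6.457193, 40), 6))  # 0.401783
```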

C.3 THE CENTRAL LIMIT THEOREM

An amazing result in the theory of statistics is the central limit theorem. It says that if we take N
random variables, $Y_1, Y_2, \ldots, Y_N$, that are statistically independent and identically distributed (no
matter what that distribution might be), then the sample mean $\bar{Y}$ will have approximately a
normal distribution with mean µ and variance $\sigma^2/N$. This is what in statistics is called a "large
sample" or "asymptotic" result, which means that for the approximation to hold the sample size N
must be large.

Specifically, the theorem (Principles of Econometrics, 4e, p. 699) is stated in terms of the
standardized variable:

$$Z_N = \frac{\bar{Y}-\mu}{\sigma/\sqrt{N}} \overset{a}{\sim} N(0,1)$$

This standardized variable has an approximate standard normal distribution in large samples. To
illustrate, we use one of the simplest but most useful distributions in statistics: a uniform random
variable on the interval between 0 and 1. Create a new worksheet called CLT. Select Data and
then Data Analysis. In the Data Analysis window choose Random Number Generation. We
will create 1000 variables; these will be our samples. Each sample will consist of N = 10
values. The Distribution is Uniform between 0 and 1, and we use a Random Seed = 12345 so
that you can replicate our results. In the Output Range simply specify cell A1. Select OK.

We will not use column and row labels in this example (there are too many), so remember that each
column is a sample of 10 observations, and rows 1:10 contain observation values from a uniform
distribution between 0 and 1.

[Random Number Generation dialog: Number of Variables = 1000, Number of Random Numbers = 10, Distribution = Uniform, Between 0 and 1, Random Seed = 12345, Output Range = A1]

The result is 10 random numbers between 0 and 1 in each of columns A to ALL. In cell A12 enter the
formula =average(A1:A10), and press Enter. This computes the sample mean $\bar{Y}$ for the
values in A1:A10, which represent the first of 1000 samples. Next, copy and paste this
formula to cells B12:ALL12 to compute the sample means for all the samples. The easiest way
to do this is to first select A12, select Copy, select B12, press and hold down the SHIFT
key, press the CTRL and END keys, and finally select Paste.

Remark: When faced with the task of copying formulas across large ranges,
Excel's keyboard shortcuts become very useful. Click on the Help button and
search for "keyboard shortcuts".

[Screenshot: Excel Help search results for "keyboard shortcuts"]

For selecting large areas the following are very useful.

• CTRL+ARROW KEY moves to the edge of the current data region in a worksheet.
• SHIFT+ARROW KEY extends the selection of cells by one cell.
• CTRL+SHIFT+ARROW KEY extends the selection of cells to the last
nonblank cell in the same column or row as the active cell, or if the next cell
is blank, extends the selection to the next nonblank cell.

The results for the first few columns are shown below. The values in row 12 are sample means.

[Worksheet CLT: rows 1 to 10 hold the uniform draws for each sample; row 12 holds the sample means]

Row 12 (sample means), columns A to J:
0.341014  0.453856  0.71872  0.670196  0.409162  0.617356  0.60137  0.471978  0.547169  0.595572

To display the shape of the uniform distribution, first enter the label Bin in cell ALN1. In
ALN2:ALN11 put the values 0.1, 0.2, ..., 1.0. On the Data tab, select Data Analysis,
Histogram, and then OK. Use the 10000 values in A1:ALL10 to construct the histogram.

Specify the Bin Range to be ALN2:ALN11 and the Output Range to be ALN15. Finally select
Chart Output and then OK.

[Histogram dialog: Input Range = A1:ALL10, Bin Range = ALN2:ALN11, Output Range = ALN15, Chart Output checked]

The distribution shows that the 10000 values are evenly spread over the interval [0, 1], with
about 1000 values in each of the intervals of width 0.1.

Bin Frequency
0.1 983
0.2 1002
0.3 975
0.4 1011
0.5 1030
0.6 1003
0.7 1036
0.8 944
0.9 987
1 1029
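The counting rule behind the frequency table can be sketched in Python. We assume the rule Excel documents for its Histogram tool: a value is counted in the first bin whose upper limit is greater than or equal to it, and values above the last limit go to a final More bin. The helper `excel_histogram` is hypothetical, and the draws come from Python's generator, so the counts will resemble, not match, the table above.

```python
import bisect
import random

def excel_histogram(values, bins):
    """Count values per bin using an upper-limit rule like Excel's Histogram tool.

    bins must be sorted in increasing order; the returned list has one extra
    slot at the end for values above the last bin (Excel's 'More').
    """
    counts = [0] * (len(bins) + 1)
    for x in values:
        # bisect_left gives the index of the first bin limit >= x
        counts[bisect.bisect_left(bins, x)] += 1
    return counts

bins = [round(0.1 * k, 1) for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0
rng = random.Random(12345)
draws = [rng.random() for _ in range(10000)]      # Uniform on [0, 1)
counts = excel_histogram(draws, bins)
for upper, count in zip(bins + ["More"], counts):
    print(upper, count)
```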

The corresponding Histogram figure (after some editing) is:

[Figure: Histogram of the 10000 uniform values; horizontal axis Bin (0.1 to 1), vertical axis Frequency (0 to 1500)]

The uniform random variable U on the interval [a, b] has mean $E(U) = (a+b)/2$ and variance
$\mathrm{var}(U) = (b-a)^2/12$. If U is on the interval [0, 1], it has mean $E(U) = 0.5$ and variance
$\mathrm{var}(U) = 1/12$. The central limit theorem says that the standardized variable based on $\bar{Y}$ is
asymptotically standard normal, which in this case is

$$Z_N = \frac{\bar{Y} - 0.5}{\sqrt{(1/12)/10}} = \frac{\bar{Y} - 0.5}{\sqrt{1/(12 \times 10)}} \overset{a}{\sim} N(0,1)$$

In cell A13 enter the formula for the standardized variable, =(A12-0.5)/SQRT(1/(12*10)), and
press Enter. Next you need to copy and paste this formula to B13:ALL13. An easy way to do
this is to first select A13, select Copy, select B13, press the F8 key (which turns on Extend
Selection mode), use the scroll bar at the bottom of your Excel window to get to the right of your
table of data, select ALL13, and finally select Paste. The first few values are shown below.

      A          B          C          D          E          F          G          H          I          J
12    0.341014   0.453856   0.71872    0.670196   0.409162   0.617356   0.60137    0.471978   0.547169   0.595572
13    -1.74161   -0.50548   2.395958   1.8644     -0.99508   1.285569   1.110456   -0.30697   0.516715   1.046936

Now we repeat the steps of constructing a histogram. In A14 put the label Bin. In A15:A30 put
the values -3.5, -3.0, -2.5, ..., 4.0. On the Data tab, select Data Analysis and then Histogram.
Fill in the dialog box as shown below to chart all the standardized values in row 13. Finally select
OK.

[Histogram dialog: Input Range = A13:ALL13, Bin Range = A15:A30, Output Range selected, Chart Output checked]

The resulting histogram shows a bell-shaped curve (we have eliminated the gaps between the bars),
which is characteristic of a normally distributed random variable. Note that it is centered at 0 and
the range of values is -3 to 3, an interval that, as you can see using Table 1 in Appendix E of
Principles of Econometrics, 4e, contains 0.9974 of the probability of a standard normal distribution.

[Figure: Histogram of the 1000 standardized sample means; horizontal axis Bin (-3.5 to 4), vertical axis Frequency (0 to 250)]

What we have shown is that if we take samples of 10 values from a uniform distribution, which is
not bell shaped at all, then the standardized means of these samples of 10 values have
a probability distribution that is approximately normal.
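The whole experiment can be sketched in a few lines of Python, assuming Python's uniform generator in place of Excel's (so individual values will not match the CLT worksheet). Each sample mean is standardized with the same formula used in row 13, and the results are then checked for a mean near 0, a variance near 1, and almost all values inside [-3, 3]. The helper name is hypothetical.

```python
import math
import random
import statistics

def standardized_uniform_means(n_samples=1000, n_obs=10, seed=12345):
    """Return Z_N = (ybar - 0.5) / sqrt(1/(12*N)) for each simulated sample.

    Each sample has n_obs draws from Uniform(0, 1), mirroring one column
    of the CLT worksheet and the formula in row 13.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1 / (12 * n_obs))  # SQRT(1/(12*10)) when n_obs = 10
    z_values = []
    for _ in range(n_samples):
        ybar = statistics.fmean([rng.random() for _ in range(n_obs)])
        z_values.append((ybar - 0.5) / scale)
    return z_values

z = standardized_uniform_means()
print(round(statistics.fmean(z), 3), round(statistics.variance(z), 3))  # near 0 and 1
print(sum(abs(v) <= 3 for v in z) / len(z))  # share of values inside [-3, 3]
```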

C.4 INTERVAL ESTIMATION

If $Y \sim N(\mu, \sigma^2)$ then $\bar{Y} \sim N(\mu, \sigma^2/N)$. From this we can construct the standardized normal random
variable:

$$Z = \frac{\bar{Y}-\mu}{\sigma/\sqrt{N}} \sim N(0,1)$$

The critical values from the N(0,1) probability distribution such that 2.5% of the probability is in
either tail are -1.96 and 1.96 (see Figure C.4 on p. 704 of Principles of Econometrics, 4e).
Consequently:

$$P[-1.96 \le Z \le 1.96] = 0.95$$

which implies that:

$$P\left[\bar{Y} - 1.96\frac{\sigma}{\sqrt{N}} \le \mu \le \bar{Y} + 1.96\frac{\sigma}{\sqrt{N}}\right] = 0.95$$

In general, if $\Phi(z_c) = 1 - \alpha/2$, then the $100(1-\alpha)\%$ confidence interval estimator of µ is:

$$\bar{Y} \pm z_c \frac{\sigma}{\sqrt{N}}$$

It must be emphasized that a 95% interval estimator will contain the true population mean µ in
95% of many repeated samples of size N. To illustrate, return to the ten samples worksheet. In
cell A48 put the label LL and in A49 put the label UL. In B48 enter the formula
=B43-1.96*SQRT(6.25/B45) and in B49 enter =B43+1.96*SQRT(6.25/B45). These calculations find
the lower and upper bounds of the interval estimate of µ given that σ is known. Copy B48:B49
across to C48:K49.

[Worksheet ten samples: rows 43 to 46 as before, with the lower limits (LL) in row 48 and the upper limits (UL) in row 49 for samples h1 to h10]

Note that the intervals we create move around because they are centered at the sample mean $\bar{Y}$,
which varies from sample to sample. As it happens, all 10 of the intervals we have created contain
the true population mean µ = 17. We can ask Excel to tell us this using some logical functions.

In cell A50 enter the label Cover. In A51 enter the label Success. In B50 enter the formula
=AND(B48<=17,17<=B49). Press Enter. This logical function is TRUE if the value 17 is
between the lower and upper bounds; otherwise it is FALSE. In B51 enter the formula
=IF(B50,1,0). If the result in B50 is TRUE, then we assign the value 1; if B50 is FALSE we
assign a value of 0. In this way we can record whether our interval estimate successfully contains
(or covers) the true parameter µ. Copy the formulas from B50:B51 to C50:K51. The result is
shown below.
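The cover and success logic can be mirrored in Python using the ten sample means from row 43 of the ten samples worksheet. This is a sketch alongside the Excel workflow, not part of it; the list comprehensions play the roles of the AND and IF formulas.

```python
import math

# Sample means from row 43 of the ten samples worksheet (h1 to h10)
ybars = [16.99915, 16.39387, 17.08115, 17.14914, 17.26004,
         17.1576, 16.71567, 16.93504, 16.53494, 16.71281]
mu, sigma, n = 17.0, 2.5, 40

half_width = 1.96 * math.sqrt(sigma ** 2 / n)            # 1.96*SQRT(6.25/B45)
lower = [y - half_width for y in ybars]                   # row 48 (LL)
upper = [y + half_width for y in ybars]                   # row 49 (UL)
cover = [lo <= mu <= up for lo, up in zip(lower, upper)]  # =AND(B48<=17,17<=B49)
success = [1 if c else 0 for c in cover]                  # =IF(B50,1,0)
print(cover)
print(sum(success) / len(success))  # 1.0: all ten intervals cover mu = 17
```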

[Worksheet ten samples: row 50 (cover) is TRUE and row 51 (success) is 1 for all ten samples h1 to h10]

To further illustrate, create in a new worksheet 1000 samples of size 40 from a normal
distribution with mean 17 and standard deviation 2.5.

[Random Number Generation dialog: Number of Variables = 1000, Number of Random Numbers = 40, Distribution = Normal, Mean = 17, Standard deviation = 2.5, Random Seed = 12345, New Worksheet Ply = 1000 samples]

For our 1000 samples, computing the interval bounds and the success indicator for each sample as in
the ten samples worksheet, we obtained an average value of success of 0.961. This means that 96.1%
of the 1000 interval estimates cover the true parameter µ = 17, which is close to the expected
95%. With more samples, the success rate would tend to be closer to 95%.
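A Python sketch of the 1000-sample coverage experiment follows. It assumes Python's generator in place of Excel's, so the success rate will differ from 0.961, and `coverage_rate` is a hypothetical helper.

```python
import math
import random
import statistics

def coverage_rate(n_samples=1000, n_obs=40, mu=17.0, sigma=2.5, seed=12345):
    """Share of known-sigma 95% intervals ybar +/- 1.96*sigma/sqrt(N) that contain mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n_obs)
    hits = 0
    for _ in range(n_samples):
        ybar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n_obs)])
        if ybar - half_width <= mu <= ybar + half_width:  # like the cover/success rows
            hits += 1
    return hits / n_samples

print(coverage_rate())  # a value near 0.95
```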

C.4.1 Interval Estimation With σ² Unknown

The interval estimation procedure described above depended upon specific knowledge of the
value of $\mathrm{var}(Y) = \sigma^2$. If the variance is not known, we substitute the estimated sample variance
$\hat{\sigma}^2 = \sum (Y_i - \bar{Y})^2/(N-1)$. When we do so, the standardized variable follows the t-distribution
with N - 1 degrees of freedom:

$$t = \frac{\bar{Y}-\mu}{\hat{\sigma}/\sqrt{N}} = \frac{\bar{Y}-\mu}{\mathrm{se}(\bar{Y})} \sim t_{(N-1)}$$

The confidence interval estimator is now:

$$\bar{Y} \pm t_c \frac{\hat{\sigma}}{\sqrt{N}} \quad \text{or} \quad \bar{Y} \pm t_c\,\mathrm{se}(\bar{Y})$$

The critical value $t_c$ is the $100(1-\alpha/2)$ percentile from the t-distribution with N - 1 degrees of
freedom, $t_{(1-\alpha/2,\,N-1)}$.

Return now to the ten samples worksheet. Put the label tc in cell A53. In cell B53 use the Insert
Function key and scroll to the statistical function TINV. TINV takes a two-tail probability, so with
Probability = 0.05 it returns the $1 - \alpha/2 = 0.975$ percentile. Given α = 0.05 and degrees of freedom
N - 1 = 39 (entered as B45-1), we see that the critical value is $t_c = 2.02269$. Click OK.

[Function Arguments dialog for TINV: Probability = .05, Deg_freedom = B45-1 (= 39); formula result = 2.02269]

In A54 enter the label LL and in A55 enter UL. In B54 enter the formula =B43-B53*B46, which
computes $\bar{Y} - t_c\,\mathrm{se}(\bar{Y})$. In B55 enter =B43+B53*B46, which computes $\bar{Y} + t_c\,\mathrm{se}(\bar{Y})$. Copy the
contents of B53:B55 to C53:K55.

[Worksheet ten samples: row 53 holds tc = 2.022691 for every sample, rows 54 and 55 hold the lower (LL) and upper (UL) limits, row 56 (cover) is TRUE and row 57 (success) is 1 for all ten samples]
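Excel's TINV can be imitated outside Excel for checking purposes. The Python sketch below is an assumption-laden stand-in, not Excel's own algorithm: it integrates the t density numerically with Simpson's rule (`t_right_tail`) and inverts the two-tail probability by bisection (`tinv`, named after the Excel function but hypothetical here). It then rebuilds the t-based interval for sample h1 from the worksheet values in B43 and B46.

```python
import math

def t_right_tail(x, df, steps=2000):
    """P(t_(df) > x) for x >= 0, by Simpson's rule on the t density over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3  # the density is symmetric about 0

def tinv(prob, df):
    """Two-tailed inverse in the spirit of Excel's TINV: c with P(|t_(df)| > c) = prob."""
    lo, hi = 0.0, 50.0  # assumes the critical value is below 50
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * t_right_tail(mid, df) > prob:
            lo = mid  # too much probability beyond mid: move right
        else:
            hi = mid
    return (lo + hi) / 2

tc = tinv(0.05, 39)  # like =TINV(0.05,B45-1) with N = 40
print(round(tc, 6))

# t-based interval for sample h1, using ybar (B43) and se (B46) from the worksheet
ybar_h1, se_h1 = 16.99915, 0.401783
print(round(ybar_h1 - tc * se_h1, 5), round(ybar_h1 + tc * se_h1, 5))
```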

The resulting 10 interval estimates for µ all cover the true parameter value µ = 17, but note that
now both the center of each interval and its width vary from sample to sample. The critical value
2.022691 is larger than 1.96, which tends to widen the intervals, but because the width also depends
on $\hat{\sigma}$, an interval can be narrower than its known-variance counterpart; sample h4, whose estimated
variance is well below 6.25, is an example. In a large number of samples such intervals will cover
the true parameter 95% of the time.

C.4.2 Interval Estimation With the Hip Data

Now we create a template for constructing interval estimates for the population mean. Label a new
worksheet Interval Template. Set up your template as shown below.

Row  A                            B
1    Interval estimation of the population mean
3    Data Input
4    Sample size                  50
5    Confidence level             0.95
6    Estimated mean (Y-bar)       17.15819992
7    Standard Error-SE(Y-bar)     0.255550251063506
9    Computed Values
10   degrees of freedom df        =B4-1
11   t critical value tc          =TINV(1-B5,B10)
12   half-width                   =B11*B7
14   Confidence Interval
15   lower limit                  =B6-B12
16   upper limit                  =B6+B12

The values shown in the shaded area for the mean and standard error, under the section called
Data Input, can be copied and pasted from the hip data summary stats worksheet. For example,
highlight the entry for Mean, then press the Ctrl and C keys together (Ctrl+C) to copy the number to
the Windows clipboard. Return to the Interval Template and click on the target cell to make it
active, then select Paste and Paste Values to transfer the number (or press Ctrl+V).

[Screenshot: Paste menu with the Paste Values option]

The result is:

Row  A                            B
1    Interval estimation of the population mean
3    Data Input
4    Sample size                  50
5    Confidence level             0.95
6    Estimated mean (Y-bar)       17.15819992
7    Standard Error-SE(Y-bar)     0.255550251
9    Computed Values
10   degrees of freedom df        49
11   t critical value tc          2.009575199
12   half-width                   0.513547447
14   Confidence Interval
15   lower limit                  16.64465247
16   upper limit                  17.67174737
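The Interval Template can be mirrored in Python as a check on the worksheet. The sketch repeats the hypothetical `t_right_tail` and `tinv` helpers (numerical stand-ins for Excel's t functions, not Excel's algorithm), and the comments map each step to the template cells.

```python
import math

def t_right_tail(x, df, steps=2000):
    """P(t_(df) > x) for x >= 0, by Simpson's rule on the t density over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def tinv(prob, df):
    """Two-tailed inverse: c with P(|t_(df)| > c) = prob, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * t_right_tail(mid, df) > prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def interval_template(n, confidence, ybar, se):
    """Hypothetical Python mirror of the Interval Template worksheet."""
    df = n - 1                      # B10: =B4-1
    tc = tinv(1 - confidence, df)   # B11: =TINV(1-B5,B10)
    half_width = tc * se            # B12: =B11*B7
    return {"df": df, "tc": tc, "half_width": half_width,
            "lower": ybar - half_width,   # B15: =B6-B12
            "upper": ybar + half_width}   # B16: =B6+B12

hip = interval_template(50, 0.95, 17.15819992, 0.255550251)
for name, value in hip.items():
    print(name, round(value, 6))
```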



C.5 HYPOTHESIS TESTS ABOUT A POPULATION MEAN

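Hypothesis tests about the population mean are based on the statistic:

$$t = \frac{\bar{Y}-\mu}{\hat{\sigma}/\sqrt{N}} \sim t_{(N-1)}$$

If the null hypothesis $H_0: \mu = c$ is true, then:

$$t = \frac{\bar{Y}-c}{\hat{\sigma}/\sqrt{N}} = \frac{\bar{Y}-c}{\mathrm{se}(\bar{Y})} \sim t_{(N-1)}$$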

The null hypothesis will be rejected if the value of the test statistic becomes too large or too
small, depending upon the nature of the alternative hypothesis.

For the right-tail alternative hypothesis $H_1: \mu > c$, we reject the null hypothesis and accept the
alternative if $t \ge t_c = t_{(1-\alpha,\,N-1)}$, where $t_{(1-\alpha,\,N-1)}$ is the $100(1-\alpha)$ percentile of the $t_{(N-1)}$
distribution. The value α is the level of significance of the test, and is the probability of rejecting
the null hypothesis when it is true [Type I error]. In the figure below m = N - 1.

[Figure: density of $t_{(m)}$ with the right-tail rejection region $t \ge t_c$ labelled "Reject $H_0: \mu = c$"]

For the left-tail alternative hypothesis $H_1: \mu < c$, we reject the null hypothesis and accept the
alternative if $t \le t_c = t_{(\alpha,\,N-1)}$, where $t_{(\alpha,\,N-1)}$ is the $100\alpha$ percentile of the $t_{(N-1)}$ distribution.
The value α is the level of significance of the test, and is the probability of rejecting the null
hypothesis when it is true [Type I error]. In the figure below m = N - 1.

[Figure: density of $t_{(m)}$ with the left-tail rejection region $t \le t_c$ labelled "Reject $H_0: \mu = c$"; the remaining region is labelled "Do not reject $H_0: \mu = c$"]

For the two-tail alternative hypothesis $H_1: \mu \ne c$, we reject the null hypothesis and accept the
alternative if $t \le -t_c = t_{(\alpha/2,\,N-1)}$ or if $t \ge t_c = t_{(1-\alpha/2,\,N-1)}$. The value α is the level of
significance of the test, and is the probability of rejecting the null hypothesis when it is true [Type
I error]. The rejection regions each include α/2 of the rejection probability. In the figure below
m = N - 1.

[Figure: density f(t) of $t_{(m)}$ with rejection regions $t \le -t_c$ and $t \ge t_c$, each labelled "Reject $H_0: \mu = c$, Accept $H_1: \mu \ne c$"; the central region is labelled "Do not reject $H_0: \mu = c$"]

C.5.1 An Example

Using the hip data, let us test the null hypothesis $H_0: \mu = 16.5$ against the right-tail alternative
hypothesis $H_1: \mu > 16.5$. For the hip data N = 50, and the degrees of freedom for the t-distribution
are N - 1 = 49. We reject the null hypothesis and accept the alternative if $t \ge t_c =
t_{(1-\alpha,\,N-1)}$, where $t_{(1-\alpha,\,N-1)}$ is the $100(1-\alpha)$ percentile of the $t_{(N-1)}$ distribution. The value α
is the level of significance of the test. Let us choose the standard α = 0.05 level of significance.
The t-critical value is $t_c = 1.68$. We will reject the null hypothesis in favor of the alternative if
$t \ge 1.68$. The value of the test statistic is:

$$t = \frac{\bar{y} - c}{\mathrm{se}(\bar{Y})} = \frac{17.1582 - 16.5}{0.25555} = 2.5756$$

We reject the null hypothesis and conclude that the population mean hip width is greater than
16.5 inches.

C.5.2 The p-Value

The p-value is a probability associated with the outcome of a hypothesis test. If we have the p-value
of a test, p, we can determine the outcome of the test by comparing the p-value to the chosen level
of significance, α, without looking up or calculating the critical values ourselves. The rule is:

p-value rule: Reject the null hypothesis when the p-value is less than, or equal
to, the level of significance α. That is, if p ≤ α then reject $H_0$. If p > α then do
not reject $H_0$.

If you have chosen the level of significance to be α = .01, .05, .10 or any other value, you can
compare it to the p-value of a test and then reject, or not reject, without checking the critical value
$t_c$.

How the p-value is computed depends on the alternative. If t is the calculated value [not the
critical value tc] of the t-statistic with N - 1 degrees of freedom, then:

• if $H_1: \mu > c$, p = probability to the right of t
• if $H_1: \mu < c$, p = probability to the left of t
• if $H_1: \mu \ne c$, p = sum of probabilities to the right of |t| and to the left of -|t|

For the numerical example in the previous section the p-value is the area under the $t_{(49)}$
distribution to the right of 2.5756. This probability is 0.00654, which is smaller than α = 0.05.
Following the p-value rule, we reject the null hypothesis. In the next section we will build a
testing template for each type of test.

[Figure: $t_{(49)}$ density with the area p = 0.00654 shaded to the right of t = 2.5756; the critical value $t_{(0.95,\,49)}$ is marked]

C.5.3 A Template for Hypothesis Tests


Insert a new sheet and rename it test mean template. To construct the template we will employ
two primary functions related to the t-distribution: TINV, which we have already used to find the
critical value, and TDIST. We also use the logical function IF.

The syntax for TDIST is TDIST(x, m, tails), where x is the nonnegative value at which the distribution
is evaluated, m is the degrees of freedom, and tails is 1 or 2. If tails = 1, the function returns
TDIST = $P(t_{(m)} > x)$. If tails = 2, the function returns TDIST = $P(t_{(m)} < -x) + P(t_{(m)} > x)$.
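The behaviour of TDIST described above can be sketched in Python. The function `tdist` below is a hypothetical mirror of the Excel function's documented arguments, built on a numerical t-distribution tail (`t_right_tail`, also hypothetical, using Simpson's rule); it is not Excel's implementation.

```python
import math

def t_right_tail(x, df, steps=2000):
    """P(t_(df) > x) for x >= 0, by Simpson's rule on the t density over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def tdist(x, m, tails):
    """Mirror TDIST(x, m, tails): one tail P(t > x) or two tails P(t < -x) + P(t > x)."""
    if x < 0:
        raise ValueError("x must be nonnegative, as TDIST requires")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return tails * t_right_tail(x, m)  # symmetry: the two tails are equal

print(round(tdist(2.575618366, 49, 1), 6))  # right-tail p-value for the hip example
print(round(tdist(2.575618366, 49, 2), 6))  # two-tail p-value
```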

IF(logical_test, value_if_true, value_if_false) evaluates the condition logical_test, which is either
TRUE or FALSE. If the condition is TRUE the function returns value_if_true, and if the condition is
FALSE the function returns value_if_false.

Recall that for helpful descriptions such as that above you can click the question mark icon, and
in the resulting Excel Help window you can type into the search box the term you seek help on.
[Screenshot: Excel Help window with its search box]

Fill in your worksheet test mean template with the following formulas.

Row  A                                  B
1    Hypothesis tests about the mean of a population
3    Data Input
4    Sample size =                      50
5    Estimated mean (Y-bar) =           17.15819992
6    Standard Error-SE(Y-bar) =         0.255550251063506
7    Null hypothesis H0: mu = c         16.5
8    Level of significance alpha =      0.05
10   Computed Values
11   df =                               =B4-1
12   t-statistic value =                =(B5-B7)/B6
14   Right-tail test
15   Right critical value tc =          =TINV(2*B8,B11)
16   p-value =                          =IF(B12>0,TDIST(B12,B11,1),1-TDIST(ABS(B12),B11,1))
17   Decision =                         =IF(B12>=B15,"Reject Ho","Do not reject Ho")
19   Left-tail test
20   Left critical value -tc =          =-TINV(2*B8,B11)
21   p-value =                          =IF(B12<0,TDIST(ABS(B12),B11,1),1-TDIST(ABS(B12),B11,1))
22   Decision =                         =IF(B12<=B20,"Reject Ho","Do not reject Ho")
24   Two-tail test
25   Absolute critical value tc =       =TINV(B8,B11)
26   p-value =                          =TDIST(ABS(B12),B11,2)
27   Decision =                         =IF(OR(B12<=-B25,B12>=B25),"Reject Ho","Do not reject Ho")

In this template note that for p-value calculations we must first ascertain whether the calculated
t-statistic is positive or not. Recall that the argument x of the function TDIST(x, m, tails) must be
nonnegative. Thus the p-value formula uses the logical IF function to check on that. For example,
for the right-tail test, the formula is:

=IF(B12>0,TDIST(B12,B11,1),1-TDIST(ABS(B12),B11,1))

• If B12>0 (the t-statistic > 0) is TRUE, then the p-value is TDIST(B12,B11,1), where B12
is the t-statistic value and B11 is the degrees of freedom N - 1. This is $P(t_{(N-1)} > t)$.
• If B12>0 is FALSE, then the p-value is 1-TDIST(ABS(B12),B11,1),
where B12 is the t-statistic value and B11 is the degrees of freedom N - 1. Here we use
the symmetry of the t-distribution: the p-value for this right-tail test is
$P(t_{(N-1)} > t) = 1 - P(t_{(N-1)} > -t)$. The TDIST function only computes probability
values for nonnegative values of the t-statistic. So we take the absolute value of the t-statistic
(which is not positive in this branch of the IF statement) to change its sign and then use
the fact that the total probability is one.

The resulting values in the template are given below. Note that the p-value for the test of $H_0: \mu = 16.5$
against the alternative $H_1: \mu > 16.5$ is 0.0065, which is smaller than α = 0.05. Based on the
p-value rule (reject the null hypothesis when p ≤ α), we reject the hypothesis that µ = 16.5 and
accept the alternative that µ > 16.5. Recall that µ is the population mean hip size for adults, and
this result means we can conclude that the average hip size is now greater than 16.5 inches, at the
5% level of significance.

Row  A                                  B
1    Hypothesis tests about the mean of a population
3    Data Input
4    Sample size =                      50
5    Estimated mean (Y-bar) =           17.15819992
6    Standard Error-SE(Y-bar) =         0.255550251
7    Null hypothesis H0: mu = c         16.5
8    Level of significance alpha =      0.05
10   Computed Values
11   df =                               49
12   t-statistic value =                2.575618366
14   Right-tail test
15   Right critical value tc =          1.676550893
16   p-value =                          0.006536948
17   Decision =                         Reject Ho
19   Left-tail test
20   Left critical value -tc =          -1.676550893
21   p-value =                          0.993463052
22   Decision =                         Do not reject Ho
24   Two-tail test
25   Absolute critical value tc =       2.009575199
26   p-value =                          0.013073897
27   Decision =                         Reject Ho
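The complete template can be cross-checked with a Python sketch. The helpers `t_right_tail` and `tinv` are hypothetical numerical stand-ins for TDIST and TINV (Simpson's rule plus bisection, not Excel's algorithm), and `mean_test_template` is a hypothetical name; the comments point to the template cells whose logic each line mirrors.

```python
import math

def t_right_tail(x, df, steps=2000):
    """P(t_(df) > x) for x >= 0, by Simpson's rule on the t density over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def tinv(prob, df):
    """Two-tailed inverse: c with P(|t_(df)| > c) = prob, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * t_right_tail(mid, df) > prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_test_template(n, ybar, se, c, alpha):
    """Right-, left- and two-tail t tests of H0: mu = c, as in the test mean template."""
    df = n - 1                              # B11
    t = (ybar - c) / se                     # B12
    tail = t_right_tail(abs(t), df)         # TDIST(ABS(B12),B11,1)
    p_right = tail if t > 0 else 1 - tail   # B16
    p_left = tail if t < 0 else 1 - tail    # B21
    p_two = 2 * tail                        # B26: TDIST(ABS(B12),B11,2)
    tc_one = tinv(2 * alpha, df)            # B15: TINV(2*B8,B11)
    tc_two = tinv(alpha, df)                # B25: TINV(B8,B11)

    def decide(reject):
        return "Reject Ho" if reject else "Do not reject Ho"

    return {
        "df": df,
        "t": t,
        "right": (tc_one, p_right, decide(t >= tc_one)),
        "left": (-tc_one, p_left, decide(t <= -tc_one)),
        "two": (tc_two, p_two, decide(t <= -tc_two or t >= tc_two)),
    }

result = mean_test_template(50, 17.15819992, 0.255550251, 16.5, 0.05)
print(result["df"], round(result["t"], 6))
for name in ("right", "left", "two"):
    critical, p_value, decision = result[name]
    print(name, round(critical, 6), round(p_value, 6), decision)
```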



C.6 OTHER USEFUL TESTS

C.6.1 Simulating Data

In this section we will carry out various tests. To illustrate, we will use some randomly generated
data from normal distributions. Label a new worksheet three samples. In A1:C1 enter the labels
Y1, Y2, Y3. In these columns we will create 3 samples of size N = 20 from the following
distributions: $Y_1 \sim N(0,1)$, $Y_2 \sim N(1.5,1)$, $Y_3 \sim N(1.5,4)$. Select the Data tab and the Data Analysis
button in the Analysis group of commands found at the far right of the Excel ribbon. Choose
Random Number Generation from the menu. First create the N(0,1) data values as shown
below.

[Figure: Random Number Generation dialog box for the N(0,1) sample]

Then create the N(1.5,1) values, starting in B2 and using Random Seed 123. Because we will be
using tests comparing one population to another, we use a different Random Seed for each sample
so that the samples can be treated as independent of each other.

[Figure: Random Number Generation dialog box for the N(1.5,1) sample, Normal distribution, Random Seed 123]

Finally, create the N(1.5,4) values using Random Seed 1234.

[Figure: Random Number Generation dialog box for the N(1.5,4) sample, Random Seed 1234]

The first few values should look like those shown below.



A B C
1 Y1 Y2 Y3
2 -0.3407 -0.7136 -0.80898
3 0.214334 1.705654 -3.46751
4 0.796829 2.036561 0.938478
5 0.454379 1.246432 0.257885
6 -0.92354 3.746707 2.569766
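The same simulation can be sketched outside Excel. The Python code below is a minimal illustration, not a reproduction of the worksheet: Python's random number generator differs from Excel's, so the values will not match those above. Each sample gets its own generator and seed; the Y1 seed (111) is a hypothetical value, while 123 and 1234 follow the seeds used above for Y2 and Y3. Because Y3 ~ N(1.5,4) has variance 4, its standard deviation is 2.

```python
import random

def simulate_samples(n=20, seeds=(111, 123, 1234)):
    """Draw Y1 ~ N(0,1), Y2 ~ N(1.5,1), Y3 ~ N(1.5,4), one seeded generator per sample."""
    # (name, mean, standard deviation); Y3 has variance 4, so its standard deviation is 2
    specs = [("Y1", 0.0, 1.0), ("Y2", 1.5, 1.0), ("Y3", 1.5, 2.0)]
    samples = {}
    for (name, mu, sd), seed in zip(specs, seeds):
        rng = random.Random(seed)      # a separate generator for each sample
        samples[name] = [rng.gauss(mu, sd) for _ in range(n)]
    return samples

samples = simulate_samples()
for name, values in samples.items():
    print(name, [round(v, 4) for v in values[:5]])
```

Re-running with the same seeds reproduces the same numbers, which is what makes a seeded simulation repeatable.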

C.6.2 Testing a Population Variance

Suppose the random variable Y ~ N(µ, σ²). If we have a random sample Y1, Y2, ..., YN, then an
unbiased estimator of the population variance is:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}$$

To test the null hypothesis H0: σ² = σ0² we use the test statistic:

$$V = \frac{(N-1)\hat{\sigma}^2}{\sigma_0^2}$$

If the null hypothesis is true, the random variable V ~ χ²(N-1), where χ²(N-1) denotes a chi-square
distribution with N-1 degrees of freedom. We can use a one-tail alternative, such as H1: σ² > σ0².
In this case, for a test at the 5% level, the critical value is the 95th percentile of the χ²(N-1)
distribution. Or we can use a two-tail alternative such as H1: σ² ≠ σ0², in which case the critical
values are the 97.5 percentile on the right and the 2.5 percentile on the left. The chi-square
distribution is not symmetric, so these critical values must be computed separately.

To carry out this test, let us first compute the descriptive statistics for sample Y1 and store them
in the worksheet Y1 summary stats. Select the Data Analysis button in the Analysis group of
commands again. In the Data Analysis dialog box, select the Descriptive Statistics analysis
tool.

[Figure: Descriptive Statistics dialog box: Input Range A1:A21, Grouped By Columns, Labels in First Row checked, New Worksheet Ply: Y1 summary stats, Summary statistics checked]

The results are shown below.

A B
1 Y1
2
3 Mean 0.077479399
4 Standard Error 0.156646934
5 Median 0.216502335
6 Mode #N/A
7 Standard Deviation 0.700546384
8 Sample Variance 0.490765237
9 Kurtosis -1.001298518
10 Skewness -0.291266209
11 Range 2.384351774
12 Minimum -1.172402335
13 Maximum 1.211949439
14 Sum 1.549587978
15 Count 20

The Sample Variance is 0.490765. This is the statistic we have called σ̂². Insert a new
worksheet and rename it test variance. In it, build the following template. Copy and paste the
value of the sample variance into the template.

A B
1 Hypothesis tests about the variance of a population
2
3 Data Input
4 Sample size = 20
5 Sample variance = 0.490765237
6 Null hypothesis H0: sigma^2 = c 1.5
7 Level of significance alpha = 0.05
8
9 Computed Values
10 df = =B4-1
11 chi-square statistic value = =B10*B5/B6
12
13 Right-tail test
14 Right critical value = =CHIINV(B7,B10)
15 Decision = =IF(B11>=B14,"Reject Ho","Do not reject Ho")
16
17 Left-tail test
18 Left critical value = =CHIINV(1-B7,B10)
19 Decision = =IF(B11<=B18,"Reject Ho","Do not reject Ho")
20
21 Two-tail test
22 Left critical value = =CHIINV(1-B7/2,B10)
23 Right critical value = =CHIINV(B7/2,B10)
24 Decision = =IF(OR(B11<=B22,B11>=B23),"Reject Ho","Do not reject Ho")

The function CHIINV is used to find the test critical values. Its arguments are
CHIINV(right-tail probability, degrees of freedom). Thus, for the right-tail test, the right-tail
probability is the significance level in B7, 0.05. The resulting values are:

A B
1 Hypothesis tests about the variance of a population
2
3 Data Input
4 Sample size = 20
5 Sample variance = 0.490765237
6 Null hypothesis H0: sigma^2 = c 1.5
7 Level of significance alpha = 0.05
8
9 Computed Values
10 df = 19
11 chi-square statistic value = 6.216359664
12
13 Right-tail test
14 Right critical value = 30.14352721
15 Decision = Do not reject Ho
16
17 Left-tail test
18 Left critical value = 10.11701315
19 Decision = Reject Ho
20
21 Two-tail test
22 Left critical value = 8.906516548
23 Right critical value = 32.85232686
24 Decision = Reject Ho

Thus, when testing H0: σ² = 1.5 against the alternatives, at the 5% level of significance:

• H1: σ² > 1.5: we do not have enough evidence to reject the null hypothesis,
• H1: σ² < 1.5: we have enough evidence to reject the null hypothesis and conclude
that the population variance is less than 1.5,
• H1: σ² ≠ 1.5: we have enough evidence to reject the null in favor of the alternative
and conclude that the population variance is not 1.5.
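To see where the template's numbers come from, the Python sketch below reproduces the chi-square statistic, the CHIINV critical values and the three decisions for the Y1 sample (N = 20, sample variance 0.490765237, H0: σ² = 1.5, α = 0.05) using only the standard library. The helpers `chi2_cdf` and `chiinv` are written for this illustration (the name `chiinv` simply mirrors Excel's right-tail argument convention); the CDF uses the standard series for the regularized lower incomplete gamma function, and the inverse is found by bisection.

```python
import math

def chi2_cdf(x, k):
    """P(chi-square(k) <= x), from the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, 2000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chiinv(right_tail_prob, k, hi=200.0):
    """Value with the given right-tail probability (CHIINV convention); hi must exceed the answer."""
    lo, target = 0.0, 1.0 - right_tail_prob
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Template inputs: sample size, sample variance of Y1, hypothesized variance, alpha
N, s2, sigma0_sq, alpha = 20, 0.490765237, 1.5, 0.05
df = N - 1
V = df * s2 / sigma0_sq
right_crit = chiinv(alpha, df)          # H1: sigma^2 > 1.5
left_crit = chiinv(1 - alpha, df)       # H1: sigma^2 < 1.5
two_lo = chiinv(1 - alpha / 2, df)      # H1: sigma^2 != 1.5, left critical value
two_hi = chiinv(alpha / 2, df)          # H1: sigma^2 != 1.5, right critical value
reject_right = V >= right_crit
reject_left = V <= left_crit
reject_two = V <= two_lo or V >= two_hi
print(f"V = {V:.6f}; right {right_crit:.4f}; left {left_crit:.4f}; two-tail {two_lo:.4f}, {two_hi:.4f}")
print(reject_right, reject_left, reject_two)
```

Bisection is slow but simple and safe here because the CDF is monotone; the default upper bound of 200 is ample for small degrees of freedom such as 19.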

C.6.3 Testing Two Population Means

If we have two populations Y1 ~ N(µ1, σ1²) and Y2 ~ N(µ2, σ2²), we may wish to test the null
hypothesis that the two populations have the same mean. This test is carried out differently if the
two population variances are equal (Case 1) or unequal (Case 2). Recall that the three samples
worksheet contains N = 20 observations from Y1 ~ N(0,1), Y2 ~ N(1.5,1), and Y3 ~ N(1.5,4). Let
us first test that the means of populations Y1 and Y2 are equal. The test statistic formula is on p.
717 of Principles of Econometrics, 4e.

Go back to your three samples worksheet. Select the Data tab, and then the Data Analysis
button in the Analysis group of commands. In the Data Analysis dialog box, select t-Test:
Two-Sample Assuming Equal Variances.

[Figure: Data Analysis dialog box listing the analysis tools, with t-Test: Two-Sample Assuming Equal Variances selected]

In the t-Test dialog box enter the data ranges, select Labels, use the 5% level of significance and
output the results to a new worksheet. Since we are testing the null hypothesis that two
populations have the same mean, we specify the Hypothesized Mean Difference to be 0.

[Figure: t-Test: Two-Sample Assuming Equal Variances dialog box: Variable 1 Range A1:A21, Variable 2 Range B1:B21, Hypothesized Mean Difference 0, Labels checked, Alpha 0.05, New Worksheet Ply: mu1 = mu2-equal var]

The result shows the calculated t-statistic as well as the one- and two-tail critical values and
p-values. Note that the one-tail p-value is calculated from the left tail because the test statistic
value is negative. If the test statistic value had been positive, Excel would have computed the
one-tail p-value from the right tail of the t-distribution.

A B C
1 t-Test: Two-Sample Assuming Equal Variances
2
3 Y1 Y2
4 Mean 0.077479399 1.879124117
5 Variance 0.490765237 2.042565342
6 Observations 20 20
7 Pooled Variance 1.266665289
8 Hypothesized Mean Difference 0
9 df 38
10 t Stat -5.062187394
11 P(T<=t) one-tail 5.46538E-06
12 t Critical one-tail 1.685954461
13 P(T<=t) two-tail 1.09308E-05
14 t Critical two-tail 2.024394147
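The pooled-variance t-statistic can be reproduced from the summary statistics Excel reports for Y1 and Y2 (means 0.077479399 and 1.879124117, variances 0.490765237 and 2.042565342, 20 observations each). In this Python sketch the pooled variance weights each sample variance by its degrees of freedom. Because the standard library has no t-distribution, the one-tail p-value is approximated by Simpson-rule integration of the t density in the helper `t_upper_tail`, a name introduced here for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, n=2000):
    """P(T > t): 0.5 minus the Simpson-rule area under the density between 0 and |t| (n even)."""
    b = abs(t)
    h = b / n
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1.0 - tail

# Summary statistics reported by Excel for Y1 and Y2
mean1, var1, n1 = 0.077479399, 0.490765237, 20
mean2, var2, n2 = 1.879124117, 2.042565342, 20
df = n1 + n2 - 2
pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df              # df-weighted average variance
t_stat = (mean1 - mean2 - 0.0) / math.sqrt(pooled * (1 / n1 + 1 / n2))  # hypothesized difference 0
p_one = t_upper_tail(abs(t_stat), df)   # tail on the side where t falls (left here, since t < 0)
p_two = 2.0 * p_one
print(f"pooled = {pooled:.9f}, t = {t_stat:.6f}, one-tail p = {p_one:.5e}, two-tail p = {p_two:.5e}")
```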

Go back to the three samples worksheet. Repeat the test using Y1 and Y3 and use the "unequal"
variance test option. The test statistic and adjusted degrees of freedom for this test are given on
pp. 717 and 718 of Principles of Econometrics, 4e.

[Figure: Data Analysis dialog box with t-Test: Two-Sample Assuming Unequal Variances selected, followed by its dialog box: Variable 2 Range C1:C21, Hypothesized Mean Difference 0, Labels checked, Alpha 0.05, output sheet mu1 = mu3-unequal var]

A B C
1 t-Test: Two-Sample Assuming Unequal Variances
2
3 Y1 Y3
4 Mean 0.077479399 0.614201614
5 Variance 0.490765237 2.769870711
6 Observations 20 20
7 Hypothesized Mean Difference 0
8 df 26
9 t Stat -1.329270642
10 P(T<=t) one-tail 0.097652808
11 t Critical one-tail 1.705617901
12 P(T<=t) two-tail 0.195305616
13 t Critical two-tail 2.055529418

Recall that Y1 ~ N(0,1) and Y3 ~ N(1.5,4). In this case we fail to reject the null hypothesis that
the means are equal, even though the population means differ (0 versus 1.5). We commit a Type II error.
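The unequal-variance statistic and its adjusted degrees of freedom can be checked with a few lines of Python. This sketch assumes the adjusted degrees of freedom follow the Satterthwaite approximation; the value it gives, about 25.5, is consistent with the 26 shown in the Excel output, which suggests Excel rounds it to the nearest integer. The inputs are the Y1 and Y3 means and variances from that output.

```python
import math

# Summary statistics for Y1 and Y3 from the unequal-variance output
mean1, var1, n1 = 0.077479399, 0.490765237, 20
mean3, var3, n3 = 0.614201614, 2.769870711, 20

a, b = var1 / n1, var3 / n3                        # estimated variances of each sample mean
t_stat = (mean1 - mean3 - 0.0) / math.sqrt(a + b)  # hypothesized mean difference 0
df_satt = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n3 - 1))  # Satterthwaite degrees of freedom
print(f"t = {t_stat:.6f}, df = {df_satt:.3f}, rounded df = {round(df_satt)}")
```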

C.6.4 Testing Two Population Variances

Given two normal populations, we can test whether their variances are equal. Recall that the
three samples we drew were from Y1 ~ N(0,1), Y2 ~ N(1.5,1) and Y3 ~ N(1.5,4). Let us first test the
hypothesis that the variance of Y2 equals the variance of Y1. Go back to your three samples
worksheet. Select the Data tab, and then the Data Analysis button in the Analysis group of
commands. In the Data Analysis dialog box, select F-Test Two-Sample for Variances. This
tool will carry out the F-test for equal variances shown on p. 718 of Principles of Econometrics,
4e.

In the dialog box, enter the range for Y1 first, and then enter the range for Y2. Which sample is
labeled Variable 1 and which is Variable 2 does not affect the outcome (p-value) of the test.

[Figure: F-Test Two-Sample for Variances dialog box: Variable 1 Range A1:A21, Variable 2 Range B1:B21, Labels checked, Alpha 0.05, New Worksheet Ply: test var(y1)=var(y2)]

The test result shows the sample variances for Y1 and Y2, the value of the calculated F-statistic
(0.2403) and the left-tail (since F < 1) critical value for a 5% test. The p-value is also reported,
and based on this test we reject the equality of the population variances, even though we know
they are equal (both populations have variance 1). We commit a Type I error.

A B C
1 F-Test Two-Sample for Variances
2
3 Y1 Y2
4 Mean 0.077479399 1.879124117
5 Variance 0.490765237 2.042565342
6 Observations 20 20
7 df 19 19
8 F 0.240269051
9 P(F<=f) one-tail 0.001578898
10 F Critical one-tail 0.461201089

Testing the variances of Y2 and Y3, we are unable to reject the hypothesis that the variances
are equal, even though the null hypothesis is false (the population variances are 1 and 4). We commit a Type II error.

[Figure: F-Test Two-Sample for Variances dialog box: Variable 2 Range C1:C21, Labels checked, Alpha 0.05, New Worksheet Ply: test var(y2)=var(y3)]

A B C
1 F-Test Two-Sample for Variances
2
3 Y2 Y3
4 Mean 1.879124117 0.614201614
5 Variance 2.042565342 2.769870711
6 Observations 20 20
7 df 19 19
8 F 0.737422629
9 P(F<=f) one-tail 0.256578131
10 F Critical one-tail 0.461201089
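Both F-tests can be reproduced from the reported sample variances. The Python sketch below is an illustration, not Excel's algorithm: it computes F as the ratio of the two sample variances and approximates the one-tail p-value by Simpson-rule integration of the F density. Following the text, it uses the left tail when F < 1 and the right tail otherwise. The helpers `f_pdf`, `f_cdf` and `f_test` are names introduced here. The last call swaps Variable 1 and Variable 2 to show that the p-value does not change.

```python
import math

def f_pdf(x, d1, d2):
    """F(d1, d2) density; returning 0 at x = 0 is correct when d1 > 2."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_num = 0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2) - (d1 + d2) * math.log(d1 * x + d2))
    return math.exp(log_num - math.log(x) - log_beta)

def f_cdf(x, d1, d2, n=2000):
    """P(F <= x) by Simpson's rule on [0, x] (n must be even)."""
    h = x / n
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def f_test(var1, var2, n1, n2):
    """F = var1/var2 and the one-tail p-value in the tail where F falls (left if F < 1)."""
    f_stat = var1 / var2
    lower = f_cdf(f_stat, n1 - 1, n2 - 1)
    return f_stat, (lower if f_stat < 1 else 1.0 - lower)

f12, p12 = f_test(0.490765237, 2.042565342, 20, 20)   # Y1 as Variable 1, Y2 as Variable 2
f23, p23 = f_test(2.042565342, 2.769870711, 20, 20)   # Y2 as Variable 1, Y3 as Variable 2
f21, p21 = f_test(2.042565342, 0.490765237, 20, 20)   # Y1 and Y2 entered in the other order
print(f"F = {f12:.6f}, p = {p12:.6f}; F = {f23:.6f}, p = {p23:.4f}; swapped p = {p21:.6f}")
```

The swapped call works because 1/F follows an F distribution with the degrees of freedom reversed, so the left-tail area below F equals the right-tail area above 1/F.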



C.7 TESTING POPULATION NORMALITY

Hypothesis tests and interval estimation procedures are based on the underlying normality of the
population. If the population is not normal, the same procedures are used based on an appeal to
the Central Limit Theorem, provided the sample is adequately large. If the population is
normally distributed, then no such worries exist. While there are many tests for normality, we will
suggest two. First, construct a histogram and look for a bell shape. Second, use the test proposed
by Jarque and Bera.

C.7.1 A Histogram

Return to the worksheet hip data. For a histogram we must specify the "bins" into which the data
will be placed. The worksheet hip data summary statistics contains the descriptive statistics (see
Section C.1 of this manual). The minimum hip width is 13.53 inches, and the maximum is
20.4 inches. Specify the first bin to be "up to" 14 inches; the last bin value is 20, and Excel places any
values above 20 in an extra "More" bin. In cell C1 enter the label bin. In C2:C8 enter the values 14, 15, ..., 20.
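Excel's Histogram tool treats each bin value as an inclusive upper limit: a value is counted in a bin if it is greater than the previous bin value and less than or equal to the current one, and anything above the last bin value goes into "More". The Python sketch below mimics that rule with the standard `bisect` module. The data are hypothetical values chosen to hit the bin edges, not the hip data.

```python
from bisect import bisect_left

def excel_style_histogram(data, bins):
    """Count values per bin, treating each bin value as an inclusive upper limit.

    bins must be sorted ascending; counts[i] holds values with bins[i-1] < y <= bins[i],
    and the extra final slot is the "More" bin for values above bins[-1].
    """
    counts = [0] * (len(bins) + 1)
    for y in data:
        counts[bisect_left(bins, y)] += 1   # index of the first bin value that is >= y
    return counts

bins = [14, 15, 16, 17, 18, 19, 20]
data = [13.53, 14.0, 14.2, 16.9, 20.0, 20.4]   # hypothetical values, not the hip data
print(list(zip(bins + ["More"], excel_style_histogram(data, bins))))
```

Note that a value of exactly 14.0 lands in the "14" bin, and 20.0 lands in the "20" bin rather than in "More".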

Select the Data tab, and then the Data Analysis button in the Analysis group of commands.
Select Histogram from the list. Fill out the dialog box as shown below. Note that we
have selected Labels. The output goes to a new worksheet named hip data histogram, and, most
importantly, we check Chart Output.

[Figure: Histogram dialog box with Input Range, Bin Range, Labels checked, New Worksheet Ply: hip data histogram, and Chart Output checked]

The resulting histogram is:



[Figure: worksheet hip data histogram, showing the bin and Frequency columns next to the chart titled Histogram (Frequency by bin)]

To beautify the histogram remove the spaces between the bars. Click inside the histogram until
the bars have little circles surrounding them. Right-click and select Format Data Series.

[Figure: right-click menu on the histogram bars, with Format Data Series selected]

Slide the Gap Width button to No Gap, then select Close.

[Figure: Format Data Series dialog box, Series Options, with the Series Overlap and Gap Width sliders]

Select the corner of the figure box and drag it to the size you desire.

[Figure: the resized histogram, titled Histogram, with Frequency on the vertical axis and the bins 14 to 20 and More on the horizontal axis]

With only 50 data points, using too many bins can result in a figure with no recognizable shape. You should
experiment with fewer bins of different widths to see if you can improve the figure. Using one-inch
bins is a natural choice for these data.

C.7.2 The Jarque-Bera Test

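The Jarque-Bera test for normality examines the skewness and kurtosis of the data (these terms
are defined in Section C.1 of this manual). For a normal distribution the skewness is zero and the
kurtosis is three, so the "excess" kurtosis, K - 3, should be zero. The Jarque-Bera test statistic is:

$$JB = \frac{N}{6}\left(S^2 + \frac{(K-3)^2}{4}\right) \sim \chi^2_{(2)}$$

Using the formulas given in Principles of Econometrics, 4e, p. 702, the skewness and kurtosis
coefficients are:

$$\text{skewness} = S = \frac{\tilde{\mu}_3}{\tilde{\sigma}^3} \qquad \text{and} \qquad \text{kurtosis} = K = \frac{\tilde{\mu}_4}{\tilde{\sigma}^4}$$

where:

$$\tilde{\sigma} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N}} \qquad \text{and} \qquad \tilde{\mu}_r = \frac{\sum_{i=1}^{N} (y_i - \bar{y})^r}{N}$$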
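The skewness and kurtosis definitions above translate directly into code. The Python sketch below is a minimal illustration that uses the divisor N throughout, as in the formulas. Excel's SKEW and KURT worksheet functions apply small-sample adjustments instead (and KURT reports excess kurtosis), so they will not return exactly these S and K values. The test data are hypothetical.

```python
def skewness_kurtosis(y):
    """Return (S, K): third and fourth central moments over sigma-tilde^3 and ^4, all with divisor N."""
    n = len(y)
    ybar = sum(y) / n
    m2 = sum((v - ybar) ** 2 for v in y) / n   # sigma-tilde squared
    m3 = sum((v - ybar) ** 3 for v in y) / n   # mu-tilde_3
    m4 = sum((v - ybar) ** 4 for v in y) / n   # mu-tilde_4
    sigma = m2 ** 0.5
    return m3 / sigma ** 3, m4 / sigma ** 4

S, K = skewness_kurtosis([1, 2, 3, 4, 5])   # symmetric data, so S = 0
print(S, K)
```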

Use the results from the statistics calculations worksheet to make the following calculations. The
critical value for this chi-square test is obtained using the CHIINV function (see Section
C.6.2 of this manual) and the test p-value is obtained using CHIDIST. Enter the formulas
shown below in a new worksheet called Jarque Bera Template.

A B
1 Data Input
2 sample size: N 50
3 skewness: S -0.013824895662746
4 kurtosis: K 2.331534288
5 level of significance: alpha 0.05
6 Calculated Values
7 JB test value = =(B2/6)*(B3^2+((B4-3)^2)/4)
8 chi-square(2) critical value = =CHIINV(B5,2)
9 JB test p-value = =CHIDIST(B7,2)

The resulting values are shown below. The p-value (0.627) shows that we cannot reject the hypothesis
that the hip data come from a normal distribution.

A B
1 Data Input
2 sample size: N 50
3 skewness: S -0.013824896
4 kurtosis: K 2.331534288
5 level of significance: alpha 0.05
6 Calculated Values
7 JB test value = 0.932522747
8 chi-square(2) critical value = 5.991464547
9 JB test p-value = 0.627343294
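The template's three calculated values can be verified in a few lines of Python. This sketch relies on one fact that makes the chi-square(2) case easy: with 2 degrees of freedom the chi-square survival function is exp(-x/2). The p-value that CHIDIST(JB, 2) returns is therefore exp(-JB/2), and the critical value CHIINV(alpha, 2) is -2 ln(alpha). The inputs are the template's N, S, K and alpha.

```python
import math

N, S, K, alpha = 50, -0.013824895662746, 2.331534288, 0.05

jb = (N / 6) * (S ** 2 + (K - 3) ** 2 / 4)   # Jarque-Bera statistic
p_value = math.exp(-jb / 2)                  # P(chi-square(2) > jb), as CHIDIST(jb, 2)
critical = -2 * math.log(alpha)              # 95th percentile of chi-square(2), as CHIINV(alpha, 2)
reject = jb >= critical
print(f"JB = {jb:.6f}, critical value = {critical:.6f}, p-value = {p_value:.6f}, reject: {reject}")
```

For other degrees of freedom there is no such closed form, which is why the template relies on CHIINV and CHIDIST.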
INDEX

AR(1),228,241,245,296,299,323 Estimator,Fixed Effects,361,365


Arithmetic Operators,3,402 Estimator,Generalized Least Squares (GLS),221,224,
Autocorrelation,228,301,307,313,373 245,373
Autoregressive Distributed Lags Model,228,252 Estimator,Instrumental Variables (IV),264,281,290
Autoregressive Model,228,241,245,254,296,299,323 Estimator, Least Squares,27
Autoregressive Model,Vector (VAR),317 Estimator,Population Variance,456
Auxiliary Regression, 243,345 Estimator,Prais-Winsten,245
Estimator,Prediction Interval,96
Binomial Probabilities,416 Excel,Bin 57,63,115,330,337,342,393,442,463
Breusch-Pagan Test,206 Excel Command,Button/Menu Item: Select,1
Excel Command,Cell(s): Copy,4,31
Censored Data,393,397 Excel Command, Cell(s): Editing (Font & Alignment),6,
Central Limit Theorem,439 29
Cochrane-Orcutt Estimator,248 Excel Command, Cell(s): Format,72
Coefficient of Determination,98,135,140,153 Excel Command,Cell(s): Insert Symbol,69-70
Cointegration, 306,312,318 Excel Command,Cell(s): Paste,31
Collinearity, 176 Excel Command, Cell(s): Reference,4,30,79
Confidence Interval of βk,71,145 Excel Command, Cell(s): Select,5
Confidence Interval of µ,444 Excel Command,Column/Row: Insert,6
Contemporaneous Correlation,388 Excel Command,Column/Row: Resize Width,8,33
Correlation Coefficient, 98,99,135,140,168,176,233, Excel Command,Column/Row: Select,6
240,245,388,371 Excel Command,File: Close,9
Correlation Matrix, 168, 176 Excel Command, File: Open, 1, 12, 16
Correlation,Contemporaneous,388 Excel Command,File: Save,8,15,18
Correlogram,for Residuals, 233,240 Excel Command,Graph (or Chart): Create (or Insert),20
Covariance of the Error Terms,388 Excel Command,Graph (or Chart): Data Source,21-22,
Covariance of Least Squares Estimators, 52 35-36,38
Critical Value,Chi-Square,118,206,426 Excel Command,Graph (or Chart): Editing,23,40
Critical Value,Dickey-Fuller Test,302,305,309 Excel Command, Graph (or Chart): Moving, 26
Critical Value,F,154,210,429 Excel Command,Graph (or Chart): Tools Tab,21
Critical Value,N(0,1),444 Excel Command,Help,2-3,406-407
Critical Value,t, 69,449 Excel Command,Keyboard Shortcuts,441
Excel Command,Print,9
Data,Censored,393,399 Excel Command,Ribbon,2
Data,Contemporaneously Correlated,388 Excel Command,Show Formula,421
Data,Latent,397 Excel Command,Tab List,2
Data,Uncensored,397 Excel Command,Worksheet: Create (or Insert),69
Data,Volatile (Time-Varying Volatility),328 Excel Command,Worksheet: Rename,8
Dickey-Fuller Test,301,306,312,318 Excel Command,Worksheet: Unhide,115
Distribution,Binary Random Variable,192,391 Excel Data Analysis Tool, Correlation,168
Distribution,Chi-Square,118,206,426 Excel Data Analysis Tool,Descriptive Statistics,432
Distribution,F,154,210,429 Excel Data Analysis Tool,F-test Two-Sample for
Distribution,Normal,422 Variances,461
Distribution,Standardized Normal,68,233,443,444 Excel Data Analysis Tool,Histogram,57-60
Distribution, t, 58,428,449 Excel Data Analysis Tool,Random Number Generation,47
Distribution, Uniform,395,440-443 Excel Data Analysis Tool,Regression,31
Dummy (Indicator) Variable,63,180,391 Excel Data Analysis Tool,t-Test: Two-Sample Assuming
Equal Variances,459
Error Variance,Constant,52-53, Excel Data Analysis Tool, t-Test: Two-Sample Assuming
Error Variance,Function,206,221,224 Unequal Variances, 460
Estimation,Instrumental Variables (IV),264 Excel File,andy,143,156
Estimation,Two Stage Least Squares (2SLS),264,281,290 Excel File,beer,166
Estimator,Cochrane-Orcutt,248 Excel File,br,53
Estimator,Confidence Interval of βk,71,145 Excel File,byd,341
Estimator, Confidence Interval of µ,444 Excel File,cars,176


Excel File,coke,193 Excel Function,TDIST,91,428


Excel File,cps4_small,130,151,182 Excel Function,TINV,69,428
Excel File,cps2,213 Excel Function,VAR,332
Excel File,edu_inc,168 Excel Keyboard Shortcuts,440,441,443
Excel File,food,19,68,96,204 Excel Scroll Bar,2
Excel File,fred,317 Excel Tab List,2
Excel File,fultonfish,286 Excel Table Tool,Convert to Range,378
Excel File,gdp,310 Excel Table Tool,Remove Duplicate,377-378
Excel File,grunfeld2,381 Exponents,4,408
Excel File,hip,431
Excel File,mroz,262,393 Finite Distributed Lag Model,228
Excel File,newbroiler,139 First Difference, 301,306
Excel File,njmin3,198 Fixed Effects Model,357
Excel File,nls_panel,355 Forecast Interval,255
Excel File,okun,228 Forecasting,254
Excel File,pizza4,149 Forecasting Volatility,347
Excel File,phillips_aus,237 F-test,154,210
Excel File,returns,328
Excel File,spurious,299 GARCH Model,349
Excel File,star,194 GARCH-ln-Mean Model,352
Excel File,transport,391 Generalized Least Squares Estimator,221,224,245,373
Excel File,truffles,279 Generalized R-Squared,98,135,140,
Excel File,usa,294 Goldfeld-Quandt Test,210
Excel File,utown,63,180 Goodness-of-Fit Measure,98,135,140,153
Excel File,wa-wheat,122
Excel Function,ABS,91,407 Hausman Test,273
Excel Function,AND, 445 Heteroskedastic Partition, 210
Excel Function,AVERAGE,30 Heteroskedasticity,204
Excel Function,BINOMDIST,419 Histogram,57,63,115,330,337,342,393,442,463
Excel Function,CHIDIST,120 Histogram,of Residuals, 115
Excel Function,CHIINV,120 Hypotheses Tests for a Single Coefficient,in the Multiple
Excel Function,CORREL,99 Linear Regression Model,145
Excel Function,COUNT,220,432-436 Hypotheses Tests for a Single Coefficient,in the Simple
Excel Function,COUNTIF,78,80 Linear Regression Model,81
Excel Function,EXP,62,407
Excel Function,FACT,417-419 Impulse Response Function,323
Excel Function,FDIST,158,429 Indicator (Dummy) Variable,63,180,391
Excel Function,FINV,158,430 Instrumental Variables Estimator (IV),264,281,290
Excel Function,IF,78,79 Integrated of Order 1 (I(1)),301,306
Excel Function,INDEX,49-50,78 Interaction Variable,149,180,182,198
Excel Function,LINEST,49,77 Interval,Estimator of βk in the Multiple Linear Regression
Excel Function,LN,58,407 Model,145
Excel Function,MAX,432-436 Interval,Estimator of βk in the Simple Linear Regression
Excel Function,MEDIAN,432-436 Model,71
Excel Function, MIN, 432-436 Interval, Estimator of µ When σ² is Known, 444
Excel Function,MODE,432-436 Interval,Estimator of µ When σ² is Unknown,446
Excel Function,NORMDIST,332,424 Interval,Forecast,255
Excel Function,NORMINV,424 Interval,Prediction,96
Excel Function,NORMSDIST,423 Irrelevant Variables,169
Excel Function,NORMSINV,423
Excel Function,OR,78-79 Jarque-Bera Test for Normality,118,465
Excel Function,PI,407
Excel Function,POWER,3,407,408 Kurtosis,118,433,465
Excel Function,ROUND,407
Excel Function,SQRT,97,407 Lag Weight,260
Excel Function,STANDARDIZE,422 Lagrange Multiplier Test,206,241,274,344,371,388
Excel Function,SUM,5-6,402-408,432-436 Latent Data,397
Excel Function,SUMPRODUCT,219,402 Latent Variable Model,395-399
Excel Function,SUMSQ,402 Least Squares Estimator,27
Excel Function,SUMX2MY2,402 Least Squares Residuals,35
Excel Function,SUMX2PY2,402 Limited Dependent Variable Model,393
Excel Function,SUMXMY2,402 Logarithms,57,104,129,139,151,191,407

Logit Model, 393 Model, Topic: Time Series, Australian, 237-254,310-317


Model, Topic: Time Series, US, 228-237,254-261,294-
Mathematical Function, 4,402 295,301-309,310-323,328-334
Mean Square Residual, 53 Model, Topic: Transport, 391-393
Mean, of Population, 438 Model, Topic: Truffles Market, 278
Model, Cubic, 126 Model, Topic: Wage Equation, 130,151,182-192,212,
Model, Fixed Effects, 357 222,262-277,355-381
Model, Linear: Multiple, 143,154 Model, Topic: Wheat Yield Model, 122-130
Model, Linear: Simple, 19,67,95 Model, Unrestricted, 155
Model, Logit, 393 Model, Vector Autoregressive (VAR), 317
Model, Linear-Log, 104 Model, Vector Error Correction (VEC), 310
Model, Log-Linear, 57,129,151,191 Moment-Based Estimation, 262
Model, Log-Log, 139
Model, Panel Data: Fixed Effects, 357 Nonsample Information, 166
Model, Panel Data: Random Effects, 371 Nonstationary Time-Series Data, 294
Model, Panel Data: Seemingly Unrelated Regressions Nonstationary Variables, 294
(SUR), 388
Model, Panel Data: Set of Regression Equations, 381 Omitted Variables, 167
Model, Polynomials, 122,148
Model, Probit, 393 Panel Data Model, 355
Model, Quadratic, 148 Percentages, 413
Model, Random Effects, 371 Percentile Value, 69
Model, Random Walk, 298 Plot, Regression Line, 34,55,62,105
Model, Restricted, 155 Plot, Residuals, 111,126,128,206
Model, Seemingly Unrelated Regression (SUR), 388 Population Mean, 438
Model, Simultaneous Equations, 278 Population Variance, 438
Model, Time Series Data: Impulse Response Function, 323 Prais-Winsten Estimator, 245
Model, Time Series Data: ARCH(1), 328,341 Prediction, Dynamic Model (Forecasting), 255
Model, Time Series Data: Autoregressive, First Order Prediction, Simple Linear Model, 35,96
(AR(1)), 228,241,245,296, 299,323 Prediction, Simple Log-Linear Model, 132
Model, Time Series Data: Autoregressive (AR(p)), 228, Probability Function, Binomial, 416
241,245,254,296,299,323 Probit Model, 393
Model, Time Series Data: GARCH, 349 p-value, 88,450
Model, Time Series Data: GARCH-In-Mean, 352
Model, Time Series Data: Random Effects, 371 Qualitative Variable Model, 391
Model, Time Series Data: Random Walk, 298
Model, Time Series Data: Spurious Regression, 299 Random Effects Model, 371
Model, Time Series Data: T-GARCH, 350 Random Number Generation, 47,50,75,169,296,334,
Model, Time Series Data: Vector Autoregressive (VAR), 395,436,440,446,454
317 Random Regressors, 262
Model, Time Series Data: Vector Error Correction (VEC), Random Walk Model, 298
310 Reduced Form Equation, 264,268,279,286
Model, Tobit, 401 Regression, Multiple Linear, 143,154
Model, Topic, BYD Lighting, 341-354 Regression, Plotting Line, 34,55,62,105
Model, Topic: Car Mileage, 177 Regression, Simple Linear, 19, 67, 95
Model, Topic: Coke Marketing Example, 192 Repeated Sampling, 50,75,436,440,446
Model, Topic: Demand for Beer, 166 RESET Test, 172
Model, Topic: Family Income, 167-175 Residuals, Correlogram, 192,200-201,204-205,206-207
Model, Topic: Firm Investment, 381-390 Residuals, Histogram, 115
Model, Topic: Food Expenditure, 19-53,67-94,95-108, Residuals, Plot (Scatter or Bar), 111,126,128,206
115-122,204-210,216-220,221,224-227 Residuals, Test of Stationarity, 306,312,318
Model, Topic: Fulton Fish Market, 281 Restricted Model, 155
Model, Topic: Hamburger Chain Data, 143,155-165 R-squared, 98,135,140,153
Model, Topic: House Price, 53-66,180
Model, Topic: Life-Cycle, 149 Sample Mean, 436
Model, Topic: Marketing Example (Coke), 192 Sample Variance, 436
Model, Topic: Minimum Wage, 198 Scaling Data, 100
Model, Topic: Poultry Demand, 139 Scientific Notation, 94,409
Model, Topic: Project STAR, 193 Seemingly Unrelated Regressions (SUR) Model, 388
Model, Topic: Supply and Demand for Fulton Fish, 281 Simple Linear Regression Model Assumptions, 45
Model, Topic: Supply and Demand for Truffles, 278 Simulated Data, 296,323,334,395
Model, Topic: STAR Project, 193 Simulation, 44,75

Simultaneous Equations Model, 278 Test Statistic, Chi-Square, 118,206,243,277,344,371,


Skewness, 118,434,465 456,465
Spurious Regression, 299,306 Test Statistic, Cointegration, 306,312,318
Standard Error, of the Forecast in a Dynamic Model, 255 Test Statistic, Contemporaneous Correlation, 388
Standard Error, of the Forecast in a Simple Linear Test Statistic, Dickey-Fuller, 301,306,312,318
Regression Model, 96 Test Statistic, F, 154,210
Standard Error, of the Regression, 53 Test Statistic, Goldfeld-Quandt, 210
Standard Error, of the Sample Mean, 439 Test Statistic, Hausman, 273
Standard Error, of Fixed Effects Estimator, 364,368 Test Statistic, Jarque-Bera: for Normality, 118,465
Standard Error, of IV Estimator, 267-268,271-273,282- Test Statistic, Lagrange Multiplier, 206,241,274,344,
283,284-285,291-293 371,388
Test Statistic, Left-Tail,82,84,92,146,449
Standard Error, of Least Squares Estimator, 52-53 Test Statistic, One-Tail Test, 81,82,83,84,92,146,449
Standard Error, of White Estimator, 219 Test Statistic, Population Mean, 449
Standard Scientific Notation, 94,409 Test Statistic, Population Normality, 463
Stationary Process,298 Test Statistic, Population Variance, 456
Stationary Variables, 294 Test Statistic, RESET, 172
Structural Equations, 281,290 Test Statistic, Right-Tail Test, 81,83,92,147,449
Sum of Squared Errors, 53 Test Statistic, Surplus Moment Conditions, 274
Sum of Squared Residuals, 53 Test Statistic, t, 81,241,273,301,449,452
SUR (Seemingly Unrelated Regressions) Model, 388 Test Statistic, Test of Significance of a Model, 159
Surplus Moment Conditions, 268,274 Test Statistic, Test of Significance of an Economic
Hypothesis, 163
Template, Confidence Interval of µ, 448 Test Statistic, Test of Significance,82,84,87,93,145,159
Template, Correlation Analysis and R2, 99 Test Statistic, Two Population Means, 459
Template, Forecast Interval, 256 Test Statistic, Two Population Variances, 461
Template, F-test, 158 Test Statistic, Two-Tail Tests, 82,86,93,145,449
Template, Goldfeld-Quandt Test, 211 Test Statistic, Unit Root for Stationarity, 301
Template, Hypothesis Test for βk: Left-Tail, 84 Test Statistic, White, 209
Template, Hypothesis Test for βk: Right-Tail, 83 Test Statistic, Z, 233
Template, Hypothesis Test for βk: Two-Tail, 86 T-GARCH Model, 350
Template, Hypothesis Test for µ, 452 Time Series Data, Time-Varying Volatility, 328
Template, Hypothesis Test for σ², 458 Tobit Model, 401
Template, Interval Estimate of βk, 72 t-test, 63,196,419,422,233,235,238,267,419
Template, Interval Estimate of µ, 448 Two Stage Least Squares (2SLS),264,278,286
Template, Jarque-Bera Test, 120,466
Template, Lagrange Multiplier Test, 208,244,276, 347, Unbiasedness of Least Squares Estimator, 44-51
373,390 Uncensored Data, 397
Template, Prediction in Simple Log-Linear Model, 132 Unit Root Test, 301
Template, Prediction Interval in Simple Linear Model, 97 Unrestricted Model, 155
Template, Prediction Interval in Simple Log-Linear Model,
136 VAR Model (Vector Autoregressive Model), 317
Template, Probabilities: Chi-Square Distribution, 427 Variable, Non-Stationary, 294,299
Template, Probabilities: F-distribution, 430 Variable, Stationary, 294
Template, Probabilities: Normal Distribution, 425 Variance, of Least Squares Estimator, 52
Template, Probabilities: t-distribution, 429 Variance, of Population, 438
Template, Simulation, 78 VEC Model (Vector Error Correction), 310
Template, Test of Significance of Model, 159
Test Statistic, ARCH Effect, 341 White Standard Error, 219
Test Statistic, Breusch-Pagan, 206
