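# Scale elasticity of the Quadratic Specification: sum of its three product-factor elasticities.
# The output printed below was produced with eCapQuad + eCapTL + eCapTL
# (its mean, 0.4576, equals 0.1424 + 2 * 0.1576), so it will differ with this sum;
# by linearity the mean of this sum is 0.1424 + 0.6835 + 0.4843 = 1.3102.
data$eScaleQuad = data$eCapQuad + data$eLabQuad + data$eMatQuad

sum(data$eScaleQuad < 0)
[1] 22
summary(data$eScaleQuad)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.7436  0.1755  0.5162  0.4576  0.7239  1.4552 
hist(data$eScaleQuad, breaks=20)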
Microeconomic Theory and Linear Regression
Introduction
This is a summary of a very old collection of materials that I used as a teaching assistant (before 2017).
Among those materials I found some class notes from Arne Henningsen, the author of the micEcon R package.
I’ll use that package and some others to illustrate some concepts from Microeconomic Theory.
Packages installation:
# install.packages(c("micEcon","lmtest","bbmle","miscTools"))
library(micEcon)
library(lmtest)
library(stats4) #this is a base package so don't install this!
library(bbmle)
library(miscTools)
From the package’s help: “The appleProdFr86 data frame includes cross-sectional production data of 140 French apple producers from the year 1986. These data have been extracted from a panel data set that was used in Ivaldi et al. (1996).”
This data frame contains the following columns:
Variable | Description |
---|---|
vCap | costs of capital (including land) |
vLab | costs of labour (including remuneration of unpaid family labour) |
vMat | costs of intermediate materials (e.g. seedlings, fertilizer, pesticides, fuel) |
qApples | quantity index of produced apples |
qOtherOut | quantity index of all other outputs |
qOut | quantity index of all outputs (not in the original data set, calculated as 580,000 · (qApples + qOtherOut)) |
pCap | price index of capital goods |
pLab | price index of labour |
pMat | price index of materials |
pOut | price index of the aggregate output (not in the original data set, artificially generated) |
adv | dummy variable indicating the use of an advisory service (not in the original data set, artificially generated) |
Now I’ll examine the appleProdFr86 dataset:
data("appleProdFr86", package = "micEcon")
data = appleProdFr86
rm(appleProdFr86)
str(data)
'data.frame': 140 obs. of 11 variables:
$ vCap : int 219183 130572 81301 34007 38702 122400 89061 92134 65779 94047 ...
$ vLab : int 323991 187956 134147 105794 83717 523842 168140 204859 180598 142240 ...
$ vMat : int 303155 262017 90592 59833 104159 577468 343768 126547 190622 82357 ...
$ qApples : num 1.392 0.864 3.317 0.438 1.831 ...
$ qOtherOut: num 0.977 1.072 0.405 0.436 0.015 ...
$ qOut : num 1374064 1122979 2158518 507389 1070816 ...
$ pCap : num 2.608 3.292 2.194 1.602 0.866 ...
$ pLab : num 0.9 0.753 0.956 1.268 0.938 ...
$ pMat : num 8.89 6.42 3.74 3.17 7.22 ...
$ pOut : num 0.66 0.716 0.937 0.597 0.825 ...
$ adv : num 0 0 1 1 1 1 1 0 0 0 ...
Linear Production Function
A linear production function is of the form:
$$ y = \beta_0 + \sum_i \beta_i x_i $$
The dataset does not provide quantities of capital, labour or materials, but it provides costs and price indices. Dividing each cost by its price index gives the input quantities:
data$qCap = data$vCap/data$pCap
data$qLab = data$vLab/data$pLab
data$qMat = data$vMat/data$pMat
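Let $\rho_i$ be the linear correlation between the output and input $i$. Since the tests below use alternative = "greater", the hypotheses are $H_0: \rho_i = 0$ against $H_1: \rho_i > 0$.
To test this I’ll use the statistic:
$$ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \sim t_{n-2} $$
where $r$ is the sample correlation and $n = 140$ is the number of observations.
And the correlation test for each input is: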
cor.test(data$qOut, data$qCap, alternative="greater")
Pearson's product-moment correlation

data: data$qOut and data$qCap
t = 8.7546, df = 138, p-value = 3.248e-15
alternative hypothesis: true correlation is greater than 0
95 percent confidence interval:
0.4996247 1.0000000
sample estimates:
cor
0.5975547
cor.test(data$qOut, data$qLab, alternative="greater")
Pearson's product-moment correlation

data: data$qOut and data$qLab
t = 20.203, df = 138, p-value < 2.2e-16
alternative hypothesis: true correlation is greater than 0
95 percent confidence interval:
0.8243685 1.0000000
sample estimates:
cor
0.8644853
cor.test(data$qOut, data$qMat, alternative="greater")
Pearson's product-moment correlation

data: data$qOut and data$qMat
t = 15.792, df = 138, p-value < 2.2e-16
alternative hypothesis: true correlation is greater than 0
95 percent confidence interval:
0.7463428 1.0000000
sample estimates:
cor
0.8023509
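The statistic behind these tests can be reproduced by hand. The following is a minimal sketch, assuming the usual t statistic for Pearson's correlation and plugging in the rounded values printed for capital (r = 0.5975547, n = 140). The variable names are mine and not part of the original analysis.

```r
# Values copied from the cor.test() output for qOut and qCap
r <- 0.5975547   # sample correlation
n <- 140         # number of producers

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# One-sided p-value, matching alternative = "greater"
p_value <- pt(t_stat, df = n - 2, lower.tail = FALSE)

t_stat
p_value
```

Up to rounding, t_stat should agree with the t = 8.7546 reported by cor.test().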
With the units of input I write the production function and estimate its coefficients:
prodLin = lm(qOut ~ qCap + qLab + qMat, data = data)
data$qOutLin = fitted(prodLin)
summary(prodLin)

Call:
lm(formula = qOut ~ qCap + qLab + qMat, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-3888955  -773002    86119   769073  7091521 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.616e+06  2.318e+05  -6.972 1.23e-10 ***
qCap         1.788e+00  1.995e+00   0.896    0.372    
qLab         1.183e+01  1.272e+00   9.300 3.15e-16 ***
qMat         4.667e+01  1.123e+01   4.154 5.74e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1541000 on 136 degrees of freedom
Multiple R-squared: 0.7868, Adjusted R-squared: 0.7821
F-statistic: 167.3 on 3 and 136 DF, p-value: < 2.2e-16
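I can also obtain the estimates by maximizing the log-likelihood:
LogLikelihood = function(beta0, betacap, betalab, betamat, mu, sigma) {
  # residuals of the linear production function
  R = data$qOut - (beta0 + betacap*data$qCap + betalab*data$qLab + betamat*data$qMat)
  # log-density of each residual under a normal distribution
  # (mu and beta0 both shift the mean, so only their sum is identified)
  R = suppressWarnings(dnorm(R, mu, sigma, log = TRUE))
  # mle2 minimizes, so return the negative log-likelihood
  -sum(R)
}
prodLin2 = mle2(LogLikelihood, start = list(beta0 = 0, betacap = 10, betalab = 20, betamat = 30,
                                            mu = 0, sigma=1), control = list(maxit= 10000))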
summary(prodLin)
# This call refers to prodLin, so it prints the same OLS summary shown above;
# summary(prodLin2) displays the maximum likelihood estimates.
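To see why maximum likelihood and least squares give the same coefficients under normal errors, here is a minimal sketch on simulated data. It swaps in base R's optim() instead of mle2(), and it drops mu because the intercept already absorbs any shift in the mean. The data, starting values and names are assumptions for illustration only.

```r
set.seed(1)
n <- 200
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)   # simulated data, not the apple producers

# Negative log-likelihood of a linear model with normal errors.
# sigma is parameterised as exp(p[3]) so that it stays positive.
neg_loglik <- function(p) {
  res <- y - (p[1] + p[2] * x)
  -sum(dnorm(res, mean = 0, sd = exp(p[3]), log = TRUE))
}

ml_fit  <- optim(c(0, 0, 0), neg_loglik, method = "BFGS")
ols_fit <- lm(y ~ x)

# Intercept and slope from both estimators, side by side
rbind(ML = ml_fit$par[1:2], OLS = coef(ols_fit))
```

The two rows should agree up to the tolerance of the numerical optimizer. With the apple data the inputs are on very different scales, so an optimizer may need rescaled variables or better starting values.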
Quadratic Production Function
A quadratic production function is of the form:
$$ y = \beta_0 + \sum_i \beta_i x_i + \frac{1}{2} \sum_i \sum_j \beta_{ij} x_i x_j, \qquad \beta_{ij} = \beta_{ji} $$
With the variables created for the linear case, I can obtain the coefficients and the predicted output:
prodQuad = lm(qOut ~ qCap + qLab + qMat + I(0.5*qCap^2) + I(0.5*qLab^2) + I(0.5*qMat^2)
              + I(qCap*qLab) + I(qCap*qMat) + I(qLab*qMat), data = data)
data$qOutQuad = fitted(prodQuad)
summary(prodQuad)

Call:
lm(formula = qOut ~ qCap + qLab + qMat + I(0.5 * qCap^2) + I(0.5 * 
    qLab^2) + I(0.5 * qMat^2) + I(qCap * qLab) + I(qCap * qMat) + 
    I(qLab * qMat), data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-3928802  -695518  -186123   545509  4474143 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -2.911e+05  3.615e+05  -0.805 0.422072    
qCap             5.270e+00  4.403e+00   1.197 0.233532    
qLab             6.077e+00  3.185e+00   1.908 0.058581 .  
qMat             1.430e+01  2.406e+01   0.595 0.553168    
I(0.5 * qCap^2)  5.032e-05  3.699e-05   1.360 0.176039    
I(0.5 * qLab^2) -3.084e-05  2.081e-05  -1.482 0.140671    
I(0.5 * qMat^2) -1.896e-03  8.951e-04  -2.118 0.036106 *  
I(qCap * qLab)  -3.097e-05  1.498e-05  -2.067 0.040763 *  
I(qCap * qMat)  -4.160e-05  1.474e-04  -0.282 0.778206    
I(qLab * qMat)   4.011e-04  1.112e-04   3.608 0.000439 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1344000 on 130 degrees of freedom
Multiple R-squared: 0.8449, Adjusted R-squared: 0.8342
F-statistic: 78.68 on 9 and 130 DF, p-value: < 2.2e-16
Linear Specification vs Quadratic Specification
From the above, which of the models is better? Wald Test, Likelihood Ratio Test and ANOVA can help. In this case I’m working with nested models, because the Linear Specification is contained within the Quadratic Specification:
waldtest(prodLin, prodQuad)
Wald test

Model 1: qOut ~ qCap + qLab + qMat
Model 2: qOut ~ qCap + qLab + qMat + I(0.5 * qCap^2) + I(0.5 * qLab^2) + 
    I(0.5 * qMat^2) + I(qCap * qLab) + I(qCap * qMat) + I(qLab * 
    qMat)
  Res.Df Df      F    Pr(>F)    
1    136                        
2    130  6 8.1133 1.869e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrtest(prodLin, prodQuad)
Likelihood ratio test

Model 1: qOut ~ qCap + qLab + qMat
Model 2: qOut ~ qCap + qLab + qMat + I(0.5 * qCap^2) + I(0.5 * qLab^2) + 
    I(0.5 * qMat^2) + I(qCap * qLab) + I(qCap * qMat) + I(qLab * 
    qMat)
  #Df  LogLik Df  Chisq Pr(>Chisq)    
1   5 -2191.3                         
2  11 -2169.1  6 44.529  5.806e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(prodLin, prodQuad, test = "F")
Analysis of Variance Table

Model 1: qOut ~ qCap + qLab + qMat
Model 2: qOut ~ qCap + qLab + qMat + I(0.5 * qCap^2) + I(0.5 * qLab^2) + 
    I(0.5 * qMat^2) + I(qCap * qLab) + I(qCap * qMat) + I(qLab * 
    qMat)
  Res.Df        RSS Df  Sum of Sq      F    Pr(>F)    
1    136 3.2285e+14                                   
2    130 2.3489e+14  6 8.7957e+13 8.1133 1.869e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the results the Quadratic Specification should be preferred as the tests show that the additional variables do add information.
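The F statistic reported by waldtest() and anova() can be recovered from the residual sums of squares. This is a small sketch using the rounded values printed in the ANOVA table, so the result only matches up to rounding. The variable names are mine.

```r
# Values copied from the anova() table
rss_lin  <- 3.2285e+14  # residual sum of squares, linear model
rss_quad <- 2.3489e+14  # residual sum of squares, quadratic model
df_quad  <- 130         # residual degrees of freedom, quadratic model
q        <- 6           # number of additional coefficients

# F = (reduction in RSS per added coefficient) / (residual variance of the larger model)
f_stat  <- ((rss_lin - rss_quad) / q) / (rss_quad / df_quad)
p_value <- pf(f_stat, df1 = q, df2 = df_quad, lower.tail = FALSE)

f_stat
p_value
```

The result should be close to the F = 8.1133 shown in the tables, which is why the Wald test and the ANOVA F test report identical numbers here.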
Cobb-Douglas Production Function
A Cobb-Douglas production function is of the form:
$$ y = A \prod_i x_i^{\beta_i} $$
This function is not linear in the parameters, so I take logarithms before fitting:
$$ \ln y = \beta_0 + \sum_i \beta_i \ln x_i, \qquad \beta_0 = \ln A $$
With the variables created for the linear case, I can obtain the coefficients and the predicted output:
prodCD = lm(log(qOut) ~ log(qCap) + log(qLab) + log(qMat), data = data)
data$qOutCD = fitted(prodCD) # fitted values are in logarithms
summary(prodCD)

Call:
lm(formula = log(qOut) ~ log(qCap) + log(qLab) + log(qMat), data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.67239 -0.28024  0.00667  0.47834  1.30115 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.06377    1.31259  -1.572   0.1182    
log(qCap)    0.16303    0.08721   1.869   0.0637 .  
log(qLab)    0.67622    0.15430   4.383 2.33e-05 ***
log(qMat)    0.62720    0.12587   4.983 1.87e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.656 on 136 degrees of freedom
Multiple R-squared: 0.5943, Adjusted R-squared: 0.5854
F-statistic: 66.41 on 3 and 136 DF, p-value: < 2.2e-16
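In a Cobb-Douglas function each exponent is the output elasticity of its input, so the scale elasticity is the sum of the exponents and is the same for every producer. A small sketch with the rounded coefficients from the summary (copied by hand; the vector names are mine):

```r
# Estimated exponents copied from summary(prodCD)
cd_coef <- c(qCap = 0.16303, qLab = 0.67622, qMat = 0.62720)

# Scale elasticity of a Cobb-Douglas function: sum of the exponents
scale_elasticity <- sum(cd_coef)
scale_elasticity
```

A value above 1 points to increasing returns to scale, subject to the sampling uncertainty of the three estimates.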
Translog Production Function
A Translog production function is of the form:
$$ \ln y = \beta_0 + \sum_i \beta_i \ln x_i + \frac{1}{2} \sum_i \sum_j \beta_{ij} \ln x_i \ln x_j, \qquad \beta_{ij} = \beta_{ji} $$
With the variables created for the linear case, I can obtain the coefficients and the predicted output:
prodTL = lm(log(qOut) ~ log(qCap) + log(qLab) + log(qMat) + I(0.5*log(qCap)^2)
            + I(0.5*log(qLab)^2) + I(0.5*log(qMat)^2) + I(log(qCap)*log(qLab))
            + I(log(qCap)*log(qMat)) + I(log(qLab)*log(qMat)), data = data)
data$qOutTL = fitted(prodTL) # fitted values are in logarithms
summary(prodTL)

Call:
lm(formula = log(qOut) ~ log(qCap) + log(qLab) + log(qMat) + 
    I(0.5 * log(qCap)^2) + I(0.5 * log(qLab)^2) + I(0.5 * log(qMat)^2) + 
    I(log(qCap) * log(qLab)) + I(log(qCap) * log(qMat)) + I(log(qLab) * 
    log(qMat)), data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.68015 -0.36688  0.05389  0.44125  1.26560 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)  
(Intercept)              -4.14581   21.35945  -0.194   0.8464  
log(qCap)                -2.30683    2.28829  -1.008   0.3153  
log(qLab)                 1.99328    4.56624   0.437   0.6632  
log(qMat)                 2.23170    3.76334   0.593   0.5542  
I(0.5 * log(qCap)^2)     -0.02573    0.20834  -0.124   0.9019  
I(0.5 * log(qLab)^2)     -1.16364    0.67943  -1.713   0.0892 .
I(0.5 * log(qMat)^2)     -0.50368    0.43498  -1.158   0.2490  
I(log(qCap) * log(qLab))  0.56194    0.29120   1.930   0.0558 .
I(log(qCap) * log(qMat)) -0.40996    0.23534  -1.742   0.0839 .
I(log(qLab) * log(qMat))  0.65793    0.42750   1.539   0.1262  
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6412 on 130 degrees of freedom
Multiple R-squared: 0.6296, Adjusted R-squared: 0.6039
F-statistic: 24.55 on 9 and 130 DF, p-value: < 2.2e-16
Cobb-Douglas Specification vs Translog Specification
This is really similar to the comparison between Linear Specification and Quadratic Specification. In this case the models are nested too and the tests are:
waldtest(prodCD, prodTL)
lrtest(prodCD, prodTL)
anova(prodCD, prodTL, test = "F")
Wald test

Model 1: log(qOut) ~ log(qCap) + log(qLab) + log(qMat)
Model 2: log(qOut) ~ log(qCap) + log(qLab) + log(qMat) + I(0.5 * log(qCap)^2) + 
    I(0.5 * log(qLab)^2) + I(0.5 * log(qMat)^2) + I(log(qCap) * 
    log(qLab)) + I(log(qCap) * log(qMat)) + I(log(qLab) * log(qMat))
  Res.Df Df     F  Pr(>F)  
1    136                   
2    130  6 2.062 0.06202 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
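From the results the Translog Specification is preferred only at the 10% significance level: the Wald test gives a p-value of 0.062, so the additional variables add some information, but the evidence is much weaker than in the Linear vs Quadratic comparison, and at the 5% level the Cobb-Douglas Specification is not rejected.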
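How the conclusion depends on the chosen significance level can be made explicit. This is a small sketch that recomputes the p-value from the rounded F statistic in the Wald test output (F = 2.062 with 6 and 130 degrees of freedom). The names are mine.

```r
# F statistic and degrees of freedom copied from the Wald test output
f_stat  <- 2.062
p_value <- pf(f_stat, df1 = 6, df2 = 130, lower.tail = FALSE)

# Decision at the usual significance levels
alpha <- c(0.01, 0.05, 0.10)
data.frame(alpha = alpha, reject_cobb_douglas = p_value < alpha)
```

With these numbers the Cobb-Douglas restriction is rejected at the 10% level only.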
Quadratic Specification vs Translog Specification
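The dependent variable is different ($y$ in the Quadratic Specification and $\ln y$ in the Translog Specification), so the $R^2$ values reported above are not directly comparable.
What I can do is to calculate the hypothetical value of $R^2$ on $\ln y$ for the Quadratic Specification and compare it with the Translog $R^2$:
summary(prodTL)$r.squared; rSquared(log(data$qOut), log(data$qOut) - log(data$qOutQuad))
[1] 0.6295696
          [,1]
[1,] 0.5481309
Another option is to apply an exponential transformation to the Translog fitted values, which gives a comparable $R^2$ on $y$:
summary(prodQuad)$r.squared; rSquared(data$qOut, data$qOut - exp(data$qOutTL))
[1] 0.8448983
          [,1]
[1,] 0.7696638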
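This table summarizes this part of the analysis:
Measure | Quadratic | Translog |
---|---|---|
$R^2$ on $y$ | 0.85 | 0.77 |
$R^2$ on $\ln y$ | 0.55 | 0.63 |
Here I can compare the two specifications within each row, because both values in a row are computed for the same dependent variable.
In any case, comparing $R^2$ values does not single out a specification: the Quadratic fits better on $y$ and the Translog fits better on $\ln y$.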
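The idea of a hypothetical R-squared on another scale can be sketched in base R. I assume here that rSquared(y, resid) computes one minus the ratio of the residual sum of squares to the total sum of squares, which is my reading of miscTools and not something stated in this post. The data are simulated and all names are illustrative.

```r
# R-squared from observed values and residuals: 1 - SSR / SST
r_squared <- function(y, resid) 1 - sum(resid^2) / sum((y - mean(y))^2)

set.seed(2)
x <- runif(100, 1, 10)
y <- exp(0.5 + 0.8 * log(x) + rnorm(100, sd = 0.2))  # simulated data

fit_log <- lm(log(y) ~ log(x))

# On the scale used for fitting, this agrees with summary()
r2_log <- r_squared(log(y), residuals(fit_log))

# Hypothetical R-squared on the original scale, after back-transforming the fit
r2_lev <- r_squared(y, y - exp(fitted(fit_log)))

c(log_scale = r2_log, level_scale = r2_lev)
```

The second value is not a property of the fitted model in the usual sense, because the model was not estimated to minimize errors on that scale. It is only a device to put two specifications on a common footing.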
Microeconomic Concepts
In this part I’ll use the Quadratic and Translog specifications given that those are the best fit models in the last exercises.
Plot of output versus predicted output after adjusting the scale:
data$qOutTL = exp(fitted(prodTL))
par(mfrow=c(1,2))
compPlot(data$qOut, data$qOutQuad, log = "xy", main = "Quadratic Specification")
compPlot(data$qOut, data$qOutTL, log = "xy", main = "Translog Specification")
Since I have some negative coefficients in both the Quadratic and Translog Specifications, a negative predicted output is possible in principle (for the Translog this cannot happen after the exponential transformation, because exp() is always positive):
sum(data$qOutQuad < 0); sum(data$qOutTL < 0)
[1] 0
[1] 0
That problem did not appear here.
Another problem that can appear from negative coefficients is a negative marginal effect after an increase in the inputs. Logically, an increase in the inputs should never decrease the output.
To check this I need the marginal productivity:
$$ MP_i = \frac{\partial y}{\partial x_i} $$
And the Product-Factor Elasticity:
$$ \epsilon_i = \frac{\partial y}{\partial x_i} \frac{x_i}{y} $$
Another useful concept is Scale Elasticity:
$$ \epsilon = \sum_i \epsilon_i $$
In microeconomics, monotonicity means that an increase in the inputs does not decrease the output. An observation therefore satisfies the monotonicity condition when all of its marginal productivities are greater than or equal to zero.
In the case of a Translog Specification the Product-Factor elasticity can be written as:
$$ \epsilon_i = \frac{\partial \ln y}{\partial \ln x_i} = \beta_i + \sum_j \beta_{ij} \ln x_j $$
Also keep in mind this about the Translog Specification:
$$ \beta_{ij} = \beta_{ji}, \qquad MP_i = \epsilon_i \frac{y}{x_i} $$
So I’ll start with the Product-Factor Elasticity for the Translog Specification:
b1 = coef(prodTL)["log(qCap)"]
b2 = coef(prodTL)["log(qLab)"]
b3 = coef(prodTL)["log(qMat)"]
b11 = coef(prodTL)["I(0.5 * log(qCap)^2)"]
b22 = coef(prodTL)["I(0.5 * log(qLab)^2)"]
b33 = coef(prodTL)["I(0.5 * log(qMat)^2)"]
b12 = b21 = coef(prodTL)["I(log(qCap) * log(qLab))"]
b13 = b31 = coef(prodTL)["I(log(qCap) * log(qMat))"]
b23 = b32 = coef(prodTL)["I(log(qLab) * log(qMat))"]

data$eCapTL = with(data, b1 + b11*log(qCap) + b12*log(qLab) + b13*log(qMat))
data$eLabTL = with(data, b2 + b21*log(qCap) + b22*log(qLab) + b23*log(qMat))
data$eMatTL = with(data, b3 + b31*log(qCap) + b32*log(qLab) + b33*log(qMat))

sum(data$eCapTL < 0); sum(data$eLabTL < 0); sum(data$eMatTL < 0)
[1] 32
[1] 14
[1] 8
summary(data$eCapTL); summary(data$eLabTL); summary(data$eMatTL)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.46630  0.03239  0.17725  0.15762  0.31437  0.72967 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.0255  0.3283  0.6079  0.6550  1.0096  1.9642 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.3775  0.3608  0.6783  0.6379  0.8727  1.9017 
par(mfrow=c(1,3)); hist(data$eCapTL, breaks=20); hist(data$eLabTL, breaks=20); hist(data$eMatTL, breaks=20)
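The analytic elasticity formula can be checked against a numerical derivative of log-output with respect to log-input. The sketch below uses two inputs and made-up coefficients (not the estimates of prodTL). Every name and value in it is an assumption for illustration.

```r
# Hypothetical translog coefficients for two inputs (illustrative values only)
b0 <- 1; b1 <- 0.3; b2 <- 0.6
b11 <- -0.05; b22 <- -0.10; b12 <- 0.04

# Translog function written in terms of log-inputs
log_output <- function(lx1, lx2) {
  b0 + b1 * lx1 + b2 * lx2 +
    0.5 * b11 * lx1^2 + 0.5 * b22 * lx2^2 + b12 * lx1 * lx2
}

lx1 <- log(50); lx2 <- log(20)

# Analytic elasticity of output with respect to input 1
analytic <- b1 + b11 * lx1 + b12 * lx2

# Central finite difference of log-output with respect to log-input 1
h <- 1e-5
numerical <- (log_output(lx1 + h, lx2) - log_output(lx1 - h, lx2)) / (2 * h)

c(analytic = analytic, numerical = numerical)
```

Both numbers should coincide up to floating point error, since the translog is quadratic in the log-inputs.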
And then I obtain the Marginal Productivity for the Translog Specification:
data$mpCapTL = with(data, eCapTL * qOutTL / qCap)
data$mpLabTL = with(data, eLabTL * qOutTL / qLab)
data$mpMatTL = with(data, eMatTL * qOutTL / qMat)

sum(data$mpCapTL < 0); sum(data$mpLabTL < 0); sum(data$mpMatTL < 0)
[1] 32
[1] 14
[1] 8
summary(data$mpCapTL); summary(data$mpLabTL); summary(data$mpMatTL)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-13.681   0.534   3.766   4.566   7.369  26.836 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -2.570   2.277   5.337   6.857  10.522  26.829 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -28.44   28.55   51.83   48.12   70.97  122.86 
par(mfrow=c(1,3)); hist(data$mpCapTL, breaks=20); hist(data$mpLabTL, breaks=20); hist(data$mpMatTL, breaks=20)
With the marginal productivity I can determine the number of observations that violate the monotonicity condition:
data$monoTL = with(data, mpCapTL >= 0 & mpLabTL >= 0 & mpMatTL >= 0 )
sum(!data$monoTL)
[1] 48
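The total number of violations (48) is smaller than the sum of the per-input counts (32 + 14 + 8 = 54) because an observation with more than one negative marginal product is counted once. A toy sketch with made-up marginal products shows the same effect. The data frame and its values are hypothetical.

```r
# Hypothetical marginal products for four observations
mp <- data.frame(
  mpCap = c(2.1, -0.4,  3.0, -1.2),
  mpLab = c(5.0,  4.2, -1.1,  0.0),
  mpMat = c(40,   35,   28,  -3)
)

# An observation is monotone only if every marginal product is non-negative
mp$mono <- with(mp, mpCap >= 0 & mpLab >= 0 & mpMat >= 0)

# Negative marginal products counted input by input
violations_per_input <- colSums(mp[c("mpCap", "mpLab", "mpMat")] < 0)

# Observations that violate monotonicity for at least one input
total_violations <- sum(!mp$mono)

violations_per_input
total_violations
```

Here the fourth observation is negative for two inputs, so the per-input counts add up to 4 while only 3 observations violate the condition.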
Now I compute the Scale Elasticity for the Translog Specification:
data$eScaleTL = data$eCapTL + data$eLabTL + data$eMatTL

sum(data$eScaleTL < 0)
[1] 0
summary(data$eScaleTL)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.163   1.373   1.440   1.451   1.538   1.738 
hist(data$eScaleTL, breaks=20)
In the case of a Quadratic Specification the Marginal Productivity can be written as:
$$ MP_i = \frac{\partial y}{\partial x_i} = \beta_i + \sum_j \beta_{ij} x_j $$
Now I compute the Marginal Productivity for the Quadratic Specification:
b1 = coef(prodQuad)["qCap"]
b2 = coef(prodQuad)["qLab"]
b3 = coef(prodQuad)["qMat"]
b11 = coef(prodQuad)["I(0.5 * qCap^2)"]
b22 = coef(prodQuad)["I(0.5 * qLab^2)"]
b33 = coef(prodQuad)["I(0.5 * qMat^2)"]
b12 = b21 = coef(prodQuad)["I(qCap * qLab)"]
b13 = b31 = coef(prodQuad)["I(qCap * qMat)"]
b23 = b32 = coef(prodQuad)["I(qLab * qMat)"]

data$mpCapQuad = with(data, b1 + b11*qCap + b12*qLab + b13*qMat)
data$mpLabQuad = with(data, b2 + b21*qCap + b22*qLab + b23*qMat)
data$mpMatQuad = with(data, b3 + b31*qCap + b32*qLab + b33*qMat)

sum(data$mpCapQuad < 0); sum(data$mpLabQuad < 0); sum(data$mpMatQuad < 0)
[1] 28
[1] 5
[1] 8
summary(data$mpCapQuad); summary(data$mpLabQuad); summary(data$mpMatQuad)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-22.3351   0.7743   2.1523   1.7260   4.0267  16.6884 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -1.599   4.224   6.113   7.031   9.513  30.780 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -35.57   31.20   47.08   51.57   63.59  260.64 
par(mfrow=c(1,3)); hist(data$mpCapQuad, breaks=20); hist(data$mpLabQuad, breaks=20); hist(data$mpMatQuad, breaks=20)
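An alternative way to obtain marginal products, which avoids writing the derivative by hand, is to perturb one input and compare predictions. The sketch below does this on simulated data with two inputs and checks it against the analytic formula. The data, the step h and all names are assumptions for illustration.

```r
set.seed(3)
d <- data.frame(x1 = runif(50, 1, 10), x2 = runif(50, 1, 10))
d$y <- 2 + 3 * d$x1 + 1.5 * d$x2 - 0.2 * d$x1^2 + 0.1 * d$x1 * d$x2 + rnorm(50)

# Quadratic specification with two inputs, same layout as prodQuad
fit <- lm(y ~ x1 + x2 + I(0.5 * x1^2) + I(0.5 * x2^2) + I(x1 * x2), data = d)
b <- coef(fit)

# Analytic marginal product of x1: b1 + b11 * x1 + b12 * x2
mp_analytic <- b["x1"] + b["I(0.5 * x1^2)"] * d$x1 + b["I(x1 * x2)"] * d$x2

# Numerical marginal product: central difference of predicted output
h <- 1e-3
up   <- transform(d, x1 = x1 + h)
down <- transform(d, x1 = x1 - h)
mp_numeric <- (predict(fit, newdata = up) - predict(fit, newdata = down)) / (2 * h)

max(abs(unname(mp_analytic) - unname(mp_numeric)))
```

The maximum difference should be negligible. The finite-difference route also works for specifications where the derivative is tedious to derive.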
With the marginal productivity I can determine the number of observations that violate the monotonicity condition:
data$monoQuad = with(data, mpCapQuad >= 0 & mpLabQuad >= 0 & mpMatQuad >= 0 )
sum(!data$monoQuad)
[1] 39
And the Product-Factor Elasticity for the Quadratic Specification:
data$eCapQuad = with(data, mpCapQuad * qCap / qOutQuad)
data$eLabQuad = with(data, mpLabQuad * qLab / qOutQuad)
data$eMatQuad = with(data, mpMatQuad * qMat / qOutQuad)

sum(data$eCapQuad < 0); sum(data$eLabQuad < 0); sum(data$eMatQuad < 0)
[1] 28
[1] 5
[1] 8
summary(data$eCapQuad); summary(data$eLabQuad); summary(data$eMatQuad)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.34467  0.01521  0.07704  0.14240  0.21646  0.94331 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.3157  0.4492  0.6686  0.6835  0.8694  2.3563 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.4313  0.3797  0.5288  0.4843  0.6135  1.5114 
par(mfrow=c(1,3)); hist(data$eCapQuad, breaks=20); hist(data$eLabQuad, breaks=20); hist(data$eMatQuad, breaks=20)
Finally I compute the Scale Elasticity for the Quadratic Specification:
Comparing both specifications:
Measure | Quadratic | Translog |
---|---|---|
$R^2$ on $y$ | 0.85 | 0.77 |
$R^2$ on $\ln y$ | 0.55 | 0.63 |
Obs. with negative predicted output | 0 | 0 |
Obs. that violate the monotonicity condition | 39 | 48 |
Obs. with negative scale elasticity | 22 | 0 |
To find a more accurate instrument for modelling production, I should consider a non-parametric regression. That approach does not depend on a functional form, and it lets me work with marginal effects directly, which may help with the problem of inputs whose increase appears to reduce output.
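As a rough illustration of that idea, the sketch below fits a local regression with base R's loess() to simulated data with one input, then approximates marginal effects as slopes between neighbouring grid points. This is a swapped-in, simple stand-in for a full non-parametric estimator and not the method used in the class notes. The data, span and grid are assumptions.

```r
set.seed(42)
x <- seq(1, 10, length.out = 200)
y <- 5 * log(x) + rnorm(200, sd = 0.3)   # simulated concave production relation

# Local regression: no functional form is imposed on the relation between x and y
fit <- loess(y ~ x, span = 0.5)

# Predicted output on a grid inside the observed range of x
grid <- seq(1.5, 9.5, by = 0.5)
pred <- predict(fit, newdata = data.frame(x = grid))

# Approximate marginal effects as slopes between neighbouring grid points
marginal_effect <- diff(pred) / diff(grid)

summary(marginal_effect)
```

Nothing in a local regression forces the marginal effects to be non-negative, so monotonicity still has to be checked. The difference is that a violation is no longer a by-product of a chosen functional form.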