Statistics for International Relations Research II

class: center, middle, inverse, title-slide

# Statistics for International Relations Research II
## Panel Models
### <large>James Hollway and Juliette Ganne</large>

---

class: center, middle

.pull-1[.circleon[![](https://static.turbosquid.com/Preview/2016/07/05__06_16_48/FishSkeletonb.jpg46B6AAD0-198F-40AC-B932-1768C4B0F869Zoom.jpg)]]
.pull-1[.circleon[![](https://www.psassets.ch/thumbs/2e/1d/08716b4f86e44e119ac2ec013b10-909889.jpg)]]
.pull-1[.circleon[![](https://static.toiimg.com/thumb/imgsize-5231,msid-35966935,width-400,resizemode-4/35966935.jpg)]]

---
class: center, middle

.pull-1[.circleon[![](https://www.club.cc.cmu.edu/~cmccabe/image/crazy_clock.jpg)]]
.pull-1[.circleon[![](https://www.fabrikat.ch/media/catalog/product/cache/4/thumbnail/447x/9df78eab33525d08d6e5fb8d27136e95/r/o/rollins_leatherrip_hammer_9_1.jpg)]]
.pull-1[.circleon[![](https://www.random.org/analysis/randbitmap-wamp.png)]]

???

- What is a panel structure?
- Why not use simple OLS, logits?
- Different types of variables and potential effects.
- Issues of serial correlation.

- Motivating and using fixed and random effects models.
- When to use which model? Panel-corrected standard errors.
- How to handle complications? Slow-moving variables, irregular time observations, models for non-continuous DVs.

---
class: center, middle

# Data and Error Structures

.pull-1[.circleon[![](https://www.club.cc.cmu.edu/~cmccabe/image/crazy_clock.jpg)]]
.pull-1[.circleoff[![](https://www.fabrikat.ch/media/catalog/product/cache/4/thumbnail/447x/9df78eab33525d08d6e5fb8d27136e95/r/o/rollins_leatherrip_hammer_9_1.jpg)]]
.pull-1[.circleoff[![](https://www.random.org/analysis/randbitmap-wamp.png)]]

---
## Notation

In this lecture, we are interested in data that have both cross-sectional and temporal variation.

The following terminology and notation will be useful:
- "units" are the individual things on which we have data (countries, clinics, individuals)
  - `$i \in \mathcal{N}$`
- "observations" are the measurements (for each variable) on each unit at a given point in time (GDP, number of successful surgeries, wage)
  - `$t \in \mathcal{T}$`

This means that the total number of observations is `$NT$`.

Cross-sectional data is just at one point in time, `$Nt$`,
whereas time-series data is often just one unit over time, `$iT$`.

To talk about panel data, we need variation in both.
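
As a minimal sketch of this notation, the following builds a made-up balanced panel in R. The unit labels, years, and object names are illustrative assumptions and not part of the course data.

```r
# Toy balanced panel: N = 3 units observed at T = 4 time points (made-up labels)
units <- c("A", "B", "C")
years <- 2001:2004

# One row per unit-year combination
panel <- expand.grid(year = years, unit = units)
panel <- panel[, c("unit", "year")]

N <- length(unique(panel$unit))      # number of units
T_len <- length(unique(panel$year))  # number of time points (T is avoided as a name, since it aliases TRUE)

# A balanced panel has N * T rows
nrow(panel) == N * T_len

# A cross-section keeps one time point (N rows);
# a time series keeps one unit (T rows)
cross_section <- panel[panel$year == 2001, ]
time_series <- panel[panel$unit == "A", ]
```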

---
## Types of Data Structures

.red[Cross-sectional data] consists of observations on different individuals or groups at a single point in time.
- Examples are R&D spending by firms by industry, immigration policy across 24 European countries, or level of dissident repression among authoritarian regimes.
- Endogeneity is difficult to rule out, unless one uses causal estimation models (e.g. matching, or instrumental variables).

.red[Pooled cross-sectional data] describes randomly sampled cross-sections of individuals at different points in time.
- Example: ESS or ISSP (surveys come in ‘waves’ that ask the same questions, but of different individuals).
- Pooling makes sense if cross-sections are randomly sampled (like one big sample), and the units are interchangeable.
- Time dummy variables can be used to capture structural change over time.
- Often used to see the impact of policy or programs.

---

.red[Panel data] generally refers to data which are cross-sectionally dominated; that is, where *N* is significantly larger than *T*. 
Examples are the ANES panel studies (N = 2000; T = 6) or the Panel Study of Income Dynamics (N = large, T = 12 or so).
- Such data usually have a fixed T, so that these data's asymptotics are in N, which is important (we'll come back to this).
- This is a longitudinal design, where the same units (e.g. the same households or individuals) are observed over time.
- Panel data structure makes it possible to deal with certain types of endogeneity without the use of exogenous instruments.

.red[Time series cross-sectional data] (TSCS) usually refers to data in which either T is dominant, or `$N \approx T$`.
- Common in comparative politics. 
- It can also refer to data where N is dominant, but T is larger than in panel data 
(e.g. all-dyads all-years IR data, with N = several thousand and T = 50 or more).
- Here, N is usually fixed, and the asymptotics are in T; 
moreover, if we have enough data, we can say something about the time-series properties of the data 
as well as the cross-sectional part.

---
### Comparative Welfare States

.pull-left-1[

<div class="Rtable1"><table class="Rtable1">
<thead>
<tr>
<th class='rowlabel firstrow lastrow'></th>
<th class='firstrow lastrow'>Overall (N=859)</th>
</tr>
</thead>
<tbody>
<tr>
<td class='rowlabel firstrow'>Social expenditure per GDP</td>
<td class='firstrow'></td>
</tr>
<tr>
<td class='rowlabel'>Mean (SD)</td>
<td>14.0 (3.44)</td>
</tr>
<tr>
<td class='rowlabel'>Median [Min, Max]</td>
<td>14.3 [6.17, 23.1]</td>
</tr>
<tr>
<td class='rowlabel lastrow'>Missing</td>
<td class='lastrow'>62 (7.2%)</td>
</tr>
<tr>
<td class='rowlabel firstrow'>Imports</td>
<td class='firstrow'></td>
</tr>
<tr>
<td class='rowlabel'>Mean (SD)</td>
<td>-0.411 (0.249)</td>
</tr>
<tr>
<td class='rowlabel'>Median [Min, Max]</td>
<td>-0.360 [-1.47, -0.0829]</td>
</tr>
<tr>
<td class='rowlabel lastrow'>Missing</td>
<td class='lastrow'>89 (10.4%)</td>
</tr>
<tr>
<td class='rowlabel firstrow'>Votes for leftist parties</td>
<td class='firstrow'></td>
</tr>
<tr>
<td class='rowlabel'>Mean (SD)</td>
<td>37.0 (13.2)</td>
</tr>
<tr>
<td class='rowlabel'>Median [Min, Max]</td>
<td>39.7 [0, 60.7]</td>
</tr>
<tr>
<td class='rowlabel lastrow'>Missing</td>
<td class='lastrow'>23 (2.7%)</td>
</tr>
</tbody>
</table>
</div>

]

.pull-right-2[

]

???

Some variables vary over time, while others only over the units,
and others over both units and time.

Absence of variation on one dimension means there is nothing to say about that phenomenon there,
and little variation in one dimension means there is little to say about the phenomenon:
*variation is information*!

This means one needs to consider carefully “where” the variation in one's data is, 
and (more important) where one's theories suggest we should see variation as well.

---
### Long vs Wide

Panel/TSCS data are stored in one of two formats: long or wide.
Long format is often preferred as more efficient and flexible.

.pull-left[

.red[Long data] has `$NT$` rows, with columns for each variable.
.tiny[

```r
data1 %>% as_tibble() %>% select(idn, year, sstran, rgdpecap, leftvot) %>%
  print(n=5)
```

```
## # A tibble: 859 x 5
##   idn    year sstran     rgdpecap leftvot
##   <fct> <dbl> <labelled>    <dbl> <labelled>
## 1 1      1980 6.364682     21955. 40.5
## 2 1      1981 6.353022     22753. 45.1
## 3 1      1982 7.232827     21889. 45.1
## 4 1      1983 7.439353     22836. 48.8
## 5 1      1984 7.268586     23339. 49.3
## # … with 854 more rows
```

]
]

.pull-right[

.red[Wide data] has `$N$` rows and one column per variable per time point, i.e. up to `$T$` columns for each variable.

```r
data1 %>% as_tibble() %>% select(idn, year, sstran, rgdpecap, leftvot) %>%
* pivot_wider(names_from = year,
*             values_from = c(sstran, rgdpecap, leftvot)) %>%
  print(n=3)
```

```
## # A tibble: 22 x 121
##   idn   sstran_1980 sstran_1981 sstran_1982 sstran_1983 sstran_1984 sstran_1985
##   <fct> <labelled>  <labelled>  <labelled>  <labelled>  <labelled>  <labelled>
## 1 1     6.364682    6.353022    7.232827    7.439353    7.268586    7.041956
## 2 2     16.370188   16.818114   17.164325   17.293337   17.458069   17.804682
## 3 3     16.807869   18.029742   18.236084   18.825784   18.163483   17.828126
## # … with 19 more rows, and 114 more variables: sstran_1986 <labelled>,
## # sstran_1987 <labelled>, sstran_1988 <labelled>, sstran_1989 <labelled>,
## # sstran_1990 <labelled>, sstran_1991 <labelled>, sstran_1992 <labelled>,
## # sstran_1993 <labelled>, sstran_1994 <labelled>, sstran_1995 <labelled>,
## # sstran_1996 <labelled>, sstran_1997 <labelled>, sstran_1998 <labelled>,
## # sstran_1999 <labelled>, sstran_2000 <labelled>, sstran_2001 <labelled>,
## # sstran_2002 <labelled>, sstran_2003 <labelled>, sstran_2004 <labelled>,
## # sstran_2005 <labelled>, sstran_2006 <labelled>, sstran_2007 <labelled>,
## # sstran_2008 <labelled>, sstran_2009 <labelled>, sstran_2010 <labelled>,
## # sstran_2011 <labelled>, sstran_2012 <labelled>, sstran_2013 <labelled>,
## # sstran_2014 <labelled>, sstran_2015 <labelled>, sstran_2016 <labelled>,
## # sstran_2017 <labelled>, sstran_2018 <labelled>, sstran_NA <labelled>,
## # rgdpecap_1980 <dbl>, rgdpecap_1981 <dbl>, rgdpecap_1982 <dbl>,
## # rgdpecap_1983 <dbl>, rgdpecap_1984 <dbl>, rgdpecap_1985 <dbl>,
## # rgdpecap_1986 <dbl>, rgdpecap_1987 <dbl>, rgdpecap_1988 <dbl>,
## # rgdpecap_1989 <dbl>, rgdpecap_1990 <dbl>, rgdpecap_1991 <dbl>,
## # rgdpecap_1992 <dbl>, rgdpecap_1993 <dbl>, rgdpecap_1994 <dbl>,
## # rgdpecap_1995 <dbl>, rgdpecap_1996 <dbl>, rgdpecap_1997 <dbl>,
## # rgdpecap_1998 <dbl>, rgdpecap_1999 <dbl>, rgdpecap_2000 <dbl>,
## # rgdpecap_2001 <dbl>, rgdpecap_2002 <dbl>, rgdpecap_2003 <dbl>,
## # rgdpecap_2004 <dbl>, rgdpecap_2005 <dbl>, rgdpecap_2006 <dbl>,
## # rgdpecap_2007 <dbl>, rgdpecap_2008 <dbl>, rgdpecap_2009 <dbl>,
## # rgdpecap_2010 <dbl>, rgdpecap_2011 <dbl>, rgdpecap_2012 <dbl>,
## # rgdpecap_2013 <dbl>, rgdpecap_2014 <dbl>, rgdpecap_2015 <dbl>,
## # rgdpecap_2016 <dbl>, rgdpecap_2017 <dbl>, rgdpecap_2018 <dbl>,
## # rgdpecap_NA <dbl>, leftvot_1980 <labelled>, leftvot_1981 <labelled>,
## # leftvot_1982 <labelled>, leftvot_1983 <labelled>, leftvot_1984 <labelled>,
## # leftvot_1985 <labelled>, leftvot_1986 <labelled>, leftvot_1987 <labelled>,
## # leftvot_1988 <labelled>, leftvot_1989 <labelled>, leftvot_1990 <labelled>,
## # leftvot_1991 <labelled>, leftvot_1992 <labelled>, leftvot_1993 <labelled>,
## # leftvot_1994 <labelled>, leftvot_1995 <labelled>, leftvot_1996 <labelled>,
## # leftvot_1997 <labelled>, leftvot_1998 <labelled>, leftvot_1999 <labelled>,
## # leftvot_2000 <labelled>, leftvot_2001 <labelled>, leftvot_2002 <labelled>,
## # leftvot_2003 <labelled>, leftvot_2004 <labelled>, leftvot_2005 <labelled>,
## # …
```

]

???

You can pivot or *reshape* between these two forms using various functions in R.
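
A short sketch of the round trip, assuming the `tidyr` package and a small made-up data frame (the values and the object names `long`, `wide`, and `back` are hypothetical, chosen only to mirror the variable names on the slide):

```r
library(tidyr)

# Made-up long data: 2 units observed in 3 years
long <- data.frame(
  idn     = rep(1:2, each = 3),
  year    = rep(1980:1982, times = 2),
  sstran  = c(6.4, 6.4, 7.2, 16.4, 16.8, 17.2),
  leftvot = c(40.5, 45.1, 45.1, 30.0, 31.2, 29.8)
)

# Long -> wide: one row per unit, columns such as sstran_1980
wide <- pivot_wider(long, names_from = year,
                    values_from = c(sstran, leftvot))

# Wide -> long: split column names at "_" into variable name and year
back <- pivot_longer(wide, cols = -idn,
                     names_to = c(".value", "year"),
                     names_sep = "_")
back$year <- as.numeric(back$year)  # year comes back as character
```

The special name `".value"` tells `pivot_longer()` that the first part of each column name is the variable, so `sstran` and `leftvot` return as separate columns.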

---
### Time-Constant and Time-Varying Variables

There are three types of explanatory variables, which can be located either at the level of units or at the level of contexts (aka time/group).

.red[Time-constant variables]: 
- e.g. ethnicity or gender (individuals), geographical location or type of government (context).
- Some are treated as time-constant because change is rare or a variable is more or less a stable characteristic.
- They do not vary over time (obviously) but can vary across units.

.red[Time-varying variables]: 
- e.g. labor force experience and on the job-training (individual), or economic growth and public spending (context).
- Can characterize the unit or the context.

.red[Time]: 
- It is debatable whether time itself is really an explanatory variable or an indicator for other unobserved characteristics that change over time.
- But time may capture possible time trends in the data.

---
## Pooled OLS

.pull-left[

Now, we could just model this using OLS: `$Y_i = \alpha + \beta X_i + u_i$`

Since each observation involves both a unit and a timepoint, it is really:

`$$Y_{it} = \alpha + \beta X_{it} + u_{it}$$`

This basically 'pools' all this information and just concentrates on the relationship
between explanatory variables and the response variable (here social expenditure).

]
.pull-right[

```r
ols <- lm(sstran ~ csh_m + leftvot, data1)
tab_model(ols, digits = 3)
```

]

---
### So why not OLS?

Recall that, in addition to all the usual assumptions, OLS is also assuming that
- the constant term is constant across different *i*s
- the effect of any given variable `$X$` on `$Y$` is constant across observations 
(at least to the extent that non-constancy isn't specified in the model, e.g., through interaction terms).

But this is usually problematic in a panel/TSCS context,
because we usually have some reason to believe that there may be differences
in either `$\alpha$` or `$\beta$` over `$i$` or `$t$`,
which leads to a form of specification bias.

While aggregating variation across a dimension can be useful, 
it can also tempt one to commit the .red[ecological fallacy]:
inferring individual-level relationships on the basis of aggregate data.

For example, in the US, being wealthier is positively correlated with voting for the Republican party at the individual level, but at the state level it is the poorest states that vote for Republican presidential candidates.

---
### Simpson's Paradox

.pull-left[

]

.pull-right[

]
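
A small simulation can illustrate the paradox. This is only a sketch on made-up data: the number of units, the coefficients, and all object names are assumptions chosen so that the relationship is negative within every unit while units with higher `x` also have higher baseline `y`.

```r
set.seed(1)
n_units <- 5
n_times <- 20
id <- rep(1:n_units, each = n_times)

# Unit means of x rise with the unit index ...
x <- 2 * id + rnorm(n_units * n_times)
# ... and unit baselines of y rise even faster, while the within-unit slope is -1
y <- 5 * id - 1 * x + rnorm(n_units * n_times, sd = 0.5)

pooled   <- lm(y ~ x)               # ignores the grouping
by_unit  <- lm(y ~ x + factor(id))  # separate intercept per unit

coef(pooled)["x"]   # positive: driven by differences between units
coef(by_unit)["x"]  # negative: the relationship within units
```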

---
### Another example

---
## Varying intercepts and slopes

*Intercepts may vary*
- e.g. different units have different starting points for the (same) slopes

If we estimate this model as OLS, we can get biased coefficients.

*Slopes may vary*
- e.g. different units respond to covariates differently and so the effect of X on Y differs

If we estimate this model as OLS, we'll only get an 'average' of the different slopes,
and if there are, say, two groups that have radically different responses to a covariate,
then these can cancel out and we could get a Type II error.

*Both intercepts and slopes may vary*
- e.g. different units start in different places *and* respond to covariates differently
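
The three cases can be written as `lm()` formulas. The sketch below uses simulated data with made-up unit intercepts and slopes (the names `d`, `alpha`, `beta`, and the model objects are illustrative assumptions):

```r
set.seed(2)
d <- data.frame(id = factor(rep(c("A", "B", "C"), each = 30)),
                x  = rnorm(90))

# Made-up unit-specific intercepts and slopes
alpha <- c(A = 1, B = 3, C = 5)
beta  <- c(A = 2, B = 0, C = -2)
d$y <- unname(alpha[as.character(d$id)] +
              beta[as.character(d$id)] * d$x) + rnorm(90, sd = 0.3)

m_pooled <- lm(y ~ x, data = d)                # one intercept, one slope
m_int    <- lm(y ~ 0 + id + x, data = d)       # intercept per unit, common slope
m_both   <- lm(y ~ 0 + id + id:x, data = d)    # intercept and slope per unit

coef(m_pooled)  # the single slope averages over opposite-signed unit slopes
coef(m_both)    # idA:x, idB:x, idC:x recover the separate slopes
```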

???

Though varying intercepts and slopes are most commonly specified by unit,
we could also have variation in `$\alpha$` or `$\beta$` over time.

---
class: center, middle

# Fixed Effects

.pull-1[.circleoff[![](https://www.club.cc.cmu.edu/~cmccabe/image/crazy_clock.jpg)]]
.pull-1[.circleon[![](https://www.fabrikat.ch/media/catalog/product/cache/4/thumbnail/447x/9df78eab33525d08d6e5fb8d27136e95/r/o/rollins_leatherrip_hammer_9_1.jpg)]]
.pull-1[.circleoff[![](https://www.random.org/analysis/randbitmap-wamp.png)]]

---
## Modeling this

Panel data models differ depending on where we think errors might correlate.

We'll concentrate on .red[fixed effects] and .red[random effects] today, 
since they are the building blocks for understanding the rest.

When random effects are used together with fixed effects, we call this a .red[mixed effects model].

There are other names and similar models out there too that you may encounter,
such as .red[hierarchical models] (a multilevel model with a single nested hierarchy) 
and .red[multilevel models] (a hierarchical model with multiple non-nested hierarchies).

---
## The Error Term

Now we could have different `$\alpha$`s and `$\beta$`s for each unit, each time point, 
or every combination of unit and time point:

`$$Y_{it} = \alpha_{it} + \beta_{it} X_{it} + u_{it}$$`

But we've been assuming throughout that `$u_{it}$` is homoskedastic and uncorrelated, 
both within and across `$i$` and `$t$`, i.e.:
- no cross-unit heteroskedasticity
- no temporal heteroskedasticity
- no autocorrelation

That's a pretty tall order.

Remember, the error term is supposed to be a *stochastic* element to the model and should not incorporate any *systematic* differences
(those should be in the model). But:
- Cross-unit differences mean that the model does a better job of explaining some units than others.
- Time effects (such as socialization, institutionalization, learning, or other such dynamics) cause the model to do a better or worse job of explaining Y over time.
- Omitted variables lead to correlation in the residuals, either across units (because time matters) or (more commonly) over time (because units matter).

---
### Two types of errors

We can actually have two different error terms that capture different types of unobserved heterogeneity.
1. `$u_i$` the unit-specific error to account for between unit variation
  - unobserved predictors of `$Y$` that are specific to the unit and therefore time-constant.
2. `$e_{it}$` the time-varying error to account for within unit variation
  - unobserved predictors of `$Y$` that are specific to the time point and the unit.
  
--

Now, this wouldn't be a problem (OLS would not be biased) if this unobserved heterogeneity were independent of the explanatory variables in the model, but:
- there is most likely still serial correlation
- residuals may still correlate on the basis of unobserved unit-specific heterogeneity, even if uncorrelated with variables in the model.

???

Key point: measurements over time are almost never independent, so the OLS assumptions are violated; i.e. don't use OLS...

---
## Introduction to Fixed Effects

Fixed effects (FE) explore the relationship between predictor and outcome variables within an entity (country, person, company, etc.).

Each entity has its own individual characteristics that may or may not influence the predictor variables.
- For example, being male or female could influence the opinion towards a certain issue;
- The political system of a particular country could have some effect on trade or GDP;
- Or the business practices of a company may influence its stock price.

This can be a source of unit-level unobserved heterogeneity, `$u_i$`.
- Easy to fix if we have information about these characteristics: simply add them as further independent variables in our regression model.
- But what about those factors that are hard to measure or those which we have not yet considered?

Note the separate intercept term for each unit,
i.e., some clusters tend to have higher values of Y than others.
This is known as the .red[variable intercept model].
You can think of it as a model of individual-level heterogeneity (which matters if we have omitted variable bias).

???

This unobserved heterogeneity consists of the unobserved factors influencing the DV that vary across units and time.

We do not want to multiply the number of variables making the model too complex, when this could be dealt with all at once.

---

.pull-left[
## The LSDV Method

Treating the unit effects `$\alpha_i$` as fixed values is, in many respects, the simplest thing we can do.

Consider first a general model in which we replace the general intercept
with individual (unit-) level effects,
i.e. that some units/clusters have higher or lower levels of the outcome variable than others:

`$$Y_{it} = \alpha_i + \beta X_{it} + u_{it}$$`

This is also called the .red[least-squares dummy variables] (LSDV) method.
That is, we simply estimate the equation from earlier by including separate dummy variables for each unit in the model along with the covariates.

This dummy works like a sponge – it “soaks up” all potential error that is due to unobserved country-specific characteristics.

]
.pull-right[
.small[

```r
*lsdv <- lm(sstran ~ 0 + factor(id) + csh_m + leftvot, data1)
tab_model(lsdv, rm.terms = "factor(id) [BEL,CAN,FIN,FRG,IRE,ITA,LUX,JPN,NOR,SPA,SWE,UKM,FRA,GRE,POR]")
```

]
]

???

Basically, we create a dummy variable for each country: for the Belgium dummy, all the observations pertaining to Belgium get a 1 and all the observations from other countries get a 0.

The 0 at the front of the formula (it could also be a -1 at the end) removes the common intercept (as shown in the table) and thereby creates one time-invariant intercept for each unit through the dummies.
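
To see the dummies that `lm()` builds behind the scenes, one can inspect the design matrix. This is a sketch on a made-up six-row data frame (the country codes are borrowed from the slide, but `d`, `x`, and the values are hypothetical):

```r
# Made-up data: three countries, two observations each
d <- data.frame(id = factor(c("AUL", "AUL", "BEL", "BEL", "CAN", "CAN")),
                x  = c(1, 2, 3, 4, 5, 6))

# With 0 + ...: one dummy column per country and no common intercept
X0 <- model.matrix(~ 0 + id + x, data = d)
colnames(X0)

# Without the 0: a common intercept, and the first country becomes the baseline
X1 <- model.matrix(~ id + x, data = d)
colnames(X1)

# Each observation belongs to exactly one country
rowSums(X0[, c("idAUL", "idBEL", "idCAN")])
```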

---
### Comparing Pooled OLS and LSDV

.pull-left-1[

A pooled OLS model ignores the panel structure of the data. 
- but observations belonging to the same country are not independent!
- two variables probably do not capture all country-specific heterogeneity.

An LSDV model includes `$u_i$` (error to account for the specificities of each country) to adjust for that.
- Once we add the country-dummies, the effect of imports is smaller and not statistically significant.

]

.pull-right-2[
.tiny[
<table style="border-collapse:collapse; border:none;">
<tr>
<th style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; text-align:left; ">&nbsp;</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">OLS</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">LSDV</th>
</tr>
<tr>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; text-align:left; ">Predictors</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">p</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; col7">p</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">(Intercept)</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.016</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">9.288&nbsp;&ndash;&nbsp;10.743</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7"></td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">Imports</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-3.586</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-4.529&nbsp;&ndash;&nbsp;-2.643</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.614</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-2.020&nbsp;&ndash;&nbsp;0.792</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">0.392</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">Votes for leftist parties</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">0.066</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">0.049&nbsp;&ndash;&nbsp;0.084</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.064</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.091&nbsp;&ndash;&nbsp;-0.037</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)AUL</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.377</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">8.975&nbsp;&ndash;&nbsp;11.778</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)AUS</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">20.890</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">19.334&nbsp;&ndash;&nbsp;22.445</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)DEN</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">19.192</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">17.691&nbsp;&ndash;&nbsp;20.693</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)NET</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">16.281</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">15.074&nbsp;&ndash;&nbsp;17.487</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)NZL</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">14.085</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">12.716&nbsp;&ndash;&nbsp;15.454</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)SWZ</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">12.739</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">11.642&nbsp;&ndash;&nbsp;13.836</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)USA</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">11.283</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.627&nbsp;&ndash;&nbsp;11.940</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm; border-top:1px solid;">Observations</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">R2 / R2 adjusted</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.154 / 0.151</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.983 / 0.982</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">AIC</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3789.171</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3069.280</td>
</tr>

</table>

]
]

???

Pooled OLS estimation is simply an OLS technique run on panel data.

Comparing OLS and LSDV, we see that the latter has no common intercept, that imports are no longer statistically significant, and that the coefficient on votes for leftist parties changes sign.

---

.pull-left[

```r
plot_model(ols, type = "pred", 
           terms = c("leftvot"))
```

]

.pull-right[

```r
plot_model(lsdv, type = "pred", 
           terms = c("leftvot"))
```

<img src="STAT_L8_Mixed_files/figure-html/lsdvPlot1-1.png" width="504" />
]

```r
plot_model(lsdv, type = "pred", terms = c("leftvot","id"))
```

]
]

???

To come back to Simpson's paradox: the OLS was telling us that votes for leftist parties were positively (and significantly) correlated with social expenditure, but when we remove the common intercept and control for unit-specific characteristics, the effect is actually negative. We might want to rethink our hypothesis/variables.

---
## Within estimator

.pull-left[

For large datasets, an alternative is to simply leave the intercept in, so that the first country (Australia) becomes the baseline.

This is called the .red[within estimator].

Each country coefficient then measures how the observations in that country deviate, on average, from the baseline country.

We can write this model as:

`$$Y_{it} = \alpha_i + \beta_B \bar{X}_i + \beta_W (X_{it} - \bar{X}_i) + u_{it}$$`

Why are the `$R^2$` statistics different?
- Since LSDV uses the original data, `$R^2$` measures the explained proportion of the overall variance.
- Since the FE model uses time-demeaned data, `$R^2$` measures the explained proportion of the within variance.
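
The demeaning logic can be checked by hand. The sketch below uses simulated data (the unit effects, coefficients, and object names such as `lsdv_toy` and `within_toy` are assumptions for illustration) and compares the LSDV slope with the slope from a regression on unit-demeaned variables:

```r
set.seed(3)
d <- data.frame(id = rep(1:4, each = 10))
d$u <- c(10, 20, 15, 5)[d$id]           # made-up unit effects
d$x <- 0.1 * d$u + rnorm(40)            # x is correlated with the unit effect
d$y <- d$u - 0.5 * d$x + rnorm(40)

# LSDV: one dummy per unit, no common intercept
lsdv_toy <- lm(y ~ 0 + factor(id) + x, data = d)

# Within transformation: subtract each unit's mean from x and y
d$x_dm <- d$x - ave(d$x, d$id)
d$y_dm <- d$y - ave(d$y, d$id)
within_toy <- lm(y_dm ~ 0 + x_dm, data = d)

# The slope estimates coincide
all.equal(unname(coef(lsdv_toy)["x"]), unname(coef(within_toy)["x_dm"]))

# The R-squared values differ: overall versus within variance
summary(lsdv_toy)$r.squared
summary(within_toy)$r.squared
```

Note that the hand-demeaned `lm()` does not adjust the degrees of freedom for the estimated unit means, so its standard errors should not be used as they are.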

]

.pull-right[
.tiny[
<table style="border-collapse:collapse; border:none;">
<tr>
<th style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; text-align:left; ">&nbsp;</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">LSDV</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">Within</th>
</tr>
<tr>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; text-align:left; ">Predictors</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">p</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; col7">p</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)AUL</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.377</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">8.975&nbsp;&ndash;&nbsp;11.778</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7"></td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)AUS</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">20.890</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">19.334&nbsp;&ndash;&nbsp;22.445</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.513</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">9.521&nbsp;&ndash;&nbsp;11.505</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)DEN</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">19.192</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">17.691&nbsp;&ndash;&nbsp;20.693</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">8.815</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">7.837&nbsp;&ndash;&nbsp;9.793</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)NET</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">16.281</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">15.074&nbsp;&ndash;&nbsp;17.487</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">5.904</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">4.973&nbsp;&ndash;&nbsp;6.835</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)NZL</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">14.085</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">12.716&nbsp;&ndash;&nbsp;15.454</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">3.709</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">2.812&nbsp;&ndash;&nbsp;4.605</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)SWZ</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">12.739</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">11.642&nbsp;&ndash;&nbsp;13.836</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">2.362</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">1.303&nbsp;&ndash;&nbsp;3.421</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)USA</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">11.283</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.627&nbsp;&ndash;&nbsp;11.940</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">0.907</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.603&nbsp;&ndash;&nbsp;2.417</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">0.239</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">Imports</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.614</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-2.020&nbsp;&ndash;&nbsp;0.792</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">0.392</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.614</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-2.020&nbsp;&ndash;&nbsp;0.792</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">0.392</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">Votes for leftist parties</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.064</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.091&nbsp;&ndash;&nbsp;-0.037</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.064</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.091&nbsp;&ndash;&nbsp;-0.037</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">(Intercept)</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.377</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">8.975&nbsp;&ndash;&nbsp;11.778</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm; border-top:1px solid;">Observations</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">R2 / R2 adjusted</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.983 / 0.982</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.699 / 0.689</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">AIC</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3069.280</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3069.280</td>
</tr>

</table>

]
]

???

Now, the regression output actually does report a constant here, which is simply the baseline category, Australia.

What we see is that the adjusted `$R^2$` is much higher for the LSDV, because the LSDV `$R^2$` refers to the overall variance, while the `$R^2$` of our within-estimation model only refers to the variance within units/countries.

Another difference is the estimates for the dummy variables, but if we look closely, each dummy estimate in our LSDV model is just the Within estimate plus the intercept (e.g. for AUS, 10.513 + 10.377 = 20.890). Our estimates for the other variables are the same.

LSDV (regress with group dummies) and the fixed effects/Within estimator (regress with demeaned variables) are exactly the same. The Within estimator is just a computational trick for estimating the fixed effect.
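
The equivalence claimed in these notes can be checked on a small simulated panel. The following is only a sketch in base R: the data, the object names (`panel`, `lsdv_demo`, `base_demo`, `within_demo`) and the true slope of -0.5 are all made up for illustration and are not the lecture data.

```r
# Simulated balanced panel: 5 units observed over 20 periods (hypothetical data)
set.seed(1)
panel <- data.frame(id = rep(letters[1:5], each = 20))
unit_effect <- rep(c(-2, -1, 0, 1, 2), each = 20)
panel$x <- unit_effect + rnorm(100)                      # x correlates with the unit effect
panel$y <- 3 * unit_effect - 0.5 * panel$x + rnorm(100)

# LSDV without an intercept: one dummy per unit
lsdv_demo <- lm(y ~ 0 + factor(id) + x, data = panel)

# Dummies with an intercept: the first unit becomes the baseline
base_demo <- lm(y ~ factor(id) + x, data = panel)

# Within transformation: subtract each unit's own mean from y and x
panel$y_w <- panel$y - ave(panel$y, panel$id)
panel$x_w <- panel$x - ave(panel$x, panel$id)
within_demo <- lm(y_w ~ 0 + x_w, data = panel)

# The slope on x should be identical across the three fits
slopes <- c(lsdv     = unname(coef(lsdv_demo)["x"]),
            baseline = unname(coef(base_demo)["x"]),
            within   = unname(coef(within_demo)["x_w"]))
print(slopes)

# Each LSDV dummy should equal the baseline intercept plus that unit's dummy
lsdv_b <- unname(coef(lsdv_demo)["factor(id)b"])
base_b <- unname(coef(base_demo)["(Intercept)"] + coef(base_demo)["factor(id)b"])
print(c(lsdv_b, base_b))
```

The three slopes agree because demeaning and including unit dummies remove exactly the same between-unit variation; only the bookkeeping for the unit intercepts differs.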

---

```r
plot_model(lsdv, type = "pred", terms = c("leftvot","id"))
```

```r
plot_model(fe, type = "pred", terms = c("leftvot","id"))
```

???

They are the same.

---
## Advantages and disadvantages

With FE, we assume some unit characteristics may impact or bias IVs or DVs and we need to control for this.
FE removes the effect of time-invariant unit characteristics to assess predictors' net effect on the outcome.
They are therefore used to study the causes of changes within each unit (and are thus broadly used in e.g. economics).

**Advantages**
1. FE estimates remain .blue[BLUE] even if unit effects correlate with a predictor (provided the remaining time-varying error is well behaved)
1. They’re generally widely used, well established, and (almost always) non-controversial.
1. Use fixed-effects (FE) whenever you are only interested in analyzing the impact of variables that vary over time

**Disadvantages**
1. FE models are inefficient: they use up many degrees of freedom, which inflates standard errors
1. FE will not work well with data for which within-cluster variation is minimal or for investigating variables that do not change over time or only change slowly (e.g. race or age) because they will be highly collinear with the fixed effects.
1. No out of sample predictions because you are linking your estimates to these particular units
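
???

The collinearity problem in the second disadvantage can be seen directly in a toy example. This is a minimal sketch on simulated data, assuming a hypothetical time-invariant variable `z` that is constant within each unit; none of the names or values come from the lecture data.

```r
# Simulated panel: 4 units, 10 periods each (hypothetical data)
set.seed(3)
panel <- data.frame(id = rep(c("A", "B", "C", "D"), each = 10))
panel$x <- rnorm(40)                         # varies over time within units
panel$z <- rep(c(1, 0, 1, 0), each = 10)     # time-invariant: constant within each unit
panel$y <- 2 * panel$x + 1.5 * panel$z + rnorm(40)

# Unit dummies already span every time-invariant variable,
# so lm() cannot separately identify the coefficient on z
fe_demo <- lm(y ~ factor(id) + x + z, data = panel)
print(coef(fe_demo))

# TRUE when the coefficient on z could not be estimated
z_dropped <- is.na(coef(fe_demo)[["z"]])
print(z_dropped)
```

The coefficient on `x` is still estimated, because `x` varies within units; only the time-invariant `z` is absorbed by the dummies.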

---
class: center, middle

# Random Effects

.pull-1[.circleoff[![](https://www.club.cc.cmu.edu/~cmccabe/image/crazy_clock.jpg)]]
.pull-1[.circleoff[![](https://www.fabrikat.ch/media/catalog/product/cache/4/thumbnail/447x/9df78eab33525d08d6e5fb8d27136e95/r/o/rollins_leatherrip_hammer_9_1.jpg)]]
.pull-1[.circleon[![](https://www.random.org/analysis/randbitmap-wamp.png)]]

---
## Introduction to Random Effects

An alternative to treating units with fixed effects is to treat them with .red[random effects].
- fixed effects are essentially your predictor variables (what you are interested in after accounting for random variability)
- random effects are best defined as noise in your data (arising from uncontrollable variability within the sample)

With random effects, 
across-unit variation is assumed to be random and uncorrelated with the predictor or independent variables included in the model.
That is, heterogeneity in our unit-specific error `$u_i$` is a sort of random disturbance.

In the RE model, the `$u_i$`s are now seen as one component of the stochastic part of the model (as the realization of a random variable).
Unlike in the fixed effects model, 
we want to estimate the error terms `$u_i$` and `$e_{it}$`, which together constitute `$\epsilon_{it}$`:

`$$Y_{it} - \lambda \bar{Y}_{i} = \beta (X_{it} - \lambda \bar{X}_{i}) + \dots + (\epsilon_{it} - \lambda \bar{\epsilon}_{i})$$`

???

A “fixed variable” is one that is assumed to be measured without error. It is also assumed that the values of a fixed variable in one study are the same as the values of the fixed variable in another study. “Random variables” are assumed to be values that are drawn from a larger population of values and thus will represent them. You can think of the values of random variables as representing a random sample of all possible values or instances of that variable. Thus, we expect to generalize the results obtained with a random variable to all other possible instances of that value (e.g., a job candidate with a strong résumé). Most of the time in ANOVA and regression analysis we assume the independent variables are fixed.

The terms “random” and “fixed” are used in the context of ANOVA and regression models and refer to a certain type of statistical model. Almost always, researchers use fixed effects regression or ANOVA and they are rarely faced with a situation involving random effects analyses. A fixed-effects ANOVA refers to assumptions about the independent variable and the error distribution for the variable. An experimental design is the easiest example for illustrating the principle. Usually, the researcher is interested in only generalizing the results to experimental values used in the study. For instance, a drug study might use 0 mg, 5 mg, and 10 mg of an experimental drug. This is a circumstance when a fixed effects ANOVA would be appropriate. In this example, the extrapolation is to other studies or treatments that might use the same values of the drug (i.e., 0 mg, 5 mg, and 10 mg).

However, if the researcher wants to make inferences beyond the particular values of the independent variable used in the study, a random effects model is used. A common example would be the use of public art works representing low, moderate, and high abstractness (e.g., statue of a war hero vs. a pivoting geometric design). The researcher would like to make inferences beyond just one art piece representing each category of abstractness, so the art pieces are conceptualized as pieces randomly drawn from a larger universe of possible pieces that are sampled from the domain for that level of abstractness. For example, one could imagine using several instances of high abstract pieces that are randomly drawn from a larger population of high abstract pieces and are thought to be only a few of the possible particular instances of high abstract art. Thus, the inferences are made to a larger universe of art works with variations of abstractness within each category. 
Such a generalization is more of an inferential leap, and, consequently, the random effects model is less powerful because we are taking into account some additional expected random variation on the independent variable. Random effects models are sometimes referred to as “Model II” or “variance component models.” Analyses using both fixed and random effects are called “mixed models” or "mixed effects models" which is one of the terms given to multilevel models.

So in the example for today, one could think that the political system and unionization covary with our composite error term, which includes both the specificities of the countries themselves and the action of time.

The random effects will create more bias but less variance, which could lead to the estimate being closer to the real parameter value.

A way to think about RE is to see them as in-between FE and pooled OLS. The lambda partly demeans the data, but not completely: we subtract only a fraction of each unit's time mean from every observation.
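
The partial demeaning described above can be sketched in a few lines of base R. This is an illustration on simulated data, not the estimator as implemented in any package: the helper names `quasi_demean` and `slope_for` and the value 0.6 for lambda are assumptions chosen purely for the example.

```r
# Simulated panel: 6 units, 15 periods each (hypothetical data)
set.seed(2)
panel <- data.frame(id = rep(1:6, each = 15))
unit_effect <- rep(rnorm(6, sd = 2), each = 15)
panel$x <- 0.5 * unit_effect + rnorm(90)
panel$y <- unit_effect + 1.5 * panel$x + rnorm(90)

# Subtract a fraction lambda of each unit's mean from a variable
quasi_demean <- function(v, g, lambda) v - lambda * ave(v, g)

# Slope from OLS on the quasi-demeaned data for a given lambda
slope_for <- function(lambda) {
  y_q <- quasi_demean(panel$y, panel$id, lambda)
  x_q <- quasi_demean(panel$x, panel$id, lambda)
  unname(coef(lm(y_q ~ x_q))["x_q"])
}

# lambda = 0 leaves the data untouched, lambda = 1 demeans it fully
slopes <- sapply(c(pooled = 0, partial = 0.6, within = 1), slope_for)
print(slopes)
```

With lambda at 0 the slope is the pooled OLS slope, and with lambda at 1 it is the fixed effects slope; any value in between is the kind of compromise that RE makes.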

---
### Understanding fixed and random effects models

#### Fixed effects models

.small[In fixed-effects models (e.g., regression, ANOVA, generalized linear models), there is only one source of random variability: the random sample we take to measure our variables.
- e.g. patients in a health facility, with measures of their medical history to estimate their probability of recovery
- e.g. individual students in a school system, with demographic information to predict their grade point averages

We call the variability across individuals “residual” variance (in linear models, this is the estimate of `$\sigma^2$`, also called the mean squared error). It’s the variability that was unexplained by the predictors in the model (the fixed effects).]

#### Mixed effects models (with random effects)

.small[Mixed effects models—whether linear or generalized linear—are different in that there is more than one source of random variability in the data.
- e.g. in addition to patients, there may also be random variability across the doctors of those patients
- e.g. in addition to students, there may be random variability from the teachers of those students.

Some doctors’ patients may have a greater probability of recovery, and others may have a lower probability, even after we have accounted for the doctors’ experience and other measurable traits. Some teachers’ students will have higher GPAs than other teachers’ students, even after we account for teaching methods.]

???

#### Random Effects: Intercepts and Slopes

We account for these differences through the incorporation of random effects. Random intercepts allow the outcome to be higher or lower for each doctor or teacher; random slopes allow fixed effects to vary for each doctor or teacher.

What do these random effects mean? How do we interpret them? We usually talk about them in terms of their variability, instead of focusing on them individually.

Using the patient/doctor data as an example, this allows us to make “broad level” inferences about the larger population of patients, which do not depend on a particular doctor. In other words, we can now incorporate (instead of ignore) doctor-to-doctor variability in patient recovery, and improve our ability to describe how fixed effects relate to outcomes.

#### Variance of Random Effects

We can also talk directly about the variability of random effects, similar to how we talk about residual variance in linear models.

There is no general measure of whether variability is large or small, but subject-matter experts can consider standard deviations of random effects relative to the outcomes.

For example, if teacher-averaged GPAs only vary from the overall average with an SD of 0.02 GPA points, the teachers may be considered rather uniform; however, if teacher-averaged GPAs varied from the overall average with an SD of 0.5 GPA points, it would seem as if individual teachers could make a large difference in their students’ success.

#### Individual random effects

Finally, we can talk about individual random effects, although we usually don’t.  This was not the original purpose of mixed effects models, although it has turned out to be useful in certain applications. Software programs do provide access to the random effects (best linear unbiased predictors, or BLUPs) associated with each of the random subjects.

BLUPs are the differences between the intercept for each random subject and the overall intercept (or slope for each random subject and the overall slope). In some software, such as SAS, these are accompanied by standard errors, t-tests, and p-values.

In the case of the patient/doctor data set (assuming no random slopes for easier interpretation), a small p-value for an individual doctor’s random intercept would indicate that the doctor’s typical patient recovery probability is significantly different from an average doctor’s typical patient recovery probability.

These standard errors and p-values are adjusted so that they account for all of the fixed effects in the model as well as the random variability among patients. Clearly, this information could be of interest to the doctor’s place of work, or to a patient who is choosing a doctor.

---
## RE assumptions

RE assumes that:
- `$u_i$` and `$e_{it}$` are uncorrelated with the explanatory variables, 
- have constant variances `$\sigma^2_{u}$` and `$\sigma^2_{e}$`
- and are independent of each other and across units.

Given this we arrive at `$\epsilon_{it} = u_i + e_{it}$`.

Because the overall error term `$\epsilon_{it}$` is split into two components, `$u_i$` (unit-specific error) and `$e_{it}$` (time-varying error),
this model is also called the error components or .red[variance component model].

Then we can compute the quasi-demeaning weight `$\lambda$`, which grows with the serial correlation induced by the unit effects:

`$$\lambda = 1-\sqrt{\frac{\sigma^2_{e}}{\sigma^2_{e} + T\sigma^2_{u}}}$$`

???

If the lambda of this equation is 0 then we should consider pooled OLS rather than RE, and if it is close to 1 then we should go with FE.

Everything is concentrated in this `$T\sigma^2_{u}$` 
- if it is very large relative to `$\sigma^2_{e}$`, lambda is close to 1 and we need the fixed effect to "soak up" all those specificities linked to the countries. 
- if it is 0, then `$\sigma^2_{u}$` is 0: there is no unit-specific variation left in the error, lambda is 0, and we can use pooled OLS.

If lambda is in-between 0 and 1 (usually the case), we should consider using RE. With RE, we subtract a part of the unit mean from each variable but not all of it (a quasi time-demeaned system).
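
A few worked values can make the limiting cases concrete. The sketch below assumes the usual textbook expression `$\lambda = 1-\sqrt{\sigma^2_{e} / (\sigma^2_{e} + T\sigma^2_{u})}$`, with `$\sigma^2_{u}$` the variance of the unit effect and `$\sigma^2_{e}$` the variance of the time-varying error. The function name `re_lambda` and the variance values are hypothetical.

```r
# sigma2_u: variance of the unit-specific error u_i
# sigma2_e: variance of the time-varying error e_it
# n_periods: number of time periods T observed per unit
re_lambda <- function(sigma2_u, sigma2_e, n_periods) {
  1 - sqrt(sigma2_e / (sigma2_e + n_periods * sigma2_u))
}

re_lambda(0, 1, 10)     # no unit-level variance: 0, i.e. pooled OLS
re_lambda(1, 1, 3)      # equal variances over 3 periods: 1 - sqrt(1/4) = 0.5
re_lambda(100, 1, 30)   # unit effects dominate: close to 1, i.e. close to FE
```

Note that a longer panel (larger `n_periods`) pushes lambda towards 1 even when the two variances stay the same.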

---

.pull-left-2[
.tiny[

```r
# Random intercepts by country
rei <- lme4::lmer(sstran ~ csh_m + leftvot + (1|id), data1)
# Random intercepts plus random slopes for leftvot by country
res <- lme4::lmer(sstran ~ csh_m + leftvot + (1 + leftvot|id), data1)
# Random intercepts plus random slopes for leftvot and year by country
rep <- lme4::lmer(sstran ~ csh_m + leftvot + (1 + leftvot + year|id), data1)
tab_model(rei, res, rep, dv.labels = c("1","2","3"), rm.terms = "id [BEL,CAN,FIN,ITA,LUX,JPN,NOR,NET,SPA,SWE,SWZ,NZL,UKM,FRA,GRE,POR]", digits = 3, show.aic = TRUE)
```

]
]

.pull-right-1[

`$N$` is our number of units (22 countries)

`$\tau_{00}$` is the variance of the random intercepts across our units

`$\tau_{11}$` is the variance of the random slopes across our units, by variable

`$\sigma^2$` is the residual variance

`$\varrho_{01}$` is the correlation between the random intercepts and the random slopes

`$ICC$` is the *Intraclass Correlation Coefficient*, the ratio of the between-cluster variance to the total variance

`$AIC$` is as we've already discussed

Marginal `$R^2$` only considers the variance of the fixed effects

Conditional `$R^2$` takes both fixed and random effects into account

]

???

We are going to use the `lme4` package here to fit linear models with random effects.
Please note though that there are a host of other packages out there,
some of which are more flexible or specific.
`lme4` is a pretty standard package though, so that's what we'll be using here.

Note that in the formulas we have some new additions to our typical syntax.
In the first model, we're adding a random intercept effect for countries.
The 1 before the bar/pipe/mid indicates that we want an intercept for our random effect.
In the second model, we're adding random slopes for the effect of leftist parties by each country.
And in the third model, we're adding time as an extra random slope.
To suppress the random intercept, you would do (year - 1|...).

If `$\varrho_{01}$` is positive, it suggests that those groups with larger intercepts will have steeper slopes,
whereas if `$\varrho_{01}$` is negative, it suggests those groups with larger intercepts will have flatter slopes.

The ICC can help determine whether a linear mixed model is even necessary.
ICC is a measure of how much of the variation in the response variable, which is not attributed to fixed effects, is accounted for by a random effect. It is the ratio of the variance of the random effect to the total random variation.
If the correlation were 0, then observations within clusters are no more similar than observations from different clusters,
and we may as well use a pooled OLS.
However here it is high (>.75), which suggests there is good reason to be using this.
There is consistency in grouped observations.

In our third model, our fixed effects are explaining most of the variation, while in models 1 and 2, the addition of random effects is greatly improving our explanatory power. 
This is logical, since in model 3, countries are considered through both random and fixed effects.

Note that we are treating year as a continuous variable here, 
because to treat it as a factor would be to use up too many degrees of freedom.
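
The ICC discussed in these notes can also be computed by hand from the variance components of a random intercept model. The following is a sketch on simulated data, assuming `lme4` is installed; the object names (`sim`, `ri_demo`) and the simulated standard deviations (2 for units, 1 for residuals) are made up for illustration and are unrelated to the lecture data.

```r
library(lme4)

# Simulated panel: 30 units with 20 observations each (hypothetical data)
set.seed(4)
n_units <- 30
n_obs <- 20
sim <- data.frame(id = factor(rep(seq_len(n_units), each = n_obs)))
sim$x <- rnorm(n_units * n_obs)
sim$y <- 1 + 0.5 * sim$x +
  rep(rnorm(n_units, sd = 2), each = n_obs) +   # unit-level random intercepts
  rnorm(n_units * n_obs, sd = 1)                # residual noise

# Random intercept model, same syntax as model 1 on the slide
ri_demo <- lmer(y ~ x + (1 | id), data = sim)

# Extract the variance components as a data frame
vc <- as.data.frame(VarCorr(ri_demo))
tau00  <- vc$vcov[vc$grp == "id"]         # variance of the random intercepts
sigma2 <- vc$vcov[vc$grp == "Residual"]   # residual variance

# ICC: share of the random variation that lies between units
icc <- tau00 / (tau00 + sigma2)
print(icc)
```

Since the data were simulated with a unit-level variance of 4 and a residual variance of 1, the estimate should land somewhere near 4 / (4 + 1) = 0.8, though it will not match exactly in a finite sample.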

---

.pull-left[

```r
plot_model(rei, type = "pred", terms = c("leftvot","id"), pred.type="re")
```

]

.pull-right[

```r
plot_model(res, type = "pred", terms = c("leftvot","id"), pred.type="re")
```

<img src="STAT_L8_Mixed_files/figure-html/plotRES-1.png" width="504" />
]

```r
plot_model(rep, type = "pred", terms = c("leftvot","id","year"), pred.type="re")
```

???

For more on interpreting random effects in linear mixed-effect models see [here](https://lionel68.github.io/r%20and%20stat/interpreting-random-effects-in-linear-mixed-effect-models/).

---
### Why might we want to do this?

**Advantages**
1. REs provide efficiency gains (smaller error variance) as fewer parameters need to be estimated.
1. REs allow time-invariant variables to play a role as explanatory variables (e.g. gender)
  - in a FE model, these variables would be absorbed by the dummy variables
1. REs allow us to port our results more readily and make generalizations also to (some) other contexts
  - Under the RE model `$u_i$` is assumed to be a random draw 
  from the universe of all possible values of a random variable having a certain distribution 
  (e.g., the normal distribution).
  - Under the FE model `$u_i$` is assumed to be a parameter 
  that is to be estimated from the data of the sampled unit 
  (and hence, may be different in another sample).

**Disadvantages**

1. REs require that all unmeasured factors that go into `$u_i$` are uncorrelated with all of the `$X$`s that are in the model.
  - Outside of experiments, some variables may be unavailable or unknown, leading to omitted variable bias (OVB)
1. You need to specify those individual characteristics that may or may not influence the predictor variables. 
There should be no omitted variable.

---
## Comparing Pooled OLS, LSDV, FE and RE Models

.tiny[

<table style="border-collapse:collapse; border:none;">
<tr>
<th style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; text-align:left; ">&nbsp;</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">Pooled OLS</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">LSDV</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">Within</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">1</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">2</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">3</th>
</tr>
<tr>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; text-align:left; ">Predictors</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">p</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; col7">p</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; col8">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; col9">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 0">p</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 1">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 2">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 3">p</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 4">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 5">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 6">p</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 7">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 8">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; 9">p</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">(Intercept)</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.016</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">9.288&nbsp;&ndash;&nbsp;10.743</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col8">10.377</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col9">8.975&nbsp;&ndash;&nbsp;11.778</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 0">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 1">15.531</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 2">13.795&nbsp;&ndash;&nbsp;17.267</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 3">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 4">15.413</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 5">12.864&nbsp;&ndash;&nbsp;17.962</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 6">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 7">15.501</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 8">13.682&nbsp;&ndash;&nbsp;17.320</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 9">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">Imports</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-3.586</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-4.529&nbsp;&ndash;&nbsp;-2.643</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.614</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-2.020&nbsp;&ndash;&nbsp;0.792</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">0.392</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col8">-0.614</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col9">-2.020&nbsp;&ndash;&nbsp;0.792</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 0">0.392</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 1">-0.876</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 2">-2.238&nbsp;&ndash;&nbsp;0.485</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 3">0.207</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 4">-1.101</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 5">-2.511&nbsp;&ndash;&nbsp;0.310</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 6">0.126</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 7">-0.811</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 8">-2.193&nbsp;&ndash;&nbsp;0.571</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 9">0.250</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">Votes for leftist parties</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">0.066</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">0.049&nbsp;&ndash;&nbsp;0.084</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.064</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">-0.091&nbsp;&ndash;&nbsp;-0.037</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col8">-0.064</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col9">-0.091&nbsp;&ndash;&nbsp;-0.037</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 0">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 1">-0.054</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 2">-0.080&nbsp;&ndash;&nbsp;-0.028</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 3">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 4">-0.057</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 5">-0.115&nbsp;&ndash;&nbsp;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 6">0.054</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 7">-0.053</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 8">-0.083&nbsp;&ndash;&nbsp;-0.023</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 9">0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)AUL</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.377</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">8.975&nbsp;&ndash;&nbsp;11.778</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col8"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col9"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 0"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 1"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 2"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 3"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 4"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 5"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 6"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 7"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 8"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 9"></td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)AUS</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">20.890</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">19.334&nbsp;&ndash;&nbsp;22.445</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col8">10.513</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col9">9.521&nbsp;&ndash;&nbsp;11.505</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 0">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 1"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 2"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 3"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 4"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 5"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 6"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 7"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 8"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 9"></td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)DEN</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">19.192</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">17.691&nbsp;&ndash;&nbsp;20.693</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col8">8.815</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col9">7.837&nbsp;&ndash;&nbsp;9.793</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 0">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 1"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 2"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 3"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 4"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 5"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 6"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 7"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 8"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 9"></td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)FRG</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">19.588</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">18.128&nbsp;&ndash;&nbsp;21.047</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col8">9.211</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col9">8.313&nbsp;&ndash;&nbsp;10.109</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 0">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 1"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 2"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 3"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 4"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 5"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 6"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 7"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 8"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 9"></td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)IRE</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">13.209</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">12.296&nbsp;&ndash;&nbsp;14.122</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col8">2.832</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col9">1.692&nbsp;&ndash;&nbsp;3.973</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 0">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 1"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 2"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 3"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 4"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 5"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 6"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 7"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 8"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 9"></td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">factor(id)USA</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; "></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">11.283</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.627&nbsp;&ndash;&nbsp;11.940</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col8">0.907</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col9">-0.603&nbsp;&ndash;&nbsp;2.417</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 0">0.239</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 1"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 2"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 3"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 4"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 5"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 6"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 7"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 8"></td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; 9"></td>
</tr>
<tr>
<td colspan="19" style="font-weight:bold; text-align:left; padding-top:.8em;">Random Effects</td>
</tr>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&sigma;2</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3.62</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3.41</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3.59</td>
</tr>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&tau;11</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.01 id.leftvot</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.00 id.leftvot</td>
</tr>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">N</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">22 id</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">22 id</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">22 id</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm; border-top:1px solid;">Observations</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">738</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">R2 / R2 adjusted</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.154 / 0.151</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.983 / 0.982</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.699 / 0.689</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.037 / 0.740</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.038 / 0.779</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.040 / 0.708</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">AIC</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3789.171</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3069.280</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3069.280</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3155.161</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3141.780</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">3162.989</td>
</tr>

</table>

]

???

Pooled OLS is not the best option, since observations within the same country are not independent.

LSDV: we removed the intercept, and some of the IVs are no longer significant. High R-squared (computed without an intercept, so not directly comparable to the other models). 
Within: Australia is our baseline for the dummy variables (it is absorbed into the intercept); the estimates are the same for our IVs, but not for our dummies. 
Mixed effects/RET: both time and countries are treated with random effects; the coefficients are not the same, so the question is how different they are. See the Hausman test.

---
## Fixed or Random?

There is a lot of discussion about which one to use.
(And a lot of [confusion about the terms](https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/) themselves).

One option is to use a Hausman test to test how much coefficients under each model differ.

The idea is the following: 
If two estimators are consistent under a given set of assumptions, 
their estimates should not differ significantly. 
But if only one of the two estimators provides consistent estimates, 
then the estimates from both estimators should differ significantly.

The Hausman test scales the difference between the FE and RE coefficients by the variance of that difference, which gives a chi-squared test statistic.
The null hypothesis is that random effects is the preferred model; the alternative is fixed effects. 
It basically tests whether the unit-specific errors `$u_i$` are correlated with the independent variables; 
under the null hypothesis they are not.
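
The logic can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names, sample sizes and coefficients are assumptions, and pooled OLS is used as a simple stand-in for an estimator that, like RE, is only consistent when the unit effects are uncorrelated with the regressor.

```r
set.seed(42)
n_units <- 50
n_periods <- 10
id <- rep(seq_len(n_units), each = n_periods)
u <- rnorm(n_units)                        # unit effects u_i

# x is correlated with u_i by construction (the case where H0 fails)
x <- 0.8 * u[id] + rnorm(n_units * n_periods)
y <- 1 + 2 * x + u[id] + rnorm(n_units * n_periods)   # true slope is 2

# Pooled OLS ignores u_i, so the correlated unit effects load onto x
b_pooled <- unname(coef(lm(y ~ x))["x"])

# Within (FE) estimator: demeaning by unit removes u_i
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
b_within <- unname(coef(lm(y_w ~ x_w - 1))["x_w"])

round(c(pooled = b_pooled, within = b_within), 2)
```

When the two estimates are far apart relative to their sampling variability, as they should be here, the estimator that relies on uncorrelated unit effects is the suspect one.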

---
### Hausman test

More formally, we can test whether a fixed or random effects model is appropriate using a Durbin–Wu–Hausman test.

`$$H_0: \alpha_i \perp X_{it}, Z_i$$`

`$$H_a: \alpha_i \not\perp X_{it}, Z_i$$`

If `$H_{0}$` is true, both `$\widehat {\beta }_{{RE}}$` and `$\widehat {\beta }_{{FE}}$` are consistent, 
but only `$\widehat {\beta }_{{RE}}$` is efficient. 
If `$H_{{a}}$` is true, `$\widehat {\beta }_{{FE}}$` is consistent and `$\widehat {\beta }_{{RE}}$` is not.
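
For reference, the usual textbook form of the test statistic (not taken from these slides; `$\widehat{V}(\cdot)$` denotes an estimated covariance matrix and `$k$` the number of coefficients being compared) is:

`$$H = (\widehat{\beta}_{FE} - \widehat{\beta}_{RE})' \left[\widehat{V}(\widehat{\beta}_{FE}) - \widehat{V}(\widehat{\beta}_{RE})\right]^{-1} (\widehat{\beta}_{FE} - \widehat{\beta}_{RE})$$`

Under `$H_{0}$`, `$H$` is asymptotically `$\chi^2$`-distributed with `$k$` degrees of freedom, so large values of `$H$` are evidence against the random effects model.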

---

.tiny[

```r
hausman_test <- function(lmerMod, lmMod, ...) { ## changed function call
  # Coefficients and covariance matrices of the within (FE) and RE models
  coef.wi <- coef(lmMod)
  coef.re <- lme4::fixef(lmerMod) ## changed coef() to fixef() for glmer
  vcov.wi <- vcov(lmMod)
  vcov.re <- vcov(lmerMod)
  # Compare only the coefficients that appear in both models
  names.wi <- names(coef.wi)
  names.re <- names(coef.re)
  coef.h <- names.re[names.re %in% names.wi]
  dbeta <- coef.wi[coef.h] - coef.re[coef.h]
  dvcov <- vcov.re[coef.h, coef.h] - vcov.wi[coef.h, coef.h]
  # Quadratic form in the coefficient differences; abs() keeps it positive
  stat <- abs(t(dbeta) %*% as.matrix(solve(dvcov)) %*% dbeta) ## added as.matrix()
  # Chi-squared test with one degree of freedom per compared coefficient
  pval <- pchisq(stat, df = length(dbeta), lower.tail = FALSE)
  names(stat) <- "chisq"
  parameter <- length(dbeta)
  names(parameter) <- "df"
  res <- list(statistic = stat, p.value = pval, parameter = parameter,
              method = "Hausman Test", alternative = "one model is inconsistent",
              data.name = deparse(getCall(lmerMod)$data)) ## changed
  class(res) <- "htest"
  return(res)
}
hausman_test(rei, fe)
```

```
## 
## 	Hausman Test
## 
## data:  data1
## chisq = 26.787, df = 3, p-value = 6.525e-06
## alternative hypothesis: one model is inconsistent
```
]

???

As we have seen, the coefficients differ between the RE and FE models; 
the question is how different they are.

If the p-value is significant (for example <0.05) then use fixed effects, if not use random effects.

---
### Slow-Moving Variables

Now, the Hausman test can only tell you whether there is a difference in the coefficients, 
but that does not mean that you **HAVE** to use the FE specification as a consequence.

One case in which we want to weigh the trade-offs between FE and RE more closely is that of slow-moving/sluggish variables, 
i.e. variables with little within-unit variation over time.
Remember, sluggish variables will be highly collinear with the fixed effects.

Including fixed effects is then problematic, 
as it potentially discards much of the information 
and leads to imprecise estimates and large standard errors (Barro, 2012).

Clark and Linzer (2015) suggest that in deciding between FE and RE, 
one should also consider the sample size and the correlation between the covariate and unit effects.
- With sluggish variables, the random-effects model will tend to produce superior estimates of `$\beta$` when there are few units or few observations per unit, 
and when the correlation between the independent variable and the unit effects is relatively low.
- Otherwise, the fixed-effects model may be preferable, 
as the random-effects model does not reduce variance enough to offset its increase in bias.
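
The cost of fixed effects for a sluggish variable can be seen in a small simulation. This is a sketch on invented data (all names and parameter values are assumptions): `x` varies a lot between units but barely within them, and there are no unit effects in `y`, so the pooled model is valid by construction.

```r
set.seed(1)
n_units <- 30
n_periods <- 20
id <- rep(seq_len(n_units), each = n_periods)

# Sluggish x: large stable unit-level component, small year-to-year movement
x_level <- rnorm(n_units, sd = 2)
x <- x_level[id] + rnorm(n_units * n_periods, sd = 0.2)
y <- 1 + 0.5 * x + rnorm(n_units * n_periods)

# Share of the variance in x that is within units
x_within <- x - ave(x, id)
within_share <- var(x_within) / var(x)

# Standard error of the coefficient on x with and without unit dummies
se_pooled <- summary(lm(y ~ x))$coefficients["x", "Std. Error"]
se_fe <- summary(lm(y ~ x + factor(id)))$coefficients["x", "Std. Error"]

c(within_share = within_share, se_pooled = se_pooled, se_fe = se_fe)
```

Because the unit dummies absorb almost all of the variation in `x`, the FE standard error should come out several times larger than the pooled one.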

---
### Which model to choose?

In many cases you also have to decide what is of particular theoretical interest to you.
- If you are more interested in cross-national variation than in variation within states over time, 
then RE is more appropriate.
- If you are theoretically interested in variation across time and want to make causal inferences, 
then FE is more appropriate.
- If your key IV is time-constant, FE will drop it from the estimation and you cannot say anything about it; 
if it is sluggish, FE will estimate it only very imprecisely.
- In RE models, we still may have unit-specific autocorrelation and heteroskedasticity. 
These need to be corrected as well.

For non-linear models, RE is the predominant approach: the RE model is more parsimonious, and FE is less efficient if it does not capture the true model.
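
That a time-constant covariate cannot be estimated alongside unit fixed effects is easy to check. The sketch below uses simulated data with assumed names (`z` is constant within each unit); it is an illustration, not part of the original analysis.

```r
set.seed(7)
n_units <- 10
n_periods <- 8
id <- rep(seq_len(n_units), each = n_periods)
z <- rnorm(n_units)[id]               # time-constant covariate
x <- rnorm(n_units * n_periods)       # time-varying covariate
y <- 1 + 0.5 * x + 2 * z + rnorm(n_units * n_periods)

# LSDV fixed effects: z is a linear combination of the intercept and unit dummies
fe <- lm(y ~ x + factor(id) + z)
coef(fe)[c("x", "z")]                 # the coefficient on z is NA (not estimable)
```

`lm()` reports `NA` for `z` because it is perfectly collinear with the unit dummies, while the coefficient on the time-varying `x` is still estimated.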

---
## Simultaneity Bias

Simultaneity bias is introduced if we do not account for the fact that some effects unfold not immediately but slowly over time.
- E.g., an increase in spending on active labor market programs does not reduce unemployment within the same year. 
Rather, it may take two or more years.

To that end it is a common practice to lag an independent variable, 
e.g. in cases of GDP measures, expenditure, unemployment, etc.

One key theoretical challenge is to justify the lag structure. 
Is it 1 year, 2 years, or even 5 years? 
That depends on your variable of interest.

You can get some information about this by examining the correlation 
between your outcome variable and the predictor of interest at different lags 
(e.g. the correlation between `$Y_t$` and `$X_{t-1}$`, `$Y_t$` and `$X_{t-2}$`, etc.).
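
A minimal base R sketch of this check follows. The data are simulated, and `lag_by_unit()` is a hypothetical helper written for this example (not a function from any package); it assumes rows are sorted by unit and year, with no gaps in the years.

```r
set.seed(3)
n_units <- 20
n_periods <- 15
id <- rep(seq_len(n_units), each = n_periods)
year <- rep(seq_len(n_periods), times = n_units)
x <- rnorm(n_units * n_periods)

# Shift x back by k >= 1 periods within each unit, padding with NA
lag_by_unit <- function(x, id, k) {
  ave(x, id, FUN = function(v) c(rep(NA, k), head(v, -k)))
}

x_l1 <- lag_by_unit(x, id, 1)
x_l2 <- lag_by_unit(x, id, 2)

# Simulated outcome that responds to x with a two-period delay
y <- 0.8 * x_l2 + rnorm(n_units * n_periods)   # NA in the first two periods

# Correlation between y and x at lags 0, 1 and 2
lag_cors <- sapply(list(lag0 = x, lag1 = x_l1, lag2 = x_l2),
                   function(xl) cor(y, xl, use = "complete.obs"))
round(lag_cors, 2)
```

Lagging within units matters: a plain shift of the whole column would carry the last year of one country into the first year of the next.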

---
### Lagging the DV

Used when you understand the underlying process as dynamic, 
and if you expect that the current level of the DV is heavily determined by its past level 
(i.e. serial or autocorrelation is very strong).

Pros:
- Including the lagged DV will help you overcome omitted variable bias. 
- You can account for autocorrelation.
- Parsimonious.

Cons:
- Including the lagged DV will absorb a lot of the variance and is likely to make the effects of your IVs less significant (it tends to make both the `$\beta$`s smaller and the standard errors bigger). 
  - In other words, we may underestimate the true relationships at play.

However, it does allow you to say that those IVs that still influence your outcome have an effect controlling for the past value of the DV.
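
As a sketch of how a lagged DV enters a model, the following simulates a dynamic process and fits it with `lm()`. The data-generating values (0.7 for persistence, 0.5 for `x`) and all variable names are assumptions chosen for illustration.

```r
set.seed(11)
n_units <- 25
n_periods <- 30
id <- rep(seq_len(n_units), each = n_periods)
x <- rnorm(n_units * n_periods)

# Simulate y_it = 0.7 * y_i,t-1 + 0.5 * x_it + e_it separately for each unit
y <- ave(x, id, FUN = function(xi) {
  yi <- numeric(length(xi))
  e <- rnorm(length(xi))
  yi[1] <- 0.5 * xi[1] + e[1]
  for (t in 2:length(xi)) yi[t] <- 0.7 * yi[t - 1] + 0.5 * xi[t] + e[t]
  yi
})

# Lag the DV within units; the first period of each unit has no lag
y_lag <- ave(y, id, FUN = function(v) c(NA, head(v, -1)))

static <- lm(y ~ x)             # ignores the dynamics
dynamic <- lm(y ~ y_lag + x)    # rows with a missing lag are dropped by lm()

rbind(static = c(y_lag = NA, coef(static)["x"]),
      dynamic = coef(dynamic)[c("y_lag", "x")])
```

The coefficient on `y_lag` estimates the persistence of the outcome, and the coefficient on `x` is then read as its effect controlling for last period's level of the DV.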

---
## Extensions

Here we have used fixed and random effects in a panel modelling context.

But there are various extensions to...
- non-continuous dependent variables (generalised linear mixed effect models)
- non-linear models
- hierarchical models
- nested models
- multilevel models
- etc.

Mixed effects are the basis for a lot of much broader and more flexible models,
and really deserve their own course...

???

For more, please see Snijders and Bosker (1999).

Some good introductions to multilevel modelling with R more generally are [here](https://data.princeton.edu/pop510/lang1) and [here](https://data.princeton.edu/pop510/egm).
See also [here](https://www.rensvandeschoot.com/tutorials/lme4/).

---
class: center, middle

# Summary