Choose model variables by AIC in a stepwise algorithm with the MASS package in R

Running a regression model with too many variables – especially irrelevant ones – will lead to a needlessly complex model. Stepwise selection can help choose the best variables to add. Packages you need: First, choose a model and throw in every variable you think has an impact on your dependent variable! I hear the voice of my undergrad professor in my ear: try not to go … Continue reading Choose model variables by AIC in a stepwise algorithm with the MASS package in R
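A minimal sketch of the approach with MASS::stepAIC(), using the built-in mtcars data as a stand-in for the post's own dataset:

```r
library(MASS)

# Start with every candidate variable (mtcars is illustrative here)
full_model <- lm(mpg ~ ., data = mtcars)

# stepAIC() adds and drops variables step by step,
# keeping the model with the lowest AIC at each stage
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)

summary(step_model)  # the final, AIC-selected model
```

Setting trace = TRUE instead prints each step of the search, which is useful for seeing which variables are dropped and why.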

Recode variables with car package in R

There is one caveat with this function from the car package: recode also exists in the dplyr package, so R gets confused if you just type recode on its own; it doesn’t know which package you mean. So, you must write car::recode(). This placates the R gods, and they are clear about which package to use. It is useful for all … Continue reading Recode variables with car package in R
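A short sketch of car::recode() with a hypothetical survey variable (the data and labels here are made up for illustration):

```r
library(car)

# Hypothetical 1-5 survey item to collapse into three categories
survey <- data.frame(trust = c(1, 2, 3, 4, 5, 2, 4))

# Note the car:: prefix, which avoids the clash with dplyr::recode
survey$trust_cat <- car::recode(survey$trust,
                                "1:2 = 'low'; 3 = 'medium'; 4:5 = 'high'")

table(survey$trust_cat)
```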

Move year variable to first column in dataframe with dplyr package in R

A quick hack to create a year variable from a string variable and place it as column number one in your dataframe. The first problem with my initial dataset is that the date is a string of numbers, and I want the first four characters of the string. Now I want to place it at the beginning to keep things more organised: And we are done! … Continue reading Move year variable to first column in dataframe with dplyr package in R
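A sketch of the two steps, assuming a hypothetical dataframe where the date is stored as a string like "19900115":

```r
library(dplyr)

df <- data.frame(country = c("Sweden", "Chile"),
                 date    = c("19900115", "20031122"),
                 stringsAsFactors = FALSE)

df <- df %>%
  mutate(year = as.numeric(substr(date, 1, 4))) %>%  # first four characters = year
  select(year, everything())                         # move year to column one

head(df)
```

select(year, everything()) keeps all the other columns in their original order, just shifted one place to the right.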

Plot marginal effects with sjPlot package in R

Without examining interaction effects in our models, we can be wrong about the real relationship between variables. This is particularly evident in political science when we consider the impact of regime type on the relationship between our dependent and independent variables. For example, if I were to look at the relationship between pro-democracy protests and executive bribery, I would expect to see that the higher … Continue reading Plot marginal effects with sjPlot package in R
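A minimal sketch with sjPlot::plot_model(); the data here are simulated and the variable names (bribery, regime, protests) are hypothetical stand-ins for the post's real political-science measures:

```r
library(sjPlot)

# Simulate data where the bribery effect differs by regime type
set.seed(1)
dat <- data.frame(bribery = rnorm(200),
                  regime  = factor(sample(c("democracy", "autocracy"), 200, TRUE)))
dat$protests <- 0.5 * dat$bribery * (dat$regime == "democracy") + rnorm(200)

# Fit a model with an interaction term
mod <- lm(protests ~ bribery * regime, data = dat)

# type = "int" plots the marginal effects of the interaction term
plot_model(mod, type = "int")
```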

Check for multicollinearity with the car package in R

Packages we will need: When one independent variable is highly correlated with another independent variable (or with a combination of independent variables), the marginal contribution of that independent variable is influenced by the other predictor variables in the model. As a result, estimates for the regression coefficients of the independent variables can be unreliable, and tests of significance for the regression coefficients can be misleading. To check for multicollinearity … Continue reading Check for multicollinearity with the car package in R
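A quick sketch of the standard diagnostic, car::vif(), again using mtcars as an illustrative dataset:

```r
library(car)

# Illustrative model; substitute your own predictors
mod <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

# Variance inflation factors: values above roughly 5-10 are a common
# rule of thumb for a predictor that is highly correlated with the others
vif(mod)
```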

Correct for heteroskedasticity in OLS with sandwich package in R

Packages we will need: If our OLS model demonstrates a high level of heteroskedasticity (i.e. when the error term of our model is not randomly distributed across observations and there is a discernible pattern in the error variance), we run into problems. Why? Because this means OLS will use sub-optimal estimators based on incorrect assumptions, and the standard errors computed using these flawed least squares estimators are … Continue reading Correct for heteroskedasticity in OLS with sandwich package in R
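One common pattern is to pair sandwich::vcovHC() with lmtest::coeftest() to recompute the standard errors; a minimal sketch on mtcars:

```r
library(sandwich)
library(lmtest)

# Illustrative model; substitute your own
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Re-test the coefficients using a heteroskedasticity-consistent
# covariance matrix (HC3 is a common choice)
coeftest(mod, vcov = vcovHC(mod, type = "HC3"))
```

The coefficient estimates are unchanged; only the standard errors, t-statistics, and p-values are corrected.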

Check for heteroskedasticity in OLS with lmtest package in R

One core assumption when calculating ordinary least squares regressions is that all the random variables in the model have equal variance around the best-fitting line. Essentially, when we run an OLS, we expect the error terms to have no fan pattern. So let’s look at an example of homoskedasticity, where this assumption is satisfied. I run a simple regression to see whether there … Continue reading Check for heteroskedasticity in OLS with lmtest package in R
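A minimal sketch of the check with lmtest::bptest() (the Breusch-Pagan test), using a simple mtcars regression as a stand-in for the post's example:

```r
library(lmtest)

# Illustrative simple regression
mod <- lm(mpg ~ wt, data = mtcars)

# Breusch-Pagan test: the null hypothesis is homoskedasticity,
# so a small p-value signals heteroskedastic errors
bptest(mod)

# A residuals-vs-fitted plot is a quick visual check for a fan pattern
plot(mod, which = 1)
```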