Interpret multicollinearity tests from the mctest package in R

Packages we will need :

library(mctest)
library(stargazer)

The mctest package provides many diagnostic tests for both overall and individual multicollinearity. Additionally, the package can show which regressors may be the cause of the collinearity problem in your model.

Click here to read the CRAN PDF for all the function arguments available.

So – as always – we first fit a model.

Given how much elections have been in the news recently, let’s look at variables that capture different aspects of elections and see how they relate to scores of democracy. These different election components will probably overlap.

In fact, I suspect multicollinearity will be problematic with the variables I am looking at.

Click here for a previous blog post on Variance Inflation Factor (VIF) score, the easiest and fastest way to test for multicollinearity in R.

The variables in my model are:

emb_autonomy – the extent to which the election management body of the country has autonomy from the government to apply election laws and administrative rules impartially in national elections.
election_multiparty – the extent to which the elections involved real multiparty competition.
election_votebuy – the extent to which there was evidence of vote and/or turnout buying.
election_intimidate – the extent to which opposition candidates/parties/campaign workers were subjected to repression, intimidation, violence, or harassment by the government, the ruling party, or their agents.
election_free – the extent to which the election was judged free and fair.

In this model the dependent variable is democracy score for each of the 178 countries in this dataset. The score measures the extent to which a country ensures responsiveness and accountability between leaders and citizens. This is when suffrage is extensive; political and civil society organizations can operate freely; governmental positions are clean and not marred by fraud, corruption or irregularities; and the chief executive of a country is selected directly or indirectly through elections.

election_model <- lm(democracy ~ ., data = election_df)
stargazer(election_model, type = "text")

However, I suspect these variables suffer from high multicollinearity. Usually your knowledge of the variables – and how they were operationalised – will give you a hunch. But it is good practice to check every time, regardless.

The eigprop() function can be used to detect the existence of multicollinearity among regressors. The function computes eigenvalues, condition indices and variance decomposition proportions for each of the regression coefficients in my election model.

To check the linear dependencies associated with each eigenvalue, eigprop() compares the variance proportions with a threshold value (default is 0.5) and displays the proportions greater than that threshold from each row and column, if any.

So first, let’s run the overall multicollinearity test with the eigprop() function :

mctest::eigprop(election_model)

If many of the Eigenvalues are near to 0, this indicates that there is multicollinearity.

Unfortunately, the phrase “near to” is not a clear numerical threshold. So we can look instead at the Condition Index score in the next column.

The Condition Index is the square root of the ratio of the largest eigenvalue (dimension 1) to the eigenvalue of the dimension in question.

Condition Index values over 10 risk multicollinearity problems. A common rule of thumb treats values between 10 and 30 as moderate and values over 30 as severe.
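To make the eigenvalue, Condition Index and variance proportion columns less of a black box, here is a minimal base R sketch that computes them by hand. It assumes the usual Belsley-style recipe (design matrix including the intercept, each column scaled to unit length); I have not checked that it reproduces eigprop() digit for digit. The data are simulated stand-ins (x1, x2, x3, y and sim_model are made-up names), not the election data.

```r
# Simulated stand-in data (NOT the election data): x3 is almost x1 + x2
set.seed(42)
n  <- 178
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.2)
y  <- 1 + x1 - x2 + 0.5 * x3 + rnorm(n)
sim_model <- lm(y ~ x1 + x2 + x3)

# Design matrix (with intercept), each column scaled to unit length
X  <- model.matrix(sim_model)
Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")

# Eigenvalues of the scaled cross-product matrix, largest first
eig <- eigen(crossprod(Xs), symmetric = TRUE)
eigenvalues <- eig$values

# Condition index: sqrt(largest eigenvalue / eigenvalue of each dimension)
condition_index <- sqrt(max(eigenvalues) / eigenvalues)

# Variance decomposition proportions
# phi[k, j] = (loading of coefficient k on dimension j)^2 / eigenvalue j
phi <- sweep(eig$vectors^2, 2, eigenvalues, "/")
var_prop <- t(phi / rowSums(phi))   # rows = dimensions, columns = coefficients
colnames(var_prop) <- colnames(X)

round(cbind(eigenvalues, condition_index, var_prop), 3)

# Coefficients with a proportion above the 0.5 threshold on the weakest dimension
names(which(var_prop[length(eigenvalues), ] > 0.5))
```

Note that each row of this table is a dimension, not a variable. The variance proportions in a row with a high Condition Index are what tell you which coefficients are tangled up in that near-dependency.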

In our model, the last dimension has an Eigenvalue close to zero and a Condition Index (CI) near 10. It points to the last variable – the extent to which an election is free and fair – as suffering from high multicollinearity with other regressors in the model. Maybe we can consider dropping this variable, if our research theory allows it.

Another battery of tests that the mctest package offers is the imcdiag() function. This looks at individual multicollinearity. That is, how strongly each individual regressor is related to the other regressors in the model.

mctest::imcdiag(election_model)

Starting with the VIF score: a value of 1 means that the predictor is not correlated with the other regressors. As in a previous blog post on Variance Inflation Factor (VIF) score, we want low scores. Scores over 5 indicate moderate multicollinearity. Scores over 10 are very problematic.

And, once again, we see the last variable is HIGHLY problematic, with a score of 14.7. However, none of the VIF scores are very good.

The Tolerance (TOL) score is related to the VIF score; it is the reciprocal of VIF.
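If you want to see where VIF and TOL come from, a small base R sketch follows. Each regressor is regressed on all the others (an auxiliary regression), and VIF is 1 / (1 - R2) from that regression. The data are simulated stand-ins with made-up names (x1, x2, x3), not the election variables.

```r
# Simulated stand-in regressors (NOT the election data): x3 is almost x1 + x2
set.seed(42)
n  <- 178
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.2)
X  <- data.frame(x1, x2, x3)

# R-squared from regressing each regressor on all of the others
aux_r2 <- sapply(names(X), function(v) {
  aux_formula <- reformulate(setdiff(names(X), v), response = v)
  summary(lm(aux_formula, data = X))$r.squared
})

vif <- 1 / (1 - aux_r2)   # variance inflation factor
tol <- 1 / vif            # tolerance, the same thing as 1 - aux_r2

round(cbind(VIF = vif, TOL = tol), 3)
```

The same VIF values sit on the diagonal of the inverse correlation matrix of the regressors, diag(solve(cor(X))), which is a handy cross-check.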

The Wi score is calculated by the Farrar Wi test, which is an F-test for locating the regressors that are collinear with others. It makes use of the multiple correlation coefficients among the regressors. Higher scores indicate more problematic multicollinearity.

The Leamer score comes from Leamer’s Method : it is the square root of the ratio of the variances of an estimated coefficient when estimated without and with the other regressors. Lower scores indicate more problematic multicollinearity.
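Both of these scores can be written in terms of the same auxiliary R2 that sits behind VIF. The sketch below assumes that Wi is the ordinary F statistic of each auxiliary regression and that the Leamer score reduces to sqrt(1 - R2), which follows from the ratio of variances described above. I am assuming, not asserting, that this matches what imcdiag() prints. The data are again simulated stand-ins with made-up names.

```r
# Simulated stand-in regressors (NOT the election data): x3 is almost x1 + x2
set.seed(42)
n  <- 178
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.2)
X  <- data.frame(x1, x2, x3)
p  <- ncol(X)   # number of regressors

# R-squared from regressing each regressor on all of the others
aux_r2 <- sapply(names(X), function(v) {
  aux_formula <- reformulate(setdiff(names(X), v), response = v)
  summary(lm(aux_formula, data = X))$r.squared
})

# Farrar Wi (assumed form): F statistic of the auxiliary regression,
# with p - 1 and n - p degrees of freedom
wi <- (aux_r2 / (p - 1)) / ((1 - aux_r2) / (n - p))

# Leamer: sqrt(variance without the other regressors / variance with them),
# which works out to sqrt(1 - R2), i.e. 1 / sqrt(VIF)
leamer <- sqrt(1 - aux_r2)

round(cbind(Wi = wi, Leamer = leamer), 3)
```

Read this way, the two scores move in opposite directions: as the auxiliary R2 approaches 1, Wi grows without bound while the Leamer score shrinks towards 0.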

The CVIF (corrected VIF) score is calculated by evaluating the impact of the correlation among regressors on the variance of the OLS estimators. Higher scores indicate more problematic multicollinearity.

The Klein score is calculated by Klein’s Rule, which argues that if the R2 from any auxiliary regression (one regressor regressed on all of the remaining regressors) is greater than the overall R2 (obtained from the regression of y on all the regressors), then multicollinearity may be troublesome. All scores are 0, which means that no auxiliary R2 is greater than the R2 of the full model.
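Klein’s Rule is simple enough to check by hand. This sketch assumes the auxiliary-regression reading of the rule and uses simulated stand-in data with made-up names (x1, x2, x3, y), so the flags it prints say nothing about the election model.

```r
# Simulated stand-in data (NOT the election data): x3 is almost x1 + x2
set.seed(42)
n  <- 178
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.2)
y  <- 1 + x1 - x2 + 0.5 * x3 + rnorm(n)
X  <- data.frame(x1, x2, x3)

# Overall R-squared: y regressed on all regressors
full_r2 <- summary(lm(y ~ x1 + x2 + x3))$r.squared

# Auxiliary R-squared: each regressor regressed on the remaining regressors
aux_r2 <- sapply(names(X), function(v) {
  aux_formula <- reformulate(setdiff(names(X), v), response = v)
  summary(lm(aux_formula, data = X))$r.squared
})

# Klein flag: 1 if the auxiliary R-squared exceeds the overall R-squared
klein <- as.integer(aux_r2 > full_r2)
names(klein) <- names(X)

round(c(overall = full_r2, aux_r2), 3)
klein
```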

Click here to read the mctest paper by its authors – Imdadullah et al. (2016) – that discusses all of the mathematics behind all of the tests in the package.

In conclusion, my model suffers from multicollinearity so I will need to drop some variables or rethink what I am trying to measure.

Click here to run Stepwise regression analysis and see which variables we can drop to come up with a more parsimonious model (the first suspect I would drop would be the free and fair elections variable).

Perhaps I am capturing the same concept with many variables. If so, I can run Principal Component Analysis (PCA) and create a new index that covers all of these electoral features.

The next blog post will look at running PCA in R and examining the components we can extract.

References

Imdadullah, M., Aslam, M., & Altaf, S. (2016). mctest: An R Package for Detection of Collinearity among Regressors. The R Journal, 8(2), 495.