How to run regressions with the tidymodels package in R: PART 1


The tidymodels framework in R is a collection of packages for modeling.

Within tidymodels, the parsnip package is primarily responsible for specifying models in a way that is independent of the underlying modeling engines. The set_engine() function in parsnip allows users to specify which computational engine to use for modeling, enabling the same model specification to be used across different packages and implementations.

 - Find & Share on GIPHY

In this blog series, we will look at some commonly used models and engines within the tidymodels package

  1. Linear Regression (lm): The classic linear regression model, with the default engine being stats, referring to the base R stats package.
  2. Logistic Regression (logistic_reg): Used for binary classification problems, with engines like stats for the base R implementation and glmnet for regularized regression.
  3. Random Forest (rand_forest): A popular ensemble method for classification and regression tasks, with engines like ranger and randomForest.
  4. Boosted Trees (boost_tree): Used for boosting tasks, with engines such as xgboost, lightgbm, and catboost.
  5. Decision Trees (decision_tree): A base model for classification and regression, with engines like rpart and C5.0.
  6. K-Nearest Neighbors (nearest_neighbor): A simple yet effective non-parametric method, with engines like kknn and caret.
  7. Principal Component Analysis (pca): For dimensionality reduction, with the stats engine.
  8. Lasso and Ridge Regression (linear_reg): For regression with regularization, specifying the penalty parameter and using engines like glmnet.

Click here for some resources I found:

  1. https://rviews.rstudio.com/2019/06/19/a-gentle-intro-to-tidymodels
  2. https://rpubs.com/chenx/tidymodels_tutorial
  3. https://bookdown.org/paul/ai_ml_for_social_scientists/06_01_ml_with_tidymodels.html