Regularization in regression models (Ridge, Lasso and Elastic Net regression)

The full code is available on GitHub.

In this tutorial, we will explore the various regularization options that we have when implementing a regression model. We will work with the heart disease dataset, and our goal will be to implement a regression model that predicts whether a patient has heart disease or not.
 

1- Data Presentation

The heart disease dataset was created in 1988 by four doctors of medicine: 
  • Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D. 
  • University Hospital, Zurich, Switzerland: William Steinbrunn, M.D. 
  • University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D. 
  • V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D. 
The dataset contains 4 databases, and each database has 76 raw attributes, but all published experiments use a subset of 14 of them. The complete description of the dataset is available here. In our study, we will use the Cleveland database, and our goal will be to predict whether a given patient has heart disease or not. A preprocessed version of the dataset can be downloaded here.
 

2- Data Preprocessing

2.1- Import the Dataset into Pandas

We will start by importing our dataset into a Pandas DataFrame. The dataset contains a few missing values, represented by the value ?. The code below helps to import the dataset:
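A minimal sketch of the import, assuming the preprocessed Cleveland file is saved locally (the file name and the column names are assumptions based on the dataset description):

    import pandas as pd

    # The 14 attributes of the Cleveland subset (names assumed from the
    # dataset description; 'num' is the diagnosis column).
    columns = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
               'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num']

    # The raw file marks missing values with '?'; let Pandas parse them as NaN.
    df = pd.read_csv('processed.cleveland.data', names=columns, na_values='?')
    df.info()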
 
We can see from the DataFrame information that missing values are present in the following features: ca and thal.
 

2.2- Remove missing values

Before we decide whether we should remove the missing values or perform data imputation, we first count the number of rows in our dataset that contain missing values.
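A one-line sketch of the count, assuming the DataFrame from the previous step:

    # Count the rows that contain at least one missing value.
    n_missing_rows = df.isnull().any(axis=1).sum()
    print(n_missing_rows)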
 
We have 6 rows in our dataset that contain missing values. We choose to remove the rows with missing data, as shown in the sketch after this list, for two reasons: 
  • Data imputation may introduce some bias into our dataset.
  • The number of rows with missing data is very small (only 6).
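A sketch of the removal:

    # Drop the incomplete rows and rebuild a contiguous index.
    df = df.dropna().reset_index(drop=True)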

 

2.3- Sanity check

After removing the missing data, we perform a sanity check on our dataset with the following piece of code:
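The author's exact checks are not shown here; one plausible sketch, assuming the value ranges given in the dataset description (e.g. ca in 0–3 and thal in {3, 6, 7}):

    # Collect the rows whose codes fall outside their documented ranges
    # (the specific checks below are assumptions).
    badRows = df[~df['ca'].isin([0, 1, 2, 3]) |
                 ~df['thal'].isin([3, 6, 7])]
    print(len(badRows))  # expected: 0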
 
The badRows DataFrame contains 0 entries, which means that our dataset does not contain any invalid values.
 

2.4- Explode categorical features

We will now explode the non-binary categorical features into binary features (values in {0, 1}), as sketched in the code after this list. We perform the following transformations:
  • chest pain type (cp) is exploded to : 
    • isTypicalAngineCp : True when cp is typical angina
    • isATypicalAngineCp : True when cp is atypical angina
    • isNonAnginalPainCp : True when cp is non-anginal pain
    • isAsymptomaticCp : True when cp is asymptomatic
  • resting electrocardiographic results (restecg) is exploded to :
    • isNormalRestecg : True when restecg is normal
    • hasStWaveRestecg : True when restecg has ST-T wave abnormality
    • hasHypertrophyRestecg : True when restecg shows probable or definite left ventricular hypertrophy
  • the slope of the peak exercise ST segment (slope) is exploded to :
    • isUpSlopping : True when slope is upsloping
    • isFlatSlop : True when slope is flat
    • isDownSlopping : True when slope is downsloping
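A sketch of these transformations, using the integer codes from the dataset description (the code-to-label mapping is an assumption):

    # Explode cp (codes assumed: 1 = typical angina, 2 = atypical angina,
    # 3 = non-anginal pain, 4 = asymptomatic).
    df['isTypicalAngineCp'] = (df['cp'] == 1).astype(int)
    df['isATypicalAngineCp'] = (df['cp'] == 2).astype(int)
    df['isNonAnginalPainCp'] = (df['cp'] == 3).astype(int)
    df['isAsymptomaticCp'] = (df['cp'] == 4).astype(int)

    # Explode restecg (codes assumed: 0 = normal, 1 = ST-T wave abnormality,
    # 2 = left ventricular hypertrophy).
    df['isNormalRestecg'] = (df['restecg'] == 0).astype(int)
    df['hasStWaveRestecg'] = (df['restecg'] == 1).astype(int)
    df['hasHypertrophyRestecg'] = (df['restecg'] == 2).astype(int)

    # Explode slope (codes assumed: 1 = upsloping, 2 = flat, 3 = downsloping).
    df['isUpSlopping'] = (df['slope'] == 1).astype(int)
    df['isFlatSlop'] = (df['slope'] == 2).astype(int)
    df['isDownSlopping'] = (df['slope'] == 3).astype(int)

    # Drop the original multi-class columns.
    df = df.drop(columns=['cp', 'restecg', 'slope'])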

2.5- Split and Visualize DataSet

We split our dataset into a training set (70%), a validation set (15%) and a test set (15%). After that, we use Principal Component Analysis (PCA) to visualize our dataset.
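A minimal sketch with scikit-learn, assuming the diagnosis column num is binarized into a 0/1 target (any value above 0 counts as disease):

    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt

    X = df.drop(columns=['num']).values
    y = (df['num'] > 0).astype(int).values  # 1 = heart disease, 0 = healthy

    # 70% training, then split the remaining 30% evenly into validation/test.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.30, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.50, random_state=42)

    # Project the training set onto its first two principal components.
    X_2d = PCA(n_components=2).fit_transform(X_train)
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y_train)
    plt.show()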
 
After visualization, we notice two main issues with our dataset:
  • The dataset contains a few outliers: we decide not to remove them; instead, we will perform a robust scaling to limit their effect.
  • The dataset is not cleanly linearly separable: we will perform a standardization of the dataset.
A sketch of both transformations follows.
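One way to implement the two steps with scikit-learn scalers, fitted on the training set only (chaining RobustScaler and StandardScaler is an assumption about how the two steps are combined):

    from sklearn.preprocessing import RobustScaler, StandardScaler

    # Fit both scalers on the training set, then apply them to every split.
    robust = RobustScaler().fit(X_train)
    standard = StandardScaler().fit(robust.transform(X_train))

    def scale(X):
        return standard.transform(robust.transform(X))

    X_train, X_val, X_test = scale(X_train), scale(X_val), scale(X_test)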
We visualize our dataset again after the robust scaling and the standardization. We can see that our dataset is now more linearly separable and that the outliers no longer stand out.
 

3- Models

We implement four different regression models on our dataset: 
  • Simple Logistic regression
  • Ridge regression
  • Lasso regression
  • Elastic net regression
The main difference in the implementation of the four models resides in the loss function.
 

3.1- Simple Logistic regression

The loss function of the logistic regression is straightforward and does not contain any regularization term. We implement the loss function of the simple logistic regression with the following piece of code:
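A minimal sketch of such a batch loss in TensorFlow (the variable names are assumptions; logits stands for the raw model output X·W + b, and labels holds the 0/1 targets as floats):

    import tensorflow as tf

    def logistic_loss(labels, logits):
        # Mean binary cross-entropy over the batch; the sigmoid and the
        # log-loss are fused into a single numerically stable op.
        return tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))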
 
We wrap the loss function in the TensorFlow reduce_mean function because we are using batch training; this would not be needed with stochastic training. The simple logistic regression model gives us an accuracy of 81.81% on our test set.
 

3.2- Ridge regression

Ridge regression is a form of regression that makes use of the L2 regularization technique to make the model less dependent on the slope. Ridge regression can shrink the slope close to zero, but never exactly to zero. The loss function of the ridge regression is as follows:
Loss = CrossEntropy(y, ŷ) + λ · Σⱼ wⱼ²
We implement that in TensorFlow with the following piece of code:
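A sketch under the same assumptions as the previous snippet, with alpha standing for the regularization strength λ (both names are assumptions):

    import tensorflow as tf

    def ridge_loss(labels, logits, weights, alpha=0.01):
        cross_entropy = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
        # L2 penalty: sum of the squared weights, scaled by alpha.
        l2_penalty = alpha * tf.reduce_sum(tf.square(weights))
        return cross_entropy + l2_penalty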
 
The ridge regression model gives us an accuracy of 81.81% on our test set, the same accuracy as the simple logistic regression. We conclude that the L2 regularization technique does not bring any improvement in the case of our dataset.
 

3.3- Lasso regression

Lasso regression is a form of regression that makes use of the L1 regularization technique to make the model less dependent on the slope. Lasso regression can shrink the slope exactly to zero, whereas ridge regression cannot. The loss function of the lasso regression is as follows:
Loss = CrossEntropy(y, ŷ) + λ · Σⱼ |wⱼ|
We implement that in TensorFlow with the following piece of code:
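A sketch under the same assumptions, swapping the squared weights for their absolute values:

    import tensorflow as tf

    def lasso_loss(labels, logits, weights, alpha=0.01):
        cross_entropy = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
        # L1 penalty: sum of the absolute weights, scaled by alpha.
        l1_penalty = alpha * tf.reduce_sum(tf.abs(weights))
        return cross_entropy + l1_penalty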
 
The lasso regression model gives us an accuracy of 84.09% on our test set. Compared to our previous accuracy, this is an improvement of 2.28 percentage points.
 

3.4- Elastic net regression

The elastic net regression is a combination of the ridge regression and the lasso regression: we combine the L1 regularization technique with the L2 regularization technique. The TensorFlow implementation is as follows:
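A sketch under the same assumptions, with one regularization strength per penalty (alpha1 and alpha2 are assumed names):

    import tensorflow as tf

    def elastic_net_loss(labels, logits, weights, alpha1=0.01, alpha2=0.01):
        cross_entropy = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
        # Combine the L1 (lasso) and L2 (ridge) penalties.
        l1_penalty = alpha1 * tf.reduce_sum(tf.abs(weights))
        l2_penalty = alpha2 * tf.reduce_sum(tf.square(weights))
        return cross_entropy + l1_penalty + l2_penalty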
 
The elastic net regression model gives us an accuracy of 84.09% on our test set, the same accuracy as the lasso regression. This confirms that the L2 regularization technique does not bring any improvement in the case of our dataset.
 

3.5- Loss and accuracy graph

The following graph shows the accuracy of the various models that we have implemented:
 
During training, we evaluate our model on the validation set every 100 iterations. The following graphs show how the loss and the accuracy evolve during training:
 
The full code is available on GitHub.

Thanks for reading. Please leave feedback and questions in the comments!
