**The full code is available on Github.**

### 1- Data Presentation

The heart disease dataset was created in **1988** by four doctors of medicine:

- Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
- University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
- University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
- V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.

We will work with **the Cleveland database**, and our goal will be to predict whether a given patient has heart disease or not. A preprocessed version of the dataset can be downloaded here.

### 2- Data Preprocessing

#### 2.1- Import the Dataset into Pandas

We will start by importing our dataset into a Pandas DataFrame. The dataset contains a few missing values, represented by the value `?`. The code below helps import the dataset.

We can see from the DataFrame information that missing values are present in the following features: `ca` and `thal`.
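A minimal sketch of this import step: passing `na_values="?"` to `pandas.read_csv` turns the `?` placeholders into NaN so they show up in `df.info()`. The column names and the inline two-row sample below are illustrative, not from the original post:

```python
import io
import pandas as pd

# A two-row sample in the Cleveland format ("?" marks a missing value).
sample = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "53.0,1.0,4.0,140.0,203.0,1.0,2.0,155.0,1.0,3.1,3.0,?,7.0,1\n"
)
columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

# na_values="?" converts the "?" placeholders to NaN on load.
df = pd.read_csv(sample, names=columns, na_values="?")
df.info()  # `ca` reports fewer non-null entries than the other columns
```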

#### 2.2- Remove missing values

Before we decide whether to remove the missing values or perform data imputation, we first count the number of rows in our dataset that contain missing values.

We have 6 rows in our dataset that contain missing values. We choose to remove the rows with missing data for two reasons:

- Data imputation may introduce some bias in our dataset
- The number of rows with missing data is very small (only 6).
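Counting and removing the incomplete rows can be sketched as follows (a toy DataFrame stands in for the real one, where the same count is 6):

```python
import numpy as np
import pandas as pd

# Toy frame with two incomplete rows (the real dataset has six).
df = pd.DataFrame({"ca": [0.0, np.nan, 2.0], "thal": [6.0, 3.0, np.nan]})

# Number of rows with at least one missing value.
n_missing = int(df.isna().any(axis=1).sum())
print(n_missing)  # 2 in this toy frame

# Drop them, since they are few enough not to justify imputation.
df = df.dropna().reset_index(drop=True)
```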

#### 2.3- Sanity check

After removing the missing data, we have to perform a sanity check on our dataset. We do that with the following piece of code:

The `badRows` DataFrame contains **0 entries**, which means that there are no mistakes left in our dataset.
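The exact conditions behind `badRows` are not shown here, but such a check might look like the sketch below, where the constraints (assumptions for illustration) are that binary features take only 0/1 values and resting blood pressure is positive:

```python
import pandas as pd

# Toy frame with two of the features being checked.
df = pd.DataFrame({"sex": [0, 1, 1], "trestbps": [145.0, 130.0, 120.0]})

# Collect rows that violate basic domain constraints (assumed checks).
badRows = df[~df["sex"].isin([0, 1]) | (df["trestbps"] <= 0)]
print(len(badRows))  # 0 entries means the data passed the sanity check
```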

#### 2.4- Explode categorical features

We will now explode the non-binary categorical features into binary features (values in {0, 1}). We perform the following transformations:

**chest pain type** (cp) is exploded to:

- *isTypicalAngineCp*: True when cp is typical angina
- *isATypicalAngineCp*: True when cp is atypical angina
- *isNonAnginalPainCp*: True when cp is non-anginal pain
- *isAsymptomaticCp*: True when cp is asymptomatic

**resting electrocardiographic results** (restecg) is exploded to:

- *isNormalRestecg*: True when restecg is normal
- *hasStWaveRestecg*: True when restecg has an ST-T wave abnormality
- *hasHypertrophyRestecg*: True when restecg shows probable or definite left ventricular hypertrophy

**the slope of the peak exercise ST segment** (slope) is exploded to:

- *isUpSlopping*: True when slope is upsloping
- *isFlatSlop*: True when slope is flat
- *isDownSlopping*: True when slope is downsloping
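The explosion can be sketched with `pandas.get_dummies` plus a rename (the cp codes follow the UCI convention: 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic):

```python
import pandas as pd

# Toy column with one sample of each chest pain type.
df = pd.DataFrame({"cp": [1, 4, 2, 3]})

# One-hot encode cp, then rename the dummy columns to the feature names above.
dummies = pd.get_dummies(df["cp"]).rename(columns={
    1: "isTypicalAngineCp", 2: "isATypicalAngineCp",
    3: "isNonAnginalPainCp", 4: "isAsymptomaticCp"}).astype(int)
df = pd.concat([df.drop(columns="cp"), dummies], axis=1)
print(df.columns.tolist())
```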

#### 2.5- Split and Visualize DataSet

We split our dataset into a training set (**70%**), a validation set (**15%**), and a test set (**15%**). After that, we use Principal Component Analysis (PCA) to visualize our dataset.

After visualization, we notice two main issues with our dataset:

- The dataset contains a few outliers: we decide not to remove them but instead perform robust scaling to limit their effect.
- The dataset is not perfectly linearly separable: we will perform a standardization of the dataset.
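The robust scaling step can be sketched with scikit-learn's `RobustScaler` (an assumption; any median/IQR-based scaler would do), which centers on the median and scales by the interquartile range, so a single outlier does not distort the scale of the other samples:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with an extreme outlier at the end.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Median/IQR-based scaling keeps the bulk of the data in a small range;
# here the median is 3 and the IQR is 2, so x -> (x - 3) / 2.
X_scaled = RobustScaler().fit_transform(X)
print(X_scaled.ravel())
```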

### 3- Models

We implement four different regression models on our dataset:

- Simple logistic regression
- Ridge regression
- Lasso regression
- Elastic net regression

#### 3.1- Simple Logistic regression

The loss function of the logistic regression is straightforward and does not contain any regularization mechanism. We implement the loss function of the simple logistic regression with the following piece of code:

We wrap the loss function in the `reduce_mean` TensorFlow function because we are using batch training. This function is not needed in the case of stochastic training. The simple logistic regression model gives us an accuracy of **81.81%** on our test set.
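The batching idea can be sketched in NumPy: the per-example cross-entropy terms are averaged over the batch, which is exactly the role `tf.reduce_mean` plays in the TensorFlow version (variable names here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, b, X, y):
    """Mean cross-entropy over the batch (no regularization)."""
    p = sigmoid(X @ w + b)
    # Averaging over the batch is what tf.reduce_mean does in the TF code.
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[0.0], [1.0]])
y = np.array([0.0, 1.0])
print(logistic_loss(np.zeros(1), 0.0, X, y))  # ln 2 ≈ 0.6931 for w = 0
```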

#### 3.2- Ridge regression

Ridge regression is a form of regression that makes use of the **L2 regularization** technique to make the model less dependent on the slope. **Ridge regression** can shrink the slope close to zero, but never exactly to zero. The formula of the loss function of the ridge regression is as follows:
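In standard notation, with $\hat{y}_i = \sigma(w^\top x_i + b)$ and $\lambda$ the regularization strength, the L2-regularized logistic loss is:

$$
L(w) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right] + \lambda \lVert w \rVert_2^2
$$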

We implement that in tensorflow with the following piece of code:
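As a NumPy sketch of what that TensorFlow code computes (names are illustrative, with `lam` the regularization strength):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_loss(w, b, X, y, lam=0.1):
    """Mean cross-entropy plus an L2 penalty on the weights."""
    p = sigmoid(X @ w + b)
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return ce + lam * np.sum(w ** 2)  # L2 term shrinks w toward zero

X, y = np.array([[0.0], [1.0]]), np.array([0.0, 1.0])
print(ridge_loss(np.zeros(1), 0.0, X, y))  # penalty vanishes at w = 0
```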

The ridge regression model gives us an accuracy of **81.81%** on our test set. We see that the ridge regression gives the same accuracy as the simple logistic regression. We conclude that the **L2 regularization** technique does not bring any improvement in the case of our dataset.

#### 3.3- Lasso regression

Lasso regression is a form of regression that makes use of the **L1 regularization** technique to make the model less dependent on the slope. **Lasso regression** can shrink the slope to zero, whereas **ridge regression** cannot. The formula of the loss function of the Lasso regression is as follows:
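In standard notation, with $\hat{y}_i = \sigma(w^\top x_i + b)$ and $\lambda$ the regularization strength, the L1-regularized (lasso) logistic loss is:

$$
L(w) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right] + \lambda \lVert w \rVert_1
$$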

We implement that in tensorflow with the following piece of code:

The lasso regression model gives us an accuracy of **84.09%** on our test set. Compared to our previous accuracy, this is an improvement of **2.28%**.

#### 3.4- Elastic net regression

The elastic net regression is a combination of ridge regression and lasso regression: we combine the **L1 regularization** technique with the **L2 regularization** technique. The TensorFlow implementation is as follows:
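A NumPy sketch of the combined loss (names illustrative; `lam1` and `lam2` weight the L1 and L2 penalties):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elastic_net_loss(w, b, X, y, lam1=0.1, lam2=0.1):
    """Mean cross-entropy plus both L1 and L2 penalties."""
    p = sigmoid(X @ w + b)
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Elastic net combines the lasso (L1) and ridge (L2) penalties.
    return ce + lam1 * np.sum(np.abs(w)) + lam2 * np.sum(w ** 2)

X, y = np.array([[0.0], [1.0]]), np.array([0.0, 1.0])
print(elastic_net_loss(np.zeros(1), 0.0, X, y))  # both penalties vanish at w = 0
```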

The elastic net regression model gives us an accuracy of **84.09%** on our test set. We see that the elastic net regression gives the same accuracy as the lasso regression. This confirms that the **L2 regularization** technique does not bring any improvement in the case of our dataset.

#### 3.5- Loss and accuracy graph

The following graph shows the accuracy of the various models that we have implemented:

During training, we test our model on the **validation set** every **100 iterations**. The following graphs show the variation of the loss and the accuracy during training:

**The full code is available on Github.**

**Thanks for reading. Please leave feedback and questions in the comments!**