**The full code is available on Github.**

In **part one** of this tutorial, we described and explored our dataset. In this part, we will implement different SVM models and search for their optimal hyper-parameters.

## Simple SVM model

### Sklearn custom estimator for Simple SVM

We decide to use **sklearn.model_selection.GridSearchCV** to perform the search for optimal hyper-parameters. As we are using **Tensorflow**, our first challenge is to write a custom estimator that is compatible with the sklearn estimator API. We found two useful resources online for this:

- http://danielhnyk.cz/creating-your-own-estimator-scikit-learn/
- https://scikit-learn.org/dev/developers/contributing.html#rolling-your-own-estimator

The main idea here is to define and train the model only in the **fit** method. Our simple SVM implementation is straightforward: we compute the model output and the loss, then optimize the loss.
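To illustrate the structure (not the actual TensorFlow code), here is a minimal sklearn-compatible sketch in NumPy: the model is defined and trained only inside `fit`, using hinge loss with L2 regularization. The class name, learning rate, and epoch count are illustrative assumptions, not the original implementation.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.metrics import f1_score


class SimpleSVM(BaseEstimator, ClassifierMixin):
    """Minimal sklearn-compatible linear SVM sketch (hinge loss + L2).

    The original model is built in TensorFlow; this NumPy version only
    mirrors the structure: define and train the model inside `fit`.
    """

    def __init__(self, C=1.0, epochs=200, lr=0.01):
        self.C = C
        self.epochs = epochs
        self.lr = lr

    def fit(self, X, y):
        # labels are expected in {-1, +1}
        n, d = X.shape
        self.w_ = np.zeros(d)
        self.b_ = 0.0
        for _ in range(self.epochs):
            margins = y * (X @ self.w_ + self.b_)
            mask = margins < 1  # examples violating the margin
            # gradient of 1/2 ||w||^2 + C * sum(hinge losses)
            grad_w = self.w_ - self.C * (y[mask] @ X[mask])
            grad_b = -self.C * y[mask].sum()
            self.w_ -= self.lr * grad_w
            self.b_ -= self.lr * grad_b
        return self

    def predict(self, X):
        return np.where(X @ self.w_ + self.b_ >= 0, 1, -1)

    def score(self, X, y):
        # GridSearchCV calls this method: report the F1-score, not accuracy
        return f1_score(y, self.predict(X))
```

Because the class inherits from `BaseEstimator` and keeps all hyper-parameters in `__init__`, `GridSearchCV` can clone it and sweep over `C` directly.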

Please note that **Sklearn GridSearchCV** will use our **score** function during the grid search process. During the data exploration, we decided to use the **F1-score** because our dataset is imbalanced. We therefore need to implement the **F1-score** in our **score** function. We do that as follows:
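A quick toy example (assumed data, not our dataset) of why accuracy is misleading on imbalanced data while the F1-score is not:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# toy imbalanced labels: 90 negatives, 10 positives
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)  # a useless model that always predicts 0

acc = accuracy_score(y_true, y_pred)            # 0.9 -- looks deceptively good
f1 = f1_score(y_true, y_pred, zero_division=0)  # 0.0 -- exposes the failure
print(acc, f1)
```

The always-negative classifier scores 90% accuracy but an F1-score of zero, which is why we wire F1 into the estimator's `score` method.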

The full code of the simple SVM model is available **here**.

### Optimal hyper-parameters search

Our simple SVM model has only one hyper-parameter: **C**. We search for the optimal value of **C** in two steps. First, we perform a coarse grid search to find the optimal range, using an exponentially growing sequence of C values: **[2^-5, …, 2^15]**.
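The coarse search can be sketched as follows. Here `SVC(kernel="linear")` and `make_classification` stand in for our custom TensorFlow estimator and our dataset, and the exponent step of 2 is an assumption (it is the step suggested in the libsvm practical guide):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# coarse grid: exponentially growing C, 2^-5 ... 2^15 (exponent step of 2)
param_grid = {"C": 2.0 ** np.arange(-5, 16, 2)}

# stand-in data; the real search runs on our training set
X, y = make_classification(n_samples=200, random_state=0)

# scoring="f1" mirrors the F1-based score function of our estimator
search = GridSearchCV(SVC(kernel="linear"), param_grid, scoring="f1", cv=5)
search.fit(X, y)
print(search.best_params_["C"])
```

With the custom estimator, passing `scoring="f1"` is unnecessary since its own `score` method already returns the F1-score.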

The coarse search shows that we achieve the best performance when **C=2^-1**.

We now perform a finer grid search on the neighborhood of the optimal value.
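One plausible way to build the finer grid around the coarse optimum (the exact range and step used in the original code may differ):

```python
import numpy as np

# the coarse optimum was C = 2^-1 = 0.5; refine one octave either side
# with a much smaller exponent step (0.25 instead of 2)
fine_C = 2.0 ** np.arange(-2.0, 0.25, 0.25)
print(fine_C[0], fine_C[-1], len(fine_C))
```

The same `GridSearchCV` call is then rerun with this denser grid.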

The finer grid search shows that we achieve the best performance when **C=0.4506252**.

### Model performance

We use the optimal value of the hyper-parameter **C** to train our model on the whole training set, without cross-validation. We then evaluate the model on the test set. The following table summarizes our model's performance.

Metric | Value |
---|---|
Accuracy | 0.95875 |
Precision | 0.95875 |
Recall | 0.95875 |
F1-Score | 0.95874 |

## SVM with RBF kernel

### Sklearn custom estimator for SVM with RBF kernel

The only useful resource we found online is the implementation available in the **tensorflow cookbook**; the code is available on GitHub **here**. That implementation has a problem, and the workaround is to use a **batch size equal to the size of the training set**. The problem is raised in this **issue**.

I have implemented the naive solution proposed by **tatnguyennguyen**: *Take a look at this http://cs229.stanford.edu/extra-notes/representer-function.pdf. I think the (naive) proper way to implement kernel SVM is to use parameter vector b with the size equal to the size of the whole training set, and in each epoch, update those b_i corresponding to training examples that are chosen at that epoch.*

I recommend this useful course by **Andrew Ng** on SVMs with kernels to properly understand the implementation. The main idea is to treat each training example as a landmark.
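The landmark idea boils down to computing, for every input, its RBF similarity to each training example. A small NumPy sketch of that kernel matrix (the function name is ours, not from the cookbook code):

```python
import numpy as np


def rbf_kernel_matrix(X, landmarks, gamma):
    """K[i, j] = exp(-gamma * ||x_i - l_j||^2).

    With every training example used as a landmark, K is n x n and the
    model output is a weighted sum of these similarities -- one weight
    b_i per training example, as in the representer-theorem formulation.
    """
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(landmarks ** 2, axis=1)[None, :]
        - 2.0 * X @ landmarks.T
    )
    # clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel_matrix(X, X, gamma=0.5)
print(K)
```

The diagonal of `K` is always 1, since every point is maximally similar to itself; off-diagonal entries decay with squared distance at a rate set by `gamma`.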

The full code of the SVM with RBF kernel model is available **here**.

### Optimal hyper-parameters search

The SVM with RBF kernel model has two hyper-parameters: **C** and **gamma**. As with the simple SVM model, we search for the optimal values in two steps. First, we perform a coarse grid search to find the optimal range, using exponentially growing sequences, **[2^-3, …, 2^3]**, for both **C** and **gamma**.
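The two-parameter coarse search looks like this sketch, with `SVC(kernel="rbf")` and `make_moons` standing in for our custom TensorFlow estimator and our dataset:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# coarse 7 x 7 grid: 2^-3 ... 2^3 for both C and gamma
exponents = 2.0 ** np.arange(-3, 4)
param_grid = {"C": exponents, "gamma": exponents}

# stand-in non-linearly-separable data; the real search uses our dataset
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="f1", cv=3)
search.fit(X, y)
print(search.best_params_)
```

Note that the grid now grows quadratically: 7 values per parameter means 49 candidate models, each cross-validated, which is why a coarse pass before the fine pass matters even more here.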

The coarse search shows that we achieve the best performance when **C=2^2** and **gamma=2^-1**.

We now perform a finer grid search on the neighborhood of the optimal value.

The finer grid search shows that we achieve the best performance when **C=3.8637455** and **gamma=0.5176324**.

### Model performance

We use the optimal values of the hyper-parameters **C** and **gamma** to train our model on the whole training set, without cross-validation. We then evaluate the model on the test set. The following table summarizes our model's performance.

Metric | Value |
---|---|
Accuracy | 0.96725 |
Precision | 0.96726 |
Recall | 0.96725 |
F1-Score | 0.96724 |

## Model performance comparison

The following table summarizes the performance of the two models:

Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Simple SVM | 0.95875 | 0.95875 | 0.95875 | 0.95874 |
SVM with RBF kernel | 0.96725 | 0.96726 | 0.96725 | 0.96724 |

We see an improvement in all metrics with the RBF kernel.

**The full code is available on Github.**

**Thanks for reading. Please leave feedback and questions in the comments!**