A Practical Guide to SVM (Part 2): TensorFlow Implementation

The full code is available on GitHub.

In part one of this tutorial, we described and explored our dataset. In this part, we will implement different SVM models and search for their optimal hyper-parameters.

Simple SVM model

Sklearn custom estimator for Simple SVM

We decided to use sklearn.model_selection.GridSearchCV to perform the search for optimal hyper-parameters. As we are using TensorFlow, our first challenge is to write a custom estimator that is compatible with the sklearn estimator API. We found two useful resources online for this:

  • http://danielhnyk.cz/creating-your-own-estimator-scikit-learn/
  • https://scikit-learn.org/dev/developers/contributing.html#rolling-your-own-estimator

The main idea is to define and train the model only in the fit method. Our simple SVM implementation is straightforward: we compute the model output and the loss, then optimize the loss.
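To make that structure concrete, here is a minimal sketch of such an estimator using the TensorFlow 1.x graph API and a soft-margin hinge loss. The class name, hyper-parameter names (C, epochs, learning_rate) and training loop are illustrative assumptions and do not necessarily match the code in the repository.

```python
import numpy as np
import tensorflow as tf
from sklearn.base import BaseEstimator, ClassifierMixin

class LinearSVM(BaseEstimator, ClassifierMixin):
    """Sketch of a sklearn-compatible linear soft-margin SVM (TF 1.x graph API)."""

    def __init__(self, C=1.0, epochs=100, learning_rate=0.01):
        # Only store hyper-parameters here; the model is built and trained in fit().
        self.C = C
        self.epochs = epochs
        self.learning_rate = learning_rate

    def fit(self, X, y):
        # y is expected in {-1, +1}
        n_samples, n_features = X.shape
        graph = tf.Graph()
        with graph.as_default():
            self._X = tf.placeholder(tf.float32, shape=[None, n_features])
            self._y = tf.placeholder(tf.float32, shape=[None, 1])
            self._W = tf.Variable(tf.random_normal([n_features, 1]))
            self._bias = tf.Variable(tf.zeros([1]))

            output = tf.matmul(self._X, self._W) + self._bias
            # Soft-margin objective: ||W||^2 / 2 + C * mean(hinge loss)
            hinge = tf.reduce_mean(tf.maximum(0., 1. - self._y * output))
            loss = 0.5 * tf.reduce_sum(tf.square(self._W)) + self.C * hinge
            train_op = tf.train.GradientDescentOptimizer(self.learning_rate).minimize(loss)

            self._sess = tf.Session(graph=graph)
            self._sess.run(tf.global_variables_initializer())
            for _ in range(self.epochs):
                self._sess.run(train_op, feed_dict={self._X: X,
                                                    self._y: y.reshape(-1, 1)})
        return self

    def predict(self, X):
        W, bias = self._sess.run([self._W, self._bias])
        return np.sign(X.dot(W) + bias).ravel()
```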

Please note that sklearn's GridSearchCV will call our score function during the grid search. During the data exploration, we decided to use the F1-Score because our dataset is imbalanced, so we need to implement the F1-Score in our score function. We do that as follows:
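A minimal sketch of such a score method, assuming the predict method from the estimator above and sklearn's f1_score helper; the 'weighted' averaging mode is our assumption, not necessarily the repository's choice:

```python
from sklearn.metrics import f1_score

# Inside the LinearSVM estimator sketched above:
def score(self, X, y):
    # GridSearchCV ranks hyper-parameter candidates with this value,
    # so we return the F1-Score rather than the default accuracy.
    return f1_score(y, self.predict(X), average='weighted')
```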

The full code of the simple SVM model is available here.

Optimal hyper-parameters search

Our simple SVM model has only one hyper-parameter: C. We search for the optimal value of C in two steps. First, we perform a coarse grid search to find the optimal range, using an exponentially growing sequence of C values: [2^-5, …, 2^15].
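A hedged sketch of this coarse search with GridSearchCV; LinearSVM is the sketch class from above, and X_train, y_train and the cross-validation fold count are assumptions:

```python
from sklearn.model_selection import GridSearchCV

# Coarse grid: exponentially growing values of C from 2^-5 to 2^15.
coarse_grid = {'C': [2.0 ** k for k in range(-5, 16)]}

coarse_search = GridSearchCV(LinearSVM(), param_grid=coarse_grid, cv=5)
coarse_search.fit(X_train, y_train)   # X_train, y_train: your training data
print(coarse_search.best_params_)     # best C found on the coarse grid
```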

[Figure: coarse grid search results for C]

The coarse search shows that we achieve the best performance when C=2^-1.

We now perform a finer grid search in the neighborhood of the optimal value.
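For example, continuing the previous snippet (the exact neighborhood bounds and number of grid points are our choice, not the original grid):

```python
import numpy as np

# Finer grid around the coarse optimum C = 2^-1 = 0.5.
fine_grid = {'C': np.linspace(0.25, 1.0, 20)}
fine_search = GridSearchCV(LinearSVM(), param_grid=fine_grid, cv=5)
fine_search.fit(X_train, y_train)
print(fine_search.best_params_)
```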

[Figure: finer grid search results for C]

The finer grid search shows that we achieve the best performance when C=0.4506252.

Model performance

We use the optimal value of the hyper-parameter C to train our model on the whole training set, without any cross-validation, and then evaluate it on the test set. The following table summarizes the model's performance.

Metric      Value
Accuracy    0.95875
Precision   0.95875
Recall      0.95875
F1-Score    0.95874
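For reference, here is a sketch of how such a summary could be produced with sklearn's metric helpers; the variable names and the 'weighted' averaging mode are assumptions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Retrain on the full training set with the best C, then evaluate on the test set.
model = LinearSVM(C=fine_search.best_params_['C']).fit(X_train, y_train)
y_pred = model.predict(X_test)

print('Accuracy :', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred, average='weighted'))
print('Recall   :', recall_score(y_test, y_pred, average='weighted'))
print('F1-Score :', f1_score(y_test, y_pred, average='weighted'))
```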

SVM with RBF kernel

Sklearn custom estimator for SVM with RBF kernel

The only useful resource we found online is the implementation available in the TensorFlow cookbook; the code is available on GitHub here. There is a problem with that implementation, and the workaround is to set the batch size equal to the size of the training set. The problem is reported in this issue.

I have implemented the naive solution proposed by tatnguyennguyen: "Take a look at this http://cs229.stanford.edu/extra-notes/representer-function.pdf. I think the (naive) proper way to implement kernel SVM is to use parameter vector b with the size equal to the size of the whole training set, and in each epoch, update those b_i corresponding to training examples that are chosen at that epoch."

To properly understand the implementation, I recommend Andrew Ng's useful course material on SVMs with kernels. The main idea is to treat each training example as a landmark.
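The following is a rough sketch of this dual-form idea using the TF 1.x graph API: one coefficient b_i per training example, an RBF kernel matrix built over all training points (the landmarks), and gradient descent on the negated dual objective. The class and parameter names are ours, and the handling of the box constraint on b is simplified compared to a full implementation.

```python
import numpy as np
import tensorflow as tf
from sklearn.base import BaseEstimator, ClassifierMixin

class KernelSVM(BaseEstimator, ClassifierMixin):
    """Sketch of a dual-form SVM with an RBF kernel (TF 1.x graph API)."""

    def __init__(self, C=1.0, gamma=0.5, epochs=300, learning_rate=0.01):
        self.C = C
        self.gamma = gamma
        self.epochs = epochs
        self.learning_rate = learning_rate

    def _rbf_kernel(self, A, B):
        # K(a, b) = exp(-gamma * ||a - b||^2); every training point is a landmark.
        sq_dists = (np.sum(A ** 2, axis=1)[:, None]
                    + np.sum(B ** 2, axis=1)[None, :]
                    - 2.0 * A.dot(B.T))
        return np.exp(-self.gamma * sq_dists)

    def fit(self, X, y):
        # y in {-1, +1}; one dual coefficient b_i per training example.
        self._X_train = X
        self._y_train = y.astype(np.float64)
        K = self._rbf_kernel(X, X)
        n = X.shape[0]

        graph = tf.Graph()
        with graph.as_default():
            K_const = tf.constant(K, dtype=tf.float32)
            y_const = tf.constant(self._y_train.reshape(-1, 1), dtype=tf.float32)
            b = tf.Variable(tf.zeros([n, 1]))

            # Dual objective: maximize sum(b) - 0.5 * b^T (y y^T * K) b,
            # so we minimize its negation with plain gradient descent.
            yyK = tf.matmul(y_const, y_const, transpose_b=True) * K_const
            quad = tf.reduce_sum(tf.matmul(tf.matmul(b, yyK, transpose_a=True), b))
            loss = -(tf.reduce_sum(b) - 0.5 * quad)
            train_op = tf.train.GradientDescentOptimizer(self.learning_rate).minimize(loss)

            self._sess = tf.Session(graph=graph)
            self._sess.run(tf.global_variables_initializer())
            for _ in range(self.epochs):
                self._sess.run(train_op)
            # NOTE: a full implementation would enforce 0 <= b_i <= C (e.g. by
            # clipping); this sketch leaves the box constraint out for brevity.
            self._b = self._sess.run(b).ravel()
        return self

    def predict(self, X):
        # Decision function: f(x) = sum_i b_i * y_i * K(x_i, x)
        K_test = self._rbf_kernel(X, self._X_train)          # (n_test, n_train)
        return np.sign(K_test.dot(self._b * self._y_train))
```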

The full code of the SVM with RBF kernel model is available here.

Optimal hyper-parameters search

The SVM with RBF kernel has two hyper-parameters: C and gamma. As with the simple SVM model, we search for the optimal values in two steps. First, we perform a coarse grid search to find the optimal range, using exponentially growing sequences [2^-3, …, 2^3] for both C and gamma.
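Sketched with the KernelSVM class from above; the fold count and the assumption that every integer power of two is tried are ours:

```python
from sklearn.model_selection import GridSearchCV

# Coarse 2-D grid: powers of two from 2^-3 to 2^3 for both C and gamma.
coarse_grid = {'C': [2.0 ** k for k in range(-3, 4)],
               'gamma': [2.0 ** k for k in range(-3, 4)]}

coarse_search = GridSearchCV(KernelSVM(), param_grid=coarse_grid, cv=5)
coarse_search.fit(X_train, y_train)
print(coarse_search.best_params_)
```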

[Figure: coarse grid search results for C and gamma]

The coarse search shows that we achieve the best performance when C=2^2 and gamma=2^-1.

We now perform a finer grid search in the neighborhood of the optimal values.
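For example (the neighborhood bounds and grid density are our choice):

```python
import numpy as np

# Finer 2-D grid around the coarse optimum (C = 4, gamma = 0.5).
fine_grid = {'C': np.linspace(2.0, 8.0, 10),
             'gamma': np.linspace(0.25, 1.0, 10)}
fine_search = GridSearchCV(KernelSVM(), param_grid=fine_grid, cv=5)
fine_search.fit(X_train, y_train)
print(fine_search.best_params_)
```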

[Figure: finer grid search results for C and gamma]

The finer grid search shows that we achieve the best performance when C=3.8637455 and gamma=0.5176324.

Model performance

We use the optimal values of the hyper-parameters C and gamma to train our model on the whole training set, without any cross-validation, and then evaluate it on the test set. The following table summarizes the model's performance.

Metric      Value
Accuracy    0.96725
Precision   0.96726
Recall      0.96725
F1-Score    0.96724

Model performance comparison

The following table summarizes the performance of the two models:

Model                 Accuracy   Precision   Recall    F1-Score
Simple SVM            0.95875    0.95875     0.95875   0.95874
SVM with RBF kernel   0.96725    0.96726     0.96725   0.96724

We see an improvement in all metrics with the RBF kernel.

The full code is available on GitHub.

Thanks for reading. Please leave feedback and questions in the comments!

About the Author: Miguel KAKANAKOU
