Machine Learning Class, Week 4
Support Vector Machine
The SVM classifier is the solution of the following optimization problem:
\[\text{minimize}_{w, w_0}\ {1\over2}\|w\|^2_2\] \[\text{subject to } y_j(w^\top x_j + w_0) \geq 1, \quad j=1,...,N\]
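Reading the objective geometrically (a standard argument, added here for context): the distance from a point $x_j$ to the hyperplane $w^\top x + w_0 = 0$ is $|w^\top x_j + w_0|/\|w\|_2$, so the constraints force every training point to be at least $1/\|w\|_2$ away from the boundary. Minimizing ${1\over2}\|w\|^2_2$ therefore maximizes this margin.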
Lagrangian Function
- Goal: Construct the dual problem of the SVM optimization above.
- Approach: Consider the Lagrangian function
\[\mathcal{L}(w, w_0, \lambda)={1\over2}\|w\|^2_2+\sum^N_{j=1}\lambda_j[1-y_j(w^\top x_j + w_0)]\]
- The solution occurs at the saddle points of $\mathcal{L}$:
\[\min_{w, w_0} \max_{\lambda \geq 0} \mathcal{L}(w, w_0, \lambda)\]
Inequality Constrained Optimization
The general inequality constrained optimization problem is
\[\text{minimize}_{x\in \mathbb{R}^n}\ f(x)\] \[\text{subject to } g_i(x) \geq 0,\ i=1,...,m\] \[h_j(x) = 0,\ j=1,...,k\]
This requires the Lagrangian function
\[\mathcal{L}(x,\mu, v)=f(x)-\sum^{m}_{i=1}\mu_ig_i(x)-\sum^k_{j=1}v_jh_j(x)\]
$\mu \in \mathbb{R}^m$ and $v \in \mathbb{R}^k$ are called the Lagrange multipliers or the dual variables.
Karush-Kuhn-Tucker (KKT) Conditions
If $(x^{*}, \mu^{*}, v^{*})$ is the solution to the constrained optimization problem, then all of the following conditions must hold:
$\nabla_x \mathcal{L}(x^{*}, \mu^{*}, v^{*})=0$
- Stationarity.
- The primal variables should be stationary.
$g_i(x^{*}) \geq 0$ and $h_j(x^{*})=0$ for all $i$ and $j$.
- Primal Feasibility.
- Ensures that the constraints are satisfied.
$\mu^{*}_i \geq 0$ for all $i$.
- Dual Feasibility.
- Requires $\mu^{*}_i \geq 0$ but places no constraint on $v^{*}_j$.
$\mu^{*}_i g_i(x^{*})=0$ for all $i$.
- Complementary Slackness.
- Either $\mu^{*}_i=0$ or $g_i(x^{*})=0$ (or both).
The KKT conditions are first-order necessary conditions.
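As a quick illustration (a minimal example of mine, not from the slides), consider minimizing $f(x)=x^2$ subject to the single constraint $g(x)=x-1 \geq 0$, so that $\mathcal{L}(x,\mu)=x^2-\mu(x-1)$. The four conditions read:
\[\text{Stationarity: } 2x^{*} - \mu^{*} = 0\] \[\text{Primal feasibility: } x^{*} - 1 \geq 0\] \[\text{Dual feasibility: } \mu^{*} \geq 0\] \[\text{Complementary slackness: } \mu^{*}(x^{*} - 1) = 0\]
If $\mu^{*}=0$, stationarity would force $x^{*}=0$, which is infeasible; so the constraint is active, giving $x^{*}=1$ and $\mu^{*}=2$.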
Lagrangian Function
The Lagrangian function is
\[\mathcal{L}(w, w_0, \lambda)={1\over2}\|w\|^2_2+\sum^N_{j=1}\lambda_j[1-y_j(w^\top x_j + w_0)]\]
The constraints require $1-y_j(w^\top x_j + w_0) \leq 0$. The complementarity condition says that one of the two cases holds for each $j$:
- $\lambda_j > 0$ and $[1-y_j(w^\top x_j + w_0)]=0$
- $\lambda_j = 0$ and $[1-y_j(w^\top x_j + w_0)]<0$
So what about stationarity in $\lambda$? If we ask for $\nabla_\lambda \mathcal{L}(w,w_0, \lambda)=0$, we run into a problem: the term
\[\sum^N_{j=1}\lambda_j [1-y_j(w^\top x_j + w_0)]\]
is linear in $\lambda$, so it has no saddle point in $\lambda$.
But $1-y_j(w^\top x_j + w_0) \leq 0$, so minimizing this term over $\lambda \geq 0$ is unbounded below. We must therefore go with the maximum over $\lambda$.
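To spell out why (my elaboration): for any feasible $(w, w_0)$, every coefficient $1-y_j(w^\top x_j + w_0)$ is nonpositive, so
\[\max_{\lambda \geq 0} \sum^N_{j=1}\lambda_j [1-y_j(w^\top x_j + w_0)] = 0,\]
attained by setting $\lambda_j = 0$ whenever the $j$-th constraint is strict. At feasible points the maximum therefore adds nothing to ${1\over2}\|w\|^2_2$, which is why the min-max formulation below recovers the original SVM problem.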
Primal Problem
Let $\lambda^*$ be the maximizer
\[\lambda^*=\text{argmax}_{\lambda \geq 0} \left\{ \sum^N_{j=1} \lambda_j[1-y_j(w^\top x_j + w_0)] \right\}\]
Then the primal problem is
\[\text{minimize}_{w,w_0}\ \mathcal{L}(w, w_0, \lambda^*)\] \[=\text{minimize}_{w,w_0} \left\{ {1\over2}\|w\|^2_2+\max_{\lambda \geq 0} \sum^N_{j=1} \lambda_j[1-y_j(w^\top x_j +w_0)] \right\}\] \[=\text{minimize}_{w, w_0} \left\{ \max_{\lambda \geq 0}\ \mathcal{L}(w, w_0, \lambda) \right\}\]
This is a min-max problem:
\[\min_{w, w_0} \max_{\lambda \geq 0} \mathcal{L}(w, w_0, \lambda )\]
Strong Duality
- Recall that our problem is a quadratic program (QP)
- Strong Duality holds for QP:
Primal
\[\min_{w,w_0}\max_{\lambda \geq 0} \mathcal{L}(w, w_0, \lambda)\]
Dual
\[\max_{\lambda \geq 0}\min_{w, w_0} \mathcal{L}(w, w_0, \lambda)\]
Toy Example
- The SVM problem is of the form
\[\text{minimize}_u\ \|u\|^2, \quad \text{subject to } a^\top_j u \geq b_j,\ j=1,2,…,N\]
- Example:
- Can we write down its dual problem?
- The Lagrangian function is
\[\mathcal{L}(u, \lambda)=\|u\|^2-\sum^N_{j=1}\lambda_j(a^\top_j u - b_j)\]
- Minimize over $u$: setting $\nabla_u \mathcal{L}=2u-\sum^N_{j=1}\lambda_j a_j=0$ gives
\[u={1\over2}\sum^N_{j=1}\lambda_j a_j\]
- Plugging this back into the Lagrangian function yields the dual
\[\text{maximize}_{\lambda \geq 0}\ -{1\over4}\Big\|\sum^N_{j=1}\lambda_j a_j\Big\|^2+\sum^N_{j=1}\lambda_j b_j\]
- The primal is a QP. The dual is also a QP.
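Since both problems are QPs and strong duality holds, their optimal values must coincide. Below is a small numerical check of this (my own sketch, not from the slides): the instance is hypothetical, chosen so that the maximizer comes out to $\lambda_1={4\over5}$ as quoted on the next slide, and `scipy` is assumed to be available.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical instance of the toy form above:
#   minimize ||u||^2  subject to  u1 + 2*u2 >= 2,  u1 >= 0,  u2 >= 0
A = np.array([[1.0, 2.0], [1.0, 0.0], [0.0, 1.0]])  # rows are a_j^T
b = np.array([2.0, 0.0, 0.0])

# Primal QP: minimize ||u||^2 subject to A u >= b
primal = minimize(lambda u: u @ u, np.ones(2),
                  constraints={'type': 'ineq', 'fun': lambda u: A @ u - b})

# Dual QP: maximize -1/4 ||A^T lam||^2 + b^T lam over lam >= 0
# (written as a minimization, since scipy minimizes by default)
dual = minimize(lambda lam: 0.25 * (A.T @ lam) @ (A.T @ lam) - b @ lam,
                np.ones(3), bounds=[(0.0, None)] * 3)

print("primal optimum:", primal.fun)   # ~0.8
print("dual optimum:  ", -dual.fun)    # ~0.8, equal by strong duality
print("lambda*:", dual.x)              # ~[0.8, 0.0, 0.0]
```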
Have We Gained Anything?
- Here is the dual problem from the previous slide, subject to $\lambda \geq 0$.
- The terms involving $\lambda_2$ and $\lambda_3$ are all negative! So we must have $\lambda_2=\lambda_3=0$.
- This gives a concave quadratic in $\lambda_1$, which is maximized at $\lambda_1={4\over5}$.
- Plugging $\lambda^*$ into $u^*={1\over2}\sum^N_{j=1}\lambda^*_j a_j$ yields the primal solution.
Dual of SVM
- We want to find the dual problem of
\[\text{minimize}_{w, w_0}\ {1\over2}\|w\|^2_2 \quad \text{subject to } y_j(w^\top x_j + w_0) \geq 1,\ j=1,...,N\]
- We start with the Lagrangian function
\[\mathcal{L}(w, w_0, \lambda)={1\over2}\|w\|^2_2+\sum^N_{j=1}\lambda_j[1-y_j(w^\top x_j + w_0)]\]
- Let us minimize over $(w, w_0)$: Primal stationarity gives
\[\nabla_w \mathcal{L}=w-\sum^N_{j=1}\lambda_j y_j x_j=0 \quad\Rightarrow\quad w^{*}=\sum^N_{j=1}\lambda_j y_j x_j\]
\[\nabla_{w_0} \mathcal{L}=-\sum^N_{j=1}\lambda_j y_j=0 \quad\Rightarrow\quad \sum^N_{j=1}\lambda_j y_j=0\]
Interpreting $\nabla_w \mathcal{L}(w,w_0,\lambda)=0$
- The first result suggests that
\[w^{*}=\sum^N_{j=1}\lambda_j y_j x_j\]
- This is where the support vectors enter: each $\lambda_j$ is either $\lambda_j=0$ or $\lambda_j > 0$.
- The complementarity condition states that
\[\lambda^{*}_j[1-y_j(w^{*\top}x_j + w_0^{*})]=0\]
- If $1-y_j(w^{*\top}x_j + w_0^{*}) < 0$, then $\lambda^{*}_j=0$.
- If $\lambda^{*}_j > 0$, then $1-y_j(w^{*\top}x_j + w_0^{*})=0$.
- So you can define the support vector set
\[\nu=\{j : \lambda^{*}_j > 0\}\]
- So the optimal weight is
\[w^{*}=\sum_{j \in \nu}\lambda^{*}_j y_j x_j\]
since $\lambda^{*}_j=0$ for every $j \notin \nu$.
Back to Duality
The Lagrangian function is
\[\mathcal{L}(w^{*}, w^{*}_0, \lambda)={1\over2}\|w^{*}\|^2_2+\sum^N_{j=1}\lambda_j[1-y_j(w^{*\top} x_j+w_0)]\] \[={1\over2}\Big\|\sum^{N}_{j=1}\lambda_jy_jx_j\Big\|^2_2+\sum^N_{j=1}\lambda_j\Big[1-y_j\Big(\Big(\sum^N_{i=1}\lambda_iy_ix_i\Big)^\top x_j + w_0\Big)\Big]\]
- We can show that
\[{1\over2}\Big\|\sum^{N}_{j=1}\lambda_jy_jx_j\Big\|^2_2={1\over2}\sum^N_{i=1}\sum^N_{j=1}\lambda_i\lambda_jy_iy_jx_i^\top x_j\]
- and, using $\sum^N_{j=1}\lambda_jy_j=0$, we can show that
\[\sum^N_{j=1}\lambda_j\Big[1-y_j\Big(\Big(\sum^N_{i=1}\lambda_iy_ix_i\Big)^\top x_j + w_0\Big)\Big]=\sum^N_{j=1}\lambda_j-\sum^N_{i=1}\sum^N_{j=1}\lambda_i\lambda_jy_iy_jx_i^\top x_j\]
- Therefore, the dual problem is
\[\text{maximize}_{\lambda \geq 0}\ \sum^N_{j=1}\lambda_j-{1\over2}\sum^N_{i=1}\sum^N_{j=1}\lambda_i\lambda_jy_iy_jx_i^\top x_j\]
subject to
\[\sum^N_{j=1}\lambda_jy_j=0\]
If you prefer matrix-vector notation:
\[\text{maximize}_{\lambda \geq 0}\ -{1\over2}\lambda^\top Q \lambda + 1^\top \lambda\]
subject to $y^\top \lambda =0$.
We can combine the constraints $\lambda \geq 0$ and $y^\top \lambda =0$ as $A\lambda \geq 0$
- $y^\top \lambda =0$ means that both $y^\top \lambda \geq 0$ and $-y^\top \lambda \geq 0$.
- Thus, we can write $y^\top \lambda=0$ as
\[\begin{bmatrix}y^\top \\ -y^\top\end{bmatrix}\lambda \geq 0\]
- Therefore, the matrices $Q$ and $A$ are
\[Q_{ij}=y_iy_jx_i^\top x_j, \quad i,j=1,...,N\]
and
\[A=\begin{bmatrix}y^\top \\ -y^\top \\ I\end{bmatrix}\]
So How to Solve the SVM Problem?
- You look at the dual problem
\[\text{maximize}_{\lambda}\ -{1\over2}\lambda^\top Q \lambda + 1^\top \lambda\]
subject to
\[A\lambda \geq 0\]
- You get the solution $\lambda^{*}$.
- Then compute $w^{*}$:
\[w^{*}=\sum_{j \in \nu}\lambda^{*}_j y_j x_j\]
- $\nu$ is the set of support vectors: $\nu=\{j : \lambda^{*}_j > 0\}$.
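The recipe above can be run end to end in a few lines. The sketch below is my own illustration (made-up data points, `scipy` assumed): it builds $Q$, solves the dual, reads off the support vector set $\nu$, and recovers $w^{*}$ and $w_0^{*}$ (the latter anticipating the next slide).

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable data: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

# Q_ij = y_i y_j x_i^T x_j
G = y[:, None] * X          # row j is y_j x_j^T
Q = G @ G.T

# Dual: maximize -1/2 lam^T Q lam + 1^T lam
#       subject to lam >= 0 and y^T lam = 0
res = minimize(lambda lam: 0.5 * lam @ Q @ lam - lam.sum(),
               np.zeros(N),
               bounds=[(0.0, None)] * N,
               constraints={'type': 'eq', 'fun': lambda lam: y @ lam})
lam = res.x

nu = lam > 1e-6                         # support vectors: lambda_j > 0
w = (lam[:, None] * G).sum(axis=0)      # w* = sum_j lam_j y_j x_j

# w0* from one support vector per class (next slide):
xp = X[nu & (y > 0)][0]
xm = X[nu & (y < 0)][0]
w0 = -0.5 * w @ (xp + xm)

print("support vector set nu:", np.where(nu)[0])
print("w* =", w, " w0* =", w0)
```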
Are We Done Yet?
- Not quite.
- We still need to find $w_0^{*}$.
- Pick any support vector $x^{+} \in \mathcal{C}_{+}$ and $x^{-} \in \mathcal{C}_{-}$.
- Then we have
\[w^{*\top}x^{+} + w_0 = 1 \quad \text{and} \quad w^{*\top}x^{-} + w_0 = -1\]
- Summing them gives $w^{*\top}(x^{+} + x^{-}) + 2w_0=0$, which means
\[w_0^{*}=-{1\over2}w^{*\top}(x^{+} + x^{-})\]
Summary of Dual SVM
- Primal:
\[\text{minimize}_{w,w_0}\ {1\over2}\|w\|^2_2\]
subject to $y_j(w^\top x_j + w_0) \geq 1$.
- Strong duality:
\[\min_{w,w_0}\max_{\lambda \geq 0} \mathcal{L}(w, w_0, \lambda)=\max_{\lambda \geq 0}\min_{w,w_0} \mathcal{L}(w, w_0, \lambda)\]
- Dual:
\[\text{maximize}_{\lambda \geq 0}\ -{1\over2}\lambda^\top Q \lambda + 1^\top \lambda\]
subject to $\sum^N_{j=1} \lambda_jy_j=0$.
- The weights are computed as
\[w^{*}=\sum_{j \in \nu}\lambda^{*}_j y_j x_j\]
- This is where the support vectors enter: each $\lambda_j$ is either $\lambda_j=0$ or $\lambda_j > 0$.
- Pick any support vector $x^{+} \in \mathcal{C}_{+}$ and $x^{-} \in \mathcal{C}_{-}$.
- Then we have
\[w^{*\top}x^{+} + w_0 = 1 \quad \text{and} \quad w^{*\top}x^{-} + w_0 = -1\]
- Summing them gives $w^{*\top}(x^{+}+x^{-}) + 2w_0=0$, which means
\[w_0^{*}=-{1\over2}w^{*\top}(x^{+} + x^{-})\]
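As a final sanity check (my suggestion, assuming scikit-learn is installed), the hard-margin solution can be approximated with `sklearn.svm.SVC` using a linear kernel and a very large $C$; its `coef_`, `intercept_`, and `support_` attributes should agree with the $w^{*}$, $w_0^{*}$, and $\nu$ obtained from the dual above.

```python
import numpy as np
from sklearn.svm import SVC

# Same hypothetical data as in the earlier dual-solver sketch.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin SVM (no slack allowed).
clf = SVC(kernel="linear", C=1e10).fit(X, y)

print("w*  =", clf.coef_[0])             # should match sum_{j in nu} lam_j y_j x_j
print("w0* =", clf.intercept_[0])        # should match -(1/2) w*^T (x+ + x-)
print("support vectors:", clf.support_)  # indices j with lambda_j > 0
```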