(Week 4) SVM

 

Machine Learning Course, Week 4


Support Vector Machine

The SVM is the solution of the following optimization problem:

\[\text{minimize}_{w, w_0}{1\over2}\|w\|^2_2\] \[\text{subject to } y_j(w^\top x_j + w_0) \geq 1, j=1,...,N\]
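As a quick illustration (my own sketch, not from the lecture), the constraint $y_j(w^\top x_j + w_0) \geq 1$ is easy to check numerically; the data and the candidate $(w, w_0)$ below are made-up examples.

```python
import numpy as np

# Hypothetical toy data: rows of X are the x_j, labels y_j are +/-1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([+1, +1, -1, -1])

# A candidate hyperplane (a guess, not necessarily the SVM solution).
w = np.array([0.5, 0.5])
w0 = -1.0

# Primal feasibility: y_j * (w^T x_j + w0) >= 1 for every j.
margins = y * (X @ w + w0)
print(margins)               # per-sample margins
print(np.all(margins >= 1))  # True iff (w, w0) is feasible
```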

Lagrange Function

  • Goal : Construct the dual problem of
\[\text{minimize}_{w, w_0}{1\over2}\|w\|^2_2\] \[\text{subject to } y_j(w^\top x_j + w_0) \geq 1, j=1,...,N\]
  • Approach: Consider the Lagrangian function
\[\mathcal{L}(w, w_0, \lambda)={1\over2}\|w\|^2_2+\sum^N_{j=1}\lambda_j[1-y_j(w^\top x_j + w_0)]\]
  • The solution occurs at a saddle point of $\mathcal{L}$:
\[\nabla_{w, w_0}\mathcal{L}(w, w_0, \lambda)=0\] \[\nabla_\lambda \mathcal{L}(w, w_0, \lambda)=0\]

Inequality Constrained Optimization

Inequality constrained optimization:

\[\text{minimize}_{x\in \mathbb{R}^n} f(x)\] \[\text{subject to } g_i(x) \geq 0, i=1,...,m\] \[h_j(x) = 0, j=1,...,k\]

This requires the Lagrangian function:

\[\mathcal{L}(x,\mu, v)=f(x)-\sum^{m}_{i=1}\mu_ig_i(x)-\sum^k_{j=1}v_jh_j(x)\]

$\mu \in \mathbb{R}^m$ and $v \in \mathbb{R}^k$ are called the Lagrange multipliers or the dual variables.
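As a side note (my own sketch, not part of the notes), the Lagrangian translates directly into code; `f`, `g`, and `h` below are hypothetical callables standing in for the objective and constraints.

```python
import numpy as np

def lagrangian(x, mu, v, f, g, h):
    """L(x, mu, v) = f(x) - sum_i mu_i * g_i(x) - sum_j v_j * h_j(x).

    f : objective, returns a scalar
    g : inequality constraints, returns an array of shape (m,)  (g(x) >= 0)
    h : equality constraints, returns an array of shape (k,)    (h(x) = 0)
    """
    return f(x) - np.dot(mu, g(x)) - np.dot(v, h(x))
```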

Karush-Kuhn-Tucker Conditions

If $(x^{*}, \mu^{*}, v^{*})$ is the solution to the constrained optimization, then all of the following conditions must hold:

$\nabla_x \mathcal{L}(x^{*}, \mu^{*}, v^{*})=0$

  • Stationarity.
  • The primal variables should be stationary.

$g_i(x^{*}) \geq 0$ and $h_j(x^{*})=0$ for all $i$ and $j$.

  • Primal Feasibility.
  • Ensures that constraints are satisfied.

$\mu^{*}_i \geq 0$ for all $i$.

  • Dual Feasibility.
  • Requires $\mu^{*}_i \geq 0$ but places no constraint on $v^{*}_j$.

$\mu^{*}_i g_i(x^{*})=0$ for all $i$.

  • Complementary Slackness
  • Either $\mu^{*}_i=0$ or $g_i(x^{*})=0$ (or both).

The KKT conditions are first-order necessary conditions.
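To make these four conditions concrete, here is a rough numerical KKT checker (my own sketch; the finite-difference gradient and the tolerance are implementation choices, not from the lecture).

```python
import numpy as np

def grad(fn, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function fn at x."""
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        g[i] = (fn(x + e) - fn(x - e)) / (2 * eps)
    return g

def check_kkt(x, mu, v, f, g_ineq, h_eq, tol=1e-5):
    """Check stationarity, primal/dual feasibility, and complementary slackness."""
    L = lambda z: f(z) - mu @ g_ineq(z) - v @ h_eq(z)
    return {
        "stationarity":    np.linalg.norm(grad(L, x)) < tol,
        "primal_feasible": np.all(g_ineq(x) >= -tol) and np.all(np.abs(h_eq(x)) < tol),
        "dual_feasible":   np.all(mu >= -tol),
        "complementary":   np.all(np.abs(mu * g_ineq(x)) < tol),
    }
```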

Lagrangian Function

The Lagrangian function is

\[\mathcal{L}(w, w_0, \lambda)={1\over2}\|w\|^2_2+\sum^N_{j=1}\lambda_j[1-y_j(w^\top x_j + w_0)]\]

where feasibility requires $1-y_j(w^\top x_j + w_0) \leq 0$ for all $j$.

The complementary slackness condition says that, for each $j$, one of two cases holds:

  • $\lambda_j > 0$ and $[1-y_j(w^\top x_j + w_0)]=0$
  • $\lambda_j = 0$ and $[1-y_j(w^\top x_j + w_0)]<0$

So, if we want $\nabla_\lambda \mathcal{L}(w,w_0, \lambda)=0$, one of these two cases must hold for each $j$. Now consider the term

\[\sum^N_{j=1}\lambda_j [1-y_j(w^\top x_j + w_0)]\]

It has no saddle point because it is linear in $\lambda$.

Moreover, since $1-y_j(w^\top x_j + w_0) \leq 0$, minimizing over $\lambda \geq 0$ is unbounded below. So we must go with the maximum instead.

Primal Problem

Let $\lambda^*$ be the maximizer

\[\lambda^*=\text{argmax}_{\lambda \geq 0} \{ \sum^N_{j=1} \lambda_j[1-y_j(w^\top x_j + w_0)] \}\]

Then the primal problem is

\[\text{minimize}_{w,w_0}\mathcal{L}(w, w_0, \lambda^*)\] \[=\text{minimize}_{w,w_0} \{ {1\over2}\|w\|^2_2+\text{max}_{\lambda \geq 0} \{\sum^N_{j=1} \lambda_j[1-y_j(w^\top x_j +w_0)] \} \}\] \[=\text{minimize}_{w, w_0} \{ \text{max}_{\lambda \geq 0} \mathcal{L}(w, w_0, \lambda) \}\]

This is a min-max problem.

\[\min_{w, w_0} \max_{\lambda \geq 0} \mathcal{L}(w, w_0, \lambda )\]

Strong Duality

  • Recall that our problem is a quadratic program (QP)
  • Strong duality holds for QP, so the primal and dual optimal values coincide:

Primal

\[\min_{w,w_0}\max_{\lambda \geq 0} \mathcal{L}(w, w_0, \lambda)\]

Dual

\[\max_{\lambda \geq 0}\min_{w, w_0} \mathcal{L}(w, w_0, \lambda)\]

Toy Example

  • The SVM problem
\[\text{minimize}_{w, w_0} {1\over 2}\|w\|^2_2\] \[\text{subject to } y_j(w^\top x_j + w_0) \geq 1, j=1,...,N\]

is in the form of

$\text{minimize}_u \|u\|^2_2$, subject to $a^\top_j u \geq b_j$, $j=1,2,…,N$

  • Example:
\[\text{minimize}_{u_1, u_2} \; u^2_1 + u^2_2\] \[\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \geq \begin{bmatrix} 2 \\ 0 \\ 0\end{bmatrix}\]
  • Can we write down its dual problem?

  • The Lagrangian function is

\[\mathcal{L}(u, \lambda)=u^2_1+u^2_2 + \lambda_1(2-u_1-2u_2)-\lambda_2 u_1-\lambda_3 u_2\]
  • Minimize over $u$: setting $\nabla_u \mathcal{L}=0$ gives $u_1={\lambda_1+\lambda_2 \over 2}$ and $u_2={2\lambda_1+\lambda_3 \over 2}$.

  • Plugging into the Lagrangian function yields

\[\text{maximize}_\lambda \mathcal{L}(\lambda)=-{5\over4}\lambda^2_1-{1\over4}\lambda^2_2-{1\over4}\lambda^2_3-{1\over2}\lambda_1\lambda_2-\lambda_1\lambda_3+2\lambda_1\]
  • Primal is QP. Dual is also QP.

Have We Gained Anything?

  • Here is the dual problem:
\[\text{maximize}_\lambda \mathcal{L}(\lambda) = -{5\over4}\lambda_1^2-{1\over4}\lambda_2^2-{1\over4}\lambda_3^2-{1\over2}\lambda_1\lambda_2-\lambda_1\lambda_3+2\lambda_1\]

subject to $\lambda \geq 0$

  • The following terms are all non-positive! So, at the maximum, we must have $\lambda_2=\lambda_3=0$.
\[-{1\over4}\lambda_2^2-{1\over4}\lambda_3^2-{1\over2}\lambda_1\lambda_2-\lambda_1\lambda_3\]
  • This gives
\[\text{maximize}_{\lambda_1 \geq 0}-{5\over4}\lambda^2_1 + 2\lambda_1\]

which is maximized at $\lambda_1={4\over5}$

  • Plugging into the primal yields
\[u_1={\lambda_1+\lambda_2\over2}={2\over5}\] \[u_2={2\lambda_1+\lambda_3\over2}={4\over5}\]
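As a sanity check (my own, not in the notes), both the primal and the dual of this toy problem can be solved numerically to confirm $\lambda^{*}_1 = 4/5$ and $(u_1, u_2) = (2/5, 4/5)$; the sketch below assumes SciPy's generic SLSQP solver.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 0.0, 0.0])

# Primal: minimize u1^2 + u2^2 subject to A u >= b.
primal = minimize(lambda u: u @ u, x0=np.ones(2), method="SLSQP",
                  constraints={"type": "ineq", "fun": lambda u: A @ u - b})
print(primal.x)  # expected: [0.4, 0.8]

# Dual: maximize the concave quadratic in lambda over lambda >= 0
# (equivalently, minimize its negative).
def neg_dual(lam):
    l1, l2, l3 = lam
    return -(-1.25 * l1**2 - 0.25 * l2**2 - 0.25 * l3**2
             - 0.5 * l1 * l2 - l1 * l3 + 2 * l1)

dual = minimize(neg_dual, x0=np.ones(3), method="SLSQP",
                bounds=[(0, None)] * 3)
print(dual.x)    # expected: [0.8, 0.0, 0.0]
```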

Dual of SVM

  • We want to find the dual problem of
\[\text{minimize}_{w, w_0}{1\over2}\|w\|^2_2\] \[\text{subject to } y_j(w^\top x_j + w_0) \geq 1, j=1,...,N\]
  • We start with the Lagrangian function
\[\mathcal{L}(w, w_0, \lambda)={1\over2}\|w\|^2_2+\sum^N_{j=1}\lambda_j[1-y_j(w^\top x_j + w_0)]\]
  • Let us minimize over $(w, w_0)$ : Primal stationarity
\[\nabla_w \mathcal{L}(w, w_0, \lambda)=w-\sum^N_{j=1}\lambda_jy_jx_j=0\] \[\nabla_{w_0}\mathcal{L}(w, w_0, \lambda)=\sum^N_{j=1}\lambda_jy_j=0\]

Interpreting $\nabla_w \mathcal{L}(w,w_0,\lambda)=0$

  • The first result suggests that
\[w=\sum^N_{j=1}\lambda_jy_jx_j\]
  • This is where support vectors come from: each $\lambda_j$ is either $\lambda_j=0$ or $\lambda_j > 0$

  • The complementarity condition states that

\[\lambda_j^{*}[1-y_j(w^{*\top} x_j + w_0^{*})]=0\]
  • If $1-y_j(w^{*\top}x_j + w_0^{*}) < 0$, then $\lambda^{*}_j=0$

  • If $\lambda^{*}_j > 0$, then $1-y_j(w^{*\top}x_j + w_0^{*})=0$

  • So you can define the support vector set:

\[\nu=\{ j \mid \lambda^{*}_j > 0\}\]
  • So the optimal weight is
\[w^{*}=\sum_{j \in \nu} \lambda^{*}_j y_j x_j\]

Back to Duality

Substituting the stationarity condition $w=\sum^N_{j=1}\lambda_jy_jx_j$ into the Lagrangian function gives

\[\mathcal{L}(w^{*}, w_0, \lambda)={1\over2}\|w^{*}\|^2_2+\sum^N_{j=1}\lambda_j[1-y_j(w^{*\top}x_j+w_0)]\] \[={1\over2}\Big\|\sum^{N}_{j=1}\lambda_jy_jx_j\Big\|^2_2+\sum^N_{j=1}\lambda_j\Big[1-y_j\Big(\Big(\sum^N_{i=1}\lambda_iy_ix_i\Big)^\top x_j + w_0\Big)\Big]\]
  • Writing the first term as $A$ and the second as $B$: using $\nabla_{w_0}\mathcal{L}(w, w_0, \lambda)=\sum_{j=1}^N{\lambda_jy_j}=0$, we can show that
\[A={1\over2}\sum^{N}_{i=1}\sum^{N}_{j=1}\lambda_i\lambda_jy_iy_jx_i^\top x_j\] \[B=\sum^N_{j=1}\lambda_j-\sum^N_{i=1} \sum^N_{j=1} \lambda_i \lambda_j y_i y_j x^\top_i x_j - \Big(\sum^N_{j=1}\lambda_jy_j\Big)w_0=\sum^N_{j=1}\lambda_j-\sum^N_{i=1} \sum^N_{j=1} \lambda_i \lambda_j y_i y_j x^\top_i x_j\]
  • and we can show that
\[A+B=\sum^N_{j=1}\lambda_j-{1\over2}\sum^N_{i=1}\sum^N_{j=1}\lambda_i\lambda_j y_iy_jx^\top_ix_j\]
  • Therefore, the dual problem is
\[\text{maximize}_{\lambda \geq 0} -{1\over2}\sum^N_{i=1}\sum^N_{j=1}\lambda_i\lambda_j y_iy_jx^\top_ix_j + \sum^N_{j=1}\lambda_j\]

subject to

\[\sum^N_{j=1}\lambda_jy_j=0\]

If you prefer matrix-vector:

\[\text{maximize}_{\lambda \geq 0} -{1\over2}\lambda^\top Q \lambda + 1^\top \lambda\]

subject to $y^\top \lambda =0$

We can combine the constraints $\lambda \geq 0$ and $y^\top \lambda =0$ as $A\lambda \geq 0$:

  • $y^\top \lambda =0$ means
\[y^\top \lambda \geq 0 \text{ and } y^\top \lambda \leq 0\]
  • Thus, we can write $y^\top \lambda=0$ as
\[\begin{bmatrix}y^\top \\ -y^\top \end{bmatrix} \lambda \geq \begin{bmatrix}0 \\ 0 \end{bmatrix}\]
  • Therefore, the matrices $Q$ and $A$ are
\[Q=\begin{bmatrix}y_1y_1x^\top_1 x_1 & \cdots & y_1y_Nx^\top_1x_N \\ y_2y_1x^\top_2 x_1 & \cdots & y_2y_Nx^\top_2x_N \\ \vdots & \ddots & \vdots \\ y_Ny_1x^\top_N x_1 & \cdots & y_Ny_Nx^\top_Nx_N \end{bmatrix}\]

and

\[A=\begin{bmatrix}y^\top \\ -y^\top \\ I\end{bmatrix}\]
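In code, $Q$ and $A$ are straightforward to assemble; a minimal NumPy sketch (my own, assuming the rows of `X` are the $x_j$ and `y` holds the $\pm 1$ labels):

```python
import numpy as np

def build_Q_A(X, y):
    """Q[i, j] = y_i y_j x_i^T x_j ;  A stacks y^T, -y^T, and the identity."""
    Q = np.outer(y, y) * (X @ X.T)           # label outer product times Gram matrix
    A = np.vstack([y, -y, np.eye(len(y))])   # A @ lam >= 0 encodes y^T lam = 0 and lam >= 0
    return Q, A
```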

So How to Solve the SVM Problem?

  • You look at the dual problem
\[\text{maximize}_\lambda {-{1\over2}\lambda^\top Q \lambda + 1^\top \lambda}\]

subject to

\[A\lambda \geq 0\]
  • You get the solution $\lambda^{*}$

  • Then compute $w^{*}$

\[w^{*}=\sum_{j\in \nu}\lambda^{*}_jy_jx_j\]
  • $\nu$ is the set of support vectors, i.e., the indices $j$ with $\lambda^{*}_j > 0$

Are We Done Yet?

  • Not quite

  • We still need to find out $w_0^{*}$

  • Pick any support vector $x^{+} \in \mathcal{C}_{+}$ and $x^{-} \in \mathcal{C}_{-}$.

  • Then we have

\[w^\top x^+ + w_0 = +1\] \[w^\top x^{-} + w_0 = -1\]
  • Summing them, we have $w^\top(x^{+} + x^{-}) + 2w_0=0$, which means
\[w^{*}_0=-{(x^{+}+x^{-})^\top w^{*} \over2}\]
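Putting the pieces together, here is an end-to-end sketch of the whole recipe (my own; it reuses SciPy's generic SLSQP solver rather than a dedicated QP solver, and the data at the bottom is hypothetical).

```python
import numpy as np
from scipy.optimize import minimize

def hard_margin_svm(X, y, tol=1e-6):
    """Solve the dual SVM, then recover w* and w0* from the support vectors."""
    N = len(y)
    Q = np.outer(y, y) * (X @ X.T)

    # Dual: maximize -1/2 lam^T Q lam + 1^T lam, i.e. minimize the negative,
    # subject to lam >= 0 and y^T lam = 0.
    res = minimize(lambda lam: 0.5 * lam @ Q @ lam - lam.sum(),
                   x0=np.zeros(N), method="SLSQP",
                   bounds=[(0, None)] * N,
                   constraints={"type": "eq", "fun": lambda lam: lam @ y})
    lam = res.x

    sv = lam > tol                   # support vectors: lambda_j > 0
    w = (lam[sv] * y[sv]) @ X[sv]    # w* = sum_j lambda_j y_j x_j

    # w0* from one support vector per class (assumes each class has one).
    x_pos = X[sv & (y > 0)][0]
    x_neg = X[sv & (y < 0)][0]
    w0 = -0.5 * (x_pos + x_neg) @ w
    return w, w0, lam

# Hypothetical linearly separable data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
w, w0, lam = hard_margin_svm(X, y)
print(w, w0, lam)
```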

Summary of Dual SVM

  • Primal
\[\text{minimize}_{w,w_0}{1\over2}\|w\|^2_2\]

subject to $y_j(w^\top x_j + w_0) \geq 1$

  • Strong Duality
\[\min_{w, w_0}\max_{\lambda \geq 0} \mathcal{L}(w, w_0, \lambda)=\max_{\lambda \geq 0}\min_{w, w_0} \mathcal{L}(w, w_0, \lambda)\]
  • Dual
\[\text{maximize}_{\lambda \geq 0} -{1\over2}\sum^N_{i=1}\sum^N_{j=1}\lambda_i\lambda_j y_i y_j x^\top_i x_j + \sum^N_{j=1} \lambda_j\]

subject to $\sum^N_{j=1} \lambda_jy_j=0$

  • The weights are computed as
\[w^{*}=\sum^N_{j=1}\lambda^{*}_jy_jx_j\]
  • This is where support vectors come from: each $\lambda^{*}_j$ is either $\lambda^{*}_j=0$ or $\lambda^{*}_j > 0$

  • Pick any support vector $x^{+} \in \mathcal{C}_{+}$ and $x^{-} \in \mathcal{C}_{-}$

  • Then we have

\[w^\top x^{+} + w_0 = +1\] \[w^\top x^{-} + w_0 = -1\]
  • Summing them, we have $w^\top(x^{+}+x^{-}) + 2w_0=0$, which means
\[w^{*}_0=-{(x^{+}+x^{-})^\top w^{*} \over2}\]
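As a final cross-check (not part of the lecture), the dual recipe above can be compared against an off-the-shelf solver; the sketch below assumes scikit-learn is available and approximates the hard margin with a large $C$.

```python
import numpy as np
from sklearn.svm import SVC

# Same hypothetical separable data as above: rows are x_j, labels +/-1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([+1, +1, -1, -1])

# A very large C approximates the hard-margin SVM of these notes.
clf = SVC(kernel="linear", C=1e10).fit(X, y)
print(clf.coef_, clf.intercept_)  # compare with w* and w0* from the dual
print(clf.support_)               # indices of the support vectors (lambda_j > 0)
```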