Machine Learning Course: Expected Problems and Solutions
Mid-Term
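- Find the dimensions of the box with the largest volume if the total surface area is $64\text{cm}^2$.
Maximize $f(x,y,z)=xyz$ subject to $g(x,y,z)=2(xy+yz+zx)=64$ with $x>0, y>0, z>0$. From $\nabla f=\lambda\nabla g$: $yz=2\lambda(y+z)$, $xz=2\lambda(x+z)$, $xy=2\lambda(x+y)$. Eliminating $\lambda$ from the first two equations and dividing by $z>0$:
\[{xz\over x+z}={yz\over y+z}\]
\[{x\over x+z}={y\over y+z}\]
\[xy+xz=xy+yz \Rightarrow xz=yz \Rightarrow x=y\]
By the same argument $y=z$, so
\[x=y=z,\quad 6x^2=64,\quad x^2={32\over3}\]
\[f(x,y,z)=x^3={128\sqrt{2}\over3\sqrt{3}}\approx 34.84\text{cm}^3\]
The box is a cube with side $x=\sqrt{32/3}={4\sqrt6\over3}\text{cm}\approx3.27\text{cm}$.
- Find the maximum and minimum of $f(x,y)=5x-3y$ subject to the constraint $x^2+y^2=136$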
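The $5x-3y$ question has no worked solution in these notes. A minimal sketch using Lagrange multipliers, as in the box problem: $\nabla f=(5,-3)$ must be parallel to $(x,y)$, which yields the maximum $68$ at $(10,-6)$ and the minimum $-68$ at $(-10,6)$. The helper name `linear_extrema_on_circle` is our own, and the dense sampling at the end is only a numerical sanity check.

```python
import math

def linear_extrema_on_circle(a, b, r2):
    """Max and min of f(x, y) = a*x + b*y subject to x^2 + y^2 = r2.

    Lagrange: (a, b) = lam * (2x, 2y), so (x, y) is parallel to (a, b).
    Substituting into the constraint gives (x, y) = +/- sqrt(r2) * (a, b) / ||(a, b)||.
    """
    scale = math.sqrt(r2) / math.hypot(a, b)
    x, y = a * scale, b * scale
    f_max = a * x + b * y                     # equals sqrt(r2) * ||(a, b)||
    return (f_max, (x, y)), (-f_max, (-x, -y))

(f_max, p_max), (f_min, p_min) = linear_extrema_on_circle(5, -3, 136)
print(f_max, p_max)   # about 68 at about (10, -6)
print(f_min, p_min)   # about -68 at about (-10, 6)

# Numerical sanity check: sample the circle densely.
r = math.sqrt(136)
n = 100_000
samples = [5 * r * math.cos(2 * math.pi * k / n) - 3 * r * math.sin(2 * math.pi * k / n)
           for k in range(n)]
print(max(samples), min(samples))   # both very close to +68 and -68
```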
- Find the maximum and minimum values of $f(x,y,z)=xyz$ subject to the constraint $x+y+z=1$. Assume that $x,y,z\geq0$.
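If $x>0, y>0, z>0$ (interior), $\nabla f=\lambda\nabla g$ gives $yz=xz=xy=\lambda$, so $x=y=z$:
\[x=y=z={1\over3},\quad f(x,y,z)={1\over27}\ \text{(maximum)}\]
Since $f\geq0$ on the domain, the minimum is attained on the boundary, where at least one variable is $0$, e.g.
\[x=1,y=0,z=0,\quad f(x,y,z)=0\ \text{(minimum)}\]
- Find the maximum and minimum values of $f(x,y)=4x^2+10y^2$ on the disk $x^2+y^2 \leq 4$.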
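Interior: $\nabla f=(8x,20y)=0$ only at $x=0, y=0$, and $f(0,0)=0$.
Boundary $x^2+y^2=4$: $\nabla f=\lambda\nabla g$ gives $8x=2\lambda x$ and $20y=2\lambda y$, so ($x=0$ or $\lambda=4$) and ($y=0$ or $\lambda=10$); $x=y=0$ is not on the circle.
if $x=0$ and $\lambda=10$: $y=\pm2$, $f(0,\pm2)=40$
else if $y=0$ and $\lambda=4$: $x=\pm2$, $f(\pm2,0)=16$
Maximum: $40$ at $(0,\pm2)$. Minimum: $0$ at $(0,0)$.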
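A brute-force grid search over the disk is a quick way to sanity-check the case analysis for $4x^2+10y^2$. This is a sketch only; the helper name and the 0.01 grid step are arbitrary choices, not part of the original solution.

```python
def f(x, y):
    return 4 * x ** 2 + 10 * y ** 2

def grid_extrema_on_disk(radius=2.0, step=0.01):
    """Brute-force max/min of f over grid points inside x^2 + y^2 <= radius^2."""
    n = round(radius / step)
    pts = [(i * step, j * step)
           for i in range(-n, n + 1) for j in range(-n, n + 1)
           if (i * step) ** 2 + (j * step) ** 2 <= radius ** 2 + 1e-9]
    best = max(pts, key=lambda p: f(*p))
    worst = min(pts, key=lambda p: f(*p))
    return f(*best), best, f(*worst), worst

f_max, p_max, f_min, p_min = grid_extrema_on_disk()
print(f_max, p_max)   # 40.0 at (0.0, -2.0); (0, 2) ties
print(f_min, p_min)   # 0.0 at (0.0, 0.0)
```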
- Find the maximum and minimum of $f(x,y,z)=4y-2z$ subject to the constraints $2x-y-z=2$ and $x^2+y^2=1$
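This problem has no worked solution in these notes. A minimal sketch, assuming we eliminate $z$ with the linear constraint ($z=2x-y-2$), which reduces the objective to $6y-4x+4$ on the unit circle; its extrema are $4\pm\sqrt{6^2+4^2}=4\pm2\sqrt{13}$, attained at $(x,y)=\pm(-2,3)/\sqrt{13}$. The code parametrizes the circle by an angle and compares dense samples against that closed form; `on_constraints` is a hypothetical helper name.

```python
import math

def on_constraints(t):
    """Points satisfying both constraints: x^2 + y^2 = 1 (angle t) and z = 2x - y - 2."""
    x, y = math.cos(t), math.sin(t)
    return x, y, 2 * x - y - 2

def f(x, y, z):
    return 4 * y - 2 * z

# Substituting z gives f = 6y - 4x + 4, so on the unit circle the extrema are 4 +/- sqrt(52).
closed_max = 4 + 2 * math.sqrt(13)
closed_min = 4 - 2 * math.sqrt(13)

n = 200_000
values = [f(*on_constraints(2 * math.pi * k / n)) for k in range(n)]
print(max(values), closed_max)   # both about 11.211
print(min(values), closed_min)   # both about -3.211
```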
- Write pros and cons of Ridge and LASSO Regression.
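Ridge - Pros: analytic (closed-form) solution, theoretical guarantees, simple / Cons: limited interpretability (coefficients are shrunk but never set exactly to zero), does not reflect the (sparse) nature of certain problems.
LASSO - Pros: proven in practice; its sparse solutions echo the nature of certain problems particularly well (built-in feature selection) / Cons: no closed-form solution (requires an iterative solver).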
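To make the ridge/LASSO contrast concrete, here is a sketch that assumes an orthonormal design ($X^\top X=I$), the one case where both estimators reduce to simple functions of the ordinary least-squares coefficients: ridge (objective ${1\over2}\|y-Xw\|^2+{\lambda\over2}\|w\|^2$) shrinks every coefficient by $1/(1+\lambda)$, while LASSO (objective ${1\over2}\|y-Xw\|^2+\lambda\|w\|_1$) soft-thresholds and sets small coefficients exactly to zero. For a general $X$, LASSO has no closed form. The coefficient values are made up for illustration.

```python
def ridge_orthonormal(w_ols, lam):
    """Ridge solution when X^T X = I: every OLS coefficient is scaled by 1 / (1 + lam)."""
    return [w / (1.0 + lam) for w in w_ols]

def soft_threshold(w, lam):
    """Shrink w toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def lasso_orthonormal(w_ols, lam):
    """LASSO solution when X^T X = I: coordinate-wise soft-thresholding of the OLS coefficients."""
    return [soft_threshold(w, lam) for w in w_ols]

w_ols = [3.0, -0.5, 0.2, -2.0]          # hypothetical OLS coefficients
print(ridge_orthonormal(w_ols, 1.0))    # [1.5, -0.25, 0.1, -1.0]: all shrunk, none zero
print(lasso_orthonormal(w_ols, 1.0))    # [2.0, 0.0, 0.0, -1.0]: small ones zeroed out
```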
- Write three regularization techniques you know and explain them. (ex. Weight Decay)
Weight Decay: Penalizes large weights, improves generalization.
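Early Stopping: stop training once the validation error stops improving, before the model starts to overfit.
Bagging: trains $k$ models on $k$ different datasets (resampled with replacement from the training set) and averages their predictions.
Dropout: randomly disables a subset of neurons at each training step.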
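Early stopping is the easiest of these to show in a few lines. A minimal sketch with a patience counter; the function name, the patience value, and the validation-loss curve are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose weights early stopping would keep.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Hypothetical validation curve: improves, then rises as the model starts to overfit.
val = [1.00, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.40]
print(early_stopping_epoch(val, patience=3))  # 3
print(early_stopping_epoch(val, patience=5))  # 7
```

With `patience=3` the run stops after epoch 6 and keeps epoch 3, never seeing the later dip to 0.40; a larger patience reaches it. Choosing patience trades training time against the risk of stopping too early.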
- What is Dropout?
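During training, each neuron is dropped with probability $p$ (with $p=0.5$, roughly half the network is used at each step). This trains an ensemble of many thinned sub-networks that share weights, which reduces co-adaptation between neurons; at test time the full network is used with activations scaled accordingly.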
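A minimal sketch of dropout on a list of activations. It assumes the common "inverted" variant, which rescales the surviving units by $1/(1-p)$ during training so nothing changes at test time (the original formulation instead scales the weights by $1-p$ at test time). Function and parameter names are our own.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p during training and
    scale survivors by 1 / (1 - p) so the expected activation is unchanged.
    At test time the full network is used as-is."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
h = [1.0] * 10
print(dropout(h, p=0.5, rng=rng))          # each entry is 0.0 (dropped) or 2.0 (kept, rescaled)
print(dropout(h, p=0.5, training=False))   # unchanged at test time
```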
- What is Batch Normalization?
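BN normalizes each input to the activation functions over the mini-batch to zero mean and unit variance, then applies a learnable scale $\gamma$ and shift $\beta$:
\[\hat{x}^{(k)}={x^{(k)}-\mathrm{E}[x^{(k)}] \over \sqrt{\mathrm{Var}[x^{(k)}]+\epsilon}}\]
\[y^{(k)}=\gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)}\]
- Differences between linear regression and the kernel method.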
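Linear regression - Picks a single global model: one set of parameters fitted to the whole input space.
Kernel method - Picks a local model: the prediction at a point is driven mostly by nearby training samples, weighted by the kernel.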
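A small illustration of global vs. local, under our own choice of models: an ordinary least-squares line stands in for linear regression, and Nadaraya-Watson kernel regression (a Gaussian-weighted average of nearby targets) stands in for a kernel method. On symmetric data from $y=x^2$ the single global line is flat at the mean of the targets, while the local model tracks the curve. The data and the bandwidth `sigma=0.2` are arbitrary.

```python
import math

xs = [i / 10 for i in range(-20, 21)]     # training inputs on [-2, 2]
ys = [x * x for x in xs]                   # nonlinear target y = x^2

def fit_global_line(xs, ys):
    """Global model: one least-squares line fitted to all the data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda q: my + slope * (q - mx)

def fit_local_kernel(xs, ys, sigma=0.2):
    """Local model (Nadaraya-Watson): Gaussian-weighted average of the targets near q."""
    def predict(q):
        w = [math.exp(-(q - x) ** 2 / (2 * sigma ** 2)) for x in xs]
        return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return predict

line, kernel = fit_global_line(xs, ys), fit_local_kernel(xs, ys)
for q in (0.0, 1.0, 1.5):
    print(q, round(line(q), 3), round(kernel(q), 3), q * q)   # query, global, local, truth
```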
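- For the RBF kernel, what happens when $\sigma$ is large or small?
Large $\sigma \rightarrow$ the kernel is wide and averages over many points: over-smoothing (underfitting).
Small $\sigma \rightarrow$ the kernel is narrow and each prediction depends only on the closest points: the model follows the noise (overfitting).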
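The effect of $\sigma$ shows up directly in the normalized kernel weights $\exp(-d^2/2\sigma^2)$ assigned to neighbours at distance $d$. A sketch with made-up distances; note that for the parameterization $K(u,v)=\exp(-\gamma\|u-v\|_2^2)$ used later in these notes, $\gamma=1/(2\sigma^2)$, so a large $\gamma$ behaves like a small $\sigma$.

```python
import math

def rbf_weights(distances, sigma):
    """Normalized Gaussian (RBF) weights exp(-d^2 / (2 sigma^2)) for neighbours at given distances."""
    raw = [math.exp(-d * d / (2 * sigma * sigma)) for d in distances]
    total = sum(raw)
    return [r / total for r in raw]

d = [0.0, 1.0, 2.0, 3.0, 4.0]
print([round(w, 3) for w in rbf_weights(d, sigma=0.3)])   # almost all weight on the closest point
print([round(w, 3) for w in rbf_weights(d, sigma=10.0)])  # nearly uniform: over-smoothing
```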
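- Why does the SVM use signed distance instead of unsigned distance?
To detect misclassification: the sign tells which side of the hyperplane a point lies on, so $y_j(w^\top x_j + w_0) > 0$ exactly when $x_j$ is classified correctly. An unsigned distance cannot distinguish the two sides.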
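- Explain the two tricks used in the SVM derivation.
Scaling: $(w, w_0)$ and $(cw, cw_0)$ define the same hyperplane, so we can rescale them so that $\min_j y_j(w^\top x_j + w_0) = 1$.
Max-to-min optimization: maximizing the margin $1/\|w\|_2$ is equivalent to minimizing ${1\over2}\|w\|_2^2$, which turns the max-min problem into a constrained quadratic minimization.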
- Find the hyperplane function $h(x)$ and the margin for the following four points and labels.
$x_1=\begin{bmatrix}0 \\ 0\end{bmatrix}, x_2=\begin{bmatrix}2 \\ 2\end{bmatrix}, x_3=\begin{bmatrix}2 \\ 0\end{bmatrix}, x_4=\begin{bmatrix}3 \\ 0\end{bmatrix}$
$y_1=-1,y_2=-1,y_3=+1, y_4=+1$
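Solve:
\[y_j(w^\top x_j + w_0) \geq 1\]
(1) $-w_0 \geq 1$
(2) $-(2w_1 + 2w_2 + w_0) \geq 1$
(3) $2w_1+w_0 \geq 1$
(4) $3w_1 + w_0 \geq 1$
From (1), $w_0 \leq -1$. Adding (1) and (3) gives $2w_1 \geq 2$; adding (2) and (3) gives $-2w_2 \geq 2$:
\[w_0 \leq -1,\quad w_1 \geq 1,\quad w_2 \leq -1\]
\[{1\over2}\|w\|^2={1\over2}(w^2_1+w^2_2) \geq 1\]
Equality holds at $w_1=1$, $w_2=-1$; then (1) and (3) force $w_0=-1$, and (2), (4) are satisfied:
\[w_1=1,\quad w_2=-1,\quad w_0=-1\]
\[h(x)=\text{sign}(x_1-x_2-1)\]
(here $x_1, x_2$ denote the two coordinates of $x$).
Margin:
\[{1 \over \|w\|_2} = {1 \over \sqrt2}\]
- Explain the Karush-Kuhn-Tucker (KKT) conditions.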
Stationarity, Primal Feasibility, Dual Feasibility, Complementary Slackness
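The four KKT conditions can be checked on the worked SVM example, using its primal solution $w=(1,-1)$, $w_0=-1$. The dual variables $\alpha=(1/2,1/2,1,0)$ are not in the original notes; we derived them by hand from stationarity ($w=\sum_j\alpha_j y_j x_j$, $\sum_j\alpha_j y_j=0$) with $x_1,x_2,x_3$ as support vectors, so treat them as an assumption that the code verifies. Exact `Fraction` arithmetic avoids floating-point tolerances.

```python
from fractions import Fraction as F

X = [(0, 0), (2, 2), (2, 0), (3, 0)]
y = [-1, -1, +1, +1]
w, w0 = (1, -1), -1                          # primal solution of the worked example
alpha = [F(1, 2), F(1, 2), F(1), F(0)]      # dual variables (our hand derivation, an assumption)

# Stationarity: w = sum_j alpha_j y_j x_j and sum_j alpha_j y_j = 0
w_from_alpha = tuple(sum(a * yj * xj[k] for a, yj, xj in zip(alpha, y, X)) for k in range(2))
margins = [yj * (w[0] * xj[0] + w[1] * xj[1] + w0) for yj, xj in zip(y, X)]

print("stationarity:", w_from_alpha == w, sum(a * yj for a, yj in zip(alpha, y)) == 0)
print("primal feasibility:", all(m >= 1 for m in margins))           # y_j (w^T x_j + w0) >= 1
print("dual feasibility:", all(a >= 0 for a in alpha))               # alpha_j >= 0
print("complementary slackness:", all(a * (m - 1) == 0 for a, m in zip(alpha, margins)))
```

Only $x_4$ has a margin above 1, so it is the only point with $\alpha_j=0$, i.e. the only non-support vector.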
- SVM problem: $\text{minimize}_{u_1, u_2}\ u^2_1+u^2_2$, subject to $\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \geq \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$
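This problem is unsolved in these notes. A sketch under one assumption that the code then checks: only the first constraint $u_1+2u_2\geq2$ is active, so the optimum is the projection of the origin onto that line, $u=(0.4,0.8)$ with objective $4/5$ and multiplier $\mu_1=0.8\geq0$ (the other two multipliers are zero). A coarse grid search confirms the result.

```python
A = [(1, 2), (1, 0), (0, 1)]     # rows of the constraint matrix in A u >= b
b = [2, 0, 0]

def feasible(u, tol=1e-9):
    return all(r[0] * u[0] + r[1] * u[1] >= bi - tol for r, bi in zip(A, b))

# Assume only the first constraint is active: u is then the projection of the
# origin onto the line u1 + 2 u2 = 2, i.e. u = 2 * (1, 2) / 5.
a1 = A[0]
s = b[0] / (a1[0] ** 2 + a1[1] ** 2)
u = (s * a1[0], s * a1[1])                      # (0.4, 0.8)
mu = [2 * u[0] / a1[0], 0.0, 0.0]               # stationarity 2u = mu_1 * a1 gives mu_1 = 0.8 >= 0
print(u, u[0] ** 2 + u[1] ** 2, mu, feasible(u))  # objective about 0.8 = 4/5

# Brute-force check on a 0.01 grid over [0, 2] x [0, 2].
grid = [(i / 100, j / 100) for i in range(201) for j in range(201)]
best = min((p for p in grid if feasible(p)), key=lambda p: p[0] ** 2 + p[1] ** 2)
print(best)   # (0.4, 0.8)
```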
- If $C$ is big, is $\xi$ enforced to be small or big?
$\text{minimize}_{w, w_0, \xi}\ {1\over2}\|w\|^2_2+C\|\xi\|^2$ subject to $y_j(w^\top x_j + w_0) \geq 1-\xi_j$, $\xi_j \geq 0$
- The Radial Basis Function (RBF) kernel takes the form
$K(u,v)=\exp(-\gamma \|u-v\|^2_2)$.
What happens if $\gamma$ is too big?
- Write down the cross-entropy loss and explain why the L2 loss is not used.
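For a predicted probability $p$ and label $y\in\{0,1\}$, binary cross-entropy is $L=-[y\log p+(1-y)\log(1-p)]$ (for $K$ classes, $L=-\sum_k y_k\log p_k$). One common argument against the L2 loss with a sigmoid or softmax output is gradient saturation: under L2 the gradient with respect to the logit keeps a factor $p(1-p)$, which is nearly zero when the network is confidently wrong, whereas cross-entropy cancels it. A minimal sketch of that argument; the logit value is made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(p, y):
    """L = -[y log p + (1 - y) log(1 - p)] for a predicted probability p and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grads_wrt_logit(z, y):
    """dL/dz for p = sigmoid(z) under cross-entropy and under the L2 loss 1/2 (p - y)^2."""
    p = sigmoid(z)
    ce_grad = p - y                      # sigmoid'(z) = p(1 - p) cancels against dCE/dp
    l2_grad = (p - y) * p * (1 - p)      # the factor p(1 - p) remains and vanishes when saturated
    return ce_grad, l2_grad

# Confidently wrong prediction: label 1, logit -6 (p is about 0.0025).
z, y = -6.0, 1
print(binary_cross_entropy(sigmoid(z), y))  # about 6.0: a large loss
print(grads_wrt_logit(z, y))                # CE gradient about -1.0, L2 gradient about -0.0025
```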
- Average Pooling vs Max Pooling
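A minimal sketch of non-overlapping $2\times2$ pooling (stride 2) on a hand-made $4\times4$ feature map: max pooling keeps the strongest activation in each window, while average pooling blends all four values. The function name and the input are our own.

```python
def pool2d(x, size=2, mode="max"):
    """Non-overlapping size x size pooling (stride = size) over a 2-D list."""
    out = []
    for i in range(0, len(x) - size + 1, size):
        row = []
        for j in range(0, len(x[0]) - size + 1, size):
            window = [x[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [5, 6, 1, 2],
        [0, 2, 8, 4],
        [1, 1, 3, 7]]
print(pool2d(fmap, mode="max"))  # [[6, 2], [2, 8]]
print(pool2d(fmap, mode="avg"))  # [[3.75, 1.25], [1.0, 5.5]]
```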
- Explain SGD.
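SGD updates the parameters using the gradient of the loss on a single randomly chosen sample (or a small mini-batch) instead of the full dataset. A minimal sketch for 1-D linear regression with squared error on synthetic, noise-free data; the learning rate, epoch count, and seed are arbitrary choices.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b with plain SGD: one randomly ordered sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                        # new random order each epoch
        for i in idx:
            err = (w * xs[i] + b) - ys[i]       # the gradient of 1/2 * err^2 ...
            w -= lr * err * xs[i]               # ... w.r.t. w is err * x
            b -= lr * err                       # ... w.r.t. b is err
    return w, b

xs = [i / 10 for i in range(-10, 11)]
ys = [3 * x + 1 for x in xs]                    # noise-free synthetic data: w = 3, b = 1
print(sgd_linear_regression(xs, ys))            # close to (3.0, 1.0)
```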
- Write down the delta rule term.
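Assuming the question refers to the Widrow-Hoff delta rule for a single linear unit, the update is $\Delta w_i=\eta\,(t-y)\,x_i$, where $\delta=t-y$ is the delta term. For a unit with activation $f$, the generalized rule multiplies in $f'(\text{net})$, and in backpropagation the hidden-layer delta is $\delta_j=f'(\text{net}_j)\sum_k w_{jk}\delta_k$. A minimal sketch of the linear case; names and numbers are our own.

```python
def delta_rule_step(w, x, target, lr=0.1):
    """One delta-rule (Widrow-Hoff / LMS) update for a linear unit y = w . x.

    delta = target - y, and each weight moves by lr * delta * x_i.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))
    delta = target - y
    return [wi + lr * delta * xi for wi, xi in zip(w, x)]

w = delta_rule_step([0.0, 0.0], x=[1.0, 2.0], target=1.0)
print(w)   # [0.1, 0.2]

# Repeating the update on the same sample drives the output toward the target
# (the error shrinks by a factor 1 - lr * ||x||^2 = 0.5 per step here).
for _ in range(40):
    w = delta_rule_step(w, x=[1.0, 2.0], target=1.0)
print(w[0] * 1.0 + w[1] * 2.0)   # very close to 1.0
```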
- Explain RNN, LSTM, ConvLSTM, and GRU.
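A plain RNN updates $h_t=\tanh(W_x x_t+W_h h_{t-1}+b)$; an LSTM adds a cell state $c_t$ controlled by forget, input, and output gates, and its additive cell update helps gradients survive long sequences. A GRU merges the forget and input gates into a single update gate (plus a reset gate) and drops the separate cell state, and ConvLSTM replaces the matrix products inside the LSTM with convolutions so the hidden state keeps a spatial layout. The sketch below is a scalar (hidden size 1) LSTM step; the weights in `params` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (hidden size 1).

    p maps gate name -> (w_x, w_h, b); the values used below are hypothetical.
    """
    def pre(gate):
        wx, wh, b = p[gate]
        return wx * x + wh * h_prev + b
    f = sigmoid(pre("f"))          # forget gate: how much of c_prev to keep
    i = sigmoid(pre("i"))          # input gate: how much new content to write
    g = math.tanh(pre("g"))        # candidate cell content
    o = sigmoid(pre("o"))          # output gate
    c = f * c_prev + i * g         # additive cell update
    h = o * math.tanh(c)           # hidden state exposed to the next layer / step
    return h, c

params = {"f": (0.5, 0.1, 1.0), "i": (0.8, 0.2, 0.0),
          "g": (1.0, 0.5, 0.0), "o": (0.6, 0.3, 0.0)}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```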
- AlexNet
- ZFNet
- VGGNet
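VGGNet stacks $3\times3$ convolutions: two stride-1 $3\times3$ layers see a $5\times5$ region and three see $7\times7$, with fewer parameters than one large kernel (and an extra nonlinearity per layer). The arithmetic below assumes $C$ input and output channels and ignores biases; $C=256$ is an arbitrary example.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights in one k x k convolution layer (bias optional)."""
    return k * k * c_in * c_out + (c_out if bias else 0)

C = 256  # hypothetical channel count, kept equal in and out
two_3x3 = 2 * conv_params(3, C, C)    # receptive field 5x5
one_5x5 = conv_params(5, C, C)
three_3x3 = 3 * conv_params(3, C, C)  # receptive field 7x7
one_7x7 = conv_params(7, C, C)
print(two_3x3, one_5x5)      # 1179648 1638400  (18 C^2 vs 25 C^2)
print(three_3x3, one_7x7)    # 1769472 3211264  (27 C^2 vs 49 C^2)
```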
- GoogLeNet
- ResNet
- DenseNet
- Depth-wise Separable Convolution
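A depthwise separable convolution splits a $k\times k$ convolution into a depthwise step (one $k\times k$ filter per input channel) and a pointwise $1\times1$ convolution that mixes channels, cutting weights from $k^2C_{in}C_{out}$ to $k^2C_{in}+C_{in}C_{out}$, a ratio of $1/C_{out}+1/k^2$. The channel counts below are arbitrary, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in        # one k x k filter per input channel
    pointwise = c_in * c_out        # 1 x 1 convolution to mix channels
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, sep / std)   # 18432 2336 ~0.127 (= 1/c_out + 1/k^2)
```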
- EfficientNet-v1
- Compound Scaling
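Compound scaling grows depth, width, and input resolution together with one coefficient $\phi$: $d=\alpha^\phi$, $w=\beta^\phi$, $r=\gamma^\phi$, under the constraint $\alpha\beta^2\gamma^2\approx2$ so FLOPs grow by roughly $2^\phi$. The sketch uses $\alpha=1.2$, $\beta=1.1$, $\gamma=1.15$, the values reported for EfficientNet-B0 in the EfficientNet paper; the FLOPs factor $d\,w^2r^2$ is the usual approximation.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution bases (EfficientNet paper)

def compound_scale(phi):
    """Multipliers for depth, width, resolution, and the approximate FLOPs factor."""
    depth, width, res = ALPHA ** phi, BETA ** phi, GAMMA ** phi
    flops_factor = depth * width ** 2 * res ** 2   # FLOPs ~ d * w^2 * r^2
    return depth, width, res, flops_factor

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(phi, round(d, 3), round(w, 3), round(r, 3), round(f, 2))   # f is about 1.92^phi
```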
- EfficientNet-v2
- Overfitting and Underfitting
- Random Initialization
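Weights are initialized randomly to break symmetry: if every weight in a layer started equal, all its neurons would compute the same output and receive the same gradient. The scale matters too. A minimal sketch of two common schemes, He (normal, for ReLU) and Xavier/Glorot (uniform, for tanh/sigmoid), checking the empirical standard deviation against the target; layer sizes and the seed are arbitrary.

```python
import math
import random

def he_normal(fan_in, fan_out, rng):
    """He/Kaiming normal init (suited to ReLU): W_ij ~ N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform init: W_ij ~ U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]

def empirical_std(matrix):
    vals = [v for row in matrix for v in row]
    mean = sum(vals) / len(vals)
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))

rng = random.Random(0)
W_he = he_normal(256, 128, rng)
W_xa = xavier_uniform(256, 128, rng)
print(empirical_std(W_he), math.sqrt(2 / 256))          # close to each other
print(empirical_std(W_xa), math.sqrt(2 / (256 + 128)))  # Var of U(-a, a) is a^2 / 3
```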