Mathematics for Machine Learning
The enthusiastic practitioner who is interested in learning more about the magic behind successful machine learning algorithms currently faces a daunting set of prerequisite knowledge:
- Programming languages and data analysis tools.
- Large-scale computation and the associated frameworks.
- Mathematics and statistics, and how machine learning builds on them.
Common Symbols
Symbol | Name | Explanation / Examples |
---|---|---|
$\mathbb{R} \ \mathbf{R}$ | real numbers | $ \pi \in \mathbb{R} $ |
$\mathbb{C} \ \mathbf{C}$ | complex numbers | $ i \in \mathbb{C} $ |
$\sum$ | summation | $ \sum_{k=1}^n a_{k} = a_1 + a_2 + \dots + a_n $ |
$\prod$ | product | $ \prod_{k=1}^n a_{k} = a_1 \cdot a_2 \cdot \dots \cdot a_n $ |
$\propto$ | proportionality | $y \propto x$ means that $y = kx$ for some constant $k$ |
$\forall$ | universal quantification | $ \forall n \in \mathbb{N},\ n^2 \ge n $ |
$\int$ | integral | $ \int_a^b x^2 \, dx = \frac{b^3 - a^3}{3} $ |
$'$ | derivative | If $f(x) := x^2$, then $f'(x) = 2x$. |
$\partial$ | partial derivative | If $f(x, y) := x^2 y$, then $ \frac{\partial f}{\partial x} = 2xy $. |
$\nabla$ | gradient | $ \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right) $ |
$\Delta$ | delta | $ \frac{\Delta y}{\Delta x} $ is the slope of a straight line. |
$P(A|B)$ | conditional probability | Probability of A given B |
$\mathrm{E}$ | expected value | $\mathrm{E}[X] = \sum_{i=1}^\infty x_i p_i$ |
$| \ldots |$ | determinant | $ \det(u, v) = \begin{vmatrix} 1 & 2 \\ 2 & 9 \end{vmatrix} = 1 \times 9 - 2 \times 2 = 5 $ |
$\odot$ | Hadamard product | $ \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \odot \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 \cdot 1 & 2 \cdot 2 \\ 2 \cdot 0 & 4 \cdot 1 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 0 & 4 \end{bmatrix} $ |
$\hat a$ | estimator | $\hat \theta$ is the estimator or the estimate for the parameter $\theta$ |
$\sigma$ | selection (relational algebra) | $ \sigma_{a \theta b} (R) = \{t : t \in R,\ t(a) \ \theta \ t(b)\} $ |
$\arg\max$ | arguments of the maxima | the input(s) at which the function's output is as large as possible |
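Most of these symbols have direct numeric counterparts. A minimal NumPy sketch (the array values are arbitrary illustration data):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
print(a.sum())     # summation: 1 + 2 + 3 + 4 = 10
print(a.prod())    # product: 1 * 2 * 3 * 4 = 24
print(a.argmax())  # argmax over positions: index 3 holds the largest entry

U = np.array([[1.0, 2.0], [2.0, 4.0]])
V = np.array([[1.0, 2.0], [0.0, 1.0]])
print(U * V)  # Hadamard (element-wise) product, the table's example above
print(np.linalg.det(np.array([[1.0, 2.0], [2.0, 9.0]])))  # determinant = 5
```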
Common Terms
Name | Explanation / Examples |
---|---|
Covariance | $\operatorname{cov}(X, Y) = \mathrm{E}[(X - \mu)(Y - \nu)] = \mathrm{E}[X \cdot Y] - \mu \nu$ { $\mathrm{E}(X) = \mu$ , $\mathrm{E}(Y) = \nu$ } |
Variance | $\operatorname{var}(X) = \mathrm{E}[(X - \mu)^2] = \operatorname{cov}(X, X)$ |
Standard Deviation | $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 }$ { $\mathrm{E}(x) = \mu$ } |
Mean Absolute Error | $\mathrm{MAE} = \frac{1}{n} {\sum_{i=1}^{n} \left| Y_i - \hat{Y_i} \right|} $ { $Y_{i}$: observed value, $\hat{Y_{i}}$: predicted value } |
Mean Squared Error | $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y_i})^2 $ { $Y_{i}$: observed value, $\hat{Y_{i}}$: predicted value } |
Root Mean Square Error | $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$ |
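As a sanity check on these definitions, a minimal NumPy sketch (`y_true` and `y_pred` are made-up illustration values):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # observed values Y_i
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # predicted values Y_i-hat

cov = np.mean((y_true - y_true.mean()) * (y_pred - y_pred.mean()))  # covariance
var = np.var(y_true)                       # variance, i.e. cov(X, X)
std = np.sqrt(var)                         # standard deviation
mae = np.mean(np.abs(y_true - y_pred))     # Mean Absolute Error
mse = np.mean((y_true - y_pred) ** 2)      # Mean Squared Error
rmse = np.sqrt(mse)                        # Root Mean Square Error
print(cov, var, std, mae, mse, rmse)
```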
Activation Functions
- Sigmoid
$$
\begin{align}
S(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1}
\end{align}
$$
- Hyperbolic tangent (tanh)
$$
\begin{align}
\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac {e^x - e^{-x}}{e^x + e^{-x}} = \frac{e^{2x} - 1}{e^{2x} + 1}
\end{align}
$$
- Rectifier (ReLU)
$$
f(x) = x^+ = \max(0, x)
$$
- Leaky ReLU
$$
f(x) =
\begin{cases}
x & x \gt 0, \\
0.01x & \text{otherwise.}
\end{cases}
$$
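A minimal NumPy sketch of the four activations above (the function names are my own):

```python
import numpy as np

def sigmoid(x):
    # S(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # max(0, x), applied element-wise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # x when x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))
print(np.tanh(x))  # tanh is built into NumPy
print(relu(x))
print(leaky_relu(x))
```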
Advanced Mathematics
- Limit of a sequence
$$
\lim_{n \to \infty} x_n = a \iff \forall \epsilon \gt 0,\ \exists \text{ a positive integer } N \text{ such that } \left| x_n - a \right| \lt \epsilon \text{ whenever } n \gt N.
$$
- Two important limits
$$
\lim_{x \to 0} \frac{\sin x}{x} = 1
$$$$
\lim_{x \to \infty} (1 + \frac{1}{x})^x = e
$$
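Both limits are easy to check numerically; a quick sketch:

```python
import numpy as np

for x in [1e1, 1e3, 1e6]:
    # sin(1/x) / (1/x) → 1 as 1/x → 0, and (1 + 1/x)^x → e as x → ∞
    print(np.sin(1 / x) * x, (1 + 1 / x) ** x)
# The second column approaches e ≈ 2.718281828...
```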
- Taylor's formula
$$
f(x) = \sum_{n = 0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n
$$
Here $f^{(n)}(a)$ denotes the $n$-th derivative of $f$ at the point $a$; if $a = 0$, the series is called the Maclaurin series.
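As an illustration, a truncated Maclaurin series of $e^x$ (the helper name and the cutoff of 10 terms are arbitrary choices):

```python
import math

def maclaurin_exp(x, terms=10):
    # e^x ≈ sum_{n=0}^{terms-1} x^n / n!, since every derivative of e^x at 0 is 1
    return sum(x ** n / math.factorial(n) for n in range(terms))

print(maclaurin_exp(1.0), math.exp(1.0))  # both ≈ 2.71828
```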
- Euclidean distance
$$
d(p, q) = \sqrt{\sum_{i=1}^n (q_i - p_i)^2}
$$
- Minkowski distance
$$
D(X, Y) = d_p(x, y) = \left( \sum_{i=1}^n \left| x_i - y_i \right|^p \right)^{\frac{1}{p}}
$$
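Since the Euclidean distance is the Minkowski distance with $p = 2$, one sketch covers both (the function name is my own):

```python
import numpy as np

def minkowski(x, y, p=2):
    # (sum_i |x_i - y_i|^p)^(1/p); p = 2 is Euclidean, p = 1 is Manhattan
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

p1 = np.array([0.0, 0.0])
q1 = np.array([3.0, 4.0])
print(minkowski(p1, q1, p=2))  # 5.0, the 3-4-5 right triangle
print(minkowski(p1, q1, p=1))  # 7.0
```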
Linear Algebra
- Matrix
$$
A = [a_{ij}]_{m \times n}
$$
- Matrix transpose
$$
A^T = [a_{ji}]_{n \times m}
$$
- Rank
$$
A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
\operatorname{rank}(A) = \operatorname{rank}(A^T) = 2
$$
- Null space
The null space $N(A)$ of a matrix $A$ is the set of all vectors $\overset{\rightharpoonup}{x}$ satisfying $A \cdot \overset{\rightharpoonup}{x} = \overset{\rightharpoonup}{0}$. The null space of a matrix is $\{ \overset{\rightharpoonup}{0} \}$ if and only if all of its columns are linearly independent.
- Left null space
The left null space of $A$ is the null space of $A^T$:
$$
N(A^T) = \begin{Bmatrix} \overset{\rightharpoonup}{x}| A^T \overset{\rightharpoonup}{x} = \overset{\rightharpoonup}{0} \end{Bmatrix} = \begin{Bmatrix} \overset{\rightharpoonup}{x}| \overset{\rightharpoonup}{x}^T A = \overset{\rightharpoonup}{0}^T \end{Bmatrix}
$$
- Column space (the span of the column vectors)
$$
A_{m \times n} = \begin{bmatrix} \overset{\rightharpoonup}{v_1} & \overset{\rightharpoonup}{v_2} & \ldots & \overset{\rightharpoonup}{v_n} \end{bmatrix}
$$$$
\therefore \ C(A) = \operatorname{span}(\overset{\rightharpoonup}{v_1}, \overset{\rightharpoonup}{v_2}, \ldots, \overset{\rightharpoonup}{v_n})
$$
- Row space
$$
R(A) = C(A^T)
$$
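The rank example and the null space definition can be verified numerically, for instance with NumPy plus SciPy's `null_space` helper:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 1, 1, 1],
              [1, 2, 3, 4],
              [4, 3, 2, 1]], dtype=float)

# rank(A) == rank(A^T) == 2, matching the example above
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))

# Columns of N form an orthonormal basis of N(A): A @ x = 0 for each column
N = null_space(A)
print(np.allclose(A @ N, 0))  # True
```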
Probability Theory
- Conditional probability
$$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$$
- Bayes' theorem
$$
\begin{align}
\displaystyle P(A|B) = \frac{\frac{P(A \cap B)}{P(A)} \cdot P(A)}{P(B)} = \frac{P(B|A) \cdot P(A)}{P(B)}
\end{align}
$$
- Chain rule
$$
P(A, B, C) = P(A) \, P(B, C \mid A) = P(A) \, P(B \mid A) \, P(C \mid A, B)
$$
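A tiny numeric check of conditional probability and Bayes' theorem (all probabilities are made-up illustration values):

```python
# Made-up values: a test with 99% sensitivity, a 5% false positive rate,
# and 1% prevalence of the condition.
p_a = 0.01              # P(A): prior
p_b_given_a = 0.99      # P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)

# Total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
print(p_b_given_a * p_a / p_b)  # ≈ 0.167
```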
- Blog Link: https://neo1989.net/Notes/NOTE-math-basic/
- Copyright Declaration: Please credit the source when reposting.