The enthusiastic practitioner who is interested in learning more about the magic behind successful machine learning algorithms currently faces a daunting set of prerequisites:

  • Programming languages and data analysis tools.
  • Large-scale computation and the associated frameworks.
  • Mathematics and statistics, and how machine learning builds on them.

Common Symbols

| Symbol | Name | Explanation / Examples |
| --- | --- | --- |
| $\mathbb{R}$, $\mathbf{R}$ | real numbers | $\pi \in \mathbb{R}$ |
| $\mathbb{C}$, $\mathbf{C}$ | complex numbers | $i \in \mathbb{C}$ |
| $\sum$ | summation | $\sum_{k=1}^n a_k = a_1 + a_2 + \dots + a_n$ |
| $\prod$ | product | $\prod_{k=1}^n a_k = a_1 \cdot a_2 \cdot \dots \cdot a_n$ |
| $\propto$ | proportionality | $y \propto x$ means that $y = kx$ for some constant $k$ |
| $\forall$ | universal quantification | $\forall n \in \mathbb{N},\ n^2 \ge n$ |
| $\int$ | integral | $\int_a^b x^2 \, dx = \frac{b^3 - a^3}{3}$ |
| $'$ | derivative | If $f(x) := x^2$, then $f'(x) = 2x$. |
| $\partial$ | partial derivative | If $f(x, y) := x^2 y$, then $\frac{\partial f}{\partial x} = 2xy$. |
| $\nabla$ | gradient | $\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right)$ |
| $\Delta$ | delta | $\frac{\Delta y}{\Delta x}$ is the slope of a straight line. |
| $P(A \mid B)$ | conditional probability | Probability of A given B |
| $\mathrm{E}$ | expected value | $\mathrm{E}[X] = \sum_{i=1}^{\infty} x_i p_i$ |
| $\lvert \ldots \rvert$ | determinant | $\det(u, v) = \begin{vmatrix} 1 & 2 \\ 2 & 9 \end{vmatrix} = 1 \times 9 - 2 \times 2 = 5$ |
| $\odot$ | Hadamard product | $\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \odot \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 \cdot 1 & 2 \cdot 2 \\ 2 \cdot 0 & 4 \cdot 1 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 0 & 4 \end{bmatrix}$ |
| $\hat{a}$ | estimator | $\hat{\theta}$ is the estimator or the estimate for the parameter $\theta$ |
| $\sigma$ | selection (relational algebra) | $\sigma_{a \theta b}(R) = \{ t : t \in R,\ t(a)\ \theta\ t(b) \}$ |
| $\operatorname{argmax}$ | arguments of the maxima | the argument(s) at which the function attains its maximum value |
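Several of the symbols above map directly onto NumPy operations. A minimal sketch, assuming NumPy is installed, reproducing the Hadamard product, argmax, and determinant examples from the table:

```python
import numpy as np

A = np.array([[1, 2], [2, 4]])
B = np.array([[1, 2], [0, 1]])

# Hadamard (element-wise) product: the * operator on NumPy arrays
hadamard = A * B  # [[1, 4], [0, 4]]

# argmax: the index at which the values attain their maximum
x = np.array([3, 7, 2, 7])
idx = np.argmax(x)  # first index of the maximum value

# Determinant of the matrix from the |...| example above
d = np.linalg.det(np.array([[1, 2], [2, 9]]))  # 1*9 - 2*2 = 5
```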

Common Terms

| Name | Explanation / Examples |
| --- | --- |
| Covariance | $\operatorname{cov}(X, Y) = \mathrm{E}[(X - \mu)(Y - \nu)] = \mathrm{E}[XY] - \mu\nu$, where $\mathrm{E}[X] = \mu$ and $\mathrm{E}[Y] = \nu$ |
| Variance | $\operatorname{var}(X) = \mathrm{E}[(X - \mu)^2] = \operatorname{cov}(X, X)$ |
| Standard Deviation | $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$, where $\mu$ is the mean of the $x_i$ |
| Mean Absolute Error | $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert Y_i - \hat{Y}_i \rvert$, where $Y_i$ is the observed value and $\hat{Y}_i$ the predicted value |
| Mean Squared Error | $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, where $Y_i$ is the observed value and $\hat{Y}_i$ the predicted value |
| Root Mean Square Error | $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$ |
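The three error metrics follow directly from their definitions; a plain-Python sketch with made-up sample values for illustration:

```python
import math

y_true = [3.0, -0.5, 2.0, 7.0]   # observed values Y_i (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]   # predicted values Y_hat_i (illustrative)
n = len(y_true)

# MAE: mean of absolute residuals
mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n

# MSE: mean of squared residuals
mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n

# RMSE: square root of the MSE
rmse = math.sqrt(mse)
```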

Activation Functions

  1. Sigmoid

    $$
    \begin{align}
    S(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1}
    \end{align}
    $$

  2. Hyperbolic tangent (tanh) 双曲正切

    $$
    \begin{align}
    \tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac {e^x - e^{-x}}{e^x + e^{-x}} = \frac{e^{2x} - 1}{e^{2x} + 1}
    \end{align}
    $$

  3. Rectifier (ReLU) 修正线性单元

    $$
    f(x) = x^+ = \max(0, x)
    $$

  4. Leaky ReLU 带泄露修正线性单元

    $$
    f(x) =
    \begin{cases}
    x & \text{if } x \gt 0, \\
    0.01x & \text{otherwise.}
    \end{cases}
    $$
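The four activation functions above can be implemented directly from their definitions; a minimal pure-Python sketch:

```python
import math

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)
    return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)

def relu(x):
    # f(x) = max(0, x)
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # f(x) = x if x > 0, else alpha * x (alpha = 0.01 as above)
    return x if x > 0 else alpha * x
```

For instance, `sigmoid(0)` is 0.5, `relu(-2)` is 0, and `leaky_relu(-2)` is -0.02.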

Advanced Mathematics

  1. Limit of a sequence

    $$
    \lim_{n \to \infty} x_n = a \iff \forall \epsilon \gt 0,\ \exists \text{ a positive integer } N \text{ such that } \left| x_n - a \right| \lt \epsilon \text{ whenever } n \gt N.
    $$

  2. Two important limits

    $$
    \lim_{x \to 0} \frac{\sin x}{x} = 1
    $$

    $$
    \lim_{x \to \infty} (1 + \frac{1}{x})^x = e
    $$

  3. Taylor's formula

    $$
    f(x) = \sum_{n = 0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n
    $$

    $f^{(n)}(a)$ denotes the $n$-th derivative of $f$ at the point $a$; when $a = 0$, the series is called the Maclaurin series.

  4. Euclidean distance

    $$
    d(p, q) = \sqrt{\sum_{i=1}^n (q_i - p_i)^2}
    $$

  5. Minkowski distance

    $$
    D(X, Y) = d_p(x, y) = \left( \sum_{i=1}^n \left| x_i - y_i \right|^p \right)^{\frac{1}{p}}
    $$
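The statements above lend themselves to quick numerical sanity checks. A plain-Python sketch (with illustrative tolerances) that approximates the two important limits, sums the Maclaurin series of $e^x$ (where every $f^{(n)}(0) = 1$), and implements the Minkowski distance with Euclidean distance as its $p = 2$ special case:

```python
import math

# 1. The two important limits, approximated at a small/large argument
sinc_near_zero = math.sin(1e-6) / 1e-6   # approaches 1 as x -> 0
compound = (1 + 1 / 1e8) ** 1e8          # approaches e as x -> infinity

# 2. Maclaurin series (Taylor series at a = 0) of e^x
def maclaurin_exp(x, terms=20):
    # e^x = sum_{n=0}^inf x^n / n!
    return sum(x ** n / math.factorial(n) for n in range(terms))

# 3. Minkowski distance; Euclidean distance is the case p = 2
def minkowski(x, y, p):
    # D(x, y) = (sum_i |x_i - y_i|^p)^(1/p)
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def euclidean(x, y):
    return minkowski(x, y, 2)
```

For example, `euclidean((0, 0), (3, 4))` gives 5.0, and `minkowski` with `p = 1` is the Manhattan distance.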

Linear Algebra

  1. Matrix

    $$
    A = [a_{ij}]_{m \times n}
    $$

  2. Matrix transpose

    $$
    A^T = [a_{ji}]_{n \times m}
    $$

  3. Rank of a matrix

    $$
    A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\
    \operatorname{rank}(A) = \operatorname{rank}(A^T) = 2
    $$

  4. Null space

    The null space $N(A)$ of a matrix $A$ is the set of all vectors $\overset{\rightharpoonup}{x}$ that satisfy $A \overset{\rightharpoonup}{x} = \overset{\rightharpoonup}{0}$. The null space of a matrix is $\{ \overset{\rightharpoonup}{0} \}$ if and only if all of the matrix's columns are linearly independent.

  5. Left null space

    The left null space of $A$ is the null space of the transpose of $A$:

    $$
    N(A^T) = \begin{Bmatrix} \overset{\rightharpoonup}{x} \mid A^T \overset{\rightharpoonup}{x} = \overset{\rightharpoonup}{0} \end{Bmatrix} = \begin{Bmatrix} \overset{\rightharpoonup}{x} \mid \overset{\rightharpoonup}{x}^T A = \overset{\rightharpoonup}{0}^T \end{Bmatrix}
    $$

  6. Column space (the space spanned by the column vectors)

    $$
    A_{m \times n} = \begin{bmatrix} \overset{\rightharpoonup}{v_1} & \overset{\rightharpoonup}{v_2} & \ldots & \overset{\rightharpoonup}{v_n} \end{bmatrix}
    $$

    $$
    \therefore \ C(A) = \operatorname{span}(\overset{\rightharpoonup}{v_1}, \overset{\rightharpoonup}{v_2}, \ldots, \overset{\rightharpoonup}{v_n})
    $$

  7. Row space

    $$
    R(A) = C(A^T)
    $$
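Assuming NumPy is available, the rank example and the subspaces above can be checked numerically. The helper below is one common way to extract a null-space basis: take the rows of $V^T$ from the SVD whose singular values are (numerically) zero; applying it to $A^T$ then gives the left null space.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0, 4.0],
              [4.0, 3.0, 2.0, 1.0]])

# rank(A) = rank(A^T), matching the row reduction above
rank_A  = np.linalg.matrix_rank(A)
rank_AT = np.linalg.matrix_rank(A.T)

def null_space(M, tol=1e-10):
    # Rows of Vt paired with ~zero singular values span N(M); with
    # full_matrices=True, the extra rows of Vt (when n > m) belong there too.
    _, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T  # columns form an orthonormal basis of N(M)

N  = null_space(A)    # null space: 4 - rank(A) = 2 basis vectors
LN = null_space(A.T)  # left null space N(A^T): 3 - rank(A) = 1 basis vector
```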

Probability Theory

  1. Conditional probability

    $$
    P(A|B) = \frac{P(A \cap B)}{P(B)}
    $$

  2. Bayes' theorem

    $$
    \begin{align}
    \displaystyle P(A|B) = \frac{\frac{P(A \cap B)}{P(A)} \cdot P(A)}{P(B)} = \frac{P(B|A) \cdot P(A)}{P(B)}
    \end{align}
    $$

  3. Chain Rule

    $$
    P(A, B, C) = P(A) \, P(B, C \mid A) = P(A) \, P(B \mid A) \, P(C \mid A, B)
    $$
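All three formulas can be verified mechanically. The sketch below checks conditional probability on two fair dice, applies Bayes' rule to an illustrative diagnostic test (made-up numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate), and confirms the chain rule on a small made-up joint distribution:

```python
from itertools import product

# 1. Conditional probability with two fair dice:
#    A = "the sum is 8", B = "the first die is even".
space = list(product(range(1, 7), repeat=2))     # 36 equally likely outcomes
A = {o for o in space if sum(o) == 8}
B = {o for o in space if o[0] % 2 == 0}
p_a_given_b = (len(A & B) / 36) / (len(B) / 36)  # P(A|B) = P(A ∩ B) / P(B)

# 2. Bayes' rule with illustrative numbers.
p_d, sens, fpr = 0.01, 0.90, 0.05
p_pos = sens * p_d + fpr * (1 - p_d)             # total probability of a positive
p_d_given_pos = sens * p_d / p_pos               # P(A|B) = P(B|A) P(A) / P(B)

# 3. Chain rule on a made-up joint distribution over three binary events.
vals = [0.02, 0.08, 0.10, 0.05, 0.20, 0.15, 0.25, 0.15]   # sums to 1
joint = dict(zip(product([0, 1], repeat=3), vals))

def prob(**fixed):
    # Probability that the named variables take the given values
    idx = {"a": 0, "b": 1, "c": 2}
    return sum(p for o, p in joint.items()
               if all(o[idx[k]] == v for k, v in fixed.items()))

# P(a, b, c) = P(a) P(b | a) P(c | a, b)
lhs = joint[(1, 0, 1)]
rhs = (prob(a=1)
       * prob(a=1, b=0) / prob(a=1)
       * prob(a=1, b=0, c=1) / prob(a=1, b=0))
```

The chain-rule product telescopes, so `lhs` and `rhs` agree up to floating-point rounding.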