Deriving the Likelihood Estimation Formula for Logistic Regression

Suppose we have a training set \(\{(x_{i}, y_{i}) \mid i = 1, 2, \dots, n \}\) with \(y_{i} \in \{0, 1\}\), where each sample has \(m\) features. Writing the conditional probability \(P(Y = 1 \mid X = x_i)\) as \(p(x_i)\), the logistic regression model is:

\[\log{\frac{p(x_i)}{1 - p(x_i)}}=w_0+w_1x_{i, 1} + w_2x_{i, 2}+\dots+w_mx_{i, m} = W^TX_i\]

where \(W=\left[\begin{array}{c}{w_{0}}\\{w_{1}}\\{w_{2}}\\{\vdots}\\{w_{m}}\end{array}\right]\) (\(w_0\) is the bias and \(w_1, w_2, \dots, w_m\) are the weights), and \(X_i = \left[\begin{array}{ c }{1}\\{x_{i, 1}}\\{x_{i, 2}}\\{\vdots}\\{x_{i, m}}\end{array}\right]\) (\(x_{i, 1}, x_{i, 2}, \dots, x_{i, m}\) are the feature values).

Solving the equation above gives \(p(x_i) = \frac{e^{ W^T X_i }}{1 + e^{ W^T X_i}} = \frac{1}{1 + e^{-W^T X_i}}\).
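As a concrete illustration, here is a minimal NumPy sketch of this mapping. The names `sigmoid` and `predict_proba`, and the convention that the design matrix `X` already carries a leading column of ones (so that \(w_0\) acts as the bias), are assumptions made for the example, not part of the derivation itself.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)); may warn for very large |z|."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(W, X):
    """p(x_i) = P(Y = 1 | X = x_i) = sigmoid(W^T X_i) for each row X_i of X.

    X is assumed to already contain a leading column of ones,
    matching the vector X_i defined above (so w_0 is the bias).
    """
    return sigmoid(X @ W)
```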

For a sample \((x_i, y_i)\), \(y_i\) is either 1 (with probability \(p(x_i)\)) or 0 (with probability \(1 - p(x_i)\)), so the likelihood function is:

\[L(W) = \prod_{i=1}^{n} \left[ p(x_i)^{y_i} \left(1 - p(x_i)\right)^{1 - y_i} \right]\]

The corresponding log-likelihood is:

\[\begin{aligned} l(W) &= \sum_{i=1}^{n} \left[ y_i \log{p(x_i)} + (1 - y_i) \log \left(1 - p(x_i) \right) \right] \\ &= \sum_{i=1}^{n} \left[ y_i \log{ \frac{p(x_i)}{1 - p(x_i)} } + \log \left(1 - p(x_i)\right) \right] \\ &= \sum_{i=1}^{n} \left[ y_i W^T X_i - \log{ (1 + e^{ W^T X_i }) } \right] \end{aligned}\]
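A small sketch of evaluating this log-likelihood numerically, reusing the `sigmoid`/`predict_proba` setup above. Using `np.logaddexp(0, z)` for \(\log(1 + e^{z})\) is an implementation choice to avoid overflow, not part of the derivation.

```python
def log_likelihood(W, X, y):
    """l(W) = sum_i [ y_i * W^T X_i - log(1 + exp(W^T X_i)) ]."""
    z = X @ W                          # z_i = W^T X_i for every sample
    return np.sum(y * z - np.logaddexp(0.0, z))
```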

To maximize \(l(W)\), take the partial derivative of the log-likelihood with respect to each \(w_j\):

\[\begin{aligned} \frac{\partial{l(W)}}{\partial{w_j}} &= \sum_{i=1}^{n} \left[ y_i x_{i, j} - \frac{1}{1 + e^{ W^T X_i }} e^{ W^T X_i } x_{i, j} \right] \\ &= \sum_{i=1}^{n} \left[ \left( y_i - \frac{1}{1 + e^{ -W^T X_i }} \right) x_{i, j} \right] \end{aligned}\]
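In vector form this gradient is \(X^{T}(y - p)\), where \(p\) stacks the \(p(x_i)\). A sketch under the same assumptions as above (leading column of ones, helpers from the first snippet):

```python
def grad_log_likelihood(W, X, y):
    """dl/dw_j = sum_i (y_i - p(x_i)) * x_{i,j}, computed for all j at once."""
    p = sigmoid(X @ W)                 # p(x_i) for every sample
    return X.T @ (y - p)               # shape (m + 1,): one entry per w_j
```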

Setting each partial derivative to zero gives the condition satisfied by the maximizer \(w_j^{'}\); doing this for every \(w_j\) yields the \(W^{'} = \left[\begin{array}{c}{w_{0}^{'}}\\{w_{1}^{'}}\\{w_{2}^{'}}\\{\vdots}\\{w_{m}^{'}}\end{array}\right]\) that maximizes the log-likelihood. Because these equations are nonlinear in \(W\), there is no closed-form solution, and \(W^{'}\) is found numerically, for example with the gradient method below.

Gradient Update

In gradient descent, the loss function is \(C(W) = - l(W)\), so:

\[\frac{ \partial{C(W)}}{ \partial{W}} = - \frac{ \partial{l(W)}}{ \partial{W}} = \sum_{i=1}^{n} \left[ \left( \frac{1}{1 + e^{ -W^T X_i}} - y_i \right) X_i \right]\]

The gradient descent update rule is therefore:

\[\begin{aligned} W_{new} &= W - \alpha \frac{ \partial{C(W)}}{ \partial{W}} \\ &= W - \alpha \sum_{i=1}^{n} \left[ \left( \frac{1}{1 + e^{ -W^T X_i}} - y_i \right) X_i \right] \end{aligned}\]

where \(\alpha\) is the learning rate.
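Putting the pieces together, a minimal batch gradient descent loop could look like the sketch below, reusing the `sigmoid` and `predict_proba` helpers from the first snippet. The zero initialization, fixed iteration count, step size, and toy data are all illustrative assumptions; in practice one would add a convergence check or use an off-the-shelf solver.

```python
def fit_logistic_regression(X, y, alpha=0.05, n_iters=1000):
    """Minimize C(W) = -l(W) with W <- W - alpha * X^T (sigmoid(X W) - y)."""
    W = np.zeros(X.shape[1])                  # illustrative zero initialization
    for _ in range(n_iters):
        grad_C = X.T @ (sigmoid(X @ W) - y)   # dC/dW from the formula above
        W = W - alpha * grad_C
    return W

# Toy usage: two features plus the leading column of ones (made-up data).
X_raw = np.array([[0.5, 1.2], [1.5, 0.3], [0.8, 2.8],
                  [1.0, 2.0], [3.0, 2.5], [2.2, 3.1]])
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
y = np.array([0, 0, 0, 1, 1, 1])
W_hat = fit_logistic_regression(X, y)
print(W_hat)
print(predict_proba(W_hat, X))
```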
