Logistic regression
Suppose we have a data set \(\mathbb{D} = \{(\boldsymbol{x}_i, y_i)\}_{i = 1}^m\), where \(\boldsymbol{x}_i \in \mathbb{R}^n\) is the feature vector and \(y_i \in \{0, 1\}\) is the label. We first make the hypothesis \(h_{\boldsymbol{\theta}}\) as:

\[h_{\boldsymbol{\theta}}(\boldsymbol{x}) = \boldsymbol{\theta}^T \boldsymbol{x},\]
where we take the same notation convention as in the previous post, so the hypothesis can again be viewed as a mapping from \(\mathbb{R}^n\) to \(\mathbb{R}\). But the label \(y\) only takes values in \(\{0, 1\}\), so we further apply the logistic (sigmoid) transform to the hypothesis:

\[h_{\boldsymbol{\theta}}(\boldsymbol{x}) = \sigma(\boldsymbol{\theta}^T \boldsymbol{x}) = \frac{1}{1 + e^{-\boldsymbol{\theta}^T \boldsymbol{x}}}.\]
As a result, when \(h_{\boldsymbol{\theta}}(\boldsymbol{x}) \ge 0.5\), i.e. when \(\boldsymbol{\theta}^T \boldsymbol{x} \ge 0\), the prediction is \(1\); otherwise the prediction is \(0\).
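A minimal NumPy sketch of this hypothesis and decision rule (the names `sigmoid` and `predict` are illustrative, not from the original post):

```python
import numpy as np

def sigmoid(z):
    """Logistic transform: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Predict 1 when h_theta(x) = sigmoid(theta^T x) >= 0.5,
    which is equivalent to theta^T x >= 0."""
    return (X @ theta >= 0).astype(int)
```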
Logit link function
The logit link function is:

\[\text{logit}(\mu) = \log \frac{\mu}{1 - \mu} = \boldsymbol{\theta}^T \boldsymbol{X}, \quad \text{where } \mu = E[Y \mid \boldsymbol{X}],\]
or, equivalently,

\[\mu = \frac{1}{1 + e^{-\boldsymbol{\theta}^T \boldsymbol{X}}},\]
which defines the relationship between the mean of \(Y\) and \(\boldsymbol{X}\).
And since \(Y\) is a Bernoulli \(0\)-\(1\) random variable, the variance of \(Y\) is

\[\text{Var}(Y) = \mu(1 - \mu).\]
After that, the odds, the ratio of the probability that \(Y = 1\) to the probability that \(Y = 0\), are computed as

\[\text{odds} = \frac{\mu}{1 - \mu}.\]
And here, we have

\[\log(\text{odds}) = \log \frac{\mu}{1 - \mu} = \boldsymbol{\theta}^T \boldsymbol{X}, \quad \text{i.e.,} \quad \text{odds} = e^{\boldsymbol{\theta}^T \boldsymbol{X}}.\]
So \(\theta_i\) denotes the contribution of a unit increase in \(X_i\) to the log-odds; equivalently, a unit increase in \(X_i\) multiplies the odds by \(e^{\theta_i}\).
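For example, if \(\theta_i = \log 2 \approx 0.693\), then a one-unit increase in \(X_i\) multiplies the odds by \(e^{\theta_i} = 2\), i.e. it doubles the odds of \(Y = 1\).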
Logistic loss function
The logistic loss function is defined as

\[J(\boldsymbol{\theta}) = -\frac{1}{m} \sum_{i=1}^m \left[ y_i \log h_{\boldsymbol{\theta}}(\boldsymbol{x}_i) + (1 - y_i) \log\left(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}_i)\right) \right].\]
And the gradient is

\[\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{m} \sum_{i=1}^m \left( h_{\boldsymbol{\theta}}(\boldsymbol{x}_i) - y_i \right) \boldsymbol{x}_i.\]
Since \(J(\boldsymbol{\theta})\) is convex, this optimization problem can be solved efficiently by SGD or L-BFGS.
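A minimal NumPy sketch of this loss and gradient, minimized by plain batch gradient descent (the function names, learning rate `lr`, and iteration count are illustrative choices, not from the original post):

```python
import numpy as np

def logistic_loss(theta, X, y):
    """Average cross-entropy loss J(theta) over the m samples."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(theta, X, y):
    """Gradient of J(theta): (1/m) * X^T (h_theta(x) - y)."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / len(y)

def fit_gd(X, y, lr=0.1, n_iters=1000):
    """Plain batch gradient descent; SGD would use mini-batches instead."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= lr * gradient(theta, X, y)
    return theta
```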
Maximum Likelihood Estimation
The log-likelihood of the model is

\[\ell(\boldsymbol{\theta}) = \sum_{i=1}^m \left[ y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i) \right], \quad \text{where } \mu_i = h_{\boldsymbol{\theta}}(\boldsymbol{x}_i).\]

Note that \(\ell(\boldsymbol{\theta}) = -m J(\boldsymbol{\theta})\), so maximizing the likelihood is equivalent to minimizing the logistic loss above.
And the gradient vector \(\boldsymbol{g}\) is:

\[\boldsymbol{g} = \frac{\partial \ell}{\partial \boldsymbol{\theta}} = \sum_{i=1}^m (y_i - \mu_i)\, \boldsymbol{x}_i = \boldsymbol{X}^T (\boldsymbol{y} - \boldsymbol{\mu}).\]
The Hessian matrix is:

\[\boldsymbol{H} = \frac{\partial^2 \ell}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}^T} = -\sum_{i=1}^m \mu_i (1 - \mu_i)\, \boldsymbol{x}_i \boldsymbol{x}_i^T = -\boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X},\]

where \(\boldsymbol{W} = \text{diag}\left(\mu_1(1 - \mu_1), \dots, \mu_m(1 - \mu_m)\right)\).
By the Newton-Raphson method, the update is

\[\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} - \boldsymbol{H}^{-1} \boldsymbol{g} = \boldsymbol{\theta}^{(t)} + \left(\boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^T (\boldsymbol{y} - \boldsymbol{\mu}),\]

which is also known as iteratively reweighted least squares (IRLS).
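A minimal NumPy sketch of this Newton-Raphson (IRLS) update, assuming \(\boldsymbol{X}\) has full column rank so that \(\boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X}\) is invertible (`fit_newton` and the fixed iteration count are illustrative):

```python
import numpy as np

def fit_newton(X, y, n_iters=25):
    """Newton-Raphson (IRLS) for logistic regression:
    theta <- theta + (X^T W X)^{-1} X^T (y - mu)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        mu = 1.0 / (1.0 + np.exp(-X @ theta))  # fitted means mu_i
        W = np.diag(mu * (1 - mu))             # W = diag(mu_i (1 - mu_i))
        g = X.T @ (y - mu)                     # gradient g
        H = X.T @ W @ X                        # -H = X^T W X
        theta += np.linalg.solve(H, g)         # Newton step
    return theta
```

In practice one would stop when \(\boldsymbol{g}\) or the change in \(\ell(\boldsymbol{\theta})\) falls below a tolerance rather than running a fixed number of iterations.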
In the Gaussian (linear regression) case, we have

\[\hat{\boldsymbol{\theta}} = \left(\boldsymbol{X}^T \boldsymbol{X}\right)^{-1} \boldsymbol{X}^T \boldsymbol{Y},\]
and

\[\boldsymbol{Y} \sim \mathcal{N}\left(\boldsymbol{X} \boldsymbol{\theta}, \sigma^2 \boldsymbol{I}\right).\]
According to the property that Gaussian vectors remain Gaussian under linear transformations, we have

\[\hat{\boldsymbol{\theta}} \sim \mathcal{N}\left(\boldsymbol{\theta}, \sigma^2 \left(\boldsymbol{X}^T \boldsymbol{X}\right)^{-1}\right),\]

since \(E[\hat{\boldsymbol{\theta}}] = (\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T \boldsymbol{X} \boldsymbol{\theta} = \boldsymbol{\theta}\) and \(\text{cov}(\hat{\boldsymbol{\theta}}) = (\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T \left(\sigma^2 \boldsymbol{I}\right) \boldsymbol{X} (\boldsymbol{X}^T \boldsymbol{X})^{-1} = \sigma^2 (\boldsymbol{X}^T \boldsymbol{X})^{-1}\).
For logistic regression, \(\hat{\boldsymbol{\theta}}\) is not linear in \(\boldsymbol{Y}\). However, asymptotically (for large \(m\)) it is Gaussian, and its covariance can be estimated by the inverse of the observed Fisher information:

\[\widehat{\text{cov}}(\hat{\boldsymbol{\theta}}) = \left(-\boldsymbol{H}\right)^{-1} = \left(\boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X}\right)^{-1},\]

with \(\boldsymbol{W}\) evaluated at \(\hat{\boldsymbol{\theta}}\).
The square roots of the diagonal elements of \(\text{cov}(\hat{\boldsymbol{\theta}})\) are the standard errors \(\text{s.e.}(\hat{\theta}_i)\).
And the \((1 - \alpha)\) confidence interval for each \(\theta_i\) is \([\hat{\theta}_i - z_{\alpha / 2}\, \text{s.e.}(\hat{\theta}_i),\; \hat{\theta}_i + z_{\alpha / 2}\, \text{s.e.}(\hat{\theta}_i)]\).
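A minimal sketch of these standard errors and intervals, assuming \(\hat{\boldsymbol{\theta}}\) comes from a fit such as `fit_newton` above (`confidence_intervals` is an illustrative name; `scipy.stats.norm.ppf` supplies \(z_{\alpha / 2}\)):

```python
import numpy as np
from scipy.stats import norm

def confidence_intervals(theta_hat, X, alpha=0.05):
    """Asymptotic (1 - alpha) confidence intervals for each theta_i,
    using cov(theta_hat) ~ (X^T W X)^{-1} evaluated at theta_hat."""
    mu = 1.0 / (1.0 + np.exp(-X @ theta_hat))
    W = np.diag(mu * (1 - mu))
    cov = np.linalg.inv(X.T @ W @ X)  # estimated covariance of theta_hat
    se = np.sqrt(np.diag(cov))        # standard errors s.e.(theta_i)
    z = norm.ppf(1 - alpha / 2)       # z_{alpha/2}, about 1.96 for alpha = 0.05
    return theta_hat - z * se, theta_hat + z * se
```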
Note:
here is a revisit of this topic.