Logistic regression

Suppose we have a data set \(\mathbb{D} = \{(\boldsymbol{x}_i, y_i)\}_{i = 1}^m\), where \(\boldsymbol{x}_i \in \mathbb{R}^n\) is the feature vector and \(y_i \in \{0, 1\}\) is the label. We first form the hypothesis \(h_{\boldsymbol{\theta}}\) as:

\[h_{\boldsymbol{\theta}}(\boldsymbol{x}) = \boldsymbol{\theta}^\top \boldsymbol{x},\]

where we take the same notation convention as in the previous post, and it can be viewed as a mapping from \(\mathbb{R}^n\) to \(\mathbb{R}^1\) as well. But \(y\) takes values in \(\{0, 1\}\), so we further apply the logistic (sigmoid) transform to the hypothesis:

\[h_{\boldsymbol{\theta}}(\boldsymbol{x}) = g(\boldsymbol{\theta}^\top \boldsymbol{x}) = \frac{1}{1 + e^{-\boldsymbol{\theta}^\top \boldsymbol{x}}}.\]
As a result, the prediction is \(1\) when \(\boldsymbol{\theta}^\top \boldsymbol{x} \ge 0\) (equivalently, \(h_{\boldsymbol{\theta}}(\boldsymbol{x}) \ge 1/2\)), and \(0\) otherwise.
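
As a minimal sketch of this hypothesis and decision rule (assuming NumPy; the helper names here are ours, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Predict 1 when theta^T x >= 0, i.e. when sigmoid(theta^T x) >= 0.5."""
    return (X @ theta >= 0).astype(int)

# X has shape (m, n); theta has shape (n,). Toy values for illustration.
X = np.array([[1.0, 2.0], [1.0, -3.0]])
theta = np.array([0.5, 1.0])
print(sigmoid(X @ theta))  # predicted probabilities P(y = 1 | x)
print(predict(theta, X))   # hard 0/1 labels
```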

The logit link function is:

\[\operatorname{logit}(p) = \log\frac{p}{1 - p} = \boldsymbol{\theta}^\top \boldsymbol{x}, \qquad p = \mathbb{E}[Y \mid X = \boldsymbol{x}] = P(Y = 1 \mid X = \boldsymbol{x}),\]

or, equivalently,

\[\mathbb{E}[Y \mid X = \boldsymbol{x}] = \frac{1}{1 + e^{-\boldsymbol{\theta}^\top \boldsymbol{x}}},\]

which defines the relationship between the mean of \(Y\) and \(X\).

And since \(Y\) is Bernoulli, taking the values \(0\) and \(1\), the variance of \(Y\) is

\[\operatorname{Var}(Y \mid X = \boldsymbol{x}) = p(1 - p).\]
After that, the odds, a standard quantity for summarizing binary data, is computed as

\[\text{odds} = \frac{p}{1 - p}.\]

And here, we have

\[\log(\text{odds}) = \operatorname{logit}(p) = \boldsymbol{\theta}^\top \boldsymbol{x}, \qquad \text{odds} = e^{\boldsymbol{\theta}^\top \boldsymbol{x}}.\]

So \(\theta_i\) is the change in the log-odds per unit increase in \(x_i\); equivalently, a unit increase in \(x_i\) multiplies the odds by a factor of \(e^{\theta_i}\).
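
As a one-line illustrative check (the value \(\theta_i = 0.7\) is a made-up number, not from the model above):

\[\theta_i = 0.7 \;\Longrightarrow\; \frac{\text{odds}(x_i + 1)}{\text{odds}(x_i)} = e^{0.7} \approx 2.01,\]

so a unit increase in \(x_i\) roughly doubles the odds.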

Logistic loss function

The logistic loss function is defined as

\[J(\boldsymbol{\theta}) = -\frac{1}{m}\sum_{i = 1}^m \left[y_i \log h_{\boldsymbol{\theta}}(\boldsymbol{x}_i) + (1 - y_i)\log\bigl(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}_i)\bigr)\right].\]

And the gradient is

\[\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{m}\sum_{i = 1}^m \bigl(h_{\boldsymbol{\theta}}(\boldsymbol{x}_i) - y_i\bigr)\boldsymbol{x}_i.\]

Since \(J(\boldsymbol{\theta})\) is smooth and convex, this optimization can be easily solved by SGD or L-BFGS.
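
A minimal SGD sketch for this loss, using the per-sample gradient \((h_{\boldsymbol{\theta}}(\boldsymbol{x}_i) - y_i)\boldsymbol{x}_i\) (the function name and hyperparameters are ours, chosen for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=100, seed=0):
    """Plain SGD on the logistic loss: one (x_i, y_i) pair per update."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):
            # Per-sample gradient: (h_theta(x_i) - y_i) * x_i
            grad = (sigmoid(X[i] @ theta) - y[i]) * X[i]
            theta -= lr * grad
    return theta
```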

Maximum Likelihood Estimation

The log-likelihood of the model is

\[\ell(\boldsymbol{\theta}) = \sum_{i = 1}^m \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right], \qquad p_i = h_{\boldsymbol{\theta}}(\boldsymbol{x}_i).\]

And the gradient vector \(\boldsymbol{g}\) is:

\[\boldsymbol{g} = \nabla_{\boldsymbol{\theta}}\,\ell(\boldsymbol{\theta}) = \sum_{i = 1}^m (y_i - p_i)\,\boldsymbol{x}_i = \boldsymbol{X}^\top(\boldsymbol{y} - \boldsymbol{p}).\]

The Hessian matrix is:

\[\boldsymbol{H} = -\sum_{i = 1}^m p_i(1 - p_i)\,\boldsymbol{x}_i\boldsymbol{x}_i^\top = -\boldsymbol{X}^\top \boldsymbol{W} \boldsymbol{X}, \qquad \boldsymbol{W} = \operatorname{diag}\bigl(p_i(1 - p_i)\bigr).\]

By the Newton-Raphson method, each iteration updates

\[\boldsymbol{\theta}^{(t + 1)} = \boldsymbol{\theta}^{(t)} - \boldsymbol{H}^{-1}\boldsymbol{g} = \boldsymbol{\theta}^{(t)} + \bigl(\boldsymbol{X}^\top \boldsymbol{W} \boldsymbol{X}\bigr)^{-1}\boldsymbol{X}^\top(\boldsymbol{y} - \boldsymbol{p}).\]
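
A NumPy sketch of this Newton-Raphson (IRLS) iteration, assuming \(\boldsymbol{X}\) already includes any intercept column (the function name and stopping rule are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iter=25, tol=1e-8):
    """Newton-Raphson (IRLS) for the logistic log-likelihood."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iter):
        p = sigmoid(X @ theta)          # fitted probabilities p_i
        g = X.T @ (y - p)               # gradient g = X^T (y - p)
        W = p * (1 - p)                 # diagonal of W
        H = -(X * W[:, None]).T @ X     # Hessian H = -X^T W X
        step = np.linalg.solve(-H, g)   # Newton step: -H^{-1} g
        theta += step
        if np.linalg.norm(step) < tol:  # stop when the update is negligible
            break
    return theta
```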

In the Gaussian (linear regression) case, we have

\[\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2 \boldsymbol{I}),\]

and

\[\hat{\boldsymbol{\theta}} = \bigl(\boldsymbol{X}^\top \boldsymbol{X}\bigr)^{-1}\boldsymbol{X}^\top \boldsymbol{Y}.\]

By the property that Gaussian vectors remain Gaussian under linear transformation, we have

\[\hat{\boldsymbol{\theta}} \sim \mathcal{N}\Bigl(\boldsymbol{\theta},\ \sigma^2\bigl(\boldsymbol{X}^\top \boldsymbol{X}\bigr)^{-1}\Bigr).\]

For logistic regression, \(\hat{\boldsymbol{\theta}}\) is not linear in \(\boldsymbol{Y}\). However, asymptotically (large \(m\)) it is Gaussian, and its covariance can be estimated by the inverse observed Fisher information:

\[\operatorname{cov}(\hat{\boldsymbol{\theta}}) \approx \bigl(\boldsymbol{X}^\top \boldsymbol{W} \boldsymbol{X}\bigr)^{-1} = (-\boldsymbol{H})^{-1},\]

with \(\boldsymbol{W}\) evaluated at \(\hat{\boldsymbol{\theta}}\).

The standard errors \(\text{s.e.}(\hat{\theta}_j)\) are the square roots of the diagonal elements of \(\operatorname{cov}(\hat{\boldsymbol{\theta}})\).

And the \((1 - \alpha)\) confidence interval for each \(\theta_j\) is \([\hat{\theta}_j - z_{\alpha / 2}\,\text{s.e.}(\hat{\theta}_j),\ \hat{\theta}_j + z_{\alpha / 2}\,\text{s.e.}(\hat{\theta}_j)]\).
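
Continuing the Newton sketch above, the standard errors and Wald intervals could be computed as follows (a sketch assuming `theta` was fitted by the hypothetical `newton_logistic` above):

```python
import numpy as np
from scipy.stats import norm

def wald_intervals(X, theta, alpha=0.05):
    """Wald confidence intervals from the inverse observed Fisher information."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    W = p * (1 - p)
    cov = np.linalg.inv((X * W[:, None]).T @ X)  # (X^T W X)^{-1}
    se = np.sqrt(np.diag(cov))                   # standard errors s.e.(theta_j)
    z = norm.ppf(1 - alpha / 2)                  # e.g. 1.96 for alpha = 0.05
    return theta - z * se, theta + z * se
```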

Note: here is a revisit of this topic.