The OP mistakenly believes that the relationship between these two functions depends on the number of samples (i.e. a single sample vs. all samples). However, the actual difference is simply how we choose our training labels.
In binary classification we may assign the labels $y=\pm1$ or $y=0,1$.
As has already been stated, the logistic function $\sigma(z)$ is a good choice since it has the form of a probability: $\sigma(-z)=1-\sigma(z)$, and $\sigma(z)\in(0,1)$, with $\sigma(z)\to 1$ as $z\to+\infty$ and $\sigma(z)\to 0$ as $z\to-\infty$. If we pick the labels $y=0,1$, we may assign
$$
\begin{aligned}
P(y=1|z) &= \sigma(z) = \frac{1}{1+e^{-z}}\\
P(y=0|z) &= 1-\sigma(z) = \frac{1}{1+e^{z}}
\end{aligned}
$$
which can be written more compactly as $P(y|z)=\sigma(z)^{y}\,(1-\sigma(z))^{1-y}$.
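As a quick sanity check, the minimal sketch below (plain Python, standard library only; `sigmoid` and `bernoulli_prob` are illustrative helper names, not from any library) evaluates the compact form for $y\in\{0,1\}$ at a few arbitrary scores and confirms that it reproduces the two cases above and sums to one.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_prob(y: int, z: float) -> float:
    """Compact form P(y|z) = sigma(z)^y * (1 - sigma(z))^(1 - y) for y in {0, 1}."""
    s = sigmoid(z)
    return s ** y * (1.0 - s) ** (1 - y)


for z in (-3.0, -0.5, 0.0, 2.0):
    p1 = bernoulli_prob(1, z)
    p0 = bernoulli_prob(0, z)
    # y = 1 picks out sigma(z) = 1 / (1 + e^{-z})
    assert math.isclose(p1, 1.0 / (1.0 + math.exp(-z)))
    # y = 0 picks out 1 - sigma(z) = 1 / (1 + e^{z})
    assert math.isclose(p0, 1.0 / (1.0 + math.exp(z)))
    # the two cases form a valid distribution over {0, 1}
    assert math.isclose(p0 + p1, 1.0)
    print(f"z={z:+.1f}  P(y=1|z)={p1:.4f}  P(y=0|z)={p0:.4f}")
```

The exponents $y$ and $1-y$ simply act as a switch: exactly one factor survives for each label.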
Rather than maximizing the likelihood directly, it is easier to maximize the log-likelihood, since the logarithm turns the product over samples into a sum. Maximizing the log-likelihood is the same as minimizing the negative log-likelihood. For $m$ samples $\{x_i,y_i\}$, after taking the natural logarithm and some simplification, we find:
$$
l(z) = -\log\left(\prod_{i=1}^{m} P(y_i|z_i)\right) = -\sum_{i=1}^{m}\log P(y_i|z_i) = \sum_{i=1}^{m}\left(-y_i z_i + \log(1+e^{z_i})\right)
$$
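For reference, the "some simplification" step can be spelled out per sample. It only uses $\log\sigma(z) = -\log(1+e^{-z}) = z - \log(1+e^{z})$ and $\log(1-\sigma(z)) = -\log(1+e^{z})$:

$$
\begin{aligned}
-\log P(y_i|z_i) &= -y_i\log\sigma(z_i) - (1-y_i)\log\big(1-\sigma(z_i)\big)\\
&= -y_i\big(z_i - \log(1+e^{z_i})\big) + (1-y_i)\log(1+e^{z_i})\\
&= -y_i z_i + \log(1+e^{z_i}).
\end{aligned}
$$

Summing over $i$ gives the last expression for $l(z)$.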
The full derivation and additional information can be found in this Jupyter notebook. Alternatively, we could have used the labels $y=\pm1$. It is then straightforward to assign
$$
P(y|z) = \sigma(yz).
$$
Here $P(y=-1|z)=\sigma(-z)$, which is exactly the probability $P(y=0|z)$ assigned to the negative class in the $y=0,1$ convention. Following the same steps as before, in this case we minimize the loss function
$$
L(z) = -\log\left(\prod_{j=1}^{m} P(y_j|z_j)\right) = -\sum_{j=1}^{m}\log P(y_j|z_j) = \sum_{j=1}^{m}\log(1+e^{-y_j z_j})
$$
where the last step uses $-\log\sigma(y_j z_j) = \log\frac{1}{\sigma(y_j z_j)} = \log(1+e^{-y_j z_j})$: the negative sign amounts to taking the reciprocal inside the logarithm. We should not equate the two forms term by term, since $y$ takes different values in each, but they are nevertheless equivalent:
$$
-y_i z_i + \log(1+e^{z_i}) \equiv \log(1+e^{-y_j z_j})
$$
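Consider a single sample, so that $z_i=z_j=z$. The case $y_i=y_j=1$ is easy to verify: both sides equal $\log(1+e^{-z})$, since $-z+\log(1+e^{z})=\log(e^{-z}+1)$. Otherwise, $y_i=0$ on the left-hand side and $y_j=-1$ on the right-hand side, and both sides reduce to $\log(1+e^{z})$.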
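The equivalence can also be checked numerically. The minimal sketch below is plain Python (standard library only); `softplus`, `loss_01`, `loss_pm1` and `loss_pm1_direct` are hypothetical helper names chosen for illustration, and the scores `zs` are arbitrary. It relabels $y\in\{0,1\}$ to $y\in\{-1,+1\}$ via $2y-1$ and compares $l(z)$, $L(z)$, and the unsimplified $-\sum_j\log\sigma(y_j z_j)$.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z) = max(z, 0) + log(1 + e^{-|z|})."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def loss_01(ys, zs):
    """l(z) = sum_i (-y_i z_i + log(1 + e^{z_i})) for labels y_i in {0, 1}."""
    return sum(-y * z + softplus(z) for y, z in zip(ys, zs))


def loss_pm1(ys, zs):
    """L(z) = sum_j log(1 + e^{-y_j z_j}) for labels y_j in {-1, +1}."""
    return sum(softplus(-y * z) for y, z in zip(ys, zs))


def loss_pm1_direct(ys, zs):
    """-sum_j log P(y_j|z_j) with P(y|z) = sigma(y z), before simplification."""
    return -sum(math.log(sigmoid(y * z)) for y, z in zip(ys, zs))


zs = [2.0, -1.0, -0.5, 0.3, 0.0]
ys_01 = [1, 0, 1, 0, 1]
ys_pm1 = [2 * y - 1 for y in ys_01]  # relabel: 1 -> +1, 0 -> -1

# All three numbers should agree up to floating-point rounding
print(loss_01(ys_01, zs), loss_pm1(ys_pm1, zs), loss_pm1_direct(ys_pm1, zs))
```

The stable `softplus` is a design choice rather than a requirement: evaluating $\log(1+e^{z})$ naively overflows for large $z$, while the rewritten form does not.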
While there may be fundamental reasons why we have two different forms (see Why there are two different logistic loss formulation / notations?), one reason to choose the former is practical convenience. In the former, we can use the property $\partial\sigma(z)/\partial z=\sigma(z)(1-\sigma(z))$ to easily calculate $\nabla l(z)$ and $\nabla^2 l(z)$, both of which are needed for convergence analysis (e.g. to determine the convexity of the loss function by computing the Hessian).
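To illustrate this, the minimal sketch below (plain Python, standard library only; helper names are illustrative) takes derivatives with respect to $z$, as in the text. Using $\sigma'(z)=\sigma(z)(1-\sigma(z))$, we get $\partial l/\partial z_i = \sigma(z_i)-y_i$ and a diagonal Hessian with entries $\sigma(z_i)(1-\sigma(z_i))>0$. The gradient is compared against central finite differences at arbitrary example scores.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def loss_01(ys, zs):
    """l(z) = sum_i (-y_i z_i + log(1 + e^{z_i})) for y_i in {0, 1}."""
    return sum(-y * z + softplus(z) for y, z in zip(ys, zs))


def grad_01(ys, zs):
    """dl/dz_i = sigma(z_i) - y_i, since d/dz log(1 + e^z) = sigma(z)."""
    return [sigmoid(z) - y for y, z in zip(ys, zs)]


def hessian_diag_01(zs):
    """d^2 l / dz_i^2 = sigma(z_i) (1 - sigma(z_i)); off-diagonal entries are 0."""
    return [sigmoid(z) * (1.0 - sigmoid(z)) for z in zs]


ys = [1, 0, 1, 0]
zs = [2.0, -1.0, -0.5, 0.3]
h = 1e-5

# Central finite differences of l along each coordinate z_i
fd_grad = []
for i in range(len(zs)):
    zp = zs.copy()
    zp[i] += h
    zm = zs.copy()
    zm[i] -= h
    fd_grad.append((loss_01(ys, zp) - loss_01(ys, zm)) / (2 * h))

print(grad_01(ys, zs))
print(fd_grad)
# Every diagonal entry is positive, so the Hessian w.r.t. z is positive definite
print(all(v > 0 for v in hessian_diag_01(zs)))
```

If, as is common, $z_i = w^\top x_i$ for a linear model (an assumption not stated above), the chain rule gives $\nabla_w l = \sum_i(\sigma(z_i)-y_i)\,x_i$ and $\nabla_w^2 l = \sum_i \sigma(z_i)(1-\sigma(z_i))\,x_i x_i^\top$, which is positive semidefinite, so $l$ is convex in $w$.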