Confidence interval for a normal mean. Suppose we have a random sample $X_1, X_2, \dots, X_n$ from a normal population. Let's look at the confidence interval for the normal mean $\mu$ in terms of hypothesis testing. If $\sigma$ is known, then a two-sided test of $H_0: \mu = \mu_0$ against $H_a: \mu \ne \mu_0$ is based on the statistic $Z = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}}.$ When $H_0$ is true, $Z \sim \mathsf{Norm}(0,1),$ so we reject $H_0$ at the 5% level if $|Z| \ge 1.96.$
Then, 'inverting the test', we say that a 95% CI for $\mu$ consists of the values $\mu_0$ that do not lead to rejection, the 'believable' values of $\mu.$ The CI is of the form $\bar X \pm 1.96\,\sigma/\sqrt{n},$ where $\pm 1.96$ cut probability 0.025 from the upper and lower tails, respectively, of the standard normal distribution.
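The known-$\sigma$ interval above can be sketched in Python; here the sample, its size, and the seed are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)   # arbitrary seed, for reproducibility
sigma = 3.0                         # population SD assumed known
n = 25
x = rng.normal(loc=50, scale=sigma, size=n)  # simulated normal sample

xbar = x.mean()
z_star = stats.norm.ppf(0.975)      # cuts 0.025 from the upper tail; about 1.96
half_width = z_star * sigma / np.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
```

Note that `stats.norm.ppf(0.975)` returns the 0.975 quantile of the standard normal, which is where the familiar 1.96 comes from.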
If the population standard deviation $\sigma$ is unknown and estimated by the sample standard deviation $S,$ then we use the statistic $T = \frac{\bar X - \mu_0}{S/\sqrt{n}}.$ Before the early 1900s, people supposed that $T$ is approximately standard normal for $n$ large enough, using $S$ as a substitute for the unknown $\sigma.$ There was debate about how large counts as large enough.
Eventually, it was shown that $T \sim \mathsf{T}(\nu = n-1),$ Student's t distribution with $n-1$ degrees of freedom. Accordingly, when $\sigma$ is not known, we use $\bar X \pm t^* S/\sqrt{n},$ where $\pm t^*$ cut probability 0.025 from the upper and lower tails, respectively, of $\mathsf{T}(n-1).$
[Note: For $n > 30,$ $t^* \approx 2 \approx 1.96$ for 95% CIs. Thus the century-old idea that you can "get by" just substituting $S$ for $\sigma$ when $\sigma$ is unknown and $n > 30$ has persisted, even in some recently published books.]
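A quick sketch of the unknown-$\sigma$ interval, again with invented data; it also checks how close $t^*$ is to 1.96 once $n$ exceeds 30:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)      # arbitrary seed
x = rng.normal(loc=50, scale=3.0, size=12)  # sigma treated as unknown
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)   # ddof=1 gives the sample SD S

t_star = stats.t.ppf(0.975, df=n - 1)  # cuts 0.025 from each tail of T(n-1)
ci = (xbar - t_star * s / np.sqrt(n), xbar + t_star * s / np.sqrt(n))

# For df = 30 the t quantile is already close to the normal 1.96
print(stats.t.ppf(0.975, df=30))    # about 2.042
```

The printout illustrates the note above: for $n>30$ the t critical value is roughly 2, which is why substituting $S$ for $\sigma$ "almost works" there.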
Confidence interval for a binomial proportion. In the binomial case, suppose we have observed $X$ successes in a binomial experiment with $n$ independent trials. Then we use $\hat p = X/n$ as an estimate of the binomial success probability $p.$
In order to test $H_0: p = p_0$ vs. $H_a: p \ne p_0,$ we use the statistic $Z = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}.$ Under $H_0,$ we know that $Z$ is approximately $\mathsf{Norm}(0,1).$ So we reject $H_0$ at the 5% level if $|Z| \ge 1.96.$
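The one-proportion z test can be written in a few lines; the function name and the example counts ($X = 30$ of $n = 100$, $p_0 = 0.5$) are mine, chosen for illustration:

```python
import math
from scipy import stats

def prop_z_test(x, n, p0):
    """Two-sided one-proportion z test of H0: p = p0 (a sketch)."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * stats.norm.sf(abs(z))   # two-sided normal p-value
    return z, p_value

z, p = prop_z_test(x=30, n=100, p0=0.5)
# here |z| = 4.0, so H0: p = 0.5 is rejected at the 5% level
```

Note that the null value $p_0$, not $\hat p$, goes under the square root: under $H_0$ the standard error of $\hat p$ is fully known.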
If we seek to invert this test to get a 95% CI for $p,$ we run into some difficulties. The 'easy' way to invert the test is to start by writing $\hat p \pm 1.96\sqrt{\frac{p(1-p)}{n}}.$ But this is useless because the value of $p$ under the square root is unknown. The traditional Wald CI assumes that, for sufficiently large $n,$ it is OK to substitute $\hat p$ for the unknown $p.$ Thus the Wald CI is of the form $\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}}.$ [Unfortunately, the Wald interval works well only if the number of trials $n$ is at least several hundred.]
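A minimal sketch of the Wald interval, reusing the illustrative counts $X = 30$, $n = 100$:

```python
import math

def wald_ci(x, n):
    """95% Wald CI: substitute p-hat for the unknown p under the root."""
    z = 1.959964                    # standard normal 0.975 quantile
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_ci(30, 100)           # roughly (0.210, 0.390)
```

Even at $n = 100$ this interval's actual coverage can drift noticeably from 95%, which is the difficulty flagged in brackets above.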
More carefully, one can solve a somewhat messy quadratic inequality to 'invert the test'. The result is the Wilson interval. (See Wikipedia.) For a 95% confidence interval, a somewhat simplified version of this result comes from defining $\check n = n + 4$ and $\check p = (X+2)/\check n$ and then computing the interval as $\check p \pm 1.96\sqrt{\frac{\check p(1-\check p)}{\check n}}.$
This style of binomial confidence interval is known as the Agresti-Coull interval; it has been widely advocated in elementary textbooks for roughly the last 20 years.
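The "add 2 successes and 2 failures" recipe above translates directly to code; this is the simplified 95%-only version, again with illustrative counts:

```python
import math

def agresti_coull_ci(x, n):
    """Simplified 95% Agresti-Coull CI: add 2 successes and 2 failures,
    then apply the Wald formula to the adjusted counts."""
    z = 1.959964                    # standard normal 0.975 quantile
    n_check = n + 4                 # n-check in the text
    p_check = (x + 2) / n_check     # p-check in the text
    half = z * math.sqrt(p_check * (1 - p_check) / n_check)
    return p_check - half, p_check + half

lo, hi = agresti_coull_ci(30, 100)  # roughly (0.219, 0.396)
```

Compared with the Wald interval, the center is pulled slightly toward 1/2, which is what repairs the coverage for small and moderate $n$.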
In summary, one way to look at your question is that CIs for normal μ and binomial p can be viewed as inversions of tests.
(a) The t distribution provides an exact solution to the problem of needing to use S for σ when σ is unknown.
(b) Using p^ for p requires some care because the mean and variance of p^ both depend on p. The Agresti-Coull CI provides one serviceable way to get CIs for binomial p that are reasonably accurate even for moderately small n.