Wow, great question! Let me try to explain the resolution. It'll take three distinct steps.
The first thing to note is that the entropy is concerned with the average number of bits needed per draw, not the maximum number of bits needed.
With your sampling procedure, the maximum number of random bits needed per draw is N bits, but the average number of bits needed is about 2 bits (the mean of a geometric distribution with p=1/2). This is because there is a 1/2 probability that you only need 1 bit (if the first bit turns out to be 1), a 1/4 probability that you only need 2 bits (if the first two bits turn out to be 01), a 1/8 probability that you only need 3 bits (if the first three bits turn out to be 001), and so on.
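Here is a small sketch you can run to check the "about 2 bits on average" claim. It assumes a convention the text above does not spell out: a draw comes out as 1 only if all N fair bits are 0, and we stop reading bits as soon as we see a 1. The names `draw` and `expected_bits` are just illustrative. Because the procedure never reads more than N bits, the exact mean is 2 - 2^(1-N), which is slightly under 2.

```python
import random

def draw(n_bits, rng):
    """One draw; returns (outcome, bits_used).

    Assumed convention: the outcome is 1 only if all n_bits fair bits are 0.
    We stop at the first 1 bit, since the outcome is then already known to be 0.
    """
    for used in range(1, n_bits + 1):
        if rng.getrandbits(1) == 1:
            return 0, used
    return 1, n_bits

def expected_bits(n_bits):
    """Exact mean number of bits read: sum_{k=0}^{N-1} 2^-k = 2 - 2^(1-N)."""
    return 2 - 2 ** (1 - n_bits)

rng = random.Random(0)
N = 10
trials = 100_000
avg = sum(draw(N, rng)[1] for _ in range(trials)) / trials
print(avg, expected_bits(N))  # the two numbers should be close
```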
The second thing to note is that the entropy doesn't really capture the average number of bits needed for a single draw. Instead, the entropy captures the amortized number of bits needed to sample m i.i.d. draws from this distribution. Suppose we need f(m) bits to sample m draws; then the entropy is the limit of f(m)/m as m→∞.
The third thing to note is that, with this distribution, you can sample m i.i.d. draws with fewer bits than needed to repeatedly sample one draw. Suppose you naively decided to draw one sample (takes 2 random bits on average), then draw another sample (using 2 more random bits on average), and so on, until you've repeated this m times. That would require about 2m random bits on average.
But it turns out there's a way to sample m draws using fewer than 2m bits. It's hard to believe, but it's true!
Let me give you the intuition. Suppose you wrote down the result of sampling m draws, where m is really large. Then the result could be specified as an m-bit string. This m-bit string will be mostly 0's, with a few 1's in it: in particular, on average it will have about m/2^N 1's (could be more or less than that, but if m is sufficiently large, usually the number will be close to that). The lengths of the gaps between the 1's are random, but will typically be somewhere vaguely in the vicinity of 2^N (could easily be half that or twice that or even more, but of that order of magnitude). Of course, instead of writing down the entire m-bit string, we could write it down more succinctly by writing down a list of the lengths of the gaps -- that carries all the same information, in a more compressed format. How much more succinct? Well, we'll usually need about N bits to represent the length of each gap; and there will be about m/2^N gaps; so we'll need in total about mN/2^N bits (could be a bit more, could be a bit less, but if m is sufficiently large, it'll usually be close to that). That's a lot shorter than an m-bit string.
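To make the "list of gap lengths" idea concrete, here is a minimal sketch of the encoding and its inverse. The function names `to_gaps` and `from_gaps` are made up for this illustration; the only point is that the gap list (plus the count of trailing 0's) carries exactly the same information as the original string.

```python
def to_gaps(bits):
    """Encode a 0/1 list as (gaps, tail).

    gaps[i] is the number of 0s immediately before the i-th 1;
    tail is the number of 0s after the last 1.
    """
    gaps, run = [], 0
    for b in bits:
        if b:
            gaps.append(run)
            run = 0
        else:
            run += 1
    return gaps, run

def from_gaps(gaps, tail):
    """Rebuild the original 0/1 list from its gap encoding."""
    bits = []
    for g in gaps:
        bits.extend([0] * g)
        bits.append(1)
    bits.extend([0] * tail)
    return bits

bits = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
gaps, tail = to_gaps(bits)
print(gaps, tail)  # [3, 5] 2
assert from_gaps(gaps, tail) == bits
```

With only two 1's in twelve positions the saving is not visible yet; it shows up when 1's are rare, because each gap costs roughly N bits to write down while covering about 2^N positions.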
And if there's a way to write down the string this succinctly, perhaps it won't be too surprising if that means there's a way to generate the string with a number of random bits comparable to the length of the compressed description. In particular, you randomly generate the length of each gap; this is sampling from a geometric distribution with p=1/2^N, and that can be done with roughly N random bits on average (not 2^N). You'll need about m/2^N i.i.d. draws from this geometric distribution, so you'll need in total roughly Nm/2^N random bits. (It could be a small constant factor larger, but not too much larger.) And notice that this is much smaller than 2m bits.
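The gap-based sampler can be sketched as follows. This is only an illustration of the structure, under assumptions: each draw is 1 with probability 2^-N, and the gaps are sampled by inverse transform from a floating-point uniform rather than from individual fair bits, so the sketch does not measure random-bit consumption. It does show that only about m/2^N geometric draws are needed to produce all m outcomes. The name `sample_via_gaps` is hypothetical, and N must be at least 1.

```python
import math
import random

def sample_via_gaps(m, n_bits, rng):
    """Return (bits, n_gaps): m draws with P(1) = 2^-n_bits, built from gap lengths."""
    p = 2.0 ** -n_bits
    bits = [0] * m
    pos = -1
    n_gaps = 0
    while True:
        # Inverse-transform sample of the number of 0s before the next 1:
        # P(gap = k) = (1 - p)^k * p for k = 0, 1, 2, ...
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        gap = int(math.log(u) / math.log1p(-p))
        n_gaps += 1
        pos += gap + 1
        if pos >= m:
            # The last gap runs past the end of the string and places no 1.
            return bits, n_gaps
        bits[pos] = 1

rng = random.Random(0)
m, N = 1_000_000, 8
bits, n_gaps = sample_via_gaps(m, N, rng)
print(sum(bits), n_gaps, m / 2 ** N)  # number of 1s, gaps drawn, expected number of 1s
```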
So, we can sample m i.i.d. draws from your distribution, using just f(m)∼Nm/2^N random bits (roughly). Recall that the entropy is lim_{m→∞} f(m)/m. So this means that you should expect the entropy to be (roughly) N/2^N. That's off by a little bit, because the above calculation was sketchy and crude -- but hopefully it gives you some intuition for why the entropy is what it is, and why everything is consistent and reasonable.
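If you want to see how far off the crude estimate is, here is a quick numerical comparison. It assumes each draw is a coin that comes up 1 with probability p = 2^-N, and compares the exact binary entropy against N/2^N and against the first-order refinement (N + log2 e)/2^N, which comes from expanding -(1-p) log2(1-p) for small p. The variable names are just for this sketch.

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a coin with P(1) = p, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for N in (4, 8, 16):
    p = 2.0 ** -N
    exact = binary_entropy(p)
    crude = N * p                              # the N/2^N estimate
    refined = (N + math.log2(math.e)) * p      # first-order correction
    print(N, exact, crude, refined)
```

The crude estimate comes out a little below the exact value and the refined one a little above it, so the gap is a constant number of bits (about 1.44) per gap rather than a change in order of magnitude.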