What is the intuition behind the formula for conditional probability?


30

The formula for the conditional probability of A happening, given that B has happened, is:

$$P(A\mid B)=\frac{P(A\cap B)}{P(B)}.$$

My textbook explains the intuition behind this in terms of a Venn diagram.

[Image: Venn diagram of events A and B]

Given that B has occurred, the only way for A to occur is for the event to fall in the intersection of A and B.

In that case, wouldn't the probability of P(A | B) simply be equal to the probability of A intersection B, since that's the only way the event could happen? What am I missing?


7
Do you have an intuitive idea of "what" conditional probability is?
Juho Kokkala

4
By conditioning on B (the event that has occurred), you restrict your space of outcomes from Ω (the whole plane) to B only. You forget everything outside B. The probability of event A must be scaled by that of B, because a probability lies between 0 and 1.
Vladislavs Dovgalecs

1
You are missing the fact that the white portion of the Event A circle is not part of the population once you learn that Event B has occurred.
Monty Harder

4
Intuitions are neither exact nor unique, so why ask for the (singular) intuition? A useful intuition is enough, but not every suggestion will be useful to every person.
John Coleman

Answers:


23

A good intuition is: given that B occurred (with or without A), what is the probability of A? That is, we are now in the universe in which B occurred - the full right circle. In that circle, the probability of A is the area of A intersect B divided by the area of the circle.


5
In other words - I tell you that $B$ happened, so we are living in the $B$ circle. In this world, what percentage of outcomes are in $A\cap B$?
MichaelChirico

18

I would think about it like this: I take it for granted that you get the intuition:

Given that B has occurred, the only way for A to occur is for the event to fall in the intersection of A and B.

and I will comment on the second image you posted:

  1. Imagine that the whole white rectangle is your sample space $\Omega$.

    Assigning a probability to a set means that you are, in some sense, measuring that set. It is the same as measuring the area of a rectangle, except that probability is a different kind of measure with specific properties (I will not say more about this).

  2. You know that $P(\Omega)=1$, and this is interpreted as follows:

    $\Omega$ represents all the events that can happen, and something must happen, so the probability that something happens is 100%.

  3. Analogously, the set $A$ has a probability $P(A)$, measured relative to the sample space $\Omega$. Graphically speaking, you see that $A\subset\Omega$, hence the measure of $A$ (its probability $P(A)$) has to be less than $P(\Omega)$. The same reasoning holds for the set $A\cap B$. This set can be measured, and its measure is $P(A\cap B)$.

  4. Now, if you are told that $B$ happened, you have to think as if $B$ were your "new" $\Omega$. If $B$ is your "new" $\Omega$, then you can be 100% sure that everything happens in the set $B$.

    And what does that mean? It means that in the "new" context $P(B\mid B)=1$, and you have to rescale all probability measures, taking into account that they must now be expressed relative to the "new" sample space $B$. It is a simple proportion.

    Your intuition is almost right when it says:

the probability of P(A | B) will simply be equal to the probability of A intersection B.

and the "almost" is due to the fact that your sample space has now changed (it is $B$ now) and you want to rescale $P(A\cap B)$ accordingly.

  5. $P(A\mid B)$ is your $P(A\cap B)$ in the new world where the sample space is now $B$. In words you would say it like this (and please try to visualize it with the picture and its sets):

    In the new world, the ratio between the measure of $B$ and the measure of $A\cap B$ must be the same as the ratio between the measure of $\Omega$ and the measure of $A\mid B$.

  6. Finally, translate this into mathematical language (a simple proportion):

$$P(B):P(A\cap B)=P(\Omega):P(A\mid B)$$

and since $P(\Omega)=1$:

$$P(A\mid B)=P(A\cap B):P(B)$$
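
To make the proportion concrete, here is a small worked instance. The values $P(B)=0.4$ and $P(A\cap B)=0.1$ are hypothetical, chosen only for illustration; they do not come from the picture.

```latex
% Hypothetical values, assumed for illustration only:
% P(B) = 0.4 and P(A \cap B) = 0.1, with P(\Omega) = 1.
\begin{align*}
P(B) : P(A \cap B) &= P(\Omega) : P(A \mid B) \\
0.4 : 0.1 &= 1 : P(A \mid B) \\
P(A \mid B) &= \frac{0.1}{0.4} = 0.25
\end{align*}
% The same rescaling applied to B itself gives certainty in the new world:
% P(B \mid B) = 0.4 / 0.4 = 1.
```

Dividing by $P(B)$ stretches $B$ to measure 1, and $A\cap B$ is stretched by the same factor, from 0.1 to 0.25.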

5

You will easily see the intuition by thinking about the following problem.

Suppose you have 10 balls: 6 Black and 4 Red. Of the Black balls 3 are Awesome, and of the Red balls only 1 is Awesome. How likely is it that a Black ball is also Awesome?

The answer is very easy: it's 50%, because we have 3 Awesome Black balls out of a total of 6 Black balls.

This is how you map probabilities to our problem:

  • the 3 balls that are Black AND Awesome correspond to $P(A\cap B)$
  • the 6 balls that are Black correspond to $P(B)$
  • the probability that a ball is Awesome when we know that it is Black: $P(A\mid B)$
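
The mapping above can be checked with a short Python sketch. The variable names and the list representation of the balls are assumptions made for illustration; only the counts (6 Black, 4 Red, 3 Awesome Black, 1 Awesome Red) come from the example.

```python
from fractions import Fraction

# The 10 balls from the example, as (colour, is_awesome) pairs.
balls = (
    [("black", True)] * 3 + [("black", False)] * 3
    + [("red", True)] * 1 + [("red", False)] * 3
)

n_total = len(balls)
n_black = sum(1 for colour, _ in balls if colour == "black")
n_black_awesome = sum(
    1 for colour, awesome in balls if colour == "black" and awesome
)

# Counting inside the restricted world of Black balls: 3 out of 6.
by_counting = Fraction(n_black_awesome, n_black)

# The same answer from probabilities over all 10 balls: P(A and B) / P(B).
p_b = Fraction(n_black, n_total)
p_a_and_b = Fraction(n_black_awesome, n_total)
by_formula = p_a_and_b / p_b

print(by_counting, by_formula)
```

Both routes agree because dividing 3/10 by 6/10 cancels the common denominator of 10, leaving the count ratio 3/6.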

1
Would it not make more sense to write n(B)=6 rather than P(B)=6?
Silverfish

@Silverfish It would be more accurate, but I was after the intuition in this case
Aksakal

4

For a basic intuition of the conditional probability formula, I always like using a two way table. Let's say there are 150 students in a yeargroup, of whom 80 are female and 70 male, each of whom must study exactly one language course. The two-way table of students taking different courses is:

        | French   German   Italian  | Total
-------- --------------------------- -------
Male    |     30       20        20  |    70
Female  |     25       15        40  |    80
-------- --------------------------- -------
Total   |     55       35        60  |   150

Given that a student takes the Italian course, what is the probability they are female? Well the Italian course has 60 students, of whom 40 are females studying Italian, so the probability must be:

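$$P(F\mid\text{Italian})=\frac{n(F\cap\text{Italian})}{n(\text{Italian})}=\frac{40}{60}=\frac{2}{3}$$

where $n(A)$ is the cardinality of the set $A$, i.e. the number of items it contains. Note that we needed to use $n(F\cap\text{Italian})$ in the numerator and not just $n(F)$, because the latter would have included all 80 females, including the other 40 who do not study Italian.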

But if the question were flipped around, what is the probability that a student takes the Italian course, given that they are female? Then 40 of the 80 female students take the Italian course, so we have:

$$P(\text{Italian}\mid F)=\frac{n(\text{Italian}\cap F)}{n(F)}=\frac{40}{80}=\frac{1}{2}$$

I hope this provides intuition for why

$$P(A\mid B)=\frac{n(A\cap B)}{n(B)}$$

Understanding why the fraction can be written with probabilities instead of cardinalities is a matter of equivalent fractions. For example, let us return to the probability a student is female given that they are studying Italian. There are 150 students in total, so the probability that a student is female and studies Italian is 40/150 (this is a "joint" probability) and the probability a student studies Italian is 60/150 (this is a "marginal" probability). Note that dividing the joint probability by the marginal probability gives:

$$\frac{P(F\cap\text{Italian})}{P(\text{Italian})}=\frac{40/150}{60/150}=\frac{40}{60}=\frac{n(F\cap\text{Italian})}{n(\text{Italian})}=P(F\mid\text{Italian})$$

(To see that the fractions are equivalent, multiplying numerator and denominator by 150 removes the "/150" in each.)

More generally, if your sample space $\Omega$ has cardinality $n(\Omega)$ (in this example the cardinality was 150), we find that

$$P(A\mid B)=\frac{n(A\cap B)}{n(B)}=\frac{n(A\cap B)/n(\Omega)}{n(B)/n(\Omega)}=\frac{P(A\cap B)}{P(B)}$$
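
The two-way table lends itself to a quick check in Python. This is a minimal sketch: the helper names `n` and `p` are hypothetical, chosen to mirror the notation $n(\cdot)$ and $P(\cdot)$ used above, and the counts are copied from the table.

```python
from fractions import Fraction

# Counts from the two-way table, keyed by (sex, course).
counts = {
    ("Male", "French"): 30, ("Male", "German"): 20, ("Male", "Italian"): 20,
    ("Female", "French"): 25, ("Female", "German"): 15, ("Female", "Italian"): 40,
}
n_omega = sum(counts.values())  # size of the sample space


def n(sex=None, course=None):
    """Number of students matching the given sex and/or course."""
    return sum(
        c for (s, k), c in counts.items()
        if (sex is None or s == sex) and (course is None or k == course)
    )


def p(sex=None, course=None):
    """Probability of the same event, as a fraction of all students."""
    return Fraction(n(sex, course), n_omega)


# Joint probability divided by marginal probability.
p_f_given_italian = p("Female", "Italian") / p(course="Italian")
p_italian_given_f = p("Female", "Italian") / p(sex="Female")

print(p_f_given_italian, p_italian_given_f)
```

The numerator is the same in both conditionals; only the denominator, i.e. the group we restrict attention to, changes.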

3

I would reverse the logic. The probability that both A and B happen is either:

  1. The probability that B happened, times the probability that A happened given that B did.
  2. The same, but with the roles of A and B reversed.

This will give you

$$p(A\cap B)=p(B)\,p(A\mid B)$$

If you're looking for what is wrong with your suggestion: while it's true that the probability of A given B is contained in the probability of the intersection, the space you're rolling the dice in is smaller than your original probability space. You know for sure that you're "in" B, hence you divide by the size of the new space.
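
As a small worked check of the two factorizations, consider one roll of a fair die. This example is an illustration only and is not taken from the answer: let $A$ be "the roll is even" and $B$ be "the roll is greater than 3".

```latex
% Illustrative example (an assumption, not part of the answer): one fair die,
% A = {2, 4, 6}, B = {4, 5, 6}, so A \cap B = {4, 6}.
\begin{align*}
p(A \cap B) &= p(\{4, 6\}) = \tfrac{2}{6} = \tfrac{1}{3} \\
p(B)\,p(A \mid B) &= \tfrac{1}{2} \cdot \tfrac{2}{3} = \tfrac{1}{3} \\
p(A)\,p(B \mid A) &= \tfrac{1}{2} \cdot \tfrac{2}{3} = \tfrac{1}{3}
\end{align*}
```

Here $p(A\mid B)=\tfrac{2}{3}$ because two of the three outcomes in $B$ are even, which is larger than $p(A\cap B)=\tfrac{1}{3}$ precisely because the space has shrunk from six outcomes to three.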


2

The Venn diagram doesn't represent probability; it represents the measure of subsets of the event space. A probability is the ratio between two measures: the probability of X is the size of "everything that constitutes X" divided by the size of "all the events being considered". Any time you're calculating a probability, you need both a "success space" and a "population space". You can't calculate a probability based just on "how big" the success space is. For instance, the probability of rolling a seven with two dice is the number of ways of rolling a seven divided by the total number of ways of rolling two dice. Just knowing the number of ways of rolling a seven is not enough to calculate the probability. P(A|B) is the ratio of the measure of the "both A and B happen" space to the measure of the "B happens" space. That's what the "|" means: it means "make what comes after this the population space".
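
The "success space over population space" reading can be sketched by enumerating two dice in Python. The helper `prob` and the conditional event used here (a total of at least 10, given that the first die shows 6) are assumptions added for illustration; only the seven-with-two-dice example is from the text above.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of rolling two fair dice.
rolls = list(product(range(1, 7), repeat=2))


def prob(success, population=lambda r: True):
    """Size of the success space divided by the size of the population space."""
    pop = [r for r in rolls if population(r)]
    hits = [r for r in pop if success(r)]
    return Fraction(len(hits), len(pop))


# Population space: all 36 rolls.
p_seven = prob(lambda r: sum(r) == 7)
p_high = prob(lambda r: sum(r) >= 10)

# "|" swaps the population space: only rolls whose first die is a 6.
p_high_given_six = prob(lambda r: sum(r) >= 10, population=lambda r: r[0] == 6)

print(p_seven, p_high, p_high_given_six)
```

The success condition is identical in the last two calls; the probability changes only because the population space shrinks from 36 rolls to 6.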


2

I think the best way to think about this is drawing step-by-step paths.

Let's describe Event B as rolling a 4 on a fair die - this can be easily shown to have probability $\frac{1}{6}$. Now let's describe Event A as drawing an Ace from a standard 52-card deck of cards - this can be easily shown to have probability $\frac{1}{13}$.

Let's now run an experiment where we roll a die and then pick a card. So P(A|B) would be the probability that we draw an Ace, given that we have already rolled a 4. If you look at the image, this would be the $\frac{1}{6}$ path (go up) and then the $\frac{1}{13}$ path (go up again).

Intuitively, the total probability space is what we have already been given: rolling the 4. We can ignore the $\frac{1}{13}$ and $\frac{12}{13}$ branches that the initial down path leads to, since it was GIVEN that we rolled a 4. By the multiplication rule, our total space is then $\left(\frac{1}{6}\times\frac{1}{13}\right)+\left(\frac{1}{6}\times\frac{12}{13}\right)$.

Now what's the probability we drew an Ace, GIVEN that we rolled a 4? The answer by using the path is $\left(\frac{1}{6}\times\frac{1}{13}\right)$, which we then need to divide by the total space. So we get

$$P(A\mid B)=\frac{\frac{1}{6}\times\frac{1}{13}}{\left(\frac{1}{6}\times\frac{1}{13}\right)+\left(\frac{1}{6}\times\frac{12}{13}\right)}.$$

[Image: probability tree for the die roll followed by the card draw]
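
The path arithmetic can be evaluated exactly with Python's `fractions` module. This is a minimal sketch; the variable names are chosen here for illustration.

```python
from fractions import Fraction

p_four = Fraction(1, 6)       # roll a 4 (the "up" branch of the die)
p_ace = Fraction(1, 13)       # draw an Ace
p_not_ace = Fraction(12, 13)  # draw anything else

# The two paths that start with rolling a 4.
path_four_ace = p_four * p_ace
path_four_not_ace = p_four * p_not_ace

# Total space once we are told a 4 was rolled; this sums back to P(B).
total_space = path_four_ace + path_four_not_ace

p_ace_given_four = path_four_ace / total_space

print(total_space, p_ace_given_four)
```

The result equals the unconditional probability of an Ace, which reflects that the die and the deck are independent in this setup.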


2
I was wondering what the downvote was for, because probability trees can be very instructive. Perhaps the concern is that using independent events for the illustration misses the very point of conditional probability, which is that the probability distribution can change depending on the conditioning event. Using a less-superficial illustration may help.
whuber

1

Think of it in terms of counts. Marginal probability is how many times A occurred divided by the sample size. Joint probability of A and B is how many times A occurred together with B divided by the sample size. Conditional probability of A given B is how many times A occurred together with B divided by how many times B occurred, i.e. only the A's "within" B's.

You can find a nice visual illustration on this blog, which shows it using Lego blocks.


1

At the time of writing there are about 10 answers, which all seem to miss the most important point: you are essentially right.

In that case, wouldn't the probability of P(A | B) simply be equal to the probability of A intersection B, since that's the only way the event could happen?

This is definitely true. This explains why the quantity we use to define $P(A\mid B)$ is actually $P(A\cap B)$ rescaled.

What am I missing?

You are missing that the probability of B being satisfied given that B is satisfied should be 1, since this is a certain event, and not $P(B\cap B)=P(B)$, which can well be less than 1. Dividing by $P(B)$ makes the conditional probability of B given B equal to 1, as expected. In fact this does even more: it makes the map $A\mapsto P(A\mid B)$ a probability, so a conditional probability is actually a probability.
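
The claim that $A\mapsto P(A\mid B)$ is itself a probability can be checked directly from the definition. The sketch below assumes $P(B)>0$ and takes $A_1$, $A_2$ to be any two disjoint events; these symbols are introduced here for illustration.

```latex
% Assumes P(B) > 0; A_1 and A_2 are arbitrary disjoint events.
\begin{align*}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)} \ge 0 \\
P(\Omega \mid B) &= \frac{P(\Omega \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1 \\
P(A_1 \cup A_2 \mid B)
  &= \frac{P\big((A_1 \cap B) \cup (A_2 \cap B)\big)}{P(B)}
   = \frac{P(A_1 \cap B) + P(A_2 \cap B)}{P(B)}
   = P(A_1 \mid B) + P(A_2 \mid B)
\end{align*}
```

Without the division by $P(B)$, the second line would give $P(B)$ rather than 1, so the unscaled quantity $P(A\cap B)$ would not be a probability on the restricted space.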


0

I feel it is more intuitive when we have concrete data to estimate the probabilities.

Let's use the mtcars data as an example. The data looks like this (we only use the number of cylinders and the transmission type):

> mtcars[,c("am","cyl")]
                    am cyl
Mazda RX4            1   6
Mazda RX4 Wag        1   6
Datsun 710           1   4
Hornet 4 Drive       0   6
...  
...
Ford Pantera L       1   8
Ferrari Dino         1   6
Maserati Bora        1   8
Volvo 142E           1   4

We can calculate the joint distribution on two variables by doing a cross table:

> prop.table(table(mtcars$cyl,mtcars$am))

          0       1
  4 0.09375 0.25000
  6 0.12500 0.09375
  8 0.37500 0.06250

The joint probability means we consider two variables at the same time. For example, we ask what fraction of all cars have 4 cylinders and a manual transmission.

Now we come to conditional probability. I find the most intuitive way to explain conditional probability is in terms of filtering the data.

Suppose we want to get P(am=1|cyl=4); we make the following estimate:

> cyl_4_cars=subset(mtcars, cyl==4)
> prop.table(table(cyl_4_cars$am))

        0         1 
0.2727273 0.7272727 

This means we only care about cars that have 4 cylinders, so we filter the data on that. After filtering, we check what fraction of them have a manual transmission.

You can compare this conditional probability with the joint probability I mentioned earlier to feel the difference.
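
To connect the filtered estimate back to the formula, the same number can be obtained by dividing the joint proportion by the marginal one. The sketch below is in Python rather than R and hard-codes the cell counts; those counts are an assumption recovered from the joint table above by multiplying each proportion by the 32 rows of mtcars.

```python
from fractions import Fraction

# Cell counts as counts[cyl][am], recovered from the joint table above
# (each proportion times 32 rows); treat these as an assumption of this sketch.
counts = {4: {0: 3, 1: 8}, 6: {0: 4, 1: 3}, 8: {0: 12, 1: 2}}
n_total = sum(sum(row.values()) for row in counts.values())

# Joint distribution over (cyl, am), matching the prop.table output.
joint = {
    (cyl, am): Fraction(c, n_total)
    for cyl, row in counts.items()
    for am, c in row.items()
}

# Marginal probability of cyl == 4: sum the joint over both transmission types.
p_cyl4 = joint[(4, 0)] + joint[(4, 1)]

# Conditional probability as joint / marginal.
p_am1_given_cyl4 = joint[(4, 1)] / p_cyl4

print(float(joint[(4, 1)]), float(p_cyl4), float(p_am1_given_cyl4))
```

Filtering the rows and then taking proportions is the same operation as dividing the joint entry by the marginal: the total row count cancels in the ratio.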


0

If A were a superset of B the probability that A happens is always 1 given that B happened, i.e. P(A|B) = 1. However, B itself may have a probability much smaller than 1.

Consider the following example:

  • given x is a natural number in 1..100,
  • A is 'x is an even number'
  • B is 'x is divisible by 10'

we then have:

  • P(A) is 0.5
  • P(B) is 0.1

If we know that x is divisible by 10 (i.e. x is in B) we know that it is also an even number (i.e. x is in A), so P(A|B) = 1.

From the definition of conditional probability we have:

$$P(A\mid B)=\frac{P(A\cap B)}{P(B)}$$

Note that in our (special) case $P(A\cap B)$, i.e. the probability that x is both an even number and a number divisible by 10, is equal to the probability that x is a number divisible by 10. Therefore we have $P(A\cap B)=P(B)$, and plugging this back into the formula we get $P(A\mid B)=P(B)/P(B)=1$.


For a non-degenerate example consider e.g. A is 'x is divisible by 7' and B is 'x is divisible by 3'. Then P(A|B) is equivalent to 'given that we know that x is divisible by 3 what is the probability that it is (also) divisible by 7 ?'. Or equivalently 'What fraction of the numbers 3, 6, ..., 99 are divisible by 7' ?
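
Both cases can be counted directly. The following Python sketch enumerates the numbers 1 to 100; the variable names are chosen here for illustration.

```python
from fractions import Fraction

numbers = range(1, 101)

# Degenerate case: A = "even", B = "divisible by 10"; B is a subset of A.
tens = [x for x in numbers if x % 10 == 0]
even_tens = [x for x in tens if x % 2 == 0]
p_even_given_tens = Fraction(len(even_tens), len(tens))

# Non-degenerate case: A = "divisible by 7", B = "divisible by 3".
threes = [x for x in numbers if x % 3 == 0]          # 3, 6, ..., 99
sevens_in_threes = [x for x in threes if x % 7 == 0]  # multiples of 21
p_seven_given_three = Fraction(len(sevens_in_threes), len(threes))

# For comparison, the unconditional probability of "divisible by 7".
p_seven = Fraction(sum(1 for x in numbers if x % 7 == 0), len(numbers))

print(p_even_given_tens, p_seven_given_three, p_seven)
```

In the second case the population is the 33 multiples of 3, and only the multiples of 21 among them count as successes.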


0

I think your initial statement may be a misunderstanding.

You wrote:

The formula for conditional probability of A happening, once B has happened is:

From your phrasing, it may sound as if there are 2 events "First B happened, and then we want to calculate the probability that A will happen".

This is not the case. (The following is valid whether there was a misunderstanding or not).

We have just 1 event, which is described by one of 4 possibilities:

  1. neither A nor B;

  2. just A, not B;

  3. just B, not A;

  4. both A and B.

Putting some example numbers on it, let's say

$P(A)=0.5$, $P(B)=0.5$, and A and B are independent.

It follows that

$P(A\text{ and }B)=0.25$ and $P(\text{neither }A\text{ nor }B)=0.25$.

Initially (with no knowledge of the event), we knew $P(A\cap B)=0.25$.

But once we know that B has happened, we are in a different space. $P(A\cap B)$ is half of $P(B)$, so the probability of A given B, $P(A\mid B)$, is 0.5. It is not 0.25, knowing that B has happened.


0

The conditional probability is NOT equal to the probability of the intersection. Here is an intuitive answer:

1) $P(B\mid A)$: "We know that A happened. What is the probability that B will happen?"

2) $P(A\cap B)$: "We don't know if A or B did happen. What is the probability that both will happen?"

The difference is that in the first one, we have extra information (we know that A occurs first). In the second one we do not know anything.

Starting out with the probability of the second one, we can deduce the probability of the first one.

The event that both A and B will occur can happen in two ways:

1) The probability of A AND the probability of B given that A happened.

2) The probability of B AND the probability of A given that B happened.

It turns out that both situations are equally likely to happen. (I cannot myself find out the intuitive reason.) Thus we have to weight both scenarios with 0.5:

$$P(A\cap B)=\tfrac{1}{2}P\big(A\cap(B\mid A)\big)+\tfrac{1}{2}P\big(B\cap(A\mid B)\big)$$

Now use that $A$ and $B\mid A$ are independent and remember that both scenarios are equally likely to happen.

$$P(A\cap B)=P(A)\,P(B\mid A)$$

Tadaaa... now isolate the probability of the conditioning!

btw. I would love if someone could explain why scenario 1 and 2 are equal. The key lies in there imo.

Licensed under cc by-sa 3.0 with attribution required.