Should we teach Bayesian or frequentist statistics first?



I am helping my students, who are currently in high school, to understand statistics, and I am thinking of starting with some simple examples without paying much attention to the theory.

My goal is to give them, right from the start, the most intuitive yet instrumentally constructive approach to learning statistics, in order to increase their interest in pursuing statistics and quantitative learning further.

Before I start, though, I have a specific question with very general implications:

Should we start teaching statistics using a Bayesian or a frequentist framework?

Looking around, I have seen that a common approach is to start with a brief introduction to frequentist statistics, followed by an in-depth discussion of Bayesian statistics (e.g. Stangl).


Your question is difficult to answer without more context. What is it you'd like to achieve?
Glen_b -Reinstate Monica

It's bad parenting to teach kids Bayesian stats, akin to pouring them vodka or lighting up cigars. Note that both vodka and cigars are fine for adults as long as they discover them on their own
Aksakal

@Aksakal I was actually planning to teach them Bayes theorem while sipping vodka and puffing havanas... ;-)
Joe_74

That's a good way to keep kids away from Bayesian stats for a few years. Just tell them these are applications of Bayes' theorem. The theorem works fine in frequentist stats
Aksakal

Nate Silver's book "The Signal and the Noise" makes a case for teaching young people Bayesian statistics.
Lloyd Christmas

Answers:



Both Bayesian statistics and frequentist statistics are based on probability theory, but I'd say that the former relies more heavily on the theory from the start. On the other hand, surely the concept of a credible interval is more intuitive than that of a confidence interval, once the student has a good understanding of the concept of probability. So, whatever you choose, I advocate first of all strengthening their grasp of probability concepts, with all those examples based on dice, cards, roulette, the Monty Hall paradox, etc.
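The Monty Hall paradox mentioned above is a good example of a probability question that students can check for themselves. The following is a minimal Python sketch (the function names and the trial count are illustrative choices, not taken from any particular course) that simulates the standard game, in which the host always opens a losing door that the player did not pick:

```python
import random


def monty_hall_trial(switch: bool, rng: random.Random) -> bool:
    """Play one round of the Monty Hall game; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    host = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != host)
    return pick == car


def win_rate(switch: bool, n_trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability of winning with a given strategy by repeated play."""
    rng = random.Random(seed)
    wins = sum(monty_hall_trial(switch, rng) for _ in range(n_trials))
    return wins / n_trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(switch=False):.3f}")  # close to 1/3
    print(f"switch: {win_rate(switch=True):.3f}")   # close to 2/3
```

Seeing the switching strategy win about two-thirds of the time tends to be more convincing to students than the algebra alone.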

I would choose one approach or the other based on a purely utilitarian approach: are they more likely to study frequentist or Bayesian statistics at school? In my country, they would definitely learn the frequentist framework first (and last: never heard of high school students being taught Bayesian stats, the only chance is either at university or afterwards, by self-study). Maybe in yours it's different. Keep in mind that if they need to deal with NHST (Null Hypothesis Significance Testing), that more naturally arises in the context of frequentist statistics, IMO. Of course you can test hypotheses also in the Bayesian framework, but there are many leading Bayesian statisticians who advocate not using NHST at all, either under the frequentist or the Bayesian framework (for example, Andrew Gelman from Columbia University).

Finally, I don't know about the level of high school students in your country, but in mine it would be really difficult for a student to successfully assimilate (the basics of) probability theory and integral calculus at the same time. So, if you decide to go with Bayesian statistics, I'd really avoid the continuous random variable case, and stick to discrete random variables.
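To illustrate the discrete case, here is a minimal sketch of Bayes' theorem over a finite set of hypotheses, using exact fractions so that no calculus is needed. The three candidate coin biases, the uniform prior, and the data (3 heads in 4 tosses) are made-up values chosen only for illustration:

```python
from fractions import Fraction


def posterior(prior: dict, likelihood) -> dict:
    """Bayes' theorem on a discrete hypothesis space: posterior is prior x likelihood, normalised."""
    unnormalised = {h: prior[h] * likelihood(h) for h in prior}
    evidence = sum(unnormalised.values())  # P(data), the normalising constant
    return {h: w / evidence for h, w in unnormalised.items()}


# Hypothetical example: a coin whose probability of heads is one of three values.
hypotheses = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {p: Fraction(1, 3) for p in hypotheses}  # equal prior credibility

# Made-up data: 3 heads and 1 tail, in some fixed order.
heads, tails = 3, 1
post = posterior(prior, lambda p: p**heads * (1 - p)**tails)

for p, prob in post.items():
    print(f"P(p_heads = {p} | data) = {prob}")
```

With a uniform prior the posterior is just the normalised likelihood: 3/46, 8/23 and 27/46 for the three hypotheses, so credibility has been reallocated towards the heads-biased coin.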


I was under the impression (from reading his blog) that Andrew Gelman would advocate against frequentist NHST as much as against Bayesian.
psarka

@psarka yes, sure - I never said the contrary.
DeltaIV

"In my country, they would definitely learn the frequentist framework first" — This (or omitting discussion of Bayesianism entirely) is the traditional approach worldwide.
Kodiologist

@Kodiologist I suspected as much. At least, there may be some educational systems where, after the frequentist framework, also the Bayesian one is introduced in high school. But that's not the case around here.
DeltaIV


Bayesian and frequentist ask different questions. Bayesian asks what parameter values are credible, given the observed data. Frequentist asks about the probability of imaginary simulated data if some hypothetical parameter values were true. Frequentist decisions are motivated by controlling errors; Bayesian decisions are motivated by uncertainty in model descriptions.

So which should you teach first? Well, if one or the other of those questions is what you want to ask first, that's your answer. But in terms of approachability and pedagogy, I think that Bayesian is much easier to understand and is far more intuitive. The basic idea of Bayesian analysis is re-allocation of credibility across possibilities, just like Sherlock Holmes famously said, and which millions of readers have intuitively understood. But the basic idea of frequentist analysis is very challenging: The space of all possible sets of data that might have happened if a particular hypothesis were true, and the proportion of those imaginary data sets that have a summary statistic as or more extreme than the summary statistic that was actually observed.
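The frequentist idea described above (imaginary data sets generated under a hypothesis, and the proportion of them at least as extreme as what was observed) can be made concrete with a short simulation. This is a minimal Python sketch under assumed, made-up data: 8 heads in 10 tosses, a fair-coin null hypothesis, and "as or more extreme" taken one-sided as "at least 8 heads". The function names are illustrative only:

```python
import math
import random


def exact_p_value(observed_heads: int, n: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= observed_heads) when X ~ Binomial(n, p_null)."""
    return sum(math.comb(n, k) * p_null**k * (1 - p_null)**(n - k)
               for k in range(observed_heads, n + 1))


def simulated_p_value(observed_heads: int, n: int, p_null: float = 0.5,
                      n_datasets: int = 100_000, seed: int = 1) -> float:
    """Generate imaginary data sets under the null hypothesis and return the fraction
    whose statistic (number of heads) is as or more extreme than the observed one."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_datasets):
        heads = sum(rng.random() < p_null for _ in range(n))
        if heads >= observed_heads:
            extreme += 1
    return extreme / n_datasets


print(exact_p_value(8, 10))      # 0.0546875 (= 56/1024)
print(simulated_p_value(8, 10))  # close to the exact value
```

Each imaginary data set here is one run of 10 fair-coin tosses; the p-value is the fraction of those runs that produce 8 or more heads.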

A free introductory chapter about Bayesian ideas is here. An article that sets frequentist and Bayesian concepts side by side is here. The article explains frequentist and Bayesian approaches to hypothesis testing and to estimation (and a lot of other stuff). The framework of the article might be especially useful to beginners trying to get a view of the landscape.


Including titles of the chapter and the article could be helpful in case the links go dead in the future.
Richard Hardy


This question risks being opinion-based, so I'll try to be really brief with my opinion, then give you a book suggestion. Sometimes it's worth taking a particular approach because it's the approach that a particularly good book takes.

I would agree that Bayesian statistics are more intuitive. The Confidence Interval versus Credible Interval distinction pretty much sums it up: people naturally think in terms of "what is the chance that..." rather than the Confidence Interval approach. The Confidence Interval approach sounds a lot like it's saying the same thing as the Credible Interval except on general principle you can't take the last step from "95% of the time" to "95% chance", which seems very frequentist but you can't do it. It's not inconsistent, just not intuitive.
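The "95% of the time" in a confidence interval describes the procedure over repeated samples, not any single interval, and a small simulation makes this concrete. This sketch assumes normally distributed data with a known standard deviation (so the textbook z-interval applies); the true mean, standard deviation, sample size and seed are arbitrary choices:

```python
import random
import statistics


def ci_covers(true_mean: float, sigma: float, n: int, rng: random.Random) -> bool:
    """Draw one sample, build a 95% z-interval (sigma assumed known), report coverage."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    half_width = 1.96 * sigma / n ** 0.5
    return xbar - half_width <= true_mean <= xbar + half_width


rng = random.Random(42)
n_repeats = 20_000
coverage = sum(ci_covers(10.0, 2.0, 25, rng) for _ in range(n_repeats)) / n_repeats
print(f"fraction of intervals containing the true mean: {coverage:.3f}")  # close to 0.95
```

Roughly 95% of the simulated intervals contain the true mean, but each individual interval either contains it or does not; the frequentist framework attaches no probability to that single case, which is exactly the step from "95% of the time" to "95% chance" that cannot be taken.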

Balancing that out is the fact that most college courses they will take will use the less-intuitive frequentist approach.

That said I really like the book Statistical Rethinking: A Bayesian Course with Examples in R and Stan by Richard McElreath. It's not cheap, so please read about it and poke around in it on Amazon before you buy. I find it a particularly intuitive approach that takes advantage of the Bayesian approach, and is very hands-on. (And since R and Stan are excellent tools for Bayesian statistics and they're free, it's practical learning.)

EDIT: A couple of comments have mentioned that the book is probably beyond a high school student, even with an experienced tutor. So I'll have to place an even bigger caveat: it has a simple approach at the beginning, but ramps up quickly. It's an amazing book, but you really, really would have to poke through it on Amazon to get a feel for its initial assumptions and how quickly it ramps up. Beautiful analogies, great hands-on work in R, incredible flow and organization, but maybe not useful to you.

It assumes a basic knowledge of programming and R (free statistical package), and some exposure to the basics of probability and statistics. It's not random-access and each chapter builds on prior chapters. It starts out very simple, though the difficulty does ramp up in the middle -- it ends on multi-level regression. So you might want to preview some of it at Amazon, and decide if you can easily cover the basics or if it jumps in a bit too far down the road.

EDIT 2: The bottom line of my contribution here and attempt to turn it from pure opinion is that a good textbook may decide which approach you take. I'd prefer a Bayesian approach, and this book does that well, but perhaps at too fast a pace.


McElreath's book is excellent, but I would be really surprised if high school students would be able to follow that level of treatment, even with a talented tutor.
DeltaIV

@DeltaIV: Good feedback, I'll edit my answer. I've been fooled several times by how readable and analogical it is at the beginning. It does enter a steep learning curve around halfway through, and probably a lot earlier than that.
Wayne

Another note: I think there's a bigger gap between fundamentals (i.e., probability theory) and application in Frequentist methods than in Bayesian methods. That is, I have trouble imagining someone really understanding MLE theory, the proof of the CLT, etc., without a graduate-level education, and that theory is required for even the most basic Frequentist procedures. Once you know conditional probability, you basically understand how Bayesian inference works. MCMC theory is a bit tricky, but honestly much simpler than truly understanding MLE theory...
Cliff AB

... and since it is easier to bridge the gap between probability and application of statistics in the Bayesian framework, I think that, at the very least, makes things mentally satisfying more quickly. I hated TA-ing the courses where we had to say "and trust us, MLE theory works with large samples", as I felt that either somewhat killed someone's scientific curiosity or required a much larger mental commitment.
Cliff AB

... But the intuition behind MLE is natural enough ... stats.stackexchange.com/questions/112451/…
kjetil b halvorsen


I have been taught the frequentist approach first, then the Bayesian one. I am not a professional statistician.

I have to admit I didn't find my prior knowledge of the frequentist approach to be decisively useful in understanding the Bayesian approach.

I would dare to say it depends on what concrete applications you will be showing your pupils next, and how much time and effort you will be spending on them.

Having said this, I would start with Bayes.



The Bayesian framework is tightly coupled to general critical thinking skills. It's what you need in the following situations:

  1. You think about applying for a competitive job. What are your chances of getting in? What payoff do you expect from applying?
  2. A headline tells you mobile phones cause cancer in humans in the long term. How much evidence do they have for this?
  3. Which charity should you donate money to if you want it to have the greatest effect?
  4. Someone offers to flip a coin with a bet of $0.90 from you and $1.10 from them. Would you give them the money? Why, why not?
  5. You've lost your keys (or an atom bomb). Where do you start looking?
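For item 4, a quick expected-value calculation shows where the Bayesian reasoning comes in. This sketch assumes (the list does not say) that a single flip decides the bet and the winner takes both stakes; exact fractions keep the arithmetic transparent:

```python
from fractions import Fraction


def expected_gain(p_win: Fraction, stake_you: Fraction, stake_them: Fraction) -> Fraction:
    """Expected profit: win their stake with probability p_win, otherwise lose yours."""
    return p_win * stake_them - (1 - p_win) * stake_you


stake_you, stake_them = Fraction("0.90"), Fraction("1.10")

# If you take the coin to be fair, the bet looks favourable.
fair_gain = expected_gain(Fraction(1, 2), stake_you, stake_them)
print(fair_gain)  # 1/10, i.e. +$0.10 per bet

# Break-even belief: the bet is only worth taking if your P(win) exceeds this.
break_even = stake_you / (stake_you + stake_them)
print(break_even)  # 9/20, i.e. 0.45
```

The interesting part is the second number: taking the bet means believing P(win) > 0.45, and a stranger offering favourable odds on a coin flip is itself evidence worth updating on.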

Also, this is much more interesting than memorising the formula for a two-sample t-test :p, which increases the chance that students will stay interested long enough to bother with increasingly technical material.



No one has mentioned likelihood, which is foundational to Bayesian statistics. An argument in favor of teaching Bayes first is that the flow from probability, to likelihood, to Bayes, is pretty seamless. Bayes can be motivated from likelihood by noting that (i) the likelihood function looks (and acts) like a probability distribution function, but is not because the area under the curve is not 1.0, and (ii) the crude, commonly-used Wald intervals assume a likelihood function that is proportional to a normal distribution, but Bayesian methods easily overcome this limitation.
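Point (i) can be checked numerically. This sketch uses hypothetical data (7 successes in 10 trials) and a simple midpoint rule; for a binomial likelihood, the area under the curve over 0 ≤ p ≤ 1 works out to 1/(n + 1), here 1/11, rather than 1. Dividing by that area, which amounts to using a flat prior on p, turns it into a proper posterior density:

```python
import math


def likelihood(p: float, k: int = 7, n: int = 10) -> float:
    """Binomial likelihood of p after observing k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)


def integrate(f, a: float = 0.0, b: float = 1.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


area = integrate(likelihood)
print(f"area under the likelihood: {area:.6f}")  # about 1/11, not 1


def posterior_density(p: float) -> float:
    """Likelihood rescaled to integrate to 1 (the posterior under a flat prior)."""
    return likelihood(p) / area


print(f"area under the posterior:  {integrate(posterior_density):.6f}")  # 1.000000
```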

Another argument favoring Bayes first is that the P(A|B) versus P(B|A) concern about p-values can be more easily explained, as mentioned by others.
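The P(A|B) versus P(B|A) distinction can be shown with a short Bayes' theorem calculation. The numbers below are hypothetical: suppose 90% of the null hypotheses being tested are actually true, tests use α = 0.05, and power is 0.8. Then P(significant | H0) is 0.05, but P(H0 | significant) is much larger:

```python
from fractions import Fraction


def prob_null_given_significant(prior_null: Fraction, alpha: Fraction,
                                power: Fraction) -> Fraction:
    """Bayes' theorem: P(H0 true | result significant)."""
    sig_and_null = prior_null * alpha       # P(significant and H0 true)
    sig_and_alt = (1 - prior_null) * power  # P(significant and H0 false)
    return sig_and_null / (sig_and_null + sig_and_alt)


# Hypothetical numbers: 90% of tested nulls are true, alpha = 0.05, power = 0.8.
result = prob_null_given_significant(Fraction(9, 10), Fraction(1, 20), Fraction(4, 5))
print(result, float(result))  # 9/25 0.36
```

So under these assumed numbers, 36% of "significant" results come from true null hypotheses, even though each test has only a 5% false-positive rate.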

Yet another argument favoring "Bayes first" is that it forces students to think more carefully about conditional probability models, which is useful elsewhere, e.g., in regression analysis.

Sorry for the self-promotion, but since it is entirely on-topic, I do not mind stating that this is precisely the approach that Kevin Henning and I took in our book "Understanding Advanced Statistical Methods" (https://peterwestfall.wixsite.com/book-1), whose intended audience is non-statisticians.



Are you teaching for fun and insight or for practical use? If it's about teaching and understanding, I'd go Bayes. If for practical purposes, I'd definitely go Frequentist.

In many fields of the natural sciences (and I suppose in most of them), people are used to publishing their papers with a p-value. Your "boys" will have to read other people's papers before they come to write their own. To read other people's papers, at least in my field, they need to understand null hypotheses and p-values, no matter how stupid these may appear after studying Bayesian statistics. And even when they are ready to publish their first paper, they will probably have a senior scientist leading the team, and chances are that person prefers Frequentism.

That being said, I would like to concur with @Wayne that Statistical Rethinking shows a very clear way towards Bayesian statistics as a first approach, not one built on existing knowledge of Frequentism. It is great how this book does not try to convince you in a fight over which kind of statistics is better or worse. The author's stated argument for Bayes is (IIRC) that he has taught both kinds and Bayes was easier to teach.



I would stay away from Bayesian statistics and follow the giants.

The Soviets had an excellent book series for secondary school students, roughly translated into English as "'Quant' little library." Kolmogorov and co-authors contributed a book to it titled "Introduction to Probability Theory." I'm not sure it has ever been translated into English, but here's the link to its Russian original.

They explain probability through combinatorics, which I think is a great way to start. The book is very accessible to a high school student with decent maths. Note that the Soviets taught math rather extensively, so the average Western high school student may not be as well prepared, but with enough interest and willpower can still handle the content, in my opinion.

The content is very interesting for students, it has random walks, limiting distributions, survival processes, law of large numbers etc. If you combine this approach with computer simulations, it becomes even more fun.
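As an example of combining this material with computer simulation, the law of large numbers can be watched in action. This minimal sketch (the die, the seed and the checkpoints are arbitrary choices) tracks the running average of fair die rolls, which should settle near the expected value 3.5:

```python
import random


def running_means(n_rolls: int, seed: int = 7) -> list[float]:
    """Roll a fair six-sided die n_rolls times; return the running average after each roll."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means


means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} rolls: average = {means[n - 1]:.4f}")  # drifts towards 3.5
```

Early averages wander noticeably, while later ones stay close to 3.5, which is a good prompt for discussing why.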

Licensed under cc by-sa 3.0 with attribution required.