How to do estimation when only summary statistics are available?

This is partly motivated by the following question and the discussion that follows it.

Suppose an iid sample $X_i\sim F(x,\theta)$ is observed. The goal is to estimate $\theta$. However, the original sample is not available. What we have instead are some statistics of the sample, $T_1,...,T_k$. Suppose $k$ is fixed. How do we estimate $\theta$? What would the maximum likelihood estimator be in this case?

estimation maximum-likelihood

— mpiktas

If $T_i=f(X_i)$ for a known function $f$, then you can write down the distribution of $T_i$ and the maximum likelihood estimator is derived in the usual way. But you have not specified what the $T_i$ are.

— Stéphane Laurent

I am interested in the case where $T_i=f(X_1,...,X_n)$ for a known $f$. This is what I meant when I said that the $T_i$ are sample statistics.

— mpiktas

So what is the difference between $T_i$ and $T_j$?

— Stéphane Laurent

Sorry, that should have been $f_i$, not $f$. We have several functions $f_i$, each of which takes the whole sample as its argument.

— mpiktas

Isn't this what maximum entropy was designed for?

— olasılık

Answers:

In this case, you can consider an ABC approximation of the likelihood (and consequently of the MLE) under the following assumption/restriction:

Assumption. The original sample size $n$ is known.

Given that the quality of frequentist estimators, in terms of convergence, depends on the sample size, one cannot obtain arbitrarily good estimators without knowing the original sample size.

The idea is to generate a sample from the posterior distribution of $\theta$ and then, in order to produce an approximation of the MLE, either use an importance sampling technique as in [1] or consider a uniform prior on $\theta$ with support on a suitable set as in [2].

I am going to describe the method in [2]. First of all, let me describe the ABC sampler.

ABC Sampler

Let $f(\cdot\vert\theta)$ be the model that generates the sample, where $\theta \in \Theta$ is a parameter (to be estimated), $T$ be a statistic (a function of the sample), $T_0$ be the observed statistic (called a summary statistic in ABC jargon), $\rho$ be a metric, $\pi(\theta)$ be a prior distribution on $\theta$, and $\epsilon>0$ be a tolerance. Then the ABC rejection sampler can be implemented as follows.

1. Sample $\theta^*$ from $\pi(\cdot)$.
2. Generate a sample $\bf{x}$ of size $n$ from the model $f(\cdot\vert\theta^*)$.
3. Compute $T^*=T({\bf x})$.
4. If $\rho(T^*,T_0)<\epsilon$, accept $\theta^*$ as a simulation from the posterior of $\theta$.

This algorithm generates an approximate sample from the posterior distribution of $\theta$ given $T({\bf x})=T_0$. Therefore, the best scenario is when the statistic $T$ is sufficient, but other statistics can be used. For a more detailed description, see this paper.
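The four steps above can be sketched as a small generic R function. This is only a minimal illustration for a scalar parameter: the function name `abc_reject`, its arguments, the `max_iter` safeguard, and the tolerance used in the usage lines are assumptions made for this sketch and do not come from [2].

```r
# Minimal ABC rejection sampler for a scalar parameter (illustrative sketch;
# all names here are hypothetical).
#   rprior: function() returning one draw theta* from the prior
#   rmodel: function(n, theta) returning a sample of size n from f(. | theta)
#   stat:   function(x) computing the summary statistic T(x)
#   rho:    metric on the space of the statistic (Euclidean by default)
abc_reject <- function(N, n, T0, eps, rprior, rmodel, stat,
                       rho = function(a, b) sqrt(sum((a - b)^2)),
                       max_iter = 1e6) {
  accepted <- numeric(N)
  n_acc <- 0
  n_iter <- 0
  while (n_acc < N && n_iter < max_iter) {
    n_iter <- n_iter + 1
    theta_star <- rprior()              # step 1: draw from the prior
    x_star <- rmodel(n, theta_star)     # step 2: simulate a sample of size n
    T_star <- stat(x_star)              # step 3: compute the statistic
    if (rho(T_star, T0) < eps) {        # step 4: accept if close to T0
      n_acc <- n_acc + 1
      accepted[n_acc] <- theta_star
    }
  }
  list(sample = accepted[seq_len(n_acc)],
       acceptance_rate = n_acc / n_iter)
}

# Usage: N(mu, 1) model where only the sample mean is observed
set.seed(1)
x <- rnorm(100)
out <- abc_reject(N = 200, n = 100, T0 = mean(x), eps = 0.01,
                  rprior = function() runif(1, -0.3, 0.3),
                  rmodel = function(n, mu) rnorm(n, mu, 1),
                  stat = mean)
```

The `max_iter` cap is a design choice of the sketch: with a very small tolerance or an uninformative statistic the loop could otherwise run for a very long time, so the function returns whatever it has accepted once the cap is reached.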

Now, in a general framework, if one uses a uniform prior that contains the MLE in its support, then the maximum a posteriori (MAP) estimator coincides with the maximum likelihood estimator (MLE). Therefore, if you use an appropriate uniform prior in the ABC sampler, you can generate an approximate sample from a posterior distribution whose MAP coincides with the MLE. The remaining step consists of estimating this mode. This problem has been discussed on CV (Cross Validated), for instance in "Computationally efficient estimation of multivariate mode".
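The reason the two estimators coincide can be written out in one line. As notation for this sketch only, let $L(\theta)$ denote the likelihood of $\theta$ based on the observed statistic $T_0$, and let the prior be uniform on a set $A\subset\Theta$ of finite volume $|A|$:

$$\pi(\theta\mid T_0)\;\propto\; L(\theta)\,\pi(\theta) \;=\; L(\theta)\,\frac{\mathbb{1}_A(\theta)}{|A|} \quad\Longrightarrow\quad \arg\max_{\theta\in\Theta}\pi(\theta\mid T_0) \;=\; \arg\max_{\theta\in A} L(\theta) \;=\; \hat\theta_{MLE} \quad\text{whenever } \hat\theta_{MLE}\in A.$$

If the MLE falls outside $A$, the posterior mode is instead the maximiser of $L$ restricted to $A$, which is why the support has to be chosen so that it contains the MLE.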

Toy example

Let $(x_1,...,x_n)$ be a sample from a $N(\mu,1)$ distribution, and suppose that the only information available from this sample is $\bar{x}=\dfrac{1}{n}\sum_{j=1}^n x_j$. Let $\rho$ be the Euclidean metric in ${\mathbb R}$ and $\epsilon=0.001$. The following R code shows how to obtain an approximate MLE using the methods described above, with a simulated sample with $n=100$ and $\mu=0$, a sample of size $1000$ from the posterior distribution, a uniform prior for $\mu$ on $(-0.3,0.3)$, and a kernel density estimator for estimating the mode of the posterior sample (MAP = MLE).

# rm(list=ls())

# Simulated data
set.seed(1)
x = rnorm(100)

# Observed statistic
T0 = mean(x)

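# ABC sampler using a uniform prior on (-0.3, 0.3)

N = 1000      # size of the posterior sample
eps = 0.001   # tolerance
ABCsamp = rep(0, N)
i = 1

while (i < N + 1) {
  u = runif(1, -0.3, 0.3)      # draw mu* from the prior
  t.samp = rnorm(100, u, 1)    # simulate a sample of size n = 100
  Ts = mean(t.samp)            # simulated summary statistic
  if (abs(Ts - T0) < eps) {    # accept if close to the observed statistic
    ABCsamp[i] = u
    i = i + 1
  }
}

# Approximation of the MLE: mode of a kernel density estimate
kd = density(ABCsamp)
kd$x[which.max(kd$y)]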

As you can see, using a small tolerance we get a very good approximation of the MLE (which in this trivial example can be calculated directly from the statistic, given that it is sufficient). It is important to notice that the choice of the summary statistic is crucial. Quantiles are typically a good choice for the summary statistic, but not all choices produce a good approximation. If the summary statistic is not very informative, the quality of the approximation may be poor, which is well known in the ABC community.

Update: A similar approach was published in Fan et al. (2012). See this entry for a discussion of the paper.

— whuber

(+1) For stating the correct result about the relationship between MLE and MAP and for the warning in the last paragraph (among other reasons). To make that warning more explicit, this (or any!) approach will fail miserably if the statistics at hand are ancillary or nearly so. One can consider your toy example and $T = \sum_i (X_i - \bar X)^2$, for example.

— cardinal
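The warning about ancillary statistics can be checked numerically on the toy example. The sketch below is only an illustration: the tolerance of 2, the number of prior draws, and the helper name `ss` are arbitrary choices, not values from the thread. Because $\sum_i (X_i - \bar X)^2$ has the same distribution for every $\mu$ in the $N(\mu,1)$ model, the accepted draws should look like the uniform prior rather than concentrate near the MLE.

```r
# ABC with an ancillary summary statistic in the N(mu, 1) toy example
set.seed(1)
n <- 100
x <- rnorm(n)                            # simulated data, true mu = 0
ss <- function(z) sum((z - mean(z))^2)   # distribution does not depend on mu
T0 <- ss(x)

n_draws <- 20000
mu_star <- runif(n_draws, -0.3, 0.3)     # draws from the uniform prior
T_star <- vapply(mu_star, function(m) ss(rnorm(n, m, 1)), numeric(1))
post <- mu_star[abs(T_star - T0) < 2]    # accepted draws (tolerance 2 is arbitrary)

# The accepted draws are spread over the whole prior range
summary(post)
sd(post)          # compare with the prior standard deviation
0.6 / sqrt(12)    # standard deviation of a uniform on (-0.3, 0.3)
```

A kernel density estimate of `post` is essentially flat, so its mode carries no information about $\mu$.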

+1 @procrastinator I was going to simply say yes, you can use the sufficient statistics if they are available for your model. But your extensive answer seems to have covered that.

— Michael R. Chernick

One simple question, you mention that uniform prior must contain MLE in its support. But MLE is a random variable which is only stochastically bounded, i.e. it can be outside of any bounded set with positive probability.

— mpiktas

@mpiktas For a specific sample, you have to choose the appropriate support of the uniform prior. This may change if you change the sample. It is important to note that this is not a Bayesian procedure, we are just using it as a numerical method, therefore there is no problem on playing with the choice of the prior. The smaller the support of the prior, the better. This would increase the speed of the ABC sampler but when your information is vague in the sense that you do not have a reliable clue on where the MLE is located, then you might need a larger support (and will pay the price).

@mpiktas In the toy example, you can use, for instance, a uniform prior with support on $(-1000000,1000000)$ or a uniform prior with support on $(0.1,0.15)$, obtaining the same results but with extremely different acceptance rates. The choice of this support is ad hoc and it is impossible to come up with a general-purpose prior given that the MLE is not stochastically bounded, as you mention. This choice can be considered as a lever of the method that has to be adjusted in each particular case.
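The effect of the support on the acceptance rate can be estimated with a short simulation. This is a rough sketch under stated assumptions: the helper `acc_rate`, the tolerance 0.01, and the wide interval $(-10,10)$ (used instead of $(-1000000,1000000)$ so that some draws are accepted in a short run) are illustrative choices. It also uses a shortcut that is valid only in this toy example: the mean of $n$ draws from $N(\mu,1)$ is exactly $N(\mu,1/n)$, so the statistic is simulated directly.

```r
# Acceptance rate of the ABC sampler for two different prior supports
set.seed(1)
n <- 100
x <- rnorm(n)
T0 <- mean(x)    # observed statistic

acc_rate <- function(lower, upper, n_draws = 50000, eps = 0.01) {
  mu_star <- runif(n_draws, lower, upper)          # prior draws
  T_star <- rnorm(n_draws, mu_star, 1 / sqrt(n))   # simulated sample means
  mean(abs(T_star - T0) < eps)                     # fraction accepted
}

narrow <- acc_rate(0.1, 0.15)   # tight support around the MLE
wide   <- acc_rate(-10, 10)     # vague support
c(narrow = narrow, wide = wide)
```

The narrow support accepts a far larger fraction of proposals, which is the price mentioned above for having only vague information about where the MLE lies.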

It all depends on whether or not the joint distribution of those $T_i$'s is known. If it is, e.g., $(T_1,\ldots,T_k)\sim g(t_1,\ldots,t_k|\theta,n)$, then you can conduct maximum likelihood estimation based on this joint distribution. Note that, unless $(T_1,\ldots,T_k)$ is sufficient, this will almost always be a different maximum likelihood than when using the raw data $(X_1,\ldots,X_n)$. It will necessarily be less efficient, with a larger asymptotic variance.

If the above joint distribution with density $g$ is not available, the solution proposed by Procrastinator is quite appropriate.
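As a hedged illustration of the first case, consider a model chosen only for this sketch (it is not from the question): $X_i\sim\text{Exp}(\theta)$ with rate $\theta$, where the only available statistic is $T=\min_i X_i$ and $n$ is known. Then $T\sim\text{Exp}(n\theta)$, so $g$ is available in closed form and can be maximised numerically.

```r
# MLE based on the known distribution of a single, non-sufficient statistic
set.seed(1)
n <- 50
theta_true <- 2
x <- rexp(n, rate = theta_true)
T0 <- min(x)     # pretend this is all that was kept

# log g(T0 | theta, n): the minimum of n Exp(theta) draws is Exp(n * theta)
loglik_T <- function(theta) dexp(T0, rate = n * theta, log = TRUE)

fit <- optimize(loglik_T, interval = c(1e-6, 1e4), maximum = TRUE)
mle_T <- fit$maximum           # numerical MLE based on T only
closed_form <- 1 / (n * T0)    # analytic maximiser of the same likelihood
mle_raw <- 1 / mean(x)         # MLE from the raw data, for comparison
c(mle_T = mle_T, closed_form = closed_form, mle_raw = mle_raw)
```

Since the minimum is not sufficient here, `mle_T` generally differs from `mle_raw` and is much more variable across repeated samples, in line with the efficiency remark above.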

— Xi'an

The (frequentist) maximum likelihood estimator is as follows:

For $F$ in the exponential family, and if your statistics are sufficient, the likelihood to be maximised can always be written in the form $l(\theta| T) = \exp\left( -\psi(\theta) + \langle T,\phi(\theta) \rangle \right),$ where $\langle \cdot, \cdot\rangle$ is the scalar product, $T$ is the vector of sufficient statistics, and $\psi(\cdot)$ and $\phi(\cdot)$ are continuous and twice differentiable.
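A small sketch of this form for one concrete family, chosen here purely as an example: for $N(\mu,\sigma^2)$ with known $n$ and $T=(\sum_i x_i,\sum_i x_i^2)$, one can take $\phi(\theta)=(\mu/\sigma^2,\,-1/(2\sigma^2))$ and $\psi(\theta)=n\left(\mu^2/(2\sigma^2)+\tfrac12\log(2\pi\sigma^2)\right)$. The variable names and the log-variance parameterisation are assumptions of the sketch.

```r
# Maximum likelihood from sufficient statistics only, N(mu, sigma^2) model
set.seed(1)
n <- 200
x <- rnorm(n, mean = 1, sd = 2)
T_suff <- c(sum(x), sum(x^2))    # sufficient statistics; raw data not used below

# Negative log-likelihood in exponential-family form: psi(theta) - <T, phi(theta)>
negloglik <- function(par) {
  mu <- par[1]
  sigma2 <- exp(par[2])          # optimise over log(sigma^2) to keep sigma^2 > 0
  psi <- n * (mu^2 / (2 * sigma2) + 0.5 * log(2 * pi * sigma2))
  phi <- c(mu / sigma2, -1 / (2 * sigma2))
  psi - sum(T_suff * phi)
}

fit <- optim(c(0, 0), negloglik, method = "BFGS")
mle_numeric <- c(mu = fit$par[1], sigma2 = exp(fit$par[2]))
mle_closed  <- c(mu = T_suff[1] / n,
                 sigma2 = T_suff[2] / n - (T_suff[1] / n)^2)
rbind(mle_numeric, mle_closed)
```

In this family the closed-form solution makes the optimiser unnecessary; the numerical route is shown only because it carries over to cases where $\psi$ and $\phi$ are available but the maximiser is not.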

The way you actually maximize the likelihood depends mostly on the possibility of writing the likelihood analytically in a tractable way. If this is possible, you will be able to consider general optimisation algorithms (Newton-Raphson, simplex...). If you do not have a tractable likelihood, you may find it easier to compute a conditional expectation as in the EM algorithm, which will also yield maximum likelihood estimates under rather affordable hypotheses.

Best

— julien stirnemann

For problems I am interested in, analytical tractability is not possible.

— mpiktas

The reason for non-tractability then conditions the optimization scheme. However, extensions of the EM usually allow one to get around most of these reasons. I don't think I can be more specific in my suggestions without seeing the model itself.

— julien stirnemann