Simulating random samples with a given MLE

This Cross Validated question, which asks about simulating a sample conditional on it having a fixed sum, reminded me of a problem set to me by George Casella.

Given a parametric model, $f(x|\theta)$, and an iid sample from this model, $(X_1,\ldots,X_n)$, the MLE of $\theta$ is given by

$\hat{\theta}(x_1,\ldots,x_n)=\arg\max_{\theta} \sum_{i=1}^n \log f(x_i|\theta)$

For a given value of $\theta$, is there a generic way to simulate an iid sample $(X_1,\ldots,X_n)$ conditional on the value of the MLE $\hat{\theta}(X_1,\ldots,X_n)$?

For instance, take a $\mathfrak{T}_5$ distribution with location parameter $\mu$, whose density is

$f(x|\mu)=\dfrac{\Gamma(3)}{\sqrt{5}\,\Gamma(1/2)\Gamma(5/2)}\,\left[1+(x-\mu)^2/5\right]^{-3}$

If $(X_1,\ldots,X_n)\stackrel{\text{iid}}{\sim} f(x|\mu)$, how can we simulate $(X_1,\ldots,X_n)$ conditional on $\hat{\mu}(X_1,\ldots,X_n)=\mu_0$? In this $\mathfrak{T}_5$ example, the distribution of $\hat{\mu}(X_1,\ldots,X_n)$ does not have a closed-form expression.
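To make the conditioning statistic concrete, here is a minimal sketch of how $\hat{\mu}$ could be computed numerically for the $\mathfrak{T}_5$ location model. The function name `t5_location_mle`, the grid size and the Newton refinement are illustrative assumptions, not part of the question. The grid search is restricted to $[\min_i x_i, \max_i x_i]$ because the likelihood is monotone outside that interval.

```python
import numpy as np

def t5_location_mle(x, grid_size=2001, newton_iters=50):
    """Hypothetical helper: location MLE for a t_5 sample via grid search + Newton."""
    x = np.asarray(x, dtype=float)
    # Coarse global search of the negative log-likelihood (up to a constant).
    grid = np.linspace(x.min(), x.max(), grid_size)
    nll = 3.0 * np.sum(np.log1p((x[:, None] - grid) ** 2 / 5.0), axis=0)
    mu = grid[np.argmin(nll)]
    # Local Newton refinement on the stationarity condition.
    for _ in range(newton_iters):
        u = mu - x
        d1 = np.sum(2.0 * u / (5.0 + u ** 2))                    # proportional to dL/dmu
        d2 = np.sum(2.0 * (5.0 - u ** 2) / (5.0 + u ** 2) ** 2)  # proportional to d2L/dmu2
        if d2 <= 0.0:  # not in a locally convex region, keep the grid value
            break
        step = d1 / d2
        mu -= step
        if abs(step) < 1e-12:
            break
    return mu

mu_hat = t5_location_mle([1.0, 2.0, 3.0])  # symmetric sample, so the MLE is its centre
```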

— Xi'an

One option would be to use a constrained HMC variant as described in A Family of MCMC Methods on Implicitly Defined Manifolds by Brubaker et al. (1). This requires that we can express the condition that the maximum-likelihood estimate of the location parameter is equal to some fixed $\mu_0$ as some implicitly defined (and differentiable) holonomic constraint $c\left(\lbrace x_i \rbrace_{i=1}^N\right) = 0$. We can then simulate a constrained Hamiltonian dynamic subject to this constraint, and accept / reject within a Metropolis-Hastings step as in standard HMC.

The negative log-likelihood is

$\mathcal{L} = -\sum_{i=1}^N \left[ \log f(x_i \,|\, \mu) \right] = 3 \sum_{i=1}^N \left[ \log\left(1 + \frac{(x_i - \mu)^2}{5}\right)\right] + \text{constant}$

which has first and second order partial derivatives with respect to the location parameter $\mu$

$\frac{\partial \mathcal{L}}{\partial \mu} = 3 \sum_{i=1}^N \left[ \frac{2(\mu - x_i)}{5 + (\mu - x_i)^2}\right] \quad\text{and}\quad \frac{\partial^2 \mathcal{L}}{\partial \mu^2} = 6 \sum_{i=1}^N \left[\frac{5 - (\mu - x_i)^2}{\left(5 + (\mu - x_i)^2\right)^2}\right].$

A maximum-likelihood estimate of $\mu_0$ is then implicitly defined as a solution to

$c = \sum_{i=1}^N \left[ \frac{2(\mu_0 - x_i)}{5 + (\mu_0 - x_i)^2}\right] = 0 \quad\text{subject to}\quad \sum_{i=1}^N \left[\frac{5 - (\mu_0 - x_i)^2}{\left(5 + (\mu_0 - x_i)^2\right)^2}\right] > 0.$
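As a rough illustration, the constraint and the second-order condition above can be evaluated directly with NumPy. The function names are hypothetical. The check only establishes that $\mu_0$ is a local maximum of the likelihood for the given sample, not that it is the global one.

```python
import numpy as np

def constraint(x, mu0):
    """c(x): the first derivative of the negative log-likelihood at mu0, divided by 3."""
    u = mu0 - np.asarray(x, dtype=float)
    return np.sum(2.0 * u / (5.0 + u ** 2))

def second_order_term(x, mu0):
    """The second derivative of the negative log-likelihood at mu0, divided by 6."""
    u = mu0 - np.asarray(x, dtype=float)
    return np.sum((5.0 - u ** 2) / (5.0 + u ** 2) ** 2)

def is_local_mle(x, mu0, tol=1e-8):
    """True if mu0 is a stationary point and a local maximum of the likelihood."""
    return bool(abs(constraint(x, mu0)) < tol and second_order_term(x, mu0) > 0.0)

print(is_local_mle([1.0, 2.0, 3.0], 2.0))   # symmetric, tightly clustered sample
print(is_local_mle([-8.0, 12.0], 2.0))      # symmetric but far apart: stationary, not a maximum
```

The second call shows why the inequality matters: for two widely separated points the midpoint satisfies $c = 0$ but is a local minimum of the likelihood.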

I am not sure if there are any results suggesting there will be a unique MLE for $\mu$ for given $\lbrace x_i \rbrace_{i=1}^N$: the density is not log-concave in $\mu$, so it doesn't seem trivial to guarantee this. If there is a single unique solution, the above implicitly defines a connected $N - 1$ dimensional manifold embedded in $\mathbb{R}^N$ corresponding to the set of $\lbrace x_i \rbrace_{i=1}^N$ with MLE for $\mu$ equal to $\mu_0$. If there are multiple solutions, then the manifold may consist of multiple non-connected components, some of which may correspond to minima in the likelihood function. In this case we would need some additional mechanism for moving between the non-connected components (as the simulated dynamic will generally remain confined to a single component), and we would need to check the second-order condition and reject a move if it corresponds to moving to a minimum in the likelihood.
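The possibility of several stationary points is easy to see numerically. The sketch below (function names and grid settings are assumptions for illustration) scans $\partial\mathcal{L}/\partial\mu$ over a grid for a fixed sample and brackets its sign changes. For the two-point sample $\lbrace -10, 10 \rbrace$ the stationarity condition can be solved by hand, giving $\mu = 0$ and $\mu = \pm\sqrt{95}$, which the scan reproduces.

```python
import numpy as np

def score(mu, x):
    """dL/dmu divided by 3, evaluated at scalar or array-valued mu."""
    u = np.asarray(mu, dtype=float)[..., None] - np.asarray(x, dtype=float)
    return np.sum(2.0 * u / (5.0 + u ** 2), axis=-1)

def curvature(mu, x):
    """d2L/dmu2 divided by 6, evaluated at scalar or array-valued mu."""
    u = np.asarray(mu, dtype=float)[..., None] - np.asarray(x, dtype=float)
    return np.sum((5.0 - u ** 2) / (5.0 + u ** 2) ** 2, axis=-1)

def stationary_points(x, num=20001, pad=1.0):
    """Approximate roots of the score, located by sign changes on a grid."""
    x = np.asarray(x, dtype=float)
    grid = np.linspace(x.min() - pad, x.max() + pad, num)
    s = score(grid, x)
    nz = np.nonzero(s)[0]                  # skip grid points where the score is exactly 0
    sg = np.sign(s[nz])
    flip = sg[:-1] != sg[1:]
    lo, hi = grid[nz[:-1][flip]], grid[nz[1:][flip]]
    return 0.5 * (lo + hi)                 # midpoint of each bracketing interval

x = [-10.0, 10.0]
roots = stationary_points(x)
is_max = curvature(roots, x) > 0           # True where the likelihood has a local maximum
```

Here the two outer roots are local maxima of the likelihood and the middle root is a local minimum, so a sample like this one would sit on a part of the $c = 0$ set that the second-order condition excludes when $\mu_0 = 0$.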

If we use $\boldsymbol{x}$ to denote the vector $\left[ x_1 \dots x_N\right]^{\rm T}$ and introduce a conjugate momentum state $\boldsymbol{p}$ with mass matrix $\mathbf{M}$ and a Lagrange multiplier $\lambda$ for the scalar constraint $c(\boldsymbol{x})$, then the solution to the system of ODEs

$\frac{{\rm d}\boldsymbol{x}}{{\rm d}t} = \mathbf{M}^{-1}\boldsymbol{p}, \quad \frac{{\rm d}\boldsymbol{p}}{{\rm d}t} = -\frac{\partial \mathcal{L}}{\partial \boldsymbol{x}} - \lambda \frac{\partial c}{\partial \boldsymbol{x}} \quad\text{subject to}\quad c(\boldsymbol{x}) = 0 \quad\text{and}\quad \frac{\partial c}{\partial \boldsymbol{x}}\mathbf{M}^{-1}\boldsymbol{p} = 0$

given initial conditions $\boldsymbol{x}(0) = \boldsymbol{x}_0,~\boldsymbol{p}(0) = \boldsymbol{p}_0$ with $c(\boldsymbol{x}_0) = 0$ and $\left.\frac{\partial c}{\partial \boldsymbol{x}}\right|_{\boldsymbol{x}_0}\,\mathbf{M}^{-1}\boldsymbol{p}_0 = 0$, defines a constrained Hamiltonian dynamic which remains confined to the constraint manifold, is time reversible and exactly conserves the Hamiltonian and the manifold volume element. If we use a symplectic integrator for constrained Hamiltonian systems such as SHAKE (2) or RATTLE (3), which exactly maintain the constraint at each timestep by solving for the Lagrange multiplier, we can simulate the exact dynamic forward $L$ discrete timesteps of size $\delta t$ from some initial constraint-satisfying $\boldsymbol{x},\,\boldsymbol{p}$ and accept the proposed new state pair $\boldsymbol{x}',\,\boldsymbol{p}'$ with probability

$\min\left\lbrace 1, \,\exp\left[ \mathcal{L}(\boldsymbol{x}) - \mathcal{L}(\boldsymbol{x}') + \frac{1}{2}\boldsymbol{p}^{\rm T}\mathbf{M}^{-1}\boldsymbol{p} - \frac{1}{2}\boldsymbol{p}'^{\rm T}\mathbf{M}^{-1}\boldsymbol{p}'\right] \right\rbrace.$

If we interleave these dynamics updates with partial / full resampling of the momenta from their Gaussian marginal (restricted to the linear subspace defined by $\frac{\partial c}{\partial \boldsymbol{x}}\mathbf{M}^{-1}\boldsymbol{p} = 0$), then, modulo the possibility of there being multiple non-connected constraint manifold components, the overall MCMC dynamic should be ergodic and the configuration state samples $\boldsymbol{x}$ will converge in distribution to the target density restricted to the constraint manifold.
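The scheme just described can be sketched in a few dozen lines. The following is a simplified RATTLE-style implementation under several assumptions that are mine rather than the answer's: an identity mass matrix, a scalar Newton solve for the Lagrange multiplier, a step size of 0.1, and rejection whenever the Newton solve fails or the second-order condition is violated. A careful implementation would also verify reversibility of the projection step. All function names are hypothetical.

```python
import numpy as np

MU, MU0 = 1.0, 2.0  # model location and conditioned MLE value

def potential(x):          # negative log-likelihood of x given MU, up to a constant
    return 3.0 * np.sum(np.log1p((x - MU) ** 2 / 5.0))

def grad_potential(x):
    return 6.0 * (x - MU) / (5.0 + (x - MU) ** 2)

def constraint(x):
    u = MU0 - x
    return np.sum(2.0 * u / (5.0 + u ** 2))

def grad_constraint(x):
    u = MU0 - x
    return -2.0 * (5.0 - u ** 2) / (5.0 + u ** 2) ** 2

def project(p, g):
    """Remove the component of p along g so that g @ p = 0."""
    return p - g * (g @ p) / (g @ g)

def rattle_step(x, p, h, tol=1e-10, max_iter=50):
    g = grad_constraint(x)
    p_half = p - 0.5 * h * grad_potential(x)
    lam = 0.0
    for _ in range(max_iter):  # Newton iteration on the scalar multiplier
        x_new = x + h * (p_half - lam * g)
        c = constraint(x_new)
        if abs(c) < tol:
            break
        lam += c / (h * (grad_constraint(x_new) @ g))
    else:
        return None            # no convergence: caller rejects the proposal
    p_half = p_half - lam * g
    p_new = project(p_half - 0.5 * h * grad_potential(x_new), grad_constraint(x_new))
    return x_new, p_new

def chmc_update(x, rng, h=0.1, n_steps=5):
    # Full momentum refresh, restricted to the tangent space of the constraint.
    p = project(rng.standard_normal(x.shape), grad_constraint(x))
    x_new, p_new = x, p
    for _ in range(n_steps):
        out = rattle_step(x_new, p_new, h)
        if out is None:
            return x, False
        x_new, p_new = out
    log_accept = (potential(x) - potential(x_new)
                  + 0.5 * (p @ p) - 0.5 * (p_new @ p_new))
    u = MU0 - x_new
    is_maximum = np.sum((5.0 - u ** 2) / (5.0 + u ** 2) ** 2) > 0.0
    if is_maximum and np.log(rng.uniform()) < log_accept:
        return x_new, True
    return x, False

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0])   # satisfies c(x) = 0 for MU0 = 2 by symmetry
samples, n_accept = [], 0
for _ in range(200):
    x, accepted = chmc_update(x, rng)
    samples.append(x)
    n_accept += accepted
samples = np.array(samples)
```

Every retained state satisfies the constraint to within the Newton tolerance, which is the property that distinguishes this from an approximate (for example ABC-style) scheme.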

To see how constrained HMC performed for the case here, I ran the geodesic-integrator-based constrained HMC implementation described in (4) and available on Github here (full disclosure: I am an author of (4) and owner of the Github repository), which uses a variation of the 'geodesic-BAOAB' integrator scheme proposed in (5) without the stochastic Ornstein-Uhlenbeck step. In my experience this geodesic integration scheme is generally a bit easier to tune than the RATTLE scheme used in (1) due to the extra flexibility of using multiple smaller inner steps for the geodesic motion on the constraint manifold. An IPython notebook generating the results is available here.

I used $N=3$, $\mu=1$ and $\mu_0=2$. An initial $\boldsymbol{x}$ corresponding to an MLE of $\mu_0$ was found by Newton's method (with the second order derivative checked to ensure a maximum of the likelihood was found). I ran a constrained dynamic with $\delta t = 0.5$, $L=5$ interleaved with full momentum refreshes for 1000 updates. The plot below shows the resulting traces on the three $\boldsymbol{x}$ components

Trace plots for 3D example

and the corresponding values of the first and second order derivatives of the negative log-likelihood are shown below

Log-likelihood derivative trace plots

from which it can be seen that we are at a maximum of the log-likelihood for all sampled $\boldsymbol{x}$ . Although it is not readily apparent from the individual trace plots, the sampled $\boldsymbol{x}$ lie on a 2D non-linear manifold embedded in $\mathbb{R}^3$ - the animation below shows the samples in 3D

3D visualisation of samples confined to 2D manifold
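For readers who want to reproduce the initialisation step, one possible way to find a starting $\boldsymbol{x}$ with $c(\boldsymbol{x}) = 0$ is a Newton-type iteration that repeatedly moves along the constraint gradient. This is a sketch of one such scheme and not necessarily the one used in the notebook; the function names and the starting guess are assumptions.

```python
import numpy as np

def constraint(x, mu0):
    u = mu0 - x
    return np.sum(2.0 * u / (5.0 + u ** 2))

def grad_constraint(x, mu0):
    u = mu0 - x
    return -2.0 * (5.0 - u ** 2) / (5.0 + u ** 2) ** 2

def second_order_term(x, mu0):
    u = mu0 - x
    return np.sum((5.0 - u ** 2) / (5.0 + u ** 2) ** 2)

def find_initial_point(x_init, mu0, tol=1e-12, max_iter=100):
    """Newton-type projection of x_init onto {x : c(x) = 0}."""
    x = np.array(x_init, dtype=float)
    for _ in range(max_iter):
        c = constraint(x, mu0)
        if abs(c) < tol:
            if second_order_term(x, mu0) > 0.0:
                return x
            raise ValueError("stationary point is not a maximum of the likelihood")
        g = grad_constraint(x, mu0)
        x = x - c * g / (g @ g)   # minimum-norm Newton step for a scalar constraint
    raise RuntimeError("Newton iteration did not converge")

x0 = find_initial_point([1.5, 2.2, 3.0], mu0=2.0)
```

Starting reasonably close to $\mu_0$ keeps every $|x_i - \mu_0|$ below $\sqrt{5}$, which is sufficient for the second-order condition to hold.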

Depending on the interpretation of the constraint, it may also be necessary to adjust the target density by some Jacobian factor as described in (4). In particular, if we want results consistent with the $\epsilon \to 0$ limit of using an ABC-like approach to approximately maintain the constraint by proposing unconstrained moves in $\mathbb{R}^N$ and accepting if $|c(\boldsymbol{x})| < \epsilon$, then we need to divide the target density by $\sqrt{\frac{\partial c}{\partial \boldsymbol{x}}^{\rm \scriptscriptstyle T}\frac{\partial c}{\partial \boldsymbol{x}}}$, since the region $|c(\boldsymbol{x})| < \epsilon$ is thinner where the constraint gradient is larger. In the above example I did not include this adjustment, so the samples are from the original target density restricted to the constraint manifold.
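For comparison, the ABC-like idea can be sketched as a plain rejection sampler, which is a simpler stand-in for the MCMC version mentioned above: draw unconstrained samples from the model and keep those with $|c(\boldsymbol{x})| < \epsilon$ that also satisfy the second-order condition. The function names, the tolerance and the number of proposals are illustrative assumptions. As elsewhere, the check only guarantees a local maximum at $\mu_0$.

```python
import numpy as np

def constraint(x, mu0):
    u = mu0 - x
    return np.sum(2.0 * u / (5.0 + u ** 2), axis=-1)

def second_order_term(x, mu0):
    u = mu0 - x
    return np.sum((5.0 - u ** 2) / (5.0 + u ** 2) ** 2, axis=-1)

def abc_rejection(n_proposals, n, mu, mu0, eps, rng):
    """Keep iid t_5 samples whose constraint value is within eps of zero."""
    x = mu + rng.standard_t(5, size=(n_proposals, n))
    keep = (np.abs(constraint(x, mu0)) < eps) & (second_order_term(x, mu0) > 0.0)
    return x[keep]

rng = np.random.default_rng(1)
accepted = abc_rejection(100_000, n=3, mu=1.0, mu0=2.0, eps=0.05, rng=rng)
acceptance_rate = len(accepted) / 100_000
```

The acceptance rate shrinks in proportion to $\epsilon$, which is the practical motivation for working directly on the constraint manifold instead.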

References

1. M. A. Brubaker, M. Salzmann, and R. Urtasun. A family of MCMC methods on implicitly defined manifolds. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 2012. http://www.cs.toronto.edu/~mbrubake/projects/AISTATS12.pdf
2. J.-P. Ryckaert, G. Ciccotti, and H. J. Berendsen. Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. Journal of Computational Physics, 1977. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.399.6868
3. H. C. Andersen. RATTLE: A "velocity" version of the SHAKE algorithm for molecular dynamics calculations. Journal of Computational Physics, 1983. http://www.sciencedirect.com/science/article/pii/0021999183900141
4. M. M. Graham and A. J. Storkey. Asymptotically exact inference in likelihood-free models. arXiv pre-print arXiv:1605.07826v3, 2016. https://arxiv.org/abs/1605.07826
5. B. Leimkuhler and C. Matthews. Efficient molecular dynamics using geodesic integration and solvent–solute splitting. Proc. R. Soc. A. Vol. 472. No. 2189. The Royal Society, 2016. http://rspa.royalsocietypublishing.org/content/472/2189/20160138.abstract

— Matt Graham

Brilliant and opening new and bright perspectives! Thank you.

— Xi'an