Maximum gap between samples drawn without replacement from a discrete uniform distribution

This problem is related to my lab's research on robotic coverage:

Randomly draw $n$ numbers without replacement from the set $\{1,2,\ldots,m\}$, where $1\le n\le m$, and sort them in increasing order.

From this sorted list of numbers, $\{a_{(1)},a_{(2)},\ldots,a_{(n)}\}$, form the differences between consecutive numbers and the boundaries: $g = \{a_{(1)},a_{(2)}-a_{(1)},\ldots,a_{(n)}-a_{(n-1)},m+1-a_{(n)}\}$. This gives $n+1$ gaps.
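A minimal Python sketch of this construction (the helper name `gaps` is hypothetical). It pads the sorted sample with the boundaries $0$ and $m+1$ and takes consecutive differences. Each of the $n+1$ gaps is therefore at least $1$, and together they sum to $m+1$.

```python
def gaps(sample, m):
    """Return the n+1 gaps of a sample drawn from {1, ..., m}.

    The sorted sample is padded with the boundaries 0 and m+1, and
    consecutive differences are taken:
    (a_(1), a_(2)-a_(1), ..., a_(n)-a_(n-1), m+1-a_(n)).
    """
    padded = [0] + sorted(sample) + [m + 1]
    return [hi - lo for lo, hi in zip(padded, padded[1:])]


g = gaps([7, 2, 5], m=10)
print(g)       # [2, 3, 2, 4]
print(sum(g))  # 11, i.e. m + 1
print(max(g))  # 4, the maximum gap
```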

What is the distribution of the maximum gap?

$P(\max(g) = k) = P(k;m,n) = ?$

This can be framed using order statistics: $P(g_{(n+1)} = k) = P(k;m,n) = ?$

See the linked question for the distribution of the gaps; this question asks for the distribution of the maximum gap.

I would be satisfied with the mean value, $\mathbb{E}[g_{(n+1)}]$.

If $n=m$, all gaps have size 1. If $n+1 = m$, there is one gap of size $2$, with $n+1$ possible locations. The maximum gap size is $m-n+1$. This gap can be placed before or after any of the $n$ numbers, for a total of $n+1$ possible locations. The smallest maximum gap size is $\lceil\frac{m-n}{n+1}\rceil$. Define the probability of any given combination as $T= {m \choose n}^{-1}$.

I can partially specify the probability mass function:

$P(g_{(n+1)} = k) = P(k;m,n) = \begin{cases} 0 & k < \lceil\frac{m-n}{n+1}\rceil\\ 1 & k = \frac{m-n}{n+1} \\ 1 & k = 1 \text{ (occurs when $m=n$)} \\ T(n+1)& k = 2 \text{ (occurs when $m=n+1$)} \\ T(n+1)& k = \frac{m-(n-1)}{n} \\ ? & \frac{m-(n-1)}{n} \le k \le m-n+1 \\ T(n+1)& k = m-n+1\\ 0 & k > m-n+1 \end{cases} \tag{1}$

Current work (1): The distribution of the first gap, $a_{(1)}$, is straightforward:

$P(a_{(1)} = k) = \frac{{m-k \choose n-1}}{{m \choose n}}, \quad k = 1, 2, \ldots, m-n+1.$

The expected value has a simple form:

$\mathbb{E}[a_{(1)}] = \frac{1}{{m \choose n}} \sum_{k=1}^{m-n+1} k {m-k \choose n-1} = \frac{m+1}{n+1},$

so the number of unselected integers before the first draw, $a_{(1)}-1$, has mean $\frac{m-n}{n+1}$. By symmetry, I expect all $n+1$ gaps to have this distribution. Perhaps the solution could be found by drawing from this distribution $n+1$ times.
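A small exact check by enumeration, sketched in Python. The function name is hypothetical, and `m = 10`, `n = 3` are arbitrary illustrative values. It confirms three things:

- $P(a_{(1)}=k)=\binom{m-k}{n-1}/\binom{m}{n}$ for $k=1,\ldots,m-n+1$;
- $\mathbb{E}[a_{(1)}]=\frac{m+1}{n+1}$;
- the count of unselected values before the first draw, $a_{(1)}-1$, has mean $\frac{m-n}{n+1}$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def first_gap_pmf(m, n):
    """Exact distribution of a_(1) by enumerating every n-subset of {1, ..., m}."""
    counts = {}
    for subset in combinations(range(1, m + 1), n):  # tuples come out sorted
        counts[subset[0]] = counts.get(subset[0], 0) + 1
    total = comb(m, n)
    return {k: Fraction(c, total) for k, c in counts.items()}


m, n = 10, 3
pmf = first_gap_pmf(m, n)

# Closed form: P(a_(1) = k) = C(m-k, n-1) / C(m, n) for k = 1, ..., m-n+1.
for k in range(1, m - n + 2):
    assert pmf[k] == Fraction(comb(m - k, n - 1), comb(m, n))

mean = sum(k * p for k, p in pmf.items())
print(mean)      # 11/4, i.e. (m+1)/(n+1)
print(mean - 1)  # 7/4, i.e. (m-n)/(n+1), mean count of unselected values before a_(1)
```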

Current work (2): Monte Carlo simulations are easy to run:

```mathematica
(* One sample: pad the draws with the boundaries 0 and m+1, sort, and take the largest difference *)
simMaxGap[m_, n_] := Max[Differences[Sort[Join[RandomSample[Range[m], n], {0, m + 1}]]]];
m = 1000; n = 1; trials = 100000;
(* Smoothed histogram of the simulated maximum gap *)
SmoothHistogram[Table[simMaxGap[m, n], {trials}], Filling -> Axis,
 Frame -> {True, True, False, False},
 FrameLabel -> {"k (Max gap)", "Probability"},
 PlotLabel -> StringForm["m=``,n=``,smooth histogram of maximum gap for `` trials", m, n, trials]]
```
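For small $m$, the Monte Carlo estimate can be checked against the exact distribution by enumerating all $\binom{m}{n}$ equally likely subsets. Below is a Python sketch with a hypothetical function name. Its cost grows like $\binom{m}{n}$, so it is only practical for small cases.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def max_gap_pmf(m, n):
    """Exact P(max gap = k), using all n+1 gaps, by brute-force enumeration."""
    counts = Counter()
    for subset in combinations(range(1, m + 1), n):  # each tuple is already sorted
        padded = (0,) + subset + (m + 1,)
        counts[max(hi - lo for lo, hi in zip(padded, padded[1:]))] += 1
    total = sum(counts.values())
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


print(max_gap_pmf(6, 2))
# {3: Fraction(2, 5), 4: Fraction(2, 5), 5: Fraction(1, 5)}
```

For $m=6$, $n=2$ the largest possible gap, $m-n+1=5$, has probability $(n+1)/\binom{m}{n}=3/15$. This matches the $T(n+1)$ entry in (1).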

— AaronBecker

With these conditions you must have $n\le m$. I think you want $g=\{a_{(1)}, a_{(2)}-a_{(1)},\ldots, a_{(n)}-a_{(n-1)}\}$. Does "randomly select" mean selecting each number with probability $1/m$ on the first draw? Since you do not replace, the probability would be $1/(m-1)$ on the second draw, and so on down to $1$ on the $m$th draw if $n=m$. If $n<m$, this would stop earlier, with the last draw having probability $1/(m-(n-1))$ on the $n$th draw.

— Michael R. Chernick

Your original description of $g$ made no sense, because (I believe) you transposed two of the subscripts. Please verify that my edit conforms with your intention: in particular, please confirm that you mean for there to be $n$ gaps, of which $a_{(1)}$ is the first.

— whuber

@gung I think this is research, rather than self-study

— Glen_b -Reinstate Monica

I think your minimum and maximum gap sizes should be $1$ and $m-n+1$. The minimum gap size is when consecutive integers are chosen, and the maximum gap size occurs when you select $m$ and the $n-1$ first integers $1,\dots,n-1$ (or $1$ and $m-n+2,\dots,m$).

— probabilityislogic

Thank you, Michael Chernick and probabilityislogic; your corrections have been made. Thank you, @whuber, for making the correction!

— AaronBecker

Let $f(g;n,m)$ be the chance that the minimum, $a_{(1)}$, equals $g$. That is, the sample consists of $g$ and an $(n-1)$-subset of $\{g+1,g+2,\ldots,m\}$. There are $\binom{m-g}{n-1}$ such subsets out of the $\binom{m}{n}$ equally likely subsets, whence

$\Pr(a_{(1)}=g) = f(g;n,m) = \frac{\binom{m-g}{n-1}}{\binom{m}{n}}.$

Adding $f(k;n,m)$ for all possible values of $k$ greater than $g$ yields the survival function

$\Pr(a_{(1)} \gt g) = Q(g;n,m)= \frac{(m-g)\binom{m-g-1}{n-1}}{n \binom{m}{n}}.$
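A short derivation of this closed form uses the hockey-stick identity $\sum_{j=r}^{N}\binom{j}{r}=\binom{N+1}{r+1}$. Substitute $j=m-k$; then for $0\le g\le m-n$:

$Q(g;n,m)=\frac{1}{\binom{m}{n}}\sum_{k=g+1}^{m-n+1}\binom{m-k}{n-1}=\frac{1}{\binom{m}{n}}\sum_{j=n-1}^{m-g-1}\binom{j}{n-1}=\frac{\binom{m-g}{n}}{\binom{m}{n}}=\frac{(m-g)\binom{m-g-1}{n-1}}{n\binom{m}{n}},$

The last step uses $\binom{N}{n}=\frac{N}{n}\binom{N-1}{n-1}$. For $m-n < g < m$, the sum is empty and $\binom{m-g-1}{n-1}=0$, so both sides are $0$.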

Let $G_{n,m}$ be the random variable given by the largest gap:

$G_{n,m} = \max\left(a_{(1)}, a_{(2)}-a_{(1)}, \ldots, a_{(n)}-a_{(n-1)}\right).$

(This responds to the question as originally framed, before it was modified to include a gap between $a_{(n)}$ and $m$.) We will compute its survival function

$P(g;n,m)=\Pr(G_{n,m}\gt g),$

from which the entire distribution of $G_{n,m}$ is readily derived. The method is a dynamic program beginning with $n=1$, for which it is obvious that

$P(g;1,m) = \Pr(G_{1,m} \gt g) = \frac{m-g}{m},\ g=0, 1, \ldots, m.\tag{1}$

For larger $n\gt 1$, note that the event $G_{n,m}\gt g$ is the disjoint union of two kinds of event. The first is

$a_{(1)} \gt g,$

for which the very first gap exceeds $g$. The others are the $g$ separate events

$a_{(1)}=k\text{ and } G_{n-1,m-k} \gt g, \ k=1, 2, \ldots, g,$

for which the first gap equals $k$ and a gap greater than $g$ occurs later in the sample. This works because, given $a_{(1)}=k$, the remaining $n-1$ values shifted down by $k$ form a uniformly random $(n-1)$-subset of $\{1,\ldots,m-k\}$, and their gaps are exactly the later gaps of the original sample. The Law of Total Probability asserts that the probabilities of these events add, whence

$P(g;n,m) = Q(g;n,m) + \sum_{k=1}^g f(k;n,m) P(g;n-1,m-k).\tag{2}$

Fixing $g$ and laying out a two-way array indexed by $i=1,2,\ldots,n$ and $j=1,2,\ldots,m$ , we may compute $P(g;n,m)$ by using $(1)$ to fill in its first row and $(2)$ to fill in each successive row using $O(gm)$ operations per row. Consequently the table can be completed in $O(gmn)$ operations and all tables for $g=1$ through $g=m-n+1$ can be constructed in $O(m^3n)$ operations.
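A Python sketch of this table follows, using exact rational arithmetic via `fractions.Fraction`. The names `f`, `Q`, and `survival_table` follow the notation above but are otherwise hypothetical. The sketch targets $G_{n,m}$ as defined here, without the final gap $m+1-a_{(n)}$, and is checked against brute-force enumeration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def f(k, n, m):
    """Pr(a_(1) = k) = C(m-k, n-1) / C(m, n)."""
    return Fraction(comb(m - k, n - 1), comb(m, n))


def Q(g, n, m):
    """Pr(a_(1) > g) = (m-g) C(m-g-1, n-1) / (n C(m, n)); zero once g >= m."""
    if g >= m:
        return Fraction(0)
    return Fraction((m - g) * comb(m - g - 1, n - 1), n * comb(m, n))


def survival_table(g, n, m):
    """P[i][j] = Pr(G_{i,j} > g) for 1 <= i <= n and i <= j <= m."""
    P = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):                      # first row: equation (1)
        P[1][j] = Fraction(max(j - g, 0), j)
    for i in range(2, n + 1):                      # later rows: equation (2)
        for j in range(i, m + 1):
            total = Q(g, i, j)
            # The first gap k is at most j-i+1; larger k has f(k; i, j) = 0.
            for k in range(1, min(g, j - i + 1) + 1):
                total += f(k, i, j) * P[i - 1][j - k]
            P[i][j] = total
    return P


def survival(g, n, m):
    return survival_table(g, n, m)[n][m]


def survival_bruteforce(g, n, m):
    """Pr(G_{n,m} > g) by enumeration; G omits the last gap m+1-a_(n)."""
    hits = 0
    for s in combinations(range(1, m + 1), n):
        gaps = (s[0],) + tuple(b - a for a, b in zip(s, s[1:]))
        hits += max(gaps) > g
    return Fraction(hits, comb(m, n))


print(survival(1, 2, 3))  # 2/3
for n in range(1, 5):
    for m in range(n, 9):
        for g in range(0, m - n + 2):
            assert survival(g, n, m) == survival_bruteforce(g, n, m)
```

Building one table per $g$ keeps the $O(gmn)$ cost per table described above. Exact fractions make the comparison with enumeration exact, at some cost in speed compared with floating point.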

These graphs show the survival function $g\to P(g;n,64)$ for $n=1,2,4,8,16,32,64$ . As $n$ increases, the graph moves to the left, corresponding to the decreasing chances of large gaps.

Closed formulas for $P(g;n,m)$ can be obtained in many special cases, especially for large $n$ , but I have not been able to obtain a closed formula that applies to all $g,n,m$ . Good approximations are readily available by replacing this problem with the analogous problem for continuous uniform variables.
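The continuous analogue can be sketched by replacing the draws with $n$ independent $\mathrm{Uniform}(0,1)$ points. The result below is a classical one for uniform spacings, quoted here rather than derived in this thread. The $n+1$ spacings $S_1,\ldots,S_{n+1}$, which include both end gaps, satisfy

$\Pr\Bigl(\max_i S_i > x\Bigr) = \sum_{j=1}^{n+1} (-1)^{j+1}\binom{n+1}{j}\bigl(\max(0,\,1-jx)\bigr)^{n}, \qquad \mathbb{E}\Bigl[\max_i S_i\Bigr] = \frac{1}{n+1}\sum_{j=1}^{n+1}\frac{1}{j}.$

Multiplying by $m+1$ should give only a rough approximation to the expected maximum of the question's $n+1$ gaps, and only when $m$ is much larger than $n$. It describes all $n+1$ gaps, not $G_{n,m}$ as defined above, which omits the last gap.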

Finally, the expectation of $G_{n,m}$ is obtained by summing its survival function starting at $g=0$:

$\mathbb{E}(G_{n,m}) = \sum_{g=0}^{m-n+1} P(g;n,m).$
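As a small hand-checked instance of this sum, take $m=3$ and $n=2$ (without the trailing gap). The three equally likely samples $\{1,2\}$, $\{1,3\}$, $\{2,3\}$ give gap pairs $(1,1)$, $(1,2)$, $(2,1)$, so $G_{2,3}$ takes the values $1, 2, 2$ and

$\mathbb{E}(G_{2,3}) = P(0;2,3) + P(1;2,3) + P(2;2,3) = 1 + \frac{2}{3} + 0 = \frac{5}{3},$

which agrees with averaging the three values directly: $(1+2+2)/3 = 5/3$.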

This contour plot of the expectation shows contours at $2, 4, 6, \ldots, 32$ , graduating from dark to light.

— whuber

Suggestion: in the line "Let $G_{n,m}$ be the random variable given by the largest gap:", please add the last gap, $m+1-a_{(n)}$. Your expectation plot matches my Monte Carlo simulation.
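One way to fold the last gap $m+1-a_{(n)}$ into the same recursion is sketched below in Python. This extension is not given in the answer above and rests on two assumptions. First, keep recursion (2) on the first gap, but stop at a hypothetical base case $n=0$, where no draws leave a single gap of length $m+1$. Second, given $a_{(1)}=k$, the remaining draws shifted down by $k$ form a uniform $(n-1)$-subset of $\{1,\ldots,m-k\}$, and their $n$ gaps, including the last one, are exactly the remaining gaps of the original sample. The function names are hypothetical, and the result is checked against enumeration.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations
from math import comb


@lru_cache(maxsize=None)
def surv_all_gaps(g, n, m):
    """Pr(max of all n+1 gaps > g), including the last gap m+1-a_(n)."""
    if n == 0:
        # Hypothetical base case: no draws leaves one gap of length m+1.
        return Fraction(int(m + 1 > g))
    total = Fraction(0)
    # First gap exceeds g: Q(g; n, m) = (m-g) C(m-g-1, n-1) / (n C(m, n)).
    if g < m:
        total += Fraction((m - g) * comb(m - g - 1, n - 1), n * comb(m, n))
    # First gap equals k <= g (and k <= m-n+1): recurse on the remaining n-1 draws.
    for k in range(1, min(g, m - n + 1) + 1):
        total += Fraction(comb(m - k, n - 1), comb(m, n)) * surv_all_gaps(g, n - 1, m - k)
    return total


def expected_max_gap(m, n):
    """E[max gap] = sum over g >= 0 of Pr(max gap > g); the max gap is at most m-n+1."""
    return sum(surv_all_gaps(g, n, m) for g in range(m - n + 2))


def expected_max_gap_bruteforce(m, n):
    total = 0
    for s in combinations(range(1, m + 1), n):
        padded = (0,) + s + (m + 1,)
        total += max(b - a for a, b in zip(padded, padded[1:]))
    return Fraction(total, comb(m, n))


print(expected_max_gap(6, 2))  # 19/5
for n in range(1, 5):
    for m in range(n, 9):
        assert expected_max_gap(m, n) == expected_max_gap_bruteforce(m, n)
```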

— AaronBecker