Maximum gap between samples drawn without replacement from a discrete uniform distribution


16

This problem relates to my lab's research in robotic coverage:

Randomly draw $n$ numbers from the set $\{1, 2, \ldots, m\}$ without replacement, and sort the numbers in ascending order, where $1 \le n \le m$.

From this sorted list of numbers $\{a_{(1)}, a_{(2)}, \ldots, a_{(n)}\}$, generate the differences between consecutive numbers and the boundaries: $g = \{a_{(1)}, a_{(2)} - a_{(1)}, \ldots, a_{(n)} - a_{(n-1)}, m + 1 - a_{(n)}\}$. This gives $n+1$ gaps.
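The gap construction can be sketched in a few lines of Python (used here only for illustration; the function name `gaps` is hypothetical and not from the question). The boundaries 0 and $m+1$ are added before differencing, so the $n+1$ gaps always sum to $m+1$.

```python
import random

def gaps(sample, m):
    """Return the n+1 gaps of a sample from {1, ..., m}, boundaries included."""
    points = [0] + sorted(sample) + [m + 1]   # add the boundaries 0 and m+1
    return [b - a for a, b in zip(points, points[1:])]

m, n = 10, 3
sample = random.sample(range(1, m + 1), n)    # draw without replacement
g = gaps(sample, m)
print(sorted(sample), g, max(g))              # sample, its gaps, the maximum gap
```

For the fixed sample $\{2, 5, 9\}$ with $m = 10$, the gaps are $2, 3, 4, 2$ and the maximum gap is $4$.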

What is the distribution of the maximum gap?

$$P(\max(g) = k) = P(k; m, n) = ?$$

This can be framed using order statistics: $P(g_{(n+1)} = k) = P(k; m, n) = ?$

See the link for the distribution of the gaps, but this question asks for the distribution of the maximum gap.

I would be satisfied with the mean value, $E[g_{(n+1)}]$.

If $n = m$, all gaps are of size 1. If $n + 1 = m$, there is one gap of size $2$ and $n+1$ possible locations for it. The maximum gap size is $m - n + 1$, and this gap can be placed before or after any of the $n$ numbers, for a total of $n+1$ possible positions. The smallest maximum gap size is $\lceil \frac{m-n}{n+1} \rceil$. Define the probability of any given combination as $T = \binom{m}{n}^{-1}$.
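For small $m$ these special cases can be checked by exact enumeration. The sketch below is in Python rather than the question's Mathematica, and the name `max_gap_pmf` is hypothetical. It lists every $n$-subset, computes the largest of the $n+1$ gaps (boundaries 0 and $m+1$ included), and tabulates exact probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def max_gap_pmf(m, n):
    """Exact pmf of the largest gap by enumerating all n-subsets of {1..m}."""
    counts = Counter()
    for s in combinations(range(1, m + 1), n):   # s is already sorted
        points = (0,) + s + (m + 1,)             # add both boundaries
        counts[max(b - a for a, b in zip(points, points[1:]))] += 1
    total = comb(m, n)                           # all subsets equally likely
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}

print(max_gap_pmf(5, 5))   # n = m: every gap has size 1
print(max_gap_pmf(5, 4))   # n + 1 = m: the largest gap is always 2
print(max_gap_pmf(6, 3))   # a small non-trivial case
```

With the gaps defined as above, the enumerated support for $m = 6$, $n = 3$ is $\{2, 3, 4\}$ with probabilities $1/5, 3/5, 1/5$. The top value $m - n + 1 = 4$ has probability $T(n+1) = 4/20$. Enumeration costs $\binom{m}{n}$ steps, so it only serves as a check for small inputs.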

I have partially solved the probability mass function as

$$P(g_{(n+1)} = k) = P(k; m, n) = \begin{cases}
0 & k < \lceil \frac{m-n}{n+1} \rceil \\
1 & k = \lceil \frac{m-n}{n+1} \rceil \\
1 & k = 1 \text{ (occurs when } m = n) \\
T(n+1) & k = 2 \text{ (occurs when } m = n+1) \\
T(n+1) & k = \lceil \frac{m-(n-1)}{n} \rceil \\
? & \lceil \frac{m-(n-1)}{n} \rceil \le k \le m-n+1 \\
T(n+1) & k = m-n+1 \\
0 & k > m-n+1
\end{cases} \tag{1}$$

Current work (1): The equation for the first gap, $a_{(1)}$, is straightforward:

$$P(a_{(1)} = k) = P(k; m, n) = \frac{1}{\binom{m}{n}} \sum_{k=1}^{m-n+1} \binom{m-k-1}{n-1}$$

The expected value has a simple form: $E[P(a_{(1)})] = \frac{1}{\binom{m}{n}} \sum_{k=1}^{m-n+1} \binom{m-k-1}{n-1}\, k = \frac{m-n}{1+n}$. By symmetry, I expect all $n$ gaps to have this distribution. Perhaps the solution could be found by drawing from this distribution $n$ times.

Current work (2): it is easy to run Monte Carlo simulations.

(* One trial: draw n of 1..m without replacement, add the boundaries 0 and m+1, return the largest gap *)
simMaxGap[m_, n_] := Max[Differences[Sort[Join[RandomSample[Range[m], n], {0, m+1}]]]];
m = 1000; n = 1; trials = 100000;
(* Smoothed histogram of the simulated maximum gap *)
SmoothHistogram[Table[simMaxGap[m, n], {trials}], Filling -> Axis,
Frame -> {True, True, False, False},
FrameLabel -> {"k (Max gap)", "Probability"},
PlotLabel -> StringForm["m=``,n=``,smooth histogram of maximum gap for `` trials", m, n, trials]]

1
With these conditions you must have n<=m. I think you want g={a_(1), a_(2)-a_(1),..., a_(n)-a_(n-1)}. Does randomly select mean selecting each number with probability 1/m on the first draw? Since you do not replace the probability would be 1/(m-1) on the second and so on down to 1 on the mth draw if n=m. If n<m this would stop earlier with the last draw having probability 1/(m-(n-1)) on the nth draw.
Michael R. Chernick

2
Your original description of $g$ made no sense, because (I believe) you transposed two of the subscripts. Please verify that my edit conforms with your intention: in particular, please confirm that you mean for there to be $n$ gaps, of which $a_{(1)}$ is the first.
whuber

1
@gung I think this is research, rather than self-study
Glen_b -Reinstate Monica

1
I think your minimum and maximum gap sizes should be $1$ and $m-n+1$. The minimum gap size is when consecutive integers are chosen, and the maximum gap size occurs when you select $m$ and the first $n-1$ integers $1, \ldots, n-1$ (or $1$ and $m-n+2, \ldots, m$).
probabilityislogic

1
Thank you Michael Chernick and probabilityislogic, your corrections have been made. Thank you @whuber for making the correction!
AaronBecker

Answers:


9

Let $f(g; n, m)$ be the chance that the minimum, $a_{(1)}$, equals $g$; that is, the sample consists of $g$ and an $(n-1)$-subset of $\{g+1, g+2, \ldots, m\}$. There are $\binom{m-g}{n-1}$ such subsets out of the $\binom{m}{n}$ equally likely subsets, whence

$$\Pr(a_{(1)} = g) = f(g; n, m) = \frac{\binom{m-g}{n-1}}{\binom{m}{n}}.$$

Adding $f(k; n, m)$ for all possible values of $k$ greater than $g$ yields the survival function

$$\Pr(a_{(1)} > g) = Q(g; n, m) = \frac{(m-g)\binom{m-g-1}{n-1}}{n\binom{m}{n}}.$$
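The two formulas can be checked numerically with exact rational arithmetic. This is a minimal Python sketch; the function names `f` and `Q` simply mirror the symbols above, and the guards for out-of-range $g$ are an assumption added so the binomial coefficients are never called with negative arguments.

```python
from fractions import Fraction
from math import comb

def f(g, n, m):
    """Pr(a_(1) = g): g is drawn and the other n-1 values come from {g+1..m}."""
    if g < 1 or g > m:
        return Fraction(0)
    return Fraction(comb(m - g, n - 1), comb(m, n))

def Q(g, n, m):
    """Pr(a_(1) > g), using the closed form (m-g) C(m-g-1, n-1) / (n C(m, n))."""
    if g >= m:
        return Fraction(0)
    return Fraction((m - g) * comb(m - g - 1, n - 1), n * comb(m, n))

m, n = 10, 3
# The pmf sums to one, and every tail sum of f reproduces Q.
assert sum(f(k, n, m) for k in range(1, m + 1)) == 1
for g in range(0, m + 1):
    assert Q(g, n, m) == sum(f(k, n, m) for k in range(g + 1, m + 1))
```

The closed form for $Q$ equals $\binom{m-g}{n} / \binom{m}{n}$, the chance that all $n$ values fall in $\{g+1, \ldots, m\}$.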

Let $G_{n,m}$ be the random variable given by the largest gap:

$$G_{n,m} = \max\left(a_{(1)}, a_{(2)} - a_{(1)}, \ldots, a_{(n)} - a_{(n-1)}\right).$$

(This responds to the question as originally framed, before it was modified to include a gap between $a_{(n)}$ and $m$.) We will compute its survival function

$$P(g; n, m) = \Pr(G_{n,m} > g),$$

from which the entire distribution of $G_{n,m}$ is readily derived. The method is a dynamic program beginning with $n = 1$, for which it is obvious that

$$P(g; 1, m) = \Pr(G_{1,m} > g) = \frac{m-g}{m}, \quad g = 0, 1, \ldots, m. \tag{1}$$

For larger $n > 1$, note that the event $G_{n,m} > g$ is the disjoint union of the event

$$a_{(1)} > g,$$

for which the very first gap exceeds $g$, and the $g$ separate events

$$a_{(1)} = k \text{ and } G_{n-1,m-k} > g, \quad k = 1, 2, \ldots, g$$

for which the first gap equals $k$ and a gap greater than $g$ occurs later in the sample. The Law of Total Probability asserts the probabilities of these events add, whence

$$P(g; n, m) = Q(g; n, m) + \sum_{k=1}^{g} f(k; n, m)\, P(g; n-1, m-k). \tag{2}$$

Fixing $g$ and laying out a two-way array indexed by $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, we may compute $P(g; n, m)$ by using $(1)$ to fill in its first row and $(2)$ to fill in each successive row using $O(gm)$ operations per row. Consequently the table can be completed in $O(gmn)$ operations and all tables for $g = 1$ through $g = m-n+1$ can be constructed in $O(m^3 n)$ operations.
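One way to lay out this table is sketched below in Python. It is an illustration under stated assumptions, not the code behind the figures. The names `survival_table`, `survival`, and `brute_force` are hypothetical, entries with $j < i$ are left at zero because they never receive positive weight, and $Q$ is computed in the equivalent form $\binom{j-g}{i} / \binom{j}{i}$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def survival_table(g, n, m):
    """table[i][j] = P(g; i, j) = Pr(G_{i,j} > g) for 1 <= i <= n, i <= j <= m."""
    table = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    # Row 1 is equation (1): one draw from {1..j} exceeds g with chance (j-g)/j.
    for j in range(1, m + 1):
        table[1][j] = Fraction(max(j - g, 0), j)
    # Rows 2..n use equation (2).
    for i in range(2, n + 1):
        for j in range(i, m + 1):
            total = comb(j, i)
            # Q(g; i, j): the first gap already exceeds g.
            value = Fraction(comb(j - g, i), total) if g <= j else Fraction(0)
            # First gap equals k <= g, then a large gap occurs later.
            for k in range(1, min(g, j - i + 1) + 1):
                value += Fraction(comb(j - k, i - 1), total) * table[i - 1][j - k]
            table[i][j] = value
    return table

def survival(g, n, m):
    return survival_table(g, n, m)[n][m]

def brute_force(g, n, m):
    """The same probability by enumerating all n-subsets (small m only)."""
    hits = sum(
        1
        for s in combinations(range(1, m + 1), n)
        if max(b - a for a, b in zip((0,) + s, s)) > g   # gaps a_(1), a_(2)-a_(1), ...
    )
    return Fraction(hits, comb(m, n))

print(survival(1, 2, 3), brute_force(1, 2, 3))
```

Exact fractions keep the comparison with enumeration free of rounding error; floating point would be the natural choice for large $m$.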

Figure

These graphs show the survival function $g \to P(g; n, 64)$ for $n = 1, 2, 4, 8, 16, 32, 64$. As $n$ increases, the graph moves to the left, corresponding to the decreasing chances of large gaps.

Closed formulas for $P(g; n, m)$ can be obtained in many special cases, especially for large $n$, but I have not been able to obtain a closed formula that applies to all $g, n, m$. Good approximations are readily available by replacing this problem with the analogous problem for continuous uniform variables.

Finally, the expectation of $G_{n,m}$ is obtained by summing its survival function starting at $g = 0$:

$$E(G_{n,m}) = \sum_{g=0}^{m-n+1} P(g; n, m).$$
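A small worked check, computed by hand here rather than taken from the answer, uses $n = 2$, $m = 3$ and the gaps $a_{(1)}$ and $a_{(2)} - a_{(1)}$ only:

```latex
% n = 2, m = 3: the three equally likely samples are {1,2}, {1,3}, {2,3},
% whose largest gaps are G = 1, 2, 2 respectively.
\begin{align*}
P(0;2,3) &= 1, \qquad P(1;2,3) = \tfrac{2}{3}, \qquad P(2;2,3) = 0, \\
E(G_{2,3}) &= \sum_{g=0}^{2} P(g;2,3) = 1 + \tfrac{2}{3} + 0 = \tfrac{5}{3}
            = \tfrac{1 + 2 + 2}{3}.
\end{align*}
```

The tail sum agrees with the direct average of the three largest gaps, as it must for any random variable taking non-negative integer values.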

Figure 2: contour plot of expectation

This contour plot of the expectation shows contours at $2, 4, 6, \ldots, 32$, graduating from dark to light.


Suggestion: in the line "Let $G_{n,m}$ be the random variable given by the largest gap:", please add the last gap of $m + 1 - a_{(n)}$. Your expectation plot matches my Monte Carlo simulation.
AaronBecker
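A possible way to include the last gap $m + 1 - a_{(n)}$, offered as an untested-by-the-answer sketch rather than as the answerer's method: keep the recurrence over the first gap, but start from $n = 0$, where the only gap runs from boundary 0 to boundary $m+1$ and therefore has size $m+1$. The Python function name `survival_with_end_gap` is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def survival_with_end_gap(g, n, m):
    """Pr(largest of the n+1 gaps > g), counting the final gap m+1-a_(n)."""
    if n == 0:
        # Assumed base case: no draws, a single gap of size m+1.
        return Fraction(int(m + 1 > g))
    total = comb(m, n)
    # The first gap already exceeds g (all n values lie in {g+1..m}).
    result = Fraction(comb(m - g, n), total) if g <= m else Fraction(0)
    # The first gap equals k <= g; recurse on the remaining m-k values.
    for k in range(1, min(g, m - n + 1) + 1):
        weight = Fraction(comb(m - k, n - 1), total)
        result += weight * survival_with_end_gap(g, n - 1, m - k)
    return result

def expected_max_gap(n, m):
    """Mean of the largest gap as the sum of the survival function."""
    return sum(survival_with_end_gap(g, n, m) for g in range(0, m + 1))

print(survival_with_end_gap(3, 3, 6))   # chance the largest gap exceeds 3
print(expected_max_gap(1, 5))           # single draw from {1..5}
```

For a single draw from $\{1, \ldots, 5\}$ the two gaps are $a$ and $6 - a$, so the largest gap takes the values $5, 4, 3, 4, 5$ and has mean $21/5$, which the sketch reproduces.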
Licensed under cc by-sa 3.0 with attribution required.