Does a given regular language contain an infinite prefix-free subset?


11

A set of words over a finite alphabet is prefix-free if it contains no two distinct words such that one is a prefix of the other.

Question:

What is the complexity of checking whether a regular language, given as an NFA, contains an infinite prefix-free subset?

Answer (due to Mikhail Rudoy, see below): It can be done in polynomial time, and I think even in NL.

Rephrasing Mikhail's answer: let $(\Sigma, q_0, F, \delta)$ be the input NFA in normal form (no epsilon transitions, trim), and let $L[p,r]$ (resp. $L[p,R]$) be the language obtained by taking state $p$ as the initial state and $\{r\}$ as the set of final states (resp. state $p$ as initial and the set $R$ as final). For a word $u$, let $u^\omega$ be the infinite word obtained by iterating $u$.

The following are equivalent:

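  1. The language $L[q_0,F]$ contains an infinite prefix-free subset.
  2. $\exists q \in Q$, $\exists u \in L[q,q] \setminus \{\varepsilon\}$, $\exists v \in L[q,F]$ such that $v$ is not a prefix of $u^\omega$.
  3. $\exists q \in Q$ with $L[q,q] \neq \{\varepsilon\}$ such that $\forall u \in L[q,q]$ $\exists v \in L[q,F]$ such that $v$ is not a prefix of $u^\omega$.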

Proof:

$3 \Rightarrow 2$ is trivial.

For $2 \Rightarrow 1$, it suffices to see that for any $w \in L[q_0,q]$, the set $w(u^{|v|})^*v$ is an infinite prefix-free subset of $L[q_0,F]$.

Finally, $1 \Rightarrow 3$ is the "correctness" proof in Mikhail's answer.

Answers:


7

Your problem can be solved in polynomial time.

To begin, convert the given NFA to an equivalent NFA with the following additional properties:

  • There are no epsilon transitions
  • All states are reachable from the start state

Helpful subroutine

Suppose we have an NFA $N$, a state $q$, and a nonempty string $s$. The following subroutine will let us evaluate the truth value of the following statement: "every path in $N$ from state $q$ to an accept state corresponds to a string that is a prefix of the string $s^n$ for some $n$." Furthermore, this subroutine will run in polynomial time.

First, construct the NFA $S$ with $|s|+1$ states which accepts all strings that are not prefixes of $s^n$ for any $n$ ($|s|$ non-accept states in a loop to keep track of where in the "pattern" $sssss\ldots$ we are so far, and one accept state for the case where we have already deviated from that pattern). Next, construct the NFA $N'$, which is exactly like $N$ but has $q$ as its start state. Finally, construct a final NFA $N''$ whose language $L(N'')$ is $L(S) \cap L(N')$, using the standard NFA intersection construction. Note that all of these constructions are polynomial in the size of the input.

Then simply test whether the language of $N''$ is empty (which can be done in polynomial time with a simple graph search). $L(N'') = \emptyset$ if and only if $L(S) \cap L(N') = \emptyset$, or in other words no string in $L(N')$ is in $L(S)$. Equivalently, the language of $N''$ is empty if and only if $N'$ accepts only strings that are prefixes of $s^n$ for some $n$. This can be rephrased as exactly the statement we were trying to evaluate: "every path in $N$ from state $q$ to an accept state corresponds to a string that is a prefix of the string $s^n$ for some $n$."
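As a rough illustration, the subroutine can be sketched in Python as a graph search over the product of the NFA with the pattern tracker. The representation is an assumption made for this sketch: `delta` is a dict mapping `(state, symbol)` to a set of successor states, symbols are single characters, and the function name is hypothetical. It returns the negation of the quoted statement, that is, whether some accepting path from `q` spells a string that is not a prefix of any $s^n$.

```python
from collections import deque

def has_non_periodic_suffix(delta, accepting, q, s):
    """True if some path from q to an accepting state spells a string
    that is NOT a prefix of s^n for any n.

    delta: dict (state, symbol) -> set of states (assumed encoding, no epsilon moves)
    accepting: set of accepting states
    s: non-empty period string
    """
    assert s, "the period s must be non-empty"
    deviated = len(s)           # the single accept state of the pattern automaton S
    start = (q, 0)              # (NFA state, position inside the pattern s)
    seen = {start}
    queue = deque([start])
    while queue:
        state, pos = queue.popleft()
        if pos == deviated and state in accepting:
            return True         # the intersection language is non-empty
        for (src, sym), targets in delta.items():
            if src != state:
                continue
            if pos == deviated or sym != s[pos]:
                nxt = deviated              # left the pattern for good
            else:
                nxt = (pos + 1) % len(s)    # still following sss...
            for t in targets:
                if (t, nxt) not in seen:
                    seen.add((t, nxt))
                    queue.append((t, nxt))
    return False
```

The search visits at most (number of states) × ($|s|+1$) product states, which matches the polynomial bound claimed above.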

Main algorithm

Consider the set of states in the NFA that are in some loop. For each such state, q, do the following:

Let $P_2$ be any simple loop containing $q$. Let $s$ be the string corresponding to loop $P_2$. Since the NFA has no epsilon transitions, $s$ is not empty. Then apply the subroutine to the NFA, state $q$, and string $s$. If the subroutine tells us that every path starting at $q$ in the NFA and ending at an accept state corresponds to a prefix of $s^n$ for some $n$, then continue to the next state $q$. Otherwise, output that the given NFA's language contains an infinite prefix-free subset.

If we try every state $q$ that is in a loop and the algorithm never outputs, then output that the given NFA's language does not contain an infinite prefix-free subset.
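The whole procedure can be sketched end to end as follows. This is a minimal sketch under assumptions, not a reference implementation: the NFA is encoded as a dict `delta` from `(state, symbol)` to a set of states with single-character symbols and no epsilon transitions, all function names are hypothetical, and the loop through each state is simply a shortest one found by breadth-first search (a shortest loop is always simple).

```python
from collections import deque

def successors(delta, state):
    for (src, sym), targets in delta.items():
        if src == state:
            for t in targets:
                yield sym, t

def reachable(delta, start):
    seen, queue = {start}, deque([start])
    while queue:
        for _, t in successors(delta, queue.popleft()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def loop_string(delta, q):
    """Label of a shortest loop through q, or None if q lies on no loop."""
    seen, queue = set(), deque([(q, "")])
    while queue:
        state, label = queue.popleft()
        for sym, t in successors(delta, state):
            if t == q:
                return label + sym
            if t not in seen:
                seen.add(t)
                queue.append((t, label + sym))
    return None

def deviates(delta, accepting, q, s):
    """Subroutine: does some accepting path from q leave the pattern sss...?"""
    off = len(s)
    seen = {(q, 0)}
    queue = deque(seen)
    while queue:
        state, pos = queue.popleft()
        if pos == off and state in accepting:
            return True
        for sym, t in successors(delta, state):
            nxt = off if pos == off or sym != s[pos] else (pos + 1) % off
            if (t, nxt) not in seen:
                seen.add((t, nxt))
                queue.append((t, nxt))
    return False

def has_infinite_prefix_free_subset(delta, start, accepting):
    for q in reachable(delta, start):      # only states reachable from the start
        s = loop_string(delta, q)
        if s is not None and deviates(delta, accepting, q, s):
            return True
    return False
```

For instance, on the two-state NFA for $a^*b$ the sketch reports that an infinite prefix-free subset exists, while on the one-state NFA for $a^*$ it reports that none does.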

Correctness (first half)

First, suppose that the above algorithm asserts that the given NFA's language contains an infinite prefix-free subset. Say that this output was produced while considering some loop $P_2$ and some state $q$. As before, $s$ is the string corresponding to $P_2$. Then we know from the subroutine that not every path starting at $q$ in the NFA and ending at an accept state corresponds to a prefix of $s^n$ for some $n$ (as this is the only output of the subroutine that would lead to the main algorithm outputting at that $q$).

Let $P_3$ be a path whose existence is asserted by the subroutine: a path from $q$ to an accept state such that the corresponding string $t$ is not a prefix of $s^n$ for any $n$.

Let $P_2'$ consist of $m$ copies of $P_2$, where $m$ is sufficiently large that $m|s| > |t|$. Since $P_2$ is a loop through $q$, $P_2'$ can be treated as a path from $q$ to $q$. The string corresponding to $P_2'$ is $s^m$.

Let $P_1$ be a path from the start state to $q$ (which exists since every state is reachable from the start) and let $r$ be the string corresponding to this path.

Then, for every $x \ge 0$, the path consisting of $P_1$, $x$ copies of $P_2'$, and $P_3$ is an accepting computation path. The string corresponding to this path is $r(s^m)^x t$. Thus, the NFA accepts every string of the form $r(s^m)^x t$. This is an infinite set of strings accepted by the NFA, and I claim that this set of strings is prefix-free. In particular, suppose $r(s^m)^x t$ is a prefix of $r(s^m)^y t$ with $y > x$. In other words, $t$ is a prefix of $(s^m)^{y-x} t$. Since $(s^m)^{y-x}$ has length $m(y-x)|s| \ge m|s| > |t|$, this implies that $t$ is a prefix of $(s^m)^{y-x} = s^{m(y-x)}$. But we know by the output of the subroutine that $t$ is not a prefix of $s^n$ for any $n$. Thus, $r(s^m)^x t$ cannot be a prefix of $r(s^m)^y t$, and as desired the set of strings is prefix-free.
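To see the family $r(s^m)^x t$ concretely, the following sketch generates its first few members and checks pairwise that none is a prefix of another. The helper names are hypothetical, and $m$ is chosen as the smallest value with $m|s| > |t|$.

```python
from itertools import combinations

def is_prefix_free(words):
    """True if no word in the collection is a prefix of a different word."""
    distinct = set(words)
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(distinct, 2))

def witness_family(r, s, t, count):
    """First `count` strings r (s^m)^x t, with m minimal such that m*|s| > |t|."""
    m = len(t) // len(s) + 1
    return [r + (s * m) * x + t for x in range(count)]

# t = "ba" is not a prefix of any a^n, so the family is prefix-free.
good = witness_family("", "a", "ba", 6)
print(is_prefix_free(good), len(set(good)))

# t = "aa" IS a prefix of a^n, and the construction breaks down.
bad = witness_family("", "a", "aa", 4)
print(is_prefix_free(bad))
```

The second call illustrates why the hypothesis on $t$ is needed: when $t$ is a prefix of some $s^n$, the generated strings are all comparable.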

Thus, I have shown that if the main algorithm outputs that the given NFA's language contains an infinite prefix-free subset then this is in fact the case.

Correctness (second half)

Next, I will show the other half: if the given NFA's language contains an infinite prefix-free subset then the main algorithm will output this fact.

Suppose the given NFA's language contains an infinite prefix-free subset. Let A be the set of (accepting) computation paths corresponding to these strings. Notice that A is an infinite set of accepting computation paths whose corresponding strings are never prefixes of each other.

Say that a state is "looping" in the NFA if there exists a loop in the NFA through that state and "non-looping" otherwise. Consider all paths from the start state to any looping state which pass through only non-looping states (except for the one looping state where they end up). Let $P$ be the set of these paths. A path $p \in P$ cannot contain a loop, as then the states in that loop would be looping states and so $p$ would pass through a looping state before its end. Thus, the lengths of paths in $P$ are bounded above by the number of states in the NFA and so $P$ is finite (for example, if the start state is a looping state then the only such path is the empty path).

We can partition $A$ into $|P|+1$ subsets based on how the computation paths in $A$ start. In particular, for $p \in P$, let $A_p$ be the set of all computation paths in $A$ that start with path $p$, and let $B$ be the set of all other paths in $A$. Clearly, all the $A_p$ and $B$ are disjoint and their union is the entire set $A$. Furthermore, $B$ contains only paths that never pass through a looping state, and therefore never loop; thus $B$ is finite. We can conclude then that some $A_p$ must be infinite (otherwise $A$ would be a union of finitely many finite sets).

Since $A_p$ is infinite, there are infinitely many computation paths, none of whose strings are prefixes of each other, that are accepting paths starting with $p$. Let $q$ be the state reached at the end of path $p$. We can conclude that there are infinitely many accepting paths starting at $q$, call this set $A'$, all of which correspond to strings that are not prefixes of each other.

During the main algorithm, we run the subroutine on state $q$ and some string $s$. This subroutine tells us whether every accepting path starting at $q$ corresponds to a string that is a prefix of $s^n$ for some $n$. If this were the case, then all the infinitely many accepting paths in $A'$ would correspond to prefixes of $s^n$ for various $n$, which would imply that they are all prefixes of each other. This is not the case, so we conclude that when the main algorithm runs the subroutine on state $q$, the result is the other possible outcome. This, however, leads the main algorithm to output that the NFA's language contains an infinite prefix-free subset.

This concludes the proof of correctness.


I don't understand how the loop handling works, since a given state q can be part of (exponentially) many loops. Of course, if any two of those loops can be used to generate a non-periodic sequence, then we are done.
japh

What do you mean by loop handling? In the main algorithm, for each state $q$ you pick just one loop that goes through $q$ (any loop out of the potentially exponentially many) and call that loop $P_2$ (afterwards you run the subroutine on state $q$ and string $s$, where $s$ is the string associated with $P_2$). The subroutine essentially handles the check of whether it is possible to generate a non-periodic sequence using that loop. If yes, then we're done. If no (and furthermore no for every $q$), then your entire language is a union of periodic sequences so we're also done.
Mikhail Rudoy

To make my question clearer, here's a simple NFA with initial state $q$, final state $T$ and three transitions: $q \xrightarrow{a} q$, $q \xrightarrow{b} q$, $q \xrightarrow{a} T$. The loop for $a$ will not generate the prefix-free strings, but the loop for $b$ will.
japh

Actually, the loop for $a$ does generate a prefix-free set: the strings in $a^*ba$ all use the $a$ loop. In my algorithm, if the loop you choose for $q$ is the $a$ loop then the subroutine will determine that no, not every accepting path starting at $q$ has a string of the form $a^*$, and so the main algorithm will say that an infinite prefix-free subset exists. If the loop the algorithm uses for $q$ is instead the $b$ loop then the subroutine determines that not every accepting path starting at $q$ has a string of the form $b^*$, and in this case too the algorithm has the same output.
Mikhail Rudoy

Thank you Mikhail! I think your answer settles the question.
Googlo

2

Definitions

Definition 1: Let $S$ be a set of words. We say that $S$ is nicely infinite prefix-free (made up name for the purpose of this answer) if there are words $u_0, \dots, u_n, \dots$ and $v_1, \dots, v_n, \dots$ such that:

  • For each $n \ge 1$, $u_n$ and $v_n$ are non-empty and start with distinct letters;

  • $S = \{u_0 v_1, \dots, u_0 \cdots u_n v_{n+1}, \dots\}$.

The intuition is that you can put all those words on an infinite rooted tree (the ■ is the root, the ▲ are the leaves, and the • are the remaining interior nodes) of the following shape such that the words in $S$ are exactly the labels of paths from the root to a leaf:

   u₀    u₁    u₂
■-----•-----•-----•⋅⋅⋅
      |     |     |
      | v₁  | v₂  | v₃
      |     |     |
      ▲     ▲     ▲

Proposition 1.1: A nicely infinite prefix-free set is prefix-free.

Proof of proposition 1.1: Suppose that $u_0 \cdots u_n v_{n+1}$ is a strict prefix of $u_0 \cdots u_m v_{m+1}$. There are two cases:

  • If $n < m$ then $v_{n+1}$ is a prefix of $u_{n+1} \cdots u_m v_{m+1}$. This is impossible because $u_{n+1}$ and $v_{n+1}$ have distinct first letters.

  • If $n > m$ then $u_{m+1} \cdots u_n v_{n+1}$ is a prefix of $v_{m+1}$. This is impossible because $u_{m+1}$ and $v_{m+1}$ have distinct first letters.

Proposition 1.2: A nicely infinite prefix-free set is infinite.

Proof of proposition 1.2: In the proof of proposition 1.1, we showed that if $n \neq m$ then $u_0 \cdots u_n v_{n+1}$ and $u_0 \cdots u_m v_{m+1}$ are not comparable for the prefix order. They are therefore not equal.
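As a small sanity check of Definition 1 and propositions 1.1 and 1.2, the sketch below builds the first few words $u_0 \cdots u_n v_{n+1}$ from finite lists and verifies that they are distinct and pairwise incomparable. The function name and the list encoding (`us = [u_0, ..., u_k]`, `vs = [v_1, ..., v_{k+1}]`) are assumptions of this sketch.

```python
from itertools import combinations

def nice_words(us, vs):
    """Return [u_0 v_1, u_0 u_1 v_2, ..., u_0...u_k v_{k+1}].

    us = [u_0, ..., u_k], vs = [v_1, ..., v_{k+1}] (hypothetical encoding).
    Checks the side conditions of Definition 1 on the given finite prefix.
    """
    assert len(us) == len(vs)
    assert all(vs), "every v_n must be non-empty"
    for n in range(1, len(us)):
        u_n, v_n = us[n], vs[n - 1]
        assert u_n, "u_n must be non-empty for n >= 1"
        assert u_n[0] != v_n[0], "u_n and v_n must start with distinct letters"
    words, spine = [], ""
    for u, v in zip(us, vs):
        spine += u                  # spine = u_0 ... u_n
        words.append(spine + v)     # leaf word u_0 ... u_n v_{n+1}
    return words

words = nice_words(["x", "ab", "a"], ["b", "c", "ba"])
print(words)  # ['xb', 'xabc', 'xababa']
print(all(not a.startswith(b) and not b.startswith(a)
          for a, b in combinations(words, 2)))
```

Taking `us = [""] + ["a"] * k` and `vs = ["b"] * (k + 1)` yields the first words of the familiar example $a^*b$.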


Main proof

Proposition 2: Any infinite prefix-free set contains a nicely infinite prefix-free set.

Proof below.

Proposition 3: A language contains an infinite prefix-free set if and only if it contains a nicely infinite prefix-free set.

Proof of proposition 3: ($\Rightarrow$) by proposition 2. ($\Leftarrow$) by propositions 1.1 and 1.2.

Proposition 4: The set of nicely infinite prefix-free subsets of a regular language (encoded as an infinite word $\overline{u_0}\,\widehat{v_1}\,\overline{u_1}\,\widehat{v_2}\,\overline{u_2} \cdots$) is $\omega$-regular (and the size of the Büchi automaton recognizing it is polynomial in the size of the NFA recognizing the regular language).

Proof below.

Theorem 5: Deciding if a regular language described by an NFA contains an infinite prefix-free subset can be done in time polynomial in the size of the NFA.

Proof of theorem 5: By proposition 3, it is sufficient to test if it contains a nicely infinite prefix-free subset, which can be done in polynomial time by building the Büchi automaton given by proposition 4 and testing the non-emptiness of its language (which can be done in time linear in the size of the Büchi automaton).


Proof of proposition 2

Lemma 2.1: If $S$ is a prefix-free set, then so is $w^{-1}S$ (for any word $w$).

Proof 2.1: By definition.

Lemma 2.2: Let $S$ be an infinite set of words. Let $w := \mathrm{lcp}(S)$ be the longest prefix common to all words in $S$. Then $S$ and $w^{-1}S$ have the same cardinality.

Proof 2.2: Define $f : w^{-1}S \to S$ by $f(x) = wx$. It is well defined by definition of $w^{-1}S$, injective by definition of $f$ and surjective by definition of $w$.
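The two operations used in these lemmas, $\mathrm{lcp}$ and the left quotient $w^{-1}S$, are easy to experiment with on finite samples. The sketch below is only an illustration on finite sets (the lemmas themselves concern infinite sets), and the function names are hypothetical; `os.path.commonprefix` from the Python standard library compares strings character by character.

```python
import os

def lcp(words):
    """Longest prefix common to all the given words."""
    return os.path.commonprefix(list(words))

def left_quotient(w, words):
    """w^{-1} S = { x : w x in S }, computed on a finite sample S."""
    return {x[len(w):] for x in words if x.startswith(w)}

sample = {"abca", "abd", "abcb"}
w = lcp(sample)
quotient = left_quotient(w, sample)
print(w)                                 # ab
print(sorted(quotient))                  # ['ca', 'cb', 'd']
print(len(quotient) == len(sample))      # same cardinality, as in Lemma 2.2
print(lcp(quotient) == "")               # no common non-empty prefix remains
```

Stripping the common prefix this way is exactly what Remark 2.3 below uses to restore property (P4).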

Proof of proposition 2: We build $u_n$ and $v_n$ by induction on $n$, with the induction hypothesis $H_n$ composed of the following parts:

  • (P1) For all $k \in \{1, \dots, n\}$, $u_0 \cdots u_{k-1} v_k \in S$;

  • (P2) For all $k \in \{1, \dots, n\}$, $u_k$ and $v_k$ are non-empty and start with distinct letters;

  • (P3) $S_n := (u_0 \cdots u_n)^{-1} S$ is infinite;

  • (P4) There is no non-empty prefix common to all words in $S_n$. In other words: there is no letter $a$ such that $S_n \subseteq a\Sigma^*$.

Remark 2.3: If we have sequences that verify $H_n$ without (P4), we can modify $u_n$ to make them also satisfy (P4). Indeed, it suffices to replace $u_n$ by $u_n\,\mathrm{lcp}(S_n)$. (P1) is unaffected. (P2) is trivial. (P4) holds by construction. (P3) holds by lemma 2.2.

We now build the sequences by induction on n:

  • Initialization: $H_0$ is true by taking $u_0 := \mathrm{lcp}(S)$ (i.e. by taking $u_0 := \varepsilon$ and applying remark 2.3).

  • Induction step: Suppose that we have words $u_0, \dots, u_n$ and $v_1, \dots, v_n$ such that $H_n$ holds for some $n$. We will build $u_{n+1}$ and $v_{n+1}$ such that $H_{n+1}$ holds.

Since $S_n$ is infinite and prefix-free (by lemma 2.1), it does not contain $\varepsilon$, so that $S_n = \bigsqcup_{a \in \Sigma} (S_n \cap a\Sigma^*)$. Since $S_n$ is infinite, there is a letter $a$ such that $S_n \cap a\Sigma^*$ is infinite. By (P4), there is a letter $b$ distinct from $a$ such that $S_n \cap b\Sigma^*$ is non-empty. Pick $v_{n+1} \in S_n \cap b\Sigma^*$. Taking $u_{n+1}$ to be $a$ would satisfy (P1), (P2) and (P3), so we apply remark 2.3 to get (P4): $u_{n+1} := a\,\mathrm{lcp}(a^{-1}S_n)$.

(P1) $u_0 \cdots u_n v_{n+1} \in u_0 \cdots u_n (S_n \cap b\Sigma^*) \subseteq S$.

(P2) By definition of $u_{n+1}$ and $v_{n+1}$.

(P3) $a^{-1}S_n$ is infinite by definition of $a$, and $S_{n+1}$ is therefore infinite by lemma 2.2.

(P4) By definition of $u_{n+1}$.


Proof of proposition 4

Proof of proposition 4: Let $A = (Q, \Sigma, \Delta, q_0, F)$ be an NFA.

The idea is the following: we read $u_0$, remember where we are, read $v_1$, backtrack to where we were after reading $u_0$, read $u_1$, remember where we are, ... We also remember the first letter that was read in each $v_n$ to ensure that $u_n$ starts with another letter.

I've been told that this could be easier with multi-head automata but I'm not really familiar with the formalism so I'll just describe it using a Büchi automaton (with only one head).

We set $\Sigma' := \overline{\Sigma} \cup \widehat{\Sigma}$, where the overlined symbols will be used to describe the $u_k$s and the symbols with hats the $v_k$s.

We set $Q' := Q \times (\{\bot\} \cup (Q \times \Sigma))$, where:

  • $(q, \bot)$ means that you are reading some $u_n$;

  • $(q, (p, a))$ means that you finished reading some $u_n$ in the state $p$, that you are now reading $v_{n+1}$ that starts with an $a$, and that once you are done, you will go back to $p$ to read a $u_{n+1}$ that does not start with $a$.

We set $q_0' := (q_0, \bot)$ because we start by reading $u_0$.

We define $F'$ as $F \times (Q \times \Sigma)$.

The set of transitions is defined as follows:

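  • "$u_n$" For each transition $q \xrightarrow{a} q'$, add $(q, \bot) \xrightarrow{\overline{a}} (q', \bot)$;

  • "$u_n$ to $v_{n+1}$" For each transition $q \xrightarrow{a} q'$, add $(q, \bot) \xrightarrow{\widehat{a}} (q', (q, a))$;

  • "$v_n$" For each transition $q \xrightarrow{a} q'$, each state $p$ and each letter $b$, add $(q, (p, b)) \xrightarrow{\widehat{a}} (q', (p, b))$;

  • "$v_n$ to $u_n$" For each transition $p \xrightarrow{a} p'$, each final state $q$ and each letter $b$ distinct from $a$, add $(q, (p, b)) \xrightarrow{\overline{a}} (p', \bot)$;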
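The four transition rules can be generated mechanically. The sketch below only builds the transition set and does not test emptiness. It rests on several assumptions: the NFA is a dict `delta` from `(state, letter)` to a set of states, overlined and hatted letters are modelled as the tagged pairs `("bar", a)` and `("hat", a)`, `None` plays the role of the "reading some $u_n$" marker, and the last rule is read as requiring the state reached at the end of $v_n$ to be final. The function name is hypothetical.

```python
BOT = None  # marker for "currently reading some u_n" (encoding chosen for this sketch)

def build_product(delta, alphabet, accepting):
    """Transitions of the product automaton as (source, tagged_letter, target)."""
    states = {q for q, _ in delta} | {t for ts in delta.values() for t in ts}
    trans = set()
    for (q, a), targets in delta.items():
        for q2 in targets:
            # "u_n": keep reading overlined letters
            trans.add(((q, BOT), ("bar", a), (q2, BOT)))
            # "u_n to v_{n+1}": remember the branching state q and first letter a
            trans.add(((q, BOT), ("hat", a), (q2, (q, a))))
            # "v_n": keep reading hatted letters, memory (p, b) unchanged
            for p in states:
                for b in alphabet:
                    trans.add(((q, (p, b)), ("hat", a), (q2, (p, b))))
            # "v_n to u_n": v ended in a final state f; resume from the
            # remembered state q with a letter a different from b
            for f in accepting:
                for b in alphabet:
                    if b != a:
                        trans.add(((f, (q, b)), ("bar", a), (q2, BOT)))
    return trans

# NFA for a*b: state 0 loops on a, and b leads to the final state 1.
trans = build_product({(0, "a"): {0}, (0, "b"): {1}}, "ab", {1})
product_states = {s for s, _, _ in trans} | {t for _, _, t in trans}
print(len(product_states) <= 2 * (1 + 2 * 2))  # at most |Q| * (1 + |Q| * |Sigma|)
```

The number of product states is bounded by $|Q| \cdot (1 + |Q| \cdot |\Sigma|)$, which is the polynomial size bound stated in proposition 4.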

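Lemma 4.1: $\overline{u_0}\,\widehat{v_1}\,\overline{u_1}\,\widehat{v_2} \cdots \overline{u_n}\,\widehat{v_{n+1}} \cdots$ is accepted by $A'$ iff for each $n \ge 1$, $u_n$ and $v_n$ are non-empty and start with distinct letters, and for each $n \ge 0$, $u_0 \cdots u_n v_{n+1} \in L(A)$.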

Proof of lemma 4.1: Left to the reader.

Licensed under cc by-sa 3.0 with attribution required.