Does a given regular language contain an infinite prefix-free subset?

A set of words over a finite alphabet is prefix-free if there are no two distinct words in it such that one is a prefix of the other.

Question:

What is the complexity of checking whether a regular language given as an NFA contains an infinite prefix-free subset?

Answer (due to Mikhail Rudoy here): it can be done in polynomial time, and I think even in NL.

To reformulate Mikhail's answer, let $(\Sigma,Q,q_0,F,\delta)$ be the input NFA in normal form (no epsilon transitions, trimmed), and let $L[p,r]$ (resp. $L[p,R]$) be the language obtained by taking state $p$ as the initial state and state $r$ as the only final state (resp. $p$ as the initial state and the set $R$ as the set of final states). For a word $u$, let $u^\omega$ be the infinite word obtained by iterating $u$.

The following are equivalent:

1. The language $L[q_0,F]$ contains an infinite prefix-free subset.
2. $\exists q \in Q$, $\exists u \in L[q,q]\smallsetminus\{\varepsilon\}$, $\exists v \in L[q,F]$ such that $v$ is not a prefix of $u^\omega$.
3. $\exists q \in Q$ such that $L[q,q] \neq \{\varepsilon\}$ and $\forall u \in L[q,q]$, $\exists v \in L[q,F]$ such that $v$ is not a prefix of $u^\omega$.

Proof:

3 $\Rightarrow$ 2 is trivial: since $L[q,q] \neq \{\varepsilon\}$, we can pick a nonempty $u \in L[q,q]$.

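For 2 $\Rightarrow$ 1, it suffices to see that for any $w \in L[q_0,q]$ (which exists since the automaton is trimmed), $w (u^{|v|})^* v$ is an infinite prefix-free subset of $L[q_0,F]$. Indeed, a longer word cannot be a prefix of a shorter one, and if $w u^{i|v|} v$ were a prefix of $w u^{j|v|} v$ with $i < j$, then $v$ would be a prefix of $u^{(j-i)|v|}$ (which has length at least $|v|$), hence a prefix of $u^\omega$.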

Finally, 1 $\Rightarrow$ 3 is the "correctness" proof in Mikhail's answer.
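The family $w (u^{|v|})^* v$ from the proof of 2 $\Rightarrow$ 1 can be sanity-checked on a finite truncation. The following Python sketch is only an illustration: the helper names `is_prefix_free` and `pumped_family` are hypothetical, words are Python strings, and only finitely many words are generated. It relies on the fact that in lexicographic order, a word that is a prefix of some later word is also a prefix of its immediate successor, so checking adjacent pairs is enough.

```python
def is_prefix_free(words):
    """True iff no word in the finite collection is a prefix of a different word."""
    ordered = sorted(set(words))
    # In lexicographic order, if x is a prefix of some later word y, then x is
    # also a prefix of the word right after it, so adjacent pairs suffice.
    return all(not nxt.startswith(cur) for cur, nxt in zip(ordered, ordered[1:]))


def pumped_family(w, u, v, count):
    """The first `count` words of w (u^{|v|})^* v, as in the proof of 2 => 1."""
    block = u * len(v)
    return [w + block * k + v for k in range(count)]


# u = "ab" labels a loop, and v = "b" is not a prefix of u^omega = ababab...
print(is_prefix_free(pumped_family("c", "ab", "b", 6)))  # True
# v = "a" IS a prefix of u^omega, and the family is not prefix-free.
print(is_prefix_free(pumped_family("c", "ab", "a", 3)))  # False
```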

— Googlo

Answers:

Your problem can be solved in polynomial time.

To begin, convert the given NFA to an equivalent NFA with the following additional properties:

There are no epsilon transitions
All states are reachable from the start state

Helpful subroutine

Suppose we have an NFA $N$ , a state $q$ , and a nonempty string $s$ . The following subroutine will let us evaluate the truth value of the following statement: "every path in $N$ from state $q$ to an accept state corresponds to a string that is a prefix of string $s^n$ for some $n$ ." Furthermore, this subroutine will run in polynomial time.

First, construct the NFA $S$ with $|s| + 1$ states which accepts all strings that are not prefixes of $s^n$ for any $n$ ( $|s|$ non-accept states in a loop to keep track of where in the "pattern" of $sssss\ldots$ we are so far, and one accept state for if we have already deviated from that pattern). Next, construct the NFA $N'$ which is exactly like $N$ but has $q$ as its start state. Finally, construct a final NFA $N''$ whose language $L(N'')$ is $L(S) \cap L(N')$ using the standard NFA intersection construction. Note that all of these constructions are polynomial in the size of the input.
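A minimal Python sketch of the automaton $S$, assuming words are Python strings: the states $0,\dots,|s|-1$ are the positions in the pattern $sss\ldots$, and an extra state, here called `DEVIATED` (a hypothetical name), is the single accept state. Since each state has exactly one successor per letter, $S$ is in fact a DFA. The names `step_s` and `s_accepts` are also made up for this illustration.

```python
DEVIATED = -1  # the single accept state of S: the input has left the pattern sss...


def step_s(state, letter, s):
    """One transition of S; states 0..|s|-1 record the current position in sss..."""
    if state == DEVIATED or letter != s[state]:
        return DEVIATED  # DEVIATED is absorbing
    return (state + 1) % len(s)


def s_accepts(t, s):
    """True iff S accepts t, i.e. t is not a prefix of s^n for any n."""
    state = 0
    for letter in t:
        state = step_s(state, letter, s)
    return state == DEVIATED


print(s_accepts("abab", "ab"))  # False: "abab" = s^2
print(s_accepts("abb", "ab"))   # True: deviates at the third letter
```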

Then simply test whether the language of $N''$ is empty (which can be done in polynomial time with a simple graph search). $L(N'') = \emptyset$ if and only if $L(S) \cap L(N') = \emptyset$, or in other words no string in $L(N')$ is in $L(S)$. Equivalently, the language of $N''$ is empty if and only if $N'$ accepts only strings that are prefixes of $s^n$ for some $n$. This is exactly the statement we were trying to evaluate: "every path in $N$ from state $q$ to an accept state corresponds to a string that is a prefix of string $s^n$ for some $n$."

Main algorithm

Consider the set of states in the NFA that are in some loop. For each such state, $q$ , do the following:

Let $P_2$ be any simple loop containing $q$, and let $s$ be the string corresponding to loop $P_2$. Since the NFA has no epsilon transitions, $s$ is not empty. Then apply the subroutine to the NFA, state $q$, and string $s$. If the subroutine tells us that every path starting at $q$ in the NFA and ending at an accept state corresponds to a prefix of $s^n$ for some $n$, then continue to the next state $q$. Otherwise, output that the given NFA's language contains an infinite prefix-free subset.

If we have tried every state $q$ that is in a loop and the algorithm has not produced an output, then output that the given NFA's language does not contain an infinite prefix-free subset.
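The subroutine and the main algorithm fit in a short Python sketch. Everything here is an illustrative assumption rather than part of the answer: the NFA is a dict mapping each state to a list of `(letter, next_state)` pairs (no epsilon transitions), the function names are hypothetical, and the "any simple loop" through $q$ is chosen as a shortest one found by breadth-first search. The subroutine explores the product of $N'$ with $S$ and reports whether an accepting state can be reached after deviating from the pattern, which is the emptiness test on $N''$.

```python
from collections import deque


def all_paths_are_prefixes(delta, q, accepting, s):
    """Subroutine: True iff every path from q to an accepting state spells a
    prefix of s^n for some n. Searches the product of N' (N started at q) with S;
    position len(s) plays the role of S's accept ('deviated') state."""
    dev = len(s)
    seen = {(q, 0)}
    queue = deque([(q, 0)])
    while queue:
        state, pos = queue.popleft()
        if pos == dev and state in accepting:
            return False  # L(N'') is nonempty: an accepted string leaves the pattern
        for letter, nxt in delta.get(state, ()):
            npos = dev if pos == dev or letter != s[pos] else (pos + 1) % len(s)
            if (nxt, npos) not in seen:
                seen.add((nxt, npos))
                queue.append((nxt, npos))
    return True


def shortest_loop_word(delta, q):
    """Label of a shortest nonempty loop q -> ... -> q (such a loop is simple), or None."""
    parent = {}  # state -> (predecessor, or None for q; letter on the edge into state)
    queue = deque()
    for letter, nxt in delta.get(q, ()):
        if nxt == q:
            return letter  # self-loop
        if nxt not in parent:
            parent[nxt] = (None, letter)
            queue.append(nxt)
    while queue:
        state = queue.popleft()
        for letter, nxt in delta.get(state, ()):
            if nxt == q:
                word = [letter]
                while state is not None:  # walk back to q, collecting edge labels
                    prev, edge = parent[state]
                    word.append(edge)
                    state = prev
                return "".join(reversed(word))
            if nxt not in parent:
                parent[nxt] = (state, letter)
                queue.append(nxt)
    return None


def reachable_from(delta, start):
    seen, queue = {start}, deque([start])
    while queue:
        for _, nxt in delta.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def has_infinite_prefix_free_subset(delta, start, accepting):
    """Main algorithm: try one loop per (reachable) state that lies on some loop."""
    for q in reachable_from(delta, start):
        s = shortest_loop_word(delta, q)
        if s is None:
            continue  # q is not on any loop
        if not all_paths_are_prefixes(delta, q, accepting, s):
            return True
    return False


# japh's example from the comments below: q -a-> q, q -b-> q, q -a-> T.
example = {"q": [("a", "q"), ("b", "q"), ("a", "T")]}
print(has_infinite_prefix_free_subset(example, "q", {"T"}))      # True
print(has_infinite_prefix_free_subset({0: [("a", 0)]}, 0, {0}))  # False: a*
```

Each call to the subroutine visits at most $|Q|\cdot(|s|+1)$ product states, and a shortest loop has $|s| \le |Q|$, so this sketch runs in polynomial time.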

Correctness (first half)

First, suppose that the above algorithm asserts that the given NFA's language contains an infinite prefix-free subset. Let's say that this output was produced while considering some loop $P_2$ and some state $q$. As before, $s$ is the string corresponding to $P_2$. Then we know from the subroutine that not every path starting at $q$ in the NFA and ending at an accept state corresponds to a prefix of $s^n$ for some $n$ (as this is the only output of the subroutine that would lead the main algorithm to produce an output at that $q$).

Let $P_3$ be a path whose existence is asserted by the subroutine: a path from $q$ to an accept state such that the corresponding string $t$ is not a prefix of $s^n$ for any $n$ .

Let $P_2'$ consist of $m$ copies of $P_2$, where $m$ is large enough that $m|s| > |t|$. Since $P_2$ is a loop through $q$, $P_2'$ can be treated as a path from $q$ to $q$. The string corresponding to $P_2'$ is $s^m$.

Let $P_1$ be a path from the start state to $q$ (which exists since every state is reachable from the start) and let $r$ be the string corresponding to this path.

Then for every $x \ge 0$, the path consisting of $P_1$, $x$ copies of $P_2'$, and $P_3$ is an accepting computation path. The string corresponding to this path is $r(s^m)^xt$. Thus, the NFA accepts every string of the form $r(s^m)^xt$. This is an infinite set of strings accepted by the NFA, and I claim that this set of strings is prefix-free. Since a longer string cannot be a prefix of a shorter one, suppose $r(s^m)^xt$ is a prefix of $r(s^m)^yt$ with $y > x$. In other words, $t$ is a prefix of $(s^m)^{y-x}t$. Since $(s^m)^{y-x}$ has length $m(y-x)|s| \ge m|s| > |t|$, this implies that $t$ is a prefix of $(s^m)^{y-x} = s^{m(y-x)}$. But we know by the output of the subroutine that $t$ is not a prefix of $s^n$ for any $n$. Thus, $r(s^m)^xt$ cannot be a prefix of $r(s^m)^yt$, and as desired the set of strings is prefix-free.

Thus, I have shown that if the main algorithm outputs that the given NFA's language contains an infinite prefix-free subset, then this is in fact the case.

Correctness (second half)

Next, I will show the other half: if the given NFA's language contains an infinite prefix-free subset, then the main algorithm will output this fact.

Suppose the given NFA's language contains an infinite prefix-free subset. Let $A$ be the set of (accepting) computation paths corresponding to these strings. Notice that $A$ is an infinite set of accepting computation paths whose corresponding strings are never prefixes of each other.

Say that a state is "looping" in the NFA if there exists a loop in the NFA through that state, and "non-looping" otherwise. Consider all paths from the start state to any looping state which pass only through non-looping states (except for the looping state where they end). Let $P$ be the set of these paths. No path $p \in P$ can contain a loop, as then the states in that loop would be looping states and so $p$ would pass through a looping state before its end. Thus, the lengths of paths in $P$ are bounded above by the number of states in the NFA, and so $P$ is finite (for example, if the start state is a looping state then the only such path is the empty path).

We can partition $A$ into $|P|+1$ subsets based on how the computation paths in $A$ start. In particular, for $p \in P$, let $A_p$ be the set of all computation paths in $A$ that start with path $p$, and let $B$ be the set of all other paths in $A$. Clearly, the sets $A_p$ and $B$ are pairwise disjoint and their union is the entire set $A$. Furthermore, $B$ contains only paths that never pass through a looping state, and therefore never loop; thus $B$ is finite. We can conclude that some $A_p$ must be infinite (otherwise $A$ would be a union of finitely many finite sets).

Since $A_p$ is infinite, there are infinitely many computation paths, none of whose strings are prefixes of each other, that are accepting paths starting with $p$ . Let $q$ be the state reached at the end of path $p$ . We can conclude that there are infinitely many accepting paths, call this set $A'$ , starting at $q$ all of which correspond to strings that are not prefixes of each other.

During the main algorithm, we run the subroutine on state $q$ (which is looping, being the endpoint of a path in $P$) and some string $s$. This subroutine tells us whether every accepting path starting at $q$ corresponds to a string that is a prefix of $s^n$ for some $n$. If this were the case, then the strings of all the infinitely many accepting paths in $A'$ would be prefixes of $s^n$ for various $n$; any two of them would then be prefixes of a common word $s^N$, so one of them would be a prefix of the other. This is not the case, so we conclude that when the main algorithm runs the subroutine on state $q$, the result is the other possible outcome. This, however, leads the main algorithm to output that the NFA's language contains an infinite prefix-free subset.

This concludes the proof of correctness.

— Mikhail Rudoy

I don't understand how the loop handling works, since a given state $q$ can be part of (exponentially) many loops. Of course, if any two of those loops can be used to generate a non-periodic sequence, then we are done.

— japh

What do you mean by loop handling? In the main algorithm, for each state $q$ you pick just one loop that goes through $q$ (any loop out of the potentially exponentially many) and call that loop $P_2$ (afterwards you run the subroutine on state $q$ and string $s$, where $s$ is the string associated with $P_2$). The subroutine essentially handles the check of whether it is possible to generate a non-periodic sequence using that loop. If yes, then we're done. If no (and furthermore no for every $q$), then your entire language is a union of periodic sequences, so we're also done.

— Mikhail Rudoy

To make my question clearer, here's a simple NFA with initial state $q$, final state $T$ and three transitions: $q \overset{a}{\rightarrow} q$, $q \overset{b}{\rightarrow} q$, $q \overset{a}{\rightarrow} T$. The loop for $a$ will not generate the prefix-free strings, but the loop for $b$ will.

— japh

Actually, the loop for $a$ does generate a prefix-free set: the strings of $a^*ba$ all use the $a$ loop. In my algorithm, if the loop you choose for $q$ is the $a$ loop, then the subroutine will determine that no, not every accepting path starting at $q$ has a string of the form $a^*$, and so the main algorithm will say that an infinite prefix-free subset exists. If the loop the algorithm uses for $q$ is instead the $b$ loop, then the subroutine determines that not every accepting path starting at $q$ has a string of the form $b^*$, and in this case too the algorithm has the same output.

— Mikhail Rudoy

Thank you Mikhail! I think your answer settles the question.

— Googlo

Definitions

Definition 1: Let $S$ be a set of words. We say that $S$ is nicely infinite prefix-free (a made-up name for the purpose of this answer) if there are words $u_0,\dots,u_n,\dots$ and $v_1,\dots,v_n,\dots$ such that:

For each $n\ge 1$ , $u_n$ and $v_n$ are non-empty and start with distinct letters;
$S=\{u_0v_1,\dots,u_0\dots u_n v_{n+1},\dots\}$ .

The intuition is that you can put all those words on an infinite rooted tree (the ■ is the root, the ▲ are the leaves, and the • are the remaining interior nodes) of the following shape such that the words in $S$ are exactly the labels of paths from the root to a leaf:

   u₀    u₁    u₂
■-----•-----•-----•⋅⋅⋅
      |     |     |
      | v₁  | v₂  | v₃
      |     |     |
      ▲     ▲     ▲

Proposition 1.1: A nicely infinite prefix-free set is prefix-free.

Proof of proposition 1.1: Suppose that $u_0\dots u_n v_{n+1}$ is a strict prefix of $u_0 \dots u_m v_{m+1}$ . There are two cases:

If $n < m$ then $v_{n+1}$ is a prefix of $u_{n+1}\dots u_m v_{m+1}$ . This is impossible because $u_{n+1}$ and $v_{n+1}$ have distinct first letters.
If $n > m$ then $u_{m+1}\dots u_n v_{n+1}$ is a prefix of $v_{m+1}$ . This is impossible because $u_{m+1}$ and $v_{m+1}$ have distinct first letters.

Proposition 1.2: A nicely infinite prefix-free set is infinite.

Proof of proposition 1.2: In the proof of proposition 1.1, we showed that if $n\not= m$ then neither of $u_0\dots u_n v_{n+1}$ and $u_0 \dots u_m v_{m+1}$ is a prefix of the other. They are therefore not equal, so the set has infinitely many distinct elements.
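A finite truncation of Definition 1 can be built and checked mechanically. In the Python sketch below (hypothetical names, words as strings), `us` holds $u_0,\dots,u_{N-1}$ and `vs` holds $v_1,\dots,v_N$; the example takes $u_0=\varepsilon$, $u_n=b$ and $v_n=a$, which gives words of $b^*a$. Propositions 1.1 and 1.2 predict that the truncation is prefix-free and has $N$ distinct words.

```python
def nice_family(us, vs):
    """Words u_0 v_1, u_0 u_1 v_2, ... of Definition 1, for finite lists
    us = [u_0, ..., u_{N-1}] and vs = [v_1, ..., v_N]."""
    words, prefix = [], ""
    for u, v in zip(us, vs):
        prefix += u  # prefix = u_0 ... u_n
        words.append(prefix + v)
    return words


def satisfies_definition(us, vs):
    """Within the truncation: every v_n is non-empty, and for n >= 1,
    u_n is non-empty and starts with a letter different from v_n's."""
    return all(vs) and all(
        us[n] and us[n][0] != vs[n - 1][0] for n in range(1, len(us))
    )


def pairwise_prefix_free(words):
    return all(not b.startswith(a) for a in words for b in words if a != b)


us, vs = ["", "b", "b", "b"], ["a", "a", "a", "a"]
words = nice_family(us, vs)
print(words)  # ['a', 'ba', 'bba', 'bbba']
print(satisfies_definition(us, vs), pairwise_prefix_free(words))  # True True
```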

Main proof

Proposition 2: Any infinite prefix-free set contains a nicely infinite prefix-free set.

Proof below.

Proposition 3: A language contains an infinite prefix-free set if and only if it contains a nicely infinite prefix-free set.

Proof of proposition 3: $\boxed{\Rightarrow}$ by proposition 2. $\boxed{\Leftarrow}$ by propositions 1.1 and 1.2.

Proposition 4: The set of nicely infinite prefix-free subsets of a regular language (each encoded as an infinite word $\overline{u_0}\widehat{v_1}\overline{u_1}\widehat{v_2}\overline{u_2}\dots$) is $\omega$-regular, and the Büchi automaton recognizing it has size polynomial in the size of the NFA recognizing the regular language.

Proof below.

Theorem 5: Deciding if a regular language described by a NFA contains an infinite prefix-free subset can be done in time polynomial in the size of the NFA.

Proof of theorem 5: By proposition 3, it suffices to test whether the language contains a nicely infinite prefix-free subset. This can be done in polynomial time by building the Büchi automaton given by proposition 4 and testing the non-emptiness of its language (which can be done in time linear in the size of the Büchi automaton).

Proof of proposition 2

Lemma 2.1: If $S$ is a prefix-free set, then so is $w^{-1}S$ (for any word $w$ ).

Proof 2.1: By definition.

Lemma 2.2: Let $S$ be an infinite set of words and let $w:=\operatorname{lcp}(S)$ be the longest prefix common to all words in $S$. Then $S$ and $w^{-1}S$ have the same cardinality.

Proof 2.2: Define $f:w^{-1}S\to S$ by $f(x)=wx$ . It is well defined by definition of $w^{-1}S$ , injective by definition of $f$ and surjective by definition of $w$ .
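Lemmas 2.1 and 2.2 only involve the longest common prefix and the left quotient, which are easy to compute on a finite sample. This Python sketch uses the standard-library function `os.path.commonprefix`, which compares a list of strings character by character; `lcp` and `left_quotient` are hypothetical helper names. The bijection argument of proof 2.2 does not use infiniteness, so a finite set is enough to illustrate it.

```python
import os


def lcp(words):
    """Longest common prefix of a finite nonempty set of words."""
    return os.path.commonprefix(list(words))


def left_quotient(w, words):
    """w^{-1} S = { x : w x in S }."""
    return {x[len(w):] for x in words if x.startswith(w)}


S = {"aab", "aba", "abb"}  # a prefix-free sample
w = lcp(S)
quotient = left_quotient(w, S)
print(w, sorted(quotient))          # a ['ab', 'ba', 'bb']
print({w + x for x in quotient} == S)  # True: x -> wx is a bijection onto S
```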

Proof of proposition 2: We build $u_n$ and $v_n$ by induction on $n$ , with the induction hypothesis $H_n$ composed of the following parts:

$(P_1)$ For all $k\in\{1,\dots,n\}$ , $u_0\dots u_{k-1} v_k \in S$ ;
$(P_2)$ For all $k\in\{1,\dots,n\}$ , $u_k$ and $v_k$ are non-empty and start with distinct letters;
$(P_3)$ $S_n:=(u_0\dots u_n)^{-1}S$ is infinite;
$(P_4)$ There is no non-empty prefix common to all words in $S_n$ . In other words: There is no letter $a$ such that $S_n\subseteq a\Sigma^*$ .

Remark 2.3: If we have sequences that satisfy $H_n$ except for $(P_4)$, we can modify $u_n$ so that they also satisfy $(P_4)$. Indeed, it suffices to replace $u_n$ by $u_n\operatorname{lcp}(S_n)$. $(P_1)$ is unaffected. $(P_2)$ is trivial. $(P_4)$ holds by construction. $(P_3)$ follows from lemma 2.2.

We now build the sequences by induction on $n$ :

Initialization: $H_0$ holds by taking $u_0:=\operatorname{lcp}(S)$ (i.e. by taking $u_0:=\varepsilon$ and applying remark 2.3).
Induction step: Suppose that we have words $u_0,\dots,u_n$ and $v_1,\dots,v_n$ satisfying $H_n$ for some $n$. We build $u_{n+1}$ and $v_{n+1}$ such that $H_{n+1}$ holds.

Since $S_n$ is infinite and prefix-free (by lemma 2.1), it does not contain $\varepsilon$ (which is a prefix of every word), so $S_n=\underset{a\in \Sigma}{\bigsqcup}(S_n\cap a\Sigma^*)$. Since $S_n$ is infinite and $\Sigma$ is finite, there is a letter $a$ such that $S_n\cap a\Sigma^*$ is infinite. By $(P_4)$, there is a letter $b$ distinct from $a$ such that $S_n\cap b\Sigma^*$ is non-empty. Pick $v_{n+1}\in S_n\cap b\Sigma^*$. Taking $u_{n+1}$ to be $a$ would satisfy $(P_1)$, $(P_2)$ and $(P_3)$, so we apply remark 2.3 to also get $(P_4)$: $u_{n+1}:=a\operatorname{lcp}(a^{-1}S_n)$.

$(P_1)$ $u_0\dots u_nv_{n+1}\in u_0\dots u_n(S_n\cap b\Sigma^*)\subseteq S$.

$(P_2)$ By definition of $u_{n+1}$ and $v_{n+1}$ .

$(P_3)$ $a^{-1}S_n$ is infinite by the choice of $a$, and $S_{n+1}$ is therefore infinite by lemma 2.2.

$(P_4)$ By definition of $u_{n+1}$ .

Proof of proposition 4

Proof of proposition 4: Let $A=(Q,\to,\Sigma,q_0,F)$ be an NFA.

The idea is the following: we read $u_0$ , remember where we are, read $v_1$ , backtrack to where we were after reading $u_0$ , read $u_1$ , remember where we are, ... We also remember the first letter that was read in each $v_n$ to ensure that $u_n$ starts with another letter.

I've been told that this could be easier with multi-head automata but I'm not really familiar with the formalism so I'll just describe it using a Büchi automaton (with only one head).

We set $\Sigma':=\overline{\Sigma}\sqcup\widehat{\Sigma}$, where the overlined symbols are used to describe the $u_k$s and the symbols with hats the $v_k$s.

We set $Q':=Q\times (\{\bot\}\sqcup (Q \times \Sigma))$ , where:

$(q,\bot)$ means that you are reading some $u_n$ ;
$(q,(p,a))$ means that you finished reading some $u_n$ in the state $p$ , that you are now reading $v_{n+1}$ that starts with an $a$ , and that once you are done, you will go back to $p$ to read a $u_{n+1}$ that does not start with $a$ .

We set $q_0':=(q_0,\bot)$ because we start by reading $u_0$ .

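For the acceptance condition, we require a run to visit both $Q\times\{\bot\}$ and $Q\times Q\times\Sigma$ infinitely often. The only transitions between these two parts are the switches from some $u_n$ to $v_{n+1}$ and from some $v_n$ to $u_n$, so this forces infinitely many switches, and every $u_n$ and $v_n$ is finite. This is a generalized Büchi condition with two sets, which can be turned into an ordinary Büchi condition by doubling the number of states. (Requiring only $F\times Q\times\Sigma$ to be visited infinitely often could also accept runs that read a single infinite $v_{n+1}$.)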

The set of transitions $\to'$ is defined as follows:

" $u_n$ " For each transition $q\overset{a}{\to}q'$, add $(q,\bot)\overset{\overline{a}}{\to'}(q',\bot)$;
" $u_n$ to $v_{n+1}$ " For each transition $q\overset{a}{\to}q'$, add $(q,\bot)\overset{\widehat{a}}{\to'}(q',(q,a))$;
" $v_n$ " For each transition $q\overset{c}{\to}q'$ and each $(p,a)\in Q\times\Sigma$, add $(q,(p,a))\overset{\widehat{c}}{\to'}(q',(p,a))$;
" $v_n$ to $u_n$ " For each final state $q\in F$, each transition $p\overset{a}{\to}p'$ and each letter $b$ distinct from $a$, add $(q,(p,b))\overset{\overline{a}}{\to'}(p',\bot)$.

Lemma 4.1: $\overline{u_0}\widehat{v_1}\overline{u_1}\widehat{v_2}\dots \overline{u_n}\widehat{v_{n+1}}\dots$ is accepted by $A'$ iff for each $n\ge 1$, $u_n$ and $v_n$ are non-empty and start with distinct letters, and for each $n\ge 0$, $u_0\dots u_n v_{n+1}\in L(A)$.

Proof of lemma 4.1: Left to the reader.
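The construction of $A'$ can be prototyped directly. The Python sketch below is a hypothetical illustration, not xavierm02's code: it generates the transitions of $A'$ on the fly and decides non-emptiness. It assumes two readings of the construction: the switch from $v_{n+1}$ back to $u_{n+1}$ is allowed only when the current state $q$ is final, and a run is accepting when it visits both a $(q,\bot)$ state and a $(q,(p,a))$ state infinitely often. Such a run exists exactly when some reachable $(\cdot,\bot)$ state lies on a cycle through some $(\cdot,(p,a))$ state. The NFA encoding (a dict from a state to a list of `(letter, next_state)` pairs) and all names are assumptions.

```python
from collections import deque

BOT = None  # second component ⊥: we are reading some u_n


def successors(delta, accepting, x):
    """Outgoing transitions of the state x of A'. Symbols are ('u', a) for an
    overlined letter and ('v', a) for a hatted one."""
    q, mode = x
    out = []
    if mode is BOT:
        for a, q2 in delta.get(q, ()):
            out.append((("u", a), (q2, BOT)))     # "u_n": keep reading u_n
            out.append((("v", a), (q2, (q, a))))  # "u_n to v_{n+1}": remember q and a
    else:
        p, b = mode
        for c, q2 in delta.get(q, ()):
            out.append((("v", c), (q2, mode)))    # "v_n": keep reading v_{n+1}
        if q in accepting:                        # u_0...u_n v_{n+1} is in L(A)
            for a, p2 in delta.get(p, ()):
                if a != b:                        # u_{n+1} must not start with b
                    out.append((("u", a), (p2, BOT)))  # "v_n to u_n": back to p
    return out


def reach(delta, accepting, src):
    """States of A' reachable from src (including src itself)."""
    seen, queue = {src}, deque([src])
    while queue:
        for _, y in successors(delta, accepting, queue.popleft()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def buchi_nonempty(delta, start, accepting):
    """Is there a reachable cycle through a ⊥-state x and a (p, a)-state y?"""
    for x in reach(delta, accepting, (start, BOT)):
        if x[1] is not BOT:
            continue
        for y in reach(delta, accepting, x):
            if y[1] is not BOT and x in reach(delta, accepting, y):
                return True
    return False


example = {"q": [("a", "q"), ("b", "q"), ("a", "T")]}
print(buchi_nonempty(example, "q", {"T"}))      # True
print(buchi_nonempty({0: [("a", 0)]}, 0, {0}))  # False: a*
```

The nested reachability searches are wasteful but keep the sketch short; they are still polynomial in $|Q'|$.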

— xavierm02