Definitions
Definition 1: Let S be a set of words. We say that S is nicely infinite prefix-free (made up name for the purpose of this answer) if there are words u0,…,un,… and v1,…,vn,… such that:
For each n≥1, un and vn are non-empty and start with distinct letters;
S={u0v1,…,u0…unvn+1,…}.
The intuition is that you can put all those words on an infinite rooted tree of the following shape (the ■ is the root, the ▲ are the leaves, and the • are the remaining interior nodes) such that the words in S are exactly the labels of paths from the root to a leaf:
   u₀    u₁    u₂
■-----•-----•-----•⋅⋅⋅
      |     |     |
      | v₁  | v₂  | v₃
      |     |     |
      ▲     ▲     ▲
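As a small sanity check of Definition 1, here is a minimal Python sketch that builds the first few words u0v1, u0u1v2, … from finite lists of blocks and tests whether they are prefix-free. The function names `nice_words` and `is_prefix_free` are made up for this illustration, and the caller is assumed to supply blocks that satisfy the conditions of the definition. The example uses u0=ε, un=a and vn=b, which gives words of a∗b.

```python
def nice_words(us, vs):
    """Yield u0 v1, u0 u1 v2, ... for us = [u0, u1, ...] and vs = [v1, v2, ...]."""
    prefix = ""
    for u, v in zip(us, vs):
        prefix += u          # prefix is now u0 ... un
        yield prefix + v     # the word u0 ... un v(n+1)

def is_prefix_free(words):
    """True if no word is a strict prefix of another one."""
    ordered = sorted(set(words))
    # In lexicographic order, a word that is a prefix of some other word
    # is also a prefix of its immediate successor, so adjacent pairs suffice.
    return not any(nxt.startswith(cur) for cur, nxt in zip(ordered, ordered[1:]))

us = [""] + ["a"] * 4        # u0 = empty word, u1 = ... = u4 = "a"
vs = ["b"] * 5               # v1 = ... = v5 = "b"
words = list(nice_words(us, vs))
print(words)                  # ['b', 'ab', 'aab', 'aaab', 'aaaab']
print(is_prefix_free(words))  # True
```

Only a finite truncation of the set can be checked this way; the general statement is the proposition that follows.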
Proposition 1.1: A nicely infinite prefix-free set is prefix-free.
Proof of proposition 1.1: Suppose that u0…unvn+1 is a strict prefix of u0…umvm+1. Then n≠m (a word is not a strict prefix of itself), so there are two cases:
If n<m then vn+1 is a prefix of un+1…umvm+1. This is impossible because un+1 and vn+1 have distinct first letters.
If n>m then um+1…unvn+1 is a prefix of vm+1. This is impossible because um+1 and vm+1 have distinct first letters.
Proposition 1.2: A nicely infinite prefix-free set is infinite.
Proof of proposition 1.2: In proof 1.1, we showed that if n≠m then u0…unvn+1 and u0…umvm+1 are not comparable for the prefix order. They are therefore not equal.
Main proof
Proposition 2: Any infinite prefix-free set contains a nicely infinite prefix-free set.
Proof below.
Proposition 3: A language contains an infinite prefix-free set if and only if it contains a nicely infinite prefix-free set.
Proof of proposition 3: ⇒ by proposition 2. ⇐ by propositions 1.1 and 1.2.
Proposition 4: The set of nicely infinite prefix-free subsets of a regular language (each encoded as an infinite word $\overline{u_0}\,\widehat{v_1}\,\overline{u_1}\,\widehat{v_2}\,\overline{u_2}\cdots$) is ω-regular (and the size of the Büchi automaton recognizing it is polynomial in the size of the NFA recognizing the regular language).
Proof below.
Theorem 5: Deciding if a regular language described by an NFA contains an infinite prefix-free subset can be done in time polynomial in the size of the NFA.
Proof of theorem 5: By proposition 3, it is sufficient to test whether the language contains a nicely infinite prefix-free subset. This can be done in polynomial time by building the Büchi automaton given by proposition 4 and testing the non-emptiness of its language (which takes time linear in the size of the Büchi automaton).
Proof of proposition 2
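Lemma 2.1: If S is a prefix-free set, then so is w−1S:={x∣wx∈S} (for any word w).
Proof 2.1: By definition: if x is a strict prefix of y with x,y∈w−1S, then wx is a strict prefix of wy with wx,wy∈S.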
Lemma 2.2: Let S be an infinite set of words. Let w:=lcp(S) be the longest prefix common to all words in S. Then S and w−1S have the same cardinality.
Proof 2.2: Define f:w−1S→S by f(x)=wx. It is well defined by definition of w−1S, injective by definition of f and surjective by definition of w.
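The two operations used in these lemmas are easy to experiment with. The following is a minimal Python sketch on a finite sample; the helper names `lcp` and `left_quotient` are chosen for this illustration. Lemma 2.2 is stated for infinite sets, but the bijection x↦wx from its proof can be observed on a finite set as well.

```python
import os

def lcp(words):
    """Longest prefix common to all words (empty string for an empty collection)."""
    return os.path.commonprefix(sorted(words))

def left_quotient(w, words):
    """The set w^{-1}S = {x : wx in S}."""
    return {x[len(w):] for x in words if x.startswith(w)}

S = {"abab", "abba", "abc"}      # a finite prefix-free sample
w = lcp(S)                       # "ab"
quotient = left_quotient(w, S)   # {"ab", "ba", "c"}

print(w, sorted(quotient))
print(len(quotient) == len(S))   # same number of words, as in Lemma 2.2
print(lcp(quotient) == "")       # no common non-empty prefix is left
```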
Proof of proposition 2: We build un and vn by induction on n, with the induction hypothesis Hn composed of the following parts:
(P1) For all k∈{1,…,n}, u0…uk−1vk∈S;
(P2) For all k∈{1,…,n}, uk and vk are non-empty and start with distinct letters;
(P3) Sn:=(u0…un)−1S is infinite;
(P4) There is no non-empty prefix common to all words in Sn. In other words: There is no letter a such that Sn⊆aΣ∗.
Remark 2.3: If we have sequences that satisfy Hn without (P4), we can modify un so that they also satisfy (P4). Indeed, it suffices to replace un by un·lcp(Sn). (P1) is unaffected. (P2) still holds, since appending letters to un changes neither its first letter nor its non-emptiness. (P4) holds by construction. (P3) holds by lemma 2.2.
We now build the sequences by induction on n:
Initialization: H0 is true by taking u0:=lcp(S) (i.e. by taking u0:=ε and applying remark 2.3).
Induction step: Suppose that we have words u0,…,un and v1,…,vn such that Hn holds for some n. We will build un+1 and vn+1 such that Hn+1 holds.
Since Sn is infinite and prefix-free (by lemma 2.1), it does not contain ε, so that Sn=⨆a∈Σ(Sn∩aΣ∗). Since Sn is infinite and Σ is finite, there is a letter a such that Sn∩aΣ∗ is infinite. By (P4), there is a letter b distinct from a such that Sn∩bΣ∗ is non-empty. Pick vn+1∈Sn∩bΣ∗. Taking un+1 to be a would satisfy (P1), (P2) and (P3), so we apply remark 2.3 to get (P4): un+1:=a·lcp(a−1Sn).
(P1) u0…unvn+1∈u0…un(Sn∩bΣ∗)⊆S.
(P2) By definition of un+1 and vn+1.
(P3) a−1Sn is infinite by definition of a, and Sn+1 is therefore infinite by lemma 2.2.
(P4) By definition of un+1.
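To see the induction step in action, here is a rough Python sketch that runs it on a finite sample standing in for Sn. This is only an approximation of the proof: the condition "Sn∩aΣ∗ is infinite" cannot be tested on a sample, so the sketch assumes that the letter starting the most sample words plays the role of a. The name `step` is hypothetical, and the sample is assumed to contain no empty word and at least two distinct first letters, as (P4) guarantees for Sn.

```python
import os
from collections import defaultdict

def step(sample):
    """One induction step on a finite stand-in for S_n.

    Returns (u_next, v_next, next_sample), mimicking u_{n+1}, v_{n+1}, S_{n+1}.
    """
    by_letter = defaultdict(set)
    for word in sample:
        by_letter[word[0]].add(word)
    letters = sorted(by_letter)
    # Assumption: "infinitely many words start with a" is approximated
    # by "the most sample words start with a".
    a = max(letters, key=lambda c: len(by_letter[c]))
    # (P4) provides another first letter b; pick a word v_{n+1} starting with b.
    b = next(c for c in letters if c != a)
    v_next = min(by_letter[b], key=lambda w: (len(w), w))
    # u_{n+1} := a . lcp(a^{-1} S_n), as in the proof.
    quotient = {w[1:] for w in by_letter[a]}
    tail = os.path.commonprefix(sorted(quotient))
    u_next = a + tail
    next_sample = {w[len(tail):] for w in quotient}
    return u_next, v_next, next_sample

sample = {"a" * k + "b" for k in range(6)}   # a finite piece of a*b
u1, v1, s1 = step(sample)
u2, v2, s2 = step(s1)
print(u1, v1, u2, v2)   # a b a b
```

On a finite sample the procedure eventually runs out of words, which is exactly why the proof needs (P3).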
Proof of proposition 4
Proof of proposition 4: Let A=(Q,→,Δ,q0,F) be an NFA.
The idea is the following: we read u0, remember where we are, read v1, backtrack to where we were after reading u0, read u1, remember where we are, ... We also remember the first letter that was read in each vn to ensure that un starts with another letter.
I've been told that this could be easier with multi-head automata but I'm not really familiar with the formalism so I'll just describe it using a Büchi automaton (with only one head).
We set $\Sigma' := \overline{\Sigma} \sqcup \widehat{\Sigma}$, where the overlined symbols will be used to describe the uk's and the symbols with hats the vk's.
We set Q′:=Q×({⊥}⊔(Q×Σ)), where:
(q,⊥) means that you are reading some un;
(q,(p,a)) means that you finished reading some un in the state p, that you are now reading vn+1 that starts with an a, and that once you are done, you will go back to p to read a un+1 that does not start with a.
We set q′0:=(q0,⊥) because we start by reading u0.
We define F′ as F×Q×Σ.
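The set of transitions →′ is defined as follows:
"un" For each transition $q \xrightarrow{a} q'$, add $(q,\bot) \xrightarrow{\overline{a}}{}' (q',\bot)$;
"un to vn+1" For each transition $q \xrightarrow{a} q'$, add $(q,\bot) \xrightarrow{\widehat{a}}{}' (q',(q,a))$;
"vn" For each transition $q \xrightarrow{a} q'$, each state p and each letter b, add $(q,(p,b)) \xrightarrow{\widehat{a}}{}' (q',(p,b))$;
"vn to un" For each final state q, each transition $p \xrightarrow{a} p'$ and each letter b distinct from a, add $(q,(p,b)) \xrightarrow{\overline{a}}{}' (p',\bot)$.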
Lemma 4.1: $\overline{u_0}\,\widehat{v_1}\,\overline{u_1}\,\widehat{v_2}\cdots\overline{u_n}\,\widehat{v_{n+1}}\cdots$ is accepted by A′ iff for each n≥1, un and vn are non-empty and start with distinct letters, and for each n≥0, u0…unvn+1∈L(A).
Proof of lemma 4.1: Left to the reader.
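The construction and the emptiness test can be sketched in Python as follows. This is a sketch under stated assumptions, not a verified implementation of the proposition. The NFA is assumed to be given as a set of `(q, a, q2)` triples, and the function name is made up. A switch from vn+1 back to un+1 is allowed only when the state reached at the end of vn+1 is final. Acceptance is checked on edges rather than through F′: a run counts as accepting when it takes infinitely many "vn to un" switches. This deviation is deliberate, so that a run staying inside a single vn forever is not accepted. The letters of Σ′ are left implicit because emptiness only depends on the graph.

```python
from collections import deque

def has_infinite_prefix_free_subset(transitions, q0, finals):
    """transitions: set of (q, a, q2) triples; q0: initial state; finals: set of states."""
    out = {}
    for q, a, q2 in transitions:
        out.setdefault(q, []).append((a, q2))

    def successors(state):
        """Yield (next_state, is_switch) pairs of the product automaton."""
        q, memo = state
        if memo is None:                        # reading some u_n
            for a, q2 in out.get(q, []):
                yield (q2, None), False         # "un"
                yield (q2, (q, a)), False       # "un to vn+1": remember q and a
        else:                                   # reading some v_{n+1}
            p, b = memo
            for _, q2 in out.get(q, []):
                yield (q2, memo), False         # "vn"
            if q in finals:                     # u0 ... un v_{n+1} is in L(A)
                for a, p2 in out.get(p, []):
                    if a != b:                  # u_{n+1} must not start with b
                        yield (p2, None), True  # "vn to un"

    def reachable(start):
        seen = {start}
        queue = deque([start])
        while queue:
            s = queue.popleft()
            for t, _ in successors(s):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
        return seen

    # Non-empty iff some reachable switch edge s -> t lies on a cycle.
    for s in reachable((q0, None)):
        for t, is_switch in successors(s):
            if is_switch and s in reachable(t):
                return True
    return False

a_star_b = {(0, "a", 0), (0, "b", 1)}
a_star = {(0, "a", 0)}
print(has_infinite_prefix_free_subset(a_star_b, 0, {1}))  # True
print(has_infinite_prefix_free_subset(a_star, 0, {0}))    # False
```

The reachability search is repeated for each switch edge, which keeps the sketch short and still polynomial; a decomposition into strongly connected components would bring the emptiness test down to linear time in the size of the product.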