Reading Notes in Formal Languages: 2008/03

More on Parsing Expression Languages

2008/03/13

Back to PELs (Ford, 2004) with a small comparison with Church Rosser languages (McNaughton et al., 1988) this time. Both enjoy a linear time membership problem, include some context-sensitive languages and the full class of DCFLs, and are closed under complement.

I had wondered for some time whether all PELs were Church Rosser, but I just found out that the following Gladkij language was not growing context-sensitive and thus not Church Rosser (Buntrock and Otto, 1998):

ℒ_Gladkij = { wcw^Rcw | w ∈ {a,b}^* }.

We can however recognize ℒ_Gladkij with a PEG since it is the intersection of two deterministic context-free languages

ℒ₁ = { wcw^Rcw' | w,w' ∈ {a,b}^* }
ℒ₂ = { w'cwcw^R | w,w' ∈ {a,b}^* }.

Since PELs are closed under intersection and include all the DCFLs, it follows that ℒ_Gladkij is a parsing expression language. The following PEG recognizes ℒ_Gladkij (I'm not too fond of predicates, so I prefer to stick with the ordered choice operator):

S ← ((W' c W / ε) Z / ε) W c W'
W' ← a W' / b W' / ε
W ← a W a / b W b / c
T ← a / b / c
Z ← T Z / T

The expression W' c W recognizes ℒ₂, and the expression W c W' recognizes ℒ₁. If an input string is in ℒ₂, then the expression (W' c W / ε) Z fails, thus the next alternative ε is used and the input string also has to be in ℒ₁ in order to be accepted by the grammar. If an input string is not in ℒ₂, then (W' c W / ε) Z succeeds and consumes the whole input through Z, therefore the grammar cannot accept the input.

I have currently no clue whether an inclusion of the Church Rosser languages in the class of parsing expression languages would hold.

A Note on Parsing Expression Languages

2008/03/11

Thanks to a recent excursion into the realm of RCGs, I have found the answer to another comparison with the boolean closure of some context-free languages: the comparison with PEL (Ford, 2004).

Parsing expression languages are closed under union, intersection, and complement, and properly include DCFLs (Birman and Ullman, 1973). Thus a comparison with the boolean closure of DCFLs is of some interest (on the other hand, it is widely conjectured that they do not capture all the CFLs). Using again the results of Kutrib et al. (2004, Theorem 17), we know that the language

ℒ_exp = { a^2ⁿ | n ≥ 1 },

which is not semilinear, is not in Bool(CFL) and thus not in Bool(DCFL).

But we can generate this language with the PEG

S ← a S a / a a

(see Exercise 6.1.1 by Aho and Ullman, 1972). Thus the inclusion of Bool(DCFL) in the set of parsing expression languages is proper.

Notes on 1-Range Concatenation Grammars

2008/03/07

Some quick notes on 1-RCGs after a first skim through Boullier (2000)'s paper.

Boolean Closure of Context-Free Languages

A striking aspect of 1-RCGs is how close to defining exactly the boolean closure of CFLs (Bool(CFL)) they seem to be. Thanks to their closure by intersection and complement and their inclusion of CFLs, 1-RCLs (actually proper 1-RCLs) clearly include Bool(CFL).

The inclusion is however proper. Using the results of Kutrib et al. (2004, Theorem 17), we can exploit the semilinearity of some languages in Bool(CFL) to show that the language

ℒ_exp = { aⁿb^2ⁿ | 0 ≤ n }.

is not in Bool(CFL).

Nevertheless, I think we can generate this language with the 1-RCG

S(aXYZ) → S(XY) S(XZ) A(X) B(YZ)
S(b) → ε
A(aX) → A(X)
A(ε) → ε
B(bXb) → B(X)
B(bb) → ε

This grammar is not a proper 1-RCG: the calls to S(XY) and S(XZ) use two variables in a right-hand side predicate, without any clear way to obtain the same effect with only one variable. The class of languages generated by proper 1-RCGs would probably deserve to be further compared to Bool(CFL).

Tree Adjoining Languages

Boullier (2000) indicates that it is unknown, albeit very unlikely, whether any tree adjoining language could be expressed using a proper 1-RCG. I am really not sure about this, but I wondered whether the language

ℒ_palcopy = { ww^Rww^R | w ∈ {a,b}^* }

could not be a good candidate for a counter-example.

Reading Notes in Formal Languages