More on Parsing Expression Languages

2008/03/13

Back to PELs (Ford, 2004) with a small comparison with Church Rosser languages (McNaughton et al., 1988) this time. Both enjoy a linear time membership problem, include some context-sensitive languages and the full class of DCFLs, and are closed under complement.

I had wondered for some time whether all PELs were Church Rosser, but I just found out that the following Gladkij language was not growing context-sensitive and thus not Church Rosser (Buntrock and Otto, 1998):

Gladkij = { wcwRcw | w ∈ {a,b}* }.

We can however recognize Gladkij with a PEG since it is the intersection of two deterministic context-free languages

1 = { wcwRcw' | w,w' ∈ {a,b}* }
2 = { w'cwcwR | w,w' ∈ {a,b}* }.

Since PELs are closed under intersection and include all the DCFLs, it follows that Gladkij is a parsing expression language. The following PEG recognizes Gladkij (I'm not too fond of predicates, so I prefer to stick with the ordered choice operator):

S ← ((W' c W / ε) Z / ε) W c W'
W' ← a W' / b W' / ε
W ← a W a / b W b / c
T ← a / b / c
Z ← T Z / T

The expression W' c W recognizes 2, and the expression W c W' recognizes 1. If an input string is in 2, then the expression (W' c W / ε) Z fails, thus the next alternative ε is used and the input string also has to be in 1 in order to be accepted by the grammar. If an input string is not in 2, then (W' c W / ε) Z succeeds and consumes the whole input through Z, therefore the grammar cannot accept the input.

I have currently no clue whether an inclusion of the Church Rosser languages in the class of parsing expression languages would hold.

0 comments: