Description
The lexer rejects _| when followed by any character other than _. It appears the lexer greedily enters the _|_ (bottom) path upon seeing _| without first checking that the third character is _. This means _ cannot be used as the left operand of | (disjunction) or || (logical OR) without inserting whitespace.
Affected: _|x for any x that is not _, including _||, _|1, _|b, _| 1.
Not affected: _ || x (space separates tokens), __|| (not bare _), _|_|| x (complete bottom before ||).
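For illustration, the expected behaviour can be sketched as a lookahead check: only commit to the bottom token when the full three-character sequence _|_ is present, and otherwise treat _ as the start of an identifier. This is a minimal sketch, not the actual CUE scanner code; the names tokenKind, tokIdent, tokBottom, isIdentPart, and scanUnderscore are hypothetical, and it assumes ASCII input with a simplified notion of identifier characters.

```go
package main

import "strings"

// tokenKind is a hypothetical token classification used only in this sketch.
type tokenKind string

const (
	tokIdent  tokenKind = "IDENT"  // an identifier, including the bare _
	tokBottom tokenKind = "BOTTOM" // the bottom literal _|_
)

// isIdentPart reports whether c may continue an identifier.
// Assumption: ASCII letters, digits, '_' and '$' only.
func isIdentPart(c byte) bool {
	return c == '_' || c == '$' ||
		(c >= 'a' && c <= 'z') ||
		(c >= 'A' && c <= 'Z') ||
		(c >= '0' && c <= '9')
}

// scanUnderscore classifies the token starting at src[pos], which the
// caller guarantees is '_'. It returns the token kind and its width in bytes.
func scanUnderscore(src string, pos int) (tokenKind, int) {
	// Check all three characters before entering the bottom path,
	// so that "_|" followed by anything else is not consumed here.
	if strings.HasPrefix(src[pos:], "_|_") {
		return tokBottom, 3
	}
	// Otherwise scan an identifier; a following '|' or "||" is left
	// for the next token.
	end := pos + 1
	for end < len(src) && isIdentPart(src[end]) {
		end++
	}
	return tokIdent, end - pos
}
```

With this check, `_|| true` and `_| 1` would yield an identifier token of width 1 followed by the operator, while `_|_ || true` would still yield bottom.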
Reproducer (cmd/testscript):
testscript <<'EOD'
exec cue fmt ok_space.cue
exec cue fmt ok_bottom.cue
exec cue fmt ok_ident.cue
exec cue fmt fail_or.cue
exec cue fmt fail_disj.cue
-- ok_space.cue --
a: _ || true
-- ok_bottom.cue --
a: _|_ || true
-- ok_ident.cue --
a: b_|| true
-- fail_or.cue --
a: _|| true
-- fail_disj.cue --
a: _| 1
EOD
ok_space.cue, ok_bottom.cue, and ok_ident.cue pass. fail_or.cue and fail_disj.cue fail with:
illegal token '_|'; expected '_'
cue version v0.15.4.
Of course, all these are expected to fail evaluation, but here we observe a failure much earlier during tokenization.
This bug was found by fuzzing with a BNFGen generative grammar. 87 out of 1,000,000 test programs (0.0087%) hit the bug. The mislexing can cause CUE programs to fail differently than in the example above, but the underlying problem is the same:
testscript <<'EOD'
exec cue fmt fail.cue
-- fail.cue --
""
: ( (
_|| 0XAF_eF)),
"""
""": 1
EOD
This fails with an additional spurious error:
illegal token '_|'; expected '_':
./fail.cue:4:3
expected ')', found 'EOF':
./fail.cue:7:7