P-expressions
Tony Garnock-Jones tonyg@leastfixedpoint.com
May 2024. Version 0.3.2.
Experimental. This document defines a grammar called Preserves Expressions (P-expressions, pexprs) that includes ordinary Preserves text syntax but offers extensions sufficient to support a Lisp- or Haskell-like programming notation.
Motivation. The text syntax for Preserves works well for writing
Value
s, i.e. data. However, in some contexts, Preserves applications
need a broader grammar that allows interleaving of expressions with
data. Two examples are the Preserves Schema
language and the Synit configuration scripting
language, both of
which (ab)use Preserves text syntax as a kind of programming notation.
Preliminaries
The P-expression grammar includes by reference the definition of Atom
from the
text syntax, as well as the definitions that Atom
depends on.
ws = *(%x20 / %x09 / CR / LF)
No changes to the Preserves semantic model are made. Every Preserves text-syntax term can be parsed as a valid P-expression, but in general P-expressions must be rewritten or otherwise interpreted before a meaningful Preserves value can be arrived at (see below).
Grammar
Standalone documents containing P-expressions are sequences of
individual Expr
s, followed by annotations, comments, and/or whitespace.
Document = *Expr Trailer ws
A single P-expression Expr
can be an Atom
from the text syntax,
a compound expression, special punctuation, an Embedded
expression, or
an Annotated
expression. The class SimpleExpr
includes all of Expr
except special punctuation.
Expr = ws (SimpleExpr / Punct)
SimpleExpr = Atom / Compound / Embedded / Annotated
Embedded and annotated values are as in the text syntax, differing only
in that uses of Value
are replaced with SimpleExpr
.
Embedded = "#:" SimpleExpr
Annotated = Annotation SimpleExpr
Annotation = "@" SimpleExpr / "#" [(%x20 / %x09/ %x21) linecomment] (CR / LF)
linecomment = *<any unicode scalar value except CR or LF>
P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.1
Punct = "," / ";" / 1*":"
Compound expressions are sequences of Expr
s with optional trailing
Annotation
s, surrounded by various kinds of parentheses.
Compound = Sequence / Record / Block / Group / Set
Sequence = "[" *Expr Trailer ws "]"
Record = "<" *Expr Trailer ws ">"
Block = "{" *Expr Trailer ws "}"
Group = "(" *Expr Trailer ws ")"
Set = "#{" *Expr Trailer ws "}"
In an Annotated
P-expression, annotations and comments attach to the
term following them, just as in the ordinary text syntax. However, it is
common in programming notations to allow comments at the end of a file
or other sequential construct. The ordinary text syntax forbids comments
in these positions, but P-expressions allow them.
Trailer = *(ws Annotation)
Encoding P-expressions as Preserves
We write ⌜p⌝ for the encoding into Preserves of a P-expression p.
⌜·⌝ : Expr | ⟶ | Value | |
⌜[ p …] ⌝ |
= | [ ⌜p⌝ …] |
|
⌜< p …> ⌝ |
= | <r ⌜p⌝ …> |
|
⌜{ p …} ⌝ |
= | <b ⌜p⌝ …> |
|
⌜( p …) ⌝ |
= | <g ⌜p⌝ …> |
|
⌜#{ p …} ⌝ |
= | <s ⌜p⌝ …> |
|
⌜#: p⌝ |
= | #: ⌜p⌝ |
|
⌜@ p q⌝ |
= | @ ⌜p⌝ ⌜q⌝ |
|
⌜p⌝ | = | p | when p ∈ Atom |
⌜, ⌝ |
= | <p ','> |
|
⌜; ⌝ |
= | <p ';'> |
|
⌜: …⌝ |
= | <p ': …'> |
|
⌜t⌝ | = | @ ⌜a⌝ … <a> |
where @ a … are the annotations in t and t ∈ Trailer |
The record <a>
acts as an anchor for the annotations in a Trailer
.
We overload the ⌜·⌝ notation for encoding whole Document
s into
sequences of Preserves values.
⌜·⌝ : P-expression Document | ⟶ | Preserves Sequence |
⌜p …⌝ | = | [ ⌜p⌝ …] |
⌜p … @ a …⌝ |
= | [ ⌜p⌝ … @ ⌜a⌝ … <a>] |
where @ a … are trailing annotations |
Interpreting P-expressions as Preserves
The previous section discussed ways of representing P-expressions using Preserves. Here, we discuss interpreting P-expressions as Preserves so that (1) a Preserves datum (2) written using Preserves text syntax and then (3) read as a P-expression can be (4) interpreted from that P-expression to yield the original datum.
- Every
(
..)
or;
that appears is an error. - Every
:
,::
,:::
, … is an error, except in context ofBlock
s as described below. - Every
,
that appears is discarded. - Every
Trailer
that appears is an error.2 - Every
Record
with no values in it is an error. - Every
Block
must contain zero or more repeating triplets ofSimpleExpr
,:
,SimpleExpr
. AnyBlock
not following this pattern is an error. EachBlock
following the pattern is translated to aDictionary
containing a key/value pair for each triplet. AnyBlock
with duplicate keys (under interpretation) is an error. - Every
Set
containing any duplicate expressions (under interpretation) is an error.
Appendix: Examples
Examples are given as pairs of P-expressions and their Preserves text-syntax encodings.
Individual P-expression Expr
s
⌜<date 1821 (lookup-month "February") 3>⌝
= <r date 1821 <g lookup-month "February"> 3>
⌜<>⌝
= <r>
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= <g begin <g println! <g + 1 2>> <g + 3 4>>
⌜()⌝
= <g>
⌜[() () ()]⌝
= [<g>, <g>, <g>]
⌜{
setUp();
# Now enter the loop
loop: {
greet("World");
}
tearDown();
}⌝
= <b
setUp <g> <p ';'>
# Now enter the loop
loop <p ':'> <b
greet <g "World"> <p ';'>
>
tearDown <g> <p ';'>
>
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #:remote, bar]⌝
= [1 + 2.0 <p ','> print "Hello" <p ','> predicate <p ':'> #t <p ','>
foo <p ','> #:remote <p ','> bar]
⌜#{1 2 3}⌝
= <s 1 2 3>
⌜#{(read) (read) (read)}⌝
= <s <g read> <g read> <g read>>
⌜{
optional name: string,
address: Address,
}⌝
= <b
optional name <p ':'> string <p ','>
address <p ':'> Address <p ','>
>
Whole Document
s
⌜#!/example/of/a/shebang
{
key: value
# example of a comment at the end of a dictionary
}
# example of a comment at the end of the input file⌝
= @<r interpreter "/example/of/a/shebang">
[ <b
key <p ':'> value
@"example of a comment at the end of a dictionary" <a>
>
@"example of a comment at the end of the input file"
<a>
]
Appendix: Reading vs. Parsing
Lisp systems first read streams of bytes into S-expressions and then parse those S-expressions into more abstract structures denoting various kinds of program syntax. Separation of reading from parsing is what gives Lisp its syntactic flexibility.
Similarly, the Apple programming language Dylan included a reader-parser split, with the Dylan reader producing D-expressions that are somewhat similar to P-expressions.
Finally, the Racket dialects Honu and Something use a reader-parser-macro setup, where the reader produces Racket data, the parser produces “syntax” and is user-extensible, and Racket’s own modular macro system rewrites this “syntax” down to core forms to be compiled to machine code.
Similarly, when using P-expressions as the foundation for a language, a generic P-expression reader can then feed into special-purpose parsers. The reader captures the coarse syntactic structure of a program, and the parser refines this.
Often, a parser will wish to extract structure from sequences of
P-expression Expr
s.
-
A simple technique is repeated splitting of sequences; first by
Semicolon
, then byComma
, then by increasingly high binding-power operators. -
More refined is to use a Pratt parser or similar (1, 2, 3) to build a parse tree using an extensible specification of the pre-, in-, and postfix operators involved.
-
Finally, if you treat sequences of
Expr
s as pre-lexed token streams, almost any parsing formalism (such as PEG parsing, Ometa, etc.) can be used to extract further syntactic structure.
Appendix: Equations for interpreting P-expressions as Preserves
The function uncomma(p) removes all occurrences of ,
from a
P-expression p ∈ Expr
− {,
}.
uncomma : Expr − {, } |
⟶ | Expr | |
uncomma([ p …] ) |
= | [ uncomma(p) …] |
omitting any p = , |
uncomma(< p …> ) |
= | < uncomma(p) …> |
omitting any p = , |
uncomma({ p …} ) |
= | { uncomma(p) …} |
omitting any p = , |
uncomma(( p …) ) |
= | ( uncomma(p) …) |
omitting any p = , |
uncomma(#{ p …} ) |
= | #{ uncomma(p) …} |
omitting any p = , |
uncomma(#: p) |
= | #: uncomma(p) |
|
uncomma(@ p q) |
= | @ uncomma(p) uncomma(q) |
|
uncomma(p) | = | p | if p ∈ Atom ∪ Punct − {, } |
We write ⌞uncomma(p)⌟ for the partial function mapping a
P-expression p ∈ Expr
− {,
} to a corresponding Preserves Value
.
⌞·⌟ : Expr − {, } |
⇀ | Value | |
⌞[ p …] ⌟ |
= | [ ⌞p⌟ …] |
|
⌞< ℓ p …> ⌟ |
= | < ⌞ℓ⌟ ⌞p⌟ …> |
|
⌞{ k: v …} ⌟ |
= | { ⌞k⌟: ⌞v⌟ …} |
if all ⌞k⌟ … are distinct |
⌞#{ p …} ⌟ |
= | #{ ⌞p⌟ …} |
if all ⌞p⌟ … are distinct |
⌞#: p⌟ |
= | #: ⌞p⌟ |
|
⌞@ p q⌟ |
= | @ ⌞p⌟ ⌞q⌟ |
|
⌞p⌟ | = | p | when p ∈ Atom |
Notes
-
Colon matching is greedy: when reading, all adjacent colons are always taken into a single token, and when writing, adjacent colon-sequence punctuation marks must be written with whitespace separating them. ↩
-
Implementation note. When implementing parsing of P-expressions into Preserves, consider offering an optional mode where trailing annotations
Trailer
are discarded instead of causing an error to be signalled. ↩