Preserves Schema
Tony Garnock-Jones tonyg@leastfixedpoint.com
August 2024. Version 0.4.1.
This document proposes a Schema language for the Preserves data model.
Introduction
A Preserves schema connects Preserves Values to host-language data
structures. Each definition within a schema can be processed by a
compiler to produce
- a simple host-language type definition;
- a partial parsing function from Values to instances of the produced type; and
- a total serialization function from instances of the type to Values.
Every parsed Value
retains enough information to always be able to
be serialized again, and every instance of a host-language data
structure contains, by construction, enough information to be
successfully serialized.
Portability. Preserves Schema is broadly portable. Any host-language type system that can represent algebraic types in some way should be suitable as a compilation target.
This includes ML-family languages like Rust and Haskell, object-oriented languages like Java, Python and Smalltalk, and multiparadigm languages like JavaScript, TypeScript, Racket, Nim and Erlang.
Example. Sending the schema
version 1 .
Date = <date @year int @month int @day int>.
Person = <person @name string @birthday Date>.
to the TypeScript schema compiler produces types,
type Date = {"year": number, "month": number, "day": number};
type Person = {"name": string, "birthday": Date};
constructors,
function Date({year, month, day}: {year: number, month: number, day: number}): Date;
function Person({name, birthday}: {name: string, birthday: Date}): Person;
partial parsing functions which throw on parse failure,
function asDate(v: _val): Date;
function asPerson(v: _val): Person;
total parsing functions which yield undefined on parse failure,
function toDate(v: _val): undefined | Date;
function toPerson(v: _val): undefined | Person;
and total serialization functions,
function fromDate(_v: Date): _val;
function fromPerson(_v: Person): _val;
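To make the relationship between the three kinds of function concrete, here is a hand-written sketch (not actual compiler output) of what the Date functions might do. It assumes a drastically simplified value type, Val, in place of the library's _val, and renames the type to CalendarDate so that the standalone snippet does not clash with JavaScript's built-in Date; both names are illustrative assumptions.

```typescript
// Simplified stand-in for a Preserves Value: integers, strings and records only.
// (Assumption for this sketch; generated code uses the library's value type.)
type Val = number | string | { label: string; fields: Val[] };

type CalendarDate = { year: number; month: number; day: number };

function isInt(v: Val | undefined): v is number {
  return typeof v === "number" && Math.floor(v) === v;
}

// Total parser: yields undefined on parse failure.
function toDate(v: Val): CalendarDate | undefined {
  if (typeof v !== "object" || v.label !== "date") return undefined;
  const [year, month, day] = v.fields; // extra trailing fields are ignored
  if (!isInt(year) || !isInt(month) || !isInt(day)) return undefined;
  return { year, month, day };
}

// Partial parser: throws on parse failure.
function asDate(v: Val): CalendarDate {
  const d = toDate(v);
  if (d === undefined) throw new TypeError("Invalid Date");
  return d;
}

// Total serializer: every CalendarDate can be turned back into a Val.
function fromDate(d: CalendarDate): Val {
  return { label: "date", fields: [d.year, d.month, d.day] };
}
```

In this sketch, toDate(fromDate(d)) always succeeds and yields a value equal to d, which is the round-trip property described in the introduction.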
Concepts
Bundle. A collection of schemas, each named by a module path.
Definition. A named pattern within a schema. When compiled, a definition will usually produce a type (plus associated constructors and predicates), a parser function, and a serializer function.
Metaschema. The Preserves metaschema is a schema describing the abstract syntax of all schema instances (including itself).
Module path. A sequence of symbols, denoting a leaf in a tree with symbol-labelled edges.
Pattern. A pattern describes a collection of Values as well as
providing names for the portions of matching Values that should be
captured in a host-language data type.
Schema abstract syntax tree (AST). Schema-manipulating tools will
usually work with schema AST; that is, with Values conforming to the
metaschema or instances of the corresponding host-language
data structures.
Schema domain-specific language (DSL). While human beings can work directly with Preserves documents matching the metaschema, the schema DSL provides an easier-to-read and -write language for working with schemas that can be translated into instances of the metaschema.
Schema. A collection of definitions, plus an optional schema-wide reference to a schema describing embedded values.
Identifiers and Capitalization Conventions
Throughout, id is used in the grammar to denote an identifier,
which is a symbol that matches the regular expression
^[a-zA-Z][a-zA-Z_0-9]*$. This is a lowest-common-denominator
constraint that allows for a reasonable mapping to the identifiers of
many programming languages.
Identifiers are case-sensitive. Schemas should be written with an awareness of the fact that some programming languages cannot preserve case differences. Avoid using two identifiers in the same context that differ only in case.
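As an illustration, a schema tool could check both rules mechanically. The sketch below uses the regular expression given above; the helper names isSchemaId and caseCollisions are hypothetical and not part of any Preserves implementation.

```typescript
// The identifier pattern quoted in the text.
const ID_PATTERN = /^[a-zA-Z][a-zA-Z_0-9]*$/;

// Hypothetical helper: true when `s` is a valid schema identifier.
function isSchemaId(s: string): boolean {
  return ID_PATTERN.test(s);
}

// Hypothetical helper: returns each group of identifiers that differ only in case.
function caseCollisions(ids: string[]): string[][] {
  const groups: { [key: string]: string[] } = {};
  for (const id of ids) {
    const key = "$" + id.toLowerCase(); // prefix avoids Object.prototype keys
    const existing = groups[key];
    if (existing === undefined) groups[key] = [id];
    else existing.push(id);
  }
  const out: string[][] = [];
  for (const key of Object.keys(groups)) {
    const g = groups[key];
    if (g !== undefined && g.length > 1) out.push(g);
  }
  return out;
}
```

A linter built on these helpers could warn whenever caseCollisions reports a non-empty result for the names used within one definition.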
Schemas should be written using the following capitalization conventions:
- UpperCamelCase for definition names.
- Either lowerCamelCase or UpperCamelCase for definition-unique names for alternatives within an alternation definition.
- lowerCamelCase for module names (schema names, package names) and field or variable names.
The Preserves Schema Language
In this section, we use an ABNF-like notation to define a textual syntax that is easy for people to read and write. Most of the examples in this document are written using this syntax. An appendix defines the abstract syntax that this surface syntax translates into.
Schema files and bundles.
Each schema should be placed in a single file. Schema files usually
end with extension .prs, and consist of a sequence of Preserves
Values¹ separated into clauses by the Preserves Symbol “.”.
A bundle of schema files is a directory tree containing .prs files.
Clauses.
Clause = (Version / EmbeddedTypeName / Include / Definition) "."
Version = "version" "1"
EmbeddedTypeName = "embeddedType" ("#f" / Ref)
Include = "include" string
Version specification. Mandatory. Names the version of the schema
language used in the file. This version of the specification is
referred to in schema files as version 1.
Embedded type name. Optional. If given as #f (the default), it
declares that values parsed by the schema do not contain embedded
Values of any particular type. If given as a Ref, a reference to a
definition in this or a neighbouring schema, it declares that embedded
Values must themselves conform to the named definition.
Include. Experimental. Includes the contents of a neighbouring file as if it were textually inserted in place of this clause. The file path may be relative to the current file, or absolute.
Definitions.
Definition = id "=" (OrPattern / AndPattern / Pattern)
Each definition clause connects a pattern over Values with a
host-language type name (derived from the supplied id) and set of
associated functions.
A definition may be
- an alternation of patterns, allowing for biased choice among alternatives;
- an intersection of patterns, allowing for composition and reuse of patterns; or
- the base case, an ordinary pattern.
Host-language types. Each definition includes bindings that
capture information from a parsed Value
and expose it to programs in
the host language. When more than one binding is present in a
definition, a host-language record (product, structure, tuple) will be
the result of a parse; otherwise, a simple value will result. When a
definition involves alternation, a host-language representation of a
sum over the types of each branch of the alternation will result. For
example, a compiler targeting an object-oriented host language would
produce a base class for each definition, with a field for each binding
and a subclass for each variant alternative. A functional host language
with algebraic data types would produce a labelled-sum-of-products type.
Alternation definitions.
OrPattern = [orsep] AltPattern 1*(orsep AltPattern) [orsep]
orsep = 1*"/"
The right-hand-side of a definition may supply two or more alternatives.
Alternatives are separated by any number of slashes /, and leading or
trailing slashes are ignored. When parsing, the alternatives are tried in
order; the result of the first successful alternative is the result of the
entire parse.²
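The biased, first-match-wins behaviour can be sketched as a small parser combinator. The following is an illustrative model only: Parser, firstOf and the IntOrOther type are assumptions made for this example, standing in for a hypothetical definition such as IntOrOther = @int int / @other any .

```typescript
// A parser yields a result, or undefined on failure.
type Parser<T> = (v: unknown) => T | undefined;

// Try each alternative in order; the first success is the overall result.
function firstOf<T>(alternatives: Array<Parser<T>>): Parser<T> {
  return function (v: unknown): T | undefined {
    for (const alt of alternatives) {
      const result = alt(v);
      if (result !== undefined) return result;
    }
    return undefined;
  };
}

type IntOrOther =
  | { _variant: "int"; value: number }
  | { _variant: "other"; value: unknown };

const parseIntVariant: Parser<IntOrOther> = (v) =>
  typeof v === "number" && Math.floor(v) === v
    ? { _variant: "int", value: v }
    : undefined;

// Matches anything, like the `any` pattern.
const parseOtherVariant: Parser<IntOrOther> = (v) => ({ _variant: "other", value: v });

const intFirst = firstOf([parseIntVariant, parseOtherVariant]);
const otherFirst = firstOf([parseOtherVariant, parseIntVariant]);
```

Because the choice is biased, otherFirst never produces the int variant: the catch-all alternative shadows everything after it. Ordering alternatives from most to least specific avoids this.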
Host-language types. The type corresponding to an OrPattern
is an
algebraic sum type, a union type, a variant type, or a concrete subclass
of an abstract superclass, depending on the host language.
Variant names. Each alternative within an OrPattern must have a
definition-unique name. The name is used to uniquely label the
alternative’s host-language representation (for example, a subclass, or
a member of a tagged union type).
A variant name can either be given explicitly as @name or
inferred.³ It can only be inferred
from the label of a record pattern, from the name of a reference to
another definition, or from the text of a “sufficiently
identifierlike” literal pattern, that is, one that matches a
string, symbol or boolean:
AltPattern = "@" id Pattern
/ "<" id PatternSequence ">"
/ Ref
/ LiteralPattern -- with a side condition
A host language will likely use the same ordering of variants in a sum type as specified by the schema. It is therefore recommended to specify first the alternative best suited as a default initialization value (if there is any).
Intersection definitions.
AndPattern = [andsep] NamedPattern 1*(andsep NamedPattern) [andsep]
andsep = 1*"&"
The right-hand-side of a definition may supply two or more patterns, the
intersection of whose denotations is the denotation of the overall
definition. The patterns are separated by any number of ampersands &,
and leading or trailing ampersands are ignored. When parsing, every
pattern is tried: if all succeed, the resulting information is combined
into a single type; otherwise, the overall parse fails.
When serializing, the terms resulting from serializing at each pattern are merged together.
Host-language types. Compiling an intersection definition produces a host-language type that is effectively the algebraic product of the types of the parts of the intersection. Practically, this usually means a record (product, structure, tuple) type.
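The all-must-succeed rule for parsing an intersection can be sketched as follows. This is an illustrative model, not library code: dictionaries are represented as plain JavaScript objects, and allOf, hasIntA and hasStringB are hypothetical names standing in for a definition such as AB = {a: int} & {b: string} .

```typescript
type Fields = { [name: string]: unknown };

// Each part captures some fields from the input, or fails with undefined.
type PartParser = (v: Fields) => Fields | undefined;

// Run every part against the same input; merge the captured fields.
function allOf(parts: PartParser[]): PartParser {
  return function (v: Fields): Fields | undefined {
    const merged: Fields = {};
    for (const part of parts) {
      const captured = part(v);
      if (captured === undefined) return undefined; // one failure fails the whole parse
      for (const key of Object.keys(captured)) merged[key] = captured[key];
    }
    return merged;
  };
}

const hasIntA: PartParser = (v) => {
  const a = v["a"];
  return typeof a === "number" && Math.floor(a) === a ? { a: a } : undefined;
};

const hasStringB: PartParser = (v) => {
  const b = v["b"];
  return typeof b === "string" ? { b: b } : undefined;
};

const parseAB = allOf([hasIntA, hasStringB]);
```

Serialization runs in the opposite direction: each part serializes its own fields, and the resulting terms are merged into one output value.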
Experimental.
Intersections are an experimental feature. They can be used to express optional dictionary entries:
MyDict = {a: int, b: string} & @c MaybeC .
MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
They can also be used to express something reminiscent of inheritance:
Type = @base BaseFields & @detail SubType .
BaseFields = {a: int, b: string} .
SubType = @base {} / @variantA { x: int } / @mid Mid .
Mid = { y: symbol } & @detail SubSubType .
SubSubType = @variantB { z: "type-b" } / @variantC { z: "type-c" } .
It is not yet clear whether they pull their weight.
From the point of view of the user of the schema language, using intersections to express optional values is cumbersome. Not only is it verbose, requiring auxiliary definitions, but it leaves responsibility for checking for invalid inputs up to the user, rather than handling it completely at the Schema layer. A future Schema version will likely include first-class support for optionality.
Patterns.
Pattern = SimplePattern / CompoundPattern
Patterns come in two kinds:
- The parsers for simple patterns yield a single host-language value (for example, a string, an array, a number, or a pointer) or even, in the case of LiteralPatterns, no host-language values at all.⁴
- The parsers for compound patterns yield zero or more fields which combine into an overall record type associated with a definition.
Simple patterns
SimplePattern = AnyPattern
/ AtomKindPattern
/ EmbeddedPattern
/ LiteralPattern
/ SequenceOfPattern
/ SetOfPattern
/ DictOfPattern
/ Ref
The any pattern matches any input Value:
AnyPattern = "any"
Specifying the name of a kind of Atom
matches that kind of atom:
AtomKindPattern = "bool" / "double" / "int" / "string" / "bytes" / "symbol"
Embedded input Values are matched with embedded patterns. The
portion under the #: prefix is the interface schema for the
embedded value.⁵ The result of a match is an
instance of the schema-wide embeddedType, if one is supplied.
EmbeddedPattern = "#:" SimplePattern
A literal pattern may be expressed in any of three ways: non-symbol
atoms stand for themselves directly; symbols, prefixed with an equal
sign, are matched literally; and any Value at all may be quoted by
placing it in a <<lit> ... > record:
LiteralPattern = "="symbol / "<<lit>" value ">" / non-symbol-atom
Brackets containing an item pattern and a literal ellipsis match a sequence of items, each matching the nested item pattern. Sets and uniform dictionaries are similar.
SequenceOfPattern = "[" SimplePattern "..." "]"
SetOfPattern = "#{" SimplePattern "}"
DictOfPattern = "{" SimplePattern ":" SimplePattern "...:..." "}"
Finally, a reference to some other definition, in this schema or a neighbouring schema within this bundle, is made by mentioning the possibly-qualified name of the definition as a bare symbol:
Ref = symbol
Periods “.” in such symbols are special:
- Name refers to the definition named Name in the current schema.
- Mod.Submod.Name refers to definition Name in Mod.Submod, some other schema in the bundle.
Each period-separated portion of a reference name must be an id, an
identifier.
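A reference name can therefore be split mechanically into a module path and a definition name. The sketch below is one way to do so; parseRef and ParsedRef are hypothetical names, and the empty module path for unqualified names mirrors the <ref [] Name> form used in the metaschema instance appendix.

```typescript
const ID = /^[a-zA-Z][a-zA-Z_0-9]*$/;

interface ParsedRef {
  module: string[]; // empty for a reference into the current schema
  name: string;
}

// Hypothetical helper: splits "Mod.Submod.Name" into its parts, or yields
// undefined if any period-separated portion is not a valid identifier.
function parseRef(symbolText: string): ParsedRef | undefined {
  const parts = symbolText.split(".");
  for (const part of parts) {
    if (!ID.test(part)) return undefined;
  }
  const definitionName = parts.pop();
  if (definitionName === undefined) return undefined;
  return { module: parts, name: definitionName };
}
```

Under these assumptions, parseRef("Date") yields an empty module path, while parseRef("Mod.Submod.Name") yields the path Mod, Submod and the name Name.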
Compound patterns
CompoundPattern = RecordPattern
/ TuplePattern
/ VariableTuplePattern
/ DictionaryPattern
A record pattern matches an input record. It may be specified as a
record with a literal in the label position, or as a quoted
<<rec> ... > record with a pattern for each of the label and
field-sequence positions:⁶
RecordPattern = "<<rec>" NamedPattern NamedPattern ">"
/ "<" value PatternSequence ">"
PatternSequence = *(NamedPattern) [NamedSimplePattern "..."]
A tuple pattern matches a fixed-length sequence with specific patterns in each position. A variable tuple pattern is the same, but with an additional pattern for matching additional elements following the fixed-position patterns.
TuplePattern = "[" *(NamedPattern) "]"
VariableTuplePattern = "[" *(NamedPattern) NamedSimplePattern "..." "]"
A dictionary pattern matches specific literal keys in an input
dictionary. If no explicit name is given for a particular
NamedSimplePattern, but the key for the pattern is “sufficiently
identifierlike” (a string, symbol or boolean), then a
symbol formed from that key is used as the name for that dictionary
entry.
DictionaryPattern = "{" *(value ":" NamedSimplePattern) "}"
Extensibility. Each apparently fixed-size compound pattern over
records, tuples or dictionaries actually only places a lower bound on the
size of matching data. These patterns thus allow for extensibility by
ignoring additional fields, elements, or dictionary entries. For example,
the pattern <a @value int> will match not only <a 123> but also
<a 123 "hello">, while rejecting <a> or <a [x y z]>. Similarly,
{a: int, b: int} matches {a: 123, b: 234, c: [x y z]} as well as
{a: 123, b: 234}.
Each compound pattern places constraints only on the mentioned elements
of a datum, namely the leftmost fields of a record, the leftmost elements
of a tuple, and the mentioned keys of a dictionary. Unmentioned elements
are free to be present or absent.
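The lower-bound rule for records can be sketched as a prefix match. The model below is illustrative only: Val is a simplified stand-in for Preserves values (symbols are represented as plain strings), and matchesRecordPrefix is a hypothetical helper rather than part of any implementation.

```typescript
// Simplified stand-in for Preserves values: integers, strings, sequences, records.
type Val = number | string | Val[] | { label: string; fields: Val[] };

type Check = (v: Val) => boolean;

const isInt: Check = (v) => typeof v === "number" && Math.floor(v) === v;

// The record must carry the given label and at least `checks.length` fields;
// only the leftmost fields are inspected, and any extras are ignored.
function matchesRecordPrefix(v: Val, label: string, checks: Check[]): boolean {
  if (typeof v !== "object" || Array.isArray(v)) return false;
  if (v.label !== label) return false;
  const fields = v.fields;
  if (fields.length < checks.length) return false; // lower bound only
  return checks.every((check, i) => {
    const f = fields[i];
    return f !== undefined && check(f);
  });
}

// Models the pattern <a @value int> from the text.
function matchesA(v: Val): boolean {
  return matchesRecordPrefix(v, "a", [isInt]);
}
```

With this model, the record <a 123 "hello"> is accepted because its first field is an integer, while <a> has too few fields and <a [x y z]> has a first field of the wrong kind.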
Identifiers and Bindings: NamedPattern and NamedSimplePattern
Compound pattern specifications contain NamedPatterns or
NamedSimplePatterns rather than ordinary Patterns:
NamedPattern = "@" id SimplePattern / Pattern
NamedSimplePattern = "@" id SimplePattern / SimplePattern
Use of an @name prefix generally results in creation of a field with
the given name in the overall record type for a definition. The type
of value contained in the field will correspond to the Pattern or
SimplePattern given.
“Sufficiently Identifierlike” Values
In some places in a schema, names can be inferred from some nearby
literal pattern element. In an OrPattern, variant names can be
inferred; in a DictionaryPattern, names for dictionary entries can be
inferred.
The rules are simple: if the literal pattern would match a specific
symbol or string, then that specific value is converted to a symbol and
used as the name. If the pattern would match #t, the name will be
true; if it would match #f, the name will be false.
For example, in the following grammar, the names for the variants of
Example1 are the symbols foo and bar and false, and the names
for the two fields in Example2 are example and 'testing strings'.
Note that 'testing strings' is a symbol whose name contains a space,
which will be rejected because it is not a valid identifier.
Example1 = =foo / "bar" / #f .
Example2 = { "testing strings": int, example: string } .
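The inference rules can be sketched as a small function over literal values. The Lit type and the helper names below are assumptions made for this example; a real implementation would operate on Preserves Values directly.

```typescript
// Simplified model of the literal being matched by a literal pattern.
type Lit =
  | { kind: "symbol"; text: string }
  | { kind: "string"; text: string }
  | { kind: "boolean"; value: boolean }
  | { kind: "other" };

// Applies the inference rules: symbols and strings name themselves,
// #t becomes "true", #f becomes "false"; nothing else yields a name.
function inferName(lit: Lit): string | undefined {
  switch (lit.kind) {
    case "symbol":
    case "string":
      return lit.text;
    case "boolean":
      return lit.value ? "true" : "false";
    default:
      return undefined;
  }
}

const ID_PATTERN = /^[a-zA-Z][a-zA-Z_0-9]*$/;

// An inferred name is only usable if it is also a valid identifier.
function inferIdentifier(lit: Lit): string | undefined {
  const inferred = inferName(lit);
  return inferred !== undefined && ID_PATTERN.test(inferred) ? inferred : undefined;
}
```

Applied to the grammar above, the three alternatives of Example1 yield foo, bar and false, while the key "testing strings" yields a name that fails the identifier check.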
Semantics
Having covered concrete syntax, we now give semantics for the schema
language in terms of the abstract syntax and of the
language of Preserves Values.
Metaschema interpreter
(TODO: this subsection is to define an interpreter for metaschema values
applied to Preserves Values.)
Host-language types
The host-language types corresponding to a metaschema instance can themselves be described according to a grammar.
The definitions in this section should be understood as being part of a
module named host, in a bundle alongside a module named schema
corresponding to the metaschema in the appendix below.
Abstract host language types
Definition = <union @variants [Variant ...]> / Simple .
Variant = [@label symbol @type Simple] .
The host-language type corresponding to a definition will either be a
tagged union (side condition: at least two Variants are present in a
union) or a simple type.
Simple = Field / Record .
Record = <rec @fields [NamedField ...]> .
NamedField = [@name symbol @type Field] .
A simple type may be either a single, simple value of field type, or a record of multiple named fields, each having a specific field type.
Field = =unit
/ =any
/ =embedded
/ <array @element Field>
/ <set @element Field>
/ <map @key Field @value Field>
/ <ref @name schema.Ref>
/ schema.AtomKind .
A field type is either
- the language’s unit type (the empty tuple, the “void” value),
- the universal type of all Preserves Values,
- the type of some host-language embedded value in some context,
- the type of a uniform array having elements of a specific field type,
- the type of a set having elements of a specific field type,
- the type of a dictionary connecting keys of specific type to values of specific type,
- the type associated with some other named definition in scope in the current Schema bundle, or
- the type of a specific kind of Preserves Atom.
Computing abstract types from a metaschema instance
Given a metaschema definition d : schema.Definition, the function
typeof yields a host.Definition.
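$$
\begin{aligned}
\mathit{typeof} &: \texttt{schema.Definition} \longrightarrow \texttt{host.Definition} \\
\mathit{typeof}\ \texttt{<or [[}n_1\ p_1\texttt{]} \ldots \texttt{[}n_n\ p_n\texttt{]]>} &= \texttt{<union [[}n_1\ (\mathit{pat}\ p_1)\texttt{]} \ldots \texttt{[}n_n\ (\mathit{pat}\ p_n)\texttt{]]>} \\
\mathit{typeof}\ \texttt{<and [}f_1 \ldots f_n\texttt{]>} &= \mathit{product}\ \texttt{[}f_1 \ldots f_n\texttt{]} \\
\mathit{typeof}\ p &= \mathit{pat}\ p, \quad \text{when } p \in \texttt{schema.Pattern}
\end{aligned}
$$

$$
\begin{aligned}
\mathit{pat} &: \texttt{schema.Pattern} \longrightarrow \texttt{host.Simple} \\
\mathit{pat}\ s &= \mathit{field}\ s, \quad \text{when } s \in \texttt{schema.SimplePattern} \\
\mathit{pat}\ c &= \mathit{product}\ \texttt{[}c\texttt{]}, \quad \text{when } c \in \texttt{schema.CompoundPattern}
\end{aligned}
$$

$$
\begin{aligned}
\mathit{field} &: \texttt{schema.SimplePattern} \longrightarrow \texttt{host.Field} \\
\mathit{field}\ \texttt{any} &= \texttt{any} \\
\mathit{field}\ \texttt{<atom } k\texttt{>} &= k \\
\mathit{field}\ \texttt{<embedded } s\texttt{>} &= \texttt{embedded} \\
\mathit{field}\ \texttt{<lit } v\texttt{>} &= \texttt{unit} \\
\mathit{field}\ \texttt{<seqof } s\texttt{>} &= \texttt{<array } (\mathit{field}\ s)\texttt{>} \\
\mathit{field}\ \texttt{<setof } s\texttt{>} &= \texttt{<set } (\mathit{field}\ s)\texttt{>} \\
\mathit{field}\ \texttt{<dictof } s_k\ s_v\texttt{>} &= \texttt{<map } (\mathit{field}\ s_k)\ (\mathit{field}\ s_v)\texttt{>} \\
\mathit{field}\ r &= r, \quad \text{when } r \in \texttt{schema.Ref}
\end{aligned}
$$

The helper function product is where unit-valued
fields are omitted from the computed host-language type. If all fields
are so omitted, or if there were (recursively) no bindings in the input
patterns, product yields unit type itself.

$$
\begin{aligned}
\mathit{product} &: \texttt{[schema.NamedPattern} \ldots \texttt{]} \longrightarrow \texttt{host.Simple} \\
\mathit{product}\ \texttt{[}f_1 \ldots f_n\texttt{]} &=
  \begin{cases}
    \texttt{unit} & \text{if } t = \texttt{[]} \\
    \texttt{<rec } t\texttt{>} & \text{otherwise}
  \end{cases} \\
&\quad \text{where } t = \mathit{gather}\ f_1 \mathbin{+\!\!+} \cdots \mathbin{+\!\!+} \mathit{gather}\ f_n
\end{aligned}
$$

$$
\begin{aligned}
\mathit{gather} &: \texttt{schema.NamedPattern} \longrightarrow \texttt{[host.NamedField ...]} \\
\mathit{gather}\ \texttt{<named } n\ p\texttt{>} &=
  \begin{cases}
    \texttt{[]} & \text{if } (\mathit{field}\ p) = \texttt{unit} \\
    \texttt{[[}n\ (\mathit{field}\ p)\texttt{]]} & \text{otherwise}
  \end{cases} \\
\mathit{gather}\ \texttt{<rec } f_{\mathit{label}}\ f_{\mathit{fields}}\texttt{>} &= \mathit{gather}\ f_{\mathit{label}} \mathbin{+\!\!+} \mathit{gather}\ f_{\mathit{fields}} \\
\mathit{gather}\ \texttt{<tuple [}f_1 \ldots f_n\texttt{]>} &= \mathit{gather}\ f_1 \mathbin{+\!\!+} \cdots \mathbin{+\!\!+} \mathit{gather}\ f_n \\
\mathit{gather}\ \texttt{<tuplePrefix [}f_1 \ldots f_n\texttt{] } f_{\mathit{repeated}}\texttt{>} &= \mathit{gather}\ f_1 \mathbin{+\!\!+} \cdots \mathbin{+\!\!+} \mathit{gather}\ f_n \mathbin{+\!\!+} \mathit{gather}\ f_{\mathit{repeated}} \\
\mathit{gather}\ \texttt{<dict \{}v_1\texttt{: }f_1 \ldots v_n\texttt{: }f_n\texttt{\}>} &= \mathit{gather}\ f_1' \mathbin{+\!\!+} \cdots \mathbin{+\!\!+} \mathit{gather}\ f_n' \\
\mathit{gather}\ s &= \texttt{[]}, \quad \text{when } s \in \texttt{schema.SimplePattern}
\end{aligned}
$$

In the dict case, $(f_1' \cdots f_n')$ are the $f_i$ sorted by Preserves term order of the corresponding $v_i$.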
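The pat, field, product and gather functions can be sketched in TypeScript over a simplified AST. This is an illustrative model only: the tagged-object representation (the t discriminant and the atom and ref wrappers) is an assumption of this sketch, alternation and intersection definitions are left out, and dictionary entries are assumed to arrive already sorted by Preserves term order of their keys.

```typescript
type HostField =
  | "unit" | "any" | "embedded"
  | { array: HostField } | { set: HostField }
  | { map: [HostField, HostField] }
  | { ref: string } | { atom: string };
type NamedField = [string, HostField];
type HostSimple = HostField | { rec: NamedField[] };

type SimplePattern =
  | { t: "any" } | { t: "atom"; kind: string }
  | { t: "embedded"; iface: SimplePattern } | { t: "lit"; value: unknown }
  | { t: "seqof"; item: SimplePattern } | { t: "setof"; item: SimplePattern }
  | { t: "dictof"; key: SimplePattern; value: SimplePattern }
  | { t: "ref"; name: string };
type CompoundPattern =
  | { t: "rec"; label: NamedPattern; fields: NamedPattern }
  | { t: "tuple"; patterns: NamedPattern[] }
  | { t: "tuplePrefix"; fixed: NamedPattern[]; variable: NamedPattern }
  | { t: "dict"; entries: Array<[string, NamedPattern]> }; // assumed pre-sorted by key
type NamedPattern =
  | { t: "named"; name: string; pattern: SimplePattern }
  | SimplePattern | CompoundPattern;

function field(s: SimplePattern): HostField {
  switch (s.t) {
    case "any": return "any";
    case "atom": return { atom: s.kind };
    case "embedded": return "embedded";
    case "lit": return "unit"; // literals carry no information
    case "seqof": return { array: field(s.item) };
    case "setof": return { set: field(s.item) };
    case "dictof": return { map: [field(s.key), field(s.value)] };
    case "ref": return { ref: s.name };
  }
}

function concatAll(lists: NamedField[][]): NamedField[] {
  let out: NamedField[] = [];
  for (const list of lists) out = out.concat(list);
  return out;
}

function gather(p: NamedPattern): NamedField[] {
  switch (p.t) {
    case "named": {
      const f = field(p.pattern);
      return f === "unit" ? [] : [[p.name, f]]; // unit-valued fields are omitted
    }
    case "rec": return gather(p.label).concat(gather(p.fields));
    case "tuple": return concatAll(p.patterns.map((q) => gather(q)));
    case "tuplePrefix":
      return concatAll(p.fixed.map((q) => gather(q))).concat(gather(p.variable));
    case "dict": return concatAll(p.entries.map((e) => gather(e[1])));
    default: return []; // anonymous simple patterns bind nothing
  }
}

function product(fs: NamedPattern[]): HostSimple {
  const t = concatAll(fs.map((f) => gather(f)));
  return t.length === 0 ? "unit" : { rec: t };
}

function isCompound(p: SimplePattern | CompoundPattern): p is CompoundPattern {
  return p.t === "rec" || p.t === "tuple" || p.t === "tuplePrefix" || p.t === "dict";
}

function pat(p: SimplePattern | CompoundPattern): HostSimple {
  return isCompound(p) ? product([p]) : field(p);
}
```

Applied to the AST of <date @year int @month int @day int>, this sketch yields a record type with three SignedInteger fields; the literal label contributes nothing because its field type is unit.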
Appendix: Metaschema
The metaschema defines the structure of the abstract syntax (AST) of schemas, using the concrete DSL syntax described above.
The text below is taken from schema/schema.prs in the source code repository.
A Bundle collects a number of Schemas, each named by a ModulePath:⁷
Bundle = <bundle @modules Modules>.
Modules = { ModulePath: Schema ...:... }.
ModulePath = [symbol ...].
Schema = <schema {
version: Version
embeddedType: EmbeddedTypeName
definitions: Definitions
}>.
A Version names the version of the schema language in use. At
present, it must be 1.
# version 1 .
Version = 1 .
An EmbeddedTypeName specifies the type of embedded values within
values parsed by a given schema:
EmbeddedTypeName = #f / Ref .
Ref = <ref @module ModulePath @name symbol>.
The Definitions are a named collection of definitions within a
schema. Note the special mention of pattern0 and pattern1: these
ensure that each or or and record has at least two members.
Definitions = { symbol: Definition ...:... }.
Definition =
# Pattern / Pattern / ...
/ <or [@pattern0 NamedAlternative
@pattern1 NamedAlternative
@patternN NamedAlternative ...]>
# Pattern & Pattern & ...
/ <and [@pattern0 NamedPattern
@pattern1 NamedPattern
@patternN NamedPattern ...]>
# Pattern
/ Pattern
.
NamedAlternative = [@variantLabel string @pattern Pattern].
Each Pattern is either a simple or compound pattern:
Pattern = SimplePattern / CompoundPattern .
Simple patterns are as described above:
SimplePattern =
# any
/ =any
# special builtins: bool, double, int, string, bytes, symbol
/ <atom @atomKind AtomKind>
# matches an embedded value in the input: #:p
/ <embedded @interface SimplePattern>
# =symbol, <<lit> any>, or plain non-symbol atom
/ <lit @value any>
# [p ...] ----> <seqof <ref p>> # see also tuplePrefix below.
/ <seqof @pattern SimplePattern>
# #{p} ----> <setof <ref p>>
/ <setof @pattern SimplePattern>
# {k: v, ...:...} ----> <dictof <ref k> <ref v>>
/ <dictof @key SimplePattern @value SimplePattern>
# symbol, symbol.symbol, symbol.symbol.symbol, ...
/ Ref
.
AtomKind = =Boolean
/ =Double
/ =SignedInteger
/ =String
/ =ByteString
/ =Symbol .
Compound patterns involve optionally-named subpatterns:
CompoundPattern =
# <label a b c> ----> <rec <lit label> <tuple [<ref a> <ref b> <ref c>]>>
# except for record labels
# <<rec> x y> ---> <rec <ref x> <ref y>>
/ <rec @label NamedPattern @fields NamedPattern>
# [a b c] ----> <tuple [<ref a> <ref b> <ref c>]>
/ <tuple @patterns [NamedPattern ...]>
# [a b c ...] ----> <tuplePrefix [<ref a> <ref b>] <seqof <ref c>>>
/ <tuplePrefix @fixed [NamedPattern ...] @variable NamedSimplePattern>
# {a: b, c: d} ----> <dict {a: <ref b>, c: <ref d>}>
/ <dict @entries DictionaryEntries>
.
DictionaryEntries = { any: NamedSimplePattern ...:... }.
Explicitly-named subpatterns are always SimplePatterns; but,
depending on context, if a name is omitted, the pattern may be a
Pattern or may be restricted to SimplePattern as well:
NamedSimplePattern = @named Binding / @anonymous SimplePattern .
NamedPattern = @named Binding / @anonymous Pattern .
Binding = <named @name symbol @pattern SimplePattern>.
Appendix: Metaschema instance
The following is a (lightly-reformatted) Preserves document which is the output of DSL-to-AST compilation of the DSL source text of the metaschema.
<schema {
version: 1,
embeddedType: #f,
definitions: {
Pattern: <or [
["SimplePattern", <ref [] SimplePattern>],
["CompoundPattern", <ref [] CompoundPattern>]
]>,
CompoundPattern: <or [
["rec", <rec <lit rec> <tuple [
<named label <ref [] NamedPattern>>,
<named fields <ref [] NamedPattern>>
]>>],
["tuple", <rec <lit tuple> <tuple [<named patterns <seqof <ref [] NamedPattern>>>]>>],
["tuplePrefix", <rec <lit tuplePrefix> <tuple [
<named fixed <seqof <ref [] NamedPattern>>>,
<named variable <ref [] NamedSimplePattern>>
]>>],
["dict", <rec <lit dict> <tuple [<named entries <ref [] DictionaryEntries>>]>>]
]>,
Modules: <dictof <ref [] ModulePath> <ref [] Schema>>,
Ref: <rec <lit ref> <tuple [
<named module <ref [] ModulePath>>,
<named name <atom Symbol>>
]>>,
Bundle: <rec <lit bundle> <tuple [<named modules <ref [] Modules>>]>>,
Binding: <rec <lit named> <tuple [
<named name <atom Symbol>>,
<named pattern <ref [] SimplePattern>>
]>>,
Definition: <or [
["or", <rec <lit or> <tuple [<tuplePrefix [
<named pattern0 <ref [] NamedAlternative>>,
<named pattern1 <ref [] NamedAlternative>>
] <named patternN <seqof <ref [] NamedAlternative>>>>]>>],
["and", <rec <lit and> <tuple [<tuplePrefix [
<named pattern0 <ref [] NamedPattern>>,
<named pattern1 <ref [] NamedPattern>>
] <named patternN <seqof <ref [] NamedPattern>>>>]>>],
["Pattern", <ref [] Pattern>]
]>,
NamedSimplePattern: <or [
["named", <ref [] Binding>],
["anonymous", <ref [] SimplePattern>]
]>,
EmbeddedTypeName: <or [
["false", <lit #f>],
["Ref", <ref [] Ref>]
]>,
ModulePath: <seqof <atom Symbol>>,
AtomKind: <or [
["Boolean", <lit Boolean>],
["Double", <lit Double>],
["SignedInteger", <lit SignedInteger>],
["String", <lit String>],
["ByteString", <lit ByteString>],
["Symbol", <lit Symbol>]
]>,
DictionaryEntries: <dictof any <ref [] NamedSimplePattern>>,
Version: <lit 1>,
NamedPattern: <or [
["named", <ref [] Binding>],
["anonymous", <ref [] Pattern>]
]>,
SimplePattern: <or [
["any", <lit any>],
["atom", <rec <lit atom> <tuple [<named atomKind <ref [] AtomKind>>]>>],
["embedded", <rec <lit embedded> <tuple [<named interface <ref [] SimplePattern>>]>>],
["lit", <rec <lit lit> <tuple [<named value any>]>>],
["seqof", <rec <lit seqof> <tuple [<named pattern <ref [] SimplePattern>>]>>],
["setof", <rec <lit setof> <tuple [<named pattern <ref [] SimplePattern>>]>>],
["dictof", <rec <lit dictof> <tuple [
<named key <ref [] SimplePattern>>,
<named value <ref [] SimplePattern>>
]>>],
["Ref", <ref [] Ref>]
]>,
NamedAlternative: <tuple [
<named variantLabel <atom String>>,
<named pattern <ref [] Pattern>>
]>,
Definitions: <dictof <atom Symbol> <ref [] Definition>>,
Schema: <rec <lit schema> <tuple [<dict {
version: <named version <ref [] Version>>,
embeddedType: <named embeddedType <ref [] EmbeddedTypeName>>,
definitions: <named definitions <ref [] Definitions>>
}>]>>
}
}>
Appendix: Example generated types
The following are (abridged) outputs from the TypeScript and Racket compilers for Preserves Schema. Note that an implementation does not have to be a compiler: for example, the current Python implementation directly interprets Schema AST.
Date/Person example (person-example.prs)
The following are outputs for the Date and Person example, person-example.prs. The input schema is:
version 1 .
Date = <date @year int @month int @day int>.
Person = <person @name string @birthday Date>.
TypeScript.
The full output is available in person-example.ts.
import * as _ from "@preserves/core";
// ...
export type Date = ({"year": number, "month": number, "day": number} & ...);
export type Person = ({"name": string, "birthday": Date} & ...);
// ...
Racket.
The full output is available in person-example.rkt.
(module person-example racket/base
(provide ...)
(require preserves)
...
(struct Date (year month day) #:transparent #:methods gen:preservable ...)
(struct Person (name birthday) #:transparent #:methods gen:preservable ...)
...)
SSH Authentication Subprotocol example (auth-example.prs)
The following are outputs for the SSH authentication subprotocol example, auth-example.prs. The input schema is:
version 1 .
SshAuthenticatedUser = <authenticated @username string @service bytes>.
SshAuthMethod =
/ @none #"none"
/ @publickey #"publickey"
/ @password #"password"
.
SshAuthRequest =
/ <none @username string>
/ <publickey @username string @key PublicKey>
/ <password @username string @password string>
.
SshAuthenticationMethodAcceptable = <authentication-method-acceptable @method SshAuthMethod>.
SshAuthenticationAcceptable =
<authentication-acceptable? @method SshAuthMethod @request SshAuthRequest @ok bool>.
PublicKey = Ed25519PublicKey .
Ed25519PublicKey = <ed25519-public-key @q bytes>.
Ed25519PrivateKey = <ed25519-private-key @q bytes @d bytes>.
TypeScript.
The full output is available in auth-example.ts.
import * as _ from "@preserves/core";
...
export type SshAuthenticatedUser = ({"username": string, "service": _.Bytes} & ...);
export type SshAuthMethod = (
(
{"_variant": "none"} |
{"_variant": "publickey"} |
{"_variant": "password"}
) & ...);
export type SshAuthRequest = (
(
{"_variant": "none", "username": string} |
{"_variant": "publickey", "username": string, "key": PublicKey} |
{"_variant": "password", "username": string, "password": string}
) & ...);
export type SshAuthenticationMethodAcceptable = ({"method": SshAuthMethod} & ...);
export type SshAuthenticationAcceptable = (
{"method": SshAuthMethod, "request": SshAuthRequest, "ok": boolean} & ...);
export type PublicKey = Ed25519PublicKey;
export type Ed25519PublicKey = ({"q": _.Bytes} & ...);
export type Ed25519PrivateKey = ({"q": _.Bytes, "d": _.Bytes} & ...);
Racket.
The full output is available in auth-example.rkt.
(module auth-example racket/base
(provide ...)
(require preserves)
...
(struct Ed25519PrivateKey (q d) ...)
(struct Ed25519PublicKey (q) ...)
(struct PublicKey (value) ...)
(define (SshAuthMethod? p)
(or (SshAuthMethod-none? p)
(SshAuthMethod-publickey? p)
(SshAuthMethod-password? p)))
(struct SshAuthMethod-none () ...)
(struct SshAuthMethod-publickey () ...)
(struct SshAuthMethod-password () ...)
(define (SshAuthRequest? p)
(or (SshAuthRequest-none? p)
(SshAuthRequest-publickey? p)
(SshAuthRequest-password? p)))
(struct SshAuthRequest-none (username) ...)
(struct SshAuthRequest-publickey (username key) ...)
(struct SshAuthRequest-password (username password) ...)
(struct SshAuthenticatedUser (username service) ...)
(struct SshAuthenticationAcceptable (method request ok) ...)
(struct SshAuthenticationMethodAcceptable (method) ...)
...)
Metaschema (schema.prs)
The following are outputs for the metaschema, schema.prs.
TypeScript.
import * as _ from "@preserves/core";
// ...
export type _embedded = any;
export type _val = _.Value<_embedded>;
// ...
export type Bundle = {"modules": Modules};
export type Modules = _.KeyedDictionary<ModulePath, Schema, _embedded>;
export type Schema = {
"version": Version,
"embeddedType": EmbeddedTypeName,
"definitions": Definitions
};
export type Version = null;
export type EmbeddedTypeName = ({"_variant": "false"} | {"_variant": "Ref", "value": Ref});
export type Definitions = _.KeyedDictionary<symbol, Definition, _embedded>;
export type Definition = (
{
"_variant": "or",
"pattern0": NamedAlternative,
"pattern1": NamedAlternative,
"patternN": Array<NamedAlternative>
} |
{
"_variant": "and",
"pattern0": NamedPattern,
"pattern1": NamedPattern,
"patternN": Array<NamedPattern>
} |
{"_variant": "Pattern", "value": Pattern}
);
export type Pattern = (
{"_variant": "SimplePattern", "value": SimplePattern} |
{"_variant": "CompoundPattern", "value": CompoundPattern}
);
export type SimplePattern = (
{"_variant": "any"} |
{"_variant": "atom", "atomKind": AtomKind} |
{"_variant": "embedded", "interface": SimplePattern} |
{"_variant": "lit", "value": _val} |
{"_variant": "seqof", "pattern": SimplePattern} |
{"_variant": "setof", "pattern": SimplePattern} |
{"_variant": "dictof", "key": SimplePattern, "value": SimplePattern} |
{"_variant": "Ref", "value": Ref}
);
export type CompoundPattern = (
{"_variant": "rec", "label": NamedPattern, "fields": NamedPattern} |
{"_variant": "tuple", "patterns": Array<NamedPattern>} |
{
"_variant": "tuplePrefix",
"fixed": Array<NamedPattern>,
"variable": NamedSimplePattern
} |
{"_variant": "dict", "entries": DictionaryEntries}
);
export type DictionaryEntries = _.KeyedDictionary<_val, NamedSimplePattern, _embedded>;
export type AtomKind = (
{"_variant": "Boolean"} |
{"_variant": "Double"} |
{"_variant": "SignedInteger"} |
{"_variant": "String"} |
{"_variant": "ByteString"} |
{"_variant": "Symbol"}
);
export type NamedAlternative = {"variantLabel": string, "pattern": Pattern};
export type NamedSimplePattern = (
{"_variant": "named", "value": Binding} |
{"_variant": "anonymous", "value": SimplePattern}
);
export type NamedPattern = (
{"_variant": "named", "value": Binding} |
{"_variant": "anonymous", "value": Pattern}
);
export type Binding = {"name": symbol, "pattern": SimplePattern};
export type Ref = {"module": ModulePath, "name": symbol};
export type ModulePath = Array<symbol>;
Racket.
(struct AtomKind-Symbol () #:prefab)
(struct AtomKind-ByteString () #:prefab)
(struct AtomKind-String () #:prefab)
(struct AtomKind-SignedInteger () #:prefab)
(struct AtomKind-Double () #:prefab)
(struct AtomKind-Boolean () #:prefab)
(struct Bundle (modules) #:prefab)
(struct CompoundPattern-dict (entries) #:prefab)
(struct CompoundPattern-tuplePrefix (fixed variable) #:prefab)
(struct CompoundPattern-tuple (patterns) #:prefab)
(struct CompoundPattern-rec (label fields) #:prefab)
(struct Definition-Pattern (value) #:prefab)
(struct Definition-and (pattern0 pattern1 patternN) #:prefab)
(struct Definition-or (pattern0 pattern1 patternN) #:prefab)
(struct EmbeddedTypeName-false () #:prefab)
(struct EmbeddedTypeName-Ref (value) #:prefab)
(struct NamedAlternative (variantLabel pattern) #:prefab)
(struct NamedPattern-anonymous (value) #:prefab)
(struct NamedPattern-named (value) #:prefab)
(struct NamedSimplePattern-anonymous (value) #:prefab)
(struct NamedSimplePattern-named (value) #:prefab)
(struct Binding (name pattern) #:prefab)
(struct Pattern-CompoundPattern (value) #:prefab)
(struct Pattern-SimplePattern (value) #:prefab)
(struct Ref (module name) #:prefab)
(struct Schema (definitions embeddedType version) #:prefab)
(struct SimplePattern-Ref (value) #:prefab)
(struct SimplePattern-dictof (key value) #:prefab)
(struct SimplePattern-setof (pattern) #:prefab)
(struct SimplePattern-seqof (pattern) #:prefab)
(struct SimplePattern-lit (value) #:prefab)
(struct SimplePattern-embedded (interface) #:prefab)
(struct SimplePattern-atom (atomKind) #:prefab)
(struct SimplePattern-any () #:prefab)
Appendix: Future work
-
There are side conditions on AST instances. It would be nice to eventually be able to express these within the metaschema.
-
It’d be interesting to, Ometa-like, be able to specify the DSL-to-AST translation process as a schema. One challenge in doing so is the way schemas are required to be reversible at present.
-
Should `include` accept URLs, to be able to retrieve schema from the web?
-
It’d be nice to firm up the interpretation of embedded interface schemas. I have in mind something like the higher-order contracts of Dimoulas. Essentially, a schema is a contract, and embedded pointers-to-behaviour are like closures/channels/objects/etc, which demand higher-order contracts. Future work could pin this down further; also, consideration of dependent schemas (analogous to dependent contracts) could be of interest.
Example. In the following fragment, `#:Session` is the handle a connected user uses to interact with a chatroom. In the implementation, `Says` messages are dropped if their `who` doesn’t match the `uid` supplied in the `Join` assertion. It’d be nice to capture that using a dependent schema, passing in the specific `uid` value to the `Session` constructor, something like `#:(Session uid)`.
Join = <joinedUser @uid UserId @handle #:Session>.
Session = @observeSpeech <Observe =says @observer #:Says> / Says .
Says = <says @who UserId @what string>.
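The check that a dependent schema would replace can be sketched by hand as follows. This is a hypothetical illustration, not the chatroom implementation the example refers to: `UserId` is assumed to be a string (the fragment does not define it), and the `Session` class and its `handleSays` method are invented names.

```typescript
// Hypothetical host-language shapes for the schema fragment above.
type UserId = string; // assumption: the fragment does not define UserId
type Says = {"who": UserId, "what": string};

// A session handle that remembers the uid from its Join assertion.
// With a dependent schema such as #:(Session uid), this check could in
// principle be derived from the schema instead of being hand-written.
class Session {
    readonly delivered: Says[] = [];
    private readonly uid: UserId;

    constructor(uid: UserId) {
        this.uid = uid;
    }

    // Accept a Says message only if its `who` matches the joined uid.
    // Returns true if the message was delivered, false if it was dropped.
    handleSays(msg: Says): boolean {
        if (msg.who !== this.uid) return false;
        this.delivered.push(msg);
        return true;
    }
}

const session = new Session("alice");
const accepted = session.handleSays({"who": "alice", "what": "hello"});
const dropped = session.handleSays({"who": "mallory", "what": "spoofed"});
```

Here the first message is delivered because its `who` equals the session's `uid`, while the second is dropped. The point of the proposed `#:(Session uid)` notation is that this side condition lives in code today and is invisible to the schema.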
Notes
-
That is, schema files use Preserves as a kind of S-expression! ↩
-
This ordered choice becomes important when combined with the extensibility of e.g. record patterns: a pattern `<a @b int> / <a @b int @c int>` will always match the left branch because compound patterns match prefixes of records and tuples. A better way to write it would be `<a @b int @c int> / <a @b int>`, which tries the at-least-two-fields branch before falling back to the at-least-one-field branch. ↩
-
Note that explicitly-given variant names are unlike binding names in that binding names give rise to a field in the record type for a definition, while variant names are used as labels for alternatives in a sum type for a definition. ↩
-
The case of a `LiteralPattern` yielding no host-language values is interesting. All the information required to reversibly store the result of a parse is already in the schema, so nothing need be stored at runtime in host-language data type instances. Concretely, a definition consisting only of a `LiteralPattern` might correspond to a host-language unit type (the empty tuple, the “void” value). Definitions consisting of `CompoundPattern`s involving `LiteralPattern`s do not even need to store this much: fields of unit type in a host-language record type can simply be omitted without loss. ↩
-
Embedded patterns are experimental. One interpretation is that an embedded value denotes a reference to some stateful actor in a potentially-distributed system, and that the interface schema associated with an embedded value describes the messages that may be sent to that actor.
Examples. `#:any` may denote a reference to an Actor able to receive any value as a message; `#:#t`, a reference to an Actor expecting only the “true” message; `#:Session`, a reference to an Actor expecting any message matching a schema defined as `Session` in this file. ↩
-
Note that `<label ps>` can be thought of as roughly equivalent to `<<rec> <<lit> label> [ps]>`. The following two definitions are equivalent:
D1 = <foo @a string @b string @extra any ... >.
D2 = <<rec> <<lit> foo> [@a string @b string @extra any ...]>.
-
The semantics of module path references remain to be specified! ↩
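The note above on ordered choice (the `<a @b int> / <a @b int @c int>` example) can be made concrete with a small sketch. It assumes a deliberately simplified model: records are a label plus an array of fields, and a pattern is a label plus the number of leading integer fields it requires. The names `Rec`, `RecPattern`, `matches` and `firstMatch` are hypothetical and not part of any Preserves library.

```typescript
// Values are modelled as plain records: a label plus positional fields.
// (A hypothetical simplification of Preserves records.)
type Rec = {"label": string, "fields": unknown[]};

// A record pattern with a label and a fixed number of integer fields.
// Following the note, it matches any record with that label whose fields
// start with the required integers; extra trailing fields are ignored.
type RecPattern = {"label": string, "arity": number};

function matches(p: RecPattern, v: Rec): boolean {
    if (v.label !== p.label || v.fields.length < p.arity) return false;
    return v.fields.slice(0, p.arity).every(f => Number.isInteger(f));
}

// Ordered choice: return the index of the first alternative that matches,
// or -1 if none does.
function firstMatch(alts: RecPattern[], v: Rec): number {
    return alts.findIndex(p => matches(p, v));
}

const one: RecPattern = {"label": "a", "arity": 1};   // <a @b int>
const two: RecPattern = {"label": "a", "arity": 2};   // <a @b int @c int>
const value: Rec = {"label": "a", "fields": [1, 2]};  // <a 1 2>

// Short branch first: it matches the prefix, so the long branch is never tried.
const shadowed = firstMatch([one, two], value);
// Long branch first: it is selected for <a 1 2> ...
const preferred = firstMatch([two, one], value);
// ... and the short branch remains available as a fallback for <a 1>.
const fallback = firstMatch([two, one], {"label": "a", "fields": [1]});
```

With the short pattern listed first, both `<a 1>` and `<a 1 2>` select alternative 0, so the two-field branch is unreachable. Listing the longer pattern first lets each value select the most specific branch.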