Preserves Schema

A Preserves schema connects Preserves Values to host-language data structures. Each definition within a schema can be processed by a compiler to produce

  • a simple host-language type definition;

  • a partial parsing function from Values to instances of the produced type; and

  • a total serialization function from instances of the type to Values.

Every parsed Value retains enough information to be serialized again, and every instance of a host-language data structure contains, by construction, enough information to be serialized successfully.
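As a rough illustration of the three artifacts listed above, here is a hand-written sketch of what a compiler could produce for a record with two fields. It does not use preserves.schema: the `Greeting` class, the `(label, [fields])` tuple used to model a record, and the method names `parse` and `serialize` are all assumptions made for this example only.

```python
from dataclasses import dataclass

# Hypothetical stand-in: a record is modelled as a (label, [fields]) tuple.
@dataclass
class Greeting:
    name: str
    count: int

    @classmethod
    def parse(cls, value):
        """Partial: raises ValueError for values that do not fit."""
        if not (isinstance(value, tuple) and len(value) == 2):
            raise ValueError('not a record: %r' % (value,))
        label, fields = value
        if label != 'greeting' or not isinstance(fields, list) or len(fields) != 2:
            raise ValueError('wrong label or arity: %r' % (value,))
        name, count = fields
        if not isinstance(name, str) or not isinstance(count, int):
            raise ValueError('wrong field types: %r' % (value,))
        return cls(name, count)

    def serialize(self):
        """Total: maps an instance back to the record it was parsed from."""
        return ('greeting', [self.name, self.count])

g = Greeting.parse(('greeting', ['hello', 3]))
assert Greeting.parse(g.serialize()) == g  # round trip
```

Parsing can fail because most values are not greetings, while serialization has no failure case: that asymmetry is what "partial" and "total" refer to.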

Schema support in Python

The preserves.schema module implements Preserves Schema for Python.

A Schema source file (like this one) is first compiled using preserves-schemac to produce a binary-syntax schema bundle containing schema module definitions (like this one). Python code then loads the bundle, exposing its contents as Namespaces ultimately containing SchemaObjects.

Examples

Setup: Loading a schema bundle

For our running example, we will use schemas associated with the Syndicated Actor Model. (The schema bundle is a copy of this file from the syndicate-protocols repository.)

To load a schema bundle, use load_schema_file (or, alternatively, use Compiler directly):

>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')
>>> type(bundle)
<class 'preserves.schema.Namespace'>

The top-level entries in the loaded bundle are schema modules. Let's examine the stream schema module, whose source code indicates that it should contain definitions for Mode, Source, Sink, etc.:

>>> bundle.stream                                           # doctest: +ELLIPSIS
{'Mode': <class 'stream.Mode'>, 'Sink': <class 'stream.Sink'>, ...}

Example 1: stream.StreamListenerError, a product type

Drilling down further, let's consider the definition of StreamListenerError, which appears in the source as

StreamListenerError = <stream-listener-error @spec any @message string> .

This reads, in the Preserves Schema language, as the definition of a simple product type (record, class, object) with two named fields, spec and message. Parsing a value into a StreamListenerError succeeds only if the value is a record, its label matches, it has exactly two fields, and the second field (message) is a string.

>>> bundle.stream.StreamListenerError
<class 'stream.StreamListenerError'>

The StreamListenerError class includes a decode method that analyzes an input value:

>>> bundle.stream.StreamListenerError.decode(
...     parse('<stream-listener-error <xyz> "an error">'))
StreamListenerError {'spec': #xyz(), 'message': 'an error'}

If invalid input is supplied, decode will raise SchemaDecodeFailed, which includes helpful information for diagnosing the problem (as we will see below, this is especially useful for parsers for sum types):

>>> bundle.stream.StreamListenerError.decode(
...     parse('<i-am-invalid>'))
Traceback (most recent call last):
  ...
preserves.schema.SchemaDecodeFailed: Could not decode i-am-invalid using <class 'stream.StreamListenerError'>
Most likely reason: in stream.StreamListenerError: <lit stream-listener-error> didn't match i-am-invalid
Full explanation: 
  in stream.StreamListenerError: <lit stream-listener-error> didn't match i-am-invalid

Alternatively, the try_decode method catches SchemaDecodeFailed, transforming it into None:

>>> bundle.stream.StreamListenerError.try_decode(
...     parse('<stream-listener-error <xyz> "an error">'))
StreamListenerError {'spec': #xyz(), 'message': 'an error'}
>>> bundle.stream.StreamListenerError.try_decode(
...     parse('<i-am-invalid>'))

The class can also be instantiated directly:

>>> err = bundle.stream.StreamListenerError(Record(Symbol('xyz'), []), 'an error')
>>> err
StreamListenerError {'spec': #xyz(), 'message': 'an error'}

The fields and contents of instances can be queried:

>>> err.spec
#xyz()
>>> err.message
'an error'

And finally, instances can of course be serialized and encoded:

>>> print(stringify(err))
<stream-listener-error <xyz> "an error">
>>> canonicalize(err)
b'\xb4\xb3\x15stream-listener-error\xb4\xb3\x03xyz\x84\xb1\x08an error\x84'

Example 2: stream.Mode, a sum type

Now let's consider the definition of Mode, which appears in the source as

Mode = =bytes / @lines LineMode / <packet @size int> / <object @description any> .

This reads, in the Preserves Schema language, as an alternation (disjoint union, variant, sum type) of four possible kinds of value: the symbol bytes; a LineMode value; a record with packet as its label and an integer as its only field; or a record with object as its label and any kind of value as its only field. In Python, this becomes:

>>> bundle.stream.Mode.bytes
<class 'stream.Mode.bytes'>
>>> bundle.stream.Mode.lines
<class 'stream.Mode.lines'>
>>> bundle.stream.Mode.packet
<class 'stream.Mode.packet'>
>>> bundle.stream.Mode.object
<class 'stream.Mode.object'>

As before, Mode includes a decode method that analyzes an input value:

>>> bundle.stream.Mode.decode(parse('bytes'))
Mode.bytes()
>>> bundle.stream.Mode.decode(parse('lf'))
Mode.lines(LineMode.lf())
>>> bundle.stream.Mode.decode(parse('<packet 123>'))
Mode.packet {'size': 123}
>>> bundle.stream.Mode.decode(parse('<object "?">'))
Mode.object {'description': '?'}

Invalid input causes SchemaDecodeFailed to be raised:

>>> bundle.stream.Mode.decode(parse('<i-am-not-a-valid-mode>'))
Traceback (most recent call last):
  ...
preserves.schema.SchemaDecodeFailed: Could not decode <i-am-not-a-valid-mode> using <class 'stream.Mode'>
Most likely reason: in stream.LineMode.crlf: <lit crlf> didn't match <i-am-not-a-valid-mode>
Full explanation: 
  in stream.Mode: matching <i-am-not-a-valid-mode>
    in stream.Mode.bytes: <lit bytes> didn't match <i-am-not-a-valid-mode>
    in stream.Mode.lines: <ref [] LineMode> didn't match <i-am-not-a-valid-mode>
      in stream.LineMode: matching <i-am-not-a-valid-mode>
        in stream.LineMode.lf: <lit lf> didn't match <i-am-not-a-valid-mode>
        in stream.LineMode.crlf: <lit crlf> didn't match <i-am-not-a-valid-mode>
    in stream.Mode.packet: <lit packet> didn't match i-am-not-a-valid-mode
    in stream.Mode.object: <lit object> didn't match i-am-not-a-valid-mode

The "full explanation" includes details on which parses were attempted, and why they failed.

Again, the try_decode method catches SchemaDecodeFailed, transforming it into None:

>>> bundle.stream.Mode.try_decode(parse('bytes'))
Mode.bytes()
>>> bundle.stream.Mode.try_decode(parse('<i-am-not-a-valid-mode>'))

Direct instantiation is done with the variant classes, not with Mode itself:

>>> bundle.stream.Mode.bytes()
Mode.bytes()
>>> bundle.stream.Mode.lines(bundle.stream.LineMode.lf())
Mode.lines(LineMode.lf())
>>> bundle.stream.Mode.packet(123)
Mode.packet {'size': 123}
>>> bundle.stream.Mode.object('?')
Mode.object {'description': '?'}

Fields and contents can be queried as usual:

>>> bundle.stream.Mode.lines(bundle.stream.LineMode.lf()).value
LineMode.lf()
>>> bundle.stream.Mode.packet(123).size
123
>>> bundle.stream.Mode.object('?').description
'?'

And serialization and encoding are also as expected:

>>> print(stringify(bundle.stream.Mode.bytes()))
bytes
>>> print(stringify(bundle.stream.Mode.lines(bundle.stream.LineMode.lf())))
lf
>>> print(stringify(bundle.stream.Mode.packet(123)))
<packet 123>
>>> print(stringify(bundle.stream.Mode.object('?')))
<object "?">
>>> canonicalize(bundle.stream.Mode.object('?'))
b'\xb4\xb3\x06object\xb1\x01?\x84'

Finally, the VARIANT attribute of instances allows code to dispatch on what kind of data it is handling at a given moment:

>>> bundle.stream.Mode.bytes().VARIANT
#bytes
>>> bundle.stream.Mode.lines(bundle.stream.LineMode.lf()).VARIANT
#lines
>>> bundle.stream.Mode.packet(123).VARIANT
#packet
>>> bundle.stream.Mode.object('?').VARIANT
#object
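One way to dispatch on VARIANT is a lookup table keyed by variant name. The sketch below uses small stand-in classes rather than the generated ones, so that it runs without a schema bundle. With real instances the VARIANT values are Symbols (as shown above) rather than plain strings, so the table keys would need to be adjusted accordingly. The `describe` function and the stand-in classes are hypothetical.

```python
# Stand-ins for generated variant classes (hypothetical; real ones come
# from a loaded schema bundle and carry Symbol-valued VARIANT attributes).
class Bytes:
    VARIANT = 'bytes'

class Packet:
    VARIANT = 'packet'

    def __init__(self, size):
        self.size = size

def describe(mode):
    """Pick a handler based on the VARIANT attribute of the instance."""
    handlers = {
        'bytes': lambda m: 'raw byte stream',
        'packet': lambda m: 'packets of %d bytes' % (m.size,),
    }
    handler = handlers.get(mode.VARIANT)
    if handler is None:
        raise ValueError('unhandled variant: %r' % (mode.VARIANT,))
    return handler(mode)

print(describe(Bytes()))      # raw byte stream
print(describe(Packet(123)))  # packets of 123 bytes
```

Raising on an unknown variant, rather than silently ignoring it, makes it easier to notice when a schema gains a new alternative that the code does not yet handle.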

meta = load_schema_file(__metaschema_filename).schema module-attribute

Schema module Namespace corresponding to Preserves Schema's metaschema.

Compiler()

Instances of Compiler populate an initially-empty Namespace by loading and compiling schema bundle files.

>>> c = Compiler()
>>> c.load('docs/syndicate-protocols-schema-bundle.bin')
>>> type(c.root)
<class 'preserves.schema.Namespace'>

Attributes:

Name  Type       Description
root  Namespace  the root namespace into which top-level schema modules are installed.

Source code in preserves/schema.py
def __init__(self):
    self.root = Namespace(())

load(filename)

Opens the file at filename, passing the resulting file object to load_filelike.

Source code in preserves/schema.py
def load(self, filename):
    """Opens the file at `filename`, passing the resulting file object to
    [load_filelike][preserves.schema.Compiler.load_filelike]."""
    filename = pathlib.Path(filename)
    with open(filename, 'rb') as f:
        self.load_filelike(f, filename.stem)

load_filelike(f, module_name=None)

Reads a meta.Bundle or meta.Schema from the filelike object f, compiling and installing it in self.root. If f contains a bundle, module_name is not used, since the schema modules in the bundle know their own names; if f contains a plain schema module, however, module_name is used directly if it is a string, and if it is None, a suitable module name is computed from the name attribute of f, if it is present. If name is absent in that case, ValueError is raised.

Source code in preserves/schema.py
def load_filelike(self, f, module_name=None):
    """Reads a `meta.Bundle` or `meta.Schema` from the filelike object `f`, compiling and
    installing it in `self.root`. If `f` contains a bundle, `module_name` is not used,
    since the schema modules in the bundle know their own names; if `f` contains a plain
    schema module, however, `module_name` is used directly if it is a string, and if it is
    `None`, a suitable module name is computed from the `name` attribute of `f`, if it is
    present. If `name` is absent in that case, `ValueError` is raised.

    """
    x = Decoder(f.read()).next()
    if x.key == SCHEMA:
        if module_name is None:
            if hasattr(f, 'name'):
                module_name = pathlib.Path(f.name).stem
            else:
                raise ValueError('Cannot load schema module from filelike object without a module_name')
        self.load_schema((Symbol(module_name),), x)
    elif x.key == BUNDLE:
        for (p, s) in x[0].items():
            self.load_schema(p, s)
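The module-name fallback for plain schema modules can be tried out in isolation. The sketch below copies only that branch of the listing above into a free-standing function. `resolve_module_name` and `NamedBytesIO` are names invented for this example and are not part of preserves.schema.

```python
import io
import pathlib

def resolve_module_name(f, module_name=None):
    """Mirror of the module-name fallback used for plain schema modules."""
    if module_name is not None:
        return module_name
    if hasattr(f, 'name'):
        # Same rule as the listing: file name without directory or suffix.
        return pathlib.Path(f.name).stem
    raise ValueError('Cannot load schema module from filelike object without a module_name')

class NamedBytesIO(io.BytesIO):
    """An in-memory file that also carries a `name`, like a real file object."""
    def __init__(self, data, name):
        super().__init__(data)
        self.name = name

# An explicit module_name always wins.
assert resolve_module_name(io.BytesIO(b''), 'stream') == 'stream'
# Otherwise the stem of f.name is used.
assert resolve_module_name(NamedBytesIO(b'', 'schemas/stream.bin')) == 'stream'
```

A plain `io.BytesIO` has no `name` attribute, so passing one without `module_name` takes the `ValueError` branch.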

Definition(*args, **kwargs)

Bases: SchemaObject

Subclasses of Definition represent both standalone non-alternation definitions and alternatives within an Enumeration.

>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')

>>> bundle.stream.StreamListenerError.FIELD_NAMES
['spec', 'message']
>>> bundle.stream.StreamListenerError.SAFE_FIELD_NAMES
['spec', 'message']
>>> bundle.stream.StreamListenerError.ENUMERATION is None
True

>>> bundle.stream.Mode.object.FIELD_NAMES
['description']
>>> bundle.stream.Mode.object.SAFE_FIELD_NAMES
['description']
>>> bundle.stream.Mode.object.ENUMERATION is bundle.stream.Mode
True

>>> bundle.stream.CreditAmount.count.FIELD_NAMES
[]
>>> bundle.stream.CreditAmount.count.SAFE_FIELD_NAMES
[]
>>> bundle.stream.CreditAmount.count.ENUMERATION is bundle.stream.CreditAmount
True

>>> bundle.stream.CreditAmount.decode(parse('123'))
CreditAmount.count(123)
>>> bundle.stream.CreditAmount.count(123)
CreditAmount.count(123)
>>> bundle.stream.CreditAmount.count(123).value
123
Source code in preserves/schema.py
def __init__(self, *args, **kwargs):
    self._fields = args
    if self.SIMPLE:
        if self.EMPTY:
            if len(args) != 0:
                raise TypeError('%s takes no arguments' % (self._constructor_name(),))
        else:
            if len(args) != 1:
                raise TypeError('%s needs exactly one argument' % (self._constructor_name(),))
            self.value = args[0]
    else:
        i = 0
        for arg in args:
            if i >= len(self.FIELD_NAMES):
                raise TypeError('%s given too many positional arguments' % (self._constructor_name(),))
            setattr(self, self.SAFE_FIELD_NAMES[i], arg)
            i = i + 1
        for (argname, arg) in kwargs.items():
            if hasattr(self, argname):
                raise TypeError('%s given duplicate attribute: %r' % (self._constructor_name(), argname))
            if argname not in self.SAFE_FIELD_NAMES:
                raise TypeError('%s given unknown attribute: %r' % (self._constructor_name(), argname))
            setattr(self, argname, arg)
            i = i + 1
        if i != len(self.FIELD_NAMES):
            raise TypeError('%s needs argument(s) %r' % (self._constructor_name(), self.FIELD_NAMES))

ENUMERATION = None class-attribute instance-attribute

None for standalone top-level definitions within a module; otherwise, an Enumeration subclass representing a top-level alternation definition.

FIELD_NAMES = [] class-attribute instance-attribute

List of strings: the names of the fields contained within this definition, if it has named fields. Otherwise the list is empty and the definition is a simple wrapper for another value, which is accessed via the value attribute.

SAFE_FIELD_NAMES = [] class-attribute instance-attribute

The list produced by mapping safeattrname over FIELD_NAMES.

Enumeration()

Bases: SchemaObject

Subclasses of Enumeration represent a group of variant options within a sum type.

>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')

>>> import pprint
>>> pprint.pprint(bundle.stream.Mode.VARIANTS)
[(#bytes, <class 'stream.Mode.bytes'>),
 (#lines, <class 'stream.Mode.lines'>),
 (#packet, <class 'stream.Mode.packet'>),
 (#object, <class 'stream.Mode.object'>)]

>>> bundle.stream.Mode.VARIANTS[0][1] is bundle.stream.Mode.bytes
True
Source code in preserves/schema.py
def __init__(self):
    raise TypeError('Cannot create instance of Enumeration')

VARIANTS = None class-attribute instance-attribute

List of (Symbol, SchemaObject class) tuples representing the possible options within this sum type.

Namespace(prefix)

A Namespace is a dictionary-like object representing a schema module that knows its location in a schema module hierarchy and whose attributes correspond to definitions and submodules within the schema module.

Attributes:

Name     Type           Description
_prefix  tuple[Symbol]  path to this module/Namespace from the root Namespace

Source code in preserves/schema.py
def __init__(self, prefix):
    self._prefix = prefix

SchemaDecodeFailed(cls, p, v, failures=None)

Bases: ValueError

Raised when decode cannot find a way to parse a given input.

Attributes:

Name      Type                      Description
cls       class                     the SchemaObject subclass attempting the parse
pattern   Value                     the failing pattern, a Value conforming to schema meta.Pattern
value     Value                     the unparseable value
failures  list[SchemaDecodeFailed]  descriptions of failed paths attempted during the match this failure describes

Source code in preserves/schema.py
def __init__(self, cls, p, v, failures=None):
    super().__init__()
    self.cls = cls
    self.pattern = p
    self.value = v
    self.failures = [] if failures is None else failures
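Because failures nest, code that wants to report only the innermost mismatches has to walk the tree. The following is a minimal sketch of such a walk. It uses a stand-in exception class with the same four attributes so that it runs without the library; `DecodeFailure`, `leaf_failures`, and the string values used for `cls` and `pattern` are assumptions for illustration, and the tree is a simplified version of the Mode failure shown earlier.

```python
class DecodeFailure(ValueError):
    """Stand-in with the same attributes as documented above (hypothetical)."""
    def __init__(self, cls, p, v, failures=None):
        super().__init__()
        self.cls = cls
        self.pattern = p
        self.value = v
        self.failures = [] if failures is None else failures

def leaf_failures(e):
    """Yield the innermost failures of `e`, depth first."""
    if not e.failures:
        yield e
    else:
        for sub in e.failures:
            yield from leaf_failures(sub)

# A hand-built failure tree; strings stand in for classes and patterns.
top = DecodeFailure('Mode', None, 'x', [
    DecodeFailure('Mode.bytes', '<lit bytes>', 'x'),
    DecodeFailure('Mode.lines', '<ref [] LineMode>', 'x', [
        DecodeFailure('LineMode.lf', '<lit lf>', 'x'),
        DecodeFailure('LineMode.crlf', '<lit crlf>', 'x'),
    ]),
])

print([f.cls for f in leaf_failures(top)])
# ['Mode.bytes', 'LineMode.lf', 'LineMode.crlf']
```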

SchemaObject

Base class for classes representing grammatical productions in a schema: instances of SchemaObject represent schema definitions. This is an abstract class, as are its subclasses Enumeration and Definition. It is subclasses of those subclasses, automatically produced during schema loading, that are actually instantiated.

>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')

>>> bundle.stream.Mode.mro()[1:-1]
[<class 'preserves.schema.Enumeration'>, <class 'preserves.schema.SchemaObject'>]

>>> bundle.stream.Mode.packet.mro()[1:-1]
[<class 'stream.Mode._ALL'>, <class 'preserves.schema.Definition'>, <class 'preserves.schema.SchemaObject'>]

>>> bundle.stream.StreamListenerError.mro()[1:-1]
[<class 'preserves.schema.Definition'>, <class 'preserves.schema.SchemaObject'>]

Illustrating the class attributes on SchemaObject subclasses:

>>> bundle.stream.Mode.ROOTNS is bundle
True

>>> print(stringify(bundle.stream.Mode.SCHEMA, indent=2))
<or [
  [
    "bytes"
    <lit bytes>
  ]
  [
    "lines"
    <ref [] LineMode>
  ]
  [
    "packet"
    <rec <lit packet> <tuple [<named size <atom SignedInteger>>]>>
  ]
  [
    "object"
    <rec <lit object> <tuple [<named description any>]>>
  ]
]>

>>> bundle.stream.Mode.MODULE_PATH
(#stream,)

>>> bundle.stream.Mode.NAME
#Mode

>>> bundle.stream.Mode.VARIANT is None
True
>>> bundle.stream.Mode.packet.VARIANT
#packet

MODULE_PATH = None class-attribute instance-attribute

A sequence (tuple) of Symbols naming the path from the root to the schema module containing this definition.

NAME = None class-attribute instance-attribute

A Symbol naming this definition within its module.

ROOTNS = None class-attribute instance-attribute

A Namespace that is the top-level environment for all bundles included in the Compiler run that produced this SchemaObject.

SCHEMA = None class-attribute instance-attribute

A Value conforming to schema meta.Definition (and thus often to meta.Pattern etc.), interpreted by the SchemaObject machinery to drive parsing, unparsing and so forth.

VARIANT = None class-attribute instance-attribute

None for Definitions (such as bundle.stream.StreamListenerError above) and for overall Enumerations (such as bundle.stream.Mode), or a Symbol for variant definitions contained within an enumeration (such as bundle.stream.Mode.packet).

__preserve__()

Called by preserves.values.preserve: unparses the information represented by this instance, using its schema definition, to produce a Preserves Value.

Source code in preserves/schema.py
def __preserve__(self):
    """Called by [preserves.values.preserve][]: *unparses* the information represented by
    this instance, using its schema definition, to produce a Preserves `Value`."""
    raise NotImplementedError('Subclass responsibility')

decode(v) classmethod

Parses v using the SCHEMA, returning a (sub)instance of SchemaObject or raising SchemaDecodeFailed.

Source code in preserves/schema.py
@classmethod
def decode(cls, v):
    """Parses `v` using the [SCHEMA][preserves.schema.SchemaObject.SCHEMA], returning a
    (sub)instance of [SchemaObject][preserves.schema.SchemaObject] or raising
    [SchemaDecodeFailed][preserves.schema.SchemaDecodeFailed]."""
    raise NotImplementedError('Subclass responsibility')

try_decode(v) classmethod

Parses v using the SCHEMA, returning a (sub)instance of SchemaObject or None if parsing failed.

Source code in preserves/schema.py
@classmethod
def try_decode(cls, v):
    """Parses `v` using the [SCHEMA][preserves.schema.SchemaObject.SCHEMA], returning a
    (sub)instance of [SchemaObject][preserves.schema.SchemaObject] or `None` if parsing
    failed."""
    try:
        return cls.decode(v)
    except SchemaDecodeFailed:
        return None

extend(cls)

A decorator for function definitions. Useful for adding behaviour to the classes resulting from loading a schema module:

>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')

>>> @extend(bundle.stream.LineMode.lf)
... def what_am_i(self):
...     return 'I am a LINEFEED linemode'

>>> @extend(bundle.stream.LineMode.crlf)
... def what_am_i(self):
...     return 'I am a CARRIAGE-RETURN-PLUS-LINEFEED linemode'

>>> bundle.stream.LineMode.lf()
LineMode.lf()
>>> bundle.stream.LineMode.lf().what_am_i()
'I am a LINEFEED linemode'

>>> bundle.stream.LineMode.crlf()
LineMode.crlf()
>>> bundle.stream.LineMode.crlf().what_am_i()
'I am a CARRIAGE-RETURN-PLUS-LINEFEED linemode'
Source code in preserves/schema.py
def extend(cls):
    """A decorator for function definitions. Useful for adding *behaviour* to the classes
    resulting from loading a schema module:

    ```python
    >>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')

    >>> @extend(bundle.stream.LineMode.lf)
    ... def what_am_i(self):
    ...     return 'I am a LINEFEED linemode'

    >>> @extend(bundle.stream.LineMode.crlf)
    ... def what_am_i(self):
    ...     return 'I am a CARRIAGE-RETURN-PLUS-LINEFEED linemode'

    >>> bundle.stream.LineMode.lf()
    LineMode.lf()
    >>> bundle.stream.LineMode.lf().what_am_i()
    'I am a LINEFEED linemode'

    >>> bundle.stream.LineMode.crlf()
    LineMode.crlf()
    >>> bundle.stream.LineMode.crlf().what_am_i()
    'I am a CARRIAGE-RETURN-PLUS-LINEFEED linemode'

    ```

    """
    @wraps(cls)
    def extender(f):
        setattr(cls, f.__name__, f)
        return f
    return extender
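The decorator works by calling `setattr` on the class, so instances that already exist pick up the new method as well. The sketch below shows this with a simplified stand-in for the decorator (it omits the `wraps` call) applied to two plain classes. `Lf`, `Crlf`, and `terminator` are hypothetical names, not part of any schema.

```python
def extend(cls):
    """Simplified stand-in: attach the decorated function to `cls` as a method."""
    def extender(f):
        setattr(cls, f.__name__, f)
        return f
    return extender

class Lf:      # hypothetical stand-ins for generated classes
    pass

class Crlf:
    pass

existing = Lf()  # created before any method is attached

@extend(Lf)
def terminator(self):
    return '\n'

@extend(Crlf)
def terminator(self):
    return '\r\n'

# The pre-existing instance sees the method added to its class.
assert existing.terminator() == '\n'
assert Crlf().terminator() == '\r\n'
```

Since `extender` returns the function unchanged, the module-level name `terminator` ends up bound to the last function defined. Each class still keeps its own method, because the attribute is set on the class at decoration time.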

load_schema_file(filename)

Simple entry point to the compiler: creates a Compiler, calls load on it, and returns its root Namespace.

>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')
>>> type(bundle)
<class 'preserves.schema.Namespace'>
Source code in preserves/schema.py
def load_schema_file(filename):
    """Simple entry point to the compiler: creates a [Compiler][preserves.schema.Compiler],
    calls [load][preserves.schema.Compiler.load] on it, and returns its `root`
    [Namespace][preserves.schema.Namespace].

    ```python
    >>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')
    >>> type(bundle)
    <class 'preserves.schema.Namespace'>

    ```
    """
    c = Compiler()
    c.load(filename)
    return c.root

safeattrname(k)

Escapes Python keywords by appending _; passes all other strings through.

Source code in preserves/schema.py
def safeattrname(k):
    """Escapes Python keywords by appending `_`; passes all other strings through."""
    return k + '_' if keyword.iskeyword(k) else k
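As a quick check of the behaviour, the function as listed appends an underscore to names that are Python keywords and leaves everything else alone. The snippet below repeats the one-line definition so that it runs on its own; the field names are made up for the example and do not come from any schema.

```python
import keyword

def safeattrname(k):
    # Same one-liner as in the listing above.
    return k + '_' if keyword.iskeyword(k) else k

# Hypothetical field names: two are Python keywords, one is not.
field_names = ['class', 'size', 'from']
safe_field_names = [safeattrname(n) for n in field_names]

print(safe_field_names)  # ['class_', 'size', 'from_']
```

Mapping the function over a list of field names in this way is how SAFE_FIELD_NAMES is described as being derived from FIELD_NAMES.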