Preserves Schema
A Preserves schema connects Preserves Values to host-language data structures. Each definition within a schema can be processed by a compiler to produce
- a simple host-language type definition;
- a partial parsing function from Values to instances of the produced type; and
- a total serialization function from instances of the type to Values.
Every parsed Value retains enough information to always be able to be serialized again, and every instance of a host-language data structure contains, by construction, enough information to be successfully serialized.
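To make the three compiler outputs concrete, here is a minimal hand-written sketch of what they might look like for a definition of the shape `<point @x int @y int>`. Everything in it is an assumption made for illustration: the `Point` definition is hypothetical, records are modelled as plain `(label, fields)` tuples, and the method names `parse` and `serialize` are not taken from any Preserves implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Point:
    """Hypothetical host-language type for a definition like <point @x int @y int>."""
    x: int
    y: int

    @staticmethod
    def parse(value: Any) -> Optional["Point"]:
        """Partial parser: returns None when the value has the wrong shape."""
        # Stand-in record representation (an assumption): (label, fields).
        if not (isinstance(value, tuple) and len(value) == 2):
            return None
        label, fields = value
        if label != "point":
            return None
        if not isinstance(fields, (list, tuple)) or len(fields) != 2:
            return None
        x, y = fields
        if not (isinstance(x, int) and isinstance(y, int)):
            return None
        return Point(x, y)

    def serialize(self) -> tuple:
        """Total serializer: every Point can be turned back into a value."""
        return ("point", (self.x, self.y))
```

Parsing can fail, so `parse` has a failure result; serialization cannot, because a `Point` can only be constructed with both fields present.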
Schema support in Python
The preserves.schema module implements Preserves Schema for Python.
A Schema source file (like this one) is first compiled using preserves-schemac to produce a binary-syntax schema bundle containing schema module definitions (like this one). Python code then loads the bundle, exposing its contents as Namespaces ultimately containing SchemaObjects.
Examples
Setup: Loading a schema bundle
For our running example, we will use schemas associated with the Syndicated Actor Model. (The schema bundle is a copy of this file from the syndicate-protocols repository.)
To load a schema bundle, use load_schema_file (or, alternatively, use Compiler directly):
>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')
>>> type(bundle)
<class 'preserves.schema.Namespace'>
The top-level entries in the loaded bundle are schema modules. Let's examine the stream schema module, whose source code indicates that it should contain definitions for Mode, Source, Sink, etc.:
>>> bundle.stream # doctest: +ELLIPSIS
{'Mode': <class 'stream.Mode'>, 'Sink': <class 'stream.Sink'>, ...}
Example 1: stream.StreamListenerError, a product type
Drilling down further, let's consider the definition of StreamListenerError, which appears in the source as
StreamListenerError = <stream-listener-error @spec any @message string> .
This reads, in the Preserves Schema language, as the definition of a simple product type (record, class, object) with two named fields, spec and message. Parsing a value into a StreamListenerError succeeds only if the value is a record, its label matches, it has exactly two fields, and the second field (message) is a string.
>>> bundle.stream.StreamListenerError
<class 'stream.StreamListenerError'>
The StreamListenerError class includes a decode method that analyzes an input value:
>>> bundle.stream.StreamListenerError.decode(
... parse('<stream-listener-error <xyz> "an error">'))
StreamListenerError {'spec': #xyz(), 'message': 'an error'}
If invalid input is supplied, decode will raise SchemaDecodeFailed, which includes helpful information for diagnosing the problem (as we will see below, this is especially useful for parsers for sum types):
>>> bundle.stream.StreamListenerError.decode(
... parse('<i-am-invalid>'))
Traceback (most recent call last):
...
preserves.schema.SchemaDecodeFailed: Could not decode i-am-invalid using <class 'stream.StreamListenerError'>
Most likely reason: in stream.StreamListenerError: <lit stream-listener-error> didn't match i-am-invalid
Full explanation:
  in stream.StreamListenerError: <lit stream-listener-error> didn't match i-am-invalid
Alternatively, the try_decode method catches SchemaDecodeFailed, transforming it into None:
>>> bundle.stream.StreamListenerError.try_decode(
... parse('<stream-listener-error <xyz> "an error">'))
StreamListenerError {'spec': #xyz(), 'message': 'an error'}
>>> bundle.stream.StreamListenerError.try_decode(
... parse('<i-am-invalid>'))
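The relationship between the two methods can be sketched generically. The helper below is hypothetical (`try_decode_sketch` is not part of the library) and relies only on the fact, documented further down this page, that SchemaDecodeFailed derives from ValueError; it uses `int` as a stand-in decoder so that it runs without a schema bundle.

```python
def try_decode_sketch(decode, value):
    """Call a raising decoder, turning a decode failure into None.

    `decode` is any callable that raises ValueError on bad input; a
    schema class's decode method fits that description because
    SchemaDecodeFailed is a ValueError subclass.
    """
    try:
        return decode(value)
    except ValueError:
        return None


# Stand-in decoder for the sketch: int() raises ValueError on bad input.
print(try_decode_sketch(int, "12"))
print(try_decode_sketch(int, "not a number"))
```

Use decode when a failure should propagate together with its diagnostic explanation, and try_decode when an absent result is an ordinary case for the caller.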
The class can also be instantiated directly:
>>> err = bundle.stream.StreamListenerError(Record(Symbol('xyz'), []), 'an error')
>>> err
StreamListenerError {'spec': #xyz(), 'message': 'an error'}
The fields and contents of instances can be queried:
>>> err.spec
#xyz()
>>> err.message
'an error'
And finally, instances can of course be serialized and encoded:
>>> print(stringify(err))
<stream-listener-error <xyz> "an error">
>>> canonicalize(err)
b'\xb4\xb3\x15stream-listener-error\xb4\xb3\x03xyz\x84\xb1\x08an error\x84'
Example 2: stream.Mode, a sum type
Now let's consider the definition of Mode, which appears in the source as
Mode = =bytes / @lines LineMode / <packet @size int> / <object @description any> .
This reads, in the Preserves Schema language, as an alternation (disjoint union, variant, sum type) of four possible kinds of value: the symbol bytes; a LineMode value; a record with packet as its label and an integer as its only field; or a record with object as its label and any kind of value as its only field. In Python, this becomes:
>>> bundle.stream.Mode.bytes
<class 'stream.Mode.bytes'>
>>> bundle.stream.Mode.lines
<class 'stream.Mode.lines'>
>>> bundle.stream.Mode.packet
<class 'stream.Mode.packet'>
>>> bundle.stream.Mode.object
<class 'stream.Mode.object'>
As before, Mode includes a decode method that analyzes an input value:
>>> bundle.stream.Mode.decode(parse('bytes'))
Mode.bytes()
>>> bundle.stream.Mode.decode(parse('lf'))
Mode.lines(LineMode.lf())
>>> bundle.stream.Mode.decode(parse('<packet 123>'))
Mode.packet {'size': 123}
>>> bundle.stream.Mode.decode(parse('<object "?">'))
Mode.object {'description': '?'}
Invalid input causes SchemaDecodeFailed to be raised:
>>> bundle.stream.Mode.decode(parse('<i-am-not-a-valid-mode>'))
Traceback (most recent call last):
...
preserves.schema.SchemaDecodeFailed: Could not decode <i-am-not-a-valid-mode> using <class 'stream.Mode'>
Most likely reason: in stream.LineMode.crlf: <lit crlf> didn't match <i-am-not-a-valid-mode>
Full explanation:
  in stream.Mode: matching <i-am-not-a-valid-mode>
    in stream.Mode.bytes: <lit bytes> didn't match <i-am-not-a-valid-mode>
    in stream.Mode.lines: <ref [] LineMode> didn't match <i-am-not-a-valid-mode>
      in stream.LineMode: matching <i-am-not-a-valid-mode>
        in stream.LineMode.lf: <lit lf> didn't match <i-am-not-a-valid-mode>
        in stream.LineMode.crlf: <lit crlf> didn't match <i-am-not-a-valid-mode>
    in stream.Mode.packet: <lit packet> didn't match i-am-not-a-valid-mode
    in stream.Mode.object: <lit object> didn't match i-am-not-a-valid-mode
The "full explanation" includes details on which parses were attempted, and why they failed.
Again, the try_decode method catches SchemaDecodeFailed, transforming it into None:
>>> bundle.stream.Mode.try_decode(parse('bytes'))
Mode.bytes()
>>> bundle.stream.Mode.try_decode(parse('<i-am-not-a-valid-mode>'))
Direct instantiation is done with the variant classes, not with Mode itself:
>>> bundle.stream.Mode.bytes()
Mode.bytes()
>>> bundle.stream.Mode.lines(bundle.stream.LineMode.lf())
Mode.lines(LineMode.lf())
>>> bundle.stream.Mode.packet(123)
Mode.packet {'size': 123}
>>> bundle.stream.Mode.object('?')
Mode.object {'description': '?'}
Fields and contents can be queried as usual:
>>> bundle.stream.Mode.lines(bundle.stream.LineMode.lf()).value
LineMode.lf()
>>> bundle.stream.Mode.packet(123).size
123
>>> bundle.stream.Mode.object('?').description
'?'
And serialization and encoding are also as expected:
>>> print(stringify(bundle.stream.Mode.bytes()))
bytes
>>> print(stringify(bundle.stream.Mode.lines(bundle.stream.LineMode.lf())))
lf
>>> print(stringify(bundle.stream.Mode.packet(123)))
<packet 123>
>>> print(stringify(bundle.stream.Mode.object('?')))
<object "?">
>>> canonicalize(bundle.stream.Mode.object('?'))
b'\xb4\xb3\x06object\xb1\x01?\x84'
Finally, the VARIANT attribute of instances allows code to dispatch on what kind of data it is handling at a given moment:
>>> bundle.stream.Mode.bytes().VARIANT
#bytes
>>> bundle.stream.Mode.lines(bundle.stream.LineMode.lf()).VARIANT
#lines
>>> bundle.stream.Mode.packet(123).VARIANT
#packet
>>> bundle.stream.Mode.object('?').VARIANT
#object
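A dispatch on VARIANT might be organised as a lookup table of handlers, as in the sketch below. It is self-contained rather than tied to the library: the function `describe_mode` and its messages are hypothetical, the instances are `SimpleNamespace` stand-ins, and the variant names are plain strings, whereas real instances carry Symbol values, so a table used with real instances would need Symbol keys.

```python
from types import SimpleNamespace


def describe_mode(mode):
    """Pick a handler based on the VARIANT attribute of a Mode-like object."""
    # One handler per variant; each reads only the fields its variant has.
    handlers = {
        "bytes": lambda m: "raw bytes",
        "lines": lambda m: f"lines ({m.value})",
        "packet": lambda m: f"packets of size {m.size}",
        "object": lambda m: f"objects: {m.description}",
    }
    handler = handlers.get(mode.VARIANT)
    if handler is None:
        raise ValueError(f"unknown variant: {mode.VARIANT!r}")
    return handler(mode)


# Stand-in instances (assumptions): real ones come from decode or from
# the variant classes, and their VARIANT is a Symbol rather than a str.
print(describe_mode(SimpleNamespace(VARIANT="packet", size=123)))
print(describe_mode(SimpleNamespace(VARIANT="bytes")))
```

A table keeps the per-variant logic in one place and makes an unhandled variant an explicit error instead of a silent fall-through.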
meta = load_schema_file(__metaschema_filename).schema
module-attribute
Schema module Namespace corresponding to Preserves Schema's metaschema.
Compiler()
Instances of Compiler populate an initially-empty Namespace by loading and compiling schema bundle files.
>>> c = Compiler()
>>> c.load('docs/syndicate-protocols-schema-bundle.bin')
>>> type(c.root)
<class 'preserves.schema.Namespace'>
Attributes:
Name | Type | Description |
---|---|---|
root | Namespace | the root namespace into which top-level schema modules are installed. |
load(filename)
Opens the file at filename, passing the resulting file object to load_filelike.
load_filelike(f, module_name=None)
Reads a meta.Bundle or meta.Schema from the filelike object f, compiling and installing it in self.root. If f contains a bundle, module_name is not used, since the schema modules in the bundle know their own names; if f contains a plain schema module, however, module_name is used directly if it is a string, and if it is None, a suitable module name is computed from the name attribute of f, if it is present. If name is absent in that case, ValueError is raised.
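The naming rule for plain schema modules can be sketched as a small decision function. This is not the library's code: `resolve_module_name` is a hypothetical helper, and the way a "suitable" name is derived from `f.name` (here, the base file name without its extension) is purely an assumption, since the text above does not specify it.

```python
import io
import os


def resolve_module_name(f, module_name=None):
    """Decide which name a plain schema module read from f is installed under."""
    # An explicit string always wins.
    if isinstance(module_name, str):
        return module_name
    # Otherwise fall back to the file object's name attribute, if any.
    name = getattr(f, "name", None)
    if not isinstance(name, str):
        raise ValueError("module_name not given and f has no usable name")
    # Assumption: use the base file name with its extension stripped.
    return os.path.splitext(os.path.basename(name))[0]


# An in-memory file has no name, so an explicit module_name is required.
print(resolve_module_name(io.BytesIO(), "stream"))
```

In-memory file objects such as `io.BytesIO` have no name attribute, which is the case where passing module_name explicitly matters.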
Definition(*args, **kwargs)
Bases: SchemaObject
Subclasses of Definition are used to represent both standalone non-alternation definitions as well as alternatives within an Enumeration.
>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')
>>> bundle.stream.StreamListenerError.FIELD_NAMES
['spec', 'message']
>>> bundle.stream.StreamListenerError.SAFE_FIELD_NAMES
['spec', 'message']
>>> bundle.stream.StreamListenerError.ENUMERATION is None
True
>>> bundle.stream.Mode.object.FIELD_NAMES
['description']
>>> bundle.stream.Mode.object.SAFE_FIELD_NAMES
['description']
>>> bundle.stream.Mode.object.ENUMERATION is bundle.stream.Mode
True
>>> bundle.stream.CreditAmount.count.FIELD_NAMES
[]
>>> bundle.stream.CreditAmount.count.SAFE_FIELD_NAMES
[]
>>> bundle.stream.CreditAmount.count.ENUMERATION is bundle.stream.CreditAmount
True
>>> bundle.stream.CreditAmount.decode(parse('123'))
CreditAmount.count(123)
>>> bundle.stream.CreditAmount.count(123)
CreditAmount.count(123)
>>> bundle.stream.CreditAmount.count(123).value
123
ENUMERATION = None
class-attribute
instance-attribute
None for standalone top-level definitions within a module; otherwise, an Enumeration subclass representing a top-level alternation definition.
FIELD_NAMES = []
class-attribute
instance-attribute
List of strings: names of the fields contained within this definition, if it has named fields at all; otherwise, an empty list, and the definition is a simple wrapper for another value, in which case that value is accessed via the value attribute.
SAFE_FIELD_NAMES = []
class-attribute
instance-attribute
The list produced by mapping safeattrname over FIELD_NAMES.
Enumeration()
Bases: SchemaObject
Subclasses of Enumeration represent a group of variant options within a sum type.
>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')
>>> import pprint
>>> pprint.pprint(bundle.stream.Mode.VARIANTS)
[(#bytes, <class 'stream.Mode.bytes'>),
 (#lines, <class 'stream.Mode.lines'>),
 (#packet, <class 'stream.Mode.packet'>),
 (#object, <class 'stream.Mode.object'>)]
>>> bundle.stream.Mode.VARIANTS[0][1] is bundle.stream.Mode.bytes
True
VARIANTS = None
class-attribute
instance-attribute
List of (Symbol, SchemaObject class) tuples representing the possible options within this sum type.
Namespace(prefix)
A Namespace is a dictionary-like object representing a schema module that knows its location in a schema module hierarchy and whose attributes correspond to definitions and submodules within the schema module.
Attributes:
Name | Type | Description |
---|---|---|
_prefix | tuple[Symbol] | path to this module/Namespace from the root Namespace |
SchemaDecodeFailed(cls, p, v, failures=None)
Bases: ValueError
Raised when decode cannot find a way to parse a given input.
Attributes:
Name | Type | Description |
---|---|---|
cls | class | the SchemaObject subclass attempting the parse |
pattern | Value | the failing pattern |
value | Value | the unparseable value |
failures | list[SchemaDecodeFailed] | descriptions of failed paths attempted during the match this failure describes |
SchemaObject
Base class for classes representing grammatical productions in a schema: instances of SchemaObject represent schema definitions. This is an abstract class, as are its subclasses Enumeration and Definition. It is subclasses of those subclasses, automatically produced during schema loading, that are actually instantiated.
>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')
>>> bundle.stream.Mode.mro()[1:-1]
[<class 'preserves.schema.Enumeration'>, <class 'preserves.schema.SchemaObject'>]
>>> bundle.stream.Mode.packet.mro()[1:-1]
[<class 'stream.Mode._ALL'>, <class 'preserves.schema.Definition'>, <class 'preserves.schema.SchemaObject'>]
>>> bundle.stream.StreamListenerError.mro()[1:-1]
[<class 'preserves.schema.Definition'>, <class 'preserves.schema.SchemaObject'>]
Illustrating the class attributes on SchemaObject subclasses:
>>> bundle.stream.Mode.ROOTNS is bundle
True
>>> print(stringify(bundle.stream.Mode.SCHEMA, indent=2))
<or [
  [
    "bytes"
    <lit bytes>
  ]
  [
    "lines"
    <ref [] LineMode>
  ]
  [
    "packet"
    <rec <lit packet> <tuple [<named size <atom SignedInteger>>]>>
  ]
  [
    "object"
    <rec <lit object> <tuple [<named description any>]>>
  ]
]>
>>> bundle.stream.Mode.MODULE_PATH
(#stream,)
>>> bundle.stream.Mode.NAME
#Mode
>>> bundle.stream.Mode.VARIANT is None
True
>>> bundle.stream.Mode.packet.VARIANT
#packet
MODULE_PATH = None
class-attribute
instance-attribute
A sequence (tuple) of Symbols naming the path from the root to the schema module containing this definition.
NAME = None
class-attribute
instance-attribute
A Symbol naming this definition within its module.
ROOTNS = None
class-attribute
instance-attribute
A Namespace that is the top-level environment for all bundles included in the Compiler run that produced this SchemaObject.
SCHEMA = None
class-attribute
instance-attribute
A Value conforming to schema meta.Definition (and thus often to meta.Pattern etc.), interpreted by the SchemaObject machinery to drive parsing, unparsing and so forth.
VARIANT = None
class-attribute
instance-attribute
None for Definitions (such as bundle.stream.StreamListenerError above) and for overall Enumerations (such as bundle.stream.Mode), or a Symbol for variant definitions contained within an enumeration (such as bundle.stream.Mode.packet).
__preserve__()
Called by preserves.values.preserve: unparses the information represented by this instance, using its schema definition, to produce a Preserves Value.
decode(v)
classmethod
Parses v using the SCHEMA, returning a (sub)instance of SchemaObject or raising SchemaDecodeFailed.
try_decode(v)
classmethod
Parses v using the SCHEMA, returning a (sub)instance of SchemaObject or None if parsing failed.
extend(cls)
A decorator for function definitions. Useful for adding behaviour to the classes resulting from loading a schema module:
>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')
>>> @extend(bundle.stream.LineMode.lf)
... def what_am_i(self):
... return 'I am a LINEFEED linemode'
>>> @extend(bundle.stream.LineMode.crlf)
... def what_am_i(self):
... return 'I am a CARRIAGE-RETURN-PLUS-LINEFEED linemode'
>>> bundle.stream.LineMode.lf()
LineMode.lf()
>>> bundle.stream.LineMode.lf().what_am_i()
'I am a LINEFEED linemode'
>>> bundle.stream.LineMode.crlf()
LineMode.crlf()
>>> bundle.stream.LineMode.crlf().what_am_i()
'I am a CARRIAGE-RETURN-PLUS-LINEFEED linemode'
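The general mechanism behind such a decorator can be sketched in a few lines. This is an illustrative re-implementation under a different, hypothetical name (`extend_sketch`), applied to a stand-in class; the library's own extend may do more than this, for example when given an enumeration rather than a single variant class.

```python
def extend_sketch(cls):
    """Return a decorator that installs the decorated function on cls."""
    def decorator(func):
        # Attach the function under its own name, making it a method.
        setattr(cls, func.__name__, func)
        return func
    return decorator


class Lf:
    """Stand-in for a class produced by loading a schema (an assumption)."""


@extend_sketch(Lf)
def what_am_i(self):
    return 'I am a LINEFEED linemode'


print(Lf().what_am_i())
```

Because the classes are generated at load time, attaching behaviour after the fact avoids having to subclass or edit generated code.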
load_schema_file(filename)
Simple entry point to the compiler: creates a Compiler, calls load on it, and returns its root Namespace.
>>> bundle = load_schema_file('docs/syndicate-protocols-schema-bundle.bin')
>>> type(bundle)
<class 'preserves.schema.Namespace'>
safeattrname(k)
Escapes Python keywords by prepending _; passes all other strings through.
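Read literally, that description corresponds to something like the sketch below, built on the standard library's `keyword.iskeyword`. The name `safeattrname_sketch` is hypothetical, and the library's actual escaping convention may differ in detail, so treat the exact output format as an assumption.

```python
import keyword


def safeattrname_sketch(k):
    """Prefix Python keywords with '_' and leave every other string alone."""
    return '_' + k if keyword.iskeyword(k) else k


# 'class' is a Python keyword and cannot be used as an attribute name
# in ordinary dotted syntax; 'size' is not a keyword.
print(safeattrname_sketch('class'))
print(safeattrname_sketch('size'))
```

Escaping matters because schema field names are free-form, while Python attribute access such as `obj.class` is a syntax error.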
Created: March 16, 2023