Preserves Path

The preserves.path module implements Preserves Path.

Preserves Path is roughly analogous to XPath, but for Preserves values: just as XPath selects portions of an XML document, a Preserves Path uses path expressions to select portions of a Value.

Use parse to compile a path expression, and then use the exec method on the result to apply it to a given input:

parse(PATH_EXPRESSION_STRING).exec(PRESERVES_VALUE)
    -> SEQUENCE_OF_PRESERVES_VALUES

Command-line usage

When preserves.path is run as a __main__ module, sys.argv[1] is parsed, interpreted as a path expression, and run against human-readable values read from standard input. Each matching result is passed to stringify and printed to standard output.

Examples

Setup: Loading test data

The following examples use testdata:

>>> with open('tests/samples.bin', 'rb') as f:
...     testdata = decode_with_annotations(f.read())

Recall that samples.bin contains a binary-syntax form of the human-readable [samples.pr](https://preserves.dev/tests/samples.pr) test data file, intended to exercise most of the features of Preserves. In particular, the root Value in the file has a number of annotations (for documentation and other purposes).

Example 1: Selecting string-valued documentation annotations

The path expression .annotations ^ Documentation . 0 / string proceeds in five steps:

.annotations selects each annotation on the root document
^ Documentation retains only those values (each an annotation of the root) that are Records with label equal to the symbol Documentation
. 0 moves into the first child (the first field) of each such Record, which in our case is a list of other Values
/ selects all immediate children of these lists
string retains only those values that are strings
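
The five steps above can be pictured as a chain of filters and moves. The following is a rough pure-Python analogue that does not use preserves.path at all: the ToyRecord and ToyAnnotated classes and the sample data are hypothetical stand-ins chosen only to mirror the shape of each step, and the library's own types and evaluation strategy may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    # Hypothetical stand-in for a Preserves Record: a label plus fields.
    label: str
    fields: list


@dataclass
class ToyAnnotated:
    # Hypothetical stand-in for an annotated value.
    item: object
    annotations: list = field(default_factory=list)


# Made-up root value carrying three annotations.
root = ToyAnnotated(
    item=["payload"],
    annotations=[
        ToyRecord("Documentation", [["first line", 42, "second line"]]),
        ToyRecord("Other", [["ignored"]]),
        "a bare string annotation",
    ],
)


def select_doc_strings(value):
    # Step 1 (.annotations): visit each annotation on the root.
    for ann in value.annotations:
        # Step 2 (^ Documentation): keep only records labelled Documentation.
        if not (isinstance(ann, ToyRecord) and ann.label == "Documentation"):
            continue
        # Step 3 (. 0): move into the first field, if there is one.
        if not ann.fields:
            continue
        first_field = ann.fields[0]
        # Step 4 (/): visit the immediate children of that list.
        for child in first_field:
            # Step 5 (string): keep only the strings.
            if isinstance(child, str):
                yield child


results = list(select_doc_strings(root))
print(results)  # ['first line', 'second line']
```

Each step consumes the values produced by the previous one, which is why a non-matching annotation simply contributes nothing to the output.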

Evaluating the expression on testdata yields the following results:

>>> selector = parse('.annotations ^ Documentation . 0 / string')
>>> for result in selector.exec(testdata):
...     print(stringify(result))
"Individual test cases may be any of the following record types:"
"In each test, let stripped = strip(annotatedValue),"
"                  encodeBinary(·) produce canonical ordering and no annotations,"
"                  looseEncodeBinary(·) produce any ordering, but with annotations,"
"                  annotatedBinary(·) produce “canonical ordering”, but with annotations,"
"                  decodeBinary(·) include annotations,"
"                  encodeText(·) include annotations,"
"                  decodeText(·) include annotations,"
"and check the following numbered expectations according to the table above:"
"Implementations may vary in their treatment of the difference between expectations"
"21/22 and 31/32, depending on how they wish to treat end-of-stream conditions."
"The idea of canonical-ordering-with-annotations is to encode, say, sets with their elements"
"in sorted order of their canonical annotationless binary encoding, but then actually"
"*serialized* with the annotations present."

Example 2: Selecting tests with Records as their annotatedValues

The path expression // [.^ [= Test + = NondeterministicTest]] [. 1 rec] proceeds in three steps:

// recursively decomposes the input, yielding all direct and indirect descendants of each input value
[.^ [= Test + = NondeterministicTest]] retains only those inputs (each a descendant of the root) that yield more than zero results when executed against the expression within the brackets:
    1. .^ selects only labels of values that are Records, filtering by type and transforming in a single step
    2. [= Test + = NondeterministicTest] again filters by a path expression:
        1. the infix + operator takes the union of matches of its arguments
        2. the left-hand argument, = Test, selects values (remember, record labels) equal to the symbol Test
        3. the right-hand argument, = NondeterministicTest, selects values equal to the symbol NondeterministicTest
    The result is thus all Records anywhere inside testdata that have either Test or NondeterministicTest as their labels.
[. 1 rec] filters these Records by another path expression:
    1. . 1 selects their second field (fields are numbered from 0)
    2. rec retains only values that are Records
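
As an informal illustration of the three steps above, the sketch below combines a recursive descent with two predicate filters in plain Python. It does not call preserves.path; ToyRecord, descendants, and the sample tree are hypothetical names and data, and the descent follows the wording above (descendants only, not the input itself), which is an assumption about the exact semantics of //.

```python
from dataclasses import dataclass


@dataclass
class ToyRecord:
    # Hypothetical stand-in for a Preserves Record.
    label: str
    fields: tuple


def descendants(value):
    """Yield every direct and indirect child of value (toy analogue of //)."""
    if isinstance(value, ToyRecord):
        children = value.fields
    elif isinstance(value, (list, tuple)):
        children = value
    else:
        children = ()
    for child in children:
        yield child
        yield from descendants(child)


def has_label(value, labels):
    # Analogue of [.^ [= Test + = NondeterministicTest]]: the union of two
    # equality tests applied to the record label.
    return isinstance(value, ToyRecord) and value.label in labels


def second_field_is_record(value):
    # Analogue of [. 1 rec]: the field at index 1 exists and is a record.
    return len(value.fields) > 1 and isinstance(value.fields[1], ToyRecord)


# Made-up data shaped loosely like the test cases in samples.pr.
tree = [
    ToyRecord("Test", (b"bytes", ToyRecord("discard", ()))),
    ToyRecord("Test", (b"bytes", 123)),
    [ToyRecord("NondeterministicTest", (b"bytes", ToyRecord("capture", ())))],
    ToyRecord("ParseError", ("oops",)),
]

matches = [
    v
    for v in descendants(tree)
    if has_label(v, {"Test", "NondeterministicTest"}) and second_field_is_record(v)
]
print([m.fields[1].label for m in matches])  # ['discard', 'capture']
```

The bracketed predicates only decide whether a value is kept; the value itself, not the result of the inner expression, is what flows onward.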

Evaluating the expression against testdata yields the following:

>>> selector = parse('// [.^ [= Test + = NondeterministicTest]] [. 1 rec]')
>>> for result in selector.exec(testdata):
...     print(stringify(result))
<Test #[tLMHY2FwdHVyZbSzB2Rpc2NhcmSEhA==] <capture <discard>>>
<Test #[tLMHb2JzZXJ2ZbSzBXNwZWFrtLMHZGlzY2FyZIS0swdjYXB0dXJltLMHZGlzY2FyZISEhIQ=] <observe <speak <discard> <capture <discard>>>>>
<Test #[tLWzBnRpdGxlZLMGcGVyc29usAECswV0aGluZ7ABAYSwAWWxCUJsYWNrd2VsbLSzBGRhdGWwAgcdsAECsAEDhLECRHKE] <[titled person 2 thing 1] 101 "Blackwell" <date 1821 2 3> "Dr">>
<Test #[tLMHZGlzY2FyZIQ=] <discard>>
<Test #[tLABB7WEhA==] <7 []>>
<Test #[tLMHZGlzY2FyZLMIc3VycHJpc2WE] <discard surprise>>
<Test #[tLEHYVN0cmluZ7ABA7ABBIQ=] <"aString" 3 4>>
<Test #[tLSzB2Rpc2NhcmSEsAEDsAEEhA==] <<discard> 3 4>>
<Test #[hbMCYXK0swFShbMCYWazAWaE] @ar <R @af f>>
<Test #[tIWzAmFyswFShbMCYWazAWaE] <@ar R @af f>>

`Predicate = syntax.Predicate` `module-attribute`

Schema definition for representing a Preserves Path Predicate.

`Selector = syntax.Selector` `module-attribute`

Schema definition for representing a sequence of Preserves Path Steps.

`dumps = stringify` `module-attribute`

This alias for stringify provides a familiar pythonesque name for converting a Preserves Value to a string.

`loads = parse` `module-attribute`

This alias for parse provides a familiar pythonesque name for converting a string to a Preserves Value.

`syntax = load_schema_file(pathlib.Path(file).parent / 'path.prb').path` `module-attribute`

This value is a Python representation of a Preserves Schema definition for the Preserves Path expression language. The language is defined in the file path.prs.

`Annotated(item)`

Bases: object

A Preserves Value along with a sequence of Values annotating it. Compares equal to the underlying Value, ignoring the annotations. See the specification document for more about annotations.

>>> import preserves
>>> a = preserves.parse('''
... # A comment
... [1 2 3]
... ''', include_annotations=True)
>>> a
@'A comment' (1, 2, 3)
>>> a.item
(1, 2, 3)
>>> a.annotations
['A comment']
>>> a == (1, 2, 3)
True
>>> a == preserves.parse('@xyz [1 2 3]', include_annotations=True)
True
>>> a[0]
Traceback (most recent call last):
  ...
TypeError: 'Annotated' object is not subscriptable
>>> a.item[0]
1
>>> type(a.item[0])
<class 'preserves.values.Annotated'>
>>> a.item[0].annotations
[]
>>> print(preserves.stringify(a))
@"A comment" [1 2 3]
>>> print(preserves.stringify(a, include_annotations=False))
[1 2 3]

Attributes:

Name	Type	Description
`item`	`Value`	the underlying annotated `Value`
`annotations`	`list[Value]`	the annotations attached to `self.item`

Source code in preserves/values.py

def __init__(self, item):
    self.annotations = []
    self.item = item
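
The "compares equal to the underlying Value, ignoring the annotations" behaviour can be sketched with a small stand-alone class. ToyAnnotated below is a hypothetical stand-in written for illustration only; it is not the library's implementation, which may handle hashing and nested values differently.

```python
class ToyAnnotated:
    """Hypothetical wrapper pairing a value with a list of annotations."""

    def __init__(self, item):
        self.annotations = []
        self.item = item

    def __eq__(self, other):
        # Unwrap the other side if it is also wrapped, then compare items
        # only; annotations never take part in the comparison.
        if isinstance(other, ToyAnnotated):
            other = other.item
        return self.item == other

    def __hash__(self):
        # Keep hashing consistent with equality by hashing the item alone.
        return hash(self.item)


a = ToyAnnotated((1, 2, 3))
a.annotations.append("A comment")
b = ToyAnnotated((1, 2, 3))
b.annotations.append("xyz")

print(a == (1, 2, 3))  # True
print(a == b)          # True
```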

`peel()`

Calls strip_annotations on self with depth=1.

Source code in preserves/values.py

def peel(self):
    """Calls [strip_annotations][preserves.values.strip_annotations] on `self` with `depth=1`."""
    return strip_annotations(self, 1)

`strip(depth=inf)`

Calls strip_annotations on self and depth.

Source code in preserves/values.py

def strip(self, depth=inf):
    """Calls [strip_annotations][preserves.values.strip_annotations] on `self` and `depth`."""
    return strip_annotations(self, depth)

`DecodeError`

Bases: ValueError

Raised whenever preserves.binary.Decoder or preserves.text.Parser detect invalid input.

`Decoder(packet=b'', include_annotations=False, decode_embedded=lambda x: x)`

Bases: BinaryCodec

Implementation of a decoder for the machine-oriented binary Preserves syntax.

Parameters:

Name	Type	Description	Default
`packet`	`bytes`	initial contents of the input buffer; may subsequently be extended by calling extend.	`b''`
`include_annotations`	`bool`	if `True`, wrap each value and subvalue in an Annotated object.	`False`
`decode_embedded`		function accepting a `Value` and returning a possibly-decoded form of that value suitable for placing into an Embedded object.	`lambda x: x`

Normal usage is to supply a buffer, and keep calling next until a ShortPacket exception is raised:

>>> d = Decoder(b'\xb0\x01{\xb1\x05hello\x85\xb3\x01x\xb5\x84')
>>> d.next()
123
>>> d.next()
'hello'
>>> d.next()
()
>>> d.next()
Traceback (most recent call last):
  ...
preserves.error.ShortPacket: Short packet

Alternatively, keep calling try_next until it yields None, which is not in the domain of Preserves Values:

>>> d = Decoder(b'\xb0\x01{\xb1\x05hello\x85\xb3\x01x\xb5\x84')
>>> d.try_next()
123
>>> d.try_next()
'hello'
>>> d.try_next()
()
>>> d.try_next()

For convenience, Decoder implements the iterator interface, backing it with try_next, so you can simply iterate over all complete values in an input:

>>> d = Decoder(b'\xb0\x01{\xb1\x05hello\x85\xb3\x01x\xb5\x84')
>>> list(d)
[123, 'hello', ()]

>>> for v in Decoder(b'\xb0\x01{\xb1\x05hello\x85\xb3\x01x\xb5\x84'):
...     print(repr(v))
123
'hello'
()

Supply include_annotations=True to read annotations alongside the annotated values:

>>> d = Decoder(b'\xb0\x01{\xb1\x05hello\x85\xb3\x01x\xb5\x84', include_annotations=True)
>>> list(d)
[123, 'hello', @#x ()]

If you are incrementally reading from, say, a socket, you can use extend to add new input as it becomes available:

>>> d = Decoder(b'\xb0\x01{\xb1\x05he')
>>> d.try_next()
123
>>> d.try_next() # returns None because the input is incomplete
>>> d.extend(b'llo')
>>> d.try_next()
'hello'
>>> d.try_next()

Attributes:

Name	Type	Description
`packet`	`bytes`	buffered input waiting to be processed
`index`	`int`	read position within `packet`

Source code in preserves/binary.py

def __init__(self, packet=b'', include_annotations=False, decode_embedded=lambda x: x):
    super(Decoder, self).__init__()
    self.packet = packet
    self.index = 0
    self.include_annotations = include_annotations
    self.decode_embedded = decode_embedded

`extend(data)`

Appends data to the remaining bytes in self.packet, trimming already-processed bytes from the front of self.packet and resetting self.index to zero.

Source code in preserves/binary.py

def extend(self, data):
    """Appends `data` to the remaining bytes in `self.packet`, trimming already-processed
    bytes from the front of `self.packet` and resetting `self.index` to zero."""
    self.packet = self.packet[self.index:] + data
    self.index = 0

`next()`

Reads the next complete Value from the internal buffer, raising ShortPacket if too few bytes are available, or DecodeError if the input is invalid somehow.

Source code in preserves/binary.py

def next(self):
    """Reads the next complete `Value` from the internal buffer, raising
    [ShortPacket][preserves.error.ShortPacket] if too few bytes are available, or
    [DecodeError][preserves.error.DecodeError] if the input is invalid somehow.

    """
    tag = self.nextbyte()
    if tag == 0x80: return self.wrap(False)
    if tag == 0x81: return self.wrap(True)
    if tag == 0x84: raise DecodeError('Unexpected end-of-stream marker')
    if tag == 0x85:
        a = self.next()
        v = self.next()
        return self.unshift_annotation(a, v)
    if tag == 0x86:
        if self.decode_embedded is None:
            raise DecodeError('No decode_embedded function supplied')
        return self.wrap(Embedded(self.decode_embedded(self.next())))
    if tag == 0x87:
        count = self.nextbyte()
        if count == 8: return self.wrap(struct.unpack('>d', self.nextbytes(8))[0])
        raise DecodeError('Invalid IEEE754 size')
    if tag == 0xb0: return self.wrap(self.nextint(self.varint()))
    if tag == 0xb1: return self.wrap(self.nextbytes(self.varint()).decode('utf-8'))
    if tag == 0xb2: return self.wrap(self.nextbytes(self.varint()))
    if tag == 0xb3: return self.wrap(Symbol(self.nextbytes(self.varint()).decode('utf-8')))
    if tag == 0xb4:
        vs = self.nextvalues()
        if not vs: raise DecodeError('Too few elements in encoded record')
        return self.wrap(Record(vs[0], vs[1:]))
    if tag == 0xb5: return self.wrap(tuple(self.nextvalues()))
    if tag == 0xb6:
        vs = self.nextvalues()
        s = frozenset(vs)
        if len(s) != len(vs): raise DecodeError('Duplicate value')
        return self.wrap(s)
    if tag == 0xb7: return self.wrap(ImmutableDict.from_kvs(self.nextvalues()))
    raise DecodeError('Invalid tag: ' + hex(tag))
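
To see how the tag dispatch lines up with the byte strings used in the examples, here is a minimal hand-rolled sketch that handles only the 0xb0 (integer) and 0xb1 (string) tags. The function name decode_simple is hypothetical. The sketch assumes lengths that fit in a single byte and assumes integers are stored as big-endian two's complement, which is consistent with the `b'\xb0\x01{'` example but is not shown in the source above.

```python
def decode_simple(packet):
    """Decode a run of 0xb0 integers and 0xb1 strings with one-byte lengths."""
    index = 0
    values = []
    while index < len(packet):
        if index + 1 >= len(packet):
            raise ValueError("short packet")
        tag = packet[index]
        length = packet[index + 1]
        if length >= 0x80:
            # Longer lengths need multi-byte varints, which this sketch omits.
            raise ValueError("multi-byte lengths are outside this sketch")
        body = packet[index + 2 : index + 2 + length]
        if len(body) < length:
            raise ValueError("short packet")
        if tag == 0xB0:
            # Assumption: big-endian, signed (two's complement) integer body.
            values.append(int.from_bytes(body, "big", signed=True))
        elif tag == 0xB1:
            values.append(body.decode("utf-8"))
        else:
            raise ValueError("tag outside this sketch: " + hex(tag))
        index += 2 + length
    return values


decoded = decode_simple(b"\xb0\x01{\xb1\x05hello")
print(decoded)  # [123, 'hello']
```

The byte `{` is 0x7b, which is 123, so the first three bytes of the packet read as "integer, one byte long, value 123".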

`try_next()`

Like next, but returns None instead of raising ShortPacket.

Source code in preserves/binary.py

def try_next(self):
    """Like [next][preserves.binary.Decoder.next], but returns `None` instead of raising
    [ShortPacket][preserves.error.ShortPacket]."""
    start = self.index
    try:
        return self.next()
    except ShortPacket:
        self.index = start
        return None
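
The save-and-restore of the read position is what makes try_next safe to call repeatedly on partial input. The sketch below shows the same pattern on a much simpler, hypothetical reader that splits newline-terminated lines; ToyLineReader and ShortInput are illustrative names, not part of the library.

```python
class ShortInput(Exception):
    """Raised by the toy reader when the buffer ends mid-item."""


class ToyLineReader:
    def __init__(self, data=""):
        self.buffer = data
        self.index = 0

    def extend(self, data):
        # Drop what has been consumed, keep the rest, then append new input.
        self.buffer = self.buffer[self.index:] + data
        self.index = 0

    def next(self):
        end = self.buffer.find("\n", self.index)
        if end == -1:
            # Mimic a decoder that has already advanced past partial input
            # before discovering that the item is incomplete.
            self.index = len(self.buffer)
            raise ShortInput()
        line = self.buffer[self.index:end]
        self.index = end + 1
        return line

    def try_next(self):
        start = self.index
        try:
            return self.next()
        except ShortInput:
            # Roll back so the partial item is re-read after extend().
            self.index = start
            return None


reader = ToyLineReader("one\ntw")
first = reader.try_next()   # 'one'
second = reader.try_next()  # None, and the partial 'tw' is kept
reader.extend("o\n")
third = reader.try_next()   # 'two'
print(first, second, third)
```

Without the rollback, the partially read item would be trimmed away by the following extend call and lost.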

`Embedded(embeddedValue)`

Representation of a Preserves Embedded value. For more on the meaning and use of embedded values, see the specification.

>>> import io
>>> e = Embedded(io.StringIO('some text'))
>>> e                                        # doctest: +ELLIPSIS
#:<_io.StringIO object at ...>
>>> e.embeddedValue                          # doctest: +ELLIPSIS
<_io.StringIO object at ...>

>>> import preserves
>>> print(preserves.stringify(Embedded(None)))
Traceback (most recent call last):
  ...
TypeError: Cannot preserves-format: None
>>> print(preserves.stringify(Embedded(None), format_embedded=lambda x: 'abcdef'))
#:"abcdef"

Attributes:

Name	Type	Description
`embeddedValue`		any Python value; could be a platform object, could be a representation of a Preserves `Value`, could be `None`, could be anything!

Source code in preserves/values.py

def __init__(self, embeddedValue):
    self.embeddedValue = embeddedValue

`EncodeError`

Bases: ValueError

Raised whenever preserves.binary.Encoder or preserves.text.Formatter are unable to proceed.

`Encoder(encode_embedded=lambda x: x, canonicalize=False, include_annotations=None)`

Bases: BinaryCodec

Implementation of an encoder for the machine-oriented binary Preserves syntax.

>>> e = Encoder()
>>> e.append(123)
>>> e.append('hello')
>>> e.append(annotate([], Symbol('x')))
>>> e.contents()
b'\xb0\x01{\xb1\x05hello\x85\xb3\x01x\xb5\x84'

Parameters:

Name	Type	Description	Default
`encode_embedded`		function accepting an Embedded.embeddedValue and returning a `Value` for serialization.	`lambda x: x`
`canonicalize`	`bool`	if `True`, ensures the serialized data are in canonical form. This is slightly more work than producing potentially-non-canonical output.	`False`
`include_annotations`	`bool \| None`	if `None`, includes annotations in the output only when `canonicalize` is `False`, because canonical serialization of values demands omission of annotations. If explicitly `True` or `False`, however, annotations will be included resp. excluded no matter the `canonicalize` setting. This can be used to get canonical ordering (`canonicalize=True`) and annotations (`include_annotations=True`).	`None`

Attributes:

Name	Type	Description
`buffer`	`bytearray`	accumulator for the output of the encoder

Source code in preserves/binary.py

def __init__(self,
             encode_embedded=lambda x: x,
             canonicalize=False,
             include_annotations=None):
    super(Encoder, self).__init__()
    self.buffer = bytearray()
    self._encode_embedded = encode_embedded
    self._canonicalize = canonicalize
    if include_annotations is None:
        self.include_annotations = not self._canonicalize
    else:
        self.include_annotations = include_annotations
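
The interaction between canonicalize and include_annotations is easiest to read as a small truth table. The helper below restates the constructor logic shown above as a stand-alone function; the name resolve_include_annotations is hypothetical and exists only for this illustration.

```python
def resolve_include_annotations(canonicalize=False, include_annotations=None):
    """Return the effective include_annotations setting."""
    if include_annotations is None:
        # Default: annotations are kept unless canonical output is requested.
        return not canonicalize
    # An explicit True or False always wins over the canonicalize setting.
    return include_annotations


for canonicalize in (False, True):
    for requested in (None, True, False):
        effective = resolve_include_annotations(canonicalize, requested)
        print(f"canonicalize={canonicalize!s:5} requested={requested!s:5} -> {effective}")
```

The one combination that is easy to overlook is canonicalize=True with include_annotations=True, which keeps annotations while still requesting canonical ordering.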

`append(v)`

Extend self.buffer with an encoding of v.

Source code in preserves/binary.py

def append(self, v):
    """Extend `self.buffer` with an encoding of `v`."""
    v = preserve(v)
    if hasattr(v, '__preserve_write_binary__'):
        v.__preserve_write_binary__(self)
    elif v is False:
        self.buffer.append(0x80)
    elif v is True:
        self.buffer.append(0x81)
    elif isinstance(v, float):
        self.buffer.append(0x87)
        self.buffer.append(8)
        self.buffer.extend(struct.pack('>d', v))
    elif isinstance(v, numbers.Number):
        self.encodeint(v)
    elif isinstance(v, bytes):
        self.encodebytes(0xb2, v)
    elif isinstance(v, basestring_):
        self.encodebytes(0xb1, v.encode('utf-8'))
    elif isinstance(v, list):
        self.encodevalues(0xb5, v)
    elif isinstance(v, tuple):
        self.encodevalues(0xb5, v)
    elif isinstance(v, set):
        self.encodeset(v)
    elif isinstance(v, frozenset):
        self.encodeset(v)
    elif isinstance(v, dict):
        self.encodedict(v)
    else:
        try:
            i = iter(v)
        except TypeError:
            i = None
        if i is None:
            self.cannot_encode(v)
        else:
            self.encodevalues(0xb5, i)
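
As a concrete instance of one branch, the float case writes the tag 0x87, a size byte of 8, and then the big-endian IEEE 754 double. The sketch below reproduces just that branch together with the matching decoder branch; encode_double and decode_double are hypothetical helper names, not library functions.

```python
import struct


def encode_double(v):
    # Tag 0x87, size byte 8, then the 8-byte big-endian double.
    return bytes([0x87, 8]) + struct.pack(">d", v)


def decode_double(packet):
    if packet[0] != 0x87:
        raise ValueError("not a double: " + hex(packet[0]))
    if packet[1] != 8:
        raise ValueError("Invalid IEEE754 size")
    return struct.unpack(">d", packet[2:10])[0]


encoded = encode_double(1.5)
print(encoded.hex())           # 87083ff8000000000000
print(decode_double(encoded))  # 1.5
```

Note that the bool checks in append come before the numeric checks: in Python, bool is a subclass of int, so testing for numbers first would encode True and False as integers.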

`contents()`

Returns a bytes constructed from the contents of self.buffer.

Source code in preserves/binary.py

def contents(self):
    """Returns a `bytes` constructed from the contents of `self.buffer`."""
    return bytes(self.buffer)

`reset()`

Clears self.buffer to a fresh empty bytearray.

Source code in preserves/binary.py

def reset(self):
    """Clears `self.buffer` to a fresh empty `bytearray`."""
    self.buffer = bytearray()

`Formatter(format_embedded=lambda x: x, indent=None, with_commas=False, trailing_comma=False, include_annotations=True)`

Bases: TextCodec

Printer (and indenting pretty-printer) for producing human-readable syntax from Preserves Values.

>>> f = Formatter()
>>> f.append({'a': 1, 'b': 2})
>>> f.append(Record(Symbol('label'), ['field1', ['field2item1', 'field2item2']]))
>>> print(f.contents())
{"a": 1 "b": 2} <label "field1" ["field2item1" "field2item2"]>

>>> f = Formatter(indent=4)
>>> f.append({'a': 1, 'b': 2})
>>> f.append(Record(Symbol('label'), ['field1', ['field2item1', 'field2item2']]))
>>> print(f.contents())
{
    "a": 1
    "b": 2
}
<label "field1" [
    "field2item1"
    "field2item2"
]>

Parameters:

Name	Type	Description	Default
`format_embedded`		function accepting an Embedded.embeddedValue and returning a `Value` for serialization.	`lambda x: x`
`indent`	`int \| None`	`None` disables indented pretty-printing; otherwise, an `int` specifies indentation per nesting-level.	`None`
`with_commas`	`bool`	`True` causes commas to separate sequence and set items and dictionary entries; `False` omits commas.	`False`
`trailing_comma`	`bool`	`True` causes a comma to be printed after the final item or entry in a sequence, set or dictionary; `False` omits this trailing comma	`False`
`include_annotations`	`bool`	`True` causes annotations to be included in the output; `False` causes them to be omitted.	`True`

Attributes:

Name	Type	Description
`indent_delta`	`int`	indentation per nesting-level
`chunks`	`list[str]`	fragments of output

Source code in preserves/text.py

def __init__(self,
             format_embedded=lambda x: x,
             indent=None,
             with_commas=False,
             trailing_comma=False,
             include_annotations=True):
    super(Formatter, self).__init__()
    self.indent_delta = 0 if indent is None else indent
    self.indent_distance = 0
    self.nesting = 0
    self.with_commas = with_commas
    self.trailing_comma = trailing_comma
    self.chunks = []
    self._format_embedded = format_embedded
    self.include_annotations = include_annotations

`append(v)`

Extend self.chunks with at least one chunk, together making up the text representation of v.

Source code in preserves/text.py

def append(self, v):
    """Extend `self.chunks` with at least one chunk, together making up the text
    representation of `v`."""
    if self.chunks and self.nesting == 0:
        self.write_indent_space()
    try:
        self.nesting += 1
        self._append(v)
    finally:
        self.nesting -= 1

`contents()`

Returns a str constructed from the join of the chunks in self.chunks.

Source code in preserves/text.py

def contents(self):
    """Returns a `str` constructed from the join of the chunks in `self.chunks`."""
    return u''.join(self.chunks)

`is_indenting()`

Returns True iff this Formatter is in pretty-printing indenting mode.

Source code in preserves/text.py

def is_indenting(self):
    """Returns `True` iff this [Formatter][preserves.text.Formatter] is in pretty-printing
    indenting mode."""
    return self.indent_delta > 0

`ImmutableDict(*args, **kwargs)`

Bases: dict

A subclass of Python's built-in dict that overrides methods that could mutate the dictionary, causing them to raise TypeError('Immutable') if called.

Implements the __hash__ method, allowing ImmutableDict instances to be used wherever immutable data are permitted; in particular, as keys in other dictionaries.

>>> d = ImmutableDict([('a', 1), ('b', 2)])
>>> d
{'a': 1, 'b': 2}
>>> d['c'] = 3
Traceback (most recent call last):
  ...
TypeError: Immutable
>>> del d['b']
Traceback (most recent call last):
  ...
TypeError: Immutable

Source code in preserves/values.py

def __init__(self, *args, **kwargs):
    if hasattr(self, '__hash'): raise TypeError('Immutable')
    super(ImmutableDict, self).__init__(*args, **kwargs)
    self.__hash = None
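
The practical payoff of a hashable dictionary is that it can itself be a dictionary key or a set member. The sketch below shows the idea with a hypothetical ToyFrozenDict; it blocks only item assignment and deletion and hashes its items as a frozenset, which is an assumption made for illustration and not necessarily how ImmutableDict computes its hash.

```python
class ToyFrozenDict(dict):
    """Hypothetical minimal immutable, hashable dict."""

    def __setitem__(self, key, value):
        raise TypeError("Immutable")

    def __delitem__(self, key):
        raise TypeError("Immutable")

    def __hash__(self):
        # Order-independent hash over the (key, value) pairs; this requires
        # the values to be hashable too.
        return hash(frozenset(self.items()))


point = ToyFrozenDict([("x", 1), ("y", 2)])
labels = {point: "origin-ish"}

# An equal dictionary built in a different order finds the same entry.
lookup = labels[ToyFrozenDict([("y", 2), ("x", 1)])]
print(lookup)  # origin-ish
```

A complete version would also need to block the other mutating methods, such as update, pop, and clear.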

`from_kvs(kvs)` `staticmethod`

Constructs an ImmutableDict from a sequence of alternating keys and values; compare to the ImmutableDict constructor, which takes a sequence of key-value pairs.

>>> ImmutableDict.from_kvs(['a', 1, 'b', 2])
{'a': 1, 'b': 2}
>>> ImmutableDict.from_kvs(['a', 1, 'b', 2])['c'] = 3
Traceback (most recent call last):
  ...
TypeError: Immutable

Source code in preserves/values.py

@staticmethod
def from_kvs(kvs):
    """Constructs an [ImmutableDict][preserves.values.ImmutableDict] from a sequence of
    alternating keys and values; compare to the
    [ImmutableDict][preserves.values.ImmutableDict] constructor, which takes a sequence of
    key-value pairs.

    ```python
    >>> ImmutableDict.from_kvs(['a', 1, 'b', 2])
    {'a': 1, 'b': 2}
    >>> ImmutableDict.from_kvs(['a', 1, 'b', 2])['c'] = 3
    Traceback (most recent call last):
      ...
    TypeError: Immutable

    ```

    """

    i = iter(kvs)
    result = ImmutableDict()
    result_proxy = super(ImmutableDict, result)
    try:
        while True:
            k = next(i)
            try:
                v = next(i)
            except StopIteration:
                raise DecodeError("Missing dictionary value")
            if k in result:
                raise DecodeError("Duplicate key: " + repr(k))
            result_proxy.__setitem__(k, v)
    except StopIteration:
        pass
    return result
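
The two failure modes of the alternating key/value layout, an odd number of items and a repeated key, are worth seeing in isolation. The function below is a simplified sketch that builds a plain dict and raises ValueError; the name pairs_from_kvs is hypothetical, and the library version returns an ImmutableDict and raises DecodeError instead.

```python
def pairs_from_kvs(kvs):
    """Build a dict from a flat sequence of alternating keys and values."""
    result = {}
    items = iter(kvs)
    for key in items:
        try:
            # Pull the value that must follow each key.
            value = next(items)
        except StopIteration:
            raise ValueError("Missing dictionary value") from None
        if key in result:
            raise ValueError("Duplicate key: " + repr(key))
        result[key] = value
    return result


pairs = pairs_from_kvs(["a", 1, "b", 2])
print(pairs)  # {'a': 1, 'b': 2}

for bad in (["a", 1, "b"], ["a", 1, "a", 2]):
    try:
        pairs_from_kvs(bad)
    except ValueError as exc:
        print("rejected:", exc)
```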

`Parser(input_buffer='', include_annotations=False, parse_embedded=lambda x: x)`

Bases: TextCodec

Parser for the human-readable Preserves text syntax.

Parameters:

Name	Type	Description	Default
`input_buffer`	`str`	initial contents of the input buffer; may subsequently be extended by calling extend.	`''`
`include_annotations`	`bool`	if `True`, wrap each value and subvalue in an Annotated object.	`False`
`parse_embedded`		function accepting a `Value` and returning a possibly-decoded form of that value suitable for placing into an Embedded object.	`lambda x: x`

Normal usage is to supply input text, and keep calling next until a ShortPacket exception is raised:

>>> d = Parser('123 "hello" @x []')
>>> d.next()
123
>>> d.next()
'hello'
>>> d.next()
()
>>> d.next()
Traceback (most recent call last):
  ...
preserves.error.ShortPacket: Short input buffer

Alternatively, keep calling try_next until it yields None, which is not in the domain of Preserves Values:

>>> d = Parser('123 "hello" @x []')
>>> d.try_next()
123
>>> d.try_next()
'hello'
>>> d.try_next()
()
>>> d.try_next()

For convenience, Parser implements the iterator interface, backing it with try_next, so you can simply iterate over all complete values in an input:

>>> d = Parser('123 "hello" @x []')
>>> list(d)
[123, 'hello', ()]

>>> for v in Parser('123 "hello" @x []'):
...     print(repr(v))
123
'hello'
()

Supply include_annotations=True to read annotations alongside the annotated values:

>>> d = Parser('123 "hello" @x []', include_annotations=True)
>>> list(d)
[123, 'hello', @#x ()]

If you are incrementally reading from, say, a socket, you can use extend to add new input as it becomes available:

>>> d = Parser('123 "he')
>>> d.try_next()
123
>>> d.try_next() # returns None because the input is incomplete
>>> d.extend('llo"')
>>> d.try_next()
'hello'
>>> d.try_next()

Attributes:

Name	Type	Description
`input_buffer`	`str`	buffered input waiting to be processed
`index`	`int`	read position within `input_buffer`

Source code in preserves/text.py

def __init__(self, input_buffer=u'', include_annotations=False, parse_embedded=lambda x: x):
    super(Parser, self).__init__()
    self.input_buffer = input_buffer
    self.index = 0
    self.include_annotations = include_annotations
    self.parse_embedded = parse_embedded

`extend(text)`

Appends text to the remaining contents of self.input_buffer, trimming already-processed text from the front of self.input_buffer and resetting self.index to zero.

Source code in preserves/text.py

def extend(self, text):
    """Appends `text` to the remaining contents of `self.input_buffer`, trimming already-processed
    text from the front of `self.input_buffer` and resetting `self.index` to zero."""
    self.input_buffer = self.input_buffer[self.index:] + text
    self.index = 0

`next()`

Reads the next complete Value from the internal buffer, raising ShortPacket if too few bytes are available, or DecodeError if the input is invalid somehow.

Source code in preserves/text.py

def next(self):
    """Reads the next complete `Value` from the internal buffer, raising
    [ShortPacket][preserves.error.ShortPacket] if too few bytes are available, or
    [DecodeError][preserves.error.DecodeError] if the input is invalid somehow.

    """
    self.skip_whitespace()
    c = self.peek()
    if c == '"':
        self.skip()
        return self.wrap(self.read_string('"'))
    if c == "'":
        self.skip()
        return self.wrap(Symbol(self.read_string("'")))
    if c == '@':
        self.skip()
        return self.unshift_annotation(self.next(), self.next())
    if c == ';':
        raise DecodeError('Semicolon is reserved syntax')
    if c == ':':
        raise DecodeError('Unexpected key/value separator between items')
    if c == '#':
        self.skip()
        c = self.nextchar()
        if c in ' \t': return self.unshift_annotation(self.comment_line(), self.next())
        if c in '\n\r': return self.unshift_annotation('', self.next())
        if c == '!':
            return self.unshift_annotation(
                Record(Symbol('interpreter'), [self.comment_line()]),
                self.next())
        if c == 'f': self.require_delimiter('#f'); return self.wrap(False)
        if c == 't': self.require_delimiter('#t'); return self.wrap(True)
        if c == '{': return self.wrap(self.read_set())
        if c == '"': return self.wrap(self.read_literal_binary())
        if c == 'x':
            c = self.nextchar()
            if c == '"': return self.wrap(self.read_hex_binary())
            if c == 'd': return self.wrap(self.read_hex_float())
            raise DecodeError('Invalid #x syntax')
        if c == '[': return self.wrap(self.read_base64_binary())
        if c == ':':
            if self.parse_embedded is None:
                raise DecodeError('No parse_embedded function supplied')
            return self.wrap(Embedded(self.parse_embedded(self.next())))
        raise DecodeError('Invalid # syntax')
    if c == '<':
        self.skip()
        vs = self.upto('>', False)
        if len(vs) == 0:
            raise DecodeError('Missing record label')
        return self.wrap(Record(vs[0], vs[1:]))
    if c == '[':
        self.skip()
        return self.wrap(self.upto(']', True))
    if c == '{':
        self.skip()
        return self.wrap(self.read_dictionary())
    if c in '>]},':
        raise DecodeError('Unexpected ' + c)
    self.skip()
    return self.wrap(self.read_raw_symbol_or_number([c]))

`try_next()`

Like next, but returns None instead of raising ShortPacket.

Source code in preserves/text.py

def try_next(self):
    """Like [next][preserves.text.Parser.next], but returns `None` instead of raising
    [ShortPacket][preserves.error.ShortPacket]."""
    start = self.index
    try:
        return self.next()
    except ShortPacket:
        self.index = start
        return None

`Record(key, fields)`

Bases: object

Representation of Preserves Records, which are a pair of a label Value and a sequence of field Values.

>>> r = Record(Symbol('label'), ['field1', ['field2item1', 'field2item2']])
>>> r
#label('field1', ['field2item1', 'field2item2'])
>>> r.key
#label
>>> r.fields
('field1', ['field2item1', 'field2item2'])
>>> import preserves
>>> preserves.stringify(r)
'<label "field1" ["field2item1" "field2item2"]>'
>>> r == preserves.parse('<label "field1" ["field2item1" "field2item2"]>')
True

Parameters:

Name	Type	Description	Default
`key`	`Value`	the `Record`'s label	required
`fields`	`iterable[Value]`	the fields of the `Record`	required

Attributes:

Name	Type	Description
`key`	`Value`	the `Record`'s label
`fields`	`tuple[Value]`	the fields of the `Record`

Source code in preserves/values.py

def __init__(self, key, fields):
    self.key = key
    self.fields = tuple(fields)
    self.__hash = None

`makeBasicConstructor(label, fieldNames)` `staticmethod`

Constructs and returns a "constructor" for Records having a certain label and number of fields.

Deprecated

Use preserves.schema definitions instead.

The "constructor" is a callable function that accepts len(fieldNames) arguments and returns a Record with label as its label and the arguments to the constructor as field values.

In addition, the "constructor" has a constructorInfo attribute holding a RecordConstructorInfo object, an isClassOf attribute holding a unary function that returns True iff its argument is a Record with label label and arity len(fieldNames), and an ensureClassOf attribute that raises an Exception if isClassOf returns false on its argument and returns the argument otherwise.

Finally, for each field name f in fieldNames, the "constructor" object has an attribute _f that is a unary function that retrieves the f field from the passed in argument.

>>> c = Record.makeBasicConstructor(Symbol('date'), 'year month day')
>>> c(1969, 7, 16)
#date(1969, 7, 16)
>>> c.constructorInfo
#date/3
>>> c.isClassOf(c(1969, 7, 16))
True
>>> c.isClassOf(Record(Symbol('date'), [1969, 7, 16]))
True
>>> c.isClassOf(Record(Symbol('date'), [1969]))
False
>>> c.ensureClassOf(c(1969, 7, 16))
#date(1969, 7, 16)
>>> c.ensureClassOf(Record(Symbol('date'), [1969]))
Traceback (most recent call last):
  ...
TypeError: Record: expected #date/3, got #date(1969)
>>> c._year(c(1969, 7, 16))
1969
>>> c._month(c(1969, 7, 16))
7
>>> c._day(c(1969, 7, 16))
16

Parameters:

Name	Type	Description	Default
`label`	`Value`	Label to use for constructed/matched `Record`s	required
`fieldNames`	`tuple[str] \| list[str] \| str`	Names of the `Record`'s fields	required

Source code in preserves/values.py

@staticmethod
def makeBasicConstructor(label, fieldNames):
    """Constructs and returns a "constructor" for `Record`s having a certain `label` and
    number of fields.

    Deprecated:
       Use [preserves.schema][] definitions instead.

    The "constructor" is a callable function that accepts `len(fields)` arguments and
    returns a [Record][preserves.values.Record] with `label` as its label and the arguments
    to the constructor as field values.

    In addition, the "constructor" has a `constructorInfo` attribute holding a
    [RecordConstructorInfo][preserves.values.RecordConstructorInfo] object, an `isClassOf`
    attribute holding a unary function that returns `True` iff its argument is a
    [Record][preserves.values.Record] with label `label` and arity `len(fieldNames)`, and
    an `ensureClassOf` attribute that raises a `TypeError` if `isClassOf` returns false on
    its argument and returns the argument otherwise.

    Finally, for each field name `f` in `fieldNames`, the "constructor" object has an
    attribute `_f` that is a unary function that retrieves the `f` field from the passed in
    argument.

    ```python
    >>> c = Record.makeBasicConstructor(Symbol('date'), 'year month day')
    >>> c(1969, 7, 16)
    #date(1969, 7, 16)
    >>> c.constructorInfo
    #date/3
    >>> c.isClassOf(c(1969, 7, 16))
    True
    >>> c.isClassOf(Record(Symbol('date'), [1969, 7, 16]))
    True
    >>> c.isClassOf(Record(Symbol('date'), [1969]))
    False
    >>> c.ensureClassOf(c(1969, 7, 16))
    #date(1969, 7, 16)
    >>> c.ensureClassOf(Record(Symbol('date'), [1969]))
    Traceback (most recent call last):
      ...
    TypeError: Record: expected #date/3, got #date(1969)
    >>> c._year(c(1969, 7, 16))
    1969
    >>> c._month(c(1969, 7, 16))
    7
    >>> c._day(c(1969, 7, 16))
    16

    ```

    Args:
        label (Value): Label to use for constructed/matched `Record`s
        fieldNames (tuple[str] | list[str] | str): Names of the `Record`'s fields

    """
    if type(fieldNames) == str:
        fieldNames = fieldNames.split()
    arity = len(fieldNames)
    def ctor(*fields):
        if len(fields) != arity:
            raise Exception("Record: cannot instantiate %r expecting %d fields with %d fields"%(
                label,
                arity,
                len(fields)))
        return Record(label, fields)
    ctor.constructorInfo = RecordConstructorInfo(label, arity)
    ctor.isClassOf = lambda v: \
                     isinstance(v, Record) and v.key == label and len(v.fields) == arity
    def ensureClassOf(v):
        if not ctor.isClassOf(v):
            raise TypeError("Record: expected %r/%d, got %r" % (label, arity, v))
        return v
    ctor.ensureClassOf = ensureClassOf
    for fieldIndex in range(len(fieldNames)):
        fieldName = fieldNames[fieldIndex]
        # Stupid python scoping bites again
        def getter(fieldIndex):
            return lambda v: ensureClassOf(v)[fieldIndex]
        setattr(ctor, '_' + fieldName, getter(fieldIndex))
    return ctor
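
The `getter(fieldIndex)` helper in the listing above works around Python's late-binding closures: a lambda created directly in the loop would look up `fieldIndex` only when called, by which time the loop has finished. The following standalone sketch illustrates the pitfall and the factory-function fix. It does not use `preserves`, and the names `naive`, `fixed`, and `make_getter` are hypothetical.

```python
fields = ('year', 'month', 'day')

# Naive: every lambda closes over the same variable `i`, which holds
# its final value (2) by the time any of the lambdas is called.
naive = []
for i in range(len(fields)):
    naive.append(lambda v: v[i])

# Fix: a factory function gives each lambda its own binding of `index`.
def make_getter(index):
    return lambda v: v[index]

fixed = [make_getter(i) for i in range(len(fields))]

record = (1969, 7, 16)
naive_results = [g(record) for g in naive]
fixed_results = [g(record) for g in fixed]
print(naive_results)  # [16, 16, 16]
print(fixed_results)  # [1969, 7, 16]
```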

`makeConstructor(labelSymbolText, fieldNames)` `staticmethod`

Equivalent to Record.makeBasicConstructor(Symbol(labelSymbolText), fieldNames).

Deprecated

Use preserves.schema definitions instead.

Source code in preserves/values.py

@staticmethod
def makeConstructor(labelSymbolText, fieldNames):
    """
    Equivalent to `Record.makeBasicConstructor(Symbol(labelSymbolText), fieldNames)`.

    Deprecated:
       Use [preserves.schema][] definitions instead.
    """
    return Record.makeBasicConstructor(Symbol(labelSymbolText), fieldNames)

`ShortPacket`

Bases: DecodeError

Raised whenever preserves.binary.Decoder or preserves.text.Parser needs to read beyond the end of the currently available input buffer in order to completely read an encoded value.

`Symbol(name)`

Bases: object

Representation of Preserves Symbols.

>>> Symbol('xyz')
#xyz
>>> Symbol('xyz').name
'xyz'
>>> repr(Symbol('xyz'))
'#xyz'
>>> str(Symbol('xyz'))
'xyz'
>>> import preserves
>>> preserves.stringify(Symbol('xyz'))
'xyz'
>>> preserves.stringify(Symbol('hello world'))
"'hello world'"
>>> preserves.parse('xyz')
#xyz
>>> preserves.parse("'hello world'")
#hello world

Attributes:

Name	Type	Description
`name`	`str \| Symbol`	The symbol's text label. If an existing Symbol is passed in, the existing Symbol's `name` is used as the `name` for the new Symbol.

Source code in preserves/values.py

def __init__(self, name):
    self.name = name.name if isinstance(name, Symbol) else name

`annotate(v, *anns)`

Wraps v in an Annotated object, if it isn't already wrapped, and appends each of the anns to the Annotated's annotations sequence. NOTE: Does not recursively ensure that any parts of the argument v are themselves wrapped in Annotated objects!

>>> import preserves
>>> print(preserves.stringify(annotate(123, "A comment", "Another comment")))
@"A comment" @"Another comment" 123

Source code in preserves/values.py

def annotate(v, *anns):
    """Wraps `v` in an [Annotated][preserves.values.Annotated] object, if it isn't already
    wrapped, and appends each of the `anns` to the [Annotated][preserves.values.Annotated]'s
    `annotations` sequence. NOTE: Does not recursively ensure that any parts of the argument
    `v` are themselves wrapped in [Annotated][preserves.values.Annotated] objects!

    ```python
    >>> import preserves
    >>> print(preserves.stringify(annotate(123, "A comment", "Another comment")))
    @"A comment" @"Another comment" 123

    ```
    """
    if not is_annotated(v):
        v = Annotated(v)
    for a in anns:
        v.annotations.append(a)
    return v
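
One consequence of the logic above is that annotating an already-wrapped value appends to the existing wrapper in place rather than nesting a second wrapper. The sketch below mirrors that logic with a hypothetical stand-in class, `Wrapped`, so that it runs without the library; it is an illustration under that assumption, not the library's `Annotated` class.

```python
class Wrapped:
    """Hypothetical stand-in for Annotated: an item plus its annotations."""
    def __init__(self, item):
        self.item = item
        self.annotations = []

def annotate(v, *anns):
    # Same steps as the listing above, using the stand-in class.
    if not isinstance(v, Wrapped):
        v = Wrapped(v)
    for a in anns:
        v.annotations.append(a)
    return v

first = annotate(123, "A comment")
second = annotate(first, "Another comment")

print(second is first)    # True: the existing wrapper is reused and mutated
print(first.annotations)  # ['A comment', 'Another comment']
print(first.item)         # 123
```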

`canonicalize(v, **kwargs)`

As encode, but sets canonicalize=True in the Encoder constructor.

Source code in preserves/binary.py

def canonicalize(v, **kwargs):
    """As [encode][preserves.binary.encode], but sets `canonicalize=True` in the
    [Encoder][preserves.binary.Encoder] constructor.

    """
    return encode(v, canonicalize=True, **kwargs)

`cmp(a, b)`

Returns -1 if a < b, or 0 if a = b, or 1 if a > b according to the Preserves total order.

Source code in preserves/compare.py

def cmp(a, b):
    """Returns `-1` if `a` < `b`, or `0` if `a` = `b`, or `1` if `a` > `b` according to the
    [Preserves total order](https://preserves.dev/preserves.html#total-order)."""
    return _cmp(preserve(a), preserve(b))
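
A three-way comparison function with this return convention can be adapted for sorting via `functools.cmp_to_key`. The sketch below uses a hypothetical stand-in comparator, `three_way`, over plain Python integers. It only illustrates the -1/0/1 convention and is not the Preserves total order. Assuming the library is installed, `preserves.compare.cmp` could presumably be passed to `cmp_to_key` in the same way.

```python
from functools import cmp_to_key

def three_way(a, b):
    """Stand-in comparator: returns -1, 0, or 1, like a cmp-style function."""
    return (a > b) - (a < b)

values = [3, 1, 2, 1]
# cmp_to_key turns the two-argument comparator into a sort key.
ordered = sorted(values, key=cmp_to_key(three_way))
print(ordered)  # [1, 1, 2, 3]
```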

`decode(bs, **kwargs)`

Returns the first complete encoded value from bs, passing kwargs through to the Decoder constructor. Raises exceptions as per next.

Parameters:

Name	Type	Description	Default
`bs`	`bytes`	encoded data to decode	required

Source code in preserves/binary.py

def decode(bs, **kwargs):
    """Yields the first complete encoded value from `bs`, passing `kwargs` through to the
    [Decoder][preserves.binary.Decoder] constructor. Raises exceptions as per
    [next][preserves.binary.Decoder.next].

    Args:
        bs (bytes): encoded data to decode

    """
    return Decoder(packet=bs, **kwargs).next()

`decode_with_annotations(bs, **kwargs)`

Like decode, but supplying include_annotations=True to the Decoder constructor.

Source code in preserves/binary.py

def decode_with_annotations(bs, **kwargs):
    """Like [decode][preserves.binary.decode], but supplying `include_annotations=True` to the
    [Decoder][preserves.binary.Decoder] constructor."""
    return Decoder(packet=bs, include_annotations=True, **kwargs).next()

`encode(v, **kwargs)`

Encode a single Value v to a byte string. Any supplied kwargs are passed on to the underlying Encoder constructor.

Source code in preserves/binary.py

def encode(v, **kwargs):
    """Encode a single `Value` `v` to a byte string. Any supplied `kwargs` are passed on to the
    underlying [Encoder][preserves.binary.Encoder] constructor."""
    e = Encoder(**kwargs)
    e.append(v)
    return e.contents()

`exec(self, v)`

WARNING: This is not a function: it is a method on Selector, Predicate, and so on.

>>> sel = parse('/ [.length gt 1]')
>>> sel.exec(['', 'a', 'ab', 'abc', 'abcd', 'bcd', 'cd', 'd', ''])
('ab', 'abc', 'abcd', 'bcd', 'cd')

Source code in preserves/path.py

@extend(syntax.Function)
def exec(self, v):
    """WARNING: This is not a *function*: it is a *method* on
    [Selector][preserves.path.Selector], [Predicate][preserves.path.Predicate], and so on.

    ```python
    >>> sel = parse('/ [.length gt 1]')
    >>> sel.exec(['', 'a', 'ab', 'abc', 'abcd', 'bcd', 'cd', 'd', ''])
    ('ab', 'abc', 'abcd', 'bcd', 'cd')

    ```

    """
    return (len(self.selector.exec(v)),)

`is_annotated(v)`

True iff v is an instance of Annotated.

Source code in preserves/values.py

def is_annotated(v):
    """`True` iff `v` is an instance of [Annotated][preserves.values.Annotated]."""
    return isinstance(v, Annotated)

`parse(s)`

Parse s as a Preserves Path path expression, yielding a Selector object. Selectors (and Predicates etc.) have an exec method defined on them.

Raises ValueError if s is not a valid path expression.

Source code in preserves/path.py

def parse(s):
    """Parse `s` as a Preserves Path path expression, yielding a
    [Selector][preserves.path.Selector] object. Selectors (and Predicates etc.) have an
    [exec][preserves.path.exec] method defined on them.

    Raises `ValueError` if `s` is not a valid path expression.

    """
    return parse_selector(Parser(s))

`parse_with_annotations(bs, **kwargs)`

Like parse, but supplying include_annotations=True to the Parser constructor.

Source code in preserves/text.py

def parse_with_annotations(bs, **kwargs):
    """Like [parse][preserves.text.parse], but supplying `include_annotations=True` to the
    [Parser][preserves.text.Parser] constructor."""
    return Parser(input_buffer=bs, include_annotations=True, **kwargs).next()

`preserve(v)`

Converts v to a representation of a Preserves Value by (repeatedly) setting

v = v.__preserve__()

while v has a __preserve__ method. Parsed Schema values are able to render themselves to their serialized representations this way.

Source code in preserves/values.py

def preserve(v):
    """Converts `v` to a representation of a Preserves `Value` by (repeatedly) setting

    ```python
    v = v.__preserve__()
    ```

    while `v` has a `__preserve__` method. Parsed [Schema][preserves.schema]
    values are able to render themselves to their serialized representations this way.

    """
    while hasattr(v, '__preserve__'):
        v = v.__preserve__()
    return v
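
The loop is easiest to see with an object whose `__preserve__` method returns another object that also defines `__preserve__`. The sketch below copies the `preserve` function from the listing above so that it runs without the library; the `Celsius` and `Reading` classes are hypothetical.

```python
def preserve(v):
    # Same loop as in the listing above, reproduced so the sketch is standalone.
    while hasattr(v, '__preserve__'):
        v = v.__preserve__()
    return v

class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __preserve__(self):
        return self.degrees

class Reading:
    def __init__(self, celsius):
        self.celsius = celsius

    def __preserve__(self):
        # Returns another object that itself has __preserve__,
        # so preserve() goes around the loop a second time.
        return self.celsius

result = preserve(Reading(Celsius(21.5)))
print(result)        # 21.5
print(preserve(42))  # 42: values without __preserve__ pass through unchanged
```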

`stringify(v, **kwargs)`

Convert a single Value v to a string. Any supplied kwargs are passed on to the underlying Formatter constructor.

Source code in preserves/text.py

def stringify(v, **kwargs):
    """Convert a single `Value` `v` to a string. Any supplied `kwargs` are passed on to the
    underlying [Formatter][preserves.text.Formatter] constructor."""
    e = Formatter(**kwargs)
    e.append(v)
    return e.contents()

`strip_annotations(v, depth=inf)`

Exposes depth layers of raw structure of potentially-Annotated Values. If depth==0 or v is not Annotated, just returns v. Otherwise, descends recursively into the structure of v.item.

>>> import preserves
>>> a = preserves.parse('@"A comment" [@a 1 @b 2 @c 3]', include_annotations=True)
>>> is_annotated(a)
True
>>> print(preserves.stringify(a))
@"A comment" [@a 1 @b 2 @c 3]
>>> print(preserves.stringify(strip_annotations(a)))
[1 2 3]
>>> print(preserves.stringify(strip_annotations(a, depth=1)))
[@a 1 @b 2 @c 3]

Source code in preserves/values.py

def strip_annotations(v, depth=inf):
    """Exposes `depth` layers of raw structure of
    potentially-[Annotated][preserves.values.Annotated] `Value`s. If `depth==0` or `v` is not
    [Annotated][preserves.values.Annotated], just returns `v`. Otherwise, descends recursively
    into the structure of `v.item`.

    ```python
    >>> import preserves
    >>> a = preserves.parse('@"A comment" [@a 1 @b 2 @c 3]', include_annotations=True)
    >>> is_annotated(a)
    True
    >>> print(preserves.stringify(a))
    @"A comment" [@a 1 @b 2 @c 3]
    >>> print(preserves.stringify(strip_annotations(a)))
    [1 2 3]
    >>> print(preserves.stringify(strip_annotations(a, depth=1)))
    [@a 1 @b 2 @c 3]

    ```
    """

    if depth == 0: return v
    if not is_annotated(v): return v

    next_depth = depth - 1
    def walk(v):
        return strip_annotations(v, next_depth)

    v = v.item
    if isinstance(v, Record):
        return Record(strip_annotations(v.key, depth), tuple(walk(f) for f in v.fields))
    elif isinstance(v, list):
        return tuple(walk(f) for f in v)
    elif isinstance(v, tuple):
        return tuple(walk(f) for f in v)
    elif isinstance(v, set):
        return frozenset(walk(f) for f in v)
    elif isinstance(v, frozenset):
        return frozenset(walk(f) for f in v)
    elif isinstance(v, dict):
        return ImmutableDict.from_kvs(walk(f) for f in dict_kvs(v))
    elif is_annotated(v):
        raise ValueError('Improper annotation structure')
    else:
        return v
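
The effect of `depth` can be seen in a reduced form. The sketch below is a simplification under stated assumptions: it handles only sequences (no Records, sets, or dictionaries), and `Wrapped` and `strip` are hypothetical stand-ins for `Annotated` and `strip_annotations`, so that the code runs without the library.

```python
from math import inf

class Wrapped:
    """Hypothetical stand-in for Annotated: an item plus its annotations."""
    def __init__(self, item, annotations=()):
        self.item = item
        self.annotations = list(annotations)

def strip(v, depth=inf):
    # Stop when the depth budget is used up or there is no wrapper to remove.
    if depth == 0 or not isinstance(v, Wrapped):
        return v
    item = v.item
    if isinstance(item, (list, tuple)):
        # Each child is processed with one layer less of the depth budget.
        return tuple(strip(child, depth - 1) for child in item)
    return item

doc = Wrapped([Wrapped(1, ['a']), Wrapped(2, ['b'])], ['A comment'])

fully_stripped = strip(doc)
one_layer = strip(doc, depth=1)
print(fully_stripped)                         # (1, 2)
print([type(c).__name__ for c in one_layer])  # ['Wrapped', 'Wrapped']
```

With `depth=1` only the outermost wrapper is removed, so the children keep theirs, which matches the `[@a 1 @b 2 @c 3]` result in the doctest above.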

