Preserves Path
Tony Garnock-Jones tonyg@leastfixedpoint.com
August 2021. Version 0.1.0.
Preserves Path is roughly analogous to
XPath, but for Preserves values: just as
XPath selects portions of an XML document, a Preserves Path uses path expressions to select
portions of a Value.
XPaths on XML documents can move into attributes, into text, or into children. Preserves documents don’t have attributes, but they do have children generally and keyed children in particular. You might want to move into the child with a particular key (number, for sequences, or general-value for dictionaries); into all keys; into all mapped-to-values, i.e. children (n.b. not just for sequences and dicts, but also for sets).
Selector
A sequence of steps, applied one after the other, flatmap-style.
    step ...          # Applies steps one after the other, flatmap-style
Each step transforms an input document into zero or more related documents. A step is an axis or a filter.
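The flatmap reading of a selector can be sketched in a few lines of Python. This is only an illustration, not the reference implementation: the names `run_selector`, `children` and `only_ints` are hypothetical, and plain Python lists stand in for Preserves sequences.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical model: a step maps one input value to zero or more outputs.
Step = Callable[[Any], Iterable[Any]]

def run_selector(steps: List[Step], root: Any) -> List[Any]:
    """Apply each step to every current value and concatenate the outputs."""
    current = [root]
    for step in steps:
        current = [out for value in current for out in step(value)]
    return current

def children(value: Any) -> List[Any]:
    # Axis-like step: move into the immediate children of a list.
    return list(value) if isinstance(value, list) else []

def only_ints(value: Any) -> List[Any]:
    # Filter-like step: keep the value only if it is an int (bool excluded).
    return [value] if type(value) is int else []

# Roughly "/ / int" applied to a nested list.
result = run_selector([children, children, only_ints], [[1, "a"], [2, [3]], 4])
print(result)
```

Because a step may return an empty list, any value that a step rejects simply drops out of the running selection.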
Predicates
Predicates: interpret selectors as truth-functions over inputs (nonempty output meaning truth), and compose them using and, not, or, etc.
Precedence groupings from highest to lowest. Within a grouping, no mixed precedence is permitted.
    selector          # Applies steps one after the other, flatmap-style
    ! pred            # "not" of a predicate
    pred + pred + ... # "or" of predicates
    pred & pred & ... # "and" of predicates
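As a rough sketch of the truth-function reading, the following Python treats a selector as true when it yields at least one output and builds "not", "or" and "and" on top of that. All names here (`truth`, `p_not`, `p_or`, `p_and`, `bracket`) are hypothetical. The treatment of `[!]` as "not" applied to the empty selector is one plausible reading of the grammar, not something the text above states outright.

```python
from typing import Any, Callable, Iterable

Selector = Callable[[Any], Iterable[Any]]
Pred = Callable[[Any], bool]

def truth(selector: Selector) -> Pred:
    """A selector is true for an input when it yields at least one output."""
    return lambda value: any(True for _ in selector(value))

def p_not(pred: Pred) -> Pred:
    return lambda value: not pred(value)

def p_or(*preds: Pred) -> Pred:
    return lambda value: any(p(value) for p in preds)

def p_and(*preds: Pred) -> Pred:
    return lambda value: all(p(value) for p in preds)

def bracket(pred: Pred) -> Selector:
    """The `[pred]` filter: keep the input unchanged when pred holds."""
    return lambda value: [value] if pred(value) else []

# Illustrative type filters over Python stand-ins for Preserves values.
is_int = lambda v: [v] if type(v) is int else []
is_str = lambda v: [v] if type(v) is str else []
empty_selector = lambda v: [v]  # zero steps: the input passes through

int_or_str = bracket(p_or(truth(is_int), truth(is_str)))  # [int + string]
reject_all = bracket(p_not(truth(empty_selector)))        # assumed reading of [!]

inputs = [1, "a", 2.5]
kept = [out for v in inputs for out in int_or_str(v)]
rejected = [out for v in inputs for out in reject_all(v)]
print(kept, rejected)
```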
Axes
Axes: move around, applying filters after moving
    /            # Moves into immediate children (values / fields)
    //           # Flattens children recursively
    . key        # Moves into named child
    .^           # Moves into record label
    .keys        # Moves into *keys* rather than values
    .length      # Moves into the number of keys
    .annotations # Moves into any annotations that might be present
    .embedded    # Moves into the representation of an embedded value
    % name       # Moves into successful Preserves Schema parse of definition `name`
    %- name      # Moves into successful Preserves Schema unparse of definition `name`
Sets have children, but no keys/length; Strings, ByteStrings and Symbols have no children, but have keys/length.
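The difference between children, keys and length can be made concrete with a small sketch over Python stand-ins (dict for dictionaries, list for sequences, set for sets, str and bytes for the string-like types). This is an assumption-laden illustration: records, annotations and embedded values are left out, the function names are hypothetical, and `//` is taken to include the starting value itself, following the description in the Examples section.

```python
from typing import Any, Callable, List

def children(value: Any) -> List[Any]:
    # `/`: dictionaries yield their mapped-to values; sequences and sets
    # yield their elements; strings, bytes and atoms have no children.
    if isinstance(value, dict):
        return list(value.values())
    if isinstance(value, (list, set, frozenset)):
        return list(value)
    return []

def descendants(value: Any) -> List[Any]:
    # `//`: the value itself, then everything reachable through children.
    out = [value]
    for child in children(value):
        out.extend(descendants(child))
    return out

def keys(value: Any) -> List[Any]:
    # `.keys`: dictionary keys, or indices for sequences and string-likes.
    if isinstance(value, dict):
        return list(value.keys())
    if isinstance(value, (list, str, bytes)):
        return list(range(len(value)))
    return []  # sets have children but no keys

def length(value: Any) -> List[int]:
    # `.length`: the number of keys, for values that have keys at all.
    if isinstance(value, (dict, list, str, bytes)):
        return [len(value)]
    return []

def key(k: Any) -> Callable[[Any], List[Any]]:
    # `. key`: move into the child named by k (hashable keys only here).
    def step(value: Any) -> List[Any]:
        if isinstance(value, dict):
            return [value[k]] if k in value else []
        if isinstance(value, list) and type(k) is int and 0 <= k < len(value):
            return [value[k]]
        return []
    return step

doc = {"a": [1, 2], "b": {3}}
print(children(doc), keys(doc), length(doc), key("a")(doc))
```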
Filters
Filters: narrow down a selection without moving
    *                # Accepts all
    [!]              # Rejects all (just a use of `[pred]`)
    eq literal       # Matches values (equal to/less than/greater than/etc.) the literal
    = literal
    ne literal
    != literal
    lt literal
    gt literal
    le literal
    ge literal
    re regex          # Matches strings and symbols by POSIX extended regular expression
    =r regex
    [pred]            # Applies predicate to each input; keeps inputs yielding truth
    ^ literal         # Matches a record having the literal as its label -- equivalent to [.^ = literal]
    ~real             # Promotes int to double, passes on double unchanged, rejects others
                      # Out-of-range ints (too big or too small) become various double infinities
                      # Converting high-magnitude ints causes loss of precision
    ~int              # Converts double to closest integer, where possible
                      # NaN and infinities are rejected
    bool              # Type filters
    double
    int
    string
    bytes
    symbol
    rec
    seq
    set
    dict
    embedded
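The `~real` and `~int` conversions described above might look as follows in Python, where `float` is an IEEE 754 double. This is a sketch under stated assumptions: the function names are hypothetical, `~int` is assumed to pass integers through unchanged (the text only describes doubles), and ties are resolved by Python's `round`, which rounds halves to even. The text does not specify tie handling.

```python
import math
from typing import Any, List

def to_real(value: Any) -> List[float]:
    # `~real`: promote int to double, pass double through, reject others.
    if isinstance(value, bool):
        return []  # Python's bool is a subclass of int; treat it as Boolean
    if isinstance(value, float):
        return [value]
    if isinstance(value, int):
        try:
            return [float(value)]  # may lose precision for large magnitudes
        except OverflowError:
            # Out-of-range ints become an infinity of the matching sign.
            return [math.inf if value > 0 else -math.inf]
    return []

def to_int(value: Any) -> List[int]:
    # `~int`: convert double to the closest integer where possible.
    if isinstance(value, bool):
        return []
    if isinstance(value, int):
        return [value]  # assumption: integers pass through unchanged
    if isinstance(value, float) and math.isfinite(value):
        return [round(value)]  # assumption: ties round to even
    return []  # NaN, infinities and non-numbers are rejected

print(to_real(3), to_real(10**400), to_int(2.7), to_int(math.nan))
```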
Design choice: Which regular expression dialect to choose? CDDL (RFC 8610) goes for XML Schema regular expressions, which seems like a very sensible choice. The discussion in section 3.8.3 of RFC 8610 makes some good points. A couple of things that occurred to me: (1) the dialect should be backreference-free, allowing matching by “text-directed engines”; (2) it should be very widely implemented; (3) it should cover regular languages and no more; (4) it should be easy to implement.
Design choice: How should comparison work? Should lt 1.0 accept not only 0.9 but also
#t and #f (since Boolean comes before Double in the Preserves total ordering)? Should
lt 1.0 accept 0 as well as 0.0?
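To make the alternatives concrete, here is a sketch of three candidate semantics for `lt`, restricted to Booleans, doubles and integers. None of these is claimed to be the chosen design. The function names are hypothetical, and the rank table places Double before SignedInteger in the total ordering, which is my reading of the Preserves specification rather than something stated above; check it against the specification before relying on it.

```python
from typing import Any

# Assumed kind ranks in the Preserves total ordering (subset only).
KIND_RANK = {bool: 0, float: 1, int: 2}

def lt_total_order(value: Any, literal: Any) -> bool:
    # Candidate A: compare by kind first, then by value within a kind.
    rank_v, rank_l = KIND_RANK[type(value)], KIND_RANK[type(literal)]
    if rank_v != rank_l:
        return rank_v < rank_l
    return value < literal

def lt_same_kind(value: Any, literal: Any) -> bool:
    # Candidate B: only values of the literal's own kind can match.
    return type(value) is type(literal) and value < literal

def lt_numeric(value: Any, literal: Any) -> bool:
    # Candidate C: ints and doubles compare numerically; Booleans never match.
    def numeric(x: Any) -> bool:
        return type(x) in (int, float)
    return numeric(value) and numeric(literal) and value < literal

candidates = [False, True, 0.9, 0, 0.0, 2]
for name, lt in [("total", lt_total_order), ("same-kind", lt_same_kind), ("numeric", lt_numeric)]:
    print(name, [repr(v) for v in candidates if lt(v, 1.0)])
```

Under these assumptions, only candidate A accepts `#t` and `#f`, and only candidate C accepts `0` alongside `0.0`.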
Functions
    <count selector>        # Counts number of results of selector
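One way to read `<count selector>` is as a step that runs the inner selector on each input and yields a single integer. The sketch below assumes exactly that. The name `count` and the stand-in `children` step are hypothetical.

```python
from typing import Any, Callable, Iterable, List

Selector = Callable[[Any], Iterable[Any]]

def count(selector: Selector) -> Callable[[Any], List[int]]:
    # Assumed behaviour: one output per input, the number of results.
    return lambda value: [sum(1 for _ in selector(value))]

# Stand-in for the `/` axis over Python lists.
children = lambda v: list(v) if isinstance(v, list) else []

print(count(children)([10, 20, 30]), count(children)(7))
```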
Transformers
Possible transformers include: stringifying results; sequenceifying results (see the “+” operator); setifying results (see the “/” and “&” operators); and joining stringified results with a separator.
Tool design
When processing multiple input documents sequentially, a user will sometimes want a list of results for each document, a set of results for each document, or a list flattened into a sequence of outputs for all input documents in the sequence. (A flattened set doesn’t make sense for streaming since the input documents come in a sequence; if the inputs were treated as a set represented as a sequence, and outputs were buffered in a single large set, that could work out…)
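The three output shapes can be sketched as follows. This is a hypothetical illustration, not a description of any existing tool: the function names are invented, a selector is modelled as a Python function from one document to a list of results, and results must be hashable for the set variant to work.

```python
from typing import Any, Callable, Iterable, Iterator, List, Set

Selector = Callable[[Any], Iterable[Any]]

def per_document_lists(selector: Selector, docs: Iterable[Any]) -> Iterator[List[Any]]:
    # One list of results per input document, emitted as documents arrive.
    for doc in docs:
        yield list(selector(doc))

def per_document_sets(selector: Selector, docs: Iterable[Any]) -> Iterator[Set[Any]]:
    # One set of results per input document (duplicates within a document merge).
    for doc in docs:
        yield set(selector(doc))

def flattened(selector: Selector, docs: Iterable[Any]) -> Iterator[Any]:
    # A single stream of results across all input documents.
    for doc in docs:
        yield from selector(doc)

elements = lambda doc: list(doc)  # stand-in selector: elements of a list
docs = [[1, 1, 2], [3]]
print(list(per_document_lists(elements, docs)))
print(list(per_document_sets(elements, docs)))
print(list(flattened(elements, docs)))
```

All three are generators, so each emits output as soon as a document has been processed, which suits the streaming case.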
Examples
Consider the following Preserves Path selectors, intended to run against the Preserves codec test suite document:
- 
    .annotations ^ Documentation . 0 /
  This selects each of the elements (mostly text strings) in the list of the Documentation record annotating the test suite document itself.
  First, .annotations focuses on the annotations of the document. Then, ^ Documentation selects only annotations that are records with label Documentation. Then, . 0 selects the first field in each record. Finally, / replaces each selected value with a sequence of its children.
- 
    // [.^ [= Test + = NondeterministicTest]] [. 1 rec]
  This selects every deterministic or nondeterministic test case where the expected value is a record.
  First, // recursively selects every descendant subvalue of the root (inclusive). Then, two filters are applied, one after the other. The first, [.^ [= Test + = NondeterministicTest]], selects record labels, and then filters out all but Test and NondeterministicTest. Then, the second, [. 1 rec], filters out all but those where the second field is a record.
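The second selector can be traced by hand with a small Python model. Everything here is a stand-in chosen for illustration: the `Record` class, the use of Python strings for symbols, and the tiny `suite` document are hypothetical and much simpler than the real test suite document.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass(frozen=True)
class Record:
    label: Any
    fields: Tuple[Any, ...]

def children(value: Any) -> List[Any]:
    if isinstance(value, Record):
        return list(value.fields)
    if isinstance(value, dict):
        return list(value.values())
    if isinstance(value, (list, tuple)):
        return list(value)
    return []

def descendants(value: Any) -> List[Any]:
    # `//`: the value itself plus everything nested inside it.
    out = [value]
    for child in children(value):
        out.extend(descendants(child))
    return out

def label_is_test(value: Any) -> bool:
    # [.^ [= Test + = NondeterministicTest]]
    return isinstance(value, Record) and value.label in ("Test", "NondeterministicTest")

def second_field_is_record(value: Any) -> bool:
    # [. 1 rec]
    return (isinstance(value, Record) and len(value.fields) > 1
            and isinstance(value.fields[1], Record))

def select(root: Any) -> List[Any]:
    return [v for v in descendants(root)
            if label_is_test(v) and second_field_is_record(v)]

suite = Record("TestCases", ({
    "a": Record("Test", (b"\x01", Record("point", (1, 2)))),
    "b": Record("Test", (b"\x02", 42)),
    "c": Record("NondeterministicTest", (b"\x03", Record("unit", ()))),
},))

print([r.fields[0] for r in select(suite)])
```

Case "b" is dropped by the second filter because its expected value is an integer rather than a record.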