Constraints on Replace

Nov 14, 2017 — Tags: The editor

The operation Replace replaces a child node with a new one. This new node is recursively described using a history of changes. In this article we compare two scenarios: in one we impose the constraint that the new history must be an continuation of the history so far. In the other we don’t impose this constraint, which is to say that the new history may be anything.

NOTE: (June 2018): the article below describes the situation as of late 2017. Experiments with this an approach as in this article may be found in “nerf0”. The current situation differs in at least 2 ways from the one desribed in this article. [1]: the present version of the project has continuation-only “replace” (now called “extend”); [2]: notes are now denoted using an s-expression rather than the Pythonic notation in the below.

Let’s start with some quick pointers to important background information.

The context of this article is an editor of s-expressions in which the primitives are the mechanisms of construction rather than the resulting structures.

Important examples of such primitives are the operations (Insert, Delete and Replace). These primitives may be composed; a list of them is called a history.

The operation Replace, in particular, replaces a child-node with a new one, where the new node is described recursively using a full history h; the child to be replaced is denoted by an index i.

In this article we will consider a possible constraint on this history h and its ramifications.

A continuation of history

The potential constraint that we will consider is the following: that h must be continuation of the history of the node previously existing at index i. We shall call this constraint “continuation-only” and cases in which it does not apply “any-history”.

Let’s make this notion of history being a continuation of a previously existing history a bit more precise.

First, note that we can always meaningfully talk about such a thing as “the history of the node at index i”: This is because the only way to add or modify children in the first place is by using the operations Insert and Replace, and both of these operations take a history as an argument. All we need to do is, during construction of each child, keep a reference to the history out of which it was constructed. In the below, we will refer to such a pre-existing history as h-pre.

So what is the relationship between h and h-pre if a history is a continuation of another? As noted in the above, a history is simply a list of operations of change. Given that definition, we may formulate continuation as follows: A history h is a continuation of another history h-pre if the latter is a prefix of the former. In other words: the list h fully contains the list h-pre at its beginning.

An alternative, equivalent, formulation, is based on the following implementation detail for histories: In practice, we implement histories using a technique called hash consing. In short: the list is implemented as a linked list, where each element points to its predecessor using a hash. In this view, we can say that a history h is a continuation of another history h-pre if you, by following the hashes from h, at some point will encounter the hash h-pre.

To clarify what history continuation means in practice, let’s consider two examples:

Continuation-only

In the first example the history with which we replace is a continuation of the previously existing history:

BecomeList
Insert 0 ...
- BecomeList
- Insert 0 (BecomeAtom *))
- Insert 1 (BecomeAtom 6))
Replace 0 ...
- BecomeList
- Insert 0 (BecomeAtom *))
- Insert 1 (BecomeAtom 6))
- Delete 1

(The notation of nested lists in the above should be interpreted as follows: each of the ellipses (...) are a single parameter h, which is further detailed in the indented bullet points below it.)

This is an example of a continuation of history, because the first 3 operations, up until Insert 1 (BecomeAtom 6)) are shared between the pre-existing history h-pre and the new history h.

Any-history

In the second example the history with which we replace is not a continuation of the pre-existing one.

BecomeList
Insert 0 ...
- BecomeList
- Insert 0 (BecomeAtom *))
- Insert 1 (BecomeAtom 6))
Replace 0 ...
- BecomeList
- Insert 0 (BecomeAtom +))

This is an example where the constraint does not apply: the operations Insert 0 (BecomeAtom *)) and Insert 1 (BecomeAtom 6)) from h-pre are not reflected in the history h.

In the below, we’ll consider the consequences enforcing continuation-only from two perspectives: the perspective of the mental model of our programmers, and the perspective of potential simplifications and optimizations we can make to our tooling. We’ll start with the human perspective.

Continuation-only: Mental model

Enforcing the constraint continuation-only makes for a much more simple mental model of the nodes and histories under consideration than the alternative.

To see why this is the case, consider the following: One of the reasons Replace is a separate operation in the first place is that it allows for a semantics of identity in the face of change. In particular: we can talk about the node which is “replaced” as a single thing which exists over time, interpreting the Replace as a change to its current value and history, but not to its fundamental identity.

The constraint continuation-only fits nicely into that model. If the constraint applies, the node to which we attach this identity has a single linear history. At times, new things may happen in that history, but those things that have once happened will never “unhappen”.

This mental model also maps quite nicely to a typical use case: that of editing a document. Each time a part of the document is edited, that part’s history is simply extended with the edit-operation.

Contrast this model with the alternative, being any_history. In that scenario, each time a node is replaced, its history may be arbitrarily rewritten.

In such a scenario we cannot speak of the history of any of our children, since the view of what that history is may itself change over time. Because of that, we must, for each of a node’s children, distinguish between one of two different histories:

The history h as set in the latest Insert or Replace for the child.
The history of such histories.

Such a distinction makes reasoning about a node and the histories of its subnodes quite a bit more complex!

Continuation-only: T-paths

In an editor which takes operations of change as its primitives we get the ability to reference the children of any given node by their order of appearance “for free”. This is quite useful, because such a reference will keep pointing at the same child over time. This was discussed in depth in a separate article, in which we called such references t-indices.

In that article we also mentioned t-paths. The interpretation of such a t-path is to consider it to be a list of t-indices; to follow the path means to consider the elements of the path one by one and descend into the relevant child accordingly. As such a t-path can point to an arbitrary descendant, not just a child.

The possible utility of both t-indices and t-paths lies in the fact that, for some structure, they give us a reference to a descendant that keeps pointing at “the same” descendant in the face of changes to the structure over time. We could say that it is this utility that makes it so that the t-index or t-path does what you expect.

For t-indices, whether they do what you’d expect holds for both the cases continuation-only and any-history. This is because, in a single node with a single linear history, the assignment of t-indices to its children is only influenced by the occurrences of Insert in that history. The interpretation of the attribute h of any given Replace operation does not come into play at all. By extension, neither does any constraint on Replace’s arguments.

T-paths, however, only do what you expect for the case continuation-only. This is perhaps best shown with a counter-example. Consider again the history as provided at the beginning of this article of an any-history, which is shown here annotated with the resulting t-paths:

BecomeList
Insert 0 ... – inserts a new node at t-index = 0
- BecomeList
- Insert 0 (BecomeAtom *)) – resulting t-path: 0 0
- Insert 1 (BecomeAtom 6)) – resulting t-path: 0 1
Replace 0 ...
- BecomeList
- Insert 0 (BecomeAtom +)) – resulting t-path: 0 0

As should be obvious from the above, the t-path 0 0 does not constitute a meaningful reference to a node with a single identity: it initially points at the atom *, and later at the atom +, despite these two nodes being unrelated entirely: they exist in separate histories.

When the constraint continuation-only applies, it is not possible to construct such a counter-example: in that scenario it is not possible to replace a history with an entirely unrelated one by definition.

Algorithms

Further consequences continuation-only may be found in the algorithms that make use of histories of operations.

What could such algorithms be? First, there is the algorithm that constructs an actual s-expression out of a history of changes – something we will inevitably be interested in at some point. When compared to an editor that stores s-expressions directly, such an algorithm forms a computational overhead by default. If only for that reason, it is worthwhile to consider its performance characteristics.

First, let’s consider a naive implementation: For each Replace operation, construct the resulting structure from scratch by considering each element of the history in turn. Such an approach is not scalable: Each Replace operation takes a full history as one of its parameters and such histories grow linearly in time. Moreover, the histories are recursively composed of further Replace operations. The result is an exponential growth of the required runtime.

A less naive approach is the following: make use of the fact that each Replace operations replaces an already existing node, which already has some structure. If we can use this existing structure in such a way that we don’t have to consider the full provided history each time we replace, we may reduce the required runtime considerably.

If continuation-only applies, this is quite straightforward: we can simply take the existing structure as our starting point, and apply to it only those operations in h that are new. Such an approach is only linear in time with the amount of new operations.

any-history

Finally, let’s considert the respective drawbacks and advantages of the alternative to continuation-only, namely those of any-history.

The drawbacks should be obvious: they are precisely the inverse of the aformentioned advantages. In particular: the mental model of what Replace means becomes more complex; arbitrary descendants may no longer be referenced meaningfully using a t-path and performant algorithms for construction of nodes are less obvious.

Having said that, there is still something to be said for this approach: Relaxing the constraint gives us certain features almost by definition. Namely: the ability to rewrite the histories of children-nodes, while retaining, at the level at which the rewriting is done, a reference to the history as it existed before it was rewritten. This means we get both the advantages of rewriting of history (cleaner histories) and those of never rewriting history (no loss of information).

To see how this is so, consider the example with which we started this article:

Insert 0 ...
- BecomeList
- Insert 0 (BecomeAtom *))
- Insert 1 (BecomeAtom 6))
Replace 0 ...
- BecomeList
- Insert 0 (BecomeAtom +))

In the above, from the perspective of the node whose history is rewritten, all that ever happened is the insertion of the atom +. However, from the root node we can still see that at some point there was a different history for that node.

Note that this ability to rewrite history is integrated into our main means of expression: we do not require a separate layer of abstraction for it. One caveat does apply though: it is only available on items that are at least one level lower in the tree. This also implies that rewriting history is not available for the root of the tree.

It remains to be seen whether this integrated approach to rewriting of history should be seen as a useful means of expression, or an undesirable conflation of ideas.

Meaningful rewrites

That rewriting history is useful in the first place should perhaps be made clear with a separate example.

The running example of any-rewrite has been to present a rewriting of history that is truly arbitrary: in the example above there is truly no relationship between the history pre and post rewrite.

In practice, rewriting of history will often not be so arbitrary. For an example of this, consider the concept of undo, which can be modeled by replacing the history of a node by a history in which the last operation is no longer present:

Insert 0 ...
- BecomeList
- Insert 0 (BecomeAtom *))
- Insert 1 (BecomeAtom 6))
- Insert 2 (BecomeAtom 9))
Replace 0 ...
- BecomeList
- Insert 0 (BecomeAtom *))
- Insert 1 (BecomeAtom 6))

Conclusions / final thoughts

In the above we’ve seen the respective advantages and disadvantages of either enforcing the constraint that histories must be append-only or not. A few final thoughts are below.

First, as noted in the above, a naive approach to t-paths does not yield meaningful results. It is however, even in environments in which nodes can be replaced by arbitrary histories, possible to come up with a scheme of referencing children that does in fact yield meaningful references across time. This scheme will be explained in a future article.

Second, we noted that in continuation-only a performant algorithm for construction is readily available, whereas for for any-history it is not as obvious what such an algorithm should look like. We simply postulate here that such algorithms can still be devised. In particular it is possible to come up with algorithms that are linear in runtime with respect to the amount of history-rewriting.

Finally: given the above advantages and disadvantages, we are left with a question: should we apply the constraint or not? Or should we perhaps make a distinction between environments in which the constraint applies, and environments in which it doesn’t? Or should we perhaps have 2 distinct operations for Replace, one for each scenario?

These are, as it stands, open questions. The present version of the edtior is any-history. The rationale is: this forces me to think about the annoying corner-cases which exist only if the constraint does not apply. However, this is certainly subject to change.