T-Indices

Nov 09, 2017 — Tags: The editor

The editor presented on this website is an editor of s-expressions that treats modifications to those expressions as first level citizens. This opens up the possibility to reference the children of a given list-expression by their order of appearance. In the below we’ll see why this could be useful.

Let’s start with some terminology.

An s-expression is a notation for tree-structured data. Each s-expression is either an atom or a list of s-expressions. Using this recursive definition arbitrary tree-structures can be formed.

Because it is the tree-like properties that we’re interested in in this article, we will use the terminology of trees (nodes, children, parents) in the below. We will mostly ignore atoms and talk almost exclusively of list-expressions, as these are the only non-trivial case, being the only type of node that can have children.

If we want to talk about a particular child of a node, a straightforward way of doing so is to reference it by its zero-based index in the list of children. I.e. in s-expression (a b) we can say that the atom a is the child at index 0, and the atom b is the child at index 1.

This observation can be generalized to refer to descendants at arbitrary depth by using a list of such indices and interpreting it as a path through the tree. I.e. in the s-expression (a (b ((c d) e))) the atom c can be reached by following the path 1, 1, 0, 0.

In this approach we use the spatial layout of the tree-structure to refer to its elements. It is for this reason that we will refer to such indices as s-indices in the below.

T-indices

In the editor presented in this project we treat the modifications to s-expressions as first level citizens. The central motivation for doing this is the idea that changes to computer programs are such a central aspect of the programming experience that they must be put more centrally in the tooling.

We do this by offering a small set of operations for constructing s-expressions, and storing those means of construction rather than the constructed structures. Key elements of this set of operations are Insert, Delete and Replace.

Given these operations, a second way of referring to the children of a node becomes available: by referring to them in their order of appearance. In other words: when were they inserted?

Consider the following full history of operations on a list node:

Insert b at s-index 0.
Insert a at s-index 0.

Constructing a node using these operations will yield the expression (a b), given that insertion at a given s-index will shift any items at or above that s-index one place to the right

In a such constructed node, we may refer to the node b as created at t=0 and the node a as created at t=1.

Because we we look at the node through the lens of time as opposed to its current spatial layout in this approach, we’ll denote this indexing scheme as the t-index.

Note that the same generalization to paths to arbitrary descendants is available for t-indices as it was in the case of regular indices, although some constraints must apply on replace’s paramters for the path to be stable over time.

Delete

Referring to children in order of insertion raises the question: how does this interact with some of the other main operations, namely Delete and Replace?

The answer for Delete is straightforward: deleting a child makes it go away in the current version, but does not alter the order of appearance. Consider the following full history of a node:

Insert b at s-index 0.
Insert a at s-index 0.
Delete the item at s-index 1.

Constructing a node using these operations yields an expression (a).

At t-index 1 we still have the node a, as in the previous example. At t-index 0, however, we now have nothing. This is because the child b, which lived at (spatial) index 1 at the time of deletion, was the item at t-index 0. Such an answer “nothing” may be implemented using a special tombstone or sentinel value.

Replace

How Replace operations affect t-indices is less straightforward – in the design of the editor, one actually has to make a choice between one of 2 options.

As a first attempt, we could simply view Replace as a shorthand for a Delete followed by and Insert. In that model, after a replace, the t-index previously pointing to the now-replaced child will yield the tombstone, and the new value for the child will be associated with a new consecutive t-index. Considering the following example:

Insert b at s-index 0.
Insert a at s-index 0.
Replace the item at s-index 1 with c.

The t-index 0 would, just like in the example for Delete, yield no result. The t-index 2 would point to the child c.

A second possible definition for the interplay between Replace and t-indices is quite different: In it, the t-index of the replaced child is reused to point to the new child after the replacement.

In this definition, after the same set of operation as in the last example, the t-index 0 points to the item c and the t-index 2 is as of yet unused.

As such, we can use the t-index to answer the question “what is the child that was nth element in the order of insertion, and may since have been changed any number of times?”. This means that we can use t-indices as a reference to an identity that persists under mutation. In this model the identity is formed at insertion, and replace is interpreted as a mutation that does not alter identity.

Note that in this model, in cases where modeling identity persisting across a replace is undesirable, we always have the option of explicitly deleting and inserting instead.

It is this second definition that we have taken in the editor.

Uses for t-indices in the UI

Associating an identity with nodes on the moment of their creation, and having a simple way (namely: consecutive numbers) to refer to them is useful in a number of ways.

A first such use case is in presenting historical views to the user. Consider again the following 3 operations:

Insert b at s-index 0.
Insert a at s-index 0.
Delete the item at s-index 1.

It is not immediately obvious that the deletion operates on the item b, which has since its creation been shifted to s-index 1 because another item was inserted in its place. This is the case even for this trivial example; in any non-trivial example this lack of clarity renders historical views which only display an s-index quite useless from a UI perspective.

To remedy this, we can use t-indices in our UI, and decorate the above to the show the following:

Insert b at s-index 0, yielding t-address 0.
Insert a at s-index 0, yielding t-address 1.
Delete the item at s-index 1 (which is t-address 0)

Of course, in an actual UI we are not restricted to such a rather clumsy textual notation. Instead, we could e.g. use colors to highlight the relationships between various operations at the same t-address.¹

The idea of a node having an persistent identity in the context of changes to its value is useful more generally. One example is in an interface which has multiple views on a single document, e.g. in the form of multiple tabs or windows. In such a setup one can imagine certain windows displaying part of a larger tree; and edits to this part of the tree being consistently propagated from and to other windows that show the same part of the tree. The idea of sameness can then be modeled using t-paths.

In particular, this means one can edit part of a tree in one separate window, without any possible confusion as to what it means if that part is moved around in the document. (Such moving around may occur as a result of insertions and deletions in its parent node). In cases where the editor has only a single user and a single thread of control, such confusion is ruled out entirely. In distributed and multi-user contexts we are able to significantly reduce the amount of such confusion (“merge conflicts”); more about which in a separate article.

Conclusions

T-indices, and by extension t-paths, are a useful tool to unambiguously talk about a particular node in a mutable tree.

They require a historical view of the tree, but given such a view are quite trivially implemented.

One might be tempted to take t-addresses as the basis of modeling of the 3 operations if it is indeed a useful representation. Note however that for insertions we always need the s-address, and hence such a model would lead to an inconsistant mix of s-addresses and t-addresses in the operations. ↩