We Need to Talk About Constructs (LLO Archive)

Created 2025-12-09, last modified 2025-12-09. Visibility: public

Part of my archive of Layover Linux Official posts on Tumblr.


2025-10-31

So I've talked before about how "constructs" in Prone are just lists, but with a different pointer tag, so that in memory, they can be two ways of looking at the same object. That's going to be upheld as behavior going forward, but not in terms of performance, and I'm going to talk about why.

Conceptually, constructs are just lists with a deputy badge. They're a sequence of values that may happen to follow some recognizable convention. So, you can append to them, pop from them, etc. The original way of representing constructs in memory was literally just a Value Vector (VVec), same as lists, and in fact you could have the same VVec referred to as a list in one place and as a construct in another.
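
To make the "same VVec, two tags" idea concrete, here's a minimal Python sketch. It is not Prone's actual implementation: the `DV` class, the tag constants, and `as_construct` are all made-up names for illustration, and a real pointer tag would live in the bits of a 64-bit value rather than in a Python object.

```python
from dataclasses import dataclass

# Hypothetical tag values; the real Prone pointer tags aren't given here.
TAG_LIST = 0
TAG_CONSTRUCT = 1


@dataclass
class DV:
    """Stand-in for a tagged pointer: a tag plus a reference to a VVec."""
    tag: int
    vvec: list  # the shared Value Vector


def as_construct(dv: DV) -> DV:
    """Re-tag the same storage as a construct; nothing is copied."""
    return DV(TAG_CONSTRUCT, dv.vvec)


storage = ["#IDENTIFIER", "#abc"]
as_list = DV(TAG_LIST, storage)
as_cons = as_construct(as_list)

# Mutating through the list view is visible through the construct view,
# because both DVs point at the same underlying vector.
as_list.vvec.append("extra")
```

Both views share one vector, so an append through one shows up in the other. That sharing is cheap under the old representation, and it's what stops being free once constructs get their own structured storage.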

As the language starts needing to do more stuff with constructs, though, it's started to be really useful to have a structured data approach, for a bunch of reasons.

So now I have two uneasily-coexisting representations of constructs: the original unstructured VVecs, and the new tagged union. Right now, the latter is only used very internally, and as temporary variables, because the DV tag for "construct" means "VVec". So in a lot of places where constructs are passed around between pieces of code, they have to be unstructured. It's only at the last minute that code will be like "I'll convert to a capital-C Construct* to do type dispatch and field extraction." It's very inefficient, it's not as simple as it should be, etc.

So where I'm going with this is, I'm going to change the DV tag to mean the new structured type. This is going to be a sweeping and painful change, but the long-term vision is going to be a lot cleaner and simpler. Construct-related functions will always use the same type. Conversion to and from VVec format will happen less frequently, and at times when you'd more expect it. The fact that it's more optimized is an unsurprising bonus: the real benefit is making the codebase easier to understand and harder to shoot yourself in the foot with. My thinking about software is, it should have good bones, and not be afraid to show them.
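
Here's a rough Python sketch of what "structured type everywhere, convert only at the edges" could look like. Everything in it is an assumption for illustration: the variant classes, the `Raw` fallback, and the `from_vvec` / `to_vvec` names are invented, only two variants are shown, and unstructured constructs are modeled as tuples so they can be told apart from plain lists.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Identifier:
    name: str


@dataclass
class FnCall:
    callee: "Construct"
    args: list


@dataclass
class Raw:
    """Fallback for constructs that match no known shape."""
    items: list


Construct = Union[Identifier, FnCall, Raw]


def from_vvec(items: tuple) -> Construct:
    """Convert an unstructured construct into the structured form, once."""
    if len(items) == 2 and items[0] == "#IDENTIFIER" and isinstance(items[1], str):
        return Identifier(items[1])
    if (len(items) == 3 and items[0] == "#FN_CALL"
            and isinstance(items[1], tuple) and isinstance(items[2], list)):
        return FnCall(from_vvec(items[1]), items[2])
    return Raw(list(items))


def to_vvec(c: Construct) -> tuple:
    """Convert back to the listy form, only when something asks for it."""
    if isinstance(c, Identifier):
        return ("#IDENTIFIER", c.name)
    if isinstance(c, FnCall):
        return ("#FN_CALL", to_vvec(c.callee), c.args)
    return tuple(c.items)


call = from_vvec(("#FN_CALL", ("#IDENTIFIER", "#abc"), []))
```

Once a value is a `FnCall`, code that handles it can read `call.callee` directly instead of re-checking lengths and tags every time. The shape checks live in one place, at the conversion boundary.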

This has also affected my thinking about the "table" type. Right now, that's just a list that happens to follow a convention, although it doesn't have its own pointer tag yet. A table is a list of key-value pairs like [K1, V1, K2, V2, ... KN, VN]. This is pretty wasteful for a couple of reasons. Keys are Atoms, which are 32 bits, but they have to be stored in a list as DVs (64 bits). Also, many operations get faster if you have the keys next to each other in one area of memory, and the values next to each other in another area of memory, rather than alternating Ks and Vs.

A dedicated storage format would let us validate tables when you first convert the list, and then maintain invariants and speed after that. So I'm probably going to learn a lesson from Constructs and make a dedicated storage structure for Tables, allowing the conversions between types to be slower, less frequent, and explicit. This keeps validation logic from proliferating everywhere and happening all the time, which is the cost of loosey-goosey data.

This is probably also going to be a helpful methodology for Plexes, which are like Tables but have a declared structure of metadata (including field names and types) which they must adhere to. That's going to be easier to enforce as an internal structure which we can just convert to and from a listy representation as needed, transparent to the user of the language.
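
As a sketch of the dedicated Table storage idea (keys packed together, values packed together, validation done once at conversion), here's a small Python version. It's only an illustration under stated assumptions: the `Table` class, `intern`, and the two validation rules (even length, keys must be atoms) are invented here, atoms are modeled as strings interned to small integers, and `array("I")` stands in for a run of 32-bit keys (that's its usual item size, though Python only guarantees at least 16 bits).

```python
from array import array

# Hypothetical atom interning: name <-> small integer id.
_atoms = {}
_names = []


def intern(name):
    """Return a stable integer id for an atom name."""
    if name not in _atoms:
        _atoms[name] = len(_names)
        _names.append(name)
    return _atoms[name]


class Table:
    """Keys packed together in one array, values together in a list."""

    def __init__(self, keys, values):
        self.keys = keys      # array("I"): unsigned ints, typically 32 bits each
        self.values = values  # full-width values, in the same order as keys

    @classmethod
    def from_list(cls, flat):
        """Validate once, at conversion time, then trust the layout."""
        if len(flat) % 2 != 0:
            raise ValueError("expected alternating keys and values")
        keys, values = array("I"), []
        for i in range(0, len(flat), 2):
            if not isinstance(flat[i], str):
                raise TypeError("table keys must be atoms")
            keys.append(intern(flat[i]))
            values.append(flat[i + 1])
        return cls(keys, values)

    def get(self, name):
        """Look up a value by scanning only the packed keys."""
        return self.values[self.keys.index(intern(name))]

    def to_list(self):
        """Convert back to the interleaved [K1, V1, K2, V2, ...] form."""
        flat = []
        for k, v in zip(self.keys, self.values):
            flat += [_names[k], v]
        return flat


t = Table.from_list(["#name", "Ada", "#age", 36])
```

After `from_list` succeeds, `get` never has to re-check that the data is well-formed, and a malformed list like `["#name"]` is rejected up front rather than at some later lookup.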

Anyways. Wish me luck on the conversion. Changing the meaning of the construct DV tag (with so many things assuming the old meaning) is going to be a nightmare. But it's only going to be harder later, so now is the easiest time to make the change, and I'm acting accordingly!


2025-10-31

This might be easier to understand if you have examples of real construct conventions, so here's a few! This is how we represent Prone source code as (potentially manipulable) Prone objects.

$ abc()
=> construct([#FN_CALL, construct([#IDENTIFIER, #abc]), []])

$ x = y
=> construct([#ASSIGN, construct([#IDENTIFIER, #x]), construct([#IDENTIFIER, #y])])

$ 5.9
=> construct([#LITERAL_NUM, "5.9"])

$ 5.9u8
=> construct([#SUFFIX, construct([#LITERAL_NUM, "5.9"]), #u8])

These are conventions because, as long as a piece of data is shaped the right way, certain parts of the language will give it special treatment. There's no penalty for a construct not matching any convention. It just won't get any special treatment, because some pieces of code won't know what to do with it.
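
One way to picture "shaped the right way" is a recognizer that checks the leading tag and the number of fields, and shrugs at anything else. This Python sketch is a deliberate simplification and not how Prone does it: the `CONVENTIONS` table and `matching_convention` are hypothetical, the field counts are read off the examples above, and only the count is checked, not what each field contains.

```python
# Hypothetical shape table: leading tag -> number of fields after the tag,
# taken from the examples above.
CONVENTIONS = {
    "#FN_CALL": 2,      # callee, argument list
    "#ASSIGN": 2,       # target, value
    "#IDENTIFIER": 1,   # name
    "#LITERAL_NUM": 1,  # source text of the number
    "#SUFFIX": 2,       # inner construct, suffix atom
}


def matching_convention(items):
    """Return the convention tag if the shape matches, else None."""
    if not items or not isinstance(items[0], str):
        return None
    arity = CONVENTIONS.get(items[0])
    if arity is not None and len(items) == arity + 1:
        return items[0]
    return None  # no penalty, just no special treatment
```

So `matching_convention(["#LITERAL_NUM", "5.9"])` reports a match, while a construct with an unknown tag or the wrong number of fields just comes back as `None` and carries on as ordinary data.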