Part of my archive of Layover Linux Official posts on Tumblr.

-----------

    2025-10-31

So I've talked before about how "constructs" in Prone are just lists with a different pointer tag, so that in memory they can be two ways of looking at the same object. That's going to stay true as far as behavior goes, but not in terms of performance, and I'm going to talk about why.

Conceptually, constructs are just lists with a deputy badge. They're a sequence of values that may _happen to_ follow some recognizable convention. So you can append to them, pop from them, etc. The original way of representing constructs in memory was literally just a Value Vector (VVec), same as lists, and in fact you could have the same VVec referred to as a list in one place and as a construct in another.
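To make "different pointer tag, same object" concrete, here's a minimal C sketch of that original scheme. `DV` and `VVec` are names from the post, but everything else is a hypothetical stand-in, not Prone's actual code: the low-3-bit tagging, the tag values, and the `VVec` layout are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical DV: a 64-bit word whose low 3 bits hold a tag and whose
   remaining bits hold a pointer (heap objects assumed 8-byte aligned). */
typedef uint64_t DV;

enum { TAG_MASK = 0x7, TAG_LIST = 0x1, TAG_CONSTRUCT = 0x2 };

/* Hypothetical VVec: a small header pointing at a separate buffer. */
typedef struct {
    size_t len;
    size_t cap;
    DV *items;
} VVec;

/* Pack a pointer and a tag into one DV. */
static DV dv_tag_ptr(void *ptr, uint64_t tag) {
    uintptr_t bits = (uintptr_t)ptr;
    assert((bits & TAG_MASK) == 0); /* the tag bits must be free */
    return (DV)bits | tag;
}

static uint64_t dv_tag(DV v) { return v & TAG_MASK; }

/* Under the old scheme, both tags decode to the same VVec*. */
static VVec *dv_vvec(DV v) {
    assert(dv_tag(v) == TAG_LIST || dv_tag(v) == TAG_CONSTRUCT);
    return (VVec *)(uintptr_t)(v & ~(DV)TAG_MASK);
}
```

With this layout, `dv_tag_ptr(vec, TAG_LIST)` and `dv_tag_ptr(vec, TAG_CONSTRUCT)` are two different DVs that decode to the same `VVec`, so anything done to the vector through one view is visible through the other.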

As the language starts needing to do more stuff with constructs, though, it's become really useful to have a structured data approach, for a bunch of reasons.

- Most constructs are created once, never modified, and used many times. A function construct is a good example: it needs to support list operations, but those don't have to be fast. Calling the function, though, _should_ be fast.
- Structured data compiles to smaller machine code because you have more details locked down. The construct "type" isn't an Atom (interned string) assigned an ID at runtime; the IDs are locked in at compile time. VVecs also don't contain their own buffer data, they point to it, so a structured data type could save on that pointer indirection.
- Having a central implementation means that convention-related logic (is the first item `#LITERAL_NUM`? is the second item a String? is the length 2? Okay, let's extract the second item and convert it from DV...) gets consolidated to one place. This makes it easier to change the conventions during this experimental period of development.
- It also means that we can do better sanity-checking with the C type system. If I change how a literal number is stored, this will ripple out to factory function signatures, which will ripple out to call sites. It will also ripple out to anyone extracting fields out of the object.
- By adding a so-called "arbitrary" convention (just a VVec of stuff) we can support constructs that don't fit other conventions, and support arbitrary list operations. This isn't fast, because you have to convert to VVec, do the list operation, then convert (basically parse) back. But it doesn't have to be fast, for the reasons established earlier.
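Here's a rough C sketch of what the structured side could look like. `Construct`, `DV`, `VVec`, and `Atom` are names from the post, but every field, enum value, and function below (`construct_literal_num`, `construct_identifier_name`, and so on) is hypothetical, and storing the literal's text as a C string is an assumption. It shows the shape of the idea: convention IDs as a compile-time enum, factory functions whose signatures change when a representation changes, field extraction in one place, and an "arbitrary" fallback that is just a VVec.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t DV;   /* hypothetical: a tagged 64-bit value */
typedef uint32_t Atom; /* hypothetical: an interned-string ID */

/* Simplified VVec: a length plus a pointer to a separate buffer. */
typedef struct {
    size_t len;
    DV *items;
} VVec;

/* Convention IDs are a C enum fixed at compile time, rather than Atoms
   that only get their IDs at runtime. */
typedef enum {
    CONSTRUCT_LITERAL_NUM, /* [#LITERAL_NUM, "5.9"] */
    CONSTRUCT_IDENTIFIER,  /* [#IDENTIFIER, #abc]   */
    CONSTRUCT_ARBITRARY    /* anything else, kept as a plain VVec */
} ConstructKind;

typedef struct {
    ConstructKind kind;
    union {
        struct { char *text; } literal_num;
        struct { Atom name; } identifier;
        VVec arbitrary;
    } as;
} Construct;

/* Factory function: if the representation of a literal number changes,
   this signature changes, and mismatched callers stop compiling. */
static Construct *construct_literal_num(const char *text) {
    size_t n = strlen(text) + 1;
    Construct *c = malloc(sizeof *c);
    char *copy = malloc(n);
    if (!c || !copy) {
        free(c);
        free(copy);
        return NULL;
    }
    memcpy(copy, text, n);
    c->kind = CONSTRUCT_LITERAL_NUM;
    c->as.literal_num.text = copy;
    return c;
}

static Construct *construct_identifier(Atom name) {
    Construct *c = malloc(sizeof *c);
    if (!c) return NULL;
    c->kind = CONSTRUCT_IDENTIFIER;
    c->as.identifier.name = name;
    return c;
}

/* Field extraction lives in one place, with the kind check built in. */
static const char *construct_literal_num_text(const Construct *c) {
    assert(c->kind == CONSTRUCT_LITERAL_NUM);
    return c->as.literal_num.text;
}

static Atom construct_identifier_name(const Construct *c) {
    assert(c->kind == CONSTRUCT_IDENTIFIER);
    return c->as.identifier.name;
}

/* Release a construct and whatever buffer its variant owns. */
static void construct_free(Construct *c) {
    if (!c) return;
    switch (c->kind) {
    case CONSTRUCT_LITERAL_NUM: free(c->as.literal_num.text); break;
    case CONSTRUCT_ARBITRARY:   free(c->as.arbitrary.items); break;
    case CONSTRUCT_IDENTIFIER:  break;
    }
    free(c);
}
```

Note that a literal number here is a single allocation plus its text, with no intermediate VVec header to chase, which is the kind of indirection the structured type is meant to save.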

So now I have two uneasily coexisting representations of constructs: the original unstructured VVecs, and the new tagged union. Right now, the latter is only used very internally, and as temporary variables, because the DV tag for "construct" means "VVec". So in a lot of places where constructs are passed between pieces of code, they have to be unstructured. It's only at the last minute where code will be like "I'll convert to a capital-C `Construct*` to do type dispatch and field extraction." It's very inefficient, it's not as simple as it should be, etc.

So where I'm going with this is, I'm going to change the DV tag to mean the new structured type. This is going to be a sweeping and painful change, but the long-term vision is going to be a lot cleaner and simpler. Construct-related functions will always use the same type. Conversion to and from VVec format will happen less frequently, and at times you'd more expect. The fact that it's more optimized is an unsurprising bonus: the real benefit is making the codebase easier to understand and harder to shoot yourself in the foot with. My thinking about software is, it should have good bones, and not be afraid to show them.

This has also affected my thinking about the "table" type. Right now, that's just a list that happens to follow a convention, although it doesn't have its own pointer tag yet. A table is a list of key-value pairs like `[K1, V1, K2, V2, ... KN, VN]`. This is pretty wasteful for a couple of reasons. Keys are Atoms, which are 32 bits, but they have to be stored in a list as DVs (64 bits). Also, many operations get faster if you have the keys next to each other in one area of memory, and the values next to each other in another, rather than alternating Ks and Vs. A dedicated storage format would let us validate a table once, when the list is first converted, and then maintain invariants and speed after that. So I'm probably going to learn a lesson from Constructs and make a dedicated storage structure for Tables, allowing conversions between the types to be slower, but rarer and more explicit. This keeps validation logic from proliferating everywhere and running all the time, which is the cost of loosey-goosey data.

This is probably also going to be a helpful methodology for Plexes, which are like Tables but have a declared structure of metadata (including field names and types) that they must adhere to. That's going to be easier to enforce as an internal structure which we can just _convert to and from a listy representation_ as needed, transparent to the user of the language.
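Here's a sketch of what a dedicated Table layout might look like in C, assuming hypothetical details the post doesn't specify: an Atom DV is recognized by a low-bit tag and carries its 32-bit ID in the upper half of the word, and "validation" means only an even length with an Atom in every key slot. `Table`, `table_from_list`, and `table_get` are illustrative names, not the real API. Keys live packed as 32-bit Atoms in one array, values in another, and the check happens once, at conversion time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t DV;   /* hypothetical: a tagged 64-bit value */
typedef uint32_t Atom; /* hypothetical: a 32-bit interned-string ID */

/* Hypothetical encoding: low 3 bits are the tag; an Atom DV keeps its
   ID in the upper 32 bits. */
enum { TAG_MASK = 0x7, TAG_ATOM = 0x3 };

static DV dv_atom(Atom a) { return ((DV)a << 32) | TAG_ATOM; }
static int dv_is_atom(DV v) { return (v & TAG_MASK) == TAG_ATOM; }
static Atom dv_as_atom(DV v) { return (Atom)(v >> 32); }

/* Keys and values in separate contiguous arrays. */
typedef struct {
    size_t len;
    Atom *keys;  /* 32 bits per key instead of a 64-bit DV */
    DV *values;
} Table;

/* Convert [K1, V1, ..., KN, VN] once, validating as we go.
   Returns 0 on success, -1 if the list isn't shaped like a table. */
static int table_from_list(const DV *items, size_t n, Table *out) {
    if (n % 2 != 0) return -1;          /* must be key-value pairs */
    size_t len = n / 2;
    size_t alloc = len ? len : 1;       /* avoid malloc(0) */
    Atom *keys = malloc(alloc * sizeof *keys);
    DV *values = malloc(alloc * sizeof *values);
    if (!keys || !values) goto fail;
    for (size_t i = 0; i < len; i++) {
        DV k = items[2 * i];
        if (!dv_is_atom(k)) goto fail;  /* every key must be an Atom */
        keys[i] = dv_as_atom(k);
        values[i] = items[2 * i + 1];
    }
    out->len = len;
    out->keys = keys;
    out->values = values;
    return 0;
fail:
    free(keys);
    free(values);
    return -1;
}

/* Lookup scans only the packed key array; returns NULL if absent. */
static const DV *table_get(const Table *t, Atom key) {
    for (size_t i = 0; i < t->len; i++)
        if (t->keys[i] == key) return &t->values[i];
    return NULL;
}

static void table_free(Table *t) {
    free(t->keys);
    free(t->values);
    t->keys = NULL;
    t->values = NULL;
    t->len = 0;
}
```

Once a `Table` exists, code that receives one never has to re-check the alternating shape; that check only happens again when converting back to or from a list.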

Anyways. Wish me luck on the conversion. Changing the meaning of the construct DV tag (with so many things assuming the old meaning) is going to be a nightmare. But it's only going to be harder later, so now is the easiest time to make the change, and I'm acting accordingly!

-----------

    2025-10-31

This might be easier to understand if you have examples of real construct conventions, so here are a few! This is how we represent Prone source code as (potentially manipulable) Prone objects.

```prone
$ abc()
=> construct([#FN_CALL, construct([#IDENTIFIER, #abc]), []])

$ x = y
=> construct([#ASSIGN, construct([#IDENTIFIER, #x]), construct([#IDENTIFIER, #y])])

$ 5.9
=> construct([#LITERAL_NUM, "5.9"])

$ 5.9u8
=> construct([#SUFFIX, construct([#LITERAL_NUM, "5.9"]), #u8])
```

These are _conventions_ because, as long as a piece of data is shaped the right way, certain parts of the language will give it special treatment. There's no penalty for a construct not matching any convention, it just won't get any special treatment, because some pieces of code won't know what to do with it.
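To show what "no penalty for not matching" can look like in code, here's a small C sketch of checking the `[#LITERAL_NUM, "5.9"]` shape from the examples above. For readability it swaps the packed DV word for a plain tagged struct, and `Val`, `classify`, `try_literal_num`, and the fixed `ATOM_LITERAL_NUM` ID are all hypothetical. A mismatch isn't an error; the construct is just classified as arbitrary, and a consumer that only cares about literal numbers declines it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DV, written as a plain tagged struct so
   the example doesn't depend on any real bit layout. */
typedef enum { VAL_ATOM, VAL_STRING, VAL_OTHER } ValKind;

typedef struct {
    ValKind kind;
    uint32_t atom;   /* meaningful when kind == VAL_ATOM */
    const char *str; /* meaningful when kind == VAL_STRING */
} Val;

/* Hypothetical fixed ID for the #LITERAL_NUM atom. */
enum { ATOM_LITERAL_NUM = 1 };

typedef enum { CONV_LITERAL_NUM, CONV_ARBITRARY } Convention;

/* Checks the [#LITERAL_NUM, "5.9"] shape: length 2, the #LITERAL_NUM
   atom first, a string second. Anything else is arbitrary, which is not
   an error. On a match, *num_text points at the digits. */
static Convention classify(const Val *items, size_t len, const char **num_text) {
    if (len == 2 &&
        items[0].kind == VAL_ATOM && items[0].atom == ATOM_LITERAL_NUM &&
        items[1].kind == VAL_STRING) {
        *num_text = items[1].str;
        return CONV_LITERAL_NUM;
    }
    return CONV_ARBITRARY;
}

/* A consumer that special-cases literal numbers and simply declines
   everything else: returns 1 and writes *out on a match, 0 otherwise. */
static int try_literal_num(const Val *items, size_t len, double *out) {
    const char *text = NULL;
    if (classify(items, len, &text) != CONV_LITERAL_NUM) return 0;
    *out = strtod(text, NULL);
    return 1;
}
```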