Tagged Pointers (LLO Archive)

Created 2025-11-30, last modified 2025-11-30

Part of my archive of Layover Linux Official posts on Tumblr.


2025-08-02

I wanna talk about something cool, but it needs a little context.

Okay, let's say you're implementing a dynamically typed language, and you have some sort of fixed-size type for any value. There's lots of ways to do it, but in Prone, we have 64-bit numbers which act as tagged pointers. Because we allocate all objects on 16-bit alignment, the lowest 4 bits of its address are always zero, so we have room to use those bits for type information. And if all the tag bits are zero, the value isn't a pointer at all! It's a value that fits inline, like a u8 or i32, where the final 16 bits are used to be specific about the inline type.

So here's the interesting question. Let's say you need to make a full copy of an object, but all you have is a tagged pointer. The right way to copy depends on the object type! So how do you do the dispatch?

If you're C++, you probably solve your problems with vtables. An instance of an interface is actually a pair of pointers: one for the underlying data structure, and one for the vtable full of function pointers. Vtables aren't always a bad solution, but they're twice as wide as Prone dynamic values are supposed to be. We can also do a bit better on access patterns, as will be clear in a minute.

If you're Python, there is no such thing as inline values. It's a pointer to some object that starts with specific fields in a specific order - so while it could have extra stuff, it can at least be correctly understood when cast to a PyObject. The type is another object pointed to by this PyObject data. It's a lot of indirection, but also means that even the smallest objects must be many bytes wide. This is one of the big challenges to optimizing the language.

If you're me a few months ago, you might take advantage of the language having an intentionally small set of fundamental types (which we get a lot of mileage out of). Just 4 bits. Just 16 options, one of them meaning inline data, so 15 really. So you can handwrite a release_value(v) function that uses a chain of if-statements to do the right thing. This works, and it helps with locality (you tend to free a bunch of stuff at once, so it's about the verb, not all the vtables for each noun), but this style is hard to write and test. It also introduces a dependency loop in your code. Everything wants to import your basic dynamic value machinery, but that machinery wants to import all the types!

So today, something new, but still verb-centered and compact: a function table of how to do a verb, for each noun. This is perfect for our needs: there's a fixed handful of types, so you can actually just use the tag bits to index into a table of 16 function pointers. So in the new dynamic value code, functions like dv_release is able to do some things without even caring about specific types, then fall back to calling into __DV_Freers[DV_TAG(dv)]. All this requires is populating the tables with forward declared functions, which are implemented elsewhere later in the type-specific files like str.c.

Anyways. It's a neat trick, and a perfect fit for the engineering niche I'm operating in.