Blocks and Thunks (LLO Archive)
Created 2025-12-11, last modified 2025-12-11. Visibility: public
Part of my archive of Layover Linux Official posts on Tumblr.
2025-11-02
Big updates today, summarizing a couple days of work. You may remember that last time I gave an update, I was about to pull off a big migration of how the DV_TC_CONSTRUCT typecode is interpreted. Before, it meant the data being pointed at was a Value Vector. Now, it points at a Construct object, which is a structured (fast, easy to interact with in C) version of the same data. It took a little while, but that's done now.
What I started next was Blocks: pieces of code consisting of statements, which don't execute immediately.
prn> my_block = ~{ foo = u8(5); foo:list() }
construct([#BLOCK, ...])
prn> foo
#ATTR_NOT_FOUND
prn> my_block()
[u8(0), u8(1), u8(2), u8(3), u8(4)]
prn> foo
u8(5)
These are going to be used for the blocks in things like if statements, for loops, defer statements, etc. In order to make sense there, they need to inherit and interact with whatever scope they're called in, and they don't take arguments. These will eventually be the building blocks for in-language functions, which have a firm scope boundary and multiple dispatch.
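To make the scoping behavior concrete, here's a rough Python model of the idea. It is not how Prone implements blocks: the `Block` class, the `assign` and `list_of` helpers, and the dict-as-scope are all made up for illustration. Python also has to pass the scope in explicitly, whereas a Prone block picks up the calling scope on its own.

```python
# Hypothetical model: a scope is a plain dict, and a block is a stored
# sequence of statements that only run when the block is called.

class Block:
    def __init__(self, *statements):
        self.statements = statements  # stored, not executed yet

    def __call__(self, scope):
        # Run every statement against the caller's scope (no new scope is
        # created), and return the value of the last statement.
        result = None
        for statement in self.statements:
            result = statement(scope)
        return result


def assign(name, value):
    """Statement: bind `name` to `value` in whatever scope runs it."""
    def run(scope):
        scope[name] = value
        return value
    return run


def list_of(name):
    """Statement: look up `name` and count up to it, like foo:list()."""
    def run(scope):
        return list(range(scope[name]))
    return run


scope = {}
my_block = Block(assign("foo", 5), list_of("foo"))

defined_before = "foo" in scope  # building the block has not touched the scope
result = my_block(scope)         # now the statements actually run
defined_after = "foo" in scope   # the assignment leaked into the caller's scope
```

The part that matters is that `foo` only exists after the call, and that it lands in the caller's scope rather than in a private one. A function would draw a boundary there; a block deliberately doesn't.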
Here's something I've noticed, though. Most constructs are evaluated so eagerly that you don't really get a chance to mess with them for metaprogramming. Blocks help a little, but they don't complete the picture. How do you make an Identifier construct without it immediately turning into an attempt to access that identifier, and so on? Well, this is where I've introduced a third way of looking at sequenced data: Thunks.
A thunk, like a list, is a VVec (Value Vector) in memory. Because thunks are for looking under the hood, debugging, and making your own constructs, it doesn't make sense to use structured data here. What's important is that they mark what should eventually become a construct, so you can build your nice nested thunk object, recursively dethunk it (turn all the thunks into constructs), and not have to worry about lists getting converted too. Mostly, they act like lists: you can push and pop them, and they have a literal syntax.
prn> [% "hello", #world %]
[% "hello", #world %]
prn> [% foo:repr() %]
[% [% #CHAIN_IMM, [% #IDENTIFIER, #repr %], [[% #IDENTIFIER, #foo %]] %] %]
Notice how anything in a thunk literal is recursively thunkified? Convenient! And so is the thunk operator, which is a prefix operator that converts whatever expression follows it into a thunk:
prn> %%foo
[% #IDENTIFIER, #foo %]
This is built right into the parser, so you truly get the raw version of whatever source code you put in there.
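The thunk/list distinction can be sketched in Python like this. Everything here is a stand-in: `Thunk`, `Construct`, and this `dethunk` are hypothetical names modeling the behavior described above, symbols are just strings, and the assumption that dethunking walks into plain lists (converting thunks found inside them while keeping the lists as lists) is inferred from the examples rather than confirmed.

```python
class Thunk(list):
    """A list-like value that marks 'this should eventually be a construct'."""

    def __repr__(self):
        return "[% " + ", ".join(repr(item) for item in self) + " %]"


class Construct:
    """Structured stand-in for an evaluated-form construct."""

    def __init__(self, kind, *fields):
        self.kind = kind
        self.fields = fields

    def __repr__(self):
        return f"construct([{self.kind}, ...])"


def dethunk(value):
    # Thunk must be checked first, because a Thunk is also a list.
    if isinstance(value, Thunk):
        if not value:
            raise ValueError("cannot dethunk an empty thunk")
        kind, *fields = value
        return Construct(kind, *(dethunk(field) for field in fields))
    if isinstance(value, list):
        # Plain lists stay plain lists; only their contents are visited.
        return [dethunk(item) for item in value]
    return value  # symbols, numbers, strings, etc. pass through untouched


# Model of [% #CHAIN_IMM, [% #IDENTIFIER, #repr %], [[% #IDENTIFIER, #foo %]] %]
chain = Thunk([
    "#CHAIN_IMM",
    Thunk(["#IDENTIFIER", "#repr"]),
    [Thunk(["#IDENTIFIER", "#foo"])],
])

built = dethunk(chain)
```

After `dethunk`, the outer thunk and both identifier thunks are `Construct` objects, while the middle argument list is still an ordinary list. That's the whole reason thunks get their own type instead of reusing lists: the conversion never has to guess which sequences were meant as code.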
This allows you to build your own code dynamically, which isn't too useful or convenient yet, but it'll get there. These are important bones for future meat.
prn> stmts = []
[]
prn> stmts = push(stmts, [% #ASSIGN, foo, u8(5) %])
...
prn> stmts = push(stmts, %%foo:list())
...
prn> block = push([% #BLOCK %], stmts)
...
prn> dethunk(block)()
[u8(0), u8(1), u8(2), u8(3), u8(4)]
prn> foo
u8(5)
Obviously in this case it's far less convenient to make a block from total scratch, a thunky little piece at a time, than to just use a block literal. The benefit is very future-looking. For example, the Prone version of printf will probably work by looking at the format string and arguments, generating a block of code that is specialized to those types, and then executing that block of code. Why would it do that? Because the optimizer, even in Prone 1 probably, should be smart enough to do that code generation ONCE at compile time, and then further optimize the generated code with specific values where the whole value (not just the type) is known, etc. In other words, code generation can be a bit of a roundabout way of doing things in interpreted mode, but it can be great for feeding the optimizer in compiled mode. This has implications for serialization logic, web frameworks, regular expressions and more.
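As a loose illustration of that printf idea, here's a Python sketch of "generate specialized code once, then run it". None of this is Prone's design: `specialize`, `format_values`, and the `{}` placeholder syntax are assumptions made up for the example, and a memoizing cache stands in for what an optimizer would do at compile time.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def specialize(fmt, arg_types):
    """Generate, once per (format, types) pair, a function specialized to it."""
    parts = fmt.split("{}")
    if len(parts) - 1 != len(arg_types):
        raise ValueError("placeholder count does not match argument count")

    pieces = [repr(parts[0])]
    for index, (arg_type, literal) in enumerate(zip(arg_types, parts[1:])):
        # The conversion is chosen here, at generation time, so the
        # generated function contains no type checks at all.
        if arg_type is str:
            pieces.append(f"a{index}")
        elif arg_type is int:
            pieces.append(f"str(a{index})")
        else:
            pieces.append(f"repr(a{index})")
        if literal:
            pieces.append(repr(literal))

    params = ", ".join(f"a{index}" for index in range(len(arg_types)))
    source = f"def formatted({params}):\n    return {' + '.join(pieces)}\n"

    namespace = {}
    exec(source, namespace)  # turn the generated source into a real function
    return namespace["formatted"]


def format_values(fmt, *args):
    """Look up (or generate) the specialized formatter, then call it."""
    return specialize(fmt, tuple(type(arg) for arg in args))(*args)


message = format_values("x = {}, name = {}", 5, "bob")
```

In interpreted mode this is clearly the long way around, which matches the point above. The payoff only shows up when the generation step can be hoisted out and done ahead of time, leaving just the straight-line generated code behind.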
Making it as easy as possible to modify code inside the interpreter means making it as easy as possible for the optimizer to understand what you're doing, which in turn means producing very optimized C code that compiles down to very optimized machine code, someday.
Taking a break for the rest of the day now. Functions can wait a little while longer.

