Part of my archive of Layover Linux Official posts on Tumblr.

-----------

    2025-11-02

I said I probably wouldn't do any further work on Prone today, but I might anyways, just later in the evening. You see, I like keeping the whole test suite finishing under 1ms on my hardware, and right now, it's on the edge. Some runs go over 1ms. And I thought I knew what the low-hanging fruit would be (some inefficient allocations in the parser) - while that _is_ a problem, there's something even easier and more sweeping to fix.

While I'm a big believer in solving allocation problems by allocating fewer objects, there are two big elephants crowded into the same room.

1. Some allocations are just unavoidable in a given program. The test suite is a great example of this.
2. For the time being, Prone structures (at least DV ones) have to be heap-allocated. I eventually plan to change this, but it's needed for simplicity and flexibility at this phase of development.

This leaves one thing to do when the heaviest thing in your profiling is memory management: make the memory management faster.

> (Quick tangent: how do you profile something that completes really quickly? Well, I adjusted my test suite to repeat the entire list of tests a configurable number of times (N), normally set to one. Then I cranked that up to 1000, and ran it under `perf`. Which got me very useful feedback!)
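
The repeat trick in the tangent above can be sketched roughly like this. It is only an illustration: `test_fn`, `run_suite`, and the two sample tests are hypothetical names, not the actual Prone harness.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test signature: a test returns 0 on success. */
typedef int (*test_fn)(void);

/* Two stand-in tests that only count how often they were called. */
static size_t sample_calls = 0;
static int sample_pass(void) { sample_calls++; return 0; }
static int sample_fail(void) { sample_calls++; return 1; }

/* Run the whole list `repeats` times and return the number of failures.
 * With repeats == 1 this is a normal run; a large value stretches the
 * runtime enough for a sampling profiler to collect useful data. */
static size_t run_suite(const test_fn *tests, size_t count, size_t repeats) {
    assert(tests != NULL || count == 0);
    size_t failures = 0;
    for (size_t r = 0; r < repeats; r++) {
        for (size_t i = 0; i < count; i++) {
            if (tests[i]() != 0) {
                failures++;
            }
        }
    }
    return failures;
}
```

A binary built this way can be run under `perf record` and inspected afterwards with `perf report`.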

![[perf_lineup.png]]
![[perf_disassembly.png]]

So let's talk about the Prone interpreter's allocation infrastructure, and why it could use an upgrade.

Currently, the allocator operates in one of a handful of preset modes, with functions to set the mode, get metadata, etc. Most of the test suite operates in `TRACKED` mode in order to detect memory leaks. Tracked mode actually maintains a global Vec of allocation data, which allows us to see whether we freed all the memory we allocated, and print diagnostics. There is a `FAST` mode that just immediately defers to system `malloc`/`free`, and you can see in the assembly how `prn_alloc_free` tries to check for fast mode and dispatch accordingly as early as possible.

So here's the thing. Most tests are _only_ using tracked mode to be able to get a count of unfreed allocations. That's extremely common; in fact, it happens in the test suite harness code now rather than in individual test cases. This could be done with a counter, but instead we're using a Vec of allocation metadata. Freeing is the most expensive operation, because we have to linearly search the Vec to find the allocation that just died and remove that one specifically. Most of the CPU time is being spent in this find loop!
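
To make the cost concrete, here is a minimal sketch of that kind of tracking, assuming the metadata is nothing more than a growable array of live pointers. The names (`tracked_alloc`, `tracked_free`, `tracked_outstanding`) and the swap-remove step are assumptions for illustration, not the real Prone code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tracked-mode metadata:
 * a growable array holding every pointer that is still live. */
static void **live = NULL;
static size_t live_len = 0;
static size_t live_cap = 0;

static void *tracked_alloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        return NULL;
    }
    if (live_len == live_cap) {
        size_t cap = live_cap ? live_cap * 2 : 16;
        void **grown = realloc(live, cap * sizeof *grown);
        if (grown == NULL) {
            free(p);
            return NULL;
        }
        live = grown;
        live_cap = cap;
    }
    live[live_len++] = p; /* recording an allocation is cheap: O(1) amortized */
    return p;
}

static void tracked_free(void *p) {
    if (p == NULL) {
        return;
    }
    /* The expensive part: an O(n) scan to find the entry that just died. */
    for (size_t i = 0; i < live_len; i++) {
        if (live[i] == p) {
            live[i] = live[--live_len]; /* swap-remove, order is not preserved */
            free(p);
            return;
        }
    }
    assert(0 && "free of a pointer that was never tracked");
}

/* The only number most tests actually want. */
static size_t tracked_outstanding(void) {
    return live_len;
}
```

With thousands of live allocations, every `tracked_free` walks the array, even though the harness only ever asks for `tracked_outstanding()`.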

So here's what I want to do. Instead of having a single multi-modal allocator, I'm going to make more of a pluggable allocator situation. There will be a global allocator object that just contains a small vtable of functions. This is a pointer indirection, but pretty easy for a branch predictor, since the pointers in this table only change when you change alloc mode. Then I just need multiple implementations of these functions - each mode is a compatible collection of functions that can be copied into the global. Finally, I'm going to add a new mode, the default for the test suite, whose only metadata is a single counter: it bumps up on `malloc` and bumps down on `free`. More advanced tests can use the full tracking allocator as they used to.
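
A rough sketch of what that plan could look like, under stated assumptions: the struct layout and every name here (`alloc_vtable`, `set_mode`, `demo_alloc`, `demo_free`, `counted_live`) are hypothetical, and the real interpreter may carry more entries in its table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical vtable: one function pointer per allocator operation. */
typedef struct {
    void *(*alloc_fn)(size_t size);
    void (*free_fn)(void *ptr);
} alloc_vtable;

/* Fast mode: defer straight to the system allocator. */
static void *fast_alloc(size_t size) { return malloc(size); }
static void fast_free(void *ptr) { free(ptr); }

/* Counting mode: the only metadata is one counter of live allocations. */
static long counted_live = 0;

static void *counted_alloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        counted_live++;
    }
    return p;
}

static void counted_free(void *ptr) {
    if (ptr == NULL) {
        return;
    }
    counted_live--;
    free(ptr);
}

/* Each mode is a compatible set of functions... */
static const alloc_vtable FAST_MODE = { fast_alloc, fast_free };
static const alloc_vtable COUNTED_MODE = { counted_alloc, counted_free };

/* ...that gets copied into the single global table. */
static alloc_vtable current = { fast_alloc, fast_free };

static void set_mode(const alloc_vtable *mode) {
    assert(mode != NULL && mode->alloc_fn != NULL && mode->free_fn != NULL);
    current = *mode;
}

/* Call sites pay one indirect call and never branch on the mode. */
static void *demo_alloc(size_t size) { return current.alloc_fn(size); }
static void demo_free(void *ptr) { current.free_fn(ptr); }
```

A leak check in the harness then reduces to comparing `counted_live` against zero after each test, with no per-allocation bookkeeping to search through.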

At least at first, I can totally keep this hidden behind the existing interface, and then eventually allow for the more open-ended `set_alloc_functions(malloc_ptr, free_ptr)` type API to be used directly by consumers. This also makes it easier to replace system `malloc` on weird architectures, try different experimental allocators, etc.
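
As an illustration of the consumer-facing side, here is one way a `set_alloc_functions(malloc_ptr, free_ptr)` style call might be used to swap in a non-system allocator. Everything beyond that one name is an assumption: the pointer typedefs, the `demo_alloc`/`demo_free` wrappers, and the tiny fixed-size bump arena standing in for "a weird architecture without `malloc`".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*malloc_fn)(size_t size);
typedef void (*free_fn)(void *ptr);

/* Library side (hypothetical): defaults to the system allocator. */
static malloc_fn active_malloc = malloc;
static free_fn active_free = free;

static void set_alloc_functions(malloc_fn malloc_ptr, free_fn free_ptr) {
    assert(malloc_ptr != NULL && free_ptr != NULL);
    active_malloc = malloc_ptr;
    active_free = free_ptr;
}

static void *demo_alloc(size_t size) { return active_malloc(size); }
static void demo_free(void *ptr) { active_free(ptr); }

/* Consumer side: a bump allocator over a static buffer. The union keeps
 * every cell suitably aligned for the common scalar types. */
typedef union { long long ll; long double ld; void *ptr; } arena_cell;

#define ARENA_CELLS 256
static arena_cell arena[ARENA_CELLS];
static size_t arena_used = 0; /* number of cells handed out so far */

static void *arena_malloc(size_t size) {
    if (size > sizeof arena) {
        return NULL; /* also rules out overflow in the rounding below */
    }
    size_t cells = (size + sizeof(arena_cell) - 1) / sizeof(arena_cell);
    if (cells > ARENA_CELLS - arena_used) {
        return NULL; /* arena exhausted */
    }
    void *p = &arena[arena_used];
    arena_used += cells;
    return p;
}

static void arena_free(void *ptr) {
    (void)ptr; /* a bump allocator releases nothing per-pointer */
}
```

A consumer would call `set_alloc_functions(arena_malloc, arena_free)` once at startup, before any allocation is made, so that no pointer is ever freed by a different allocator than the one that produced it.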

I'm probably also going to remove paranoid mode entirely in favor of just normalizing the use of Valgrind. It's good software, and I'm getting used to using it. Debug mode still has its useful moments, so I'm keeping that.