Repr of Lists (LLO Archive)
Created 2025-11-30, last modified 2025-11-30
Part of my archive of Layover Linux Official posts on Tumblr.
2025-09-20
Alright, converting things to lists is implemented. That went pretty smoothly and was instantly rewarding.
prn> list()
[]
prn> list(89.9)
[#LITERAL_NUM, "89.9"]
prn> list(u8(5))
[u8(0), u8(1), u8(2), u8(3), u8(4)]
The really interesting case is the raw number, since it's kind of a whirlwind tour of Prone internals/theory. Number literals don't really have a concrete type, until you cast them to one. So what are they in the meantime?
Well, Prone is full of things called Constructs, which are used to hold non-C functions, AST-type stuff, etc. One of the big purposes of the language is meta-programming, so the only difference between code and data is whether that data has its work uniform on at the moment. We represent that with a bit in the tagged pointer, so the same underlying Value Vector in memory might be viewed as a list in one place and as a construct in another.
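That tag bit can be sketched in C along these lines. This is a hypothetical illustration, not Prone's actual layout: the names (VVec, TAG_CONSTRUCT, and the helper functions) are placeholders, and the only idea it demonstrates is that flipping one low pointer bit reinterprets the same memory.

```c
#include <stdint.h>

/* Hypothetical sketch, not Prone's real representation. Allocations
 * are assumed aligned, so the pointer's low bit is free to carry a
 * "currently a construct" flag; the pointed-to memory never changes. */
#define TAG_CONSTRUCT ((uintptr_t)1)

typedef struct VVec VVec; /* opaque: the shared underlying storage */

static inline uintptr_t as_construct(VVec *v) {
    return (uintptr_t)v | TAG_CONSTRUCT; /* mark: view as construct */
}

static inline uintptr_t as_list(uintptr_t tagged) {
    return tagged & ~TAG_CONSTRUCT;      /* unmark: view as list */
}

static inline int is_construct(uintptr_t tagged) {
    return (tagged & TAG_CONSTRUCT) != 0;
}
```

Converting between the two views is then a single bitwise operation on the pointer, with no allocation or copying.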
In this spirit, a literal number is just a list that follows a particular convention for its innards (a pair of the #LITERAL_NUM atom and a string), and is marked as a construct. Sure, we have special handling for the repr of some recognized constructs...
prn> 89.9
literal::num("89.9")
But if we didn't have that special handling, you'd see it fall back to the generic repr for constructs where the language isn't really sure what the fuck you're on about. In that hypothetical world, you'd see something a little more honest about what's under the hood!
prn> 89.9
construct([#LITERAL_NUM, "89.9"])
So naturally, lists and constructs convert back and forth cheaply and share memory, because the only difference is the tag on the pointer. Only one bit of the tag, for that matter! So if you convert a literal number to a list, it makes total sense that you get what you do: the VVec that the parser cooked up special, just for you, when it consumed a sequence of digits in the source code.
prn> list(89.9)
[#LITERAL_NUM, "89.9"]
So that was excellent, and I think that fleshing out library functions is the next round of work that'll be worth my time. There's a treasure trove of tools that are currently C functions, but not exposed in the in-language standard library. Until I do that, you're pretty limited in what the REPL can do, especially with limitations like "oof, I don't actually have parsing of lists implemented quite yet." The standard library is the easiest way to work around parser limitations, and then it'll be easier to poke around and exercise the parser as it catches up.
Single-file header experiment
Now for the sad news. Ever since migrating to parallel compilation for the test suite, the tradeoff has been "well each of those parallel builds needs to parse the headers all over again." Given this, header parsing is abruptly on the radar as a juicy thing to optimize to keep build times fast and feedback loops hot.
With that in mind, I started wondering how much faster things could go if all the headers were in a single file. This was actually a little obnoxious to do: a naive sequencing of the files is easy to get, but it needed plenty of manual tweaks, and gcc -E also expands stdlib headers, so I couldn't just use that for a quick-and-dirty experiment (I tried). At the end of all this, the speedup was maybe 10ms, but too small to confirm over the noise.
The lesson is that there's no juicy target here after all. The header reading and parsing is probably not a bottleneck at all. Really, it's just that there's a base cost to spinning up a modern compiler process, and reading multiple header files from the OS disk cache is just a drop in that bucket. If there's anything to be optimized here, it's the contents of the headers, not whether they're spread out into a very trivial fetch quest.
That said, there are advantages to having a single header file, in terms of enabling the optimizations that might actually matter. It's easy to see and modify the order in which the compiler slurps up source code. It's more apparent where I'm defining functions that should probably just be declarations (there's an engineering tradeoff here around inlining for runtime perf, but I probably overcorrected towards inlining). So if I want to make these headers faster, a single-header structure may not do much on its own, but it might make the real fixes more human-friendly to pull off.
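To make the definitions-vs-declarations point concrete, here's a minimal sketch of the two header styles. The function name dv_len is made up for illustration; this is not actual Prone code.

```c
/* prone.h (sketch): declaration only. Each translation unit that
 * includes the header parses one line instead of a whole body. */
int dv_len(const int *vec, int n);

/* prone.c (sketch): the single definition, compiled exactly once.
 * A real implementation would inspect the vector; this placeholder
 * just returns the length it was handed. */
int dv_len(const int *vec, int n) {
    (void)vec;
    return n;
}
```

With a definition in the header instead, every including translation unit re-parses and re-compiles the body; the declaration-only style trades away easy inlining for cheaper includes, which is the tradeoff mentioned above.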
So I don't think it was a waste of time to do an experiment that technically disproved my hypothesis. I learned some valuable stuff. It just probably won't be valuable immediately, as the low-hanging fruit is elsewhere for now.
2025-09-20
Actually, to put a fine point on the editing benefits of a single-file header, I realized: "for my really trivial benchmark where I include the headers but don't use what's in them, I could just go through and try ripping out all the function implementations."
And yeah. Holy shit, I did get serious speedups. Here's how it breaks down, using 5 compilations in immediate succession to stabilize the timing. These all compile the same file: a trivial main function that includes (but doesn't use) one of the variants of the Prone headers.
- Existing header fractal of files: 97ms.
- Single file, some functions defined: 94ms.
- Single file, declarations only: 46ms.
That's insane. That's insane. On a hot disk cache (normal for my workflow), consolidating the source files got me 3ms, but ripping out function defs cut the compilation time to less than half. This is why we experiment, people! We learn things!
So yes, I'll be pursuing that in a more meaningful, non-quick-and-dirty manner. The experimental header probably wouldn't work as-is, and it's likely I could get further speedups by splitting it in two (one for the public interface, and one for the internal helpers that are only needed when compiling libprone.so itself).
As for retaining runtime performance, that doesn't matter so much yet, but I'd probably be best served by either using LTO or putting some optional definitions behind an ifdef (maybe in a third header file that's simply not included unless you ask for it). For the test suite, runtime performance is extremely not the bottleneck, and we're better served by faster compiles than anything else.
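The opt-in ifdef idea could look something like this sketch. PRONE_FAST_INLINES and vv_count are invented names for illustration, not real Prone identifiers.

```c
/* A build that wants runtime perf defines this (e.g. via -D);
 * fast-compile builds like the test suite leave it undefined. */
#define PRONE_FAST_INLINES

#ifdef PRONE_FAST_INLINES
/* Hot builds get the body inline, trading parse time for perf.
 * The body here is a trivial placeholder. */
static inline int vv_count(const int *vec, int n) {
    (void)vec;
    return n;
}
#else
/* Fast-compile builds only see a declaration; the one definition
 * lives in libprone itself. */
int vv_count(const int *vec, int n);
#endif
```

The callers look identical either way; only the build flag decides whether the compiler chews through the bodies.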
Final fun note: I kept timing compilation runs as I chopped things down, out of curiosity, to see if I was getting anywhere. One of the big drops happened when I got some static tables out of the header. They probably shouldn't have been there in the first place, but they were important supporting infrastructure for the type dispatch in functions like dv_release. So the tables were in the headers because those functions were defined in the headers and needed the tables. Until I measured, I had no idea how much this "cool technique" was pessimizing my build times. Live and learn, I guess.
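For the static-table problem specifically, the standard C fix is to declare the table extern in the header and define it once in a .c file. A sketch follows; ReleaseFn, dispatch_table, and release_noop are placeholder names (dv_release is real, but its actual dispatch machinery isn't shown here).

```c
/* header (sketch): declaration only. A "static const Table[] = {...}"
 * here would instead be parsed and emitted again in every translation
 * unit that includes the header. */
typedef void (*ReleaseFn)(void *value);
#define TYPE_COUNT 4
extern const ReleaseFn dispatch_table[TYPE_COUNT];

/* one .c file (sketch): the single definition of the table.
 * release_noop is a stand-in for real per-type release functions. */
static void release_noop(void *value) { (void)value; }
const ReleaseFn dispatch_table[TYPE_COUNT] = {
    release_noop, release_noop, release_noop, release_noop,
};
```

Functions that dispatch through the table can then live in a .c file too, and the header goes back to being cheap to include.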

