mal.haza.website

Application development

Over the last few weeks Holster has started to feel capable enough that I can now focus on some application development. Previously, trying to work on an app that used Holster meant constantly jumping back into the Holster code to fix edge cases or find performance improvements.

But recently I've been able to focus on improving RSStream without having to context switch too much. I've just made a small Holster 2.0.1 bug-fix release, but besides that it's feeling solid.

The real improvement I've found is not in any Holster change, but in the data model I'm using. A feed reader is possibly the worst application to start testing Holster with: there's too much data arriving all the time, and it's always new! What I realised is that I was treating everything as graph data, from feeds all the way down to the properties on an item. But unless you're using the ability to query data on the graph, creating references for every piece of graph data is a massive performance hit. So at some point it's good to stop and decide what can just be stored as strings.

When I was looking into this, Claude reported that stringifying items would reduce reference lookups from O(N) to O(1)... I didn't believe this, so I cleared the session and started again, but got the same result! So I looked further and realised that as well as saving all the property references on an item, the item itself no longer needed a reference created for it, as it became a simple property on its parent node. This is a big deal for a feed reader, since the N here is the number of items per day per feed, so with this simple change the app feels usable now.
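To make the difference concrete, here's a minimal sketch in plain JavaScript. It assumes nothing about Holster's actual storage format — the key names and object shapes are invented purely for illustration:

```javascript
// An incoming feed item with a few properties.
const item = {title: "Post", link: "https://example.com/post", published: 1700000000}

// Graph-style (sketch): each property is its own keyed reference, so reading
// or writing an item touches one reference per property, and the set of
// references grows with every item that arrives on the feed.
const graphRefs = {
  "feed/items/1/title": item.title,
  "feed/items/1/link": item.link,
  "feed/items/1/published": item.published,
}

// String-style (sketch): the whole item is one opaque property on its parent
// node, so it costs a single lookup plus a parse, however many properties it has.
const feedNode = {"feed/items/1": JSON.stringify(item)}
const restored = JSON.parse(feedNode["feed/items/1"])
```

The trade-off is exactly the one described above: the string version gives up the ability to query individual item properties on the graph, in exchange for constant-time access.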

Holster 2.0

A 2.0 release sounds like it should be a big update, and that has actually turned out to be true here, but more by coincidence. Holster uses semver for version numbers, so this 2.0 update is really just for a small breaking API change: a server now needs to be started with either an explicit port number or a websocket server in its config, as there's no longer a default port to fall back on.
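The new contract can be sketched like this — the function and option names ("port", "server") are hypothetical stand-ins, not Holster's exact API, so check the docs for the real config shape:

```javascript
// Sketch of the 2.0 startup rule: a server config must supply either an
// explicit port or an existing websocket server; there is no default port.
function resolveServerConfig(config = {}) {
  if (config.port == null && config.server == null) {
    throw new Error("Holster 2.0: provide an explicit port or a websocket server")
  }
  return config
}
```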

This was required because some browser testing frameworks were breaking on the const isNode = typeof document === "undefined" check, as they also don't define document. The updated check, const isNode = typeof process !== "undefined" && process.versions?.node != null, is a better test for server environments, but is again also true for testing libraries like jsdom. So the fix isn't to rely on isNode in the tests, but to explicitly require a port when running a server. The real benefit of the isNode change is that Web Workers should now be supported, which is enough of a reason to make it. But the reason this has become a big update is that I also made some radisk changes, and then didn't want to publish them until I was confident they were stable. This meant I just kept adding other patches while watching how the radisk change was performing.
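For reference, here are the two environment checks mentioned above side by side, with comments on where each one misfires:

```javascript
// Old check: any environment without a global document was treated as Node.
// That misfired in browser test frameworks that don't define document, and a
// Web Worker has no document either, so it was wrongly classed as a server.
const isNodeOld = typeof document === "undefined"

// New check: look for Node's own process.versions.node field instead. A Web
// Worker now correctly reports false; jsdom-style test environments (which
// run under Node) still report true, hence also requiring an explicit port.
const isNode = typeof process !== "undefined" && process.versions?.node != null
```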

One other API change is that there's no longer any need for a wait parameter on requests, so there's no reason for an options object at all now. It was previously helpful if you wanted to wait longer for certain queries, but this update handles timeout issues internally, so there's no longer a reason to specify it. Requests are either fast, if you've requested a key previously, or given much more time to check the network if the key hasn't been seen before.
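The internal idea can be sketched like this — the function name and both timeout values are placeholders of my own, not Holster's real numbers:

```javascript
// Sketch of why callers no longer pass a wait option: the library can pick
// the timeout itself based on whether the key has been seen before.
function requestTimeout(seenKeys, key, fast = 100, slow = 9000) {
  // A previously-seen key should resolve quickly from local state; an unseen
  // key is given much longer so the network can be checked.
  return seenKeys.has(key) ? fast : slow
}
```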

My last update mentioned using private browsing mode for testing, but even better than that is going offline and seeing how Holster handles fetching data. The honest answer was: not very well. Requests would fall through to the network and get stuck in a queue that couldn't be processed. But this also made it quite easy to find what needed improving. When a query is made via the API, the first place to check is in memory, i.e. "the graph". There were a few places where it wasn't capturing everything available, so requests would fall through to disk.
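The lookup order described above can be sketched as a tiered read, with illustrative names rather than Holster's internals:

```javascript
// Sketch: check the in-memory graph first, then disk, and only fall through
// to the network when neither has the key.
async function fetchKey(key, graph, readDisk, readNetwork) {
  if (key in graph) return graph[key]   // in-memory "graph" hit, no I/O at all
  const fromDisk = await readDisk(key)
  if (fromDisk !== undefined) {
    graph[key] = fromDisk               // cache so later reads stay in memory
    return fromDisk
  }
  return readNetwork(key)               // last resort: the network
}
```

The bugs described above amount to the first step missing data it already had, which pushed reads down into the slower tiers.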

Next there were the radisk improvements I mentioned. The change was quite simple: radisk shouldn't ever really time out unless there's an actual problem reading from disk. So timeouts have been mostly removed, file lookup has got faster, and file splitting has improved too. Previously a file would be split into one of the configured size plus a smaller file for the overflow; now it aims for half each, which is what GunDB tries to do too.
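The two splitting strategies can be sketched like this — illustrative only, not radisk's actual code:

```javascript
// Old approach: cut a full file off the front and leave the overflow in a
// small second file, which fills up again almost immediately.
function splitOverflow(keys, limit) {
  return [keys.slice(0, limit), keys.slice(limit)]
}

// New approach: split the keys roughly in half, as GunDB does, so both
// resulting files have room to grow before the next split.
function splitHalf(keys) {
  const mid = Math.ceil(keys.length / 2)
  return [keys.slice(0, mid), keys.slice(mid)]
}
```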

So now, if the browser has previously synchronised all available data, no network requests are required. Lastly, if something does fall through to a network request while offline, there's much better recovery there too.