I haven't seen any problems with writing data to disk, but I noticed that Holster is creating only single-character file names. Looking into this, I found that it's due to the default file size being 1 megabyte: by the time the initial file reaches that size, it has generally already built a radix tree with a single character at the top. That doesn't happen in the unit tests because there's a lot less data, so the radix trees don't split out as much.
The radix tree matters because Radisk only splits files at the start of a node. It does this so it doesn't have to read multiple files to return a node, which would be quite complicated. And because all the radix trees start with a single character, Radisk will only find single-character file names to write to.
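To make the "single character at the top" effect concrete, here is a toy radix tree in Python. It is only a sketch of the general data structure, not Radisk's code: the `insert` function, the nested-dict layout and the sample keys are all made up for illustration. It shows that a small data set can keep a long shared prefix at the top level, while a more varied one quickly ends up with single-character labels there.

```python
import os


def insert(tree, key, value):
    """Insert key into a toy radix tree stored as nested dicts (hypothetical layout)."""
    for label in list(tree):
        shared = os.path.commonprefix([label, key])
        if not shared:
            continue  # this edge has nothing in common with the key
        if shared == label:
            child = tree[label]
            if shared == key:
                # Exact match: overwrite a leaf, or set the terminal marker "".
                if isinstance(child, dict):
                    child[""] = value
                else:
                    tree[label] = value
                return
            if not isinstance(child, dict):
                # Turn the leaf into a branch, keeping its value under "".
                child = {"": child}
                tree[label] = child
            insert(child, key[len(shared):], value)
            return
        # Partial match: split the edge at the shared prefix.
        rest = tree.pop(label)
        tree[shared] = {label[len(shared):]: rest, key[len(shared):]: value}
        return
    tree[key] = value  # no edge shares a prefix, so add a new one


# A small data set (like a unit test) keeps a longer label at the top.
small = {}
for name in ["alice", "alfred"]:
    insert(small, name, True)

# More varied data splits the top level down to single characters.
large = {}
for name in ["alice", "alfred", "amy", "bob", "ben"]:
    insert(large, name, True)

print(sorted(small))  # ['al']
print(sorted(large))  # ['a', 'b']
```

If files can only be split at those top-level labels, the second tree offers nothing but "a" and "b" as candidate names, which is the behaviour described above.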
This works fine, but I realised it limits the number of files Radisk will create. The names depend on the characters available for node ids, which are the alphanumeric characters plus ~ for user data and ! for the initial file. That means only 64 files in total (26 lowercase, 26 uppercase, 10 digits and the 2 special characters)! So after 64 megabytes of data it will just evenly distribute data amongst the existing files.
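The arithmetic behind that limit can be checked in a few lines of Python. The variable names are just for illustration, and the character set is taken from the description above rather than from the Holster source.

```python
import string

# Characters assumed to be available as file names, per the description above:
# alphanumerics, plus "~" for user data and "!" for the initial file.
NAME_CHARS = string.ascii_letters + string.digits + "~!"
FILE_SIZE_MB = 1  # the default file size mentioned earlier

max_files = len(NAME_CHARS)
capacity_mb = max_files * FILE_SIZE_MB

print(max_files, capacity_mb)  # 64 64
```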
I did find a small issue: because there are so few file names to choose between, the new file selected to write to could match the current file name, which would overwrite data. I haven't seen this happen, but I pushed a fix to avoid it in 1.0.7.
Going forward, there are a few ways to deal with this. The first is that "64 megabytes is enough for everyone" sounds ok, and if it's not enough the files will grow past the default size but still work. The second is that writing data requires listing files, so there is a cost to having more files anyway. That means keeping the current implementation and adding a process to remove "least recently used" data, which will keep things fast. This is my preference, and it would be interesting to see what moving data out to "offline" or "cold" storage looks like. Coincidentally, this came up in the gun chat today with some interest in making it work.
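As a rough idea of what "least recently used" eviction could look like, here is a minimal Python sketch. Nothing like this exists in Holster yet as far as this post is concerned: the `HotStore` class, its `max_items` limit and the in-memory `cold` dict are all hypothetical stand-ins, and a real version would presumably work on files and sizes rather than item counts.

```python
from collections import OrderedDict


class HotStore:
    """Hypothetical "hot" store that moves least recently used items to "cold" storage."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.hot = OrderedDict()  # most recently used keys sit at the end
        self.cold = {}            # stand-in for offline/cold storage

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        self.cold.pop(key, None)  # the hot copy is now the current one
        while len(self.hot) > self.max_items:
            # Evict from the front, which holds the least recently used key.
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # reading counts as use
            return self.hot[key]
        if key in self.cold:
            value = self.cold.pop(key)
            self.put(key, value)  # bring it back into hot storage
            return value
        return None


store = HotStore(max_items=2)
store.put("a", 1)
store.put("b", 2)
store.get("a")     # "a" is now more recently used than "b"
store.put("c", 3)  # over the limit, so "b" moves to cold storage

print(sorted(store.hot), store.cold)  # ['a', 'c'] {'b': 2}
```

Reading "b" afterwards would pull it back into hot storage and push out whichever key had gone unused the longest, so the hot set stays bounded without losing data.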
The third way of dealing with this is to look at the Radisk code again. Holster added the top-level node check, whereas GunDB reads from multiple files. There is probably a middle way that just needs some brave soul to investigate.
Still thinking about Radisk