Introducing Sia Slice, My Absurdly Cheap Block Storage Solution

tech · Oct 28, 2019

Sia Slice in action. (On a remote system, with tmux.)

I dabble in cryptocurrencies, occasionally. I hesitate to get too partisan on a subject the Internet takes very seriously, but it seems to me that the fairest judge of a coin’s value is the utility it provides to its holders. So Bitcoin is useful because everyone recognizes and accepts Bitcoin, Monero is useful because it facilitates anonymous transactions, Ethereum has that smart contracts thing going for it, and so on and so forth.

I’m pleased to endorse Sia as another rising star in the cryptocurrency world. It’s a blockchain-backed decentralized storage network that connects renters with hosts, who sell spare hard drive capacity on a globe-spanning swarm of machines that range from Raspberry Pis to university datacenters. Redundancy and encryption for your files, of course, come standard. Remember “Pied Piper,” the fictional peer-to-peer storage network from Mike Judge’s Silicon Valley? Well, Sia is exactly that, except it’s a real system you can store real data on.

Still, it is clearly a work in progress. The official Sia client handles uploads and downloads nicely, but it lacks support for automatic synchronization, making for an experience that feels especially manual compared to, say, OneDrive or Google Drive. There are a number of promising projects (currently in beta) poised to change this: Repertory mounts Sia storage as a local filesystem, while Siasync keeps files synchronized with Sia. But for maximum flexibility, system administrators and enterprise customers want block-level, not file-level, access. Very large files, like database dumps, are poorly suited for Sia, which currently does not support partial file updates, so changing even a few bytes means re-uploading the entire file. And any file-level synchronization solution would fail to preserve inodes, permissions, and other extended metadata, stripping specialized filesystems like ZFS and Btrfs (both copy-on-write, with support for instant snapshots and data deduplication) of the very features that make them useful.

The solution is to treat your data not as a collection of files, but as one big, mutable array of bytes. Enter Sia Slice: a small Python program that splits any large file into 100-megabyte chunks for uploading to Sia. Because Sia Slice operates at the block level, it can make 1:1 copies of block devices, disk images, database backups, and other large blobs of data. And on subsequent syncs, it performs partial updates by re-uploading only the chunks that have changed.
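The change-detection idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Sia Slice’s own code: the chunk size reads “100-megabyte” as 100 MiB, SHA-256 is an arbitrary choice of hash, and the helper names `iter_chunk_hashes` and `changed_chunks` are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple

# Assumption: "100-megabyte" interpreted as 100 MiB.
CHUNK_SIZE = 100 * 1024 * 1024


def iter_chunk_hashes(stream: BinaryIO,
                      chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, str]]:
    """Yield (chunk index, SHA-256 hex digest) for each fixed-size chunk.

    Assumes a buffered binary stream (e.g. open(path, "rb")), where
    read(n) returns a full chunk except at end of file.
    """
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file or device
        yield index, hashlib.sha256(chunk).hexdigest()
        index += 1


def changed_chunks(stream: BinaryIO, previous: Dict[int, str],
                   chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return indices of chunks that are new or whose digest differs from `previous`.

    Chunks past the new end of a source that has shrunk are not reported here.
    """
    return [i for i, digest in iter_chunk_hashes(stream, chunk_size)
            if previous.get(i) != digest]


if __name__ == "__main__":
    # Tiny 4-byte chunks so the effect is easy to see.
    old = io.BytesIO(b"aaaabbbbcccc")
    manifest = dict(iter_chunk_hashes(old, chunk_size=4))
    new = io.BytesIO(b"aaaaXbbbcccc")
    print(changed_chunks(new, manifest, chunk_size=4))  # [1]
```

Only chunk 1 would be re-uploaded in that example. Where the digests from the previous sync are kept between runs is left open in this sketch.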

You can obtain a copy, peruse the source code, and find usage instructions on the project homepage.

Notes on Sia

Sia is an emerging platform, so it’s not just the user interface that is rough around the edges; there is also room for improvement, in my humble opinion, in the developer API. I know the Sia developers are reading this, so I promise to be gentle. 🙂

The blockchain, which as of this writing is about 17 GB, takes several days to sync on a mechanical hard drive, even when bootstrapped. Yikes! A solid-state drive brings that time down to hours. There ought to be a polite but very visible warning on the Sia download page.
The API documentation is occasionally outdated or incorrect. For example, the call to validate a SiaPath is listed as /renter/validate/*, while the correct call, /renter/validatesiapath/*, is shown right there in the example!
JSON timestamps for access times, modification times, etc. are not documented accurately. In the examples, the timestamps are prefixed with an unexplained series of digits and lack quotation marks, implying they are something other than ordinary JSON strings.
Bindings for languages other than Go are scarce. For my language of choice, Python, both pysia and siapy are more than two years out of date and missing calls. For Sia Slice, I had to write my own bindings, something I would have had to do anyway to take advantage of Python’s asyncio features.
Sia is said to be most efficient with many simultaneous uploads, but in my own experience (perhaps it’s my pokey 10 Mbps residential connection), the daemon generally limits itself to one upload at a time. When uploading data with /renter/uploadstream, it is best to issue one POST request at a time; otherwise, the uploads may starve each other of computation time.
Presumably due to host or network availability hiccups, Sia occasionally fails to complete an upload, leaving it stalled with a redundancy below 1x, meaning the hosts do not hold enough data to rebuild the file. If your uploads are not disk-backed (perhaps because you used /renter/uploadstream), Sia has no local copy to repair from, considers the file lost, and will not repair it on its own; you need to invoke /renter/delete and start over. This was a major pitfall that left me scratching my head for several days while the stalled uploads kept piling up. Sia Slice’s solution is to restart uploads that have not completed within 3 hours.
Nitpicking here: /renter/uploadstream has no specific mechanism for handling POST requests that fail or are aborted by a disconnect, a program crash, and so on. So when Sia Slice uploads data to Sia, it appends a .part extension, just like your web browser or download manager does, to distinguish successful uploads from failed partial ones.
Some of the file attributes are difficult to understand. What is the difference between available and recoverable? Why do most of my uploads get stuck right out of the gate?
Deleting directories with /renter/dir is not reliable. After several unit tests failed to clean up after themselves, I simply elected to delete individual files instead.
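The bookkeeping behind two of the workarounds above, the .part suffix and the three-hour restart, can be sketched in plain Python. This is a minimal illustration, not Sia Slice’s actual code: `UploadRecord`, `plan_action`, and the `backup/chunk_N` naming are hypothetical, the `complete` flag stands in for whatever completion status the daemon reports, and the real /renter/delete and upload calls are omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import AbstractSet

STALL_TIMEOUT = timedelta(hours=3)  # restart threshold described in the post
PART_SUFFIX = ".part"


@dataclass
class UploadRecord:
    """Hypothetical local view of one remote chunk; not a Sia API type."""
    siapath: str
    started: datetime
    complete: bool  # stand-in for whatever "finished" status the daemon reports


def part_path(siapath: str) -> str:
    """Name to stream under while the POST request is still in flight."""
    return siapath + PART_SUFFIX


def final_path(siapath: str) -> str:
    """Name to rename to once the upload has succeeded."""
    if siapath.endswith(PART_SUFFIX):
        return siapath[:-len(PART_SUFFIX)]
    return siapath


def plan_action(rec: UploadRecord, now: datetime,
                in_flight: AbstractSet[str]) -> str:
    """Return "ok", "wait", or "restart" (delete, then upload again)."""
    if rec.siapath.endswith(PART_SUFFIX) and rec.siapath not in in_flight:
        # A leftover .part file: its POST was cut off by a disconnect or crash.
        return "restart"
    if rec.complete:
        return "ok"
    if now - rec.started >= STALL_TIMEOUT:
        # Stream uploads are not repaired automatically, so start over.
        return "restart"
    return "wait"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    stalled = UploadRecord("backup/chunk_7", now - timedelta(hours=4), False)
    print(plan_action(stalled, now, in_flight=set()))  # restart
```

Checking the .part rule before the timeout means an interrupted POST is retried immediately instead of waiting out the full three hours.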

Lastly, I would suggest a new feature that may prove indispensable to end users: bandwidth throttling. Most home networks are afflicted by an insidious phenomenon called bufferbloat: when the relatively slow upload link is saturated, packets from other connections queue up behind the bulk traffic in oversized buffers and are badly delayed or dropped. Ping times spike. Web browsing becomes sluggish. Skype and Discord calls become pixelated, then cut out entirely. In general, bufferbloat makes the whole Internet feel unreliable and unusable. (Downloads suffer too, because their TCP ACK replies must travel over the same congested upload link.)

It’s not a concern on my own network, thanks to my high-tech home router with traffic-shaping capabilities; but my neighbor, who is very generously allowing me to “borrow” his faster connection for my oversize initial upload, is not so fortunate. For lack of a bandwidth control in Sia, I had to use an OS-level tool like wondershaper to avoid crippling his digital lifestyle.
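An in-client throttle need not be elaborate. One common approach is a token bucket, sketched below as a hypothetical `TokenBucket` class that is not part of Sia or Sia Slice: tokens accrue at `rate` bytes per second up to a cap of `burst`, and the sender pauses whenever a write would overdraw the bucket. To keep the modem’s buffers from filling, `rate` would be set somewhat below the link’s real upload capacity.

```python
import time
from typing import Callable, Iterable


class TokenBucket:
    """Hypothetical token-bucket limiter; not a Sia or Sia Slice API.

    rate  -- average bytes per second allowed
    burst -- maximum bytes that can go out at once after an idle period
    """

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, capped at burst.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, nbytes: int) -> float:
        """Charge nbytes to the bucket; return seconds to wait before sending."""
        self._refill()
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # The deficit is repaid at `rate` bytes per second.
        return -self.tokens / self.rate


def throttled_send(chunks: Iterable[bytes], bucket: TokenBucket,
                   send: Callable[[bytes], None],
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Send each chunk, pausing as needed to stay under the bucket's rate."""
    for chunk in chunks:
        sleep(bucket.delay_for(len(chunk)))
        send(chunk)
```

For example, `TokenBucket(rate=500_000, burst=64_000)` would hold traffic to roughly 500 KB/s on average while allowing short 64 KB bursts (illustrative numbers only).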

However you slice the problem…

Some challenges and gotchas aside, building on top of Sia is totally viable; I found the API cleanly designed and intuitive to use. Sia Slice is now “in production” as part of my backup workflow: I use btrbk to save semi-automated snapshots of all my desktops and servers to an external hard drive, and then I mirror that hard drive to the Sia cloud with Sia Slice, which gives me a 3-2-1 backup strategy (three copies of my data, on two different media, with one copy offsite). For me, this was a fun project that touched a wide array of systems, from low-level disk access to asynchronous I/O to curses to HTTP streaming.

While the general “split, hash, compress, and upload” principle behind Sia Slice could be extended to other kinds of object storage, the simple fact of the matter is that no service on the horizon is nearly as affordable (or decentralized!) as the Sia network. Dropbox and Wasabi both quote US $12/month for the 2 TB of storage I use. On Sia, I pay approximately US $1/month.
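Taking the approximate $1/month Sia bill at face value, the savings relative to the $12/month quotes work out as follows:

```latex
\text{savings} = \frac{\$12 - \$1}{\$12} = \frac{11}{12} \approx 0.917 \approx 92\%
```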

Once again, what is the true purpose of cryptocurrency? To provide useful services that no other medium can, something we all tend to forget as we keep our eyeballs glued to the red and green numerals on cryptocurrency exchanges. Block storage at a whopping 92% below market price is certainly very useful to me. Perhaps Sia Slice will make Sia useful to you.