==================================================================<
** Original area : "/grc/techtalk"
** Original message from :
0x6c976bb3@lookup.openpgp.key.invalid (Andrew Skretvedt)
** Original message to :
** Original date/time : 25 May 26, 02:43
>==================================================================<
Last month, there was a nice thread started by Peabody asking a question
about disk metadata.
Message-ID: <10ru7ci$k8m$1@GRC>
https://grc.com/groups/techtalk:300610
Tonight, I just ran across a pretty awesome looking space-usage analysis
tool for folks running one of the more advanced filesystems with
features like Copy-on-Write reflinks, snapshots, embedded data
compression, and the like.
These advanced features tend to make the actual storage required much
less than the apparent size of a file set, which in turn makes
accounting for space usage on such a filesystem harder and more
abstract.
The tool is 'btdu', for Linux. It sports a TUI for terminal awesomeness,
and it also has a plain CLI mode.
https://github.com/CyberShadow/btdu
Its most distinctive property is that it is a /sampling/ disk usage
analyzer. Rather than starting at the root directory or some other
directory and recursively enumerating all the objects therein to
generate space usage stats, it treats the filesystem rather like a dart
board and starts throwing darts into the on-disk address space. As they
explain in their readme, for each dart location, they ask the filesystem
what file(s) point to the data there; then they collect metadata about
those files.
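To make the dart-board idea concrete, here is a minimal toy sketch in
Python. It is not how btdu itself is implemented: the "disk" is just a
made-up list of extents laid end to end, and every name and size in it
is hypothetical. It only shows how random offsets plus an "who owns this
block?" lookup turn into a usage estimate.

```python
import bisect
import random
from collections import Counter

# Toy "disk": (owner, length in blocks), laid end to end.
# All names and sizes are invented for illustration.
EXTENTS = [
    ("videos/big.mkv", 600_000),
    ("vm/disk.img", 300_000),
    ("home/docs", 90_000),
    ("etc", 10_000),
]

def build_index(extents):
    """Return (end_offsets, owners) so an offset can be mapped to its owner."""
    ends, owners, pos = [], [], 0
    for owner, length in extents:
        pos += length
        ends.append(pos)
        owners.append(owner)
    return ends, owners

def sample_usage(extents, n_samples, seed=0):
    """Estimate each owner's size in blocks from n_samples random offsets."""
    ends, owners = build_index(extents)
    total = ends[-1]
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(n_samples):
        offset = rng.randrange(total)          # throw a dart
        i = bisect.bisect_right(ends, offset)  # which extent holds that block?
        hits[owners[i]] += 1
    # Each hit stands for total / n_samples blocks.
    return {owner: hits[owner] * total / n_samples for owner in owners}

if __name__ == "__main__":
    for owner, est in sample_usage(EXTENTS, 1000).items():
        print(f"{est:>10.0f}  {owner}")
```

Note that the estimates always add up to the full disk size, because
every dart lands on exactly one owner in this toy model. A real
filesystem with reflinks and snapshots can have several files pointing
at the same block, which is the part this sketch leaves out.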
Like sampling a population, sampling a filesystem in this way builds an
estimate of the usage state of the disk, which improves in its "margin
of error" rapidly as the number of samples builds. They say that as
few as 100 samples are enough for a resolution of ~1%. So you can get
an idea of what's hogging all your space /very/ quickly.
(If you think about it, if you have space-hogging files, one of those
initial 100 darts is likely to land on a block connected with one of
them, since their "target" size is bigger; their data is more likely to
be the earliest accounted for in a sampling run.)
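A rough way to put numbers on that intuition, assuming each dart is an
independent, uniformly random pick over the disk (my simplification, not
a claim about btdu's internals). The two helper functions are
hypothetical names of my own: one gives the chance that a region gets
hit at least once, the other the textbook binomial standard error of
its estimated share.

```python
import math

def hit_probability(fraction, n_samples):
    """Chance that at least one of n_samples uniform darts lands in a
    region covering `fraction` of the disk."""
    return 1.0 - (1.0 - fraction) ** n_samples

def standard_error(fraction, n_samples):
    """Binomial standard error of the estimated share of the disk."""
    return math.sqrt(fraction * (1.0 - fraction) / n_samples)

if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, hit_probability(0.05, n), standard_error(0.5, n))
```

Under this model, a file set holding 5% of the disk is hit at least once
in 100 darts with probability of about 0.994, so the big hogs show up
almost immediately. The uncertainty on a share near 50% is about 5
percentage points at 100 samples and 0.5 points at 10,000, since the
error shrinks with the square root of the sample count.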
Typically, the sampling proceeds until you stop it. So the estimate
starts inaccurate, improves to reasonable in a short time, and after
this gradually converges toward an /exact/ report once /every/ block of
the filesystem has been sampled.
If you run Linux and use BTRFS filesystems especially, then this is
worth a look.
(I haven't studied up enough yet to know whether it can also work with
ZFS, XFS, and other filesystems with similar advanced features, or
whether you could use it on BSD or macOS.)
My Mint 21.3 and 22.2 systems don't have this tool in their default
package manager; the tool seems relatively new (still showing version
0.x.y). So, I intend to set up the required build dependencies (the
tool is written in D) and build a copy from source.
I'll report back with some experiences once I have a binary to try out!
--
OpenPGP 0xC6901B2A6C976BB3 (https://keys.openpgp.org)
--- OpenXP 5.0.64
* Origin: (618:400/23.10)