Demystifying Disk Space Usage: A Deep Dive Into Blocks, Inodes And File System Overhead

Describing key concepts – blocks, inodes, file system overhead

File systems are responsible for organizing and managing file storage on disks. To understand disk space usage, we need to get familiar with a few key concepts: blocks, inodes and file system overhead.

Blocks are the smallest addressable storage units a file system works with; a typical block size is 4 KB. File contents are stored in data blocks. Inodes (index nodes) store per-file metadata such as permissions, size, timestamps and pointers to the data blocks that hold the contents.

Besides user data, the file system itself consumes disk space as overhead. This includes space reserved for inode tables, the journal, block allocation bitmaps and other bookkeeping structures. Overhead ranges from less than 1% on large disks to 12% or more on smaller ones.
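A quick way to see this in practice is to compare df's size, used and available columns; the gap between the size column and used plus available is metadata and reserved space. A minimal check (the root filesystem here is just an example):

```shell
# used + available is typically less than the size column: the difference
# is filesystem metadata plus the root-reserved blocks
df -h /
# the same view counted in inodes rather than bytes
df -i /
```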

Visualizing how files are stored on disk

When a file is created and written to, the file system allocates an inode and data blocks to store it. The inode stores metadata like ownership and permissions. It points to the file’s data blocks where the actual contents are stored.

As the file grows, additional data blocks are allocated and the inode is updated with pointers to them. When a file is deleted, its directory entry is removed; once the inode's link count drops to zero and no process holds the file open, the inode and its data blocks are marked free and reported usage drops. The old contents may physically remain on disk until new data overwrites those blocks, but they already count as free space. The notable exception is a deleted file still held open by a running process: its space is not reclaimed until the file is closed.
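One subtlety worth demonstrating: a deleted file's space is only reclaimed once no process holds it open. A small sketch (Linux-specific; relies on /proc and GNU dd):

```shell
# deleting a file removes its directory entry; the inode and blocks are
# freed only when the link count is zero AND no descriptor is open
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1M count=8 status=none
exec 3<"$tmp"          # hold the file open on descriptor 3
rm "$tmp"              # the name is gone, but 8 MB stays allocated
ls -l "/proc/$$/fd/3"  # the link target now shows "(deleted)"
exec 3<&-              # closing the descriptor releases the blocks
```

This is why restarting a service that still holds a giant deleted log is sometimes the only way to get the space back.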

Introducing common Linux file systems (EXT4, XFS, Btrfs)

There are many file system options on Linux. Some popular ones include:

  • EXT4: Default filesystem on many Linux distros; a mature, reliable choice.
  • XFS: High performance for large files and filesystems.
  • Btrfs: Modern copy-on-write filesystem with snapshots, checksums and compression.

The choice depends on priorities like speed versus stability. Each file system has different storage overhead and space usage behaviors.

Tracking Down Where Your Disk Space Goes

Listing directory sizes with du

The du (disk usage) command reports disk space occupied by files and directories. To find large space consumers, run:

# du -sh /*

This summarizes each top-level directory in human-readable units. It must scan the entire tree beneath each directory, so it can be slow on large disks.
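To make the output easier to act on, a common variant limits the depth and sorts the results (GNU du and sort assumed; /var is just an illustrative target):

```shell
# per-directory totals one level deep, largest first; -x stays on one filesystem
du -xh --max-depth=1 /var 2>/dev/null | sort -rh | head
```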

Finding large files with find and ls

To locate individual large files try:

# find / -type f -size +100M

This finds regular files sized over 100 MB. The ls command can also sort files by size:

# ls -lSh | head

This sorts the current directory's files by size, largest first, and shows the first ten lines of output in human-readable format. Note that ls does not recurse into subdirectories.
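For a recursive, exact-byte listing across a whole subtree, GNU find's -printf pairs well with sort (/var/log is an illustrative target):

```shell
# largest regular files under /var/log, exact sizes in bytes
find /var/log -type f -printf '%s\t%p\n' 2>/dev/null | sort -nr | head
```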

Using ncdu for interactive analysis

The ncdu tool provides an interactive ncurses-based disk usage browser. After a single scan it lets you drill down through subdirectories and sort by size to identify space hogs, which is far more convenient than re-running du with different arguments.

Behind the Scenes: Inodes and Links

Explaining how inodes work

Inodes store per-file metadata. Each inode is identified by a number within its filesystem and tracks attributes like permissions, ownership, timestamps and data block pointers. On traditional filesystems each file's data blocks belong to exactly one inode; only copy-on-write filesystems with reflink support (such as Btrfs and XFS) let distinct inodes share data blocks. Because the inode table is usually sized when the filesystem is created, a filesystem can run out of inodes, and refuse to create new files, while plenty of data blocks remain free.
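The metadata an inode holds is easy to inspect with GNU stat, and df -i shows how many inodes a filesystem has left:

```shell
f=$(mktemp)
stat -c 'inode=%i links=%h bytes=%s' "$f"  # per-file metadata from the inode
df -i "$f"                                 # inode capacity and usage
rm "$f"
```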

Hard links and soft links

Multiple hard links can reference a single inode, making one file's contents accessible via multiple paths. Deleting one path does not remove the file while other hard links remain. Soft (symbolic) links are a special file type containing a path that redirects to the target file.

Impact on disk usage

Hard-linked files share one set of data blocks, so extra links consume no additional space. A soft link stores only the target path. Deleting the original file leaves soft links dangling, and the space used by the original is recovered; hard links, by contrast, keep the data alive.
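These behaviors are straightforward to verify in a scratch directory (GNU stat output formats assumed):

```shell
cd "$(mktemp -d)"
echo data > original
ln original hardlink      # second directory entry for the same inode
ln -s original symlink    # small separate file storing only the path
stat -c '%n inode=%i links=%h' original hardlink  # same inode, links=2
rm original
cat hardlink              # still prints "data": blocks survive via the link
cat symlink 2>/dev/null || echo 'symlink is dangling'
```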

Digging Into Disk Layouts

The superblock and metadata

The superblock holds critical filesystem-wide metadata: the block size, block and inode counts, and the locations of key on-disk structures. Those structures in turn organize and track free space, the inode tables, the journal and more.

Reserving space for root

Most Linux filesystems reserve space for the root user so essential system maintenance can continue even when the disk appears completely full to other users. On ext4 the default reservation is 5% of the filesystem.
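On ext4 the reservation can be inspected and tuned with tune2fs. The device name below is a placeholder and both commands need root; on large data-only volumes, shrinking the reservation can free several gigabytes:

```shell
# show the current reserved-block count (device name is illustrative)
tune2fs -l /dev/sdX1 | grep -i 'reserved block count'
# shrink the root reservation from the default 5% to 1%
tune2fs -m 1 /dev/sdX1
```

Keep the default on the root filesystem itself, where the reservation exists to let root log in and clean up.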

Understanding partition alignment

Partition start offsets should align with the erase block size of the underlying storage media. Misalignment causes additional write amplification on SSDs. Modern partitioning tools default to 1 MiB alignment, which also satisfies the 4 KiB page alignment SSDs need for best performance and wear levelling.
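Alignment is easy to check from sysfs: a partition's start sector (in 512-byte units) should be divisible by 2048, i.e. fall on a 1 MiB boundary. A sketch (the sda1 path is illustrative, with a fallback value so the snippet runs anywhere):

```shell
# read the partition's start sector; fall back to 2048 if the path is absent
start=$(cat /sys/class/block/sda1/start 2>/dev/null || echo 2048)
if [ $((start % 2048)) -eq 0 ]; then
    echo "sector $start: 1 MiB aligned"
else
    echo "sector $start: misaligned"
fi
```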

Getting More Granular With Iostat and XFS

Using iostat to monitor block device IO

The iostat tool reports per-device throughput, utilization and latency statistics. While it measures IO rather than capacity, it helps identify unusually heavy writers, such as runaway logging or journaling, that can fill a disk quickly. Trend analysis over repeated samples can also reveal IO congestion issues.
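A typical invocation samples extended statistics a few times (iostat comes from the sysstat package, which may need installing first):

```shell
# extended per-device stats, three one-second samples; -z hides idle devices
iostat -xz 1 3
```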

XFS specific tools (xfs_info, xfs_db)

The XFS filesystem includes advanced administration utilities like xfs_info and xfs_db for detailed filesystem inspection. They enable examining precise allocation and utilization data to diagnose space issues.
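Typical invocations, with a placeholder mount point and device (xfs_db should only be pointed at a device read-only, as shown):

```shell
xfs_info /data                            # geometry: block size, AG count, inode size
sudo xfs_db -r -c 'freesp -s' /dev/sdX1   # read-only free-space histogram
```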

Best Practices for Managing Disk Space

Setting logfile rotation policies

Log files can accumulate substantial space over time. Configuring logrotate ensures older logs are compressed and archived after reaching a threshold. This avoids filling capacity due to outdated logs.
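A minimal policy for a hypothetical application, dropped into /etc/logrotate.d/myapp (the path and log names are illustrative):

```
# rotate weekly, keep four compressed generations, tolerate missing/empty logs
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
}
```

delaycompress leaves the most recent rotated log uncompressed, which avoids corrupting it if the application keeps writing briefly after rotation.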

Enabling TRIM for SSDs

TRIM frees unused blocks on SSDs. Filesystems periodically inform drives which blocks are unused and can be erased. This maintains performance and minimizes write amplification as free space is reused.
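On systemd distributions the usual setup is the periodic fstrim timer rather than the continuous discard mount option (the systemctl and fstrim commands need root):

```shell
lsblk --discard                            # non-zero DISC-GRAN means TRIM support
sudo systemctl enable --now fstrim.timer   # weekly trim of mounted filesystems
sudo fstrim -av                            # or trim everything once, right now
```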

Keeping old kernels in check

Unused older kernels pile up after updates and can consume substantial space, eventually filling /boot. It is safe to remove them as long as you keep the currently running kernel plus at least one known-good fallback.
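How to prune them depends on the distribution; two common approaches (options as used by current apt and dnf):

```shell
# Debian/Ubuntu: removes kernels the package manager considers obsolete
sudo apt autoremove --purge
# Fedora/RHEL: keep only the two newest installonly (kernel) packages
sudo dnf remove --oldinstallonly --setopt installonly_limit=2 kernel
```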
