Disk Usage 101: An Intro To Du, Df, And Tools For Analyzing Storage Utilization

Defining Key Terms: du, df, inode, storage utilization

When managing a computer system, monitoring and understanding disk space usage is critical. There are some key terms and concepts to understand when analyzing storage utilization:

  • du – Short for “disk usage”, this command provides statistics on disk space used by files and directories on your filesystem.
  • df – Short for “disk free”, this command reports on the overall disk space usage and availability on mounted filesystems.
  • inode – Every file and directory on a filesystem is described by an inode, a data structure that stores metadata such as the file's size, permissions, and the locations of the data blocks holding its contents.
  • storage utilization – The amount of disk space that is currently occupied with files and directories on mounted filesystems.

Why Monitoring Disk Usage is Important

Carefully monitoring disk usage is crucial for several reasons:

  • Prevent running out of free space – If a disk becomes 100% full, new data cannot be written. This can cause errors and prevent applications from operating properly.
  • Maintain performance – As disks fill up, performance often slows from increased fragmentation and extra time needed to find free blocks for writing data. Monitoring usage helps catch problems early.
  • Identify wasted space – Old log files, temporary data, and “forgotten” files can slowly eat up available space over time. Monitoring helps find these space hogs.
  • Plan storage upgrades – Tracking usage patterns helps administrators predict when additional storage capacity will be needed in the future.

Understanding exactly how disk space is used helps both to prevent immediate problems and to plan for future growth. Now let's dive into the tools and techniques for analyzing usage data…

Listing Folder Sizes with du

Syntax and Options for du

The du command is used to scan directories and measure how much space is used by the files and subdirectories they contain. The basic syntax is simple:

  du [options] [directory|file ...]

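By default du recursively scans every subdirectory beneath the specified path (or the current directory if none is given). It prints the usage of each directory and ends with the total for the path itself. GNU du reports sizes in 1 KiB blocks unless another unit is requested. Commonly used options include:

  • -h – Print sizes in human-readable form with 1024-based suffixes (e.g. 4.0K, 12M, 1.5G)
  • -s – Display only a single total for each argument instead of every subdirectory
  • -c – Print a grand total across all arguments at the end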
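To make the default behavior concrete, here is a minimal Python sketch of roughly what du -s computes for one path. It assumes a Unix-like system, because st_blocks (allocated space in 512-byte units) is not available on Windows. The function name disk_usage is our own, and real du handles many more options and edge cases.

```python
import os
import sys


def disk_usage(path: str) -> int:
    """Approximate `du -s path`, in bytes.

    Sums allocated space (st_blocks, 512-byte units) rather than file length,
    does not follow symlinks, and counts each hard-linked inode only once,
    as GNU du does by default.
    """
    seen = set()  # (device, inode) pairs already counted
    total = 0

    def account(full_path: str) -> None:
        nonlocal total
        try:
            st = os.lstat(full_path)  # lstat: measure a symlink itself, not its target
        except OSError as err:  # entry vanished or is unreadable
            print(err, file=sys.stderr)
            return
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            total += st.st_blocks * 512

    account(path)  # the starting directory occupies blocks too
    # os.walk does not descend into symlinked directories by default
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            account(os.path.join(root, name))
    return total
```

For example, print(disk_usage("/var/log") // 1024) should print a figure close to the one GNU du -s /var/log reports in 1 KiB blocks.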

Examples for Listing Folder Sizes

Some examples help illustrate how du can be used to measure disk usage:

  # Measure home folder disk usage
  du ~

  # Human readable output
  du -h /var/log 

  # Only the summary
  du -s /home/*

Additional examples:

  • du -h --max-depth=1 /home – List the top-level folders under /home with human-readable sizes
  • du -c / | sort -n – Sort the per-directory listing for the whole filesystem numerically by size to find large space users (-c appends a grand total line)
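The one-level listing is also easy to script. The sketch below shells out to du, so the numbers are du's own. It passes -k for 1 KiB units and -d 1, which both GNU and BSD du accept as a one-level depth limit, then sorts the children largest first. The function name largest_subdirs and the top-N cutoff are illustrative choices.

```python
import os
import subprocess


def largest_subdirs(path: str, top: int = 10) -> list[tuple[int, str]]:
    """Run `du -k -d 1 path` and return (KiB, directory) pairs, largest first.

    The line du prints for `path` itself (its total) is excluded.
    """
    # du still prints what it can when some directories are unreadable,
    # so a non-zero exit status is not treated as fatal here.
    result = subprocess.run(["du", "-k", "-d", "1", path], capture_output=True, text=True)
    root = os.path.normpath(path)
    rows = []
    for line in result.stdout.splitlines():
        size, _, name = line.partition("\t")  # du separates size and path with a tab
        if name and os.path.normpath(name) != root:
            rows.append((int(size), name))
    return sorted(rows, reverse=True)[:top]
```

Usage might look like:

  for kib, name in largest_subdirs("/home"):
      print(f"{kib:>12} KiB  {name}")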

Understanding du Output

The du output displays the disk usage total for the contents of each subdirectory it scans, culminating in a grand total for the specified path. The numbers reflect the actual space used by the files within – which can differ substantially from the apparent file sizes.

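Space is allocated in whole filesystem blocks, so internal fragmentation makes small files occupy more than their apparent size: a 1-byte file typically consumes a full 4 KiB block. Sparse files work the other way. Their unallocated gaps take no space, so actual usage can be far less than the file length. Hard-linked files share a single inode, and du counts each inode only once per run; GNU du's -l (--count-links) option counts every link instead. To see apparent sizes rather than allocated space, GNU du offers --apparent-size.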
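The gap between apparent size and allocated space is easy to observe directly. This sketch creates a 1-byte file and a 100 MiB sparse file in a temporary directory and prints both measures for each. It is Unix-only because it relies on st_blocks, and the file names are arbitrary.

```python
import os
import tempfile


def sizes(path: str) -> tuple[int, int]:
    """Return (apparent bytes, allocated bytes) for a file, not following symlinks."""
    st = os.lstat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks counts 512-byte units


with tempfile.TemporaryDirectory() as d:
    tiny = os.path.join(d, "tiny.txt")
    with open(tiny, "wb") as fh:
        fh.write(b"x")  # 1 byte of data; space is still allocated in whole blocks

    sparse = os.path.join(d, "sparse.img")
    with open(sparse, "wb") as fh:
        fh.truncate(100 * 1024 * 1024)  # 100 MiB length, but no data blocks written

    for p in (tiny, sparse):
        apparent, allocated = sizes(p)
        print(f"{os.path.basename(p)}: apparent={apparent} allocated={allocated}")
```

On a typical ext4 filesystem, the 1-byte file shows a full block allocated and the sparse file shows almost nothing allocated. Filesystems that store tiny files inline, or that lack sparse-file support, will report different numbers.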

Keeping these differences in mind is necessary for interpreting real storage utilization correctly…

Summarizing Disk Usage with df

Syntax and Options for df

While du focuses on measuring files and directories, the df command reports on filesystem disk space usage. It displays metrics on the total, used and available space on mounted filesystems. The basic syntax is:

  df [options] [file|directory]

Commonly used options include:

  • -h – Print sizes in human-readable form with 1024-based suffixes (e.g. 512M, 45G)
  • -T – Include a column showing the filesystem type
  • -i – Report number of used and free inodes

Reading df Output

Run without arguments, df reports on all mounted filesystems. Sample output might look like this:

  Filesystem     1K-blocks     Used Available Use% Mounted on
  /dev/sda1       48720528  4420964  41834036  10% /
  /dev/sdb1      297965564 26660308 256691420  10% /data

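The output shows one row per mounted filesystem, with these columns:

  • 1K-blocks – Total size of the filesystem, in 1 KiB blocks
  • Used – Space currently in use
  • Available – Free space available to ordinary (non-root) users
  • Use% – Used / (Used + Available), rounded up to a whole percent
  • Mounted on – The mount point where the filesystem is attached

Used and Available do not add up to the total because filesystems such as ext4 reserve a portion of blocks (5% by default) for the root user. In the first sample row the gap is about 2.47 million blocks, roughly 5% of the total.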
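To check the arithmetic, the figures can be recomputed from the statvfs() data that df reads. In this Python sketch, df_row and use_percent are illustrative names. Used is taken as total minus free blocks, and Available as the blocks free to unprivileged users, which matches how GNU df derives its columns.

```python
import os


def use_percent(used: int, available: int) -> int:
    """df-style Use%: used / (used + available), rounded up to a whole percent."""
    denominator = used + available
    if denominator == 0:
        return 0
    return (used * 100 + denominator - 1) // denominator  # integer ceiling


def df_row(mount: str) -> dict:
    """df-like figures, in 1K blocks, for the filesystem containing `mount`."""
    st = os.statvfs(mount)
    # f_frsize is the fundamental block size in bytes
    total = st.f_blocks * st.f_frsize // 1024
    used = (st.f_blocks - st.f_bfree) * st.f_frsize // 1024
    available = st.f_bavail * st.f_frsize // 1024  # excludes root-reserved blocks
    return {
        "1K-blocks": total,
        "Used": used,
        "Available": available,
        "Use%": use_percent(used, available),
        "Mounted on": mount,
    }


if __name__ == "__main__":
    print(df_row("/"))
```

Plugging in the first sample row gives use_percent(4420964, 41834036) == 10. That matches the 10% shown, even though Used is only about 9.1% of the 1K-blocks total.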

Examples for Checking Disk Space

Some useful examples include:

  # Check all filesystems    
  df -h

  # Check just the root filesystem
  df -h /  

  # Include inode counts
  df -i /home   

The df output provides an overview of disk usage on mounted filesystems…

Inodes Explained

Definition of Inodes

Inodes are a key concept to understand when analyzing disk usage. An inode is a data structure on Unix filesystems that defines a file or directory. Each inode stores the attributes and disk block locations for that file’s data.

Information stored in an inode includes:

  • File type (regular file, directory, symlink, etc.)
  • Ownership
  • Permissions
  • Size
  • Timestamps (access, modify, change)
  • Pointers to data blocks

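Inodes organize and manage the storage used by files and directories. Every unique file or directory consumes one inode (hard links to the same file share it). On filesystems such as ext4, the total number of inodes is fixed when the filesystem is created.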
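Most of these inode fields can be read from Python's os.lstat(), which wraps the lstat() system call. This sketch prints them for a temporary file, then adds a hard link. Both names then report the same inode number and a link count of 2. describe_inode and the file names are illustrative.

```python
import os
import stat
import tempfile
import time


def describe_inode(path: str) -> dict:
    """Collect the inode metadata listed above for `path` (symlinks not followed)."""
    st = os.lstat(path)
    return {
        "inode": st.st_ino,
        "type_and_permissions": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "size": st.st_size,
        "links": st.st_nlink,
        "modified": time.ctime(st.st_mtime),
    }


with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "report.txt")
    with open(original, "w") as fh:
        fh.write("hello\n")
    alias = os.path.join(d, "report-link.txt")
    os.link(original, alias)  # a second name for the same inode
    a, b = describe_inode(original), describe_inode(alias)
    print(a)
    print("same inode:", a["inode"] == b["inode"], "link count:", a["links"])
```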

Viewing Inodes Usage

The df command can display inode usage statistics using the -i option:

  $ df -i
  Filesystem      Inodes   IUsed    IFree IUse% Mounted on
  /dev/disk1s1  42401739 4881337 37520402   12% /
  /dev/disk2s1  85858690 5560072 80298618    7% /data

This shows the number of used and available inodes on each filesystem. Monitoring the inode usage percent allows predicting when a filesystem may need to be expanded due to running out of inodes.
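The same statvfs() call also reports inode totals, so a df -i style check can be scripted. In this sketch, inode_usage is an illustrative name, IUse% is rounded up the way df rounds, and the 90% threshold is an arbitrary example. Filesystems that allocate inodes dynamically (btrfs, for example) may report a total of 0, in which case no percentage is computed.

```python
import os
from typing import Optional


def inode_usage(mount: str) -> dict:
    """df -i style inode figures for the filesystem containing `mount`."""
    st = os.statvfs(mount)
    total, free = st.f_files, st.f_ffree
    used = total - free
    # Integer ceiling like df; None when no fixed inode count is reported.
    percent: Optional[int] = (used * 100 + total - 1) // total if total else None
    return {"Inodes": total, "IUsed": used, "IFree": free,
            "IUse%": percent, "Mounted on": mount}


if __name__ == "__main__":
    info = inode_usage("/")
    print(info)
    if info["IUse%"] is not None and info["IUse%"] >= 90:  # example threshold
        print("Warning: inodes on / are nearly exhausted")
```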

Running Out of Inodes

A filesystem can show low disk space usage while its inode usage is at 100%. This typically happens when a very large number of small files each consume one inode. Actions to take if running out of inodes include:

  • Delete unnecessary small files such as old logs
  • Consolidate files into archives
  • Resize or add disk volumes and filesystems
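Before deleting or archiving anything, it helps to know which directory holds the small files. This sketch counts the entries beneath every immediate subdirectory of a path; each entry uses an inode. inode_hogs is an illustrative name. Hard links are counted once per name, so totals can slightly exceed the real inode count. Recent GNU du versions can report similar counts with du --inodes.

```python
import os


def inode_hogs(path: str, top: int = 10) -> list[tuple[int, str]]:
    """Return (entry count, subdirectory) pairs for the busiest subdirectories of path.

    Every file, directory and symlink uses an inode, so the count approximates
    inode consumption (hard links are counted once per name).
    """
    counts = []
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            n = 1  # the subdirectory itself
            for _root, dirs, files in os.walk(entry.path):
                n += len(dirs) + len(files)
            counts.append((n, entry.path))
    return sorted(counts, reverse=True)[:top]
```

For instance, inode_hogs("/var") would rank the directories under /var by how many entries each contains.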

Inode usage deserves monitoring alongside disk space usage…

Identifying Large Files

Finding File Sizes with du and ls

Finding and cleaning up unneeded large files can free up significant disk space quickly. Two quick ways to rank the contents of the current directory by size:

  # List files by size with du
  du -hs * | sort -h

  # Use ls and sort by size
  ls -lSh | head

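The du pipeline lists files and directories from smallest to largest (the * glob skips hidden entries), while ls -lSh lists files largest first. Either surfaces the top space consumers for manual inspection and cleanup.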
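Both commands only look at one directory level. To search a whole tree for the largest individual files, a short Python sketch can walk it and keep the top N with heapq.nlargest. It ranks by apparent size (st_size, as ls -S does) and skips symlinks and special files. largest_files is an illustrative name.

```python
import heapq
import os
import stat


def largest_files(path: str, n: int = 10) -> list[tuple[int, str]]:
    """Return the n largest regular files under path as (bytes, path), biggest first."""

    def candidates():
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                try:
                    st = os.lstat(full)
                except OSError:  # removed or unreadable while scanning
                    continue
                if stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, devices
                    yield st.st_size, full

    # nlargest keeps only n items in memory, however many files are scanned
    return heapq.nlargest(n, candidates())
```

For example, largest_files("/var", n=20) would list the 20 biggest files anywhere under /var.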

Locating Log Files and tmp Files

Log files in /var/log and temporary files often accumulate heavily over time. Some specific places to check include:

  • /var/log – Application and system logs
  • /var/tmp – Temporary application files
  • /tmp – Temporary files for users/services
  • ~/tmp – User temporary space

Old log files in particular can get very large but may be unnecessary to retain…
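A simple way to find candidates is to list files that have not been modified for a given number of days. This sketch only reports and deletes nothing. The 30-day cutoff in the usage line and the old_files name are illustrative.

```python
import os
import stat
import time


def old_files(path: str, days: float) -> list[tuple[int, str]]:
    """Regular files under path not modified for `days` days, as (bytes, path), largest first."""
    cutoff = time.time() - days * 86400
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # unreadable or removed while scanning
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                found.append((st.st_size, full))
    return sorted(found, reverse=True)
```

Usage might look like:

  for size, name in old_files("/var/log", days=30)[:20]:
      print(f"{size:>12}  {name}")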

Cleaning up Unneeded Files

Tips for reclaiming wasted disk space:

  • Delete or archive old log files
  • Remove stale application tmp files
  • Find and remove duplicate files
  • Clean up unused user accounts
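Duplicate detection is commonly done in two passes. First group files by size, since files of different sizes cannot be identical, then hash only the groups that collide. This sketch uses SHA-256 from hashlib and skips extra hard links, which already share storage. find_duplicates and sha256_of are illustrative names. The code only reports groups and leaves the choice of what to delete to you.

```python
import hashlib
import os
import stat
from collections import defaultdict


def sha256_of(path: str) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(path: str) -> list[list[str]]:
    """Groups of regular files under path with identical contents."""
    by_size = defaultdict(list)
    seen = set()  # (device, inode): extra hard links already share storage
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue
            if not stat.S_ISREG(st.st_mode) or (st.st_dev, st.st_ino) in seen:
                continue  # skip symlinks, special files and repeat hard links
            seen.add((st.st_dev, st.st_ino))
            by_size[st.st_size].append(full)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for p in paths:
            try:
                by_digest[sha256_of(p)].append(p)
            except OSError:  # unreadable file: leave it out
                continue
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

For example, find_duplicates(os.path.expanduser("~")) would report duplicate groups in your home directory. Note that empty files all count as duplicates of one another.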

Monitoring usage and targeting large space hogs is key…

Monitoring Usage Over Time

Automating Disk Usage Checks

Checking disk usage periodically helps spot gradual space usage growth trends. Automated checks can be set up using cron jobs or specialized tools to schedule disk usage reporting such as:

  • cron job running df and du
  • dureporter – disk usage reporter
  • diskusg – configurable disk usage monitor
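A cron job needs something to run. As one possible sketch, the script below appends a timestamped row per mount point to a CSV file using shutil.disk_usage(). The script name, the CSV path, and the mount list are all hypothetical and passed on the command line. Only real standard-library calls are used.

```python
import csv
import os
import shutil
import sys
import time


def record_usage(log_file: str, mounts: list[str]) -> None:
    """Append one row per mount point: timestamp, mount, used bytes, total bytes."""
    new_file = not os.path.exists(log_file)
    with open(log_file, "a", newline="") as fh:
        writer = csv.writer(fh)
        if new_file:
            writer.writerow(["timestamp", "mount", "used_bytes", "total_bytes"])
        now = int(time.time())
        for mount in mounts:
            usage = shutil.disk_usage(mount)  # named tuple: total, used, free
            writer.writerow([now, mount, usage.used, usage.total])


def main(argv: list[str]) -> None:
    if len(argv) < 2:
        print("usage: record_disk_usage.py LOG_FILE [MOUNT ...]", file=sys.stderr)
        return
    record_usage(argv[1], argv[2:] or ["/"])


if __name__ == "__main__":
    main(sys.argv)
```

Saved as a hypothetical /usr/local/bin/record_disk_usage.py, it could be scheduled every 15 minutes with a crontab entry such as:

  */15 * * * * /usr/bin/python3 /usr/local/bin/record_disk_usage.py /var/tmp/disk_usage_history.csv /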

Scheduled disk usage check jobs monitor gradual changes over time…

Graphing Disk Usage Trends

Graphing usage percent over time provides helpful visualizations of usage growth trends. Scripts can capture df percentage periodically, with the time series data graphed using tools such as:

  • Matplotlib – Python graphing library
  • Grafana – Graph dashboard visualization

Graphing usage helps correlate disk usage growth to application changes…
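Once used-percentage samples have been collected over time, a straight-line fit gives a rough estimate of when the filesystem would reach 100%. That estimate is useful next to a graph when planning upgrades. This is an added technique, not a feature of the tools listed above. It is a minimal least-squares sketch that assumes roughly linear growth, with time measured in days. days_until_full and the sample history are illustrative. Real growth is often bursty, so treat the result as a rough guide.

```python
from typing import List, Optional, Tuple


def days_until_full(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Least-squares fit of percent used against time (days); return days from
    the last sample until the fitted line reaches 100%, or None if not growing."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        return None  # all samples taken at the same time
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / spread
    if slope <= 0:
        return None  # usage flat or shrinking
    intercept = mean_u - slope * mean_t
    return (100 - intercept) / slope - samples[-1][0]


if __name__ == "__main__":
    # Hypothetical weekly samples: (day, percent used)
    history = [(0, 61.0), (7, 63.5), (14, 65.0), (21, 68.0)]
    print(f"Estimated days until full: {days_until_full(history):.0f}")
```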

Setting up Alerts and Notifications

Alerts can proactively notify administrators as disk capacity limits are approached. These alerts can be configured at warning thresholds such as:

  • 80% usage – Warning notification
  • 90% usage – Critical alert
  • 100% usage – Full: writes fail, so trigger immediate remediation
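The threshold logic itself is small. This sketch maps each mount point's usage to the levels listed above, using shutil.disk_usage(). Usage is measured against the space available to non-root users, in the same spirit as df's Use%. The function names are illustrative, and the print call stands in for whatever notification channel you actually use.

```python
import shutil
from typing import List, Tuple

# Thresholds from the list above, in percent used.
WARNING, CRITICAL, FULL = 80, 90, 100


def alert_level(percent_used: float) -> str:
    """Map a usage percentage to the alert levels listed above."""
    if percent_used >= FULL:
        return "full"
    if percent_used >= CRITICAL:
        return "critical"
    if percent_used >= WARNING:
        return "warning"
    return "ok"


def check(mounts: List[str]) -> List[Tuple[str, float, str]]:
    """Return (mount, percent used, level) for each mount point."""
    results = []
    for mount in mounts:
        usage = shutil.disk_usage(mount)
        capacity = usage.used + usage.free  # free excludes root-reserved blocks
        percent = 100 * usage.used / capacity if capacity else 0.0
        results.append((mount, round(percent, 1), alert_level(percent)))
    return results


if __name__ == "__main__":
    for mount, percent, level in check(["/"]):
        if level != "ok":
            # Placeholder: send an email, chat message or pager alert here.
            print(f"{level.upper()}: {mount} is {percent}% full")
```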

Integrating alerts allows catching problems before capacity is fully consumed…

Optimizing Disks for Capacity Needs

Adding New Disks

As storage needs grow over time, additional disk volumes can help provide expanded capacity:

  • Add secondary data disks to VM instances
  • Configure RAID arrays with additional storage
  • Add network attached storage (NAS) capacity

Expanding the underlying storage allows increasing available filesystem space…

Partitioning and Mounting Volumes

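New raw block devices are typically partitioned and formatted with a filesystem before use:

  # Partition the disk interactively with fdisk
  fdisk /dev/sdb

  # Create an ext4 filesystem on the new partition
  mkfs.ext4 /dev/sdb1

  # Create a mount point and mount the filesystem
  mkdir -p /data
  mount /dev/sdb1 /data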

The filesystem can then be mounted and listed in /etc/fstab so that it is mounted automatically at boot…

Extending Logical Volumes

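To grow an existing logical volume, use lvextend. Note that it extends a logical volume, not the volume group. If the volume group has no free space, a new disk must first be initialized with pvcreate and added to the group with vgextend:

  # Grow lv01 by 5 GiB; -r (--resizefs) also resizes the filesystem on it
  lvextend -r -L +5G /dev/vg01/lv01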

Expanding logical volumes is an efficient approach for adding more space…
