Disk Usage 101: An Intro To Du, Df, And Tools For Analyzing Storage Utilization
Defining Key Terms: du, df, inode, storage utilization
When managing a computer system, monitoring and understanding disk space usage is critical. There are some key terms and concepts to understand when analyzing storage utilization:
- du – Short for “disk usage”, this command provides statistics on disk space used by files and directories on your filesystem.
- df – Short for “disk free”, this command reports on the overall disk space usage and availability on mounted filesystems.
- inode – Every file and directory stored on a filesystem is described by an inode, which stores metadata such as the file's size, permissions, and the location of the data blocks that hold its contents.
- storage utilization – The amount of disk space that is currently occupied with files and directories on mounted filesystems.
Why Monitoring Disk Usage is Important
Carefully monitoring disk usage is crucial for several reasons:
- Prevent running out of free space – If a disk becomes 100% full, new data cannot be written. This can cause errors and prevent applications from operating properly.
- Maintain performance – As disks fill up, performance often slows from increased fragmentation and extra time needed to find free blocks for writing data. Monitoring usage helps catch problems early.
- Identify wasted space – Old log files, temporary data, and “forgotten” files can slowly eat up available space over time. Monitoring helps find these space hogs.
- Plan storage upgrades – Tracking usage patterns helps administrators predict when additional storage capacity will be needed in the future.
Understanding exactly how disk space is used makes it possible both to prevent immediate problems and to plan for future growth. Now let’s dive into the tools and techniques for analyzing usage data…
Listing Folder Sizes with du
Syntax and Options for du
The du command is used to scan directories and measure how much space is used by the files and subdirectories they contain. The basic syntax is simple:
du [options] [directory|file ...]
By default du recursively scans all subdirectories beneath the specified path, printing one line per directory and ending with the total for the path itself. Commonly used options include:
- -h – Print sizes in human readable format (KB, MB)
- -s – Display only a single total for each path given
- -c – Add a grand total line at the end, summing all paths given
Examples for Listing Folder Sizes
Some examples help illustrate how du can be used to measure disk usage:
# Measure home folder disk usage
du ~
# Human readable output
du -h /var/log
# Only the summary
du -s /home/*
Additional examples:
- du -h /home --max-depth=1 – List all top level folders under /home with human readable sizes
- du -c / | sort -n – Sort a full directory listing by size to find large space users
Understanding du Output
The du output displays the disk usage total for the contents of each subdirectory it scans, culminating in a grand total for the specified path. The numbers reflect the actual space used by the files within – which can differ substantially from the apparent file sizes.
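Internal fragmentation makes actual usage larger than the apparent size, because space is allocated in whole filesystem blocks: even a 1-byte file occupies a full block (often 4 KB). Sparse files with empty gaps result in actual usage less than the file length. Files with several hard links are counted only once per du run by default (GNU du counts every link when given -l), so directories measured in separate runs can add up to more than one combined run reports.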
Understanding du output allows properly interpreting the real storage utilization…
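The difference between apparent and allocated size is easy to see on a scratch directory. The following is a minimal sketch that assumes GNU coreutils on Linux (truncate, du --apparent-size, and du -l are GNU features); the file names are made up for the demo, and the exact allocated figures depend on the filesystem.

```shell
# Sketch: apparent size vs. allocated size, assuming GNU coreutils on Linux.
demo_dir=$(mktemp -d)                       # scratch directory for the demo

# A 10 MiB sparse file: the length is set, but no data blocks are written.
truncate -s 10M "$demo_dir/sparse.img"
allocated_kb=$(du -k "$demo_dir/sparse.img" | cut -f1)
apparent_kb=$(du -k --apparent-size "$demo_dir/sparse.img" | cut -f1)
echo "sparse file: allocated ${allocated_kb} KiB, apparent ${apparent_kb} KiB"

# A 1 MiB file with real data, plus a second name (hard link) for it.
mkdir "$demo_dir/links"
head -c 1048576 /dev/urandom > "$demo_dir/links/data.bin"
ln "$demo_dir/links/data.bin" "$demo_dir/links/data-link.bin"
once_kb=$(du -sk "$demo_dir/links" | cut -f1)        # default: counted once
per_link_kb=$(du -skl "$demo_dir/links" | cut -f1)   # -l: counted per link
echo "hard links: default ${once_kb} KiB, with -l ${per_link_kb} KiB"

rm -rf "$demo_dir"
```

The sparse file should report far fewer allocated kilobytes than its apparent size, and the hard-linked directory should roughly double only when -l is given.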
Summarizing Disk Usage with df
Syntax and Options for df
While du focuses on measuring files and directories, the df command reports on filesystem disk space usage. It displays metrics on the total, used and available space on mounted filesystems. The basic syntax is:
df [options] [file|directory]
Commonly used options include:
- -h – Print sizes in human readable format (KB, MB)
- -T – Include filesystem type
- -i – Report number of used and free inodes
Reading df Output
A df command by itself will report on all mounted filesystems. A sample output may look like:
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1       48720528  4420964  41834036  10% /
/dev/sdb1      297965564 26660308 256691420  10% /data
The output displays space usage metrics on every mounted filesystem, including:
- 1K-blocks – Total size of the filesystem, counted in 1 KB blocks
- Used – Total space used currently
- Available – Remaining free space available
- Use% – Percentage of space used on this filesystem
- Mounted on – The mount point where this filesystem is attached
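In the sample output, Used plus Available is smaller than the 1K-blocks total. On ext filesystems this gap is typically the space reserved for root. GNU df appears to compute Use% against Used + Available rather than against the total, rounding up. The helper below is a sketch of that arithmetic under this assumption; use_percent is a hypothetical name, not part of df.

```shell
# use_percent USED AVAILABLE
# Prints used / (used + available) as a whole percentage, rounded up.
use_percent() {
    local used=$1 avail=$2
    local total=$((used + avail))
    if [ "$total" -eq 0 ]; then
        echo 0
        return
    fi
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (used * 100 + total - 1) / total ))
}

use_percent 4420964 41834036      # first sample row, prints 10
use_percent 26660308 256691420    # second sample row, prints 10
```

Both results match the Use% column of the sample, even though the raw ratios are closer to 9.6% and 9.4%.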
Examples for Checking Disk Space
Some useful examples include:
# Check all filesystems
df -h
# Check just the root filesystem
df -h /
# Include inode counts
df -i /home
The df output provides an overview of disk usage on mounted filesystems…
Inodes Explained
Definition of Inodes
Inodes are a key concept to understand when analyzing disk usage. An inode is a data structure on Unix filesystems that defines a file or directory. Each inode stores the attributes and disk block locations for that file’s data.
Information stored in an inode includes:
- File type (regular file, directory, symlink, etc)
- Ownership
- Permissions
- Size
- Timestamps (access, modify, change)
- Pointers to data blocks
Inodes organize and manage the underlying storage space used by files and directories. An individual inode is required for every unique file.
Viewing Inodes Usage
The df command can display inode usage statistics using the -i option:
# df -i
Filesystem         Inodes   IUsed    IFree IUse% Mounted on
/dev/disk1s1   4294967295 4881337 37520402   12% /
/dev/disk2s1     85898690 5560072 80298618    7% /data
This shows the number of used and available inodes on each filesystem. Monitoring the inode usage percent allows predicting when a filesystem may need to be expanded due to running out of inodes.
Running Out of Inodes
It is possible for disk usage to be low while inode usage is at 100%. This can happen, for instance, when a large number of small files each consume one inode. New files then cannot be created even though free space remains. Actions to take if running out of inodes include:
- Delete unnecessary small files such as old logs
- Consolidate files into archives
- Resize or add disk volumes and filesystems
Monitoring inode usage is crucial to measure alongside disk usage…
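When inode usage is high, the next question is usually which directory holds all the files. One rough way to find out is to count directory entries per subdirectory. The function below is a sketch; count_entries is a hypothetical name, and it counts names rather than inodes, so hard-linked files are counted once per name and the result is an upper bound.

```shell
# count_entries DIR
# For each immediate subdirectory of DIR, print "<entries><TAB><path>",
# largest first. The count includes the subdirectory itself.
count_entries() {
    local dir=$1 sub
    for sub in "$dir"/*/; do
        [ -d "$sub" ] || continue
        # -xdev keeps find from crossing into other mounted filesystems
        printf '%s\t%s\n' "$(find "$sub" -xdev | wc -l | tr -d ' ')" "${sub%/}"
    done | sort -rn
}
```

For example, count_entries /var lists the subdirectories of /var with the most entries at the top. File names containing newlines would be miscounted by this approach.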
Identifying Large Files
Finding File Sizes with du and ls
Finding and cleaning up unneeded large files can free up significant disk space quickly. Here are approaches to finding sizes of files:
# List files by size with du
du -hs * | sort -h
# Use ls and sort by size
ls -lSh | head
This will surface the top space consumers for manual inspection and cleanup.
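The commands above only look at the current directory. To search a whole tree for individual large files, find can do the walking. This sketch assumes GNU find (for -printf); largest_files is a hypothetical helper name, and the sizes it prints are apparent sizes in bytes, not allocated blocks.

```shell
# largest_files DIR [COUNT]
# Prints "<bytes><TAB><path>" for the COUNT largest regular files under DIR
# (default 10), largest first.
largest_files() {
    local dir=$1 count=${2:-10}
    # -xdev stays on one filesystem; -printf is a GNU find extension.
    find "$dir" -xdev -type f -printf '%s\t%p\n' | sort -rn | head -n "$count"
}
```

A call such as largest_files /var/log 5 would list the five largest files under /var/log.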
Locating Log Files and tmp Files
Log files in /var/log and temporary files often accumulate heavily over time. Some specific places to check include:
- /var/log – Application and system logs
- /var/tmp – Temporary application files
- /tmp – Temporary files for users/services
- ~/tmp – User temporary space
Old log files in particular can get very large but may be unnecessary to retain…
Cleaning up Unneeded Files
Tips for reclaiming wasted disk space:
- Delete or archive old log files
- Remove stale application tmp files
- Find and remove duplicate files
- Clean up unused user accounts
Monitoring usage and targeting large space hogs is key…
Monitoring Usage Over Time
Automating Disk Usage Checks
Checking disk usage periodically helps spot gradual growth. Automated checks can be set up using cron jobs or specialized tools to schedule disk usage reporting such as:
- cron job running df and du
- dureporter – disk usage reporter
- diskusg – configurable disk usage monitor
Scheduled disk usage check jobs monitor gradual changes over time…
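A cron-driven check can be as simple as appending df figures to a log file. The following sketch records one CSV line per mounted filesystem. The function name log_disk_usage, the script path, and the log path in the crontab comment are placeholders. It assumes mount points without spaces, since it splits df -P output on whitespace.

```shell
# log_disk_usage LOGFILE
# Appends one CSV line per mounted filesystem: timestamp,mount point,use%
log_disk_usage() {
    local logfile=$1 now
    now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    # -P selects the POSIX output format: one line per filesystem.
    df -P | awk -v ts="$now" \
        'NR > 1 { sub(/%$/, "", $5); print ts "," $6 "," $5 }' >> "$logfile"
}

# Hypothetical crontab entry for a script that calls the function hourly:
# 0 * * * * /usr/local/bin/log_disk_usage.sh /var/log/disk_usage.csv
```

The resulting file is a plain time series that a graphing tool can read later.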
Graphing Disk Usage Trends
Graphing usage percent over time provides helpful visualizations of usage growth trends. Scripts can capture df percentage periodically, with the time series data graphed using tools such as:
- Matplotlib – Python graphing library
- Grafana – Graph dashboard visualization
Graphing usage helps correlate disk usage growth to application changes…
Setting up Alerts and Notifications
Alerts can proactively notify administrators as disk capacity limits are approached. These alerts can be configured at warning thresholds such as:
- 80% usage – Warning notification
- 90% usage – Critical alert
- 100% usage – Event trigger
Integrating alerts allows catching problems before capacity is fully consumed…
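The thresholds above translate directly into a small check. This is a sketch only: classify_usage and check_filesystems are hypothetical names, the level labels are made up, and the script just prints messages, leaving delivery (mail, chat, pager) to whatever notification system is in use.

```shell
# classify_usage PERCENT
# Maps a whole-number usage percentage to an alert level.
classify_usage() {
    local pct=$1
    if [ "$pct" -ge 100 ]; then
        echo full
    elif [ "$pct" -ge 90 ]; then
        echo critical
    elif [ "$pct" -ge 80 ]; then
        echo warning
    else
        echo ok
    fi
}

# check_filesystems
# Prints one line for every mounted filesystem at or above 80% usage.
check_filesystems() {
    df -P | awk 'NR > 1 { sub(/%$/, "", $5); print $5, $6 }' |
    while read -r pct mount; do
        case $pct in ''|*[!0-9]*) continue ;; esac   # skip non-numeric values
        level=$(classify_usage "$pct")
        [ "$level" = ok ] || echo "$level: $mount is at ${pct}%"
    done
}
```

Run from cron, check_filesystems stays silent while everything is below 80%, so any output can be treated as an alert.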
Optimizing Disks for Capacity Needs
Adding New Disks
As storage needs grow over time, additional disk volumes can help provide expanded capacity:
- Add secondary data disks to VM instances
- Configure RAID arrays with additional storage
- Add network attached storage (NAS) capacity
Expanding the underlying storage allows increasing available filesystem space…
Partitioning and Mounting Volumes
New raw block devices must be partitioned and formatted before use as filesystems:
# Partition with fdisk
fdisk /dev/sdb
# Format with filesystem
mkfs.ext4 /dev/sdb1
The filesystem can then be permanently mounted for usage…
Extending Logical Volumes
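For existing LVM setups, a new disk is first initialized as a physical volume with pvcreate and added to the volume group with vgextend. The logical volume can then be grown with lvextend, after which the filesystem on it must also be resized (for example with resize2fs on ext4, or by passing -r to lvextend):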
lvextend -L+5G /dev/vg01/lv01
Expanding logical volumes is an efficient approach for adding more space…
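Because LVM commands change real storage and need root, it can help to review the full command sequence before running anything. The sketch below only prints the commands for adding a disk and growing a volume; print_lvm_extend_plan is a hypothetical helper and /dev/sdc is a placeholder device name, while vg01 and lv01 follow the example above.

```shell
# print_lvm_extend_plan DISK VG LV SIZE
# Prints (does not run) the commands to add DISK to volume group VG and
# grow logical volume LV by SIZE, resizing its filesystem as well.
print_lvm_extend_plan() {
    local disk=$1 vg=$2 lv=$3 size=$4
    echo "pvcreate $disk"                       # initialize the physical volume
    echo "vgextend $vg $disk"                   # add it to the volume group
    echo "lvextend -r -L +$size /dev/$vg/$lv"   # grow the LV and its filesystem
}

print_lvm_extend_plan /dev/sdc vg01 lv01 5G
```

The printed commands can be checked against the output of vgs and lvs and then run by hand as root.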