Handling Partial Reads and Writes When Using dd

The Problem with Partial Transfers in dd

The dd utility is commonly used for copying data from one location to another in Unix-like environments. It works by reading a block of data from the input source, then writing that block to the output destination, before reading the next block. This block-by-block approach allows dd to make exact copies of devices and files.

However, this block-by-block approach has two weak points. First, a single read or write can transfer fewer bytes than the requested block size; dd calls this a partial record and counts it separately from full blocks. Second, if the copy is interrupted prematurely, for example by a power outage or system crash, dd stops partway through and may have written only a portion of the latest block to the destination. Either problem can leave you with a copy that is incomplete or differs from the source.

There are a few common scenarios that can trigger partial transfers in dd:

  • The source or destination devices get disconnected unexpectedly
  • Errors occur when reading from or writing to storage media
  • The dd process gets killed manually or by the OS
  • System instability causes dd to crash or hang

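When any of these situations occurs, the dd operation stops, and the last block may have been only partly written to the destination. In a crash or power failure, data that dd had already handed to the kernel but that had not yet been flushed to disk can be lost as well. Either way the copy is incomplete and unusable until the rest is transferred.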

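Partial reads also happen during normal operation, without any fault. When dd reads from a pipe, FIFO, socket, or terminal, a single read can return fewer bytes than the block size simply because no more data is available yet. When reading a regular file or disk, the last read comes up short whenever the input size is not a multiple of the block size: a 1000-byte file read with bs=512 gives one full block and one 488-byte partial block. Because count= counts partial blocks too, short reads from a pipe can make dd copy less data than you asked for; GNU dd's iflag=fullblock makes it keep reading until each input block is full.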
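The short-read behavior is easy to reproduce. The bash sketch below assumes GNU coreutils dd; slow_writer is a hypothetical stand-in for any producer that writes in bursts. It asks dd for a single 8-byte block from a pipe that delivers only 4 bytes at first, once as-is and once with iflag=fullblock.

```shell
#!/usr/bin/env bash
# Demonstrates a short read from a pipe and GNU dd's iflag=fullblock.
# slow_writer is a hypothetical stand-in for a producer that writes in bursts.
set -eu
export LC_ALL=C   # keep dd's statistics in the untranslated C-locale format

slow_writer() {
  printf 'aaaa'   # first 4 bytes are available immediately
  sleep 1
  printf 'bbbb'   # the next 4 bytes arrive a second later
}

# One 8-byte block requested: the first read() returns only the 4 bytes
# already in the pipe, so dd records a partial block ("0+1 records in").
plain=$(slow_writer | dd bs=8 count=1 of=/dev/null 2>&1)

# With iflag=fullblock, dd keeps reading until the 8-byte block is full.
full=$(slow_writer | dd bs=8 count=1 iflag=fullblock of=/dev/null 2>&1)

printf 'without fullblock:\n%s\n\nwith fullblock:\n%s\n' "$plain" "$full"
```

Without fullblock, count=1 ends the copy after the first short read, so only 4 of the 8 bytes are copied.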

Fortunately, there are some dd options and techniques which help detect and handle partial reads/writes during copy operations.

Using the status=progress Flag

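The easiest way to monitor a dd transfer is GNU dd's status=progress option. While the operation runs, it periodically prints the number of bytes copied so far, the elapsed time, and the current transfer rate:

dd if=/dev/sda of=/dev/sdb status=progress

If the byte count suddenly stops growing, or the final total is smaller than the size of the source, the copy halted prematurely and the destination is incomplete. dd does not know the total size in advance, so it shows no percentage or progress bar; compare the final byte count with the source size yourself, for example with blockdev --getsize64 /dev/sda for a block device.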

Here is sample output from a dd run with the default 512-byte block size that was interrupted with Ctrl+C:

^C389632+0 records in
389632+0 records out
199491584 bytes (199 MB, 190 MiB) copied, 0.402029 s, 496 MB/s

The transfer was moving along at about 500 MB/s when it was stopped. The ^C is the terminal echoing the Ctrl+C keystroke, which sends dd a SIGINT; dd then prints its final statistics and exits. Only 199 MB of a potentially much larger device were copied. The records lines use the form full+partial: 389632+0 means 389,632 full 512-byte blocks and no partial ones. A nonzero number after the plus sign is how dd reports partial reads (records in) and partial writes (records out).

By monitoring the dd status output, you can quickly identify incomplete copies and rerun the operation if needed.
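Because the partial counts appear after the plus sign in the records lines, a script can check them after each run. The dd_partials helper below is a hypothetical name for a bash sketch that assumes GNU dd's C-locale output format; it runs dd with whatever extra arguments you pass and prints the partial-read and partial-write counts.

```shell
#!/usr/bin/env bash
# Reports partial records from a dd run. In dd's summary, "F+P records in"
# means F full input blocks and P partial ones; "records out" is the same for
# writes. dd_partials is a hypothetical helper; GNU dd output is assumed.
set -eu
export LC_ALL=C

# dd_partials SRC DST [extra dd arguments...]
# Prints "<partial reads> <partial writes>".
dd_partials() {
  local src=$1 dst=$2 stats pin pout
  shift 2
  stats=$(dd if="$src" of="$dst" "$@" 2>&1)
  pin=$(printf '%s\n' "$stats" | sed -n 's/^[0-9]*+\([0-9]*\) records in$/\1/p')
  pout=$(printf '%s\n' "$stats" | sed -n 's/^[0-9]*+\([0-9]*\) records out$/\1/p')
  echo "$pin $pout"
}

# Demo: 1000 bytes with bs=512 is one full block plus a 488-byte partial one.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c 1000 /dev/zero > "$tmp/src"
result=$(dd_partials "$tmp/src" "$tmp/dst" bs=512)
echo "partial reads/writes: $result"
```

Nonzero values in a copy between two disks usually mean the input ended early or was read from a pipe without iflag=fullblock.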

Retrying Partial Transfers

When a partial read or write is detected from the dd status, there are a couple of different options for recovering and retrying the transfer:

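No Automatic Resume with conv=sparse

dd has no built-in way to remember how far a previous copy got. The conv=sparse option is sometimes suggested for this, but it does something different: it makes dd seek over output blocks that consist entirely of NUL bytes instead of writing them, which produces a sparse output file. For example:

dd if=/dev/sda of=/backup/disk-image conv=sparse

If this command is interrupted and you rerun it unchanged, dd starts again from the beginning of the input. To pick up where a copy left off with dd, you supply the offsets yourself, as described next. Tools designed for resumable copying, such as GNU ddrescue with its mapfile, track progress automatically.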

Manually Specify Skip and Seek Values

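Alternatively, you can manually retry a partial transfer by telling dd how much of the input to skip and how far into the output to seek. Note that skip and seek count blocks, not bytes: skip uses the input block size (ibs) and seek the output block size (obs), both 512 bytes by default. To resume after 199491584 bytes with the default block size, divide by 512:

dd if=/dev/sda of=/backup/disk-image skip=389632 seek=389632

This tells dd to skip the first 389632 × 512 = 199491584 bytes (about 199 MB) of the input, since that part was already copied successfully. It also seeks to the same position in the output file, so the remaining data is written from there. With GNU dd, you can give byte offsets directly instead by adding iflag=skip_bytes oflag=seek_bytes.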

The key is to make sure the skip and seek values correspond to the last known good byte position, and that this position falls on a block boundary; if it does not, round it down to the previous boundary. You can get the byte count from dd's final statistics or from the exact size of the partial output file already written.

Then dd will simply resume the copy from there and complete the remaining data transfer.

Avoiding Partial Writes to Destinations

While being able to retry failed transfers is useful, it’s better to avoid losing data in the first place. There are a couple of dd settings and system tweaks that reduce how much copied data an interruption can destroy.

Sync Output File After Each Block

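GNU dd has no osync option; the flags that do this are oflag=dsync, which opens the output for synchronized data I/O so that each write returns only after its data has reached the device, and oflag=sync, which also synchronizes file metadata. For example:

dd if=/dev/sda of=/backup/disk-image bs=4M oflag=dsync

This minimizes potential data loss if the transfer halts unexpectedly: at most the block being written is lost, rather than a large amount of buffered data. The price is speed, because dd waits for every write, so pair it with a reasonably large bs. If you only need the data to be on disk when dd finishes, conv=fsync flushes the output once before dd exits.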
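To see both durability options side by side, the bash sketch below copies the same scratch file once with oflag=dsync and once with conv=fsync. It assumes GNU dd; the file names are placeholders, and the comments ignore the drive's own write cache. Timing the two commands against a real disk is the easiest way to see the cost of per-write syncing.

```shell
#!/usr/bin/env bash
# Copies the same scratch file with two GNU dd durability options:
#   oflag=dsync: every write() returns only after its data reaches the device
#   conv=fsync:  writes are buffered, and the output is fsync'ed once at the end
# File names are placeholders.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c 1048576 /dev/urandom > "$tmp/src"

# Per-write durability: slower, but only the write in progress is at risk.
dd if="$tmp/src" of="$tmp/dsync.img" bs=64K oflag=dsync status=none

# Single flush at exit: faster, but a crash mid-copy can lose buffered data.
dd if="$tmp/src" of="$tmp/fsync.img" bs=64K conv=fsync status=none

cmp "$tmp/src" "$tmp/dsync.img" && cmp "$tmp/src" "$tmp/fsync.img" && echo "both copies match"
```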

Stabilize Destination Drive Caching

Many modern disk drives rely on volatile write caches to boost performance. However, if power is interrupted, writes still sitting in that cache can be lost. You can disable the write cache on a drive that supports it using hdparm (as root):

hdparm -W 0 /dev/sdb

If you would rather keep caching enabled, you can force data buffered in the kernel's page cache out to the devices by running:

sync

Do this periodically during a long copy so that as much of the copied data as possible is on persistent storage, rather than in volatile memory, should an unexpected halt occur. Dropping caches with echo 3 > /proc/sys/vm/drop_caches does not help here: it only discards clean cached pages, which are already on disk, so it makes no data more durable.
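One way to run sync periodically without babysitting the terminal is to start dd in the background and flush from a loop until it exits. This is a minimal bash sketch; copy_with_periodic_sync and the 1-second interval in the demo are illustrative choices, and sync here only pushes the kernel's dirty pages to the devices.

```shell
#!/usr/bin/env bash
# Runs dd in the background and calls sync every INTERVAL seconds until it
# finishes, then once more. copy_with_periodic_sync is a hypothetical helper;
# copying to a real device requires root.
set -eu

# copy_with_periodic_sync SRC DST INTERVAL_SECONDS
copy_with_periodic_sync() {
  local src=$1 dst=$2 interval=$3 pid
  dd if="$src" of="$dst" bs=1M status=none &
  pid=$!
  while kill -0 "$pid" 2>/dev/null; do
    sync              # write dirty page-cache data out to the devices
    sleep "$interval"
  done
  wait "$pid"         # return dd's own exit status
  sync                # final flush once the copy is done
}

# Demo on scratch files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c 4194304 /dev/urandom > "$tmp/src"
copy_with_periodic_sync "$tmp/src" "$tmp/dst" 1
cmp "$tmp/src" "$tmp/dst" && echo "copy complete"
```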

Handling Unaligned Block Sizes

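Another source of trouble relates to block sizes. dd reads input in chunks of ibs bytes and writes output in chunks of obs bytes. When the two are set together with bs, each block is written exactly as it was read, so a short read becomes a short write; when ibs and obs differ, dd collects input in a buffer and writes it out in obs-sized pieces.

With ordinary buffered I/O on Linux, the kernel accepts reads and writes of any size, so a block size that doesn't match the device mainly costs performance. Alignment becomes mandatory with direct I/O (iflag=direct or oflag=direct): block sizes that are not a multiple of the device's logical sector size typically make the read or write fail with an "Invalid argument" error. And whatever the block size, the input ends in a partial record whenever its size is not a multiple of ibs.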

Detecting Native Block Sizes

To avoid mismatches, you first need to check the actual hardware-level block size used by the drive you’re reading/writing. In Linux you can find the sector size with:

cat /sys/block/sda/queue/hw_sector_size

Then choose ibs and obs values that are whole multiples of this sector size. That keeps every request aligned with the device's sector boundaries.
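A bash sketch of that rounding step follows. It reads hw_sector_size for a given disk name, falls back to 512 bytes when the sysfs file is missing (an assumption, for example inside containers), and rounds a desired block size down to a whole number of sectors. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Picks a dd block size that is a whole multiple of a disk's sector size.
# sector_size and aligned_bs are hypothetical helpers; the 512-byte fallback
# is an assumption for systems where the sysfs file is not available.
set -eu

# sector_size DISK -> hardware sector size from sysfs, or 512 if unreadable
sector_size() {
  local f="/sys/block/$1/queue/hw_sector_size"
  if [ -r "$f" ]; then cat "$f"; else echo 512; fi
}

# aligned_bs WANTED SECTOR -> largest multiple of SECTOR that is <= WANTED,
# but never less than one sector
aligned_bs() {
  local wanted=$1 sector=$2 bs
  bs=$(( wanted / sector * sector ))
  [ "$bs" -gt 0 ] || bs=$sector
  echo "$bs"
}

sec=$(sector_size sda)
echo "sector size: $sec bytes, suggested bs=$(aligned_bs 1000000 "$sec")"
```

On a disk with 4096-byte sectors, asking for 1000000 bytes yields bs=999424, which is 244 whole sectors.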

Tuning dd with bs Values

If needed, you can also set separate input (ibs) and output (obs) block sizes that better match your specific hardware:

dd if=/dev/sda ibs=1024k obs=512k of=/backup/disk-image

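Just keep both values multiples of the sector size of the devices involved, and remember that dd allocates a memory buffer of each size, so very large values use more RAM without necessarily going faster. It’s also wise to keep ibs and obs equal unless you have an explicit need for asymmetry.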

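Tuning the block sizes properly avoids alignment errors and wasted I/O, but it cannot eliminate partial reads entirely: short reads from pipes, and the partial block at the end of the input, still occur. That is why iflag=fullblock and the partial counts in dd's statistics remain worth checking.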
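The reblocking that happens when ibs and obs differ is visible in dd's statistics. This bash sketch, assuming GNU dd and the C locale, reads a 4096-byte file in 1024-byte input blocks and writes 2048-byte output blocks, then repeats the copy with a 5000-byte file whose tail produces partial records on both sides.

```shell
#!/usr/bin/env bash
# Shows dd reblocking when ibs and obs differ. Assumes GNU dd; LC_ALL=C keeps
# the "records in/out" lines in their untranslated form.
set -eu
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# 4096 bytes: four 1024-byte reads become two 2048-byte writes.
head -c 4096 /dev/zero > "$tmp/even"
even_stats=$(dd if="$tmp/even" of="$tmp/even.out" ibs=1024 obs=2048 2>&1)

# 5000 bytes: the 904-byte tail is a partial record on both sides.
head -c 5000 /dev/zero > "$tmp/odd"
odd_stats=$(dd if="$tmp/odd" of="$tmp/odd.out" ibs=1024 obs=2048 2>&1)

printf '%s\n\n%s\n' "$even_stats" "$odd_stats"
```

The first copy reports 4+0 records in and 2+0 records out; the second reports 4+1 records in and 2+1 records out.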

Recovering Data After a Failed Transfer

If a dd copy operation does result in a partial write despite efforts to avoid this, there are still methods for recovering and completing the transfer.

Check Destination File Size

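After a failed dd run, the first thing to check is the size of the output file already written. Use ls -l rather than ls -lh, because -h rounds the size for display:

ls -l /backup/disk-image

For a regular-file destination, this byte count tells you where in the source data the copy process stopped. Armed with this last good offset, you can retry the transfer without re-copying data that is already in place. For a block-device destination, ls shows the device numbers rather than how much was copied, so rely on dd's own statistics instead.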
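To turn that size into skip and seek values, divide it by the block size and round down, so that a possibly incomplete last block is copied again. The bash sketch below assumes GNU stat and truncate; resume_blocks is a hypothetical helper, and the demo uses a sparse scratch file of the article's 199491584-byte size rather than a real image.

```shell
#!/usr/bin/env bash
# Converts the exact size of a partial image into skip=/seek= block counts,
# rounding down so a possibly incomplete last block is copied again.
# resume_blocks is a hypothetical helper; GNU stat and truncate are assumed.
set -eu

# resume_blocks FILE BS -> number of complete BS-byte blocks in FILE
resume_blocks() {
  local size
  size=$(stat -c %s "$1")
  echo $(( size / $2 ))
}

# Demo: a sparse scratch file with the size used in this article.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
truncate -s 199491584 "$tmp/disk-image"
n=$(resume_blocks "$tmp/disk-image" 512)
echo "dd if=/dev/sda of=/backup/disk-image bs=512 skip=$n seek=$n"
```

For this size the printed command uses skip=389632 seek=389632; a file 300 bytes longer still gives 389632, because its last block is incomplete.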

Restore Truncated Final Block Data

During a partial write event, the last block transfer can get interrupted midway through writing. This results in a truncated final block in the output file compared to the source.

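To restore this final partial block, use dd to copy just that block of source data into position in the output file. With bs=512, the block that starts at byte 199491584 is block number 199491584 / 512 = 389632:

dd if=/dev/sda of=/backup/disk-image skip=389632 seek=389632 bs=512 count=1 conv=notrunc

Adjust the skip, seek, and count values as needed based on your file sizes and block length. The conv=notrunc option matters here: without it, dd truncates the output file right after the block it writes. With it, the command repairs the truncated section without touching any good data that follows.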
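The effect of conv=notrunc is easy to check on scratch files. In this bash sketch (GNU dd assumed; file names and block number illustrative), block 5 of a copy is half zeroed and then repaired twice: once with conv=notrunc, and once without it on a second copy.

```shell
#!/usr/bin/env bash
# Re-copies one 512-byte block in place and shows why conv=notrunc matters.
# The scratch files and block number 5 are illustrative only.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c 4096 /dev/urandom > "$tmp/src"

# Simulate a damaged copy: the second half of block 5 (bytes 2816-3071) is zeroed.
cp "$tmp/src" "$tmp/dst"
dd if=/dev/zero of="$tmp/dst" bs=256 seek=11 count=1 conv=notrunc status=none
cp "$tmp/dst" "$tmp/dst-no-notrunc"

# Repair block 5 (bytes 2560-3071) in place; data after it is left alone.
dd if="$tmp/src" of="$tmp/dst" bs=512 skip=5 seek=5 count=1 conv=notrunc status=none

# The same command without conv=notrunc cuts the file off after block 5.
dd if="$tmp/src" of="$tmp/dst-no-notrunc" bs=512 skip=5 seek=5 count=1 status=none

stat -c '%n: %s bytes' "$tmp/dst" "$tmp/dst-no-notrunc"
cmp "$tmp/src" "$tmp/dst" && echo "block 5 repaired, rest of file intact"
```

The copy repaired with conv=notrunc keeps all 4096 bytes, while the one repaired without it ends at 3072 bytes, right after block 5.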

Resume Transfer from Last Known Good Offset

With the existing copy data validated and repaired, you can rerun the initial dd command with skip and seek values, again counted in blocks, that pick up exactly where the process previously failed:

dd if=/dev/sda of=/backup/disk-image skip=389632 seek=389632

This resumes the transfer without rewriting previously copied content. The skip and seek values direct dd to read and write only the data that has not yet been transferred.

Following these steps, you can successfully recover even from a disastrous partial dd copy and complete the operation while preserving existing destination data.
