Find And Rsync: Alternative Approaches To Selective File Copying

Finding Files Selectively with find

The find command in Linux provides powerful capabilities for searching and selecting files based on a variety of attributes and metadata. With flexible options to match on names, sizes, modification times, permissions, and more, find enables precise targeting of subdirectories and files.

The basic syntax of find is:

find [starting directory] [options] [qualifiers] [actions]

For example, to find all PDF files over 1MB in size in the /home directory:

find /home -type f -name "*.pdf" -size +1M

Here -type f selects regular files only (not directories), -name matches the file name against a pattern, and -size filters by file size. Many additional tests are available.

Selecting Files by Name and Path

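The -name and -path options match files by file name or by full path. -name compares only the final path component, while -path compares the whole path, directories included. Shell glob patterns like *.txt and *.config can be used to match categories of files; quote them so the shell passes them to find unexpanded.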
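The difference between the two tests is easiest to see on a small tree. The following is a minimal sketch; the directory and file names are invented for the demonstration and everything is created under a temporary directory.

```shell
#!/usr/bin/env bash
set -eu

# Build a throwaway tree (all names here are made up for the demo).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/projects/alpha" "$demo/archive/alpha"
touch "$demo/projects/alpha/notes.txt" \
      "$demo/projects/alpha/app.config" \
      "$demo/archive/alpha/notes.txt"

# -name tests only the last path component, so both notes.txt files match.
by_name=$(find "$demo" -type f -name '*.txt' | sort)

# -path tests the whole path as find prints it, so a directory name can be
# part of the pattern; '*' in -path also matches across '/' characters.
by_path=$(find "$demo" -type f -path '*/projects/*.txt')

printf 'by name:\n%s\n\nby path:\n%s\n' "$by_name" "$by_path"
```

The -name search should list both copies of notes.txt, while the -path search should list only the one under projects.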

Filtering by Size and Date

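You can match files above or below a size threshold using -size: +100M finds files larger than 100 MiB, and a leading - selects smaller files. Sizes are rounded up to whole units before comparing, so -size -1M matches only empty files. The -mtime option selects by when a file's content was last modified, and -ctime by when its status (permissions, ownership, and so on) last changed. Both count in days: -mtime -7 means less than 7 days ago, and -mtime +7 means more than 7 days ago.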
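As a rough illustration of the size and date tests, the sketch below creates files of known size and age in a temporary directory. The file names and the backdated timestamp are arbitrary choices for the demo.

```shell
#!/usr/bin/env bash
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# A 2 MiB file and a 10 KiB file (sizes chosen arbitrarily).
head -c 2097152 /dev/zero > "$demo/big.bin"
head -c 10240 /dev/zero > "$demo/small.bin"

# One fresh file and one backdated to 1 January 2020.
touch "$demo/recent.log"
touch -t 202001010000 "$demo/old.log"

# Larger than 1 MiB. The 10 KiB file rounds up to one unit, so it is not
# "more than 1M" and does not match.
large=$(find "$demo" -type f -size +1M)

# Content modified less than 30 days ago, and more than 30 days ago.
recent=$(find "$demo" -type f -name '*.log' -mtime -30)
stale=$(find "$demo" -type f -name '*.log' -mtime +30)

printf 'large:  %s\nrecent: %s\nstale:  %s\n' "$large" "$recent" "$stale"
```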

Querying Permissions and User/Group Ownership

The permissions, owning user, and owning group can also be used to select files. For example, -perm -002 finds world-writable files, -user and -group match a specific owner, and -nouser finds files whose numeric owner ID does not correspond to any user account on the system.
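The permission and ownership tests can be tried safely on scratch files. This is a small sketch with made-up file names; -nouser is left out because demonstrating it would need a file owned by a deleted account.

```shell
#!/usr/bin/env bash
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

touch "$demo/private.txt" "$demo/shared.txt"
chmod 600 "$demo/private.txt"
chmod 666 "$demo/shared.txt"   # world-writable on purpose

# -perm -002: the "other write" bit must be set; other bits are ignored.
world_writable=$(find "$demo" -type f -perm -002)

# -perm 600 (no prefix): the mode must be exactly rw-------.
exact=$(find "$demo" -type f -perm 600)

# -user accepts a name or a numeric ID; here, the current user's ID.
mine=$(find "$demo" -type f -user "$(id -u)" | wc -l)

printf 'world-writable: %s\nexactly 600:    %s\nowned by me:    %s\n' \
  "$world_writable" "$exact" "$mine"
```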

Combining Tests for More Precise Targeting

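Multiple tests can be combined with Boolean operators: -a for AND (also the default when two tests are simply written next to each other), -o for OR, and ! for NOT. AND binds more tightly than OR, so group alternatives with escaped parentheses, \( ... \), when the default precedence is not what you want.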
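Operator precedence is a common source of surprises, so here is a small sketch that runs the same search with and without grouping. The file names are invented for the demo.

```shell
#!/usr/bin/env bash
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# Two 2 KiB files and two empty files.
head -c 2048 /dev/zero > "$demo/big.txt"
head -c 2048 /dev/zero > "$demo/big.md"
: > "$demo/tiny.txt"
: > "$demo/tiny.md"

# Without grouping, AND binds first: (-type f AND -size +1k AND *.txt) OR *.md,
# so the empty tiny.md slips through.
ungrouped=$(find "$demo" -type f -size +1k -name '*.txt' -o -name '*.md' | sort)

# With grouping, the size test applies to both name patterns.
grouped=$(find "$demo" -type f -size +1k \( -name '*.txt' -o -name '*.md' \) | sort)

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```

The ungrouped form should return three files and the grouped form only the two large ones.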

Synchronizing Files with rsync

The rsync utility can synchronize files between directories on the same or different systems quickly and efficiently. It uses an algorithm to minimize data transfer by only transferring the portions of files that have changed.

Key rsync Options

Here are some key options for controlling rsync’s synchronization behavior:

  • -r: Synchronize directories recursively
  • -l: Copy symlinks as symlinks
  • -p: Preserve permissions (use -o, -g, and -t for owner, group, and times, or -a, archive mode, for all of these together with -r and -l)
  • -z: Compress for efficient transfer
  • --partial: Keep partially transferred files

Specifying Source and Destination

The basic syntax for rsync is:

rsync [options] source destination

Here source and destination are local paths or remote locations of the form host:path. A trailing slash on a source directory means "copy the contents of this directory"; without it, rsync copies the directory itself into the destination.
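The trailing-slash rule is worth trying once on scratch data. The sketch below copies the same directory both ways into two temporary destinations; all paths are made up, and the script simply exits if rsync is not installed.

```shell
#!/usr/bin/env bash
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not found; skipping demo"; exit 0; }

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/src/docs" "$demo/with_slash" "$demo/without_slash"
echo "hello" > "$demo/src/docs/readme.txt"

# Trailing slash on the source: copy the contents of docs/.
rsync -a "$demo/src/docs/" "$demo/with_slash/"

# No trailing slash: copy the directory docs itself into the destination.
rsync -a "$demo/src/docs" "$demo/without_slash/"

# Show where readme.txt ended up in each case.
find "$demo/with_slash" "$demo/without_slash" -type f | sort
```

In the first destination readme.txt should sit at the top level; in the second it should sit inside a docs subdirectory.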

Exclude Options

Sometimes you want to sync everything except certain subdirectories or patterns. The --exclude option is very useful for this:

rsync -r /source/dir /dest --exclude='temp*'

This will sync everything under /source/dir except files and directories matching temp*. Because the source has no trailing slash, the copy lands in /dest/dir.

Combined find and rsync for Flexible File Copying

By chaining find and rsync together, very precise and flexible synchronization scenarios can be implemented to only copy subsets of files matching desired criteria.

Passing find Results to rsync

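Output from find can be passed to rsync through a pipe. rsync reads the list with --files-from=- and treats each entry as a path relative to the source directory, so run find from inside that directory and still give rsync both a source and a destination:

cd /src && find . -type f -name '*.mp3' -mtime -30 -print0 | rsync -a --from0 --files-from=- . /dest/

This will copy mp3 files modified in the last 30 days from /src to /dest, keeping their relative paths. The -print0 and --from0 pair separates names with NUL bytes, so file names containing newlines or other unusual characters survive the pipe.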
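To see the pipe at work without touching real data, the following sketch builds a small source tree in a temporary directory and copies only the recent mp3 files. The file names and the backdated timestamp are assumptions for the demo, and the script exits early if rsync is not installed.

```shell
#!/usr/bin/env bash
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not found; skipping demo"; exit 0; }

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/src/rock" "$demo/src/jazz" "$demo/dest"
touch "$demo/src/rock/new.mp3" "$demo/src/jazz/cover.jpg"
touch -t 202001010000 "$demo/src/jazz/old.mp3"   # backdated, should be skipped

# Run find inside the source directory so the list holds relative paths,
# which rsync resolves against its source argument. -print0 and --from0
# keep unusual file names intact.
(cd "$demo/src" && find . -type f -name '*.mp3' -mtime -30 -print0) |
  rsync -a --from0 --files-from=- "$demo/src/" "$demo/dest/"

# List what arrived at the destination.
find "$demo/dest" -type f
```

Only rock/new.mp3 should appear under the destination, with its subdirectory preserved.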

Copy Files Based on Size, Date, Name Patterns

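More complex file matching expressions can be built with find to copy particular file types, date ranges, sizes, etc. For example, to archive files over 500 KiB whose names contain doc or end in pdf:

find . -type f -size +500k \( -name '*doc*' -o -name '*pdf' \) -print0 | rsync -a --from0 --files-from=- . some-server:/archive/

The parentheses are needed because AND binds more tightly than OR; without them the size test would apply only to the *doc* pattern.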

Optimizing Performance

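When rsync is handed a whole directory together with include or exclude filters, it has to walk the entire source tree to decide what to transfer. Feeding it an explicit list through --files-from avoids that walk on the rsync side, because only the listed paths are examined:

cd /src && find . -type f -name '*.mp3' -print0 | rsync -a --from0 --files-from=- . /dest/

The --prune-empty-dirs option does not shorten the scan. Its job is to stop rsync from creating directories at the destination that would end up empty after filtering, which matters for filter-based copies rather than for --files-from lists of plain files.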
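What --prune-empty-dirs actually changes is easiest to see with a filter-based copy rather than a file list. The sketch below runs the same include/exclude filters with and without the option; the directory names are invented, and the script exits early if rsync is not installed.

```shell
#!/usr/bin/env bash
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not found; skipping demo"; exit 0; }

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/src/music" "$demo/src/photos" "$demo/filtered" "$demo/pruned"
touch "$demo/src/music/song.mp3" "$demo/src/photos/pic.jpg"

# Filters only: every directory is included so rsync can descend into it,
# which means photos/ is created even though nothing in it matches.
rsync -a --include='*/' --include='*.mp3' --exclude='*' \
  "$demo/src/" "$demo/filtered/"

# Same filters plus --prune-empty-dirs: directories left empty are dropped.
rsync -a --prune-empty-dirs --include='*/' --include='*.mp3' --exclude='*' \
  "$demo/src/" "$demo/pruned/"

# Compare the two destination trees.
find "$demo/filtered" "$demo/pruned" | sort
```

The first destination should contain an empty photos directory; the second should not.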

Use Cases and Practical Examples

Here are some examples of useful selective synchronization scenarios with find and rsync:

Mirror Document Directory, Exclude Temp Files

rsync -a --exclude='temporary*' /document/dir /mirror

Sync Last 30 Days of Photos

cd /photos && find . -type f -mtime -30 -print0 | rsync -a --from0 --files-from=- . /external-drive/

Backup Configuration Files from Past Week

 
cd /etc && find . -type f -name '*.conf' -mtime -7 -print0 | rsync -a --from0 --files-from=- . /backup/

Sync Select Parts of Servers

rsync -r /var/www/html/css/ production-web:/var/www/newsite/css/

As demonstrated, by combining find, rsync, and creative command chaining, very powerful and customized file synchronization solutions can be implemented for backups, mirroring, replication, and more.
