Effectively Using Rsync Filters To Selectively Copy Files

What rsync Filters Allow You To Do

The rsync utility allows you to efficiently transfer and synchronize files between locations. One of its most powerful features is the ability to selectively choose which files to copy or ignore during the transfer process. This is accomplished through versatile include and exclude filtering options.

Rsync filters provide granular control over file selection. You can choose to copy only files matching defined include filters while omitting any files matching exclude filters. This allows you to craft precise rules to only transfer the files you need without wasting time and bandwidth on unnecessary files.

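With properly constructed filters, rsync can copy only specific file types, directories, or filenames based on flexible pattern matching rules. The filters match on names and paths only, using shell-style wildcards, character classes, and the ** wildcard. Regular expressions are not supported, and rsync has no filter for modification dates, so date-based selection needs either dates embedded in filenames or an external tool such as find.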

File selection filters limit rsync to your critical data and skip extraneous files, which improves transfer efficiency. Understanding rsync’s pattern matching syntax and how include and exclude rules combine lets you write custom copy rules that automate complex file selection tasks.

Understanding Include/Exclude Pattern Matching

Rsync’s include and exclude filters use file-matching patterns to determine which files to transfer or ignore. These patterns allow different levels of flexibility:

  • Literal filenames: Matched exactly with no wildcard elements.
  • Shell-style wildcards: Basic ? and * wildcards to match groups of characters.
  • Character classes and **: Bracket sets such as [0-9] or [[:alpha:]], plus ** to match across directory levels.

The more precise the pattern, the more exacting rsync’s file matching logic. Literal names only apply to the exact filenames provided. Shell wildcards allow looser matching: ? matches any single character except a slash, and * matches zero or more characters but stops at a slash.

Rsync filters do not support regular expressions. The most flexible tools available are character classes in square brackets and the ** wildcard, which matches any run of characters including slashes. Constructs such as quantifiers, grouping, and alternation have no equivalent; alternatives are written as several separate rules.

Literal Filename Matching

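The simplest rsync filter is a literal filename. Some examples:

report-2023.pdf
archive.zip
index.html

Rsync compares these names directly. A pattern without a slash is matched against the last component of each path, so index.html matches a file of that name in any directory; add a leading slash (/index.html) to match only at the top of the transfer. A file that matches no rule is transferred by default, so a literal include pattern only narrows the copy when it is paired with an exclude rule.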

Shell-Style Wildcard Matching

Shell-style wildcards extend pattern matching capabilities in rsync filters. The ? and * characters stand in for characters you do not want to spell out.

? matches exactly one character and * matches zero or more characters; neither matches a slash. So you can craft flexible file filters like:

report-202?.pdf   (Matches report-2023.pdf, report-2024.pdf etc.)
arch*.zip         (Matches archive.zip, archives.zip, archdata.zip etc.)
index.*           (Matches index.html, index.php etc.)

This makes it easier to match multiple similar files without defining every literal possibility.

Character Classes and ** in Place of Regular Expressions

Rsync does not support regular expressions, Perl-style or otherwise. Patterns that look like regex are interpreted as wildcard patterns and will not behave as intended.

Beyond ? and *, patterns may use bracket character classes such as [3-6] or [[:digit:]], and ** to match any run of characters including slashes. A trailing dir/*** matches the directory and everything below it. There are no anchors, quantifiers, grouping, or alternation; a leading slash ties a pattern to the top of the transfer, and alternatives are expressed as separate rules.

Some examples of more surgical file matching, and how common regex ideas translate:

report-202[3-6].pdf         (Matches report years 2023 to 2026)
archive.zip, archives.zip   (Two separate rules replace the alternation ive|ives)
index.html, index.php       (Two separate rules, one per extension)

This still allows tightly controlling which filenames a filter accepts. Understanding these constructs enables crafting filter patterns that match your target files despite naming variability.

Crafting Precise Include Filters with --filter

Rsync’s --filter argument takes a rule rather than a bare pattern: '+ PATTERN' includes and '- PATTERN' excludes. The --include=PATTERN and --exclude=PATTERN options are shorthand for these two rule types.

Rules are checked in the order given, and the first one that matches a file decides its fate. A file that matches no rule is transferred, so an include rule acts as a gatekeeper only when a later exclude rule, typically '- *', rejects everything the includes did not claim.

Some examples of using --filter with an include pattern:

rsync -a --filter='+ report-202[3-6].pdf' --filter='- *' /sourcedir/ /destdir
rsync -a --filter='+ *.zip' --filter='- *' /sourcedir/ /destdir
rsync -a --filter='+ index.html' --filter='+ index.php' --filter='- *' /sourcedir/ /destdir

As these demonstrate, include rules accept literals, wildcards, and character classes. Because '- *' also excludes subdirectories, these commands select only from the top level of /sourcedir/. Define your rules to match the core files required, omitting unrelated data. This improves transfer speed by ignoring non-essential content.

Filtering Specific File Types

A common scenario is using rsync filters to only transfer files of certain types while excluding all other file formats. This relies on matching the file extension. Because the closing '- *' rule would also exclude subdirectories, an earlier '+ */' rule keeps rsync descending into them, and -m (--prune-empty-dirs) drops directories that end up empty. Some examples:

# Only PDF reports
rsync -a -m --filter='+ */' --filter='+ *.pdf' --filter='- *' /sourcedir/ /destdir

# Only image media files
rsync -a -m --filter='+ */' --filter='+ *.jpg' --filter='+ *.png' --filter='+ *.gif' --filter='- *' /sourcedir/ /destdir

# Only HTML documents
rsync -a -m --filter='+ */' --filter='+ *.html' --filter='- *' /sourcedir/ /destdir

Adjust the patterns in the filters to recognize your target file types while skipping unrelated formats. This prevents wasted copy time on binaries, logs, and metadata files that are not needed in the transfer.

Filtering by File Dates

When mirroring live file directories, it is often useful to only sync files updated after a certain date threshold, while retaining older legacy files in the destination. Rsync filter rules see only names and paths, never modification times, so a date can be matched in a filter only when it is embedded in the filename.

# Only files whose names embed a 2023 date in YYYYMMDD form
rsync -a -m --filter='+ */' --filter='+ *2023[01][0-9][0-3][0-9]*' --filter='- *' /sourcedir/ /destdir

This filter uses character classes to match names containing a date stamp from 2023. Files without such a stamp are excluded. To select by actual modification time, build the file list with a tool such as find and hand it to rsync with --files-from.

Filtering by Directory Path

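If replicating a large multi-level directory, filtering on paths anchored at the root of the transfer allows synchronizing only files stored in targeted subfolders rather than the whole tree. A leading slash anchors a pattern to the transfer root, and a trailing /*** matches a directory together with everything beneath it.

# Only logs directory (including subdirs)
rsync -a --filter='+ /logs/***' --filter='- *' /sourcedir/ /destdir

# Only media assets directory (the parent /assets/ must be included to reach it)
rsync -a --filter='+ /assets/' --filter='+ /assets/media/***' --filter='- *' /sourcedir/ /destdir

This way entire directory subtrees can be included or excluded simply based on parent directory path filters.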

Excluding Unwanted Files with --exclude

The --exclude option omits files from the transfer. It is shorthand for --filter='- PATTERN' and needs no accompanying include rule, because rsync transfers everything that matches no rule. Think of --exclude as exception cases to handle data you specifically want to leave out of the transfer.

Like --filter, --exclude supports literal names, wildcards, and character classes:

  
rsync -a --exclude='thumbs.db' /sourcedir/ /destdir
rsync -a --exclude='/temp/' /sourcedir/ /destdir
rsync -a --exclude='*.lock' /sourcedir/ /destdir

Any file matching an exclude rule is dropped from the transfer, provided no include rule earlier on the command line matched it first. This provides precise control for cherry-picking the source directory contents.

Excluding System and Hidden Files

A common exclusion filter scenario is omitting hidden system files and dotfiles. These clutter transfers with temporary data, lock files, editor artifacts, and thumbnails.

  
# Exclude all dotfiles, at any depth
rsync -a --exclude='.*' /source/ /dest

# Exclude some common hidden files by name
rsync -a --exclude='.lock' --exclude='.tmp' --exclude='.cache' /source/ /dest

Now personal dotfiles and application config files stay behind, and rsync focuses on relevant media files, documents, and data files.

Excluding Prior Destination Data

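When mirroring a live data source into an existing destination directory, data that exists only at the destination is left alone by default: rsync removes nothing there unless you pass --delete. If you do mirror with --delete, an exclude rule also protects matching paths at the destination from deletion (unless --delete-excluded is given).

# /legacy/ is a placeholder for a destination-only directory you want to keep
rsync -a --delete --exclude='/legacy/' /source/ /destdir

This lets rsync update and prune everything else in /destdir while keeping existing data under /destdir/legacy that is not present under /source. Useful for incremental live directory mirroring.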

Excluding Alternate File Extensions

In cases where multiple file types contain the same root filenames, exclude rules help you choose one format over another.

# Sync everything except the .php copies
rsync -a --exclude='*.php' /source/ /dest

# Sync everything except the .jpeg copies
rsync -a --exclude='*.jpeg' /source/ /dest

This avoids duplicate copies with different extensions cluttering the destination.

Combining Include and Exclude for Precision

The power of rsync filters comes from stacking both inclusive and exclusive rules. Rules are checked in the order given and the first match decides, so place specific excludes ahead of the broader includes they carve exceptions from. This allows carefully controlling the source file selection down to precise filenames and paths.

Some examples combining filters:

# Sync JPGs excluding thumbs
rsync -a -m --exclude='*thumbs.jpg' --include='*/' --include='*.jpg' --exclude='*' /source/ /dest

# HTML files excluding templates
rsync -a -m --exclude='/template/' --include='*/' --include='*.html' --exclude='*' /source/ /dest

# Documents excluding temp content
rsync -a -m --exclude='*tmp' --include='*/' --include='*doc' --exclude='*' /source/ /dest

Construct the filters to maximize desired file matches while dropping irrelevant content. This orchestration of positive and negative filters creates accurate file selection rulesets.

Additional options like --include-from and --exclude-from read patterns from a file, one per line, which keeps long rule sets manageable. See the rsync man page for deeper capabilities.

Real-World Examples of Effective rsync Filters

In practice, rsync filters help automate precise mirrored backups, web site deployments, cloud data warehousing, and other synchronization workflows.

Web Site Asset Syncing

For web developers publishing live code repositories up to production servers, rsync filters prove useful for keeping only critical files in deployment.

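rsync -avz --delete --exclude='.git/' --exclude='.#*' --include='/assets/***' --include='/templates/***' --exclude='*' public/ staging.example.com:/var/www/example.com/html

This command keeps hidden .git repository folders and files whose names begin with .# (such as editor lock files) out of production, while including only the assets and templates directories needed to render the site. Because excluded paths are also protected from --delete, files on the server outside those two directories are left untouched.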

User Home Directory Backups

When backing up remote user home directories, filters help rsync focus on documents and settings while avoiding temporary or cached application files.

  
rsync -az --exclude='.cache' --exclude='*.swp' username@host:/home/username/ /backups/users/username/

Now the user’s personal files get mirrored, excluding editor swap files and cached application data residing in the user’s $HOME. Filter rules cannot restrict the backup to files updated this year; that needs a file list built with a tool such as find and passed through --files-from.

Cloud Storage Selective Sync

Mirroring subsets of data to cloud storage helps minimize costs and synchronization times. Filters make this simpler than full data duplication.

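rsync -ahuPz --exclude='thumbs.db' --include='*/' --include='*20230[12][0-3][0-9]*' --exclude='*' /data/ bucketname:/cloudstore/nas-subset

This sample is meant to run on the NAS itself, since rsync cannot copy between two remote hosts, and it assumes bucketname resolves to a host reachable over SSH rather than a raw object-storage bucket. It grabs only files whose names embed a date from the first two months of 2023 in YYYYMMDD form, great for cloud archiving; selecting by modification time instead requires a find-generated list passed via --files-from. Rsync filters enable precise control over storage sync workflows.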
