Common Shell Glob/Wildcard Syntax On Linux – When To Use Vs Regular Expressions
When to Use Globs vs Regular Expressions
Glob patterns, also known as wildcards, are a simple pattern matching syntax used to match file and directory names in Linux and other Unix-like operating systems. Regular expressions are a more advanced way of defining search patterns that support a wider range of matching capabilities.
Globs excel at simple matches using wildcards to select groups of files or directories based on common naming patterns. They allow the use of common wildcards like * and ? to match multiple or single characters. Globs work very well for tasks like listing the contents of directories and selecting groups of files to copy, move, or delete.
Regular expressions have a more complex syntax with capabilities such as anchors, alternation, grouping, and explicit repeat counts. They are better suited to matching precise text patterns than to selecting groups of files. They can define validation rules, search for matches within file contents, and express patterns that globs cannot.
In summary:
- Use globs when selecting groups of files/dirs based on naming patterns
- Use regular expressions for advanced/precise pattern matching
- Globs are simpler with limited matching logic
- Regular expressions have very advanced matching capabilities
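The split above can be illustrated with a minimal sketch: a glob selects files by name, while a regular expression (here via grep -E) searches inside a file. The file names and contents are hypothetical and are created in a scratch directory.

```shell
# Scratch directory with hypothetical files so nothing real is touched.
tmpdir=$(mktemp -d)
printf 'error 42\nall good\n' > "$tmpdir/app.log"
printf 'notes\n' > "$tmpdir/notes.txt"
printf 'draft\n' > "$tmpdir/draft.txt"

# Glob: the shell expands *.txt against file NAMES before echo runs.
txt_files=$(cd "$tmpdir" && echo *.txt)
echo "$txt_files"

# Regex: grep -E applies the pattern to file CONTENTS, line by line.
error_lines=$(grep -E '^error [0-9]+$' "$tmpdir/app.log")
echo "$error_lines"

rm -rf "$tmpdir"
```

The glob never looks inside a file, and the regex never looks at a file name.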
Glob Basics and Common Patterns
Globs rely on just a few key wildcards to define patterns:
- * – matches any string of characters, including an empty one
- ? – matches any single character
- [abc] – matches one character: a, b, or c
These wildcards can be combined together to match common groups of files. Some examples:
- * – matches all files in a directory except hidden ones (names starting with a period)
- *.txt – matches all names ending in .txt
- file? – matches file1, file2, etc
- [rf]ile – matches rile or file
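The patterns above can be tried safely in a scratch directory. This is a small sketch with hypothetical file names chosen to exercise each wildcard; it should run in any POSIX shell.

```shell
tmpdir=$(mktemp -d)
# Hypothetical file names, including one hidden file.
touch "$tmpdir/file" "$tmpdir/file1" "$tmpdir/file2" "$tmpdir/file10" \
      "$tmpdir/rile" "$tmpdir/notes.txt" "$tmpdir/.hidden"

all=$(cd "$tmpdir" && echo *)              # every name except the dotfile
txt=$(cd "$tmpdir" && echo *.txt)          # names ending in .txt
one=$(cd "$tmpdir" && echo file?)          # "file" plus exactly one character
set_match=$(cd "$tmpdir" && echo [rf]ile)  # rile or file

echo "all:       $all"
echo "txt:       $txt"
echo "one:       $one"
echo "set_match: $set_match"

rm -rf "$tmpdir"
```

Note that file? matches neither file (too short) nor file10 (too long).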
Regular Expression Basics
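Regular expressions use a larger set of special characters for matching text:
- . – matches any single character
- [] – matches one character from a set or range
- ^ – anchors the match at the start of the line or string
- $ – anchors the match at the end of the line or string
- * – repeats the preceding item zero or more times
- + – repeats the preceding item one or more times
- ? – makes the preceding item optional
- {} – repeats the preceding item a given number of times
- () – groups items together
- \ – escapes the next character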
Some examples of regex patterns:
- ^file – matches file at start of string
- txt$ – matches txt at end of string
- [0-9]+ – matches one or more digits
- (txt|png) – matches txt or png
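The example patterns can be checked with grep -E, which uses extended regular expressions. The sample strings below are hypothetical. The last test is an addition to the list above: it shows that a regex * repeats the preceding item instead of matching arbitrary text as a glob * does.

```shell
# Hypothetical sample strings, one per line.
samples='file_report
my_file
notes.txt
image.png
build42'

starts=$(printf '%s\n' "$samples" | grep -E '^file')      # file at the start
ends=$(printf '%s\n' "$samples" | grep -E 'txt$')         # txt at the end
digits=$(printf '%s\n' "$samples" | grep -E '[0-9]+')     # one or more digits
either=$(printf '%s\n' "$samples" | grep -E '(txt|png)')  # txt or png

# Regex 'file*' means "fil" followed by zero or more "e" characters,
# so it matches "fil" but, once anchored, not "file1".
star=$(printf 'fil\nfile1\n' | grep -E '^file*$')

printf '%s\n' "$starts" "$ends" "$digits" "$either" "$star"
```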
Strengths and Limitations of Each
Glob strengths:
- Simple syntax using common wildcards
- Easy to learn and implement
- Great for selecting groups of files
- Lightweight and fast
Glob limitations:
- Limited matching logic
- Mostly limited to matching filenames/paths, not file contents
- No advanced validation rules
Regex strengths:
- Very advanced matching capabilities
- Support complex matching logic
- Can validate/process file contents
- Well-suited for programming usage
Regex limitations:
- Complex syntax with steep learning curve
- Failed matches can be difficult to troubleshoot
- Not as user-friendly as globs
- Performance overhead for complex patterns
Recommended Use Cases
Use globs when:
- Selecting groups of files/dirs for common operations
- Matching typical filename patterns
- Writing simple scripts focused on filenames
- You need a fast and simple matching solution
Use regular expressions when:
- Defining validation checks within app/program logic
- Searching the contents of files for text patterns
- Matching precise or complex text patterns
- You need maximum control and flexibility in patterns
Glob Syntax and Examples
Let’s explore the syntax and usage for common glob patterns…
Star Wildcard for Matching Multiple Characters
The * wildcard matches any string of characters, including none, within a single file or directory name. It does not match across a / separator. For example:
- * – matches all filenames in a directory except hidden ones
- *.txt – matches all .txt files
- data* – matches data, datafile, datasheet, etc
The * can be very powerful for matching groups of files with similar naming patterns.
Question Mark for Matching Single Characters
The ? wildcard will match any single character in a filename/path string. For example:
- file? – matches file1, file2, fileX
- data?.csv – matches data1.csv, data2.csv
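This provides a simple way to match exactly one variable character. Note that data?.csv does not match data10.csv, because ? never matches more than one character.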
Square Brackets for Sets and Ranges
Square bracket globs match a defined set or range of characters:
- [abc] – matches just a, b or c characters
- [0-9] – matches digits 0 through 9
- [!a] – matches any single character except a (bash also accepts [^a])
Sets provide flexibility to match custom defined sets of characters.
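Bracket patterns can also be tested against plain strings with the shell's case statement, which applies the same glob rules. The sketch below uses a hypothetical helper named classify. It sets LC_ALL=C on the assumption that predictable ranges are wanted, because a range such as [a-z] can include other characters in some locales.

```shell
# Assumption: force the C locale so [a-z] means exactly a through z.
LC_ALL=C

# classify is a hypothetical helper; case picks the first pattern that matches.
classify() {
  case $1 in
    [0-9]*)  echo "starts with a digit" ;;
    [abc]*)  echo "starts with a, b, or c" ;;
    [!a-z]*) echo "starts with something other than a lowercase letter" ;;
    *)       echo "starts with another lowercase letter" ;;
  esac
}

classify 7up
classify banana
classify _tmp
classify zebra
```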
Curly Braces for More Complex Sets and Ranges
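Curly braces are expanded by the shell before globbing, in a step called brace expansion. Rather than matching existing names, they generate a list of words, which may themselves contain glob wildcards. Brace expansion is a feature of bash and similar shells, not of POSIX sh. Examples:
- {one,two} – expands to the words one and two
- {1..5} – expands to 1 2 3 4 5
- {a,b/{x,y}} – nested braces, expanding to a, b/x, and b/y
Because the words are generated whether or not matching files exist, braces are useful for creating names as well as selecting them.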
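A short sketch, assuming bash, shows how braces behave in practice: they produce words even when no files exist, whereas an unmatched bracket glob is left as literal text under bash's default settings.

```shell
#!/usr/bin/env bash
# Brace expansion generates words without consulting the filesystem.
words=$(echo {one,two})
range=$(echo log.{1..5})
nested=$(echo {a,b/{x,y}})

# A real glob in an empty scratch directory finds nothing, so bash
# (with default options) leaves the pattern unchanged.
tmpdir=$(mktemp -d)
glob=$(cd "$tmpdir" && echo log.[1-5])
rm -rf "$tmpdir"

printf '%s\n' "$words" "$range" "$nested" "$glob"
```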
Escaping Special Characters
\ is used to escape special glob characters and match them literally. For example:
- \* – matches a literal * character
- \\ – matches a \ character
This is useful when a filename contains a character that would otherwise be treated as a wildcard. Quoting the pattern in single or double quotes has the same effect.
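The effect of escaping can be seen with a file whose name contains a literal asterisk. The names below are hypothetical and live in a scratch directory.

```shell
tmpdir=$(mktemp -d)
# Hypothetical names: one of them contains a literal * character.
touch "$tmpdir/a*b" "$tmpdir/axxb" "$tmpdir/ab"

# Unescaped: * is a wildcard, so all three names match.
count=$(cd "$tmpdir" && set -- a*b && echo $#)

# Escaped or quoted: only the name with the literal * is meant.
escaped=$(cd "$tmpdir" && echo a\*b)
quoted=$(cd "$tmpdir" && echo 'a*b')

echo "unescaped matches: $count"
echo "escaped: $escaped"
echo "quoted:  $quoted"

rm -rf "$tmpdir"
```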
Globs in Practice
Now that we’ve covered glob syntax and matching rules, let’s see how they can be applied in real world Linux usage…
Listing Files in Directories
Globs are very useful for matching sets of files to list, view, open, or process in some way. For example, to list specific groups of files with the ls and find commands:
- ls *.txt – lists all text files
- find . -name "[a-z]*" -print – finds files whose names start with a lowercase letter
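The two commands differ in who interprets the pattern. In the sketch below (hypothetical file names, and -type f added as an assumption to skip directories), the shell expands the glob for ls, while the quoted pattern reaches find intact and is tested against every name find visits.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
touch "$tmpdir/alpha.txt" "$tmpdir/sub/beta.txt" "$tmpdir/sub/1gamma.txt"

# The shell expands *.txt in the current directory only, then runs ls.
listed=$(cd "$tmpdir" && ls *.txt)

# Quoting stops the shell from expanding the pattern; find applies it to
# the base name of each file it visits, including those in subdirectories.
found=$(cd "$tmpdir" && find . -type f -name '[a-z]*' | LC_ALL=C sort)

echo "$listed"
echo "$found"

rm -rf "$tmpdir"
```

Leaving the pattern unquoted would let the shell expand it first whenever a matching name exists in the current directory, which is rarely what is intended with find.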
Moving and Deleting Groups of Files
Common Linux file commands like mv and rm accept globs to select sets of files:
- mv *.tmp /archive – move all .tmp files to archive dir
- rm log.{1..5} – remove log.1 through log.5 files
Globs in Scripts and Commands
Globs can be used in shell scripts and custom commands to identify sets of files to process:
- for f in *.txt; do echo "$f"; done – iterate over .txt files
- cat data* > combined.csv – combine data files into one
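A slightly more defensive version of the loop is sketched below, with hypothetical file names. It quotes the loop variable so names containing spaces stay intact, and it skips the iteration that occurs when nothing matches and the pattern is left as literal text.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/my notes.txt" "$tmpdir/todo.txt"

count=0
for f in "$tmpdir"/*.txt; do
  # With no matches the pattern stays literal, so skip non-existent paths.
  [ -e "$f" ] || continue
  # Quoting "$f" keeps a name with spaces as a single word.
  printf 'processing: %s\n' "${f##*/}"
  count=$((count + 1))
done

# No .csv files exist, so the guard prevents any work from being done.
empty=0
for f in "$tmpdir"/*.csv; do
  [ -e "$f" ] || continue
  empty=$((empty + 1))
done

echo "txt files: $count, csv files: $empty"
rm -rf "$tmpdir"
```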
Additional Glob Features
There are some more advanced glob capabilities that enable additional flexibility…
Extended Globs
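Extended glob syntax, enabled in bash with shopt -s extglob, adds pattern-list operators for alternation, repetition, and negation. Recursive matching across subdirectories is a separate option, globstar.
- !(*.bak) – matches any name that does not end in .bak
- @(png|jpg|gif) – matches exactly one of png, jpg, or gif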
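A minimal sketch of extended globs, assuming bash with the extglob option. The file names are hypothetical. The option is set on its own line because bash must have it enabled before it parses a line that uses the extended syntax.

```shell
#!/usr/bin/env bash
shopt -s extglob   # must be enabled before the patterns below are parsed

tmpdir=$(mktemp -d)
touch "$tmpdir/a.png" "$tmpdir/b.jpg" "$tmpdir/c.gif" "$tmpdir/d.txt"

# @(...) matches exactly one of the alternatives.
images=$(cd "$tmpdir" && echo *.@(png|jpg|gif))

# !(...) matches anything except the listed patterns.
others=$(cd "$tmpdir" && echo !(*.png|*.jpg|*.gif))

echo "images: $images"
echo "others: $others"

rm -rf "$tmpdir"
```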
Globstar (**)
The globstar ** wildcard, enabled in bash with shopt -s globstar, matches recursively down directory subtrees. For example:
- **/logs/**/*.log – matches every .log file at any depth beneath any directory named logs
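The recursive pattern can be tried on a small hypothetical tree. This sketch assumes bash 4.0 or later, since older versions have no globstar option.

```shell
#!/usr/bin/env bash
shopt -s globstar   # assumption: bash 4.0 or later

tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/app/logs/2024" "$tmpdir/docs"
touch "$tmpdir/app/logs/run.log" \
      "$tmpdir/app/logs/2024/jan.log" \
      "$tmpdir/docs/readme.log"

# ** matches zero or more directory levels, so both log files under
# app/logs are found; docs/readme.log is not under a logs directory.
logs=$(cd "$tmpdir" && printf '%s\n' **/logs/**/*.log | LC_ALL=C sort)

echo "$logs"
rm -rf "$tmpdir"
```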
Dotglob
The dotglob shell option enables globs to also match dotfiles/directories that start with a period (.). Example:
- shopt -s dotglob
- ls *
With dotglob set, * also matches names such as .bashrc, though never the special entries . and ..
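The difference is easy to see by expanding the same pattern before and after setting the option. This sketch assumes bash and uses hypothetical file names.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
touch "$tmpdir/.env" "$tmpdir/visible.txt"

before=$(cd "$tmpdir" && echo *)   # dotfiles are skipped by default
shopt -s dotglob
after=$(cd "$tmpdir" && echo *)    # dotfiles are now included
shopt -u dotglob                   # restore the default behaviour

echo "before: $before"
echo "after:  $after"

rm -rf "$tmpdir"
```

Unsetting the option afterwards avoids surprising later commands such as rm * in the same session.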
Conclusion and Recommendations
In summary, shell glob patterns provide a simple yet powerful way to match filenames and paths on Linux/Unix systems. They excel at selecting groups of files or directories based on naming patterns commonly seen on these OSes. Globs are easier to write and read than regular expressions in many file processing situations.
Some key recommendations on effective usage of globs:
- Learn basic * and ? wildcards first
- Use globs for simple file groups in scripts/commands
- Combine multiple globs for advanced matches
- Utilize braces and brackets for custom sets
- Enable extended globs and globstar when needed
For more advanced use cases than globs can provide, regular expressions may be better suited to match files and content based on complex or precise patterns. But globs provide excellent simple pattern matching rooted in the Linux/Unix philosophy.