Word Splitting Internals: How Bash and POSIX Shells Handle IFS

Understanding Word Splitting

Word splitting is the step in which bash and other POSIX-compatible shells break the results of unquoted expansions into separate words, also called fields. After the shell expands a variable, a command substitution, or an arithmetic expansion outside double quotes, it scans the result for delimiter characters and turns each delimited piece into its own argument. Understanding how word splitting works is crucial for writing robust shell scripts that handle input safely.

Word splitting is distinct from the tokenizing the shell performs when it first reads a command, whether that command is typed at an interactive prompt or comes from a script. That initial parsing separates tokens on unquoted blanks and operators and is not affected by IFS. Word splitting happens later, on the results of parameter expansion, command substitution, and arithmetic expansion that are not enclosed in double quotes. The delimiters it uses come from a special shell variable called IFS (Internal Field Separator), which by default contains a space, a tab, and a newline.

Because word splitting acts on the expanded value, it can produce results the script author did not intend. If the value contains spaces or other IFS characters, it is split into multiple words rather than being preserved as a single word, and the command receives more arguments than expected. Enclosing the expansion in double quotes inhibits word splitting where a value should be treated as a single word.
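
The effect is easy to observe by counting the arguments a command receives. The sketch below uses a hypothetical helper, count_args, which is not part of any shell; it only reports how many arguments it was given. It assumes IFS still has its default value.

```shell
# count_args is a hypothetical helper: it prints the number of arguments it received.
count_args() { echo "$#"; }

greeting="hello big world"

unquoted=$(count_args $greeting)     # unquoted expansion is split into three fields
quoted=$(count_args "$greeting")     # double quotes keep the value as one field

echo "unquoted: $unquoted, quoted: $quoted"   # prints: unquoted: 3, quoted: 1
```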

The Internal Implementation

Understanding how the shell performs word splitting internally gives script authors more control over this central behavior and makes it easier to write scripts that account for it.

Word splitting is one stage of the shell’s expansion sequence. It runs after parameter expansion, command substitution, and arithmetic expansion, and before pathname expansion. The shell scans the text those expansions produced outside double quotes, treats each character of IFS as a delimiter, and divides the text into fields. The key configuration that controls what counts as a delimiter is the IFS variable.

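IFS stands for Internal Field Separator. Each character in its value acts as a field delimiter during word splitting. The default value consists of a space, a tab, and a newline, but it can be set to other characters. Space, tab, and newline in IFS are treated as IFS whitespace: a run of them counts as a single delimiter, and leading and trailing ones are ignored. Any other character in IFS delimits a field each time it appears, so two adjacent delimiters produce an empty field.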
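
A short sketch of how the IFS value changes the outcome for the same string. The count_args helper and the sample value are illustrative assumptions, not part of any shell. The example unsets IFS to get default behavior, which relies on the POSIX rule that an unset IFS acts like space, tab, and newline.

```shell
count_args() { echo "$#"; }     # hypothetical helper: number of arguments received

mixed=$(printf 'a b\tc\nd')     # one value containing a space, a tab, and a newline

unset IFS                       # an unset IFS behaves like the default space-tab-newline
default_count=$(count_args $mixed)      # 4 fields: a, b, c, d

IFS=' '                         # only the space is a delimiter now
space_only_count=$(count_args $mixed)   # 2 fields: "a" and "b<TAB>c<NEWLINE>d"

unset IFS                       # back to default behavior
echo "default: $default_count, space only: $space_only_count"
# prints: default: 4, space only: 2
```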

A critical distinction is the one between unquoted and quoted expansions. When an expansion is unquoted, the shell splits its result on spaces or other IFS delimiters. Enclosing the expansion in double quotes inhibits this splitting, so a value containing spaces or other IFS characters is preserved as a single word.

Examples and Demonstrations

Seeing some examples of word splitting at work helps illustrate this key shell behavior.

Here is a simple example showing word splitting in action:

input="one two three"
for x in $input; do
  echo "$x"
done

This loops over the split words in $input, printing “one”, “two”, and “three” on separate lines. Because $input is unquoted in the for list, its value undergoes word splitting on spaces.

We can see the difference with quotes:

input="one two three"
for x in "$input"; do
  echo "$x"
done

Now it preserves the entire string as a single word, printing “one two three”.

The characters that split words are taken from IFS, so changing IFS changes where a value is split:

IFS=.
input=127.0.0.1
echo $input

This prints “127 0 0 1” on a single line. The value is split on the periods into four separate arguments, and echo joins its arguments with spaces when printing them.
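
Non-whitespace delimiters such as a period or a colon behave slightly differently from spaces: two in a row produce an empty field. The following is a minimal sketch that uses printf so each field appears on its own line. The sample record is made up for illustration, and set -- is used only as a convenient way to capture the fields as positional parameters.

```shell
IFS=:                                  # split on colons only
record="alice::1000: /home/alice"      # hypothetical colon-separated record

set -- $record                         # the split fields become $1, $2, ...
field_count=$#                         # 4, including the empty second field

printf '[%s]\n' "$@"
# prints:
# [alice]
# []
# [1000]
# [ /home/alice]

unset IFS                              # return to default splitting behavior
```

The space before /home/alice survives because a space is no longer a delimiter once IFS contains only a colon.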

These examples demonstrate how unquoted expansions are split on the characters in IFS, so that spaces, tabs, and other delimiters break a value into separate words.

Controlling Word Splitting Behavior

While word splitting is a default shell behavior, there are ways to inhibit or control it when needed.

The most common method is quoting variables and strings to prevent splitting. Both double quotes and single quotes work. Double quotes still allow parameter expansion, command substitution, and arithmetic expansion inside them, but the result is not split. Single quotes treat every character literally, so no expansion takes place at all.
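
The difference between the two quoting styles can be sketched with a single variable. The variable names and the sample value are illustrative only.

```shell
name="Ada Lovelace"

double="Hello, $name"    # double quotes: $name is expanded, the result is not split
single='Hello, $name'    # single quotes: the text is kept literally, no expansion

printf '%s\n' "$double" "$single"
# prints:
# Hello, Ada Lovelace
# Hello, $name
```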

Another method is changing the IFS value itself, either globally or temporarily within the context of a script or subshell. Setting IFS to a single space rather than space-tab-newline causes word splitting to only occur on spaces.
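
Two common ways to keep an IFS change from leaking into the rest of a script are sketched below: confining it to a subshell, and prefixing it to a single read command. The variable names and the sample value are assumptions made for the example.

```shell
path_like="/usr/bin:/bin:/usr/local/bin"   # hypothetical colon-separated list
ifs_before=$IFS                             # kept only to show IFS is unchanged afterwards

# 1. Subshell: the assignment to IFS disappears when the command substitution ends.
first_dir=$(IFS=:; set -- $path_like; printf '%s' "$1")

# 2. Command prefix: IFS is changed only for this one read.
IFS=: read -r d1 d2 d3 <<EOF
$path_like
EOF

printf '%s\n' "$first_dir" "$d1" "$d2" "$d3"
# prints:
# /usr/bin
# /usr/bin
# /bin
# /usr/local/bin
```

Both forms leave the IFS of the surrounding script as it was, which avoids having to save and restore the old value by hand.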

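Two related features are often used alongside word splitting. The -f option to set disables pathname expansion (globbing), so the fields produced by splitting are not then matched against filenames; it does not change how splitting itself works. The read builtin also splits its input line into fields using IFS, and in bash its -d option sets the character that ends the input line in place of newline.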
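
A sketch of both features in use. Here set -f is shown doing what it is documented to do, which is turning off pathname expansion while leaving splitting in place, and read -d is shown ending records at a semicolon. The read -d part requires bash; the sample values are made up for illustration.

```shell
# Requires bash for read -d; the set -f part works in any POSIX shell.
patterns="*.txt *.log"           # hypothetical value containing glob characters

set -f                           # turn off pathname expansion
set -- $patterns                 # still split into two fields, but * is not expanded
set +f                           # turn pathname expansion back on
pattern_count=$#
first_pattern=$1

# read -d ';' ends each record at a semicolon instead of a newline.
# IFS= keeps read from trimming leading and trailing whitespace.
records=$(printf 'a b;c d;' | while IFS= read -r -d ';' rec; do
  printf '<%s>\n' "$rec"
done)

printf '%s\n' "$pattern_count" "$first_pattern" "$records"
# prints:
# 2
# *.txt
# <a b>
# <c d>
```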

Understanding these controls allows a script to apply word splitting only where it is wanted.

Why Word Splitting Matters

While word splitting happens behind the scenes, its effects have far-reaching consequences for scripting and for interactive use of the shell.

Double quoting variables is widely recognized as a best practice in shell scripting because it avoids errors and unintended splitting on spaces and other delimiters.
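
The same practice applies when a function or script passes its own arguments on to another command. The sketch below compares $@ with "$@"; all three function names are hypothetical helpers written for this example, and default IFS is assumed.

```shell
count_args() { echo "$#"; }               # hypothetical helper: number of arguments received
forward_unquoted() { count_args $@; }     # each argument is split again on IFS
forward_quoted()   { count_args "$@"; }   # each argument is passed on intact

unquoted_count=$(forward_unquoted "one two" three)
quoted_count=$(forward_quoted "one two" three)

echo "unquoted: $unquoted_count, quoted: $quoted_count"   # prints: unquoted: 3, quoted: 2
```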

Because it happens by default and silently, word splitting can change the arguments a command receives without any visible error. Tracking down mysterious splits and missing words can take considerable effort.

Scripts attempting to parse structured data need to account for potential word splitting through validation, quoting, and careful handling during processing.
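
One way to apply this to simple delimited data is to let read do the splitting with a per-command IFS and to quote every variable afterwards. The sample data and variable names below are assumptions. The sketch only handles plain comma-separated fields; data with quoted or escaped commas would need a real parser.

```shell
# Hypothetical sample data: three comma-separated fields per line.
csv=$(printf 'alice,30,London\nbob,25,New York')

summary=$(
  printf '%s\n' "$csv" | while IFS=, read -r name age city; do
    # Each variable is quoted, so "New York" stays a single argument.
    printf '%s is %s and lives in %s\n' "$name" "$age" "$city"
  done
)

printf '%s\n' "$summary"
# prints:
# alice is 30 and lives in London
# bob is 25 and lives in New York
```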

Understanding exactly when and where word splitting occurs based on unquoted expansions and IFS values leads to robust scripts. Expecting and properly handling this default shell behavior is key to success.