Passing Variables Between Subshells In Bash Pipelines

Bash pipelines allow commands to be chained together by connecting the stdout of one process to the stdin of another. This allows efficient data processing flows to be constructed. However, variables set in one part of a pipeline are not always available further down the pipeline.

This loss of variable scope is caused by subshells – the separate Bash instances that are spawned to run each command in the pipeline. As each command runs in its own subshell, variables set in one part of the pipeline do not persist to other sections.

There are, however, ways to preserve variable values across the subshells of a Bash pipeline. Once you understand why variables get lost, techniques such as export can be used to pass data along to later commands. This makes it possible to build reusable pipelines in which metadata travels alongside the main data flow.

Why Variables Get Lost in Pipelines

To understand why variables get lost when piping between Bash commands, we must consider how pipelines and subshells work at a lower level.

In a pipeline like:

cmd1 | cmd2 | cmd3

The stdout of cmd1 becomes the stdin of cmd2, and the stdout of cmd2 feeds into cmd3. This allows data to flow efficiently from one process to the next.

What is happening underneath is:

  1. A subshell is created for cmd1, which runs and outputs data to stdout
  2. A pipe is opened and stdout is redirected from cmd1 into it
  3. A subshell for cmd2 is created, with its stdin connected to the read end of the pipe
  4. cmd2 runs and outputs data to stdout, which is redirected into a second pipe
  5. A subshell for cmd3 is created, with its stdin connected to the read end of the second pipe
  6. cmd3 runs and consumes the data from cmd2

A key thing to notice is that cmd1, cmd2 and cmd3 run in separate subshells. These subshells are child processes forked from the parent Bash process to run each command.

Being separate processes, each subshell has its own environment and set of variables. So while code within a single Bash script shares state, the subshells spawned for each part of a pipeline are isolated from each other.

This means a variable set inside the subshell running cmd1 is not visible in the subshell running cmd2 (or in the parent shell), as each exists in a separate environment. Data flows through the pipes, but metadata stored in shell variables does not.
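One way to see this separation directly (assuming Bash 4 or newer, which provides the $BASHPID variable) is to print the process ID seen by each side:

echo "parent shell PID:   $BASHPID"
echo "pipeline stage PID: $BASHPID" | cat    # expands inside the stage's subshell, so a different PID

Note that $$ would not show the difference, since it always reports the original shell's PID even inside subshells.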

How Subshells Cause Variable Scope Issues

To illustrate how subshells can cause problems with variable scope in pipelines, consider this example:

NAME="John"
echo "Hello $NAME" | read NAME; echo $NAME

We might expect read to capture the piped text into NAME, so that the final echo prints:

Hello John

But instead it prints:

John

What happens is:

  1. The echo runs in a subshell that inherits $NAME from the parent, so it expands to "Hello John"
  2. That output goes into the pipe rather than to the terminal
  3. The read command runs in another subshell, and it does assign "Hello John" to NAME, but only inside that subshell
  4. When the pipeline finishes, the subshell exits and its copy of NAME is discarded
  5. The final echo runs in the parent shell, where NAME still holds its original value, so it prints "John"

This demonstrates that while pipelines pass data streams between commands, variable assignments made inside a pipeline stage are lost as soon as its subshell exits.
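To make the boundary visible, we can print NAME both inside and after the pipeline (the same pattern, just instrumented):

NAME="John"
echo "Hello $NAME" | { read NAME; echo "inside the pipeline: $NAME"; }
echo "after the pipeline:  $NAME"

# inside the pipeline: Hello John
# after the pipeline:  John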

Passing Variables Between Subshells

While subshells normally have isolated environments, there are methods to pass variables from one pipeline section to the next.

The key mechanism for sharing variables with child processes is export. Exporting a variable places it in the environment that the current shell passes to every process it spawns. Pipeline stages written as shell code are forked copies of the shell and inherit all of its variables anyway, but external programs run in a pipeline only see the variables that have been exported.

Note that export only works in one direction, from parent to child. It cannot make the previous example behave as hoped, because the problem there is the reverse direction: a value assigned inside a pipeline subshell never flows back up to the parent. What export does provide is a way to push values down into every stage of a pipeline, including stages that are separate programs:

export NAME="John"
echo "Hello" | bash -c 'read GREETING; echo "$GREETING $NAME"'

This prints:

Hello John

The second stage here is an entirely separate bash process; it can see $NAME only because the variable was exported. Exporting variables before a pipeline starts is therefore a way to pass metadata down a processing chain.
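The same applies to other external tools. As a small illustration (the LABEL variable is made up for this example), awk can read exported variables through its ENVIRON array:

export LABEL="batch-42"
printf '%s\n' 10 20 30 | awk '{ sum += $1 } END { print ENVIRON["LABEL"], "total:", sum }'
# batch-42 total: 60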

Exporting Variables to Child Processes

The export command marks a variable to be exported to child processes from the current shell environment. This allows descendant Bash instances to inherit variables originally set at higher levels.

Exporting variables can be done in two main ways:

  1. At assignment time: export NAME="John"
  2. For an existing variable, by naming it: export NAME (several names can be listed at once, e.g. export NAME COUNT)

The same variable persistence rules apply recursively down chains of subshells. An exported variable in a parent script or pipeline stage is inherited by all children.

This means we can set metadata variables in an ancestral shell instance or at the start of a pipeline, export them, and descendant subshells will retain this information.
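As a quick sketch of that inheritance chain (PIPELINE_ID is a made-up name):

export PIPELINE_ID="run-001"
( ( echo "two subshells down: $PIPELINE_ID" ) )          # forked subshells inherit every variable
bash -c 'echo "separate bash process: $PIPELINE_ID"'     # sees it only because it was exported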

For example, a script could process data files and at each stage export information like:

  • $FILENAME – The name of the current file
  • $FILECOUNT – Total files processed
  • $TIMESTAMP – The time this pipeline started

And this metadata is propagated down to all subshells, allowing tracking of state as files are handled in the pipeline.
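A rough sketch of the idea, with made-up values; any later stage, even a completely separate bash process, can read all three:

export FILENAME="data1.csv"
export FILECOUNT=3
export TIMESTAMP="$(date +%s)"

printf 'a,1\nb,2\nc,3\n' | bash -c 'read -r first; echo "[$TIMESTAMP] $FILENAME (file #$FILECOUNT), first row: $first"'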

Using Command Substitution to Read Exported Variables

Command substitution is another place where subshells appear, and variables behave the same way there: the code inside $(...) runs in a subshell that inherits the parent's variables, while its output is captured back into the current shell.

Command substitution allows the output of a command to be substituted inline using $(cmd). For example:

NAME="John"
echo "Hello $(echo $NAME)"

Here, echo $NAME runs in a subshell and outputs "John", which is substituted into the outer echo command, printing "Hello John".

Because the captured output is assigned in the current shell, command substitution is the usual way to get a computed value into a variable, rather than piping output into read.

For example, a stage that wants to keep a copy of the original $FILENAME could write:

ORIGINAL_NAME=$(echo "$FILENAME")

Because $FILENAME was exported from the originating context, the subshell created for the command substitution can still see it. (For a plain copy, ORIGINAL_NAME=$FILENAME does the same job without the extra subshell; command substitution earns its keep when the value has to be computed by a command.)
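For instance, a checksum could be computed once and exported for later stages. A minimal sketch, assuming the sha256sum tool from GNU coreutils and the same hypothetical $FILENAME as above:

export FILENAME="data1.csv"
export CHECKSUM="$(sha256sum "$FILENAME" | cut -d' ' -f1)"
echo "checksum of $FILENAME is $CHECKSUM"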

Workarounds for Old Versions of Bash

The export technique itself does not depend on the Bash version: pipeline subshells are ordinary child processes, and they have inherited exported variables in every release. What does vary by version is the reverse direction, getting a value out of a pipeline stage. Bash 4.2 added the lastpipe option, which runs the last command of a pipeline in the current shell (when job control is off, as it is in scripts), so that a trailing read can set a variable the rest of the script can see.

On older Bash versions, and in interactive shells where lastpipe has no effect, workarounds are needed to recover values from a pipeline:

  • Writing values to a file inside the pipeline, then sourcing or reading that file afterwards
  • Using named pipes (mkfifo) to hand values from one stage to another
  • Restructuring the command so the assignment happens in the current shell, for example VAR=$(cmd1 | cmd2) or read VAR <<< "$(cmd)"

Most modern systems ship a Bash new enough for lastpipe, but on older platforms it is worth testing pipelines carefully to catch cases where values set inside a stage are silently lost.
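Where it is available, lastpipe is the most direct fix for the read example from earlier. A minimal, script-only sketch:

#!/usr/bin/env bash
shopt -s lastpipe                 # Bash 4.2+; effective only when job control is off, as in scripts
NAME="John"
echo "Hello $NAME" | read NAME    # read now runs in the current shell, not a subshell
echo "$NAME"                      # prints: Hello John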

Examples of Passing Variables Through Pipelines

A common situation where exporting variables is useful in Bash pipelines is processing multiple data files in a loop. This example demonstrates passing metadata on the current file through a pipeline:

# Loop over csv files
for FILE in *.csv; do

  # Export metadata
  export FILENAME="$FILE"
  export ROW_COUNT="$(wc -l < "$FILE")"

  # Process file
  cat "$FILE" | grep -v header_row | sort | process_data

  # Report on this file
  echo "Processed $ROW_COUNT rows from $FILENAME"

done

By exporting variables like the current filename and row count, this state is visible to every stage of the pipeline run for each file, including external programs such as process_data, which would otherwise have no way of knowing which file they are handling.

In the same way, transient data like row counts, timestamps and computed values can be propagated through multi-stage pipelines by judicious use of export.

Passing Multiple Variables Through Pipelines

So far we have exported variables one at a time into pipeline subshells. It is also possible to export several variables at once, or to capture a snapshot of the environment as a single value.

The simplest batch form lists the names in one export command:

export NAME COUNT FILENAME

A related pattern captures the current (already exported) environment as text, using env inside a command substitution:

export VARS="$(env)"

This runs env in a subshell, collects every NAME=value line it prints into the single variable VARS, and exports that variable to child processes.

A more targeted snapshot keeps only selected variables:

export VARS="$(env | grep -E 'NAME|COUNT')"

Note that this does not export $NAME and $COUNT themselves; it exports one variable, VARS, whose value contains their assignments as lines of text. Variables that downstream code should read directly still need to be exported by name, but a bundled snapshot like this at the start of a pipeline lets a batch of metadata be carried through later stages as a single value.
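A later stage can then unpack that snapshot itself. A minimal sketch, assuming $VARS was built as above (the anchored grep pattern just makes the match stricter):

export NAME="John" COUNT=3
export VARS="$(env | grep -E '^(NAME|COUNT)=')"

# A separate bash process reads the snapshot line by line:
bash -c 'while IFS="=" read -r key value; do echo "restored $key -> $value"; done <<< "$VARS"'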

Other Methods for Preserving Variable Values

In addition to export, there are a few other methods that can be used to pass variable data between Bash subshells:

  • Read/eval - variable assignments can be echoed by one stage and reconstituted with read and eval, but this is fragile and risky with untrusted data
  • Temporary files - values can be written to a file inside the pipeline and read back by later stages or by the parent shell
  • Named pipes - mkfifo creates pipes with a filesystem name, through which one stage can hand values to another
  • Command substitution - wrapping a pipeline in VAR=$(...) so that its final output lands in a variable of the current shell

While these work, they add moving parts: files to clean up, quoting to get right. For passing values downward, exporting variables is generally the cleanest approach, since child processes inherit them with no extra plumbing.

Exported variables act as a lightweight communication channel between pipeline stages, letting metadata travel alongside the data stream without disturbing unrelated parts of the environment.
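For comparison, here is the temporary-file approach from the list above; it covers the one case export cannot, getting a value computed inside a pipeline stage back to the parent shell (a minimal sketch using mktemp):

TMP=$(mktemp)
printf '%s\n' apple banana cherry | { COUNT=$(wc -l); echo "$COUNT" > "$TMP"; }
read COUNT < "$TMP"               # the parent shell picks the value up from the file
rm -f "$TMP"
echo "counted $COUNT items"       # prints: counted 3 items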

When to Avoid Passing Variables Between Subshells

While exporting variables from parent environments into pipelines is useful for preserving state, there are also cases where this should be avoided.

Overuse of exported variables can lead to unexpected coupling between pipeline stages, where changing an upstream variable breaks downstream tasks.

Inter-stage dependencies should be clearly defined and minimized. Code is easiest to maintain when each pipeline command focuses on a single task and accepts simple data formats.

Exported variables are also problematic when handling untrusted data, as this could allow unfiltered content to propagate into subsystems. Always validate and sanitize external inputs before exporting them downstream.

In Bash scripting generally, use the narrowest scope that works. Prefer local variables inside functions (declared with local) over the global environment, unless the value genuinely needs to be shared.

Used judiciously, exported variables enable conveying state through pipelines between subshell stages. But aim to minimize side effects, validate data flows, and keep stages as decoupled as possible.

Conclusion

Bash pipelines pass data through a sequence of subshells. Metadata stored in variables does not automatically cross those process boundaries, but exporting variables propagates state downward into every stage, and options such as lastpipe, temporary files or command substitution recover values computed inside a pipeline.

Understanding the forked process model behind pipelines explains why variables get lost between stages. Used judiciously, export lets key data like filenames, counts and options be threaded through a chain of commands.

Preserving variables across subshells helps build reusable pipelines in which context and configuration travel alongside the main data flow. It lets scripts keep Bash's strength of rapidly gluing UNIX processes together, without the usual downside of losing metadata along the way.
