Adopting Structured Data Passing In Shells And Utilities

The Need for Structured Data Passing

Shell scripts have traditionally passed data between commands and utilities as simple strings and text. However, as integration and automation requirements grow more complex, the need arises for more structured data interchange between the components of shell-based pipelines and workflows.

Passing loosely formatted text data leads to fragility in scripts, as even small changes in the output of one utility may break the input expectations of the next process in the pipeline. Using structured data formats like JSON overcomes this tight coupling between pipeline stages.

With structured document formats like JSON, data can be exchanged in a self-describing way that maintains integrity across implementations. Adopting JSON in shell pipelines aids:

  • Loose coupling between processing stages
  • Simplified parsing and data extraction
  • Unified handling of configuration data, messages, and logs
  • Interoperation with common APIs and event streams

Using JSON for Data Interchange

JSON (JavaScript Object Notation) has emerged as a ubiquitous structured document format for web APIs and data streaming. JSON’s simple syntax makes it an ideal choice for representing nested data and metadata in human-readable text.

A JSON document is built from objects (collections of key-value pairs) and arrays (ordered lists of values). Keys and string values use double quotes, while numbers, booleans, and null do not. Braces delimit objects and square brackets delimit arrays:

{
  "server": "localhost", 
  "ports": [22, 80, 443],
  "enabled": true 
}

This simplicity enables easy generation and parsing in virtually every programming environment. Because each field is named in the document itself, pipeline stages do not need to agree on positional or custom text formats.

Reading and Parsing JSON in Shell Scripts

The jq utility has become the de facto standard tool for parsing and manipulating JSON data in shell scripts. jq extracts values using a concise domain-specific language built around JSON’s structure:

cat config.json | jq .server

This command prints just the “server” value from the JSON. More complex expressions filter JSON arrays:

 
cat events.json | jq '.events[] | {type, timestamp}'

Here jq returns just the “type” and “timestamp” fields from each object in the “events” array, discarding everything else. Complementary standards such as JSON Schema (with validators like AJV) and JSON Web Tokens build on the same format for validation and authentication.
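
For illustration, suppose events.json holds the hypothetical document sketched in the comments below; adding -c (compact output) to the same filter then emits one trimmed object per line:

# Hypothetical events.json:
# {"events": [
#   {"type": "login",  "timestamp": 1700000000, "user": "alice"},
#   {"type": "logout", "timestamp": 1700000120, "user": "alice"}
# ]}

jq -c '.events[] | {type, timestamp}' events.json
# {"type":"login","timestamp":1700000000}
# {"type":"logout","timestamp":1700000120}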

Generating JSON from Shell Scripts

While jq focuses on consuming JSON, several tools help generate structured JSON from shell environments:

  • jo: jo builds JSON objects and arrays from key=value arguments, coercing numbers and booleans automatically:

    jo server=localhost enabled=true port=443

  • jc: jc converts the output of many common Unix commands into JSON:

    ps aux | jc --ps

  • yq: yq is a handy YAML processor with jq-like syntax that also reads and emits JSON:

    yq -o=json '.' data.yml

These tools form a comprehensive toolkit for consuming, emitting, and transforming JSON within shell script workflows.
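
A minimal sketch of how these pieces combine, assuming jo and jq are installed and using a hypothetical status.json written by the first stage:

# Stage 1: emit a small JSON status document (jo types the values automatically).
jo host="$(hostname)" user="$(whoami)" disk_used_pct=42 healthy=true > status.json

# Stage 2: a downstream consumer extracts only the fields it needs (-r strips quotes).
jq -r '.host + " is " + (if .healthy then "healthy" else "unhealthy" end)' status.json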

Passing JSON to Applications and Utilities

JSON’s universality makes it consumable by virtually every application and programming environment. Many command-line utilities and platforms now accept JSON for complex configuration and messaging:

  • Docker: docker inspect emits JSON, and the daemon reads its configuration from a JSON file (daemon.json)
  • Kubernetes: kubectl consumes JSON resource definitions alongside YAML
  • AWS CLI: aws reads and emits JSON, including IAM policy documents and CloudFormation templates
  • jq and JMESPath: query languages for filtering JSON documents from the CLI

JSON input brings structure without requiring apps to depend on custom config schema. Adopting JSON unlocks seamless interoperation between shell pipelines and common automation infrastructure.
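
For example, a sketch assuming a Kubernetes manifest saved as pod.json and a configured AWS CLI:

# kubectl accepts JSON manifests as readily as YAML.
kubectl apply -f pod.json

# The AWS CLI emits JSON and filters it in place with a JMESPath --query.
aws ec2 describe-instances --output json \
    --query 'Reservations[].Instances[].InstanceId'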

JSON Use Cases and Examples

Some common use cases that benefit from adopting JSON:

Configuration Data

Hardcoded config leads to fragile scripts with custom parsing logic tied to specific output. Passing config as JSON documents simplifies parsing:

cat config.json | jq -r .database.connectionString

Because the JSON document is self-describing, fields can be added or reordered without breaking downstream consumers.
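
A minimal sketch, assuming config.json contains the hypothetical fields shown in the comment, of loading configuration into shell variables:

# config.json is assumed to look something like:
# {"database": {"host": "db.internal", "port": 5432, "connectionString": "postgres://db.internal:5432/app"}}

DB_HOST=$(jq -r '.database.host' config.json)
DB_PORT=$(jq -r '.database.port' config.json)
echo "Connecting to ${DB_HOST}:${DB_PORT}"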

Application Messages

Emitting application messages as JSON gives logging and monitoring systems structured metadata that can feed centralized analysis and dashboards:

logger -p user.err -t app '{"event": "auth_failure", "service": "api"}'

A structured error event like this is far easier to search, filter, and correlate than an opaque free-text message.
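
On systemd hosts, the same event can be pulled back out as structured data; a sketch assuming the app tag used above and a journald-backed syslog:

# journald renders each entry as a JSON object; the JSON payload sits in MESSAGE.
journalctl -t app -o json --no-pager | jq -r '.MESSAGE | fromjson | .event'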

Structured Logging

Text-based log data lacks rich metadata on runtime events. JSON-encoded log lines preserve detail and structure:

jq -cn '{level: "warn", time: now, msg: "Disk full"}'

Uniform JSON logs integrate with Elastic, Splunk, and other platforms.
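
A small helper along these lines (a sketch; the function name and field set are illustrative, not any standard) keeps every log line in the same shape:

# Emit one compact JSON log line per call; --arg passes shell values into jq
# safely, and now is jq's built-in epoch timestamp.
log_json() {
  jq -cn --arg level "$1" --arg msg "$2" \
     '{level: $level, time: now, msg: $msg}'
}

log_json warn "Disk usage above 90%"
# e.g. {"level":"warn","time":1700000000.123,"msg":"Disk usage above 90%"}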

Considerations and Alternatives

JSON adoption has tradeoffs. Plain text and CSV avoid extra tooling for simple, flat data. YAML offers comparable expressiveness with better human readability, while binary formats such as MessagePack and Protocol Buffers serialize more compactly and faster than raw JSON.

However, for broad compatibility across diverse tooling, JSON strikes an effective balance of simplicity and interoperability. JSON Schema and JSON:API can also formalize structural contracts when needed.
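
As a sketch, a minimal JSON Schema for the configuration document shown earlier might look like the following; a command-line validator can then enforce it before the document enters a pipeline:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["server", "ports"],
  "properties": {
    "server":  { "type": "string" },
    "ports":   { "type": "array", "items": { "type": "integer" } },
    "enabled": { "type": "boolean" }
  }
}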

Adopting Structured Data for Robust Pipelines

JSON provides flexible, structured data exchange that improves the robustness and stability of shell script automation. Passing self-describing JSON documents avoids tight format coupling between pipeline stages.

jq, jo, jc, yq, and related tools form a rich toolkit for adopting JSON-based data flows in shells and utilities. The universality of JSON unlocks seamless interoperability with diverse applications and environments.
