Lost In Translation: Keyboard Layouts And Character Encoding In Terminals

Understanding Keyboard Input in Terminals

In a terminal, the keyboard is the primary way to enter text and issue commands. Turning a key press into a character on the screen, however, involves several steps:

Keyboard layouts and scancodes

Physical keyboards are organized in different layouts depending on language and region. When a key is pressed, the keyboard generates a numeric scancode that identifies which physical key was pressed but does not represent an actual character. For example, the key in the Q position of a US English keyboard generates the same scancode regardless of the keyboard layout in use; under a French AZERTY layout, that same scancode produces the character A.

Character encodings

The keyboard layout, not the character encoding, maps a scancode to a character. The encoding comes into play afterwards: it determines how that character is represented as bytes. A character encoding assigns a numeric value to each character it supports and defines how that value is stored. Common encodings used in terminals include ASCII, ISO-8859-1, and UTF-8, which is an encoding of Unicode.
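To make the difference between these encodings concrete, the short Python sketch below encodes the same two characters under each of them. The sample characters and the helper name encode_or_none are chosen here purely for illustration.

```python
# Encode the same characters under three encodings and compare the bytes.
samples = ["q", "é"]
encodings = ["ascii", "iso-8859-1", "utf-8"]


def encode_or_none(char, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent char."""
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return None


for char in samples:
    for encoding in encodings:
        print(char, encoding, encode_or_none(char, encoding))
```

The letter q is the same single byte in all three encodings. The letter é has no ASCII representation, takes one byte (0xE9) in ISO-8859-1, and takes two bytes (0xC3 0xA9) in UTF-8, which is why a terminal and a program that disagree about the encoding will disagree about what was typed.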

The translation process

The system translates incoming scancodes into characters using the current keyboard layout mapping. The terminal then passes each character to the running program as bytes in the active encoding and displays the resulting output using an available font. This allows typing in different languages as long as the character encoding supports the characters involved. Problems occur when the keyboard layout, the character encoding, and the font support in the terminal do not match.
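The two stages of this process can be sketched as a toy model in Python. The scancode values below follow the common PC "set 1" numbering, and the two layout tables are deliberately tiny; both are assumptions made for illustration and are not a real keymap. The function name key_to_bytes is likewise hypothetical.

```python
# Toy model: scancode -> character (layout) -> bytes (encoding).
# Partial, illustrative tables only; real keymaps cover every key and modifier.
LAYOUTS = {
    "us": {0x10: "q", 0x15: "y", 0x2C: "z", 0x1A: "["},
    "de": {0x10: "q", 0x15: "z", 0x2C: "y", 0x1A: "ü"},
}


def key_to_bytes(scancode, layout, encoding):
    """Translate one key press into the bytes a program would receive."""
    char = LAYOUTS[layout][scancode]  # the layout decides which character
    return char.encode(encoding)      # the encoding decides which bytes


# The same physical key (scancode 0x1A) under different settings:
print(key_to_bytes(0x1A, "us", "utf-8"))       # b'['
print(key_to_bytes(0x1A, "de", "utf-8"))       # b'\xc3\xbc'
print(key_to_bytes(0x1A, "de", "iso-8859-1"))  # b'\xfc'
```

Changing the layout changes the character; changing the encoding changes only the bytes that represent it. Keeping the two lookups separate makes it easier to tell which stage is at fault when the wrong text appears.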

Common Issues with Foreign Keyboards

When using a keyboard with a layout that does not match the default expected by the system, some common issues include:

Incorrect or missing characters

Typing certain keys results in the wrong character appearing or nothing showing up at all. This is typically caused by the system applying a layout that differs from the one printed on the physical keys. For example, the Y and Z keys are swapped on a German QWERTZ keyboard relative to a US QWERTY keyboard, so with a US layout active, pressing the key labeled Z on a German keyboard produces Y.

Problems with special keys

Keys such as AltGr on European keyboards or the Kana keys on Japanese keyboards may not work properly, or at all, in the terminal. Their scancodes may have no valid function or character assigned in the keyboard layout the system expects.

Setting Your Locale and Encoding

To help address issues with foreign keyboards, the user locale and character encoding used in the terminal can be configured. The locale does not change the keymap itself, but it ensures that the terminal and the programs running in it process and display the typed characters correctly.

locale command

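The locale controls settings such as language, formatting conventions, and character classification. The locale command only reports these settings: run without arguments it prints the current values, and locale -a lists the available locales. It does not set anything. To use the French locale with UTF-8 encoding for the current session, set an environment variable, for example export LANG=fr_FR.UTF-8. System-wide defaults are set through distribution tools, such as localectl set-locale LANG=fr_FR.UTF-8 on systems that use systemd.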

Encoding environment variables

Environment variables can specify character encoding options for the current session. Common variables include:

LANG – Default locale settings
LC_CTYPE – Character classification and encoding
LC_ALL – Override all locale settings

For example, running export LC_CTYPE=en_US.UTF-8 selects UTF-8 as the character encoding for the session. Paired with a US English keymap, this gives consistent key mappings and encoding.
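When several of these variables are set at once, the usual precedence for character handling is LC_ALL first, then LC_CTYPE, then LANG. The Python sketch below works out which value wins and extracts the encoding from it. The function names effective_ctype and codeset are hypothetical helpers, and the fallback to "C" when nothing is set is an assumption about typical systems.

```python
import os


def effective_ctype(env):
    """Return the locale name that governs character handling for env."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:  # an empty value counts as unset
            return value
    return "C"  # assumed fallback when none of the variables is set


def codeset(locale_name):
    """Extract the codeset from language[_territory][.codeset][@modifier]."""
    name = locale_name.split("@", 1)[0]  # drop any @modifier
    if "." in name:
        return name.split(".", 1)[1]
    return None  # no explicit codeset in the name


# Inspect the current session.
current = effective_ctype(os.environ)
print(current, codeset(current))
```

For instance, with LANG=en_US.UTF-8 and LC_CTYPE=fr_FR.ISO-8859-1 both set, the sketch reports ISO-8859-1 as the encoding, which can explain garbled input even though LANG looks correct.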

Overriding Defaults

In some cases, manually overriding the keyboard defaults is needed to properly support an alternate layout.

loadkeys and dumpkeys

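The loadkeys and dumpkeys commands can help remap keyboard layouts on the Linux virtual console. dumpkeys prints the keymap table currently in use on the console. loadkeys loads a keymap file, modifying the mappings between keycodes and the characters or actions they produce. These commands do not affect terminal emulators running in a graphical session, where the layout is managed by the display server (for example with setxkbmap under X11).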

Defining custom keymaps

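Custom keymap files can be created to precisely match the physical keyboard layout in use. To find out which keycode a given key produces, run showkey on the console and press the key. The ckbcomp tool helps build console keymaps by compiling an XKB keyboard description into a keymap that loadkeys can load.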

Multi-Language Support

Switching between multiple language inputs is also possible in the terminal.

Switching input methods

Input method frameworks like uim and ibus allow changing the keyboard layout and input mode, like switching between English and Japanese input.

Input method environments

uim and ibus work by intercepting key input, processing it using the active input method engine, then passing the text results to the application such as the terminal. This helps handle complex keyboard layouts and character conversions.
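The "intercept, convert, pass on" flow can be illustrated with a toy conversion step in Python. This is not how uim or ibus are implemented; the four-entry table and the function name convert are assumptions for illustration, and a real engine uses complete tables, dictionaries, and candidate selection.

```python
# Toy conversion table; a real input method engine is far more complete.
ROMAJI_TO_KANA = {"ni": "に", "ho": "ほ", "n": "ん", "go": "ご"}


def convert(keys):
    """Greedy longest-match conversion of typed Latin letters to hiragana."""
    out = []
    longest = max(len(k) for k in ROMAJI_TO_KANA)
    i = 0
    while i < len(keys):
        for size in range(longest, 0, -1):
            chunk = keys[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            out.append(keys[i])  # pass unknown keys through unchanged
            i += 1
    return "".join(out)


print(convert("nihongo"))
```

The application only ever sees the converted text, so the terminal must be able to receive and display the resulting characters in its active encoding.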

Fixing Copy and Paste

Some issues with foreign keyboards relate to copying and pasting text in the terminal due to character encoding mismatches.
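A typical symptom of such a mismatch is text where accented letters turn into pairs of odd characters. The Python sketch below reproduces the effect by decoding UTF-8 bytes as ISO-8859-1, then shows the opposite mistake. The sample word is arbitrary.

```python
text = "café"

# Copied as UTF-8, pasted into something that assumes ISO-8859-1.
utf8_bytes = text.encode("utf-8")
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # cafÃ©

# This particular damage is reversible by undoing the wrong decoding step.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)  # café

# The opposite mistake: ISO-8859-1 bytes read as UTF-8 are not valid UTF-8,
# so a lenient decoder substitutes the replacement character U+FFFD.
latin1_bytes = text.encode("iso-8859-1")
lossy = latin1_bytes.decode("utf-8", errors="replace")
print(lossy)
```

Seeing "Ã©" where "é" was expected points to UTF-8 data being read as a single-byte encoding; seeing a replacement character or a question mark points to the reverse.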

Using a compatible terminal

A modern terminal emulator that fully supports Unicode, such as Konsole or GNOME Terminal, helps minimize encoding issues during copy and paste.

Configuring your terminal emulator

Terminal settings may need to be adjusted as well. Where the emulator offers a character encoding option, set it to match the locale (usually UTF-8) so that pasted text and data exchanged with the clipboard are interpreted consistently.

Understanding What Went Wrong

If text input still does not work properly, some debugging steps can help identify where encoding breaks down between the keyboard and terminal.

Checking locale, encoding, and keymap settings

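Run locale to confirm that the language and encoding settings are as expected. On the Linux console, dumpkeys shows whether the current keymap matches the keyboard layout; in an X11 session, setxkbmap -query reports the active layout. Make sure these values are consistent with one another.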

Logging and debugging character input

Tools such as xev (for X11 sessions) or evtest (for kernel input devices) log details of key presses and input events, helping isolate issues in input handling. Tracing a character’s path from scancode to display can uncover where the mappings break down.
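At the far end of that path, it can help to look at the exact bytes a program received. The Python sketch below is one possible way to describe such bytes; the function name describe is hypothetical, and it assumes the session is meant to use UTF-8.

```python
import unicodedata


def describe(raw):
    """Describe bytes received from the terminal, one entry per character."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Report where decoding failed and show the raw bytes in hex.
        return [f"not valid UTF-8 at byte offset {err.start}: {raw.hex(' ')}"]
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} -> "
        f"{ch.encode('utf-8').hex(' ')}"
        for ch in text
    ]


for line in describe(b"\xc3\xbc"):
    print(line)
for line in describe(b"\xfc"):
    print(line)
```

The first call reports U+00FC, the letter ü, encoded as the bytes c3 bc. The second reports that the single byte fc is not valid UTF-8, which suggests the terminal sent ISO-8859-1 while the program expected UTF-8. In practice the bytes could be read from sys.stdin.buffer.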