Home » How do keyboard input and text output work?

How do keyboard input and text output work?

Solutons:


There are several different scenarios; I’ll describe the most common ones. The successive macroscopic events are:

  1. Input: the key press event is transmitted from the keyboard hardware to the application.
  2. Processing: the application decides that because the key A was pressed, it must display the character a.
  3. Output: the application gives the order to display a on the screen.

GUI applications

The de facto standard graphical user interface of unix systems is the X Window System, often called X11 because it stabilized in the 11th version of its core protocol between applications and the display server. A program called the X server sits between the operating system kernel and the applications; it provides services including displaying windows on the screen and transmitting key presses to the window that has the focus.

Input

+----------+              +-------------+         +-----+
| keyboard |------------->| motherboard |-------->| CPU |
+----------+              +-------------+         +-----+
             USB, PS/2, …                 PCI, …
             key down/up

First, information about the key press and key release is transmitted from the keyboard to the computer and inside the computer. The details depend on the type of hardware. I won’t dwell more on this part because the information remains the same throughout this part of the chain: a certain key was pressed or released.

         +--------+        +----------+          +-------------+
-------->| kernel |------->| X server |--------->| application |
         +--------+        +----------+          +-------------+
interrupt          scancode             keysym
                   =keycode            +modifiers

When a hardware event happens, the CPU triggers an interrupt, which causes some code in the kernel to execute. This code detects that the hardware event is a key press or key release coming from a keyboard and records the scan code which identifies the key.

The X server reads input events through a device file, for example /dev/input/eventNNN on Linux (where NNN is a number). Whenever there is an event, the kernel signals that there is data to read from that device. The device file transmits key up/down events with a scan code, which may or may not be identical to the value transmitted by the hardware (the kernel may translate the scan code from a keyboard-dependent value to a common value, and Linux doesn’t retransmit the scan codes that it doesn’t know).

X calls the scan code that it reads a keycode. The X server maintains a table that translates key codes into keysyms (short for “key symbol”). Keycodes are numeric, whereas keysyms are names such as A, aacute, F1, KP_Add, Control_L, … The keysym may differ depending on which modifier keys are pressed (Shift, Ctrl, …).

There are two mechanisms to configure the mapping from keycodes to keysyms:

  • xmodmap is the traditional mechanism. It is a simple table mapping keycodes to a list of keysyms (unmodified, shifted, …).
  • XKB is a more powerful, but more complex mechanism with better support for more modifiers, in particular for dual-language configuration, among others.

Applications connect to the X server and receive a notification when a key is pressed while a window of that application has the focus. The notification indicates that a certain keysym was pressed or released as well as what modifiers are currently pressed. You can see keysyms by running the program xev from a terminal. What the application does with the information is up to it; some applications have configurable key bindings.

In a typical configuration, when you press the key labeled A with no modifiers, this sends the keysym a to the application; if the application is in a mode where you’re typing text, this inserts the character a.

Relationship of keyboard layout and xmodmap goes into more detail on keyboard input. How do mouse events work in linux? gives an overview of mouse input at the lower levels.

Output

+-------------+        +----------+          +-----+         +---------+
| application |------->| X server |---····-->| GPU |-------->| monitor |
+-------------+        +----------+          +-----+         +---------+
               text or              varies          VGA, DVI,
               image                                HDMI, …

There are two ways to display a character.

  • Server-side rendering: the application tells the X server “draw this string in this font at this position”. The font resides on the X server.
  • Client-side rendering: the application builds an image that represents the character in a font that it chooses, then tells the X server to display that image.

See What are the purposes of the different types of XWindows fonts? for a discussion of client-side and server-side text rendering under X11.

What happens between the X server and the Graphics Processing Unit (the processor on the video card) is very hardware-dependent. Simple systems have the X server draw in a memory region called a framebuffer, which the GPU picks up for display. Advanced systems such as found on any 21st century PC or smartphone allow the GPU to perform some operations directly for better performance. Ultimately, the GPU transmits the screen content pixel by pixel every fraction of a second to the monitor.

Text mode application, running in a terminal

If your text editor is a text mode application running in a terminal, then it is the terminal which is the application for the purpose of the section above. In this section, I explain the interface between the text mode application and the terminal. First I describe the case of a terminal emulator running under X11. What is the exact difference between a ‘terminal’, a ‘shell’, a ‘tty’ and a ‘console’? may be useful background here. After reading this, you may want to read the far more detailed What are the responsibilities of each Pseudo-Terminal (PTY) component (software, master side, slave side)?

Input

      +-------------------+               +-------------+
----->| terminal emulator |-------------->| application |
      +-------------------+               +-------------+
keysym                     character or
                           escape sequence

The terminal emulator receives events like “Left was pressed while Shift was down”. The interface between the terminal emulator and the text mode application is a pseudo-terminal (pty), a character device which transmits bytes. When the terminal emulator receives a key press event, it transforms this into one or more bytes which the application gets to read from the pty device.

Printable characters outside the ASCII range are transmitted as one or more byte depending on the character and encoding. For example, in the UTF-8 encoding of the Unicode character set, characters in the ASCII range are encoded as a single bytes, while characters outside that range are encoded as multiple bytes.

Key presses that correspond to a function key or a printable character with modifiers such as Ctrl or Alt are sent as an escape sequence. Escape sequences typically consist of the character escape (byte value 27 = 0x1B = 33, sometimes represented as ^[ or e) followed by one or more printable characters. A few keys or key combination have a control character corresponding to them in ASCII-based encodings (which is pretty much all of them in use today, including Unicode): Ctrl+letter yields a character value in the range 1–26, Esc is the escape character seen above and is also the same as Ctrl+[, Tab is the same as Ctrl+I, Return is the same as Ctrl+M, etc.

Different terminals send different escape sequences for a given key or key combination. Fortunately, the converse is not true: given a sequence, there is in practice at most one key combination that it encodes. The one exception is the character 127 = 0x7f = 177 which is often Backspace but sometimes Delete.

In a terminal, if you type Ctrl+V followed by a key combination, this inserts the first byte of the escape sequence from the key combination literally. Since escape sequences normally consist only of printable characters after the first one, this inserts the whole escape sequence literally. See key bindings table? for a discussion of zsh in this context.

The terminal may transmit the same escape sequence for some modifier combinations (e.g. many terminals transmit a space character for both Space and Shift+Space; xterm has a mode to distinguish modifier combinations but terminals based on the popular vte library don’t). A few keys are not transmitted at all, for example modifier keys or keys that trigger a binding of the terminal emulator (e.g. a copy or paste command).

It is up to the application to translate escape sequences into symbolic key names if it so desires.

Output

+-------------+               +-------------------+
| application |-------------->| terminal emulator |--->
+-------------+               +-------------------+
               character or
               escape sequence

Output is rather simpler than input. If the application outputs a character to the pty device file, the terminal emulator displays it at the current cursor position. (The terminal emulator maintains a cursor position, and scrolls if the cursor would fall under the bottom of the screen.) The application can also output escape sequences (mostly beginning with ^[ or ^]) to tell the terminal to perform actions such as moving the cursor, changing the text attributes (color, bold, …), or erasing part of the screen.

Escape sequences supported by the terminal emulator are described in the termcap or terminfo database. Most terminal emulator nowadays are fairly closely aligned with xterm. See Documentation on LESS_TERMCAP_* variables? for a longer discussion of terminal capability information databases, and How to stop cursor from blinking and Can I set my local machine’s terminal colors to use those of the machine I ssh into? for some usage examples.

Application running in a text console

If the application is running directly in a text console, i.e. a terminal provided by the kernel rather than by a terminal emulator application, the same principles apply. The interface between the terminal and the application is still a byte stream which transmits characters, with special keys and commands encoded as escape sequences.

Remote application, accessed over the network

Remote text application

If you run a program on a remote machine, e.g. over SSH, then the network communication protocol relays data at the pty level.

+-------------+           +------+           +-----+           +----------+
| application |<--------->| sshd |<--------->| ssh |<--------->| terminal |
+-------------+           +------+           +-----+           +----------+
               byte stream        byte stream       byte stream
               (char/seq)         over TCP/…        (char/seq)

This is mostly transparent, except that sometimes the remote terminal database may not know all the capabilities of the local terminal.

Remote X11 application

The communication protocol between applications an the server is itself a byte stream that can be sent over a network protocol such as SSH.

+-------------+            +------+        +-----+            +----------+
| application |<---------->| sshd |<------>| ssh |<---------->| X server |
+-------------+            +------+        +-----+            +----------+
               X11 protocol        X11 over       X11 protocol
                                   TCP/…

This is mostly transparent, except that some acceleration features such as movie decoding and 3D rendering that require direct communication between the application and the display are not available.

If you want to see this in a Unix system that is small enough to be understandable, dig into Xv6. It is more or less the mythical Unix 6th Edition that became the base of John Lion’s famous commentary, long circulated as samizdat. Its code was reworked to compile under ANSI C and taking modern developments, like multiprocessors, into account.

Related Solutions

Joining bash arguments into single string with spaces

[*] I believe that this does what you want. It will put all the arguments in one string, separated by spaces, with single quotes around all: str="'$*'" $* produces all the scripts arguments separated by the first character of $IFS which, by default, is a space....

AddTransient, AddScoped and AddSingleton Services Differences

TL;DR Transient objects are always different; a new instance is provided to every controller and every service. Scoped objects are the same within a request, but different across different requests. Singleton objects are the same for every object and every...

How to download package not install it with apt-get command?

Use --download-only: sudo apt-get install --download-only pppoe This will download pppoe and any dependencies you need, and place them in /var/cache/apt/archives. That way a subsequent apt-get install pppoe will be able to complete without any extra downloads....

What defines the maximum size for a command single argument?

Answers Definitely not a bug. The parameter which defines the maximum size for one argument is MAX_ARG_STRLEN. There is no documentation for this parameter other than the comments in binfmts.h: /* * These are the maximum length and maximum number of strings...

Bulk rename, change prefix

I'd say the simplest it to just use the rename command which is common on many Linux distributions. There are two common versions of this command so check its man page to find which one you have: ## rename from Perl (common in Debian systems -- Ubuntu, Mint,...

Output from ls has newlines but displays on a single line. Why?

When you pipe the output, ls acts differently. This fact is hidden away in the info documentation: If standard output is a terminal, the output is in columns (sorted vertically) and control characters are output as question marks; otherwise, the output is...

mv: Move file only if destination does not exist

mv -vn file1 file2. This command will do what you want. You can skip -v if you want. -v makes it verbose - mv will tell you that it moved file if it moves it(useful, since there is possibility that file will not be moved) -n moves only if file2 does not exist....

Is it possible to store and query JSON in SQLite?

SQLite 3.9 introduced a new extension (JSON1) that allows you to easily work with JSON data . Also, it introduced support for indexes on expressions, which (in my understanding) should allow you to define indexes on your JSON data as well. PostgreSQL has some...

Combining tail && journalctl

You could use: journalctl -u service-name -f -f, --follow Show only the most recent journal entries, and continuously print new entries as they are appended to the journal. Here I've added "service-name" to distinguish this answer from others; you substitute...

how can shellshock be exploited over SSH?

One example where this can be exploited is on servers with an authorized_keys forced command. When adding an entry to ~/.ssh/authorized_keys, you can prefix the line with command="foo" to force foo to be run any time that ssh public key is used. With this...

Why doesn’t the tilde (~) expand inside double quotes?

The reason, because inside double quotes, tilde ~ has no special meaning, it's treated as literal. POSIX defines Double-Quotes as: Enclosing characters in double-quotes ( "" ) shall preserve the literal value of all characters within the double-quotes, with the...

What is GNU Info for?

GNU Info was designed to offer documentation that was comprehensive, hyperlinked, and possible to output to multiple formats. Man pages were available, and they were great at providing printed output. However, they were designed such that each man page had a...

Set systemd service to execute after fstab mount

a CIFS network location is mounted via /etc/fstab to /mnt/ on boot-up. No, it is not. Get this right, and the rest falls into place naturally. The mount is handled by a (generated) systemd mount unit that will be named something like mnt-wibble.mount. You can...

Merge two video clips into one, placing them next to each other

To be honest, using the accepted answer resulted in a lot of dropped frames for me. However, using the hstack filter_complex produced perfectly fluid output: ffmpeg -i left.mp4 -i right.mp4 -filter_complex hstack output.mp4 ffmpeg -i input1.mp4 -i input2.mp4...

How portable are /dev/stdin, /dev/stdout and /dev/stderr?

It's been available on Linux back into its prehistory. It is not POSIX, although many actual shells (including AT&T ksh and bash) will simulate it if it's not present in the OS; note that this simulation only works at the shell level (i.e. redirection or...

How can I increase the number of inodes in an ext4 filesystem?

It seems that you have a lot more files than normal expectation. I don't know whether there is a solution to change the inode table size dynamically. I'm afraid that you need to back-up your data, and create new filesystem, and restore your data. To create new...

Why doesn’t cp have a progress bar like wget?

The tradition in unix tools is to display messages only if something goes wrong. I think this is both for design and practical reasons. The design is intended to make it obvious when something goes wrong: you get an error message, and it's not drowned in...