
Understanding “IFS= read -r line”

Solutions:


In POSIX shells, read, without any options, doesn’t read a line: it reads words from a (possibly backslash-continued) line, where words are $IFS-delimited and backslash can be used to escape the delimiters (or continue lines).

The generic syntax is:

read word1 word2... remaining_words

read reads stdin one byte at a time¹ until it finds an unescaped newline character (or end-of-input), splits that according to complex rules and stores the result of that splitting into $word1, $word2… $remaining_words.

For instance on an input like:

  <tab> foo bar\ baz   blah   blah\
whatever whatever

and with the default value of $IFS, read a b c would assign:

  • $a ⇐ foo
  • $b ⇐ bar baz
  • $c ⇐ blah   blahwhatever whatever

Now if passed only one argument, that doesn’t become read line. It’s still read remaining_words. Backslash processing is still done, IFS whitespace characters² are still removed from the beginning and end.

The -r option removes the backslash processing. So that same command above with -r would instead assign (see the demo after the list):

  • $a ⇐ foo
  • $b ⇐ bar\
  • $c ⇐ baz   blah   blah\
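
Both assignments can be reproduced with printf feeding read (a minimal demo, shown here as run in bash; the exact preservation of the intervening spaces in the last variable may differ slightly between implementations):

$ printf ' \tfoo bar\\ baz   blah   blah\\\nwhatever whatever\n' | { read a b c; printf '<%s>\n' "$a" "$b" "$c"; }
<foo>
<bar baz>
<blah   blahwhatever whatever>
$ printf ' \tfoo bar\\ baz   blah   blah\\\nwhatever whatever\n' | { read -r a b c; printf '<%s>\n' "$a" "$b" "$c"; }
<foo>
<bar\>
<baz   blah   blah\>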

Now, for the splitting part, it’s important to realise that there are two classes of characters for $IFS: the IFS whitespace characters² (including space and tab (and newline, though here that doesn’t matter unless you use -d), which also happen to be in the default value of $IFS) and the others. The treatment for those two classes of characters is different.

With IFS=: (: being not an IFS whitespace character), an input like :foo::bar:: would be split into "", "foo", "", "bar" and "" (and an extra "" with some implementations, though that doesn’t matter except for read -a). While if we replace that : with a space, the splitting is done into only foo and bar. That is, leading and trailing ones are ignored, and sequences of them are treated like one. There are additional rules when whitespace and non-whitespace characters are combined in $IFS. Some implementations can add/remove the special treatment by doubling the characters in IFS (IFS=:: or IFS='  ').
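
For instance (a quick demonstration, as run in bash; as noted above, some implementations differ on the extra trailing empty field):

$ printf ':foo::bar::\n' | { IFS=: read -r a b c d e; printf '<%s>' "$a" "$b" "$c" "$d" "$e"; echo; }
<><foo><><bar><>
$ printf ' foo  bar \n' | { IFS=' ' read -r a b c d e; printf '<%s>' "$a" "$b" "$c" "$d" "$e"; echo; }
<foo><bar><><><>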

So here, if we don’t want the leading and trailing unescaped whitespace characters to be stripped, we need to remove those IFS whitespace characters from $IFS.

Even with IFS non-whitespace characters, if the input line contains one (and only one) of those characters and it’s the last character in the line (like IFS=: read -r word on an input like foo:), with POSIX shells (not zsh nor some pdksh versions) that input is considered as one foo word, because in those shells the $IFS characters are considered as terminators, so word will contain foo, not foo:.
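
For instance (as run in bash; zsh and some pdksh versions would keep the trailing : in the first case):

$ printf 'foo:\n' | { IFS=: read -r word; printf '<%s>\n' "$word"; }
<foo>
$ printf 'foo::\n' | { IFS=: read -r word; printf '<%s>\n' "$word"; }
<foo:>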

So, the canonical way to read one line of input with the read builtin is:

IFS= read -r line

(note that for most read implementations, that only works for text lines as the NUL character is not supported except in zsh).

Using var=value cmd syntax makes sure IFS is only set differently for the duration of that cmd command.
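
That is also why the canonical loop for processing input line by line is usually written like this (a sketch; input.txt is just a placeholder file name):

while IFS= read -r line; do
    printf '%s\n' "$line"    # replace with whatever processing you need
done < input.txt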

History note

The read builtin was introduced by the Bourne shell and was already meant to read words, not lines. There are a few important differences with modern POSIX shells.

The Bourne shell’s read didn’t support a -r option (which was introduced by the Korn shell), so there’s no way to disable backslash processing other than pre-processing the input with something like sed 's/\\/&&/g' first.

The Bourne shell didn’t have that notion of two classes of characters (which again was introduced by ksh). In the Bourne shell all characters undergo the same treatment as IFS whitespace characters do in ksh, that is IFS=: read a b c on an input like foo::bar would assign bar to $b, not the empty string.

In the Bourne shell, with:

var=value cmd

If cmd is a built-in (like read is), var remains set to value after cmd has finished. That’s particularly critical with $IFS because in the Bourne shell, $IFS is used to split everything, not only the expansions. Also, if you remove the space character from $IFS in the Bourne shell, "$@" no longer works.

In the Bourne shell, redirecting a compound command causes it to run in a subshell (in the earliest versions, even things like read var < file or exec 3< file; read var <&3 didn’t work), so it was rare in the Bourne shell to use read for anything but user input on the terminal (where that line continuation handling made sense).

Some Unices (like HP/UX, there’s also one in util-linux) still have a line command to read one line of input (that used to be a standard UNIX command up until the Single UNIX Specification version 2).

That’s basically the same as head -n 1 except that it reads one byte at a time to make sure it doesn’t read more than one line. On those systems, you can do:

line=`line`

Of course, that means spawning a new process, executing a command and reading its output through a pipe, so it’s a lot less efficient than ksh’s IFS= read -r line, but it’s still a lot more intuitive.


¹ though on seekable input, some implementations can revert to reading by blocks and seek-back afterwards as an optimisation. ksh93 goes even further and remembers what was read and uses it for the next read invocation, though that’s currently broken

² IFS whitespace characters are, per POSIX, the characters classified as [:space:] in the locale that also happen to be in $IFS, though in ksh88 (on which the POSIX specification is based) and in most shells, that’s still limited to SPC, TAB and NL. The only POSIX-compliant shell I found in that regard was yash. ksh93 and bash (since 5.0) also include other whitespace characters (such as CR, FF, VT…), but only the single-byte ones (beware that on some systems like Solaris, that includes the non-breaking space, which is single-byte in some locales)

The Theory

There are two concepts that are in play here :

  • IFS is the Input Field Separator, which means the string that is read will be split based on the characters in IFS. On a command line, IFS normally consists of the whitespace characters; that’s why the command line splits at spaces.
  • Doing something like VAR=value command means “modify the environment of command so that VAR will have the value value”. Basically, the command command will see VAR as having the value value, but any command executed after that will still see VAR as having its previous value. In other words, that variable is modified only for that statement (see the short demo after this list).
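
A quick way to see that scoping (assuming a POSIX-like shell and that FOO is not otherwise set):

$ FOO=hello sh -c 'echo "during: $FOO"'
during: hello
$ echo "after: ${FOO:-<unset>}"
after: <unset>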

In this case

So when doing IFS= read -r line, what you are doing is setting IFS to an empty string (no character will be used to split, therefore no splitting will occur) so that read will read the entire line and see it as one word that will be assigned to the line variable. The changes to IFS only affect that statement, so that any following commands won’t be affected by the change.
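
For example (a small illustration; any POSIX shell should behave the same here):

$ printf '  one  two  \n' | { IFS= read -r line; printf '[%s]\n' "$line"; }
[  one  two  ]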

As a side note

While the command is correct and will work as intended, setting IFS in this case might not be necessary [1]. As written in the bash man page in the read builtin section:

One line is read from the standard input […] and the first word is assigned to the first name, the second word to the second name, and so on, with leftover words and their intervening separators assigned to the last name. If there are fewer words read from the input stream than names, the remaining names are assigned empty values. The characters in IFS are used to split the line into words. […]

Since you only have the line variable, every word will be assigned to it anyway, so if you don’t need any of the preceding and trailing whitespace characters [1], you could just write read -r line and be done with it.

[1] Just as an example of how an unset or default $IFS value will cause read to strip leading/trailing IFS whitespace, you might try:

echo ' where are my spaces? ' | {
    unset IFS
    read -r line
    printf '%s\n' "$line"
} | sed -n l

Run it and you will see that the preceding and trailing spaces don’t survive, even though IFS is unset; they are only preserved when IFS is set to an empty string. Furthermore, some strange things could happen if $IFS were modified somewhere earlier in the script.
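
For comparison (a condensed version of the same test; the trailing $ in the output is added by sed’s l command to mark the end of the line):

$ echo ' where are my spaces? ' | { unset IFS; read -r line; printf '%s\n' "$line"; } | sed -n l
where are my spaces?$
$ echo ' where are my spaces? ' | { IFS= read -r line; printf '%s\n' "$line"; } | sed -n l
 where are my spaces? $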

You should read that statement in two parts: the first one clears the value of the IFS variable (i.e. it is equivalent to the more readable IFS=""); the second one reads the line variable from stdin (read -r line).

What is specific to this syntax is that the IFS assignment is transient and only valid for the read command.

Unless I’m missing something, clearing IFS has no effect in that particular case, though, since whatever IFS is set to, the whole line will be read into the line variable. There would only have been a change in behavior if more than one variable had been passed as a parameter to the read instruction.

Edit:

The -r is there to allow input ending with \ not to be processed specially, i.e. for the backslash to be included in the line variable rather than being treated as a continuation character allowing multi-line input.

$ read line; echo "[$line]"
abc\
> def
[abcdef]
$ read -r line; echo "[$line]"
abc\
[abc\]

Clearing IFS has the side effect of preventing read from trimming potential leading and trailing space or tab characters, e.g.:

$ echo "   a b c   " | { IFS= read -r line; echo "[$line]" ; }   
[   a b c   ]
$ echo "   a b c   " | { read -r line; echo "[$line]" ; }     
[a b c]

Thanks to rici for pointing out that difference.
