Home » What does “LC_ALL=C” do?

What does “LC_ALL=C” do?

Solutons:


LC_ALL is the environment variable that overrides all the other localisation settings (except $LANGUAGE under some circumstances).

Different aspects of localisations (like the thousand separator or decimal point character, character set, sorting order, month, day names, language or application messages like error messages, currency symbol) can be set using a few environment variables.

You’ll typically set $LANG to your preference with a value that identifies your region (like fr_CH.UTF-8 if you’re in French speaking Switzerland, using UTF-8). The individual LC_xxx variables override a certain aspect. LC_ALL overrides them all. The locale command, when called without argument gives a summary of the current settings.

For instance, on a GNU system, I get:

$ locale
LANG=en_GB.UTF-8
LANGUAGE=
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

I can override an individual setting with for instance:

$ LC_TIME=fr_FR.UTF-8 date
jeudi 22 août 2013, 10:41:30 (UTC+0100)

Or:

$ LC_MONETARY=fr_FR.UTF-8 locale currency_symbol
€

Or override everything with LC_ALL.

$ LC_ALL=C LANG=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 cat /
cat: /: Is a directory

In a script, if you want to force a specific setting, as you don’t know what settings the user has forced (possibly LC_ALL as well), your best, safest and generally only option is to force LC_ALL.

The C locale is a special locale that is meant to be the simplest locale. You could also say that while the other locales are for humans, the C locale is for computers. In the C locale, characters are single bytes, the charset is ASCII (well, is not required to, but in practice will be in the systems most of us will ever get to use), the sorting order is based on the byte values¹, the language is usually US English (though for application messages (as opposed to things like month or day names or messages by system libraries), it’s at the discretion of the application author) and things like currency symbols are not defined.

On some systems, there’s a difference with the POSIX locale where for instance the sort order for non-ASCII characters is not defined.

You generally run a command with LC_ALL=C to avoid the user’s settings to interfere with your script. For instance, if you want [a-z] to match the 26 ASCII characters from a to z, you have to set LC_ALL=C.

On GNU systems, LC_ALL=C and LC_ALL=POSIX (or LC_MESSAGES=C|POSIX) override $LANGUAGE, while LC_ALL=anything-else wouldn’t.

A few cases where you typically need to set LC_ALL=C:

  • sort -u or sort ... | uniq.... In many locales other than C, on some systems (notably GNU ones), some characters have the same sorting order. sort -u doesn’t report unique lines, but one of each group of lines that have equal sorting order. So if you do want unique lines, you need a locale where characters are byte and all characters have different sorting order (which the C locale guarantees).

  • the same applies to the = operator of POSIX compliant expr or == operator of POSIX compliant awks (mawk and gawk are not POSIX in that regard), that don’t check whether two strings are identical but whether they sort the same.

  • Character ranges like in grep. If you mean to match a letter in the user’s language, use grep '[[:alpha:]]' and don’t modify LC_ALL. But if you want to match the a-zA-Z ASCII characters, you need either LC_ALL=C grep '[[:alpha:]]' or LC_ALL=C grep '[a-zA-Z]'². [a-z] matches the characters that sort after a and before z (though with many APIs it’s more complicated than that). In other locales, you generally don’t know what those are. For instance some locales ignore case for sorting so [a-z] in some APIs like bash patterns, could include [B-Z] or [A-Y]. In many UTF-8 locales (including en_US.UTF-8 on most systems), [a-z] will include the latin letters from a to y with diacritics but not those of z (since z sorts before them) which I can’t imagine would be what you want (why would you want to include é and not ź?).

  • floating point arithmetic in ksh93. ksh93 honours the decimal_point setting in LC_NUMERIC. If you write a script that contains a=$((1.2/7)), it will stop working when run by a user whose locale has comma as the decimal separator:

     $ ksh93 -c 'echo $((1.1/2))'
     0.55
     $ LANG=fr_FR.UTF-8  ksh93 -c 'echo $((1.1/2))'
     ksh93: 1.1/2: arithmetic syntax error
    

Then you need things like:

    #! /bin/ksh93 -
    float input="$1" # get it as input from the user in his locale
    float output
    arith() { typeset LC_ALL=C; (($@)); }
    arith output=input/1.2 # use the dot here as it will be interpreted
                           # under LC_ALL=C
    echo "$output" # output in the user's locale

As a side note: the , decimal separator conflicts with the , arithmetic operator which can cause even more confusion.

  • When you need characters to be bytes. Nowadays, most locales are UTF-8 based which means characters can take up from 1 to 6 bytes³. When dealing with data that is meant to be bytes, with text utilities, you’ll want to set LC_ALL=C. It will also improve performance significantly because parsing UTF-8 data has a cost.

  • a corollary of the previous point: when processing text where you don’t know what character set the input is written in, but can assume it’s compatible with ASCII (as virtually all charsets are). For instance grep '<.*>' to look for lines containing a <, > pair will no work if you’re in a UTF-8 locale and the input is encoded in a single-byte 8-bit character set like iso8859-15. That’s because . only matches characters and non-ASCII characters in iso8859-15 are likely not to form a valid character in UTF-8. On the other hand, LC_ALL=C grep '<.*>' will work because any byte value forms a valid character in the C locale.

  • Any time where you process input data or output data that is not intended from/for a human. If you’re talking to a user, you may want to use their convention and language, but for instance, if you generate some numbers to feed some other application that expects English style decimal points, or English month names, you’ll want to set LC_ALL=C:

     $ printf '%gn' 1e-2
     0,01
     $ LC_ALL=C printf '%gn' 1e-2
     0.01
     $ date +%b
     août
     $ LC_ALL=C date +%b
     Aug
    

That also applies to things like case insensitive comparison (like in grep -i) and case conversion (awk‘s toupper(), dd conv=ucase…). For instance:

    grep -i i

is not guaranteed to match on I in the user’s locale. In some Turkish locales for instance, it doesn’t as upper-case i is İ (note the dot) there and lower-case I is ı (note the missing dot).


Notes

¹ again, only on ASCII based systems (the immense majority of systems). POSIX requires the collation order for the C locale to be that of the order of characters in the ASCII charset, even on EBCDIC systems which are not allowed to do the strcoll() === strcmp() optimisation in the C locale.


² Depending on the encoding of the text, that’s not necessarily the right thing to do though. That’s valid for UTF-8 or single-byte character sets (like iso-8859-1), but not necessarily non-UTF-8 multibyte character sets.

For instance, if you’re in a zh_HK.big5hkscs locale (Hong Kong, using the Hong Kong variant of the BIG5 Chinese character encoding), and you want to look for English letters in a file encoded in that charsets, doing either:

LC_ALL=C grep '[[:alpha:]]'

or

LC_ALL=C grep '[a-zA-Z]'

would be wrong, because in that charset (and many others, but hardly used since UTF-8 came out), a lot of characters contain bytes that correspond to the ASCII encoding of A-Za-z characters. For instance, all of A䨝䰲丕乙乜你再劀劈呸哻唥唧噀噦嚳坽 (and many more) contain the encoding of A. is 0x96 0x41, and A is 0x41 like in ASCII. So our LC_ALL=C grep '[a-zA-Z]' would match on those lines that contain those characters as it would misinterpret those sequences of bytes.

LC_COLLATE=C grep '[A-Za-z]'

would work, but only if LC_ALL is not otherwise set (which would override LC_COLLATE). So you may end up having to do:

grep '[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]'

if you wanted to look for English letters in a file encoded in the locale’s encoding.


³ some would argue it’s rather 1 to 4 bytes these days now that Unicode code points (and the libraries that encode/decode UTF-8 data) have been arbitrarily restricted to code points U+0000 to U+10FFFF (0xD800 to 0xDFFF excluded) down from U+7FFFFFFF to accommodate the UTF-16 encoding, but some applications will still happily encode/decode 6-byte UTF-8 sequences (including the ones that fall in the 0xD800 .. 0xDFFF range).

It forces applications to use the default language for output:

$ LC_ALL=es_ES man
¿Qué página de manual desea?

$ LC_ALL=C man
What manual page do you want?

and forces sorting to be byte-wise:

$ LC_ALL=en_US sort <<< $'anbnAnB'
a
A
b
B

$ LC_ALL=C sort <<< $'anbnAnB'
A
B
a
b

C is the default locale,”POSIX” is the alias of “C”. I guess “C” is derived from ANSI-C. Maybe ANSI-C define the “POSIX” locale.

Related Solutions

Extract file from docker image?

You can extract files from an image with the following commands: docker create $image # returns container ID docker cp $container_id:$source_path $destination_path docker rm $container_id According to the docker create documentation, this doesn't run the...

Transfer files using scp: permission denied

Your commands are trying to put the new Document to the root (/) of your machine. What you want to do is to transfer them to your home directory (since you have no permissions to write to /). If path to your home is something like /home/erez try the following:...

What’s the purpose of DH Parameters?

What exactly is the purpose of these DH Parameters? These parameters define how OpenSSL performs the Diffie-Hellman (DH) key-exchange. As you stated correctly they include a field prime p and a generator g. The purpose of the availability to customize these...

How to rsync multiple source folders

You can pass multiple source arguments. rsync -a /etc/fstab /home/user/download bkp This creates bkp/fstab and bkp/download, like the separate commands you gave. It may be desirable to preserve the source structure instead. To do this, use / as the source and...

Benefits of Structured Logging vs basic logging

There are two fundamental advances with the structured approach that can't be emulated using text logs without (sometimes extreme levels of) additional effort. Event Types When you write two events with log4net like: log.Debug("Disk quota {0} exceeded by user...

Interfaces vs Types in TypeScript

2019 Update The current answers and the official documentation are outdated. And for those new to TypeScript, the terminology used isn't clear without examples. Below is a list of up-to-date differences. 1. Objects / Functions Both can be used to describe the...

Get total as you type with added column (append) using jQuery

One issue if that the newly-added column id's are missing the id number. If you look at the id, it only shows "price-", when it should probably be "price-2-1", since the original ones are "price-1", and the original ones should probably be something like...

Determining if a file is a hard link or symbolic link?

Jim's answer explains how to test for a symlink: by using test's -L test. But testing for a "hard link" is, well, strictly speaking not what you want. Hard links work because of how Unix handles files: each file is represented by a single inode. Then a single...

How to restrict a Google search to results of a specific language?

You can do that using the advanced search options: http://www.googleguide.com/sharpening_queries.html I also found this, which might work for you: http://www.searchenginejournal.com/how-to-see-google-search-results-for-other-locations/25203/ Just wanted to add...

Random map generation

Among the many other related questions on the site, there's an often linked article for map generation: Polygonal Map Generation for Games you can glean some good strategies from that article, but it can't really be used as is. While not a tutorial, there's an...

How to prettyprint a JSON file?

The json module already implements some basic pretty printing in the dump and dumps functions, with the indent parameter that specifies how many spaces to indent by: >>> import json >>> >>> your_json = '["foo", {"bar":["baz", null,...

How can I avoid the battery charging when connected via USB?

I have an Android 4.0.3 phone without root access so can't test any of this but let me point you to /sys/class/power_supply/battery/ which gives some info/control over charging issues. In particular there is charging_enabled which gives the current state (0 not...

How to transform given dataset in python? [closed]

From your expected result, it appears that each "group" is based on contiguous id values. For this, you can use the compare-cumsum-groupby pattern, and then use agg to get the min and max values. # Sample data. df = pd.DataFrame( {'id': [1, 2, 2, 2, 2, 2, 1, 1,...

Output of the following C++ Program [closed]

It works exactly like this non-recursive translation: int func_0() { return 2; } int func_1() { return 3; } int func_2() { return func_1() + func_0(); } // Returns 3 + 2 = 5 int func_3() { return func_2() + func_1(); } // Returns 5 + 3 = 8 int func_4() { return...

Making a circle out of . (periods) [closed]

Here's the maths and even an example program in C: http://pixwiki.bafsoft.com/mags/5/articles/circle/sincos.htm (link no longer exists). And position: absolute, left and top will let you draw: http://www.w3.org/TR/CSS2/visuren.html#choose-position Any further...

Should I use a code converter (Python to C++)?

Generally it's an awful way to write code, and does not guarantee that it will be any faster. Things which are simple and fast in one language can be complex and slow in another. You're better off either learning how to write fast Python code or learning C++...

tkinter: cannot concatenate ‘str’ and ‘float’ objects

This one line is more than enough to cause the problem: text="რეგულარი >> "+2.23+ 'GEL' 2.23 is a floating-point value; 'GEL' is a string. What does it mean to add an arithmetic value and a string of letters? If you want the string label 'რეგულარი...