What defines the maximum size for a command single argument?

Solutions:


Answers

  1. Definitely not a bug.
  2. The parameter which defines the maximum size for one argument is MAX_ARG_STRLEN. There is no documentation for this parameter other than the comments in binfmts.h:

    /*
     * These are the maximum length and maximum number of strings passed to the
     * execve() system call.  MAX_ARG_STRLEN is essentially random but serves to
     * prevent the kernel from being unduly impacted by misaddressed pointers.
     * MAX_ARG_STRINGS is chosen to fit in a signed 32-bit integer.
     */
    #define MAX_ARG_STRLEN (PAGE_SIZE * 32)
    #define MAX_ARG_STRINGS 0x7FFFFFFF
    

    As is shown, Linux also has a (very large) limit on the number of arguments to a command.
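    The per-argument limit can be checked empirically. The following is a minimal sketch, assuming a Linux system where the kernel's PAGE_SIZE matches `getconf PAGESIZE` (4096 on most builds) and the stack limit is left at its default; the variable names are illustrative only. The kernel counts the terminating NUL byte, so the longest string that should be accepted is one byte shorter than MAX_ARG_STRLEN.

```shell
# Assumption: the kernel's PAGE_SIZE equals what getconf reports.
page_size=$(getconf PAGESIZE)
max_arg_strlen=$(( page_size * 32 ))
echo "page size: $page_size, MAX_ARG_STRLEN: $max_arg_strlen"

# Build a string one byte shorter than the limit (room for the NUL).
fits=$(head -c "$(( max_arg_strlen - 1 ))" /dev/zero | tr '\0' a)

# Hand it to a new process as a single argument; ':' ignores it.
if sh -c : sh "$fits" 2>/dev/null; then short_result=ok; else short_result=rejected; fi

# One more byte is expected to make execve() fail with E2BIG on Linux.
if sh -c : sh "${fits}a" 2>/dev/null; then long_result=ok; else long_result=rejected; fi

echo "length $(( max_arg_strlen - 1 )): $short_result"
echo "length $max_arg_strlen: $long_result"
```

    On systems with larger pages, or with a small stack limit, the overall ARG_MAX limit may be hit first, so the two results should be read together with `getconf ARG_MAX`.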

  3. A limit on the size of a single argument (which differs from the overall limit on arguments plus environment) does appear to be specific to Linux. This article gives a detailed comparison of ARG_MAX and its equivalents on Unix-like systems. MAX_ARG_STRLEN is discussed for Linux, but there is no mention of an equivalent on any other system.

    The above article also states that MAX_ARG_STRLEN was introduced in Linux 2.6.23, along with a number of other changes relating to command argument maximums (discussed below). The log/diff for the commit can be found here.

  4. It is still not clear what accounts for the additional discrepancy between the result of getconf ARG_MAX and the actual maximum possible size of arguments plus environment. Stephane Chazelas’ related answer suggests that part of the space is accounted for by pointers to each of the argument/environment strings. However, my own investigation suggests that these pointers are not created early in the execve system call, when it may still return an E2BIG error to the calling process (although pointers to each argv string are certainly created later).

    Also, the strings are contiguous in memory as far as I can see, so there are no memory gaps due to alignment here, although alignment is very likely to be a factor within whatever does use up the extra memory. Understanding what uses the extra space requires a more detailed knowledge of how the kernel allocates memory (which is useful knowledge to have, so I will investigate and update later).

ARG_MAX Confusion

Since Linux 2.6.23 (as a result of this commit), the way command argument maximums are handled has changed, which makes Linux differ from other Unix-like systems. In addition to adding MAX_ARG_STRLEN and MAX_ARG_STRINGS, the result of getconf ARG_MAX now depends on the stack size and may differ from ARG_MAX in limits.h.

Normally the result of getconf ARG_MAX will be 1/4 of the stack size. Consider the following in bash using ulimit to get the stack size:

$ echo $(( $(ulimit -s)*1024 / 4 ))  # ulimit output in KiB
2097152
$ getconf ARG_MAX
2097152

However, the above behaviour was changed slightly by this commit (added in Linux 2.6.25-rc4~121).
ARG_MAX in limits.h now serves as a hard lower bound on the result of getconf ARG_MAX. If the stack size is set such that 1/4 of the stack size is less than ARG_MAX in limits.h, then the limits.h value will be used:

$ grep ARG_MAX /usr/include/linux/limits.h 
#define ARG_MAX       131072    /* # bytes of args + environ for exec() */
$ ulimit -s 256
$ echo $(( $(ulimit -s)*1024 / 4 ))
65536
$ getconf ARG_MAX
131072
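
The two rules above can be combined into one small helper. This is only a sketch of the behaviour described here (a quarter of the soft stack limit, with the limits.h value as a floor); the function name `expected_arg_max` is made up for illustration, and the value 131072 is taken from the limits.h line shown above.

```shell
# Hypothetical helper: predict `getconf ARG_MAX` from a stack limit.
# $1 is the soft stack limit in KiB, as printed by `ulimit -s`.
expected_arg_max() {
    floor=131072                      # ARG_MAX from linux/limits.h
    if [ "$1" = unlimited ]; then
        echo unlimited                # no finite quarter to compute
        return
    fi
    quarter=$(( $1 * 1024 / 4 ))      # one quarter of the stack, in bytes
    if [ "$quarter" -lt "$floor" ]; then
        echo "$floor"
    else
        echo "$quarter"
    fi
}

expected_arg_max 8192   # prints 2097152
expected_arg_max 256    # prints 131072
```

Comparing `expected_arg_max "$(ulimit -s)"` with `getconf ARG_MAX` on a given machine is a quick way to see whether it follows the same rule.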

Note also that if the stack size is set lower than the minimum possible ARG_MAX, then the size of the stack (RLIMIT_STACK) becomes the upper limit on argument/environment size before E2BIG is returned (although getconf ARG_MAX will still show the value in limits.h).

A final thing to note is that if the kernel is built without CONFIG_MMU (support for memory management hardware), then the checking of ARG_MAX is disabled, so that limit does not apply, although MAX_ARG_STRLEN and MAX_ARG_STRINGS still do.

Further Reading

  • Related answer by Stephane Chazelas – https://unix.stackexchange.com/a/110301/48083
  • A detailed page covering most of the above. Includes a table of ARG_MAX (and equivalent) values on other Unix-like systems – http://www.in-ulm.de/~mascheck/various/argmax/
  • Seemingly the introduction of MAX_ARG_STRLEN caused a bug with Automake, which was embedding shell scripts into Makefiles using sh -c – http://www.mail-archive.com/bug-make@gnu.org/msg05522.html

In eglibc-2.18/NEWS

* ARG_MAX is not anymore constant on Linux.  Use sysconf(_SC_ARG_MAX).
Implemented by Ulrich Drepper.

In eglibc-2.18/debian/patches/kfreebsd/local-sysdeps.diff

+      case _SC_ARG_MAX:
+   request[0] = CTL_KERN;
+   request[1] = KERN_ARGMAX;
+   if (__sysctl(request, 2, &value, &len, NULL, 0) == -1)
+       return ARG_MAX;
+   return (long)value;

In linux/include/uapi/linux/limits.h

#define ARG_MAX       131072    /* # bytes of args + environ for exec() */

And 131072 is your $(getconf ARG_MAX)/16-1, perhaps you should start at 0.

You are dealing with glibc, and Linux. It would be good to patch getconf also in order to get the “right” ARG_MAX value returned.

Edit:

To clarify a little (after a short but heated discussion):

The ARG_MAX constant which is defined in limits.h, gives the max length of one argument passed with exec.

The getconf ARG_MAX command returns the max value of cumulated arguments size and environment size passed to exec.

So @StephaneChazelas rightly corrects me in the comments below – the shell itself does not dictate in any way the maximum argument size permitted by your system, but rather it’s set by your kernel.

As several others have already said, it seems the kernel limits to 128 KiB the maximum size of an argument you can hand to a new process when first exec'ing it. You experience this problem specifically because of the many nested $(command substitution) subshells that must execute in place and hand the entirety of their output from one to the next.

And this one’s kind of a wild guess, but as the ~5kb discrepancy seems so close to the standard system page size, my suspicion is that it is dedicated to the page bash uses to handle the subshell your $(command substitution) requires to ultimately deliver its output and/or the function stack it employs in associating your array table with your data. I can only assume neither comes free.

I demonstrate below that, while it might be a little tricky, it is possible to pass very large shell variable values off to new processes at invocation, so long as you can manage to stream it.

In order to do so, I primarily used pipes. But I also evaluated the shell array in a here-document pointed at cat's stdin. Results below.

But one last note – if you’ve no particular need for portable code, it strikes me that mapfile might simplify your shell jobs a little.

time bash <<-\CMD
    # First loop emits array assignments; second emits one printf per element.
    ( for arg in `seq 1 6533` ; do
        printf 'args+=(' ; printf b%.0b `seq 1 6533` ; echo ')'
    done ;
    for arg in `seq 1 6533` ; do
        printf '%s\n' "printf '%s\n' \"\${args[$arg]}\"" ;
    done ) | . /dev/stdin >&2
CMD
bash <<<''  66.19s user 3.75s system 84% cpu 1:22.65 total

Possibly you could double this up and then do so again if you did it in streams – I’m not morbid enough to find out – but definitely it works if you stream it.
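
A much smaller illustration of the same streaming idea, as a sketch only: it assumes a Linux system with the 128 KiB per-argument limit discussed above, and the variable names are made up for the example. A value too large to pass as one argument can still reach another process through a pipe, because printf is a shell builtin and no execve() call ever carries the data.

```shell
# Build a 200000-byte value, larger than the 131072-byte argument limit.
big=$(head -c 200000 /dev/zero | tr '\0' x)

# As an argument: the exec is expected to fail with E2BIG on Linux.
if sh -c : sh "$big" 2>/dev/null; then as_arg=ok; else as_arg=failed; fi

# As a stream: the builtin printf writes to the pipe, wc reads its stdin.
streamed=$(printf %s "$big" | wc -c)

echo "as argument: $as_arg; streamed bytes: $(( streamed ))"
```

The same approach works with a here-document instead of a pipe, which is what the larger examples in this answer rely on.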

I did try changing the printf generator part in line two to:

printf \ b%.0b

It also works:

bash <<<''  123.78s user 5.42s system 91% cpu 2:20.53 total

So maybe I’m a little morbid. I use zero padding here and add in the previous "$arg" value to the current "$arg" value. I get way beyond 6500…

time bash <<-\CMD
    # Each element is 6533 zero-padded numbers; the pad width grows with $arg.
    ( for arg in `seq 1 33` ; do
        echo $arg >&2
        printf 'args+=('"${args[$((a=arg-1))]}$(printf "%0${arg}0d" \
            `seq 1 6533` ; printf $((arg-1)))"')\n'
    done ;
    for arg in `seq 1 33` ; do
        printf '/usr/bin/cat <<HERE\n%s\nHERE\n' "\${args[$arg]}"
    done ) | . /dev/stdin >&2
CMD

bash <<<''  14.08s user 2.45s system 94% cpu 17.492 total

And if I change the cat line to look like this:

printf '/usr/bin/cat <<HERE | { printf '$arg'\ ; wc -c ;}\n%s\nHERE\n' "\${args[$arg]}"

I can get byte counts from wc. Remember these are the sizes of each key in the args array. The array’s total size is the sum of all these values.

1 130662
2 195992
3 261322
4 326652
5 391982
6 457312
7 522642
8 587972
9 653302
10 718633
11 783963
12 849293
13 914623
14 979953
15 1045283
16 1110613
17 1175943
18 1241273
19 1306603
20 1371933
21 1437263
22 1502593
23 1567923
24 1633253
25 1698583
26 1763913
27 1829243
28 1894573
29 1959903
30 2025233
31 2090563
32 2155893
33 2221223
