
Why do we need to fork to create new processes?

Solutions:


The short answer is that fork is in Unix because it was easy to fit into the existing system at the time, and because a predecessor system at Berkeley had already used the concept of forks.

From Dennis Ritchie’s “The Evolution of the Unix Time-sharing System”:

Process control in its modern form was designed and implemented within a couple of days. It is astonishing how easily it fitted into the existing system; at the same time it is easy to see how some of the slightly unusual features of the design are present precisely because they represented small, easily-coded changes to what existed. A good example is the separation of the fork and exec functions. The most common model for the creation of new processes involves specifying a program for the process to execute; in Unix, a forked process continues to run the same program as its parent until it performs an explicit exec. The separation of the functions is certainly not unique to Unix, and in fact it was present in the Berkeley time-sharing system, which was well-known to Thompson. Still, it seems reasonable to suppose that it exists in Unix mainly because of the ease with which fork could be implemented without changing much else. The system already handled multiple (i.e. two) processes; there was a process table, and the processes were swapped between main memory and the disk. The initial implementation of fork required only

1) Expansion of the process table

2) Addition of a fork call that copied the current process to the disk swap area, using the already existing swap IO primitives, and made some adjustments to the process table.

In fact, the PDP-7’s fork call required precisely 27 lines of assembly code. Of course, other changes in the operating system and user programs were required, and some of them were rather interesting and unexpected. But a combined fork-exec would have been considerably more complicated, if only because exec as such did not exist; its function was already performed, using explicit IO, by the shell.

Unix has evolved since that paper, and fork followed by exec is no longer the only way to run a program.

  • vfork was created to be a more efficient fork for the case where the new process intends to do an exec right after the fork. After doing a vfork, the parent and child processes share the same data space, and the parent process is suspended until the child process either execs a program or exits.

  • posix_spawn creates a new process and executes a file in a single function call. It takes extra arguments (file actions and spawn attributes) that let you adjust which of the caller’s open files the child gets, and control its signal mask, which signals are reset to their defaults, and other attributes of the new process.

[The following repeats part of an answer I gave to a related question.]

Why not just have a command that creates a new process from scratch? Isn’t it absurd and inefficient to copy one that is only going to be replaced right away?

In fact, that would probably not be as efficient, for two reasons:

  1. The “copy” produced by fork() is largely an abstraction, since the kernel uses copy-on-write: what really has to be created is a new virtual memory map pointing at the parent’s pages, and a page is physically copied only when one of the two processes writes to it. If the child then immediately calls exec(), almost none of the data ever has to be copied, because the child never modifies it.

  2. Many significant aspects of the child process (e.g., its environment) do not have to be specified one by one or worked out from a complex analysis of the context. They are simply the same as those of the calling process, which is the fairly intuitive model we are familiar with.

To explain #1 a little further: memory that is “copied” but never subsequently written to is never really copied, at least in most cases. An exception in this context might be if you forked a process and then had the parent exit before the child replaced itself with exec(). I say “might” because much of the parent could still be cached if there is sufficient free memory, and I am not sure to what extent this would be exploited (that depends on the OS implementation).

Of course, that doesn’t on the surface make using a copy more efficient than using a blank slate, except that “the blank slate” is not literally nothing and must involve allocation. The system could keep a generic blank process template that it copies the same way,[1] but that would not really save anything compared with the copy-on-write fork. So #1 just demonstrates that using a “new” empty process would not be more efficient.

Point #2 does explain why using fork is likely more efficient. A child’s environment is inherited from its parent, even if the child goes on to run a completely different executable. For example, if the parent process is a shell and the child a web browser, $HOME is still the same for both, but since either could subsequently change it, these must be two separate copies. The one in the child is produced by the original fork().

[1] A strategy that may not make much literal sense, but my point is that creating a process involves more than copying its image into memory from disk.

I think Unix had only the fork function for creating new processes as a result of the Unix philosophy:

They built one function that does one thing well: it creates a child process.

What to do with the new process is then up to the programmer, who can use one of the exec* functions to start a different program, or skip exec and keep running two instances of the same program, which can be useful.

So you get a greater degree of freedom, since you can use

  1. fork without exec*,
  2. fork with exec*, or
  3. just exec* without fork;

and in addition you only have to memorize the fork and exec* calls, which in the 1970s you actually had to do.
