Piped commands run concurrently. When you run ps | grep …, it’s the luck of the draw (or a matter of details of the workings of the shell combined with scheduler fine-tuning deep in the bowels of the kernel) as to whether ps or grep starts first, and in any case they continue to execute concurrently.
This is very commonly used to allow the second program to process data as it comes out from the first program, before the first program has completed its operation. For example
grep pattern very-large-file | tr a-z A-Z
begins to display the matching lines in uppercase even before grep has finished traversing the large file.
grep pattern very-large-file | head -n 1
displays the first matching line, and may stop processing well before grep has finished reading its input file.
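A small experiment can make the streaming behaviour visible. The sketch below is only an illustration: the producer is a made-up stand-in (two echo commands separated by a sleep), and the variable names start and stamps are arbitrary. The consumer tags each line with the number of seconds elapsed when it arrived.

```shell
# Hypothetical producer: emits one line, pauses, then emits another.
# The consumer stamps each line with the seconds elapsed on arrival.
start=$(date +%s)
stamps=$(
  { echo first; sleep 2; echo second; } |
  while read -r line; do
    echo "$(( $(date +%s) - start )) $line"
  done
)
echo "$stamps"
```

If the consumer only started after the producer had finished, both lines would carry the same stamp. Instead, the stamp on "first" should be about two seconds earlier than the stamp on "second".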
If you read somewhere that piped programs run in sequence, flee this document. Piped programs run concurrently and always have.
The order in which the commands are started doesn’t matter and isn’t guaranteed. Leaving aside the arcane details of execve(), the shell first creates the pipe, the conduit for the data that will flow between the processes, and then creates the processes with the ends of the pipe connected to them. Whichever process runs first may block: a reader waits until the writer produces input, and a writer that has filled the pipe waits until the reader starts consuming data. These waits can be arbitrarily long and don’t matter. In whichever order the processes run, the data eventually gets transferred and everything works.
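Both blocking cases can be provoked on purpose by delaying one side with sleep. This is a rough sketch, not a description of how any particular shell schedules things: the variable names are made up, and it assumes seq is available and that the pipe buffer is much smaller than the roughly 600 KB that seq writes here (on Linux the default is typically 64 KiB).

```shell
# Case 1: the reader is ready first. cat blocks on the empty pipe
# until the writer finally produces a line one second later.
reader_first=$( { sleep 1; echo hello; } | cat )

# Case 2: the writer is ready first. seq fills the pipe buffer and
# then blocks until wc starts reading one second later. Nothing is lost.
writer_first=$( seq 1 100000 | { sleep 1; wc -l; } | tr -d ' ' )

echo "$reader_first"    # prints: hello
echo "$writer_first"    # prints: 100000
```

In both cases the side that is ready first simply waits, and the complete data still arrives.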
At the risk of beating a dead horse, the misconception seems to be that
A | B
is equivalent to
A > temporary_file
B < temporary_file
rm temporary_file
But, back when Unix was created and children rode dinosaurs to school,
disks were very small, and it was common for a rather benign command
to consume all the free space in a file system.
If B was a filter that discarded most of its input, the final output of the pipeline could be much smaller than that intermediate file.
Therefore, the pipe was developed, not as a shorthand for
the “run A first, and then run B with input from A’s output” model,
but as a way for B to execute concurrently with A, eliminating the need for storing the intermediate file on disk.