Mastering UNIX pipes, Part 1
By Kamil Rytarowski
A pipe is a first-in-first-out interprocess communication channel. The pipe as it is known today was invented by the American computer scientist Douglas McIlroy and incorporated into Version 3 AT&T UNIX in 1973 by Ken Thompson.
It was inspired by the observation that the output of one application is frequently used as the input for another. This concept can be reused to connect a chain of processes, which is frequently observed in UNIX shell constructs that use the | operator.
$ find lib -name '*.c' | awk -F '/' '{print $NF}' | sort -u | tail
yp_maplist.c
yp_master.c
yp_match.c
yp_order.c
yperr_string.c
yplib.c
ypprot_err.c
yyerror.c
zdump.c
zic.c
This can be illustrated as a sequence of processes and pipes connecting the programs.
This concept of connecting the UNIX tools has been expanded to various native tools,
such as the troff formatting system, that are specifically designed to be used in pipelines.
The troff format and the associated toolkit are still used in the NetBSD Operating System.
The build rules that produce the PostScript (.ps) files look like the following example for kernmalloc (the kernel allocator documentation):
# $NetBSD: Makefile,v 1.4 2003/07/10 10:34:26 lukem Exp $
#
# @(#)Makefile 1.8 (Berkeley) 6/8/93
DIR= papers/kernmalloc
SRCS= kernmalloc.t appendix.t
MACROS= -ms
paper.ps: ${SRCS} alloc.fig usage.tbl
	${TOOL_SOELIM} ${SRCS} | ${TOOL_TBL} | ${TOOL_PIC} | \
	    ${TOOL_EQN} | \
	    ${TOOL_VGRIND} | ${TOOL_ROFF_PS} ${MACROS} > ${.TARGET}
.include <bsd.doc.mk>
Source src/share/doc/papers/kernmalloc/Makefile.
The C interface for pipes
The POSIX specification declares the pipe function in the <unistd.h> header with the following signature:
int pipe(int fildes[2]);
The pipe function takes an array of two integers and, upon successful return, stores the file descriptors of the read and write ends of the pipe in it.
The fildes[0] file descriptor is opened for reading and fildes[1] for writing.
Some implementations of UNIX also allow writing to fildes[0] and reading from fildes[1] (full-duplex mode), but POSIX leaves this behavior unspecified, so it is only safe to assume that pipes are unidirectional (half-duplex mode).
The pipe call can fail and return -1, setting errno appropriately, if the process (EMFILE) or the system (ENFILE) has exhausted the allowed number of open file descriptors.
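The roles of the two descriptors and the error return can be sketched within a single process. This is only an illustration: pipe_roundtrip is a hypothetical helper name, and the sketch assumes the message is small enough to fit in the pipe buffer, since nobody is reading while the write is in progress.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any API): push `len` bytes through
 * a fresh pipe and read them back in the same process.
 * Returns the number of bytes read, or -1 with errno set, for example
 * to EMFILE or ENFILE when pipe(2) runs out of file descriptors.
 * Assumption: `len` fits in the pipe buffer, otherwise write(2) would
 * block forever because there is no concurrent reader.
 */
ssize_t
pipe_roundtrip(const void *msg, size_t len, void *out, size_t outsz)
{
	int fildes[2];
	int saved;
	ssize_t nread = -1;

	if (pipe(fildes) == -1)
		return -1;

	/* fildes[1] is the write end, fildes[0] is the read end */
	if (write(fildes[1], msg, len) == (ssize_t)len)
		nread = read(fildes[0], out, outsz);

	/* keep the errno of the failing call across close(2) */
	saved = errno;
	close(fildes[0]);
	close(fildes[1]);
	errno = saved;

	return nread;
}
```

A caller would pass a short message and a buffer of at least the same size, then compare the returned count with the message length.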
As it stands, this interface is suitable only for processes that share an ancestor (usually the direct parent), so it is usually combined with fork(2)/vfork(2)/posix_spawn(3) or an equivalent interface (otherwise the pipe would be a futile feature).
To work around the requirement of a shared ancestor, FIFO special files or UNIX domain sockets can be used.
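As a rough illustration of the FIFO alternative, the sketch below creates a FIFO with mkfifo(3) and passes one byte through it. The helper name fifo_roundtrip, the /tmp location and the file name chan are assumptions made for this example. Both ends are opened by the same process only to keep the sketch short; unrelated processes would each open the same path instead.

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a FIFO inside a private temporary
 * directory (assumed to live under /tmp), send one byte through it and
 * return the byte read back, or -1 on failure.
 */
int
fifo_roundtrip(char byte)
{
	char dir[] = "/tmp/fifo-demo.XXXXXX";
	char path[sizeof(dir) + sizeof("/chan")];
	char c;
	int rfd = -1, wfd = -1, result = -1;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(path, sizeof(path), "%s/chan", dir);

	if (mkfifo(path, 0600) == -1)
		goto out;
	/* O_NONBLOCK lets the read end open without waiting for a writer */
	if ((rfd = open(path, O_RDONLY | O_NONBLOCK)) == -1)
		goto out;
	/* a reader exists now, so opening the write end does not block */
	if ((wfd = open(path, O_WRONLY)) == -1)
		goto out;

	if (write(wfd, &byte, 1) == 1 && read(rfd, &c, 1) == 1)
		result = (unsigned char)c;
out:
	if (wfd != -1)
		close(wfd);
	if (rfd != -1)
		close(rfd);
	unlink(path);
	rmdir(dir);
	return result;
}
```

Opening the read end first in non-blocking mode avoids the usual rendezvous, where each open(2) on a FIFO waits for the opposite end to be opened.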
In the UNIX system, file descriptors are inherited by children by default (with some exceptions in modern APIs) and thus the created pipe, referenced by the array of two file descriptors, connects the child and the parent.
In order to make the pipe effective, the user has to decide the direction of the data flow and close the unused ends. If the intention is to send data from process A to process B, then we need to close the fildes[0] (reading) end in process A and the fildes[1] (writing) end in process B.
Now the processes can transmit data over the pipe channel.
This algorithm is coded as follows:
/* CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication */
#include <sys/types.h>
#include <sys/wait.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main(int argc, char **argv)
{
	char c;
	int status;
	pid_t child;
	int fildes[2];

	if (pipe(fildes) == -1)
		err(EXIT_FAILURE, "pipe");

	if ((child = fork()) == -1)
		err(EXIT_FAILURE, "fork");

	if (child == 0) {
		/* child: it only reads, so close the unused write end */
		if (close(fildes[1]) == -1)
			err(EXIT_FAILURE, "close");

		/* exactly one byte is expected; anything else is a failure */
		if (read(fildes[0], &c, 1) != 1)
			errx(EXIT_FAILURE, "read failed or hit EOF");

		printf("Received: %c\n", c);

		/* force the buffer to be printed on the output (screen) */
		fflush(stdout);

		_exit(0);
	}

	/* parent: it only writes, so close the unused read end */
	if (close(fildes[0]) == -1)
		err(EXIT_FAILURE, "close");

	if (write(fildes[1], "x", 1) == -1)
		err(EXIT_FAILURE, "write");

	/* wait for the child process termination */
	if (wait(&status) == -1)
		err(EXIT_FAILURE, "wait");

	return EXIT_SUCCESS;
}
NB. For the sake of simplicity, certain code paths such as handling interrupts (EINTR) were omitted.
The execution of this program results in:
$ ./a.out
Received: x
The UNIX designers put the following constraints on pipes (assuming O_NONBLOCK is not set):
- Once the readable end of the pipe is closed, any attempt to write results in SIGPIPE being delivered to the writing process. The process is either killed, or it catches or ignores the signal and then needs to handle the error (-1 and errno set to EPIPE) manually.
- Once the writable end of the pipe is closed and any data still buffered has been consumed, an attempt to read from the pipe returns 0, which signals EOF on the file descriptor.
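Both constraints can be observed inside a single process. The following is a minimal sketch with hypothetical helper names; it ignores SIGPIPE temporarily so that the failed write reports EPIPE instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write to a pipe whose read end is already closed, with SIGPIPE
 * ignored so that the process survives.  Returns the errno value set
 * by write(2) (expected: EPIPE), 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created.
 */
int
errno_after_reader_closed(void)
{
	int fildes[2];
	int saved = 0;
	void (*old)(int);

	if (pipe(fildes) == -1)
		return -1;

	old = signal(SIGPIPE, SIG_IGN);
	close(fildes[0]);
	if (write(fildes[1], "x", 1) == -1)
		saved = errno;
	close(fildes[1]);

	/* restore the previous SIGPIPE disposition */
	if (old != SIG_ERR)
		signal(SIGPIPE, old);

	return saved;
}

/*
 * Read from an empty pipe whose write end is already closed.
 * Returns the result of read(2) (expected: 0, meaning EOF),
 * or -1 if the pipe could not be created.
 */
ssize_t
read_after_writer_closed(void)
{
	int fildes[2];
	char c;
	ssize_t n;

	if (pipe(fildes) == -1)
		return -1;

	close(fildes[1]);
	n = read(fildes[0], &c, 1);
	close(fildes[0]);

	return n;
}
```

In a real program the SIGPIPE disposition would normally be set once at startup rather than toggled around each write.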
Additionally:
- The amount of free space inside the pipe (kernel buffering) is limited and implementation specific.
- When the child process starts, stdio I/O buffering on pipes defaults to the fully buffered mode. The three basic approaches to work around this are:
  - using fflush(3) explicitly,
  - changing the buffering mode (setvbuf(3)), or
  - using pseudo terminals if the child process is not modifiable.
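The second approach, changing the buffering mode, could look like the sketch below. The helper name make_stdout_line_buffered is made up for this example; setvbuf(3) itself is standard C.

```c
#include <stdio.h>

/*
 * Hypothetical helper: switch stdout to line-buffered mode, so that
 * each completed line is written to the pipe right away instead of
 * waiting for the stdio buffer to fill up.
 * It has to be called before any other operation on the stream.
 * Returns 0 on success and a nonzero value on failure.
 */
int
make_stdout_line_buffered(void)
{
	/* NULL buffer: let the C library allocate its own */
	return setvbuf(stdout, NULL, _IOLBF, 0);
}
```

A child process would call this at the very top of main(), before the first printf(3). Passing _IONBF instead of _IOLBF would disable buffering entirely, at the cost of one write(2) per output call.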
The kernel pipe buffer size
The size of the kernel buffer storing the pipe data is limited. Once the buffer is full, further attempts to write(2) data block until space is regained by a read(2) operation on the other end.
POSIX requires that at least 512 bytes can be written to a pipe atomically.
In order to check the maximum number of bytes that can be written atomically to a pipe, a programmer can use the compile-time constant PIPE_BUF or the dynamic value obtained by passing _PC_PIPE_BUF to pathconf(2) or fpathconf(2).
pathconf(2) and fpathconf(2) can be applied to:
- directories that can contain FIFO files,
- FIFO files.
Additionally, fpathconf(2) can be applied to a pipe file descriptor.
/* CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication */
#include <err.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main(int argc, char **argv)
{
	int fildes[2];

	if (pipe(fildes) == -1)
		err(EXIT_FAILURE, "pipe");

	printf("_PC_PIPE_BUF: %ld\n", fpathconf(fildes[1], _PC_PIPE_BUF));
	printf("PIPE_BUF: %d\n", PIPE_BUF);

	return EXIT_SUCCESS;
}
However, the real capacity of the buffer is usually larger. On NetBSD it can be retrieved with ioctl(FIONSPACE).
This feature is unavailable for pipes on FreeBSD, OpenBSD and Linux. FreeBSD implements FIONSPACE for sockets, but not for pipes.
/* CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main(int argc, char **argv)
{
	int fildes[2];
	int n;

	if (pipe(fildes) == -1)
		err(EXIT_FAILURE, "pipe");

	/* free space available for writing on the write end */
	if (ioctl(fildes[1], FIONSPACE, &n) == -1)
		err(EXIT_FAILURE, "ioctl");

	printf("FIONSPACE fildes[1]: %d\n", n);

	return EXIT_SUCCESS;
}
An alternative approach to check the maximum buffer size of a pipe is to count the bytes written into it, one by one, until the write blocks.
The hang can then be broken, for example, with the alarm(3) call.
/* CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication */
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
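/* shared between main() and the signal handler */
static volatile sig_atomic_t n;

static void
sighand(int s)
{
	/*
	 * printf(3) and exit(3) are not async-signal-safe; this is
	 * tolerated here only because main() does nothing but write(2).
	 */
	printf("bytes written into the pipe: %d\n", (int)n);
	exit(EXIT_SUCCESS);
}

int
main(int argc, char **argv)
{
	int fildes[2];

	if (signal(SIGALRM, sighand) == SIG_ERR)
		err(EXIT_FAILURE, "signal");

	if (pipe(fildes) == -1)
		err(EXIT_FAILURE, "pipe");

	alarm(5); /* arm the alarm to 5 seconds */

	/* fill the pipe until write(2) blocks; SIGALRM ends the program */
	while (write(fildes[1], "x", 1) != -1)
		++n;

	/* if we ended up here, there was an error */
	err(EXIT_FAILURE, "write");
}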
Alternatively, one could set the pipe end to non-blocking mode.
This can be achieved with the fcntl(2) call and the F_SETFL and O_NONBLOCK arguments.
The O_NONBLOCK mode on pipes causes the following change:
- Writing into a full pipe buffer returns -1 with errno set to EAGAIN, instead of blocking.
/* CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication */
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main(int argc, char **argv)
{
	int fildes[2];
	int n = 0;	/* the counter must start from zero */

	if (pipe(fildes) == -1)
		err(EXIT_FAILURE, "pipe");

	if (fcntl(fildes[1], F_SETFL, O_NONBLOCK) == -1)
		err(EXIT_FAILURE, "fcntl");

	/* fill the pipe until write(2) refuses to accept more data */
	while (write(fildes[1], "x", 1) != -1)
		++n;

	/* filter real errors from the unavailable for now resource */
	if (errno != EAGAIN)
		err(EXIT_FAILURE, "write");

	printf("bytes written into the pipe: %d\n", n);

	return EXIT_SUCCESS;
}
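The program above passes O_NONBLOCK alone to F_SETFL, which is fine for a freshly created pipe. For a descriptor that may already carry other file status flags, a common idiom is to read the current flags first and add O_NONBLOCK to them. A minimal sketch, with set_nonblocking being a hypothetical helper name:

```c
#include <fcntl.h>

/*
 * Hypothetical helper: enable O_NONBLOCK on `fd` while keeping the
 * file status flags that are already set on it.
 * Returns 0 on success or -1 with errno set.
 */
int
set_nonblocking(int fd)
{
	int flags;

	/* read the current file status flags */
	if ((flags = fcntl(fd, F_GETFL)) == -1)
		return -1;

	/* write them back with O_NONBLOCK added */
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		return -1;

	return 0;
}
```

Calling set_nonblocking(fildes[1]) would then stand in for the direct fcntl(2) call.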
There are a few other kernel-specific approaches to guess the maximum amount of data that can be stored inside the kernel. One of them is to read PIPE_SIZE from <sys/pipe.h> on BSD systems. Its value is 16384 on FreeBSD, NetBSD and OpenBSD, but the header is an internal implementation detail rather than a public interface.
To complete the picture, we need to mention that the FreeBSD and NetBSD kernels allow tuning the pipe behavior and inspecting the kernel virtual address space consumed by the buffers.
FreeBSD provides the following sysctl knobs:
- kern.ipc.piperesizeallowed: Pipe resizing allowed
- kern.ipc.piperesizefail: Pipe resize failures
- kern.ipc.pipeallocfail: Pipe allocation failures
- kern.ipc.pipefragretry: Pipe allocation retries due to fragmentation
- kern.ipc.pipekva: Pipe KVA usage
- kern.ipc.maxpipekva: Pipe KVA limit
NetBSD:
- kern.pipe.maxbigpipes: Maximum number of “big” pipes
- kern.pipe.nbigpipes: Number of “big” pipes
- kern.pipe.kvasize: Amount of kernel memory consumed by pipe buffers
OpenBSD does not provide any similar sysctl functionality for pipes.
What are “big” pipes in NetBSD? They are special-case pipes whose buffer is four times PIPE_SIZE (giving 65536 bytes), used on atomic writes.
The maximum number of “big” pipes is set to 32 by default, but it can be tuned at runtime.
Limit | FreeBSD 12.0 | NetBSD 9.0 | OpenBSD 6.6 | Linux 5.6.14 |
---|---|---|---|---|
_PC_PIPE_BUF | 512 | 512 | 512 | 4096 |
PIPE_BUF | 512 | 512 | 512 | 4096 |
PIPE_SIZE (implementation detail) | 16384 | 16384 | 16384 | N/A |
ioctl(FIONSPACE) | N/A | 16384 | N/A | N/A |
write(2) + alarm(3) | 65536 | 16384 | 16384 | 65536 |
write(2) + O_NONBLOCK | 98303 | 16384 | 49023 | 65536 |
"big" pipe on atomic write | N/A | 65536 | N/A | N/A |
As we can see, these limits depend heavily on the operating system. The portable approach to picking a buffer size with guaranteed atomic writes is to use the POSIX limits represented by PIPE_BUF and _PC_PIPE_BUF, or to fall back to the bare minimum allowed by POSIX, 512 bytes.
In practice, it is often unimportant whether an operation blocks, as the kernel handles the communication channel as a sequence of write and read operations and blocks the appropriate end when the internal kernel buffer limit is reached. Properly designed software should not depend on the buffer sizes and should leave them to the kernel designers, who tuned the mechanism for maximal efficiency.
Why not raise the limits to very large sizes like 32 megabytes? Because the kernel would be prone to Denial of Service attacks, as it would run out of available kernel virtual memory more easily.
Furthermore, the whole mechanism could lead to an undesirable waste of kernel memory and, in some corner cases, even to latencies similar to bufferbloat.
Summary
We have introduced the reader to the UNIX pipe concept and presented the basic characteristics of this interprocess communication channel. In the next part, we will dig into the examples of combining two processes and managing the byte transfers.