Linux System Calls

A Cloud Chef
5 min read · Aug 26, 2018


You can spend years working with Linux without ever acknowledging the existence of system calls and the structure underneath them. Yet they are always in the corner of your eye: a bread-and-butter command such as chown, the bind parameter for the listening port of your server, all those funny functions in Python's os module.

Put very simply, the operating system kernel is the intermediary between the hardware and any user program. This is useful for the following reasons:

  1. It abstracts the hardware implementation details. You don't need to know the vendor of the network card you're using, or its speed, or whether it's wired or wireless, if all you want to do is connect to a remote host.
  2. It allows multiple programs to run at the same time and share the hardware. Without it, the programs would have to coordinate their access among themselves; otherwise one of them could overwrite a memory block another program is using, hog all the CPU time or write to a file opened by another program.
  3. It controls and restricts access to the hardware. The hardware has no concept of permissions, authentication or limits; these are implemented and enforced by the kernel.

System calls, therefore, are the way user programs interact with the kernel and the resources it manages. The Linux system call set aims to be compatible with the POSIX standard. This standard defines a minimum set of system interfaces that must be present, so porting user programs between different operating systems is easier, as they rely on the same calls. The diagram below helps to picture the idea:

Linux System Call interface, from Wikimedia

But how exactly does a system call work? From the program's point of view, system calls look like ordinary functions: each has a name, may accept multiple arguments and usually returns a value. Some only describe the current state of the system, some change it. Some even trigger an action that continues running after the call returns.
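To make the "just functions" idea concrete, here is a minimal sketch using Python's os module, whose functions are thin wrappers around system calls. The helper names describe_process and change_umask are made up for this illustration; the first only reads state, the second changes it.

```python
import os


def describe_process():
    """Use calls that only describe the current state of the system."""
    return {
        "pid": os.getpid(),    # wraps getpid(2)
        "ppid": os.getppid(),  # wraps getppid(2)
        "cwd": os.getcwd(),    # wraps getcwd(2)
    }


def change_umask(new_mask):
    """Use a call that changes state.

    umask(2) sets the file-creation mask of the process and
    returns the mask that was in effect before the call.
    """
    return os.umask(new_mask)


if __name__ == "__main__":
    print(describe_process())
    previous = change_umask(0o022)
    print("previous umask:", oct(previous))
    change_umask(previous)  # restore the original mask
```

When a system call fails, Python raises OSError (or one of its subclasses) instead of returning the error code that the C interface would.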

We will not go much deeper than this for now, but I'd like to present some system calls used to accomplish some tasks.

Executing a program

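  • execve: replaces the program running in the current process with a new program or script. On success it does not return: the new program starts executing in place of the caller, which is why it is usually combined with fork and wait when the caller needs to wait for the program to finish. It accepts three arguments: the path to the program (or script, if its first line indicates the interpreter using the shebang notation; the path is used as given, with no PATH lookup); the list of arguments to the program (the zeroth argument being the program name itself, as called); and a list of environment variables in key=value format.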
  • exit: ends the current process and returns the value of its status argument to the parent process. All open file descriptors are closed, any child processes are adopted by process 1 (init) and the parent process receives a SIGCHLD signal.
  • wait/waitpid: block the current process until a child process changes state, typically by finishing. There are different ways to set the arguments, but basically they let you pick a single child process, a process group or all children, and select which state changes make the call return. They are also used to collect the child's exit status. A child process that has terminated but has not yet been waited for by its parent is known as a zombie process.
  • fork: creates a new process by duplicating the current one. The child process has a copy of the parent's memory, inherits its open file descriptors and continues executing right after the fork() call. The call returns the child's process ID to the parent and 0 to the child (so the program code can tell whether it is running in the parent or in the child). On Linux, the child's memory is implemented using copy-on-write pages, so it doesn't necessarily use as much memory as its parent.
  • clone: similar to fork, but can also be used to create a thread instead of a process. Unlike a child process, a thread created this way shares its parent's memory space, file descriptors and table of signal handlers. Unlike fork(), the child doesn't continue from the same point: a function and its argument are passed to the call (as exposed by the C library wrapper), so while the parent continues from where it called clone(), the child starts at the specified function. Since a process and its threads share the same memory, they can exchange data faster than separate processes can; however, they must be careful not to overwrite memory another thread is using.
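The usual way these calls fit together is fork, then execve in the child, then waitpid in the parent. Below is a minimal sketch of that pattern with Python's os wrappers. The function name run_and_wait and the exit code 127 for a failed execve are choices made for this example (127 mirrors a common shell convention), not something the calls themselves require.

```python
import os
import sys


def run_and_wait(path, argv, env):
    """Run a program in a child process and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Replace this process with the new program.
        try:
            os.execve(path, argv, env)
        except OSError:
            # execve only returns (raises, in Python) when it fails.
            os._exit(127)

    # Parent: fork() returned the child's PID. Block until that child finishes,
    # which also collects its status so it does not linger as a zombie.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # The child was terminated by a signal: report it as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # argv[0] is the program name itself, as described above.
    code = run_and_wait(
        sys.executable,
        [sys.executable, "-c", "raise SystemExit(3)"],
        dict(os.environ),
    )
    print("child exited with", code)
```

The child calls os._exit() rather than sys.exit() on failure so that it terminates immediately, without running cleanup code inherited from the parent.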

Communication

  • socket: creates a communication endpoint that can be used to exchange data between processes on the same host or on different hosts. Its main arguments are the address family, which selects the network protocol (despite the name, AF_UNIX/AF_LOCAL are valid choices and use local Unix sockets), and the socket type, which selects the kind of transport (or SOCK_RAW for no transport protocol). It returns the socket file descriptor, which is passed as an argument to the other communication system calls.
  • bind: associates a socket with a name (e.g. an IP address and port number), so it can receive data. Once bound, a name normally can't be used by another process until the socket is closed. It accepts the socket file descriptor and the name as arguments, and returns 0 if the socket has been successfully bound to the name.
  • listen/accept: for connection-oriented transport protocols (such as TCP), listen() is used to make the socket a passive endpoint and associate a queue of pending connections with it. Once a new connection request is received, accept() creates a fresh socket for that particular connection and the main socket can resume waiting for connections. The arguments of listen() are the listening socket file descriptor and the maximum length of the pending connections queue. It returns 0 if successful. accept() receives the listening socket file descriptor and returns the connection socket file descriptor.
  • connect: again, for connection-oriented transport protocols, the client must initiate the connection before transmitting data. This is done using connect(). It requires the local socket file descriptor and the remote socket name (i.e. IP address and port, for TCP connections), and returns 0 if the connection is established.
  • send/recv: used to send data to and receive data from a socket, respectively. Among other arguments, they require the socket file descriptor and a buffer: the data to send, or the space where the received data should be stored. If the transfer is successful, the number of bytes transferred is returned.
  • close: like files, sockets need to be closed once they are no longer in use. Its only argument is the socket file descriptor and it returns 0 once the socket is successfully closed.
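The calls above can be seen working together in a small TCP round trip over the loopback interface. This is only a sketch: the function name echo_once is made up for the example, Python's socket methods stand in for the raw system calls, and getsockname() (not covered above) is used just to discover which port the kernel picked. Both ends live in the same process, which works because the kernel completes the connection and queues it before accept() is called.

```python
import socket


def echo_once(message: bytes) -> bytes:
    """Send a short, non-empty message over a loopback TCP connection and echo it back."""
    # socket(): address family AF_INET (IPv4), type SOCK_STREAM (TCP).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # bind(): port 0 asks the kernel to pick any free port.
        listener.bind(("127.0.0.1", 0))
        # listen(): passive endpoint with a pending-connection queue of 1.
        listener.listen(1)
        address = listener.getsockname()

        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # connect(): the client initiates the connection.
            client.connect(address)
            # accept(): returns a fresh socket for this particular connection.
            connection, _peer = listener.accept()
            try:
                # send() returns the number of bytes handed to the kernel;
                # for a message this small it is the whole message.
                client.send(message)
                received = connection.recv(1024)
                connection.send(received)
                return client.recv(1024)
            finally:
                connection.close()
        finally:
            client.close()
    finally:
        # close(): releases the listening socket and its bound name.
        listener.close()


if __name__ == "__main__":
    print(echo_once(b"hello"))
```

A real server would call accept() in a loop and would also loop over send() and recv(), since TCP is a byte stream and a single call may transfer only part of a larger message.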

The diagram below helps to understand how these system calls work together:

System calls for communicating between hosts using TCP and UDP

This is all for today.
