Linux Boot Process — Part 1

A Cloud Chef
6 min read, Sep 3, 2018

For an ordinary user, turning the computer on seems a straightforward and simple process: you press a key, wait a couple of minutes and you're ready to use the computer.

Truth is, this process is anything but simple. It carries behaviors, workarounds and quirks accumulated over almost 40 years of evolution. We call this process boot, short for "bootstrapping", as in "pulling oneself up by one's bootstraps". The idea is that, unlike the long chain of programs that are called by other programs, the computer somehow needs to call itself when turned on.

To understand the boot process, it's important to understand some details about operational modes, protection rings and the MBR. To be clear, this article concentrates on the PC platform with BIOS: UEFI is out of scope, as are other platforms. That said, later boot stages are very similar regardless of the platform: once Linux starts, all underlying differences are abstracted away and the process is pretty much the same.

Operational Modes

x86 computers have several operational modes, but we're interested in the main two: real and protected.

Real mode is the most basic operational mode and the mode the computer is in when it first starts. In this mode, programs have access only to the first megabyte of memory, but access to memory and I/O devices is completely unrestricted. Hardware-assisted features such as memory protection, multitasking and protection rings are not available in real mode. This mode is where the first stages of the boot process happen, preparing the system to switch to protected mode.
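To make the one-megabyte limit concrete: in real mode an address is built from a 16-bit segment and a 16-bit offset, which together form a 20-bit physical address. The snippet below is only an illustrative sketch in Python. The function name is made up for this example, and it assumes the original 8086 behaviour, where addresses wrap around at 1 MB.

```python
ADDRESS_BITS = 20                        # real mode uses 20-bit physical addresses
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1   # 0xFFFFF, the last byte of the first megabyte


def real_mode_address(segment: int, offset: int) -> int:
    """Translate a segment:offset pair into a physical address.

    Assumption: 8086-style wrap-around at 1 MB (hypothetical helper,
    not part of any real API).
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both fit in 16 bits")
    # The segment is shifted left by 4 bits (multiplied by 16) and added to the offset.
    return ((segment << 4) + offset) & ADDRESS_MASK


if __name__ == "__main__":
    print(hex(real_mode_address(0xF000, 0xFFF0)))  # 0xffff0
    print(ADDRESS_MASK + 1 == 1024 * 1024)         # True: exactly one megabyte
```

Whatever segment and offset a program picks, the result never leaves the first megabyte.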

Protected mode is the mode used by all modern operating systems. It supports the following features:

  • multitasking: in protected mode, multiple programs can run at the same time, sharing CPU time through preemptive multitasking.
  • paging: to increase security and stability, each program accesses physical memory through virtual memory page tables. Basically, each program sees a different page table, and the pages in that table (fixed-size blocks of 4096 bytes) are mapped to physical memory. Each mapped page has security flags that indicate whether the page can be shared with other processes, whether it is readable and/or writable, and whether it holds data or executable code. These flags prevent pages belonging to one program from being used or modified by another program.
  • protection rings: to improve security, executable code is further separated into different privilege levels called rings. We'll see more about them in a moment.

Protected mode allows up to 4 GB of memory to be addressed by a single program. An extension to protected mode, known as long mode, expands this limit to 256 TB.
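The paging idea can be sketched with a toy page table. This is a simplified model, not how the hardware or the Linux kernel actually stores page tables: the dictionary layout, the single "writable" flag and the function names are all assumptions made for illustration.

```python
from typing import Dict, Tuple

PAGE_SIZE = 4096      # bytes per page, as in the text
OFFSET_BITS = 12      # 2**12 == 4096


def split_address(vaddr: int) -> Tuple[int, int]:
    """Split a virtual address into (page number, offset inside the page)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def translate(vaddr: int, page_table: Dict[int, Tuple[int, bool]],
              write: bool = False) -> int:
    """Map a virtual address to a physical one using a toy page table.

    page_table maps page number -> (physical frame number, writable flag).
    """
    page, offset = split_address(vaddr)
    if page not in page_table:
        raise LookupError("page fault: page is not mapped")
    frame, writable = page_table[page]
    if write and not writable:
        raise PermissionError("page is read-only")
    return frame * PAGE_SIZE + offset


if __name__ == "__main__":
    table = {0x12: (0x80, False)}          # one read-only page
    print(hex(translate(0x12345, table)))  # 0x80345
```

Two programs with different tables can use the same virtual address and still end up in different physical frames, which is what keeps them isolated.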
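These limits follow directly from the width of the addresses. The short calculation below assumes 32-bit addresses for protected mode and 48-bit virtual addresses for long mode (a common figure for x86-64 processors, stated here as an assumption rather than a property of every CPU).

```python
GIB = 2 ** 30   # bytes in one gigabyte (binary)
TIB = 2 ** 40   # bytes in one terabyte (binary)


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses available with the given width."""
    return 2 ** address_bits


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB)  # 4   (GB, protected mode)
    print(addressable_bytes(48) // TIB)  # 256 (TB, long mode, assuming 48 bits)
```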

Protection Rings

Protected mode implements 4 distinct privilege levels ("rings"), numbered from 0 to 3, 0 being the most privileged and 3 the least. In practice, though, Linux only uses rings 0 and 3, because there is a non-negligible cost in switching between rings:

  • Ring 0 is where the kernel, the device drivers and the system calls run. It has unrestricted access to memory and I/O devices. This mode of operation is known as kernel mode.
  • Ring 3 is where user programs and system libraries run. It can execute code, but it can't manipulate its own memory mappings, access I/O devices directly or change the current protection ring (otherwise, it would be able to switch to ring 0). This mode of operation is known as user mode.

When a program running in user mode needs to access a restricted resource (like writing to disk or allocating more memory), it uses a system call. A system call is initiated by generating a software interrupt, which switches the context to kernel mode, where the caller's permissions for the requested operation are checked. If the program has the required permissions, the system call code runs and the context is then switched back to user mode.
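From a user program's point of view, a system call looks like an ordinary function call. As a small sketch, the Python functions below are thin wrappers around the pipe, write, read and close system calls; the helper name is made up for this example, and how the interpreter actually enters the kernel (through the C library) is hidden from the program.

```python
import os


def roundtrip_through_kernel(message: bytes) -> bytes:
    """Send bytes through a kernel pipe and read them back.

    Every os.* call here crosses from user mode into kernel mode and back.
    Keep the message small so it fits in the pipe buffer.
    """
    read_end, write_end = os.pipe()               # kernel creates the pipe
    try:
        os.write(write_end, message)              # kernel copies data into the pipe
        return os.read(read_end, len(message))    # kernel copies it back out
    finally:
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    print(roundtrip_through_kernel(b"hello from user mode"))
```

On Linux, running a script like this under a tracing tool such as strace shows the individual system calls as they happen.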

BIOS

The BIOS (Basic Input/Output System) is firmware that performs hardware initialization when the computer starts and provides low-level runtime services for the operating system and user programs. It's paired with a small amount of non-volatile memory that stores user settings and other configuration.

When the system boots, the firmware code is mapped to memory and the CPU starts in real mode. Since the CPU has no way to know beforehand where to look for the actual firmware code, it always executes the code located at the fixed physical memory address 000FFFF0. Normally, this address contains a JMP instruction pointing to the actual BIOS code.

The BIOS code then performs two tasks:

  • POST (power-on self-test): identifies and initializes the connected hardware devices, such as CPU, memory, video cards and disk controllers. Some devices have their own firmware; the POST process identifies these devices and executes their code as well. POST only happens when performing a so-called "cold boot" (turning on a powered-off system); a "warm boot" (i.e. restarting the computer by pressing <ctrl>+<alt>+<del>) leaves a special flag in the BIOS non-volatile memory that bypasses this task.
  • Boot: once POST is complete, the BIOS calls INT 19h to start the boot process. The POST code is unloaded from memory and the BIOS looks for the boot devices as configured in its non-volatile memory. Booting from a disk works by loading its first sector (known as the master boot record, MBR) into memory and trying to execute it. If that fails, the BIOS tries the next device until it runs out; if no bootable device is found, it shows the user an error and stops.

A disk sector is 512 bytes long, but the MBR also holds the disk partition table (up to 4 entries, each 16 bytes long) and a special two-byte magic number (0x55AA) that marks the device as a boot device. Therefore, the boot code must fit in the first 446 bytes of the MBR (512 - 4 × 16 - 2 = 446).
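The layout described above can be checked with a few lines of Python. This is a minimal sketch that parses a synthetic sector built in memory, not a real disk; the names `parse_mbr`, `Partition` and `make_sample_mbr` are made up for the example, and the sample partition values are arbitrary.

```python
import struct
from typing import List, NamedTuple, Tuple

SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446
PARTITION_ENTRY_SIZE = 16
PARTITION_COUNT = 4
SIGNATURE = b"\x55\xaa"          # the last two bytes of a bootable sector

# One entry: status, CHS start (3 bytes), type, CHS end (3 bytes),
# first sector (LBA) and sector count, both 32-bit little-endian.
ENTRY_FORMAT = "<B3sB3sII"


class Partition(NamedTuple):
    bootable: bool
    type_id: int
    first_lba: int
    sector_count: int


def parse_mbr(sector: bytes) -> Tuple[bytes, List[Partition]]:
    """Split a 512-byte MBR into its boot code and its partition entries."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    if sector[-2:] != SIGNATURE:
        raise ValueError("missing boot signature, not a boot device")
    boot_code = sector[:BOOT_CODE_SIZE]
    partitions = []
    for index in range(PARTITION_COUNT):
        start = BOOT_CODE_SIZE + index * PARTITION_ENTRY_SIZE
        status, _chs_first, type_id, _chs_last, first_lba, count = \
            struct.unpack_from(ENTRY_FORMAT, sector, start)
        if type_id != 0:                      # type 0 marks an unused slot
            partitions.append(Partition(status == 0x80, type_id, first_lba, count))
    return boot_code, partitions


def make_sample_mbr() -> bytes:
    """Build a fake MBR: empty boot code and one bootable Linux partition."""
    entry = struct.pack(ENTRY_FORMAT, 0x80, b"\0\0\0", 0x83, b"\0\0\0", 63, 204800)
    table = entry + bytes(PARTITION_ENTRY_SIZE * (PARTITION_COUNT - 1))
    return bytes(BOOT_CODE_SIZE) + table + SIGNATURE


if __name__ == "__main__":
    code, parts = parse_mbr(make_sample_mbr())
    print(len(code), parts)
```

The three regions add up to exactly one sector: 446 bytes of code, 64 bytes of partition table and 2 bytes of signature.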

Boot Loader (GRUB 2)

The execution context provided by the BIOS is too restrictive to load the operating system directly. To help the Linux kernel load, a special program called a boot loader is used. The most widely used boot loader on Linux systems is GRUB 2, so we'll concentrate on it.

GRUB 2 is too big to fit in 446 bytes, as are all but the simplest boot loaders. To locate and load the kernel, it needs to support dozens of file systems and features such as encryption, software RAID and LVM. The space is just not enough.

To overcome this limitation, the GRUB 2 code is divided into stages: each stage is responsible for loading the next one. This is what it looks like:

[Figure: GRUB 2 stages, from the original source]
  • Stage 1: this is the bare minimum code necessary to boot the next stage. It's not aware of partitions, files or file systems; all it has is an LBA48 pointer to the following stage. The BIOS translates the LBA48 pointer to the physical address of the sector to load from; the data is loaded directly into memory and executed.
  • Stage 1.5: it's used when Stage 1 can't reach Stage 2 directly (for example, when the /boot partition is stored on an encrypted device). Its image, core.img, lives in the otherwise empty space between the MBR and the first disk partition (traditionally starting at sector 63), so it can use at most 62 sectors, or 31,744 bytes. Unlike Stage 1, it's aware of file systems and it loads Stage 2 using its full filename. Since there is not enough room to keep all possible file system drivers, it's dynamically built to hold only what it needs to access /boot.
  • Stage 2: loads the configuration file and any additional drivers from the file system. This is the stage that shows a text-based selection menu and allows the user to customize the boot options.
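The room available to Stage 1.5 is simple sector arithmetic. The sketch below uses a hypothetical helper to compute it: the whole span from sector 0 up to a partition starting at sector 63 is 63 sectors (32,256 bytes), and leaving out the sector taken by the MBR itself gives 62 usable sectors. The second case assumes a first partition at sector 2048, a common alignment on more recently partitioned disks, which is an assumption not covered in this article.

```python
SECTOR_SIZE = 512


def mbr_gap_bytes(first_partition_sector: int) -> int:
    """Bytes available between the MBR (sector 0) and the first partition.

    Hypothetical helper for illustration only.
    """
    if first_partition_sector < 1:
        raise ValueError("sector 0 is the MBR itself")
    # Sectors 1 .. first_partition_sector - 1 are free for Stage 1.5.
    return (first_partition_sector - 1) * SECTOR_SIZE


if __name__ == "__main__":
    print(63 * SECTOR_SIZE)       # 32256: sectors 0-62, MBR included
    print(mbr_gap_bytes(63))      # 31744: the same span without the MBR
    print(mbr_gap_bytes(2048))    # 1048064: gap with a first partition at sector 2048
```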

GRUB 2 can also be used to load non-Linux operating systems, such as Windows. It presents a menu that allows the user to choose the operating system; in the case of Linux, multiple kernel versions can coexist in the same installation and can be selected in the same way. Each Linux system/kernel option defines the following arguments:

  • Kernel compressed image filename and path
  • Root file system (as understood by the kernel)
  • Optional initial RAM disk compressed image
  • Optional kernel arguments

Once you select the kernel, GRUB 2 will load it into memory and run it. If an initial RAM disk is defined, it's also loaded into memory and a pointer to it is passed to the kernel.
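To show how the four arguments fit together, the sketch below models a menu option as a small Python class and renders it as a simplified GRUB 2 menu entry. The class, the file names and the device name are hypothetical, and a configuration generated by a real distribution contains more lines than this (module loading, root device lookup and so on).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BootEntry:
    """One Linux option in the boot menu (illustrative model only)."""
    title: str
    kernel: str                                   # compressed kernel image path
    root: str                                     # root file system, as the kernel sees it
    initrd: Optional[str] = None                  # optional initial RAM disk image
    kernel_args: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render a simplified menuentry block using GRUB's linux/initrd commands."""
        linux_line = " ".join(["linux", self.kernel, f"root={self.root}",
                               *self.kernel_args])
        lines = [f"menuentry '{self.title}' {{", f"    {linux_line}"]
        if self.initrd:
            lines.append(f"    initrd {self.initrd}")
        lines.append("}")
        return "\n".join(lines)


if __name__ == "__main__":
    entry = BootEntry(
        title="Example Linux",                    # hypothetical values throughout
        kernel="/vmlinuz-example",
        root="/dev/sda1",
        initrd="/initrd-example.img",
        kernel_args=["ro", "quiet"],
    )
    print(entry.render())
```

Leaving `initrd` unset simply drops that line, mirroring the fact that the initial RAM disk is optional.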

In the next part, we'll explore the kernel initialization process and the init process. See you there!
