User Tools

Site Tools


wiki:kerneloverview

Kernel Overview

The seL4 kernel isolates applications and provides primitive mechanisms for these isolated instances to communicate with one another. It is up to the userspace libraries to abstract these primitives and provide easier to use API’s.

The kernel provides objects for threads, address spaces, inter-process communication, notifications, device primitives, and capability spaces. These objects can be used in user space to interact with the kernel and other objects. These services are invoked with system calls, which consist of the kernel object to be operated on and the capability to that object, some of which can take more arguments.

Threads and Thread Control Blocks

seL4 uses threads to group executing contexts together. Threads are contained by Thread Control Block tcb_t structures. As of version 8.0 of the seL4 code base, these can be seen in kernel/include/object/structures.h:

/* TCB: size 64 bytes + sizeof(arch_tcb_t) (aligned to nearest power of 2) */
struct tcb {
    /* arch specific tcb state (including context)*/
    arch_tcb_t tcbArch;
 
    /* Thread state, 12 bytes */
    thread_state_t tcbState;
 
    /* Notification that this TCB is bound to. If this is set, when this TCB waits on
     * any sync endpoint, it may receive a signal from a Notification object.
     * 4 bytes*/
    notification_t *tcbBoundNotification;
 
    /* Current fault, 8 bytes */
    seL4_Fault_t tcbFault;
 
    /* Current lookup failure, 8 bytes */
    lookup_fault_t tcbLookupFailure;
 
    /* Domain, 1 byte (packed to 4) */
    dom_t tcbDomain;
 
    /*  maximum controlled priorioty, 1 byte (packed to 4) */
    prio_t tcbMCP;
 
    /* Priority, 1 byte (packed to 4) */
    prio_t tcbPriority;
 
    /* Timeslice remaining, 4 bytes */
    word_t tcbTimeSlice;
 
    /* Capability pointer to thread fault handler, 4 bytes */
    cptr_t tcbFaultHandler;
 
    /* userland virtual address of thread IPC buffer, 4 bytes */
    word_t tcbIPCBuffer;
 
#ifdef ENABLE_SMP_SUPPORT
    /* cpu ID this thread is running on */
    word_t tcbAffinity;
#endif /* ENABLE_SMP_SUPPORT */
 
    /* Previous and next pointers for scheduler queues , 8 bytes */
    struct tcb* tcbSchedNext;
    struct tcb* tcbSchedPrev;
    /* Preivous and next pointers for endpoint and notification queues, 8 bytes */
    struct tcb* tcbEPNext;
    struct tcb* tcbEPPrev;
 
#ifdef CONFIG_BENCHMARK_TRACK_UTILISATION
    benchmark_util_t benchmark;
#endif
 
#ifdef CONFIG_DEBUG_BUILD
    /* Pointers for list of all tcbs that is maintained
     * when CONFIG_DEBUG_BUILD is enabled */
    struct tcb *tcbDebugNext;
    struct tcb *tcbDebugPrev;
    /* Use any remaining space for a thread name */
    char tcbName[];
#endif /* CONFIG_DEBUG_BUILD */
};
typedef struct tcb tcb_t;

The Root Thread

The root thread is the thread seL4 branches to after it has initialized the kernel and structures for user space. The root thread receives all of the capabilities and memory that the kernel knows about. It is up to the root thread to divide up these objects and create new objects, which can be passed around and further divided among other threads.

Processes

While processes are not seL4 constructs, they are included in the seL4 Utility Library (seL4_libs/libsel4utils). Processes in this library are defined in seL4_libs/libsel4utils/include/sel4utils/process.h:

typedef struct {
    vka_object_t pd;
    vspace_t vspace;
    sel4utils_alloc_data_t data;
    vka_object_t cspace;
    uint32_t cspace_size;
    uint32_t cspace_next_free;
    sel4utils_thread_t thread;
    vka_object_t fault_endpoint;
    void *entry_point;
    uintptr_t sysinfo;
    /* cptr (with respect to the process cnode) of the tcb of the first thread (0 means not supplied) */
    seL4_CPtr dest_tcb_cptr;
    seL4_Word pagesz;
    object_node_t *allocated_object_list_head;
    /* ELF headers that describe the sections of the loaded image (at least as they
     * existed at load time). Is different to the elf_regions, which have reservations,
     * these are the original headers from the elf and include nonloaded information regions */
    int num_elf_phdrs;
    Elf_Phdr *elf_phdrs;
    /* if the elf wasn't loaded into the address space, this describes the regions.
     * this permits lazy loading / copy on write / page sharing / whatever crazy thing
     * you want to implement */
    int num_elf_regions;
    sel4utils_elf_region_t *elf_regions;
    bool own_vspace;
    bool own_cspace;
    bool own_ep;
} sel4utils_process_t;

These process structures help keep seL4 objects together that relate to a particular TCB such as allocators, CSpace info, and other objects. The library consists of several helper functions to aid dealing with processes, some of which are:

  • sel4utils_spawn_process: Start a process, and copy arguments into the processes address space.
  • sel4utils_configure_process: Creates a simple CSpace and VSpace for you, allocates a fault endpoint and puts it into the new CSpace. The process will start at priority 0.
  • sel4utils_copy_path_to_process: Copy a capability object into a process’ CSpace.
  • sel4utils_destroy_process: Destroy a process.

Endpoints

  • Allow small amounts of data and capabilities to be transferred between two threads
  • Invoked with seL4 system calls
    • seL4_Send
    • seL4_Call
    • seL4_Recv
    • seL4_ReplyRecv

Inter-Process Communication

IPC is the mechanism for thread-to-thread and thread-to-kernel communication. IPC messages can be sent to either an “Endpoint” or other kernel objects. The seL4 IPC model consists of the thread having the capability to access the kernel object of interest and then loading up the thread’s message register. IPC message registers have 3 descriptive registers that are backed by CPU registers and anywhere from 1-4 “Message Registers” that are backed by the thread’s IPC buffer (& thread_t.tcbIPCBuffer). The IPC message registers are organized as follows:

  • Capability Register
  • Badge Register
  • Message tag (seL4_MessageInfo_t)
    • Has four fields
      1. Label
        • Kernel does not do anything with this
        • First data payload of the message
      2. Message Length
      3. Number of capabilities
      4. Capabilities Unwrapped
        • Used only on the receive side
        • Indicates the manner in which capabilities were received
  • Message Registers

There are helper functions to aid with sending and receiving messages:

  • seL4_MessageInfo_new
  • seL4_SetMR
  • seL4_Call
  • seL4_GetMR

Notification Service and Objects

Notifications are the non-blocking signaling mechanism that applications running on seL4 can use as binary semaphores.

Related sel4 system calls:

  • seL4_Signal
    • Updates the notification object’s word
    • Unblocks the first thread waiting on the notification, while all other threads that are waiting on the notification object keep waiting until the next time the notification is signaled.
    • If invoked with an un-badged capability the first queued thread is unblocked
  • seL4_Wait
    • If notification object’s word is zero the invoker blocks
    • If notification object’s word is not zero then the call returns and sets the notification object’s word to zero
  • seL4_Poll
    • If notification object’s word is zero the call will return immediately without blocking
    • If notification object’s word is not zero then the call returns and sets the notification object’s word to zero
  • seL4_TCB_BindNotification
    • Binds notification objects and TCBs together

Notification Objects

  • single data word
  • can be badged with
    • seL4_CNode_Mutate
    • seL4_CNode_Mint

Interrupt Objects

  • Interrupts are delivered as notifications
  • At boot, the root thread only has one IRQHandler capability. It is up to the developer to expand this out for needed interrupts.

Device Primitives

Devices can be accessed by “untyped device memory”. These device untyped memory regions can be retyped into special device frames. This type of object is managed by the kernel and prevents de-referencing it and mistaking it for RAM.

When in the root thread, the seL4_BootInfo struct contains an array of untyped capabilities (untypedList). Each element in this array is of type seL4_UntypedDesc, which is a struct that has a flag, isDevice to determine if the untyped capability is a device capability or not.

Capability Spaces (CSpaces) and Capability Nodes (CNodes)

One of the main features of the seL4 kernel is capabilities. Capabilities are a mechanism that is used to grant access to certain resources in the system. In seL4, there are a few related structures to be familiar with when talking about capabilities:

  • CSpace
    • Directed graph of CNodes
    • All capabilities reachable from the root of the CSpace
  • Capability Pointer (CPTR)
    • The address or pointer to a capability
  • CSpace Path (seL4_libs/libsel4vka/include/vka/cspacepath_t.h)
    • A structure for describing the capability in the graph
  • CRoot
    • The beginning of a CSpace
  • CSpace Node (CNode)
    • Table of slots
    • Each slot may contain a capability

The reason why capabilities form the basis of security in seL4 is the fact that the kernel keeps track of everything in the capability derivation tree and a capability is required for any operation on a kernel object. This prevents bad actor threads from gaining access to a resource in a trusted thread that they aren’t supposed to have access to. Capabilities are unforgeable, transferable, and extensible. If a thread is monitored and malicious behavior is detected, the capabilities that thread has can be revoked by a thread that owns a capability further up the capability derivation tree.

Virtual Address Spaces (VSpaces)

Virtual address spaces are chunks of kernel managed memory that threads can use to run in, while being isolated from the rest of the system. A thread must poses a capability to a chunk of untyped memory, which it can retype as virtual memory. This virtual memory can be associated with a thread object and when the thread is started, it will execute inside of that memory.

Virtual Address Space Objects

The memory objects are hardware dependent. For ARM, some important objects are:

  • Page Directory (PD)
  • Page Directory Entry (PDE)
    • Entry in the page directory
    • Contains a pointer to a page table, page, or it can be invalid
  • Page Global Directory (PGD)
  • Page Upper Directory (PUD)
  • Page Table (PT)
    • Contains page table entries
  • Page Table Entry (PTE)
    • An entry in a page table
    • Contains a pointer to a page or it can be invalid
  • Page
    • Actual physical memory, usually a 4k region
    • This can be mapped into a VSpace.

For 32-bit ARM architectures the mapping has two levels, Page Directory to Page Tables. 64-bit ARM architectures have four levels, Page Global Directories to Page Upper Directories to Page Directories to Page Tables. The Page Upper Directories allow for mapping 1GB pages. The user doesn’t necessarily need to know this information for successful seL4 programming, but it could be nice to know.

Untyped Memory

Untyped memory is the default classification of memory in seL4. Untyped memory is required to re-type into another kernel object. This means that the thread that invokes the untyped memory needs to have access to that to change it to something else. When untyped memory is retyped, the kernel knows about it and can keep track of how it is used.

The Kernel Boot Sequence

NOTE: This section focuses on the ARM architecture.

After the board is set up with the bootloader and execution is handed off to the elfloader, where it then branches to the kernel. The kernel boot sequence can be traced in src/arch/arm/kernel/boot.c. The first function to be called is init_kernel. Then try_init_kernel is called and depending on the number of CPUs, this function branches to try_init_kernel_secondary_core.

In try_init_kernel several important things happen:

  • Virtual memory for the kernel is set up in map_kernel_window.
    • This also calls map_kernel_devices
      • Where the platform dependent kernel_devices gets mapped
      • This is 32/64 bit dependent and in the src/arch/arm/$ARCH/machine/hardware.c source file.
  • CPU is initialized in init_cpu
    • The activate_global_pd is a macro that is dependent on whether it is running in 32 or 64 bit mode
      • In 64 bit mode this actually calls activate_kernel_vspace
        • This function is in src/arch/arm/64/kernel/vspace.c
    • The kernel stack is set up
    • The timer is initialized
  • Platform is initialized in init_plat
    • The IRQ Controller is initialized with initIRQController
      • The implementation of this is found in src/plat/$PLAT/machine/ or src/arch/arm/machine/gic_$GIC_VERSION.c
    • The L2 Cache is initialized
    • In the future other platform dependent initializations may take place here
  • Capabilities are created for the root environment
  • The boot info frame is allocated and initialized
  • The initial address space is created
  • The boot info frame capability is created
  • The initial threads IPC buffer is created
  • This is all brought together and the initial thread control block of type tcb_t is created with a call to create_initial_thread
  • L1 cache is invalidated
  • Execution is returned to the init_kernel function

Once execution returns to the init_kernel function, the thread is scheduled and execution branches to the root user space thread.

wiki/kerneloverview.txt · Last modified: 2018/05/08 19:30 (external edit)