Understanding SELinux secured virtualization

July 02, 2021

Virtualization is a core concept that plays a part in many infrastructural service designs. Ever since its inception in the 1960s and early 1970s as a means of isolating workloads and abstracting hardware dependencies, virtualization implementations have grown tremendously. When we look at infrastructure service offerings today, we quickly realize that many cloud providers would be out of service if they could not rely on the benefits and virtues of virtualization.

One of the properties that virtualization offers is isolation, which SELinux can support and augment quite nicely.

Introducing virtualization

When we look at virtualization, we look at the abstraction layers it provides to hide certain resource views (such as hardware or processing power). Virtualization contributes to more efficient hardware usage (which results in better cost control), centralized views on resources and systems, more flexibility in the number of operating systems that an organization can support, standardization of resource allocation, and even improved security services.

There are several virtualization types around:

  • Full-system emulation: Where hardware is completely emulated through software. QEMU is an open source emulation software capable of handling full-system emulation, allowing administrators and developers to run virtual platforms with different processor architectures not otherwise compatible with their own systems.
  • Native virtualization: Where main parts of the hardware are shared across instances, and guests can run unmodified on them. Linux’s KVM, which is also supported through QEMU, is an example of this type of virtualization.
  • Paravirtualization: Where the guest operating system uses specific APIs offered by the virtualization layer (on which unmodified operating systems cannot be hosted). Initial releases of Xen only supported paravirtualization. Using KVM with VirtIO drivers is another, more modular example.
  • OS-level virtualization or containerization: Where the guest uses the host operating system (kernel) but does not see the processes and other resources running on the host. Docker containers or LXC containers are examples of OS-level virtualization.
  • Application virtualization: Where the application runs under a specialized software runtime. A popular example here is the support for Java applications, running on the Java Virtual Machine (JVM).

Many virtualization platforms support more than one virtualization type. QEMU, for example, can range from full emulation to paravirtualization, depending on its configuration.

When we work with virtualization layers, the following terms come up frequently:

  • The host is the (native) operating system or server on which the virtualization software is running.
  • The guest is the virtualized service (generally an operating system or container) that runs on the host.
  • The hypervisor is the specialized virtualization software that manages the hardware abstraction and resource-sharing capabilities of the virtualization platform. It is responsible for creating and running the virtual machines.
  • An image is a file or set of files that represents the filesystem, disk, or other medium assigned to a guest.
  • A virtual machine is the abstracted hardware or resource set in which the guest runs.

Before we embark on configuring and tuning virtualization services, let’s first see what SELinux has to offer for virtualized environments.

Reviewing the risks of virtualization

Virtualization also comes with a number of risks. If we ask architects or other risk-conscious people about them, they will mention virtual machine sprawl, challenges related to secure or insecure APIs, and the higher complexity of virtualized services, among others.

Going over the challenges of virtualization itself is beyond the scope of this chapter, but there are a few notable risks that play directly into SELinux’s field of interest. If we can integrate SELinux with a virtualization layer, then we can mitigate these risks more proactively:

  • The first risk is data sensitivity within a virtual machine. Whenever multiple virtual machines are hosted together, there is a risk that one guest can access sensitive data on another virtual machine, be it through a flaw in the virtualization software, the hypervisor’s networking capabilities, or side-channel attacks.

    With SELinux, data sensitivity can be controlled using sensitivity ranges. Guests can run with different sensitivity ranges, guaranteeing the data sensitivity even on the virtualization layer.

  • Another risk is the security of offline guest images. Here, either administrators or misconfigured virtual machines might gain access to another guest image. SELinux can prevent this through properly labeled guest images and by ensuring that images of offline virtual machines are typed differently from those of online virtual machines.
  • Virtual machines can also exhaust the resources on a system. On Linux systems, many resources can be controlled through the control groups (cgroups) subsystem. As this subsystem is governed through system calls and regular file APIs, SELinux can be used to further control access to this facility, ensuring that the cgroups maintained by libvirt, for instance, remain solely under the control of libvirt.
  • Break-out attacks, where vulnerabilities within the hypervisor are exploited to try to reach the host operating system, can be mitigated through SELinux’s type enforcement as even a hypervisor does not require full administrative access to everything on the host.
  • SELinux can also be used to authorize access to the hypervisor, ensuring that only the right teams (through the role-based access controls) are able to control the hypervisor and its definitions.
  • Finally, SELinux also offers improved guest isolation, which goes beyond just the guest image accesses. Thanks to SELinux’s MCS implementation, guests can be separated from each other in a mandatory approach. With type enforcement, the allowed behavior of guests can be defined and controlled. This is a key capability for hosting providers, because it lets them run guest virtual machines that they themselves do not trust.

SELinux, however, is not a full security solution for virtualization providers. One main design constraint with SELinux is that it is not dynamic if the system itself is not SELinux-aware. When we assign a type to a virtual machine, this type is generally rigid and set in stone. Virtual machines will have different behavior characteristics depending on the software running on them.

A virtual machine running a web server has different behavior characteristics than one running a database or an email gateway. Although SELinux policy administrators would be capable of creating new domains for each virtual machine, this is not efficient. As a result, most SELinux policies will only offer a few domains usable by the virtual machine with broad characteristics.

With libvirt, these domains are part of the sVirt solution.

Reusing existing virtualization domains

When Red Hat introduced its virtualization solution, it also added SELinux support, calling the resulting technology sVirt, derived from secure virtualization. As secure virtualization as a term is hardly unique in the market, we use the term sVirt predominantly to refer to the SELinux integration within virtualization management solutions such as libvirt.

With sVirt, the open source community has a reusable approach for augmenting the security posture of virtualization and containerization through SELinux. It does this through the following domains and types, which can be used regardless of the underlying virtualization platform:

  • The hypervisor software itself, such as libvirtd, uses the virtd_t domain.
  • Guests (virtual machines) that do not require any interaction with the host system and resources beyond those associated with a generic virtual machine generally use the svirt_t domain. This domain is the most isolated guest domain for full virtualization solutions.
  • Guests that require more interaction with the host, such as using the QEMU networking capabilities and sharing services, will use the svirt_qemu_net_t domain.
  • Guests that use the KVM networking capabilities and sharing services will use the svirt_kvm_net_t domain. It is very similar in permissions to svirt_qemu_net_t but optimized for KVM.
  • Containerized guests will use the svirt_lxc_net_t domain, whose privileges are optimized for OS-level virtualization.
  • Guests that require more flexible memory accesses (such as executing writable memory segments and memory stacks) will use the svirt_tcg_t domain. This flexible memory access is common for full virtualization guests whose emulation/virtualization requires the use of a Tiny Code Generator (TCG), hence the name.
  • Image files that contain a guest’s data will be labeled with the svirt_image_t type.
  • Image files that are not in use at the moment will use the default virt_image_t type.
  • Image files used in a read-only fashion will have the virt_content_t type assigned to them.

To enable some flexibility in what the domains are allowed to do, additional SELinux booleans are available, which we’ll cover next.

Fine-tuning virtualization-supporting SELinux policy

Use caution when toggling SELinux booleans to control the confinement of virtualization domains. Such booleans influence the SELinux policy on the host level, and cannot be used to change the access controls or privileges of individual guests. As such, when we change the value of an SELinux boolean, the change affects the permissions of all guests on that host.

Let’s see what the various SELinux booleans are for virtualized environments:

  • The staff_use_svirt boolean, if enabled, allows the staff_t user domain to interact with and manage virtual machines, as by default this is only allowed for unconfined users.
  • The unprivuser_use_svirt boolean, if enabled, allows unprivileged user domains (such as user_t) to interact with and manage virtual machines.
  • With the virt_read_qemu_ga_data and virt_rw_qemu_ga_data booleans, the QEMU guest agent (which is an optional agent running inside the guests, facilitating operations such as freezing filesystems during backup routines) can read or even manage data labeled with the virt_qemu_ga_data_t type. This type, however, is not in use by default, and these SELinux booleans are disabled by default.
  • The virt_sandbox_share_apache_content boolean allows the guest domains to share web content. This is most commonly used for containers but is possible on guests as well if the hypervisor supports mapping host filesystems into the guest.
  • The virt_sandbox_use_audit boolean, if enabled, allows the guest domains to send audit messages to the host’s audit service.
  • The virt_sandbox_use_fusefs boolean grants the guest domains the privilege to mount and interact with Filesystem in Userspace (FUSE) filesystems. The virt_use_fusefs boolean allows the guests to read files on these filesystems.
  • If the virt_sandbox_use_netlink boolean is active, then guest domains can use Netlink system calls to manipulate the networking stack within the host.
  • With virt_transition_userdomain, containers can transition to a user domain (including the unconfined user domain unconfined_t).
  • When we enable virt_use_execmem, guests can use executable memory.
  • The virt_use_glusterd, virt_use_nfs, and virt_use_samba booleans allow guests to use network filesystems mounted on the host, offered through GlusterFS, NFS, and Samba respectively. Note that this does not involve mounts inside the guest itself, such as a guest that connects to an NFS server. The booleans handle interaction through filesystem mounts on the host.
  • Device access is also governed through some SELinux booleans, such as the virt_use_comm boolean to interact with serial and parallel communication ports, virt_use_pcscd to allow guests to access smartcards, and virt_use_usb to grant access to USB devices.
  • The virt_use_rawip boolean allows guests to use and interact with raw IP sockets, allowing network interaction that circumvents some of the processing logic within the regular network stack.
  • With virt_use_sanlock, guests can interact with the sanlock service, a lock manager for shared storage.
  • When virt_use_xserver is set to true, guests can use the X server on the host.
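Because these booleans apply to every guest on the host, it can help to review which of them are enabled before toggling anything. The following is a minimal Python sketch, assuming the "name --> on" / "name --> off" line format printed by getsebool -a. The helper names parse_getsebool and enabled_virt_booleans are hypothetical, and the SAMPLE text is illustrative rather than taken from a real host.

```python
def parse_getsebool(output: str) -> dict[str, bool]:
    """Parse lines of the form 'name --> on' into {name: True/False}."""
    booleans = {}
    for line in output.splitlines():
        name, sep, state = line.partition("-->")
        if not sep:
            continue  # skip blank or unexpected lines
        booleans[name.strip()] = state.strip() == "on"
    return booleans


def enabled_virt_booleans(booleans: dict[str, bool]) -> list[str]:
    """Return the enabled virtualization-related booleans, sorted by name."""
    return sorted(
        name
        for name, is_on in booleans.items()
        if is_on and (name.startswith("virt_") or name.endswith("_use_svirt"))
    )


# Illustrative sample only (an assumption, not real host data).
# On a real host, pass in the output of `getsebool -a` instead.
SAMPLE = """\
httpd_can_network_connect --> on
staff_use_svirt --> off
virt_use_execmem --> off
virt_use_nfs --> on
virt_use_usb --> on
"""

if __name__ == "__main__":
    for name in enabled_virt_booleans(parse_getsebool(SAMPLE)):
        print(f"{name} is on for every guest on this host")
```

Reporting only the enabled booleans keeps the focus on privileges that all guests on the host currently share.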

If security-sensitive operations need to be allowed for a single guest or a small set of guests, it is advisable to run those guests on an isolated host where these operations are then allowed while running the other guests on hosts where the policy does not allow these particular actions.

Administrators can also use different SELinux domains for specific guests, fine-tuning the access controls for an individual virtual machine. How we can assign specific domains depends on the underlying technology of course. In the Enhancing libvirt with SELinux support section, we will introduce this for libvirt-based virtualization.

Understanding sVirt’s use of MCS

The SELinux domains and the mentioned types are not enough to implement proper confinement and isolation between guests. sVirt adds another layer of security by using SELinux’s Multi-Category Security (MCS) extensively.

Within SELinux, some domains are marked as MCS-constrained types. When this is the case, the domain will not be able to access resources that do not have the same set of categories (or more) assigned as the current context, as it will not be able to extend its own active category set.

The sVirt implementation ensures that the virtualization domains mentioned earlier are all marked as MCS-constrained types. This can be confirmed by asking the system which types have the mcs_constrained_type attribute set:

# seinfo -amcs_constrained_type -x
Type Attributes: 1
 attribute mcs_constrained_type
 container_t
 netlabel_peer_t
 openshift_app_t
 openshift_t
 sandbox_min_t
 sandbox_net_t
 sandbox_t
 sandbox_web_t
 sandbox_x_t
 svirt_kvm_net_t
 svirt_qemu_net_t
 svirt_t
 svirt_tcg_t

Through the MCS constraints, sVirt enables proper isolation between guests. Every running virtual machine (generally running as svirt_t) will be assigned two (random) SELinux categories. The images that the virtual machine needs to use are assigned the same two SELinux categories.

Whenever a virtual machine wants to access the wrong image, the difference in MCS categories will result in SELinux denying the access. Similarly, if one virtual machine is trying to connect to or attack another virtual machine, the MCS protections will once again prevent these actions from happening.
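The category check behind this isolation can be illustrated with a small model. The Python sketch below makes simplifying assumptions: it only handles single-level contexts such as system_u:system_r:svirt_t:s0:c10,c42, it treats access as allowed when the guest’s category set covers the image’s category set (a dominance check), and it ignores type enforcement entirely. The contexts and helper names are hypothetical.

```python
def categories(context: str) -> frozenset[int]:
    """Extract the MCS category numbers from an SELinux context string."""
    # Expected layout: user:role:type:sensitivity[:categories]
    fields = context.split(":")
    if len(fields) < 5:
        return frozenset()  # no categories assigned
    cats = set()
    for part in fields[4].split(","):
        first, _, last = part.partition(".")  # "c10.c12" denotes a range
        low = int(first[1:])
        high = int(last[1:]) if last else low
        cats.update(range(low, high + 1))
    return frozenset(cats)


def mcs_allows(subject: str, target: str) -> bool:
    """Simplified model: the subject's categories must cover the target's."""
    return categories(subject) >= categories(target)


# Hypothetical labels for one guest and two images.
guest_a = "system_u:system_r:svirt_t:s0:c10,c42"
image_a = "system_u:object_r:svirt_image_t:s0:c10,c42"
image_b = "system_u:object_r:svirt_image_t:s0:c10,c57"

print(mcs_allows(guest_a, image_a))  # True
print(mcs_allows(guest_a, image_b))  # False
```

Note that guest_a and image_b share the c10 category, yet access is still denied in this model because c57 is missing from the guest’s set.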

sVirt selects two categories to allow a large number of guests to run even when there are only a few categories available. Assume that the hypervisor is running with the c10.c99 category range. That means that the hypervisor can only select from 90 categories. If each guest only receives a single category, then the hypervisor can support 90 guests before some guests have to share a category and could therefore interact with each other (assuming a malicious actor found a vulnerability that allows that; the hypervisor software will generally disallow such accesses as well). With two categories, however, the number of supported simultaneously running guests becomes 4,005 (the number of unique pairs in a set of 90, obtained through the formula n*(n-1)/2, here 90*89/2).
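The arithmetic can be checked with a few lines of Python. This sketch uses the c10.c99 range from the example; the pick_pair helper is a hypothetical illustration of choosing an unused random pair, not a description of how libvirt implements the selection.

```python
import random
from itertools import combinations
from math import comb

# Categories c10 through c99, the example range used in the text.
available = range(10, 100)

n = len(available)
print(n)           # 90
print(comb(n, 2))  # 4005

# Enumerating every pair gives the same count as n * (n - 1) / 2.
pairs = list(combinations(available, 2))
assert len(pairs) == n * (n - 1) // 2 == comb(n, 2)


def pick_pair(in_use: set, rng=random) -> tuple:
    """Pick a random category pair that no running guest uses yet."""
    if len(in_use) >= comb(n, 2):
        raise RuntimeError("all category pairs are in use")
    while True:
        pair = tuple(sorted(rng.sample(available, 2)))
        if pair not in in_use:
            return pair


in_use = {(10, 11)}
new_pair = pick_pair(in_use)
print(new_pair not in in_use)  # True
```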

Let’s see what libvirt’s SELinux support looks like.
