Virtualization is a core concept that plays a part in many infrastructural service designs. Ever since its inception in the early 1970s as a means of isolating workloads and abstracting hardware dependencies, virtualization implementations have grown tremendously. When we look at infrastructure service offerings today, we quickly realize that many cloud providers would be out of service if they could not rely on the benefits and virtues of virtualization.
One of the properties that virtualization offers is isolation, which SELinux can support and augment quite nicely.
When we look at virtualization, we look at the abstraction layers it provides to hide certain resource views (such as hardware or processing power). Virtualization contributes to the development of more efficient hardware usage (which results in better cost control), centralized views on resources and systems, more flexibility in the number of operating systems that the company can deal with, standardization of resource allocation, and even improved security services.
There are several virtualization types around:
- Full-system emulation: Where hardware is completely emulated through software. QEMU is an open source emulation software capable of handling full-system emulation, allowing administrators and developers to run virtual platforms with different processor architectures not otherwise compatible with their own systems.
- Native virtualization: Where main parts of the hardware are shared across instances, and guests can run unmodified on them. Linux’s KVM, which is also supported through QEMU, is an example of this type of virtualization.
- Paravirtualization: Where the guest operating system uses specific APIs offered by the virtualization layer (on which unmodified operating systems cannot be hosted). Initial releases of Xen only supported paravirtualization. Using KVM with VirtIO drivers is another, more modular example.
- OS-level virtualization or containerization: Where the guest uses the host operating system (kernel) but does not see the processes and other resources running on the host. Docker containers or LXC containers are examples of OS-level virtualization.
- Application virtualization: Where the application runs under a specialized software runtime. A popular example here is the support for Java applications, running on the Java Virtual Machine (JVM).
A few terms come up repeatedly when working with virtualization:
- The host is the (native) operating system or server on which the virtualization software is running.
- The guest is the virtualized service (generally an operating system or container) that runs on the host.
- The hypervisor is the specialized virtualization software that manages the hardware abstraction and resource-sharing capabilities of the virtualization platform. It is responsible for creating and running the virtual machines.
- An image is a file or set of files that represents the filesystem, disk, or other medium assigned to a guest.
- A virtual machine is the abstracted hardware or resource set in which the guest runs.
Before we embark on configuring and tuning virtualization services, let’s first see what SELinux has to offer for virtualized environments.
Reviewing the risks of virtualization
Virtualization does come with a number of risks. Ask architects or other risk-conscious people about them, and they will mention virtual machine sprawl, challenges around securing management APIs, the higher complexity of virtualized services, and more.
Going over the challenges of virtualization itself is beyond the scope of this chapter, but there are a few notable risks that play directly into SELinux’s field of interest. If we can integrate SELinux with a virtualization layer, then we can mitigate these risks more proactively:
- The first risk is data sensitivity within a virtual machine. Whenever multiple virtual machines are hosted together, there is a risk that one guest can access sensitive data on another virtual machine, be it through a flaw in the virtualization software, the hypervisor’s networking capabilities, or side-channel attacks.
With SELinux, data sensitivity can be controlled using sensitivity ranges. Guests can run with different sensitivity ranges, so that data sensitivity is enforced even at the virtualization layer.
- Another risk is the security of offline guest images. Here, either administrators or misconfigured virtual machines might gain access to another guest image. SELinux can prevent this by labeling guest images properly and by ensuring that images of offline virtual machines are typed differently from those of online virtual machines.
- Virtual machines can also exhaust the resources on a system. On Linux systems, many resources can be controlled through the control groups (cgroups) subsystem. As this subsystem is governed through system calls and regular file APIs, SELinux can be used to further control access to this facility, ensuring that the cgroups maintained by libvirt, for instance, remain solely under the control of libvirt.
- Break-out attacks, where vulnerabilities within the hypervisor are exploited to try to reach the host operating system, can be mitigated through SELinux’s type enforcement as even a hypervisor does not require full administrative access to everything on the host.
- SELinux can also be used to authorize access to the hypervisor, ensuring that only the right teams (through the role-based access controls) are able to control the hypervisor and its definitions.
- Finally, SELinux also offers improved guest isolation, which goes beyond just the guest image accesses. Thanks to SELinux’s MCS implementation, guests can be separated from each other in a mandatory approach. With type enforcement, the allowed behavior of guests can be defined and controlled. This is a key capability used by hosting providers as they allow running (for them) untrusted guest virtual machines.
SELinux, however, is not a full security solution for virtualization providers. One main design constraint with SELinux is that it is not dynamic if the system itself is not SELinux-aware. When we assign a type to a virtual machine, this type is generally rigid and set in stone. Virtual machines will have different behavior characteristics depending on the software running on them.
A virtual machine running a web server has different behavior characteristics than one running a database or an email gateway. Although SELinux policy administrators would be capable of creating new domains for each virtual machine, this is not efficient. As a result, most SELinux policies will only offer a few domains usable by the virtual machine with broad characteristics.
With libvirt, these domains are part of the sVirt solution.
Reusing existing virtualization domains
When Red Hat introduced its virtualization solution, it also added SELinux support, calling the resulting technology sVirt, derived from secure virtualization. As secure virtualization as a term is hardly unique in the market, we use the term sVirt predominantly to refer to the SELinux integration within virtualization management solutions such as libvirt.
With sVirt, the open source community has a reusable approach for augmenting the security posture of virtualization and containerization through SELinux. It does this through the following domains and types, which can be used regardless of the underlying virtualization platform:
- The hypervisor software itself, such as libvirtd, runs in its own dedicated domain.
- Guests (virtual machines) that do not require any interaction with the host system and resources beyond those associated with a generic virtual machine generally use the svirt_t domain. This domain is the most isolated guest domain for full virtualization solutions.
- Guests that require more interaction with the host, such as using the QEMU networking capabilities and sharing services, will use the svirt_qemu_net_t domain.
- Guests that use the KVM networking capabilities and sharing services will use the svirt_kvm_net_t domain. It is very similar in permissions to svirt_qemu_net_t but optimized for KVM.
- Containerized guests will use the svirt_lxc_net_t domain, whose privileges are optimized for OS-level virtualization.
- Guests that require more flexible memory accesses (such as executing writable memory segments and memory stacks) will use the svirt_tcg_t domain. This flexible memory access is common for full virtualization guests whose emulation/virtualization requires the use of a Tiny Code Generator (TCG), hence the name.
- Image files that contain a guest’s data are labeled with a dedicated type while the guest uses them.
- Image files that are not in use at the moment carry the default image type.
- Image files used in a read-only fashion will have the virt_content_t type assigned to them.
To enable some flexibility in what the domains are allowed to do, additional SELinux booleans are put in effect, which we’ll cover next.
Fine-tuning virtualization-supporting SELinux policy
Use caution when toggling SELinux booleans to control the confinement of virtualization domains. Such booleans influence the SELinux policy on the host level, and cannot be used to change the access controls or privileges of individual guests. As such, when we change the value of an SELinux boolean, the change affects the permissions of all guests on that host.
Let’s see what the various SELinux booleans are for virtualized environments:
- The staff_use_svirt boolean, if enabled, allows the staff_t user domain to interact with and manage virtual machines, as by default this is only allowed for unconfined users.
- The unprivuser_use_svirt boolean, if enabled, allows unprivileged user domains (such as user_t) to interact with and manage virtual machines.
- With the virt_rw_qemu_ga_data boolean and its read-only counterpart, the QEMU guest agent (which is an optional agent running inside the guests, facilitating operations such as freezing filesystems during backup routines) can manage or just read data labeled with the virt_qemu_ga_data_t type. This type, however, is not in use by default, and these SELinux booleans are disabled by default.
- The virt_sandbox_share_apache_content boolean allows the guest domains to share web content. This is most commonly used for containers but is possible on guests as well if the hypervisor supports mapping host filesystems into the guest.
- With virt_sandbox_use_audit enabled, the guest domains can send audit messages to the host’s audit service.
- The virt_sandbox_use_fusefs boolean grants the guest domains the privilege to mount and interact with Filesystem in Userspace (FUSE) filesystems. The virt_use_fusefs boolean allows the guests to read files on these filesystems.
- If the virt_sandbox_use_netlink boolean is active, then guest domains can use Netlink system calls to manipulate the networking stack within the host.
- With virt_transition_userdomain, containers can transition to a user domain (including the unconfined user domain).
- When we enable virt_use_execmem, guests can use executable memory.
- The virt_use_samba boolean and its GlusterFS and NFS counterparts allow guests to use network filesystems mounted on the host, offered through Samba, GlusterFS, and NFS respectively. Note that this does not involve mounts inside the guest itself, such as a guest that connects to an NFS server. The booleans handle interaction through filesystem mounts on the host.
- Device access is also governed through some SELinux booleans, such as the virt_use_comm boolean to interact with serial and parallel communication ports, virt_use_pcscd to allow guests to access smartcards, and virt_use_usb to grant access to USB devices.
- The virt_use_rawip boolean allows guests to use and interact with raw IP sockets, allowing network interaction that circumvents some of the processing logic within the regular network stack.
- With virt_use_sanlock, guests can interact with the sanlock service, a lock manager for shared storage.
- If virt_use_xserver is set to true, guests can use the X server on the host.
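To review how these booleans are currently set on a host, we can filter the output of getsebool -a for the virtualization-related entries. The following is a minimal sketch, assuming the usual "name --> on" or "name --> off" line format that getsebool prints. The helper names parse_virt_booleans and read_live_booleans, as well as the sample text, are illustrative only, and the sample values do not represent the defaults of any particular system.

```python
# Name prefixes of the booleans discussed in the list above.
VIRT_PREFIXES = ("virt_", "staff_use_svirt", "unprivuser_use_svirt")


def parse_virt_booleans(getsebool_output: str) -> dict:
    """Turn `getsebool -a` style lines ("name --> on") into {name: bool},
    keeping only the virtualization-related booleans."""
    result = {}
    for line in getsebool_output.splitlines():
        name, sep, state = line.partition(" --> ")
        if not sep:
            continue  # line does not match the expected format
        name = name.strip()
        if name.startswith(VIRT_PREFIXES):
            result[name] = state.strip() == "on"
    return result


def read_live_booleans() -> dict:
    """Query the running system; only works on an SELinux-enabled host."""
    import subprocess

    out = subprocess.run(
        ["getsebool", "-a"], capture_output=True, text=True, check=True
    ).stdout
    return parse_virt_booleans(out)


# Hypothetical sample input, not captured from a real host.
SAMPLE = """\
httpd_can_network_connect --> off
staff_use_svirt --> off
virt_use_nfs --> on
virt_use_usb --> on
"""

if __name__ == "__main__":
    for name, enabled in sorted(parse_virt_booleans(SAMPLE).items()):
        print(f"{name}: {'on' if enabled else 'off'}")
```

Changing a value is done with setsebool (adding -P to make the change persistent). As mentioned earlier, such a change applies to every guest on the host, which is why reviewing the current state first is worthwhile.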
If security-sensitive operations need to be allowed for a single guest or a small set of guests, it is advisable to run those guests on an isolated host where these operations are then allowed while running the other guests on hosts where the policy does not allow these particular actions.
Administrators can also use different SELinux domains for specific guests, fine-tuning the access controls for an individual virtual machine. How we can assign specific domains depends on the underlying technology of course. In the Enhancing libvirt with SELinux support section, we will introduce this for libvirt-based virtualization.
Understanding sVirt’s use of MCS
The SELinux domains and the mentioned types are not enough to implement proper confinement and isolation between guests. sVirt adds another layer of security by using SELinux’s Multi-Category Security (MCS) extensively.
Within SELinux, some domains are marked as MCS-constrained types. When this is the case, the domain will not be able to access resources that do not have the same set of categories (or more) assigned as the current context, as it cannot extend its own active category set.
The sVirt implementation ensures that the virtualization domains mentioned earlier are all marked as MCS-constrained types. This can be confirmed by asking the system which types have the mcs_constrained_type attribute set:
    # seinfo -amcs_constrained_type -x

    Type Attributes: 1
       attribute mcs_constrained_type
            container_t
            netlabel_peer_t
            openshift_app_t
            openshift_t
            sandbox_min_t
            sandbox_net_t
            sandbox_t
            sandbox_web_t
            sandbox_x_t
            svirt_kvm_net_t
            svirt_qemu_net_t
            svirt_t
            svirt_tcg_t
Through the MCS constraints, sVirt enables proper isolation between guests. Every running virtual machine (generally running as svirt_t) will be assigned two (random) SELinux categories. The images that the virtual machine needs to use are assigned the same two SELinux categories.
Whenever a virtual machine wants to access the wrong image, the difference in MCS categories will result in SELinux denying the access. Similarly, if one virtual machine is trying to connect to or attack another virtual machine, the MCS protections will once again prevent these actions from happening.
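The effect of matching and mismatching categories can be illustrated with a small model. This is a deliberately simplified sketch and not the policy's actual constraint logic: it assumes access is allowed only when every category on the resource is also held by the process. The function names and the example levels are hypothetical.

```python
def parse_categories(level: str) -> frozenset:
    """Return the category numbers in a level such as "s0:c12,c34".

    Ranges written as "c10.c99" expand to every category in between.
    """
    _, _, cats = level.partition(":")
    found = set()
    for part in filter(None, cats.split(",")):
        first, _, last = part.partition(".")
        low = int(first.lstrip("c"))
        high = int(last.lstrip("c")) if last else low
        found.update(range(low, high + 1))
    return frozenset(found)


def level_of(context: str) -> str:
    """Extract the level from "user:role:type:level"; the level itself
    may contain a colon, so split at most three times."""
    return context.split(":", 3)[3]


def mcs_allows(process_level: str, resource_level: str) -> bool:
    """Simplified MCS check: the resource's categories must all be
    present in the process's category set."""
    return parse_categories(resource_level) <= parse_categories(process_level)


# Hypothetical guest process context and image levels.
guest_a = level_of("system_u:system_r:svirt_t:s0:c12,c34")
image_a = "s0:c12,c34"  # image assigned to guest A
image_b = "s0:c7,c81"   # image assigned to another guest

if __name__ == "__main__":
    print(mcs_allows(guest_a, image_a))  # guest A and its own image
    print(mcs_allows(guest_a, image_b))  # guest A and a foreign image
```

In this model, the guest reaches its own image because both carry the same pair, while the foreign image is refused because its categories are not part of the guest's set.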
sVirt selects two categories to allow a large number of guests to run even when there are only a few categories available. Assume that the hypervisor is running with the c10.c99 category range. That means that the hypervisor can only select from 90 categories. If each guest received only a single category, then the hypervisor could support 90 guests before multiple guests would have to share a category and could interact with each other (assuming a malicious actor found a vulnerability that allows this; the hypervisor software will generally disallow such accesses as well). With two categories, however, the number of supported simultaneously running guests becomes 4,005 (the number of unique pairs in a set of 90, obtained through the formula n*(n-1)/2, here 90*89/2).
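The arithmetic behind these numbers can be checked with a few lines of Python. The pair selection shown here is only a sketch under the assumption that a guest receives a random pair not yet used by another guest. It is not libvirt's actual allocation code, and the helper names are made up for illustration.

```python
import math
import random
from itertools import combinations


def category_pool(first: int, last: int) -> range:
    """All category numbers in an inclusive range such as c10.c99."""
    return range(first, last + 1)


def max_guests(pool_size: int, per_guest: int = 2) -> int:
    """Number of distinct category sets of the given size."""
    return math.comb(pool_size, per_guest)


def pick_pair(pool, in_use: set, rng=random) -> tuple:
    """Pick a random category pair that no other guest uses yet,
    and record it in in_use."""
    free = [pair for pair in combinations(pool, 2) if pair not in in_use]
    if not free:
        raise RuntimeError("no unused category pair left")
    pair = rng.choice(free)
    in_use.add(pair)
    return pair


if __name__ == "__main__":
    pool = category_pool(10, 99)
    print(len(pool), max_guests(len(pool), 1), max_guests(len(pool)))
    assigned = set()
    first, second = pick_pair(pool, assigned)
    print(f"s0:c{first},c{second}")
```

Running the script prints the pool size alongside the single-category and two-category guest limits, followed by one randomly chosen level in the usual s0:cX,cY notation.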
Let’s see what libvirt’s SELinux support looks like.