We introduced systemd as an SELinux-aware application suite, capable of launching different services with configurable SELinux contexts. Besides service support, systemd has quite a few other features up its sleeve. One of these features is systemd-nspawn. With systemd-nspawn, systemd provides container capabilities, allowing administrators to interact with systemd-managed containers in an integrated way, almost as if these containers were services themselves. It uses the same primitives as Docker and as LXC from the Linux Containers project (which was the predecessor of the modern container frameworks), and is based upon namespaces (hence the name nspawn).
The Linux Containers project has a product called LXC that combines several isolation and resource management services within the Linux kernel, such as control groups (cgroups) and namespace isolation. cgroups allow for capping or throttling the consumption of CPU, memory, and I/O resources, whereas namespaces allow for hiding information and limiting the view on system resources. Early versions of Docker were built upon LXC, although Docker has since moved to using these kernel services directly, without LXC.
SELinux-wise, the software running inside the container might not have a correct view on the SELinux state (depending on the container configuration) as the container is isolated from the host itself. SELinux does not yet have namespace support to allow containers or other isolated processes to have their own SELinux view, so if a container has a view on the SELinux state, it should never be allowed to modify it.
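One way to inspect which view of the SELinux state a process has is to look at how the selinuxfs pseudo-filesystem is mounted. The following is a minimal sketch, assuming a mounts table in the /proc/mounts format; the function name selinuxfs_mode is hypothetical and not part of any SELinux tool:

```shell
# Hypothetical helper: report how selinuxfs is mounted ("rw", "ro", or "absent").
# $1 is a mounts table in /proc/mounts format (defaults to /proc/mounts).
selinuxfs_mode() {
    awk '
        $3 == "selinuxfs" {
            split($4, opts, ",")   # the first mount option is rw or ro
            print opts[1]
            found = 1
            exit
        }
        END { if (!found) print "absent" }
    ' "${1:-/proc/mounts}"
}

# Example, to be run inside the container:
# selinuxfs_mode
```

A result of ro or absent suggests that processes with this view cannot change the SELinux state through that mount, which is the situation we want for a container.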
Let’s see how systemd-nspawn works and what its SELinux support looks like.
Initializing a systemd container
To create a systemd container, we need to create a place on the filesystem where its files will be stored, and then call systemd-nspawn with the correct arguments. To prepare the filesystem, we can download prebuilt container images, or create one ourselves. Let’s use the Jailkit software, and build a container from it:
- First, create the directory the container runtimes will be hosted in:
# mkdir /srv/ctr
- Edit the /etc/jailkit/jk_init.ini file and include the following section:
[nginx]
comment = nginx runtime
paths = /usr/sbin/nginx, /etc/nginx, /var/log/nginx, /var/lib/nginx, /usr/share/nginx, /usr/lib64/nginx, /usr/lib64/perl5/vendor_perl
users = root,nginx
groups = root,nginx
includesections = netbasics, uidbasics, perl
This section tells Jailkit what it should copy into the directory, and which users to support.
- Execute the jk_init command to populate the directory:
# jk_init -v -j /srv/ctr/nginx nginx
- Finally, start the container using systemd-nspawn:
# systemd-nspawn -D /srv/ctr/nginx /usr/sbin/nginx \
    -g "daemon off;"
Because nginx attempts to run as a daemon by default, the container would stop immediately, as it would no longer have an active foreground process. By launching with the daemon off option, nginx remains in the foreground and the container keeps running.
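The preparation steps above rely on the paths listed in the [nginx] section being present on the host, as Jailkit copies them from there. As a rough pre-flight check before running jk_init, the following sketch prints any listed path that does not exist; the helper name missing_paths is hypothetical:

```shell
# Hypothetical helper: print each given path that does not exist on the host.
missing_paths() {
    for p in "$@"; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}

# Check the paths from the [nginx] section; no output means all are present.
missing_paths /usr/sbin/nginx /etc/nginx /var/log/nginx /var/lib/nginx \
    /usr/share/nginx /usr/lib64/nginx /usr/lib64/perl5/vendor_perl
```

Any path that is printed is probably either not installed or located elsewhere on the distribution in use, in which case the paths setting needs adjusting.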
Using a specific SELinux context
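The systemd-nspawn command has two SELinux-related options:
- --selinux-context (-Z for short) allows the administrator to define the SELinux context for the runtime processes of the container.
- --selinux-apifs-context (-L for short) allows the administrator to define the SELinux context used to label the files in the virtual API filesystems of the container.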
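Both options expect a full SELinux context rather than just a type. The sketch below checks that a value has the user:role:type:level shape before it is passed to systemd-nspawn, assuming a policy where the level field is present (as in the examples in this section). The helper names is_context and context_type are hypothetical, and the check only looks at the shape of the string, not at whether the loaded policy accepts it:

```shell
# Hypothetical helper: succeed if $1 looks like user:role:type:level.
# Only the shape is checked; the loaded policy is not consulted.
is_context() {
    printf '%s\n' "$1" | grep -Eq '^[^:]+:[^:]+:[^:]+:[^:].*$'
}

# Hypothetical helper: print the type (third field) of a context.
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Example:
# is_context system_u:system_r:container_t:s0 && echo "shape looks fine"
```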
The SELinux types that can be used here, however, need to be carefully selected. The processes running inside a container cannot perform any type transitions, so regular SELinux domains are often not feasible to use. Taking our Nginx example again, the httpd_t domain cannot be used for this container.
We can use the SELinux types that the distribution provides for container workloads. Recent CentOS versions will use a domain such as container_t (which was previously known as svirt_lxc_net_t) and a file-oriented SELinux type, container_file_t. While this domain does not hold all possible privileges needed for any container, it provides a good baseline for containers.
Let’s use this type for our container:
- First, we need to extend the container_t privileges with some additional rights for the nginx daemon. Create a CIL policy file with the following content:
(typeattributeset cil_gen_require container_t)
(typeattributeset cil_gen_require container_file_t)
(typeattributeset cil_gen_require http_port_t)
(typeattributeset cil_gen_require node_t)
(allow container_t container_file_t (chr_file (read open getattr ioctl write)))
(allow container_t self (tcp_socket (create setopt bind listen accept read write)))
(allow container_t http_port_t (tcp_socket (name_bind)))
(allow container_t node_t (tcp_socket (node_bind)))
(allow container_t self (capability (net_bind_service setgid setuid)))
- Load this policy file as an SELinux module:
# semodule -i custom_container.cil
- Relabel the files of the container with the container_file_t type:
# chcon -R -t container_file_t /srv/ctr/nginx
- Launch the container with the appropriate labels:
# systemd-nspawn -D /srv/ctr/nginx \
    -Z system_u:system_r:container_t:s0 \
    -L system_u:object_r:container_file_t:s0 \
    /usr/sbin/nginx -g "daemon off;"
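Once the container is running, it is worth confirming on the host that the nginx processes really carry the intended domain. The following sketch filters the output of ps -eZ, assuming the usual column layout with the label first and the command name last; the helper name proc_types is hypothetical:

```shell
# Hypothetical helper: read `ps -eZ` style output on stdin and print the
# SELinux type of every process whose command name equals $1.
proc_types() {
    awk -v name="$1" '$NF == name { split($1, ctx, ":"); print ctx[3] }'
}

# Usage on the host while the container is running:
# ps -eZ | proc_types nginx
```

If the launch worked as intended, we would expect container_t to be printed for the nginx processes rather than a regular domain such as httpd_t.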
Whenever a container is launched, it remains attached to the current session. We can of course create service files that launch the containers in the background, or use session management services such as tmux. A more user-friendly approach, however, is to use machinectl.
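For the service file route mentioned above, a minimal sketch could look as follows. It prints a unit that runs the labeled nginx container from the previous steps. The helper name nspawn_unit, the unit name nginx-container.service, and the /usr/bin/systemd-nspawn location are assumptions; adjust them to the system at hand:

```shell
# Hypothetical helper: print a minimal service unit that runs the nginx
# container from this section in the background.
# Assumption: systemd-nspawn is installed at /usr/bin/systemd-nspawn.
nspawn_unit() {
    cat <<'EOF'
[Unit]
Description=nginx in a systemd-nspawn container

[Service]
ExecStart=/usr/bin/systemd-nspawn --quiet -D /srv/ctr/nginx \
    -Z system_u:system_r:container_t:s0 \
    -L system_u:object_r:container_file_t:s0 \
    /usr/sbin/nginx -g "daemon off;"

[Install]
WantedBy=multi-user.target
EOF
}

# Example (as root; the unit name is an assumption):
# nspawn_unit > /etc/systemd/system/nginx-container.service
# systemctl daemon-reload && systemctl start nginx-container.service
```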
Facilitating container management with machinectl
- First, download a ready-to-go container image with the pull-tar argument and prepare it on the system:
# machinectl pull-tar https://nspawn.org/storage/archlinux/archlinux/tar/image.tar.xz archlinux
We can also download the archive manually, and then import it using the import-tar argument:
# machinectl import-tar archlinux.tar.xz
- List the available images with the list-images argument:
# machinectl list-images
- We can now clone this image and launch the container:
# machinectl clone archlinux test
# machinectl start test
- To access the container environment, use the shell argument:
# machinectl shell test
- We can shut down the container using the poweroff argument:
# machinectl poweroff test
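The clone and start steps above lend themselves to a small wrapper. The sketch below is one possible way to combine them, with a dry-run mode that prints the machinectl commands instead of executing them. The names run, ctr_up, and DRY_RUN are hypothetical; only the machinectl subcommands come from the steps above:

```shell
# Hypothetical wrapper: print the command when DRY_RUN is set, run it otherwise.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: clone image $1 into a container named $2 and start it.
ctr_up() {
    run machinectl clone "$1" "$2" && run machinectl start "$2"
}

# Example: preview the commands without touching the system.
# DRY_RUN=1 ctr_up archlinux test
```

Previewing with DRY_RUN first makes it easier to verify the image and container names before anything is cloned.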