Linux firewalling and SECMARK support

June 13, 2021

The approach with TCP, UDP, and SCTP ports has a few downsides. One of them is that SELinux has no knowledge of the target host, so cannot reason about its security properties. This method also offers no way of limiting daemons from binding on any interface: in a multi-homed situation, we might want to make sure that a daemon only binds on the interface facing the internal network and not the internet-facing one, or vice versa.

In the past, SELinux addressed this binding issue through interface and node labels: a domain could be configured to bind only to one interface and not to any other, or even only to a specific address (referred to as the node). This support had its flaws though, and has been largely deprecated in favor of SECMARK filtering.

Before explaining SECMARK and how administrators can control it, let’s first take a quick look at Linux’s netfilter subsystem, the de facto standard for local firewall capabilities on Linux systems.

Introducing netfilter

Like LSM, the Linux netfilter subsystem provides hooks in various stages of its networking stack processing framework, which can then be implemented by one or more modules. For instance, ip_tables (which uses the iptables command as its control application) is one of those modules, while ip6_tables and ebtables are other examples of netfilter modules. Modules implementing a netfilter hook must inform the netfilter framework of their priority on that hook. This enables controllable ordering in the execution of modules (as multiple modules can, and will, register on the same hook).

The ip_tables framework is the one we will be looking at in more detail because it supports the SECMARK approach. This framework is commonly referred to as just iptables, which is the name of its control application. We will be using this term for the remainder of this book.

iptables offers several tables, functionally-oriented classifications for network processing. The common ones are as follows:

  • The filter table enables the standard network-filtering capabilities.
  • The nat table is intended to modify routing-related information from packets, such as the source and/or destination address.
  • The mangle table is used to modify most of a packet’s fields.
  • The raw table is enabled when administrators want to opt out certain packets/flows from the connection-tracking capabilities of netfilter.
  • The security table is offered to allow administrators to label packets once regular processing is complete.

Within each table, iptables offers a default set of chains. These default chains specify where in the processing flow (and thus which hook in the netfilter framework) rules are to be processed. Each chain has a default policy – the default return value if none of the rules in a chain match. Within the chain, administrators can add several rules to process sequentially. When a rule matches, the configured action applies. This action can allow the packet to continue through this hook in the netfilter framework, deny it, or perform additional processing.

Commonly provided chains (not all chains are offered for all tables) include the following:

  • The PREROUTING chain, which is the first packet-processing step once a packet is received
  • The INPUT chain, which is for processing packets meant for the local system
  • The FORWARD chain, which is for processing packets meant to be forwarded to another remote system
  • The OUTPUT chain, which is for processing packets originating from the local system
  • The POSTROUTING chain, which is the last packet-processing step before a packet is sent

Put very simply, tables and chains map onto the netfilter framework as follows: each chain corresponds to one of the hooks provided by the netfilter framework, whereas the table determines the priority, telling netfilter which chain implementations to execute first on that hook.
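The ordering can be made concrete with the priority values the kernel assigns to each table. The following is a minimal sketch: the values are those of enum nf_ip_hook_priorities in the kernel's netfilter_ipv4.h header, the table_order function name is purely illustrative, and the nat table is left out because it registers with different priorities for destination and source NAT. It prints the order in which the tables are consulted on the OUTPUT hook, the one hook all four of them attach to:

```shell
#!/bin/sh
# Print iptables tables in the order netfilter calls them on the OUTPUT hook.
# Lower priority values run first (values from enum nf_ip_hook_priorities).
table_order() {
    sort -n <<'EOF' | awk '{ print $2 }'
0 filter
-150 mangle
50 security
-300 raw
EOF
}

table_order
```

The result (raw, mangle, filter, security) shows why rules in the security table only see packets that the filter table has already let through.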

Implementing security markings

With packet labeling, we can use the filtering capabilities of iptables (and ip6tables) to assign labels to packets and connections. The idea is that the local firewall tags packets and connections and then the kernel uses SELinux to grant (or deny) application domains the right to use those tagged packets and connections.

This packet labeling is known as SECurity MARKings (SECMARK). Although we use the term SECMARK, the framework consists of two markings: one for packets (SECMARK) and one for connections, that is, CONNection SECurity MARKings (CONNSECMARK). The SECMARK capabilities are offered through two tables, mangle and security. Only these tables currently have the action of tagging packets and connections available in their rule set:

  • The mangle table has a higher execution priority than most other tables. Implementing SECMARK rules on this level is generally done when all packets need to be labeled, even when many of these packets will eventually be dropped.
  • The security table is next in execution priority after the filter table. This allows the regular firewall rules to be executed first, and only tag those packets allowed by the regular firewall. Using the security table allows the filter table to implement the discretionary access control rules first and have SELinux execute its mandatory access control logic only if the DAC rules are executed successfully.

Once a SECMARK action triggers, it will assign a packet type to the packet or communication. SELinux policy rules will then validate whether a domain is allowed to receive (recv) or send packets of a given type. For instance, the Firefox application (running in the mozilla_t domain) will be allowed to send and receive HTTP client packets:

allow mozilla_t http_client_packet_t : packet { send recv };

Another supported permission set for SECMARK-related packets is forward_in and forward_out. These permissions are checked when using forwarding in netfilter.

One important thing to be aware of is that once a SECMARK action is defined, then all the packets that eventually reach the operating system’s applications will have a label associated with them — even if no SECMARK rule exists for the packet or connection that the kernel is inspecting. If that occurs, then the kernel applies the default unlabeled_t label. The default SELinux policy implemented in some distributions (such as CentOS) allows all domains to send and receive unlabeled_t packets, but this is not true for all Linux distributions.

Assigning labels to packets

When no SECMARK-related rules are loaded in the netfilter subsystem, then SECMARK is not enabled and none of the SELinux rules related to SECMARK permissions are checked. The network packets are not labeled, so no enforcement can be applied to them. Of course, the regular socket-related access controls still apply — SECMARK is just an additional control measure.

Once a single SECMARK rule is active, SELinux starts enforcing the packet-label mechanism on all packets. This means that all the network packets now need a label on them (as SELinux can only deal with labeled resources). The default label (the initial security context) for packets is unlabeled_t, which means that no marking rule matches this network packet.

Because SECMARK rules are now enforced, SELinux checks all domains that interact with network packets to see whether they are authorized to send or receive these packets. To simplify management, some distributions enable send and receive rights against the unlabeled_t packets for all domains. Without these rules, all network services would stop functioning properly the moment a single SECMARK rule becomes active.
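Before activating the first SECMARK rule, it is worth checking whether the loaded policy grants domains access to unlabeled_t packets. The sketch below assumes the setools package (which provides sesearch) is installed; the packet_query helper is a hypothetical name used only for illustration. It prints the query and runs it only when sesearch is available:

```shell
#!/bin/sh
# Build a sesearch query that lists allow rules mentioning send or recv
# on the packet class for a given packet type.
packet_query() {
    printf 'sesearch -A -t %s -c packet -p send,recv\n' "$1"
}

query=$(packet_query unlabeled_t)
echo "$query"

# Run the query only when setools (which ships sesearch) is installed.
if command -v sesearch >/dev/null 2>&1; then
    $query || echo "sesearch could not read the loaded policy" >&2
fi
```

If the query returns no rules for the domains of your network services, those services would likely lose network access as soon as SECMARK enforcement starts.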

To assign a label to a packet, we need to define a set of rules that match a particular network flow, and then call the SECMARK logic (to tag the packet or communication with a label). Most rule sets will also pair this with an ACCEPT rule, to allow this particular communication to reach the system.

Let’s implement two rules:

  • The first is to allow communication toward websites (port 80) and tag the related network packets with the http_client_packet_t type (so that web browsers are allowed to send and receive these packets).
  • The second is to allow communication toward the locally running web server (port 80 as well) and tag its related network packets with the http_server_packet_t type (so that web servers are allowed to send and receive these packets).

For each rule set, we also enable connection tracking so that related packets are automatically labeled correctly and passed.

Use the following commands for the web server traffic:

# iptables -t filter -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# iptables -t filter -A INPUT -p tcp --dport 80 -j ACCEPT
# iptables -t security -A INPUT -p tcp --dport 80 -j SECMARK --selctx "system_u:object_r:http_server_packet_t:s0"
# iptables -t security -A INPUT -p tcp --dport 80 -j CONNSECMARK --save

Use these commands for the browser traffic:

# iptables -t filter -A OUTPUT -m conntrack --ctstate ESTABLISHED -j ACCEPT
# iptables -t filter -A OUTPUT -p tcp --dport 80 -j ACCEPT
# iptables -t security -A OUTPUT -p tcp --dport 80 -j SECMARK --selctx "system_u:object_r:http_client_packet_t:s0"
# iptables -t security -A OUTPUT -p tcp --dport 80 -j CONNSECMARK --save

Finally, to copy connection labels to the established and related packets, use the following commands:

# iptables -t security -A INPUT -m state --state ESTABLISHED,RELATED -j CONNSECMARK --restore
# iptables -t security -A OUTPUT -m state --state ESTABLISHED,RELATED -j CONNSECMARK --restore
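The SECMARK and CONNSECMARK pair shown above can be generalized to other ports. The following is a minimal sketch, assuming a hypothetical helper named secmark_rules; it only prints the commands, so they can be reviewed before being run as root:

```shell
#!/bin/sh
# Print (not run) the SECMARK and CONNSECMARK rules for one TCP port.
#   $1: chain (INPUT or OUTPUT)   $2: destination port   $3: packet type
secmark_rules() {
    chain=$1 port=$2 ptype=$3
    printf 'iptables -t security -A %s -p tcp --dport %s -j SECMARK --selctx "system_u:object_r:%s:s0"\n' "$chain" "$port" "$ptype"
    printf 'iptables -t security -A %s -p tcp --dport %s -j CONNSECMARK --save\n' "$chain" "$port"
}

# Example: label outgoing HTTPS traffic with the HTTP client packet type.
secmark_rules OUTPUT 443 http_client_packet_t
```

Printing instead of executing keeps the sketch safe to run anywhere. On a real system the output could be reviewed and then piped to a root shell.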

Even this simple example shows that writing firewall rules is an art in itself, and that SECMARK labeling is just a small part of it. However, using SECMARK rules makes it possible to allow certain traffic while still ensuring that only well-defined domains can interact with that traffic. For instance, it can be implemented on kiosk systems to allow only one browser to communicate with the internet while all other browsers and commands cannot. To do so, tag all browsing-related traffic with a specific label, and grant only that browser domain the send and recv permissions on that label.

Transitioning to nftables

While iptables is still one of the most widely used firewall technologies on Linux, two other contenders (nftables and bpfilter) are rising rapidly in terms of popularity. The first of these, nftables, has a few operational benefits over iptables, while retaining focus on the netfilter support in the Linux kernel:

  1. The code base for nftables and its Linux kernel support is much more streamlined.
  2. Error reporting is much better.
  3. Filtering rules can be incrementally changed rather than requiring a full reload of all rules.

The nftables framework has recently received support for SECMARK, so let’s see how to apply the http_server_packet_t and http_client_packet_t labels to the appropriate traffic.

The most common approach for applying somewhat larger nftables rules is to use a configuration file with the nft interpreter set:

#!/usr/sbin/nft -f
flush ruleset
table inet filter {
  secmark http_server {
    "system_u:object_r:http_server_packet_t:s0"
  }
  secmark http_client {
    "system_u:object_r:http_client_packet_t:s0"
  }
  map secmapping_in {
    type inet_service : secmark
    elements = { 80 : "http_server" }
  }
  map secmapping_out {
    type inet_service : secmark
    elements = { 80 : "http_client" }
  }
  chain input {
    type filter hook input priority 0;
    ct state new meta secmark set tcp dport map @secmapping_in
    ct state new ct secmark set meta secmark
    ct state established,related meta secmark set ct secmark
  }
  chain output {
    type filter hook output priority 0;
    ct state new meta secmark set tcp dport map @secmapping_out
    ct state new ct secmark set meta secmark
    ct state established,related meta secmark set ct secmark
  }
}

The syntax that nftables uses is recognizable when we compare it with iptables. The script starts with defining the SECMARK values. After that, we create a mapping between a port (80 in the example) and the value used for the SECMARK support. Of course, already established sessions also receive the appropriate SECMARK labeling.

If we define multiple entries, the elements variable uses commas to separate the various values:

elements = { 53 : "dns_client" , 80 : "http_client" , 443 : "http_client" }
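When the port list grows, such a line can be generated rather than typed by hand. The following is a minimal sketch, assuming a hypothetical nft_elements helper that takes port:name pairs as arguments:

```shell
#!/bin/sh
# Build an nftables "elements" line from port:name pairs.
nft_elements() {
    list=""
    for pair in "$@"; do
        port=${pair%%:*}      # text before the first colon
        name=${pair#*:}       # text after the first colon
        list="${list:+$list , }$port : \"$name\""
    done
    printf 'elements = { %s }\n' "$list"
}

nft_elements 53:dns_client 80:http_client 443:http_client
```

Every name used in the list (such as dns_client here) needs its own secmark definition in the same table. Once the configuration file is complete, nft -c -f followed by the file name parses it without applying any changes.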

Next to nftables, a second firewall solution that is gaining traction is eBPF, which we cover next.

Assessing eBPF

eBPF (and the bpfilter command) is completely different in nature compared to iptables and nftables, so let’s first see how eBPF functions before we cover the SELinux support details for it.

Understanding how eBPF works

The extended Berkeley Packet Filter (eBPF) is a framework that uses an in-kernel virtual machine that interprets and executes eBPF code, rather low-level instructions comparable to processor instruction set operations. Because of its very low-level, yet processor-agnostic language, it can be used to create very fast, highly optimized rules.

BPF was originally used for analyzing and filtering network traffic (for example, within tcpdump). Because of its high efficiency, it was soon found in other tools as well, growing beyond the plain network filtering and analysis capabilities. As BPF expanded toward other use cases, it became extended BPF, or eBPF.

The eBPF framework in the Linux kernel has been successfully used for performance monitoring, where eBPF applications hook into runtime processes and kernel subsystems to measure performance and feed back the metrics to user-space applications. It, of course, also supports filtering on (network) sockets, cgroups, process scheduling, and many more — and the list is growing rapidly.

As with the LSM framework, which uses hooks into the system calls and other security-sensitive operations in the Linux kernel, eBPF hooks into the Linux kernel as well. Occasionally it can use existing hooks (as with the Linux kernel probes or kprobes framework) and thus benefit from the stability of these interfaces. We can thus expect eBPF to grow its support further in other areas of the Linux kernel as well.

eBPF applications (eBPF programs) are defined in user space, and then submitted to the Linux kernel. The kernel verifies the security and consistency of the code to ensure that the virtual machine will not attempt to break out of the boundaries it works in. If approved (possibly after the code is slightly altered, as the Linux kernel has some operations that modify eBPF code to suit the environment or security rules), the eBPF program runs in the Linux kernel (within its virtual machine) and executes its purpose.


The Linux kernel can compile the eBPF instructions into native, processor-specific instructions, rather than having the virtual machine interpret them. However, as this leads to a higher security risk, this Just-In-Time (JIT) eBPF support is sometimes disabled by Linux distributions in their Linux kernels. It can be enabled by setting /proc/sys/net/core/bpf_jit_enable to 1.
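To see how a given system is configured, the current value of this setting can be read and interpreted. The following is a minimal sketch; the describe_jit helper is a hypothetical name, and the meaning of the values 0, 1, and 2 follows the kernel's sysctl documentation:

```shell
#!/bin/sh
# Translate a bpf_jit_enable value into a short description.
describe_jit() {
    case "$1" in
        0) echo "JIT disabled (interpreter only)" ;;
        1) echo "JIT enabled" ;;
        2) echo "JIT enabled with debug output" ;;
        *) echo "unknown value: $1" ;;
    esac
}

jit_file=/proc/sys/net/core/bpf_jit_enable
if [ -r "$jit_file" ]; then
    describe_jit "$(cat "$jit_file")"
else
    echo "cannot read $jit_file"
fi
```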

These programs can load and save information in memory, called maps. These eBPF maps can be read or written to by user-space applications, and thus offer the main interface to interact with running eBPF programs. These maps are accessed through file descriptors, allowing processes to pass along and clone these file descriptors as needed.

Various products and projects are using eBPF to create high-performance network capabilities, such as software-defined network configurations, DDoS mitigation rules, load balancers, and more. Unlike the netfilter-based firewalls, which rely on a massive code base within the kernel tuned through configuration, eBPF programs are built specifically for their purpose and nothing more, and only that code is actively running.

Securing eBPF programs and maps

The default security measures in place for eBPF programs and maps are very limited, partly because lots of trust is put in the Linux kernel verifier (which verifies the eBPF code before it passes the code on to the virtual machine), and partly because the eBPF code was only allowed to be loaded when the process involved has the CAP_SYS_ADMIN capability. And as this capability basically means full system access, additional security controls were not deemed necessary.

Since Linux kernel 4.4, some types of eBPF programs (such as socket filtering) can be loaded even by unprivileged processes (but, of course, only toward the sockets these processes have access to). The system allows loading programs to work on cgroups socket buffers (skb) if the process has the CAP_NET_ADMIN capability. More recently (Linux 5.8), the permission to load eBPF programs has been split off into the CAP_BPF and CAP_PERFMON capabilities, although not all Linux distributions offer a Linux kernel that supports these capabilities already. Linux administrators who want more fine-grained control over eBPF can use SELinux to tune and tweak eBPF handling.

SELinux has a bpf class, which governs the basic eBPF operations: prog_load, prog_run, map_create, map_read, and map_write. Whenever a process creates a program or map, this program or map inherits the SELinux label of this process. If the file descriptors regarding these maps or programs are leaked, the malicious application still requires the necessary privileges toward this label before it can exploit it.

User-space operations can interact with the eBPF framework through the /sys/fs/bpf virtual filesystem, so some Linux distributions associate a specific SELinux label (bpf_t) with this location as well. This allows administrators to manage access through SELinux policy rules in relation to this type.
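Whether a distribution assigns such a label can be checked on the running system. The following is a minimal sketch, assuming GNU coreutils (whose stat supports the %C format for the SELinux context); the context_type helper is a hypothetical name:

```shell
#!/bin/sh
# Extract the type (third field) from an SELinux context string.
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Example with a literal context string.
context_type "system_u:object_r:bpf_t:s0"

# On a live system, read the actual context of the BPF filesystem.
if [ -d /sys/fs/bpf ]; then
    ctx=$(stat -c %C /sys/fs/bpf 2>/dev/null) || ctx=""
    if [ -n "$ctx" ]; then
        echo "/sys/fs/bpf type: $(context_type "$ctx")"
    fi
fi
```

A type other than bpf_t suggests that the policy on that system does not treat this location separately.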

While eBPF is extremely extensible, the number of simplified frameworks surrounding it is small given its very early phase. We can, however, expect that more elaborate support will come soon, as a new tool called bpfilter is showing off the capabilities of eBPF-based firewalling on Linux systems.

Filtering traffic with bpfilter

The bpfilter application builds eBPF programs to filter and process traffic. It allows administrators to build firewall capabilities without understanding the low-level eBPF instructions, and has recently started supporting iptables: administrators create rules with iptables, and bpfilter translates these into eBPF programs.


While bpfilter is part of the Linux kernel tree, it should be considered a proof-of-value currently, rather than a production-ready firewall capability.

bpfilter creates eBPF programs that hook inside the Linux kernel between the network device driver and the TCP/IP stack in a layer called the eXpress Data Path (XDP). At this level, the eBPF programs have access to the full network packet information (including link layer protocols such as Ethernet).

To use bpfilter, the Linux kernel needs to be built with the appropriate settings, including CONFIG_BPFILTER and CONFIG_BPFILTER_UMH. The latter is the bpfilter user mode helper that will capture iptables-generated firewall rules, and translate those into eBPF applications.
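Whether the running kernel was built with these settings can usually be verified from its configuration file. The following is a minimal sketch; the check_bpfilter_config helper is a hypothetical name, and the /boot/config-<release> location is an assumption that holds on many, but not all, distributions:

```shell
#!/bin/sh
# Report whether the bpfilter options are set in a kernel config file.
#   $1: path to a plain-text kernel configuration
check_bpfilter_config() {
    for opt in CONFIG_BPFILTER CONFIG_BPFILTER_UMH; do
        if grep -Eq "^${opt}=(y|m)$" "$1"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
        fi
    done
}

# Assumed location of the running kernel's configuration.
config=/boot/config-$(uname -r)
if [ -r "$config" ]; then
    check_bpfilter_config "$config"
else
    echo "no readable kernel config at $config"
fi
```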

Before we load the bpfilter user mode helper, we need to allow execmem permission in SELinux:

# setsebool allow_execmem on

Next, load the bpfilter module, which will have the user mode helper active on the system:

# modprobe bpfilter
# dmesg | tail
bpfilter: Loaded bpfilter_umh pid 2109

Now, load the iptables firewall using the commands listed previously. The instructions are translated into eBPF programs, as shown with bpftool:

# bpftool p
1: xdp  tag 8ec94a061de28c09 dev ens3
        loaded_at Apr 25/23:19  uid:0
        xlated 533B  jited 943B  memlock 4096B

The eBPF code itself can be displayed as well, but is hardly readable at this point for administrators.

All of the aforementioned firewall capabilities interact with the TCP/IP stack supported within the Linux kernel. There are, however, networks that do not rely on TCP/IP, such as InfiniBand. Luckily, even on those more specialized network environments, SELinux can be used to control communication flows.
