White Paper   |   July 18, 2021

TCU™ uses network AI to manage control plane security in real-time

Malicious attacks on network infrastructure have increased alongside the recent efficiency gains from remote administration of computing and infrastructure. This larger attack surface calls for stronger network-based attack detection. Specifically, it calls for trust services that establish a hardware root of trust and extend it through software support at the virtualization and orchestration layers to the containers running on the platform.

Enter the trusted control/compute unit (TCU™), which takes a ground-up view of platform-level AI detection and protection of the network by incorporating functions such as secure boot, secure key store, remote attestation, runtime protection, and firmware resiliency. The on-chip secure vault establishes the root of trust and runs the trust services in a secure enclave. In other words, a secure AI computational trust engine evaluates the runtime trustworthiness of the containers based on their behavior.

Lack of real-time threat detection

Platform-level trust and security is vital to ensuring data center level and/or end-to-end network security as shown in Figure 1. Some of the platform critical functions at boot time and runtime determine the level of robustness of any platform. These include timely execution of locally and/or remotely administered functions like generating and maintaining cryptographic keys, updating firmware patches, trust services, attestation, authoring, and authentication. The ability to remotely access the network to perform these functions has increased the efficiency of managing the infrastructure control and management planes of a network.
Figure 1 Platform-level trust entails end-to-end network security.

Some of the gains in efficiency, as well as cost savings, have been achieved from not having an on-site IT/OT person. However, the increase in remote accessibility also poses challenges to network security. These challenges often start from platform level and can quickly percolate up to node or even cross-node data center and network infrastructure level. For example, in 5G base station environments, careful sequencing of node bring-up is orchestrated to avoid remote entity authentication.

However, it’s hard to rely only on sequencing of the node bring-up, or on a one-time authentication of the remote administrator, and assume that the control plane agent behind that station is not malicious. For example, a remote employee’s credentials can be stolen by hackers, who can then use them to try to gain access to the network. The problem with current solutions on the market is that they are unable to detect this type of abnormal activity with any level of certainty.

The recognition of malicious activity on the network must occur in real-time in order to avoid the deleterious and sometimes long-lasting negative effects that can occur when a network is breached. Some of these malicious activities, how they would be recognized, and some of the motivations for breaching a network are described here.

Types of malicious activities

  1. Those involving detection of fully known malicious nodes—known and identified by higher layer software—and/or users operating on these nodes. These types of nodes can be managed using static rules enforcement.
  2. Those involving detection of partially known malicious (suspect) nodes and/or users operating on these nodes. An example of this threat would be knowing only the upper 16 bits of a 32-bit IPv4 address.
  3. Those involving detection of malicious intent by one node or a group of nodes using traffic volume per a given time interval. An example of this threat would be an attempt to bring down the network by flooding it with packets.
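
The three categories above can be illustrated with a minimal Python sketch. It is not Axiado's implementation: the addresses, the /16 prefix, the packet threshold, and the function name `classify_source` are all hypothetical values chosen for illustration. The /16 prefix corresponds to knowing only the upper 16 bits of a 32-bit IPv4 address.

```python
import ipaddress

# Hypothetical rule data, for illustration only (documentation address ranges).
KNOWN_BAD = {ipaddress.ip_address("203.0.113.7")}          # fully known node
SUSPECT_PREFIXES = [ipaddress.ip_network("198.51.0.0/16")]  # only upper 16 bits known
MAX_PACKETS_PER_INTERVAL = 1000                             # assumed volume threshold


def classify_source(src, packets_in_interval):
    """Return a label for a source address using the three rule types."""
    addr = ipaddress.ip_address(src)
    # Type 1: fully known malicious node, handled by a static rule.
    if addr in KNOWN_BAD:
        return "malicious"
    # Type 2: partially known node, matched on the address prefix only.
    if any(addr in net for net in SUSPECT_PREFIXES):
        return "suspect"
    # Type 3: intent inferred from traffic volume in a given time interval.
    if packets_in_interval > MAX_PACKETS_PER_INTERVAL:
        return "flooding"
    return "ok"
```

In this sketch the checks run from most specific to least specific, so an address that is on the static list is reported as malicious even if it also falls inside a suspect prefix.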

Malicious actors on the network

  1. Those intent on creating havoc on the network by flooding it with packets as mentioned above, or other malicious activity meant to disrupt the network.
  2. Those intent on stealing data or secrets. These actors can gain access to a network, either by hacking into it externally, or internally in the case of a rogue employee or contractor. These actors may try to elevate their privileges to gain access to more sensitive areas of the network such as the vault where the keys are kept.
  3. Those intent on extorting money from the owner of the infrastructure like a server or network element. Once malicious actors get access to the node/platform, they can install ransomware to lock out legitimate administrators as a way of extorting money from the owner.

For example, if a rogue employee with credentials accesses the network and moves large amounts of data, asks for access to security keys, or engages in other out-of-the-ordinary activity, current solutions would reveal this only after the event.

Existing (flawed) model

In general, network security can be classified into:

  • Control/management plane
  • Data plane

The control/management plane contains the “intelligence” used to configure the data plane for security policy enforcement. In data plane systems, limited security rules can be enforced at wire-speed packet processing. When the data plane encounters any exceptions to the configured security rules, it bifurcates the traffic to the control plane entity for further checks. The control/management plane is expected to provide a trusted shield that allows the data plane to enforce inline security.

Figure 2 summarizes network security solutions in server environments. Security aspects of the server platform control plane are distributed across the TPM, BMC, and root-of-trust components in today’s platforms. The distribution of these various functional components increases the attack surface for hackers.

Figure 2 A synopsis of network security solutions in server environments shows why the disaggregated model increases attack surfaces.

In this model, malicious and suspect actors are detected and trapped using network packet header-based analysis. For example, fully known malicious users can be detected by enforcing simplified static rules across various platforms/nodes, such as a rule stating that all traffic coming from a given IP address should be discarded. However, note that this detection cannot be done in real time using conventional techniques.

Similarly, partially known rules are configured in hardware tables and are enforced using one or more content addressable memories (CAM) in hardware. For example, all traffic from a particular IP subnet mask and TCP port number may be subject to further action by the hardware. Lastly, traffic volume-based attacks can be detected using hardware-based traffic policers that detect denial-of-service (DoS) attacks.
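
The two hardware mechanisms mentioned above, a CAM lookup with "don't care" bits and a traffic policer, can be modeled in software as a rough sketch. Everything here is an assumption made for illustration: the key layout (source IP followed by TCP destination port), the class names `TernaryRule` and `TokenBucket`, and the use of a token bucket as the policing algorithm, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TernaryRule:
    value: int   # bit pattern to match
    mask: int    # 1 bits are compared, 0 bits are "don't care"
    action: str


def make_key(src_ip, dst_port):
    """Pack a 32-bit source IP and a 16-bit TCP port into one lookup key."""
    return (src_ip << 16) | dst_port


def tcam_lookup(rules, key):
    """Return the action of the first matching rule, else 'forward'."""
    for rule in rules:
        if (key & rule.mask) == (rule.value & rule.mask):
            return rule.action
    return "forward"


class TokenBucket:
    """Simple policer: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now, cost=1):
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the configured rate: candidate DoS traffic
```

A rule for a /16 subnet and one TCP port would set the mask to compare the upper 16 address bits and all 16 port bits, for example `TernaryRule(make_key(0x0A010000, 23), make_key(0xFFFF0000, 0xFFFF), "punt")`.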

Security gaps in existing model

The biggest gap in this model is that the rules enforcement function, in both the control and management planes, is not done in real time. In other words, today’s control plane elements cannot offer any real-time malicious entity detection or help to distribute that information across end-to-end networks. This is especially true for malicious actors whose identity is only partially known and whose behavior cannot be fully monitored, as shown by the red text in Figure 3.

Figure 3 Malicious actors cannot be fully monitored when their identity is known only partially.

Therefore, if malicious activity on the network occurs, it will only be known once the problem created by the activity has manifested itself. This gap can be addressed using AI-based threat detection techniques.

The TCU™ solution

Axiado addresses this gap by adding hardware acceleration to run behavior analysis using neural networks. This helps the control plane to mark some of the network flow traffic or end stations as ‘suspect’ or ‘malicious’ based on their access patterns.

Figure 4 illustrates a Secure AI pipeline which includes packet header parsing, network flow classification, and AI acceleration engine to do inferencing. This Secure AI pipeline can be accessed by control/management plane entities as well as by the platform host for network traffic exceptions for further behavioral analysis. All of these components are part of the Axiado TCU, including a 4 tera operations per second (TOPS) AI acceleration engine.

Figure 4 Secure AI pipeline includes packet header parsing, network flow classification, and AI acceleration engine.
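
The first two pipeline stages, header parsing and flow classification, can be sketched in Python to show the kind of per-flow state an inference stage would consume. This is an illustrative software model only, not the TCU's hardware pipeline. The function names, the 5-tuple flow key, and the per-flow counters are assumptions, and the sketch handles only IPv4 packets carrying TCP.

```python
import socket
import struct
from collections import defaultdict


def parse_ipv4_tcp(packet):
    """Return (src, dst, sport, dport, proto) for an IPv4/TCP packet, else None."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    ihl = (packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    proto = packet[9]
    if ihl < 20 or proto != 6 or len(packet) < ihl + 4:
        return None                       # malformed, or not TCP
    sport, dport = struct.unpack("!HH", packet[ihl:ihl + 4])
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])
    return (src, dst, sport, dport, proto)


def new_flow_table():
    """Per-flow counters keyed by the 5-tuple."""
    return defaultdict(lambda: {"packets": 0, "bytes": 0})


def update_flow_table(table, packet):
    """Classify a packet into its flow and update that flow's counters."""
    key = parse_ipv4_tcp(packet)
    if key is None:
        return None
    table[key]["packets"] += 1
    table[key]["bytes"] += len(packet)
    return key
```

The counters accumulated per flow are the sort of input that a behavioral model could score. The scoring step itself is left out because the text does not describe the model.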

The Axiado TCU is capable of managing complex AI network models. A simplistic version of a network model is captured in the NSL-KDD dataset that is used to detect network intrusion.

This dataset covers four different classes of attacks:

  • Denial-of-service (DoS)
  • Probe
  • User to root (U2R)
  • Remote to local (R2L)

Attackers use the U2R and R2L types to escalate their privileges within the network and hack into various stations/platforms connected to these networks.

An efficient neural network can be built to detect all four of these attack types by leveraging:

  • Intrinsic features derived from the packet header
  • Content features derived from the packet payload
  • Time-based features derived from packets over a time window
  • Host-based features derived from packets to/from a malicious actor’s machine
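
To make the time-based and host-based feature families concrete, here is a minimal Python sketch that derives a few such counts from a stream of connection records. It is not the feature extractor used in the tests reported in this paper. The 2-second window, the 100-connection history, and the feature names `count`, `srv_count`, and `dst_host_count` are assumptions modeled on conventions commonly described for the KDD family of datasets.

```python
from collections import deque


class WindowFeatures:
    """Derive simple time-based and host-based counts per connection."""

    def __init__(self, window_s=2.0, history=100):
        self.window_s = window_s
        self.recent = deque()                 # (timestamp, dst_host, service)
        self.last_n = deque(maxlen=history)   # destination hosts of last N connections

    def observe(self, ts, dst_host, service):
        # Drop records that have fallen out of the time window.
        while self.recent and ts - self.recent[0][0] > self.window_s:
            self.recent.popleft()
        features = {
            # Time-based: earlier connections in the window to the same host/service.
            "count": sum(1 for _, h, _ in self.recent if h == dst_host),
            "srv_count": sum(1 for _, _, s in self.recent if s == service),
            # Host-based: same destination host among the last N connections.
            "dst_host_count": sum(1 for h in self.last_n if h == dst_host),
        }
        self.recent.append((ts, dst_host, service))
        self.last_n.append(dst_host)
        return features
```

The time window captures short bursts such as floods, while the connection-count history can surface slower probing that spreads its connections over a longer period.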

For example, using the Axiado TCU™ with the NSL-KDD dataset for benchmarking, tests run on the base model using only features 1-9 (of the 41 features in each sample) achieved an accuracy of better than 90%. Next, using all 41 features of the NSL-KDD per sample, we achieved an accuracy of more than 98%.

Security meets AI

The level of security required by today’s complex networks necessitates the ability to detect and manage network threats in real-time. These threats can include intentional disruptions to the network, the theft of data or secrets, and the injection of ransomware for the purposes of extortion.

Behavioral anomaly detection using AI techniques is a key component that will help detect malicious network transactions. The NIST SP 800-193 standard defines how a root of trust will protect, detect, and recover platforms. Axiado’s TCU™ takes this standard to the next level since runtime protection, detection, and automated recovery will be possible using machine learning.

The AI techniques integrated into the Axiado TCU™ can be used to detect and quarantine malicious user activity. That allows TCU™ to greatly reduce the frequency and severity of attacks by detecting malicious behavior on the network in real time, thereby minimizing its negative impact on the network infrastructure.

Written by Gopi Sirineni, CEO of Axiado Corporation. Published in EDN.