In OT, process stability is valued above all else. Anomalies and risks that jeopardize the availability of individual systems or the entire infrastructure are therefore of particular importance in day-to-day operations. As with security risks, OT usually lacks the visibility needed to detect these technical error states and misconfigurations at an early stage, i.e. before a malfunction occurs.
The lack of visibility often also means that locating the cause of a network error becomes a search for the proverbial needle in a haystack.
The five most common availability risks in OT in 2023
Anomalies indicating network overload were found in all OT networks. While in IT this may only lead to amusingly frozen faces during video calls, in OT it can jeopardize real-time communication and thus system availability and occupational safety.
The other four anomalies from the top 5 availability risks can also have similar effects.
For this reason, it is always worth keeping an eye on network quality when monitoring OT.
Close-up view: unconfigured or misconfigured switch topology
An unconfigured or misconfigured switch topology often arises because IT expertise in network configuration has not yet been applied in OT and switches were integrated as mere distributors. The perceived competition between IT managers and OT operators often plays a role in this.
If the Spanning Tree Protocol is left unconfigured, all switches keep the same default priority, so the switch with the lowest MAC address becomes the so-called root bridge. This can be a switch that is far away from the other switches and may be weakly integrated or even prone to errors.
An incorrectly configured spanning tree can bring the entire network to a complete standstill for 30 seconds if a switch fails. During this time, no data can be transmitted, which in turn can lead to system failures and malfunctions.
The priorities of the switches should therefore always reflect the actual topology of the network. This means that the main switch, which channels all connections centrally, should also be the root bridge.
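The election described above can be sketched in a few lines of Python. This is a simplified illustration, not a full STP implementation: the switch names and MAC addresses are invented, 32768 is the standard default bridge priority, and 4096 is just one example of a deliberately lowered value.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_PRIORITY = 32768  # standard default bridge priority


@dataclass(frozen=True)
class Switch:
    name: str  # hypothetical label, for illustration only
    mac: str   # hypothetical MAC address
    priority: int = DEFAULT_PRIORITY

    @property
    def bridge_id(self) -> tuple[int, int]:
        # Lower priority wins; the MAC address only breaks ties.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(switches: list[Switch]) -> Switch:
    """Return the switch with the numerically lowest bridge ID."""
    return min(switches, key=lambda s: s.bridge_id)


# All priorities at default: the election is decided by MAC address alone.
unconfigured = [
    Switch("core", "00:1b:54:aa:00:10"),
    Switch("cell-a", "00:1b:54:aa:00:20"),
    Switch("legacy-edge", "00:0a:f4:00:00:01"),
]

# Same switches, but the core is deliberately given a lower priority.
configured = [
    Switch("core", "00:1b:54:aa:00:10", priority=4096),
    *unconfigured[1:],
]

print(elect_root(unconfigured).name)  # legacy-edge
print(elect_root(configured).name)    # core
```

With default priorities, the remote edge switch wins purely because of its MAC address. Lowering the priority of the central switch makes the outcome independent of which hardware happens to be installed.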
Close-up view: classic STP
STP (Spanning Tree Protocol) is an important protocol that ensures the absence of loops in the network. In OT, it is generally used in the networks behind the ring.
However, the original version of STP can still be found in many OT networks. The problem with classic STP is that whenever the spanning tree changes, the switches need up to 30 seconds before they can forward traffic again. In other words, a change such as a dropped connection or a failed switch can result in a complete network outage of up to 30 seconds.
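The 30 seconds follow from the classic STP port states: before a port forwards traffic, it passes through a listening and a learning phase, each lasting one forward delay (15 seconds by default). A minimal sketch of that arithmetic, with function names chosen for illustration only:

```python
FORWARD_DELAY_S = 15  # default forward delay in classic STP, in seconds


def port_timeline(forward_delay_s: int = FORWARD_DELAY_S) -> list[tuple[str, int]]:
    """Return (state, seconds after the topology change) for a port taking over."""
    return [
        ("listening", 0),                     # port starts taking part in the tree
        ("learning", forward_delay_s),        # port learns MAC addresses, no forwarding yet
        ("forwarding", 2 * forward_delay_s),  # traffic flows again
    ]


def outage_seconds(forward_delay_s: int = FORWARD_DELAY_S) -> int:
    """Time until the port reaches the forwarding state."""
    return port_timeline(forward_delay_s)[-1][1]


for state, t in port_timeline():
    print(f"t={t:>2}s  {state}")
print(outage_seconds())  # 30
```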
Even in networks in which the much faster successors Rapid Spanning Tree Protocol (RSTP) or Multiple Spanning Tree Protocol (MSTP) are already partially used, a single device with classic STP poses a risk. Both successors have a compatibility mode in which a port falls back to classic STP as soon as one bridge on it speaks classic STP. The affected network section is then degraded to STP and remains so even after the STP bridge has been removed or replaced. Changing this requires manual intervention to restart protocol detection on the affected ports.
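The sticky fallback can be illustrated with a toy model of a single port. This is a deliberately simplified sketch and not the protocol migration state machine from the standard; the class and method names are hypothetical.

```python
class RstpPort:
    """Toy model of one port on an RSTP-capable switch (hypothetical, simplified)."""

    def __init__(self) -> None:
        self.mode = "rstp"

    def receive_legacy_bpdu(self) -> None:
        # A single classic STP BPDU drops the port into compatibility mode.
        self.mode = "stp"

    def legacy_bridge_removed(self) -> None:
        # Nothing happens: the port cannot tell that the legacy bridge is gone.
        pass

    def restart_protocol_detection(self) -> None:
        # Stands in for the manual intervention described in the text.
        self.mode = "rstp"


port = RstpPort()
print(port.mode)  # rstp

port.receive_legacy_bpdu()
print(port.mode)  # stp

port.legacy_bridge_removed()
print(port.mode)  # stp (still degraded)

port.restart_protocol_detection()
print(port.mode)  # rstp
```

The point of the model is the third step: removing the legacy device changes nothing on its own, which is why such degraded sections can go unnoticed without monitoring.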
Visibility, intrusion detection and system availability go hand in hand
As the outlined results from the 2023 vulnerability assessments show, security risks and optimization potential for operations can be found in all OT networks. However, both issues can only be meaningfully addressed if both the security teams and the operators have visibility into the processes of their OT infrastructure. A vulnerability assessment and risk analysis is the first step. It can be carried out quickly and with minimal effort, and it helps to rid the OT networks of the vulnerabilities already present in the infrastructure or at least makes them visible so that a solution can be found. For our customers, this process typically takes no more than 60 days.
Continuity is needed afterwards. On the one hand, this makes it possible to detect attacks at an early stage during operation. On the other hand, it allows security vulnerabilities that cannot be remedied for operational or technical reasons to be monitored. To this end, the OT monitoring with anomaly detection used in the vulnerability assessment can be switched to continuous operation at the click of a mouse, so that teams are informed of risky changes in OT communication in real time.
Rhebo OT Security supports companies in all these steps – from the initial vulnerability assessment to the operation of the network intrusion detection system. In addition, optional service level agreements make it possible to bridge the OT security skills gap that still plagues most companies. At the same time, the company's own employees can gain expertise in this new field, so that in the long term, external support is no longer required to operate the intrusion detection system.