How OT monitoring enables and secures TSN on a large scale

In this episode of OT Security Made Simple, our guest Moritz Flüchter from the University of Tübingen explains how OT monitoring can also be used to enable Time Sensitive Networking (TSN) in networks in which some of the systems and end devices are not TSN-capable. Finally, he shows how integrated anomaly detection can monitor the existing security gaps of TSN and detect denial-of-service attacks.

Duration:

25 min

Guest in this episode:

Moritz Flüchter

PhD student at the University of Tübingen at the Chair of Communication Networks

Transcript

Klaus Mochalski

Hello and welcome to a new episode of OT Security Made Simple. I'm Klaus Mochalski, founder of Rhebo. My guest today is Moritz Flüchter. Moritz is a PhD student in the Department of Communication Networks at the University of Tübingen in Germany. Rhebo and the department, represented by Moritz, have been working on a research project in the field of industrial networking for some time. That's what we're going to talk about today.

Although we won't directly talk about OT cybersecurity today, we will hopefully provide an exciting outlook on security as well. Before we do that, Moritz, why don't you take a moment to tell us in your own words what we did together in this research project, what it was about and perhaps also what the results were.

Moritz Flüchter

Hi, thanks for having me. Exactly, we worked together with Rhebo. We brainstormed a bit about what we could do in such a cooperation, and the idea quickly came up to explore how we could extend the network monitoring that Rhebo provides to the context of TSN [Time Sensitive Networking].

In other words, [through Rhebo] we have a device that observes the communication in the network, and we thought it could be used to ensure the transmission quality of devices in the network that do not themselves support TSN. So when it comes to real-time communication, or quality-of-service requirements in general, applications that don't support TSN can still benefit from its advantages.

Klaus Mochalski

TSN. Good keyword. Why don't you briefly explain to our listeners what TSN is?

Moritz Flüchter

TSN originates in industry, at the production level. Until now, there have of course been many different machines from different manufacturers, and with Industry 4.0 and network convergence, the aim is to bring it all together into one large network where the machines can talk to each other instead of forming isolated, encapsulated systems. The problem, as you probably know, lies with protocols such as Profinet or EtherCAT. These work at the application level and can provide transmission guarantees for real-time transmissions. But they don't work well together, because they can't communicate with each other and have no common interface.

TSN is an approach to tackling this issue further down the network stack, i.e. at the level of the switches that forward the actual data. The aim is to create a common Ethernet-based platform, so to speak, over which everyone can communicate while ensuring that transmissions meet their requirements, such as a maximum delay from sender to receiver, and in this way to bring these protocols together a little.

Basically, the TSN network has an admission control mechanism. The transmitters and receivers announce their transmission to the network before they even start sending. The network can then reserve the resources they need, for example bandwidth, or guarantee a certain latency. The network then configures itself [accordingly] and reports back to the sender or receiver: ‘Okay, this is working now’ or ‘This is not working.’ The great thing about it, and a huge advantage, is that you can run these real-time streams alongside best-effort traffic, so any other traffic can be sent over the same network as well.
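To make the admission-control idea more concrete, here is a minimal Python sketch under simplifying assumptions: each link has a fixed capacity, only a hypothetical fraction of it (75 % here) may be reserved for real-time streams, and a request is admitted only if every link on its path still has enough unreserved capacity. The names `Network` and `StreamRequest` are illustrative, not part of any TSN standard, and real TSN admission control also checks latency bounds, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class StreamRequest:
    """A hypothetical stream announcement: which links it crosses and how much it needs."""
    stream_id: str
    path: list[str]          # identifiers of the links the stream traverses
    bandwidth_mbps: float    # requested bandwidth in Mbit/s


@dataclass
class Network:
    link_capacity_mbps: dict[str, float]
    reservable_fraction: float = 0.75   # assumed cap; the rest stays free for best effort
    reserved_mbps: dict[str, float] = field(default_factory=dict)

    def admit(self, req: StreamRequest) -> bool:
        """Reserve bandwidth on every link of the path, or reject the request entirely."""
        # First pass: check every link before touching any reservation.
        for link in req.path:
            limit = self.link_capacity_mbps[link] * self.reservable_fraction
            if self.reserved_mbps.get(link, 0.0) + req.bandwidth_mbps > limit:
                return False  # 'This is not working': existing reservations stay untouched
        # Second pass: all links have room, so commit the reservation.
        for link in req.path:
            self.reserved_mbps[link] = self.reserved_mbps.get(link, 0.0) + req.bandwidth_mbps
        return True  # 'Okay, this is working now'


net = Network(link_capacity_mbps={"sw1-sw2": 100.0, "sw2-sw3": 100.0})
print(net.admit(StreamRequest("robot-ctrl", ["sw1-sw2", "sw2-sw3"], 10.0)))  # True
print(net.admit(StreamRequest("bulk", ["sw1-sw2"], 70.0)))                   # False
```

Checking all links before committing keeps the reservation all-or-nothing, and the capacity that is never reservable is what remains for best-effort traffic on the same links.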

Klaus Mochalski

I think that's quite understandable to start with. So it's about the quality of service, as it's also called. This is a fairly old topic on the Internet, I would say, and one that has actually been considered more or less solved for a long time. What is being done differently here in the industrial sector? Of course, we often talk about the requirements for real-time communication. Especially when we talk about tightly synchronized communication between control systems and, for example, production robots in the automotive industry, I think everyone understands that we have real-time control processes at the communication level here. What is being done differently here compared to the quality of service that has existed on the Internet for some time? What are the special challenges in the industrial sector?

Moritz Flüchter

You've already touched on it. Real-time communication in industry, and with TSN in particular, has even tougher real-time requirements than in other areas. Some people are even considering using TSN in airplanes or in-vehicle networks.

The timing requirements [are stricter]. A packet is sent and must arrive within an expected maximum latency. Let's assume you have a controller in the network and it controls a robot arm, for example. The [time requirements] are much stricter than anything you currently have on the Internet.

Ethernet itself guarantees neither the timely arrival of packets nor their arrival at all. It's [rather] the best-effort delivery that people often talk about. If you look right at the top, at the end user, we can guarantee [via TCP] that a packet arrives at all. For example, you say: ‘OK, a packet hasn't arrived, report it to the sender’, and the sender sends it again. But handling [this notification and correction] this high up is problematic [in industrial environments], because if you can only respond at the end device [e.g. a robot arm] with ‘OK, it didn't arrive on time after all’, that's a bit too late. That's why [in industrial environments] you have to intervene much earlier, namely at the forwarding level itself. In a TSN network, every switch, every node that forwards packets and data, has special functions and configurable settings to provide strong real-time guarantees. This goes down to the microsecond range, depending on what you need.

Klaus Mochalski

In other words, the aim is to guarantee the real-time delivery of individual data packets for time-critical processes, with much tighter timing than is currently the case with Internet communications such as the one we are using right now. And you also said that if we look at the layer model, i.e. the OSI or TCP/IP stack, it's about layers 1 and 2 being replaced, or rather upgraded with additional functions, to provide this added value to the higher layers, for example to protocols such as Profinet and EtherCAT. Is that correct?

Moritz Flüchter

Exactly. TSN concerns layer 2, i.e. the Ethernet switches. That's also where the standards sit: TSN extends the Ethernet bridging standard [IEEE 802.1Q], so to speak.

Klaus Mochalski

What are the challenges here? I'm thinking specifically of how precise or comprehensive the support in individual components, and the implementations by different vendors, must be in order to use this functionality. Do they all have to play along? Do I need new components? Do I have to install software updates? What does the picture look like right now? It sounds to me like a potentially very heterogeneous support landscape.

Moritz Flüchter

The important thing about TSN is that it is a collection of extensions. It's like a toolbox: you have options for limiting bandwidth and for scheduling packets, i.e. making a plan for what is forwarded where and when. It's very flexible, depending on what you need. For the simpler use cases, i.e. less demanding real-time requirements, it can probably be deployed partially. In other words, not all devices have to support it. But if you really want to benefit fully from the advantages of TSN, then every switch through which this TSN traffic is forwarded must support it.

Klaus Mochalski

The switches must support this in any case. What about the end devices? For example, my controller, my Profinet controller, which then also controls the end devices directly.

Moritz Flüchter

The important thing is that TSN does not introduce a new header or new frame types; it all works with existing Ethernet frames. The most important part here is the so-called signaling. This is the process in which the transmitter or controller tells the network: ‘I now have these requirements for a new transmission’. The end devices must support this too, i.e. they must be configurable. For example, TSN frames are identified via the VLAN tag, and the transmitters must set a specific value in it. So you need a little support for this on the end devices. Someone has to tell the network: ‘Hey, these are the requirements now’.
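The VLAN tag referred to here is the IEEE 802.1Q tag: a 16-bit Tag Protocol Identifier (0x8100) followed by a 16-bit Tag Control Information field that holds a 3-bit Priority Code Point (PCP), a 1-bit Drop Eligible Indicator (DEI) and a 12-bit VLAN ID. The Python sketch below builds and parses such a tag. The PCP value 5 and VLAN ID 100 are arbitrary example values; which values a TSN stream actually has to use depends on how the network is configured.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def build_vlan_tag(pcp: int, vid: int, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag (TPID + TCI) in network byte order."""
    if not (0 <= pcp <= 7 and 0 <= dei <= 1 and 0 <= vid <= 0xFFF):
        raise ValueError("PCP is 3 bits, DEI 1 bit, VID 12 bits")
    tci = (pcp << 13) | (dei << 12) | vid  # pack the three fields into 16 bits
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag: bytes) -> tuple[int, int, int]:
    """Split a 4-byte 802.1Q tag into (pcp, dei, vid)."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError(f"not an 802.1Q tag: TPID {tpid:#06x}")
    return tci >> 13, (tci >> 12) & 0x1, tci & 0xFFF


tag = build_vlan_tag(pcp=5, vid=100)
print(tag.hex())            # 8100a064
print(parse_vlan_tag(tag))  # (5, 0, 100)
```

In an Ethernet frame, these four bytes sit between the source MAC address and the original EtherType, which is why TSN can reuse ordinary frames without a new header.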

Klaus Mochalski

Yes, and this is where I think we come to the point where [OT monitoring like that from] Rhebo comes into play [in your research]. Our solution is, in the broadest sense, a monitoring solution that is often used by customers to detect security-related incidents. However, many of our customers also use it for general network monitoring. Where does [OT network monitoring] come into play with all the TSN support?

Moritz Flüchter

The basic issue behind the whole construct is that applications that do not support TSN lack the ability to signal, i.e. to announce their transmissions to the network. If [this] could somehow be signaled on their behalf, users could benefit from TSN after all. In other words, not much more is needed.

So our idea was that the Rhebo Industrial Protector could sit in the network, monitor the communication and record the data. As soon as it detects a transmission from a device that is not TSN-capable and therefore cannot yet benefit from TSN, it identifies the protocol, derives the parameters that TSN requires and communicates those requirements to the network on behalf of the end device, which does not support TSN itself.
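How the Rhebo Industrial Protector actually derives these parameters is not described in this conversation. As a rough illustration of the general idea only, the Python sketch below assumes the monitor has recorded (timestamp, frame size) pairs for one periodic flow and turns them into a hypothetical reservation: the median inter-arrival time as the period, the largest observed frame as the maximum frame size, and the resulting bandwidth plus an assumed 10 % headroom. `DerivedReservation` and `derive_reservation` are invented names.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class DerivedReservation:
    period_s: float          # estimated transmission interval
    max_frame_bytes: int     # largest frame observed in the flow
    bandwidth_bps: float     # bandwidth to request, including headroom


def derive_reservation(observations: list[tuple[float, int]],
                       headroom: float = 0.10) -> DerivedReservation:
    """Estimate stream parameters from (timestamp_s, frame_bytes) pairs of one flow."""
    if len(observations) < 2:
        raise ValueError("need at least two frames to estimate a period")
    times = sorted(t for t, _ in observations)
    gaps = [b - a for a, b in zip(times, times[1:])]
    period = median(gaps)  # the median tolerates some jitter or an occasional lost frame
    if period <= 0:
        raise ValueError("frames must have distinct timestamps")
    max_frame = max(size for _, size in observations)
    bandwidth = max_frame * 8 / period * (1 + headroom)
    return DerivedReservation(period, max_frame, bandwidth)


# Hypothetical voice-like flow: roughly one 200-byte frame every 20 ms.
obs = [(0.000, 200), (0.020, 200), (0.041, 200), (0.060, 200), (0.080, 200)]
r = derive_reservation(obs)
print(round(r.period_s, 3), r.max_frame_bytes, round(r.bandwidth_bps))  # 0.02 200 88000
```

A sketch like this only captures what has been observed so far, which is why the monitoring has to keep checking that a device still behaves the way its derived reservation assumes.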

Klaus Mochalski

This means it's all about the end devices that cannot provide TSN support and cannot themselves signal to the network what resources they need. The OT monitoring solution takes over on their behalf. In other words, it monitors the communication, learns what type of system it is, presumably has some form of communication profile stored for it, and signals that profile accordingly, as if the [OT monitoring] were the end device itself.

Moritz Flüchter

Exactly. Yes. It's completely transparent for the devices that are connected to the network; they communicate just as before. Let's assume two computers are connected and are running a Voice over IP session, i.e. a voice transmission. Nothing changes for them, except that their connection works better without you [the user] having to adjust anything in the network yourself. The network is configured by the Rhebo Controller, which does the signaling. This means that the connected end devices [and therefore the users] experience better transmission quality.

Klaus Mochalski

To me, it sounds like a perfect example of how security solutions that are purchased for a completely different purpose can also bring additional benefits. These are things that we also observe with many of our customers. Our solution is often installed to fulfill risk reduction requirements. Often because an information security management system has been introduced in critical infrastructures, for example, and it has been determined that they must be able to detect incidents in a timely manner, ideally in real time.

However, these incidents only occur relatively rarely in well-secured systems and then many customers ask themselves: ‘What else can I do with the monitoring system?’ And then they realize that the system monitors the communication all the time and simply notices a lot of things that [happen in the network and go wrong]. In other words, many people use it to monitor technical errors, to detect outages and overload situations.

In other words, this is another application scenario where you can use a system like this in a mixed infrastructure, which I believe many customers will have for a long time to come, to produce real added value without having to install anything new. I also find the concept itself very interesting, as you just mentioned, because there is this device that collects, presents and visualizes an incredible amount of information to show people what is actually happening on the network.

Moritz Flüchter

I also worked with the Rhebo Industrial Protector and looked at things like: Let's generate more traffic now. What's the situation with the devices? How are they distributed? Who is communicating with whom? I think it's really useful to get an overview like this, and then the idea follows naturally: if this data is already available, why not build more functions on top of it? What can be extended? It's already being collected, so why not use it?

Klaus Mochalski

That's actually what we often hear from customers: the initial benefit of such a system is that you first learn what your own infrastructure actually looks like, at a level of detail that you don't usually have. That starts with: Which systems are there? But also: Which protocols are spoken? How often? What data volumes are we talking about, what are the packet rates?

These are often things that are largely unknown to most operators of such infrastructures. And of course they are very, very important. Not only to ensure security, but also the normal operation and stability of this infrastructure. What interests me, of course, is how well did the whole thing work? Is this parallel signaling, which we provided via detection through OT monitoring, just as good as direct support from the devices? Is it an adequate replacement or is it a migration journey? How would you see it?

Moritz Flüchter

It depends very much on the case. What's important is that when we talk about TSN and truly hard real time, we are dealing with extreme real-time requirements in the millisecond to microsecond range. That actually requires the end devices to be time-synchronized as well, because the so-called time-aware shaper is used on each device to plan which packets are forwarded at which time and to which switch. You can then guarantee incredibly low latencies, or rather, not necessarily incredibly low latencies, but very little variation in latency. And for this, the end devices need time synchronization. That is not the case with the devices we are dealing with here, but there are also very few applications with the kind of extreme real-time requirements we are talking about now.
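The time-aware shaper (IEEE 802.1Qbv) works with a cyclic gate control list: for each time slot of a repeating cycle, it specifies which of a port's traffic-class queues may transmit. The Python sketch below models only that lookup. The 1 ms cycle and the slot layout (a protected slot for a hypothetical real-time queue 7, the rest of the cycle for queues 0 to 6) are invented for illustration, and the sketch ignores guard bands and the clock synchronization itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateEntry:
    duration_ns: int        # how long this gate state lasts
    open_queues: frozenset  # traffic classes (0-7) allowed to transmit in this slot


# Hypothetical 1 ms cycle: 100 us exclusively for class 7, then 900 us for classes 0-6.
GATE_CONTROL_LIST = [
    GateEntry(100_000, frozenset({7})),
    GateEntry(900_000, frozenset(range(7))),
]
CYCLE_NS = sum(e.duration_ns for e in GATE_CONTROL_LIST)


def open_queues_at(time_ns: int, base_time_ns: int = 0) -> frozenset:
    """Return the queues whose gates are open at an absolute (synchronized) time."""
    offset = (time_ns - base_time_ns) % CYCLE_NS  # position within the current cycle
    for entry in GATE_CONTROL_LIST:
        if offset < entry.duration_ns:
            return entry.open_queues
        offset -= entry.duration_ns
    raise AssertionError("unreachable: offset always falls inside the cycle")


print(sorted(open_queues_at(50_000)))     # [7]
print(sorted(open_queues_at(1_050_000)))  # [7] (same slot in the second cycle)
print(sorted(open_queues_at(500_000)))    # [0, 1, 2, 3, 4, 5, 6]
```

Because every device on the path evaluates its own schedule against its own clock, the slots only line up from hop to hop if all devices agree on the time, which is why this mechanism depends on time-synchronized devices.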

I like to take voice transmission, i.e. Voice over IP, as an example. It doesn't have the requirements of a robot controller, and robot control is something we can't do with this approach. But for everything else that we identify, we use the normal interface to register it with the network. We have developed methods to extract the required parameters from the monitoring data, for example a description of the transmission rate and the shape of the traffic. It's all very precise, so the integration works very well.

Of course, there is always one catch: we do this based on observations. Let's assume that an end device changes its behavior because it has been moved or reconfigured, for example. That's why continuous monitoring is also an important part of the concept: you have to keep checking whether the device is still doing what we think it is doing. But of course, if you add TSN support to the device and configure everything manually, you will always be a bit more accurate than what we can infer from the network, i.e. from observation and monitoring.

Klaus Mochalski

But for existing infrastructures, I hear that this is a perfectly adequate replacement in many situations for many requirements. As an operator, I can certainly save a lot of money if I don't have to upgrade all my devices and replace them with new versions, but can continue to operate my existing infrastructure in this way - until perhaps really critical parts of the infrastructure have to be replaced because I absolutely need precisely these real-time requirements that you described, which require this time synchronization of the end devices.

Now, our podcast is called OT Security Made Simple. That's why, now that I have you on the show as a TSN expert, or a general expert in industrial networking, I have to ask you: from your point of view, what are the security-related challenges and problems in such networks? Perhaps you have also observed things as part of the project. So what do you have to look out for from a security perspective when operating such an infrastructure?

Moritz Flüchter

From our point of view, the fact that these networks configure themselves to guarantee the properties of the streams, and rely on this model in which transmissions are registered, is initially a major security problem. There is now also a publication on this from a colleague at the department. The point is that if a device announces its requirements, those announcements could also be tampered with. For example, an attacker could go and say: ‘OK, this little device now needs 100 megabits per second of bandwidth’, to put it bluntly.

Klaus Mochalski

You mean, I could launch a denial-of-service attack against the network if I can exploit this functionality.

Moritz Flüchter

You could somehow smuggle in packets. My colleague's publication looks at one of these protocols, called the Resource Allocation Protocol, or RAP for short. The protocol is designed for distributed networks. It propagates these requirements through the network so that every switch can adjust. The protocol itself is not secure. Messages can be modified or injected and used to carry out denial-of-service attacks in the network by over-reserving resources or changing priorities.

Leaving [such a protocol] unsecured is very problematic at first, although an attacker first has to be inside the network to do this at all. And the standards already provide a basis on which such security can be built, namely an industry standard for TSN. There is already the concept of so-called device IDs, or device identifiers [IEEE 802.1AR], which use certificates in a certificate hierarchy. This is similar to how web browsers verify the identity of web servers. So when you install a new machine, you can check that it really is the right machine from the right vendor and that nothing on it has been tampered with. These certificates can then be used to secure communication, for example.

Klaus Mochalski

In other words, it goes in the same direction. It's not a TSN-specific problem: there is a huge trend towards zero trust, including in the OT sector, where the idea is that all devices and all communication should be authenticated. As I understand it, this points the same way.

Moritz Flüchter

Exactly. So, as I said, for the attack you have to be on site or gain some kind of access to the network, but then you can do a great deal of damage. That's why, for really comprehensive security, you have to secure this signaling, i.e. the communication between the devices and the controller that configures the network. Otherwise you'll run into a lot of problems as soon as someone gets access.

Klaus Mochalski

You mentioned that there are already initial efforts to incorporate this into the protocol. Perhaps it has already been standardized. Is that the ultimate solution? Of course, I can't help thinking: couldn't such security-relevant manipulations, for example in signaling communication, be detected directly by the monitoring system? You would define the communication profiles that are permitted in your infrastructure. And if I see [with the monitoring in my network] requirements [from a device] that I haven't seen before or that deviate greatly from previous requirements, I can raise the alarm in some way. That's the idea of anomaly detection. Is that a possible solution?

Moritz Flüchter

Yes, as long as you have access to this reservation data, absolutely, I think so. TSN also has some built-in safeguards with which you can specify, for example: if a device transmits at a higher rate than it has announced, its traffic is dropped or given a lower priority. This ensures that the rest of the network does not suffer as a result.
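The safeguard described here corresponds to what TSN calls per-stream filtering and policing (IEEE 802.1Qci), which applies a meter to each stream. As a simplified sketch, the Python code below implements a single token-bucket meter that marks frames exceeding the announced rate; the rate and burst values are hypothetical, and real 802.1Qci flow meters (two-bucket bandwidth profiles) are more elaborate than this.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucketPolicer:
    rate_bps: float                                  # announced (reserved) rate in bit/s
    burst_bits: float                                # bucket depth: tolerated burst
    tokens: float = field(init=False)                # currently available credit in bits
    last_time: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.burst_bits  # start with a full bucket

    def check(self, time_s: float, frame_bytes: int) -> str:
        """Return 'forward' if the frame conforms, else 'drop' (a switch might demote it instead)."""
        elapsed = max(0.0, time_s - self.last_time)
        # Refill credit at the announced rate, never beyond the bucket depth.
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last_time = time_s
        bits = frame_bytes * 8
        if bits <= self.tokens:
            self.tokens -= bits
            return "forward"
        return "drop"  # non-conforming: the stream sends faster than it announced


# A stream that announced 1 Mbit/s but sends a 1500-byte frame every millisecond (12 Mbit/s).
policer = TokenBucketPolicer(rate_bps=1_000_000, burst_bits=12_000)
results = [policer.check(i * 0.001, 1500) for i in range(10)]
print(results.count("forward"), results.count("drop"))  # 1 9
```

Only the first frame fits into the initial burst allowance; afterwards the bucket refills at the announced rate, so the excess traffic is discarded before it can affect other streams.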

There are also various approaches to securing what has been signaled to the network. But what you just said would also be an idea for the Rhebo Controller: it sits in the network and takes a look: ‘Okay, an application is now signaling its transmission, and it is asking for something strange, for example real-time guarantees, even though it is only a data transfer.’ That should make you prick up your ears.
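The idea of flagging odd reservation requests can be sketched as a simple plausibility check. The Python code below assumes the monitor already keeps, per device, the traffic class it has historically observed (for example "bulk" for a data transfer) and the peak rate it has actually seen; a new request is flagged if it comes from an unknown device, asks for real-time treatment for a bulk flow, or requests far more bandwidth than observed. The labels, the `DeviceProfile` structure and the factor of 3 are hypothetical choices, not a description of how the Rhebo Controller implements this.

```python
from dataclasses import dataclass


@dataclass
class DeviceProfile:
    traffic_class: str        # hypothetical labels, e.g. "realtime", "voice", "bulk"
    observed_peak_bps: float  # highest rate previously seen from this device


@dataclass
class ReservationRequest:
    device: str
    wants_realtime: bool
    bandwidth_bps: float


def check_request(req: ReservationRequest, profiles: dict[str, DeviceProfile],
                  max_ratio: float = 3.0) -> list[str]:
    """Return reasons why a signaled reservation looks anomalous (empty list if plausible)."""
    profile = profiles.get(req.device)
    if profile is None:
        return ["unknown device is signaling a reservation"]
    findings = []
    if req.wants_realtime and profile.traffic_class == "bulk":
        findings.append("real-time guarantees requested for a bulk data transfer")
    if req.bandwidth_bps > max_ratio * profile.observed_peak_bps:
        findings.append(
            f"requested {req.bandwidth_bps:.0f} bit/s, more than {max_ratio}x "
            f"the observed peak of {profile.observed_peak_bps:.0f} bit/s"
        )
    return findings


profiles = {
    "sensor-12": DeviceProfile("realtime", 64_000),
    "nas-01": DeviceProfile("bulk", 20_000_000),
}
print(check_request(ReservationRequest("sensor-12", True, 50_000), profiles))  # []
# The '100 megabits for a little device' case from the interview gets flagged:
print(check_request(ReservationRequest("sensor-12", True, 100_000_000), profiles))
print(check_request(ReservationRequest("nas-01", True, 10_000_000), profiles))
```

Such a check does not replace authenticated signaling; it complements it by catching requests that are technically well-formed but implausible for the device that sends them.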

Klaus Mochalski

Yes, exactly. The Rhebo Controller also monitors the communication profiles of certain protocols to check that they follow regular behavior. And that could certainly be extended to this case. In any case, it's exciting that we also have a security perspective, or a security challenge, here, which of course you always have to keep an eye on. Perhaps I really need to talk to a colleague about this again. Maybe that would be content for another episode.

I was very pleased to have you here today, Moritz. It was a very exciting insight, even though we were talking about a topic other than security. But I think it's something that will have a major impact on industrial networks in particular over the next few years, which is why I think it's also a very exciting topic for our listeners. And security also plays an important role here. Thank you very much for being here. I enjoyed talking to you about it.

Moritz Flüchter

Thank you for having me.