What is Zero Trust really (and does it work in OT)?

Klaus Mochalski invited cyber security expert Stefan Sebastian from Zscaler to talk about the paradigm shift Zero Trust entails. Stefan Sebastian explains how Zero Trust would have prevented the cyber attack on the Danish energy sector in May 2023 from the outset, how companies can implement this security concept step by step, and how they can get rid of the notoriously problematic VPN connections in OT.

 

 

 

Transcript

Klaus Mochalski

Hello and welcome to a new episode of the OT Security Made Simple podcast. I'm Klaus Mochalski, founder of Rhebo. My guest today is a very old acquaintance. We met probably about 15 years ago, if I'm not wrong - Stefan Sebastian. Stefan Sebastian is director of product management at Zscaler. And with this, I'm handing over to you, Stefan, for a quick introduction of yourself.

Stefan Sebastian

Hey there, Klaus. Good to be here on the podcast. Yeah. As you said, my name is Stefan Sebastian. I'm currently with Zscaler as the director of product management. But our time goes back to Rhebo and what we built there. So, yeah, looking forward to connecting on a few topics today, particularly around zero trust.

Klaus Mochalski

Yeah, absolutely. Great to have you here. During his time at Rhebo, Stefan was also heading the product and strategy department. So we have worked together quite closely on building the product and building our go-to-market strategy. But today, as you said, we want to talk about zero trust. Especially in OT, it's still kind of a novel concept that not many people are familiar with. And that's why I would like to ask you to give our listeners a quick introduction to what zero trust means on a very high level. We don't want to get too technical here. And also what this means specifically in OT environments.

Stefan Sebastian

Yeah, that can be a long answer or a longer answer, but we'll try to keep it brief. Actually, it reminds me of – in Germany, there was this thing we used to have – the bullshit bingo card. I know that this is probably not appropriate language for all mediums, but it’s that card of these marketing terms. And if you're in a meeting with too many marketing terms, you're allowed to go bingo on the person.

Klaus Mochalski

I remember that one.

Stefan Sebastian

Actually, I think it first started in Germany, if I'm not mistaken, but I think other people kind of know about that. And zero trust is one of those things that is on a lot of these bullshit bingo cards. And that's unfortunate, but it comes from a position where it actually means a lot of things. Because at its most fundamental, it means designing a network and access to resources on that network from a presumption of a breach, which means that: Okay, you've got all these services and you're connected via networking to those services and applications, but assume one of those ends, usually the users, is breached. Now, what do you do? When you change the paradigm to the presumption of breach, you're looking at everything from that angle, you know? How do you limit that user or machine or malicious process to as little as possible? How do you reduce the attack surface? How do you make it harder for the attacker or the insider to expand the scope and get more resources to attack? And how do you get more time for discovery and recovery? Those are all factors. But fundamentally, and that's still kind of at that marketing bingo level, it is about designing the network with this presumption of breach. Yeah. Anyway, that's kind of the high-level view. In practical terms, it actually means a lot of different things. And I'm sure we'll cover them as we go.

Klaus Mochalski

So, summarizing, it sounds to me like it's a bit of a whitelisting approach for access control, where you only allow the minimum set of access that you absolutely need. And not just on a personal level, like who is allowed to access, but also what does this access entail? So what applications are allowed, what targets are allowed through this access channel, but only allow the minimum set. Is this correct?

Stefan Sebastian

Yeah, I think that's a decent high-level view. In practical terms, that means that you view everything like… There's a standard on this from NIST. It's the SP 800-207 standard [Zero Trust Architecture]. The basis is you have resources and everything's a resource. And you have users on devices that are trying to access those. And you want those to kind of fit into a policy enforcement point. That means you only want to connect the users on these devices with specific resources. And that presumption, that design model, is first that they are authenticated users via multi-factor authentication. But those users can only connect to the resources, the applications that they need, that they're authorized for. And those applications themselves call out to the policy enforcement point and say, "I'm available".

And of course, the zero trust is that policy enforcement point that says "Klaus can, on this device, connect to this resource using this communication". And we're going to monitor that communication. And every session between Klaus and that resource will be examined for things like any sort of attack by Klaus or Klaus's machine on that resource, any sort of exfiltration of data from that resource. And it really means that you're changing the paradigm that the network implicitly trusts.

Like when you think about a VPN connecting. It’s very common in the OT world where you use VPN to make modifications to PLCs or access historian data or otherwise check your MES or whatever, that implicit trust is connected to the network. And now you have access and it becomes a chase. And this is why the existing security paradigm [of implicit trust] actually doesn't work and costs too much to get the risk level that you need. That you are connecting users as if they're networking points, their home computer, with your corporate network. And now you're chasing, you're saying Klaus can only have access to this application or whatever, and if he goes elsewhere, then I'm going to try to kill the flow. And it really is a very costly way of implementing things that zero trust fixes.

[With Zero Trust,] Klaus can only connect to this series of applications, and I'm going to monitor just those connections. So there are no strangers, there are no applications that are available to the Internet, there's no inbound connection, there's only outbound. And it really changes the cost paradigm. And it's something that is probably slow in coming to OT, just given the nature of so much trust in that. But yeah, it's one of those things that is really changing the IT landscape significantly and has a lot of legs and a lot of applications in OT.
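
[Illustration, not part of the conversation: a minimal Python sketch of the default-deny decision a policy enforcement point makes, as described above. The names `AccessRequest`, `ALLOWED` and `decide`, as well as the example users, devices and resources, are hypothetical and do not reflect any vendor's API.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    device_id: str
    resource: str
    mfa_passed: bool


# Hypothetical allowlist: only these exact (user, device, resource)
# combinations are authorized. Everything else is denied by default.
ALLOWED = {
    ("klaus", "laptop-17", "historian"),
    ("stefan", "laptop-42", "mes"),
}


def decide(request: AccessRequest, allowed=ALLOWED) -> bool:
    """Grant access only if MFA succeeded and the exact triple is allowlisted."""
    if not request.mfa_passed:
        return False
    return (request.user, request.device_id, request.resource) in allowed
```

In this sketch, Klaus on his known laptop reaches the historian only after multi-factor authentication, and the same request for the MES is refused because no rule names it. A real system would also inspect the content of each brokered session, which this sketch leaves out.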

Klaus Mochalski

Okay, so let's look at OT a bit. I think it's pretty clear how zero trust is intuitively an improvement in IT environments. So you talked about people getting access, trust levels, accessing devices and resources. And now let's look at how this applies to OT environments. Here, suddenly people and personal information are less important because the critical items are devices and systems talking to each other. Maybe being accessed by people, but usually they operate pretty much unattended by themselves. So how do you use the zero trust model? Or can you use the zero trust model to reduce the risk level in OT like you do in IT, where you have mostly these low-end devices that operate mostly by themselves?

Stefan Sebastian

Yeah, it probably starts from the top down a bit. And like the top down in the Purdue model it’s that at a certain point, there's a level in OT where it's not applicable today. And you're talking about real time communication between PLCs and controllers, for example. So if you set up that zero trust model, like you brokered all that communication, then you'd add some milliseconds of latency in that communication. What you would do is you walk up that stack a bit and say, “Well, okay, we're not going to change a working factory's paint system”, for example. So that controller will continue to touch the PLCs. But there's a lot of different users that then go into and program the controller that reach into other aspects higher up the Purdue network stack that are applicable to zero trust today. And that is anytime you have an HMI, anytime you have this access to a database, every time you have access to a ladder logic, it's usually from some sort of maintenance device reaching into that device.

And how zero trust would work in OT in that case is that you essentially broker that communication. So you have that Windows XP machine and you have Johnny Programmer coming to check the status. Well, first Johnny has to authenticate. Klaus or Stefan, we have to prove who we are on our mobile devices using multi-factor authentication. And then that device itself has got a fingerprint and we'll say, “Well ok, is that device in a posture that is considered secure enough for my network? Does the XP machine need to be upgraded or whatever? Or is there a certain risk associated with that?”
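
[Illustration, not part of the conversation: a minimal Python sketch of a device posture check like the one described above. The fields, the thresholds and the list of unsupported operating systems are assumptions chosen for the example. Disk encryption in particular is not mentioned in the conversation.]

```python
from dataclasses import dataclass

# Hypothetical policy values, chosen only for illustration.
UNSUPPORTED_OS = {"windows-xp"}
MAX_PATCH_AGE_DAYS = 30


@dataclass(frozen=True)
class DevicePosture:
    os_name: str
    patch_age_days: int
    disk_encrypted: bool


def posture_findings(device: DevicePosture) -> list:
    """Return the reasons why a device fails the posture policy (empty if none)."""
    findings = []
    if device.os_name.lower() in UNSUPPORTED_OS:
        findings.append("unsupported operating system")
    if device.patch_age_days > MAX_PATCH_AGE_DAYS:
        findings.append("patches out of date")
    if not device.disk_encrypted:
        findings.append("disk not encrypted")
    return findings


def posture_acceptable(device: DevicePosture) -> bool:
    """A device is acceptable only if the policy produces no findings."""
    return not posture_findings(device)
```

Returning the list of findings rather than a bare yes or no lets the broker tell the user what has to be fixed before access is granted.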

Klaus Mochalski

So summarizing, it's really a way of securing the perimeter. So you're basically regulating who gets access to a certain protected area and acknowledging that there's not a single perimeter, but that there are different shells of perimeters that are nested inside each other.

Stefan Sebastian

Yeah, but as long as you're not saying it's securing the perimeter, because it's not. It's actually securing that resource, that shell, that device, that controller, or that historian, and connecting just those users that need to see that historian.

Attacks on OT networks follow the same model of attacks anywhere. I mean, you're basically finding the attack surface. So finding the vulnerable point, you're compromising that point. Then you're moving laterally, like you're finding that the PLC is connected to the Internet, you're compromising that and taking it over. A PLC, the controller, is not much of a compromise, but say some of the other systems, then you're moving laterally onto those systems around it and saying, “What else can I exploit?” And then you're stealing data or exfiltrating data.

And the zero trust model would say, “Well, those resources and those endpoints need to communicate, need to see each other. And they can only see each other if there are these endpoints, if they're authenticated, if they're authorized there.” So it's not securing a perimeter, but it's securing users and devices that need to talk to each other. Resources as well are a factor there. So it's very much not perimeter centric. And I make that distinction because when you think of an inbound firewall rule, for example. Some of the biggest compromises are that these firewalls are VPN concentrators or they allow inbound connections. Well, anybody that can connect to that inbound, just basically the Internet, can exploit that firewall if it's unpatched and we see that quite a bit. And once it does the exploit, then it has access. It gets through that perimeter and now has access everywhere. And that's exactly the wrong model.

Better go and say, “Well, the firewall could be a resource, or the historian or the MES or the controller could be those accesses. And those don't have to be accessed by everyone on the Internet. It has to be accessed by Klaus, it has to be accessed by Stefan and nobody else.” And that means the attack surface is just Klaus and Stefan can attack that controller, that MES, that historian. And that becomes the attack surface, that becomes the vector, and it greatly reduces the attack surface because of it. But that becomes the way that you would conduct your exploit.

Klaus Mochalski

That sounds very powerful. Let's look at this from a very practical perspective. So you already mentioned how OT attacks usually work, and many of them work very similarly. And also, what we have to say about OT attacks is that they are still a very rare event. Quite often there are attacks against the IT infrastructure and then they spill over into the OT part. But really, targeted OT attacks are difficult to come by. 

But we just recently had a great publication by the Danish SektorCERT, which is the cybersecurity center for the critical infrastructure in Denmark. And [the Danish energy sector] suffered from an attack that happened in May 2023. They published a very detailed report on what happened during this. So I just want to use it as an example to see how this attack may or may not have been prevented if they had followed zero trust principles.

So what happened here is, well, first of all, it's interesting that they got all the information that they used for the publication by running an extended network of sensors. So they have deployed about 270 traffic monitoring sensors in the critical infrastructures. So basically at all the electric utility companies in Denmark. And so they had all the data distributed over the entire critical infrastructure to do a proper analysis.

And the attack itself, from what I understood from the report, was a rather simple one. So they were initially using a known vulnerability in a Zyxel firewall, which they could exploit by sending a malformed UDP packet to a certain port, which then later was used to download malicious firmware into these firewalls. And then these firewalls became part of the Mirai botnet, or of a variant of the Mirai botnet, I believe. And then they mounted attacks against certain targets on the Internet. And then there was a second wave of attack, which looked a bit more targeted and interesting, using unknown vulnerabilities, but generally it looked very standard, I would say. [Note: It was actually the other way around. The first wave was targeted, the second was more widespread.]

So there was nothing really new about this attack. What's new is that we have this great analysis. So let's look at what zero trust could have done. Could this have been avoided in the first place? Or how does zero trust fit in the picture here?

Stefan Sebastian

Yeah, you're right. The attack looked very standard here. I mean, they found this attack surface. There were 16 or so power stations or whatever that they specifically targeted [Note: initially it was 12]. They weren't on Shodan. I think you pointed that out. So there might have been some insider information there. But then, of course, they went and compromised those Zyxel firewalls. I mean, very low-end firewalls, but it doesn't really matter if they're low end or high end. They found the firewalls and compromised them. And you would think that the next step would have been a lateral move. So now they basically own the firewall, and they would move laterally. I think they got some usernames out of there, and maybe those can be used to search LDAP directories and find other targets that way. But basically, you want to move laterally and then move on a target, because the firewall, you wouldn't think, is the primary target. Owning a firewall and then using it in a botnet doesn't seem like a good use of resources, especially if it's this low-end Zyxel. It doesn't have a lot of processing power.

But the way that zero trust would look at this, I mean, that UDP packet, that UDP was basically the IKE [Internet Key Exchange] SA [Security Association] Handshake and the negotiating that goes there. So presumably, it's something like a VPN communication inbound. I mean, they have to allow this because the VPN communication can come in at any point. And that's probably the first spot where zero trust helps. That is that the whole world doesn't get access to your firewall. And then you challenge them. At that point, the endpoints that you know and recognize have to get challenged that they are who they say they are, and only at that point. So once they've authenticated, once they've provided multi-factors of identification, only then can they reach your resource. In this case, it's a firewall, but it could be your historian or your controller or whatever.

Klaus Mochalski

Just to be clear here. It all started with this UDP packet, which was sent through to the firewalls. And this would not have been possible if a proper zero trust model had been implemented. Is this correct?

Stefan Sebastian

Well, it would have only been possible without any other changes. This first step would have to be that the whole Internet can't reach my firewall. Only Klaus and Stefan and maybe the 200 other people that have access to reaching in and doing whatever in my critical infrastructure. So the first change is that I identify these people. Those people, I know who they are, who they say they are because they've had this multi-factor authentication and they're on devices that I trust, and those devices have a posture that's acceptable to me.

You know they're not XP devices and they haven't been compromised. Klaus or Stefan might not know their XP machine is compromised. Well, XP has a higher probability of compromise than, say, the more recent versions of Windows or Mac. So identifying the endpoint and knowing what your risk is at the endpoint is your first step. And that only means that 200 plus Klaus and Stefan, 202 people, can access that firewall without doing anything else. Those people, yes, can maybe craft that UDP packet to exploit the firewall because they're the only ones that can see [this firewall]. And so the next step of zero trust, as you go through the model, is what communication is possible and how do you validate that communication. Of course, the other steps of a proper zero trust system now validate what can be exchanged, in that they validate the content of that communication and the data that's exchanged, what's going in, what's going outbound.

That's really how the zero trust model would have addressed this. In a true zero trust, it wouldn't be a perimeter firewall. Which is the key thing, is that the Danish admins are using the firewall to log into. So whatever they're trying to communicate with would be an application that would advertise itself and would go through something like a firewall in the sense of, you know, that service advertises itself to that location and that location is then connected to by authenticated users on authenticated devices, and then that communication is then brokered.

Can these two endpoints even see each other? Are these endpoints who I think they are? Okay, they are.

Well, can they communicate this way with each other? Now I'm going to examine every transaction, every flow, every session that passes through, and make sure that they're doing what they're expected to be doing at that time and day. And only between these two points that I expected.

And you've removed the whole Internet population problem. You've reduced your attack surface from billions and billions to 202 people in this case. And that is kind of the first thing. Yes, the firewall would disappear, but there's still work to do. But zero trust changes that paradigm and says, no more implicit trust. I mean, connecting to a device and then having that device or application or service try to prevent an attack simply doesn't work. It doesn't work with the amount of money that you have to spend to get security risk under some form of control.

Klaus Mochalski

So it's great to hear that we finally can get rid of the firewall because we don't need it anymore, because there's nothing to block, because we only allow the communication and the users that we know need access to a certain resource and system. And so there's no room for a firewall anymore. So that's great.

Stefan Sebastian

I guess it takes a different shape. Most of your effort then is brokering communications and investigating the content of those communications. So you want to do things like proxy connections from the right side to the left side, so you can fully inspect files. 

For example, sandboxing has a role there where if you're trying to send a file to that endpoint or extract a file, then you want to make sure that there's no malware in that. You want to make sure that you can examine the content of what's being extracted so there's no personally identifiable information being extracted. That's the move on target aspect. And it’s what's missing from this [Danish energy] sector attack. It’s what their ultimate goal was. If their ultimate goal was to own a couple of Zyxel firewalls, they could have done that much cheaper. It was 12 locations with backup. So it's 32 Zyxel firewalls there, $400 apiece. I mean they could have owned those for far cheaper. I'm not sure we have the whole story because they usually move on that firewall so that they can access something else. But just assume it was moving to the firewall itself.

Yeah, you can control that. And in a zero trust world, only those that you want to give access to or that want to be challenged will have access to those firewalls.

Klaus Mochalski

Okay, so in a proper zero trust model, this initial UDP packet which came from somewhere in the Internet would not have been accepted by the firewall. It would only have been accepted from any of the authenticated users. And so any attacker still wanting to attack the infrastructure would have to go through these. And they are of course also part, presumably, of the zero trust infrastructure. So they would run into the same problem. So this beginning of the attack, this initial UDP transfer, would already not have been possible. And so I think this is very important for understanding how zero trust can help.

Stefan Sebastian

It is. Another point is that zero trust is a huge paradigm shift. But you mentioned that these sensors in the network are still important. I mean, there's still a role for detecting anomalies because what zero trust does is eliminate horizontal traffic. So all traffic is between the resource and the user. It's not between resources themselves or users themselves. So that vector is also gone. And if there is communication, you want an anomaly detection system to be able to say that's unusual communication between those two devices.

That actually ties into the Rhebo message. The Rhebo controller would monitor any sort of anomalous communication, but also anomalous connections to certain locations that haven't had that connection prior. And that's an important thing because in a zero trust model there's no horizontal, there's no lateral movement anymore, there's just an authenticated user reaching to a resource and that resource is reaching out. And these communications are brokered. So there's no left and right, no east west, no lateral movement. There's only outbound connections and anything else can be considered anomalous.
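
[Illustration, not part of the conversation: a minimal Python sketch of the idea above, namely that in a brokered, outbound-only design every other flow can be treated as anomalous. The `Flow` record and the host names are hypothetical. This is a simplification and not how the Rhebo product works. It also ignores the real-time controller-to-PLC traffic that was excluded from zero trust earlier in the conversation.]

```python
from typing import Iterable, List, NamedTuple, Set


class Flow(NamedTuple):
    src: str
    dst: str


def anomalous_flows(flows: Iterable[Flow], broker: str, known_hosts: Set[str]) -> List[Flow]:
    """Flag every flow that is not an outbound connection from a known host to the broker."""
    return [
        flow
        for flow in flows
        if not (flow.src in known_hosts and flow.dst == broker)
    ]
```

With `known_hosts = {"historian", "hmi-1"}` and the broker named `"zt-broker"`, a flow from `hmi-1` to `historian` is flagged as lateral movement, while `historian` connecting outbound to `zt-broker` passes.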

Klaus Mochalski

Okay. It's of course good to hear that there's still room for OT monitoring solutions like the ones Rhebo builds, but that shouldn't be the point here. So this I think provides a very interesting perspective on how zero trust can help in OT infrastructures.

What advice would you give a company? And let's use as an example one of the utilities that were targets of the attack. So the Danish utility market is spread across a few small to medium-sized players. So they probably don't have big CERTs [Computer Emergency Response Teams]. That's also why they use the service of SektorCERT. So a medium-sized utility, if they wanted to start their own zero trust journey, where should they start? What would be the first useful step? Like one, two, three steps that you would advise them to take?

Stefan Sebastian

Yeah, that's a great question. I mean, probably the first and maybe even second steps are basically to understand how the paradigm has shifted. So understand what zero trust means. And hopefully I've given something of an introduction here. Maybe it was too complicated, but there is an egghead version of this and we've gone through it here.

So understand what zero trust actually means and start to develop a strategy around zero trust. Now that you know what it means, what does it start to mean to your organization? And those are probably the first and second kind of stepwise approaches. But then you're left with something actionable and that is, “Well, what am I going to move on first?“ 

And this is the great part about zero trust, or the way that zero trust is implemented by folks like my company, which I don't necessarily want to talk about. But the way Zscaler implements it allows you to implement this in a stepwise fashion. So you can just choose where to start. For example, in this case, from what I know about the general attack and how the breach actually happened, you probably want to start with things like: How can I replace the problem of VPNs, of networks connecting to networks, and chasing that problem and always rushing through the patch cycle?

And the second wave of attacks in the [Danish energy] sector was this unknown vulnerability. The first wave was known vulnerabilities. So they should have patched. Well, there's a bunch of patches that they couldn't have applied because they didn't know about the vulnerability. Well, how do you then eliminate the attack surface? Probably the best place to start in this case would be: How do I eliminate VPNs as a way of accessing certain things?

And then they'll look back at the strategy, the first two steps, and say, “Well, there's a certain class of administrators that need to connect remotely to access this”, and start mapping those applications, those resources that need to be accessed, and what users need to access those, and start by bringing those under a zero trust umbrella. And that will be a small step, but that will be a huge step where you don't get breaches like [the one from the SektorCERT case]. Where you don't get somebody in Russia or the Middle East who can connect to your firewall. Is there any use case that makes that sensible? That then goes away from consideration. It's only these authenticated users, these contractors that I like or these employees that I trust, who can connect to these devices.

So this stepwise implementation of zero trust, particularly around VPNs in this case, would probably be the recommended path on how I would start the first steps in a zero trust journey. And it doesn't mean throwing out those Zyxel [firewalls]. It just means that it looks very different, that those Zyxels no longer accept inbound communication and the internal devices use the Zyxels to go outbound to a zero trust cloud. And everybody gets brokered through that cloud. Everybody's outbound and there's no horizontal traffic.

And then they could do other steps, start saying: Okay, well, there's no horizontal movement. But there are going to be some cases of firewalls that you want firewalling or secure outbound to the Internet, connections that those contractors or those employees may want to do from home, so you need some sort of policies you want to apply to those. But the first step is maybe to secure that communication, that eliminates those VPNs so they don't have this problem repeatedly. [So the attackers] don't have the billion different possibilities for attacks that can reach in and compromise their devices.
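
[Illustration, not part of the conversation: a minimal Python sketch of the mapping step described above, which groups observed remote sessions into a list of which users actually need which resource. The input format, plain `(user, resource)` pairs taken from something like VPN session logs, is an assumption.]

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_access_map(sessions: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn observed (user, resource) pairs into resource -> sorted list of users."""
    access = defaultdict(set)
    for user, resource in sessions:
        access[resource].add(user)
    # Sort both levels so the result is stable and easy to review by hand.
    return {resource: sorted(users) for resource, users in sorted(access.items())}
```

The resulting map is only a starting point for review. Each entry still has to be confirmed by whoever owns the resource before it becomes an access policy.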

Klaus Mochalski

I think this is indeed very interesting and very applicable advice for our OT listeners. So, summarizing, if I wanted to do this in, say, three steps, the first step is to familiarize yourself with the concept. Then the second step is to start looking at your own VPN infrastructure and use this as a starting point. And then in the third step, go the rest of the, let's call it, zero trust journey by looking at all the other areas.

But I think VPN is a particularly good starting point because also from our engagement with customers, we've always seen that VPNs are a significant concern because everybody started to use them also in critical areas of the infrastructure years ago, and everyone has the problem of securing them. So that's a very good starting point and I think as a takeaway to our listeners today, it's just the perfect advice.

Of course, always familiarize yourself with the concept first. It's not too difficult, as you tried to explain. And then look at your own VPN access as an example and use this as a laboratory environment to start your own zero trust journey.

Stefan Sebastian

Yeah, absolutely. And of course, zero trust only makes sense if it's going to save you money to get the risk level that you want. So when you do that VPN replacement, challenge your vendor and say, "Is this going to be cheaper than this VPN concentration stuff, considering what the cost of this breach was?" And so it should be simpler and easier in whatever approach you take. The VPN replacement needs to overcome that bar. Like, it shouldn't just be another tool that doesn't get deployed or only half deployed. Challenge it to be as good or better than what you have and the risk level to be much [lower] than what you're mitigating today.

Klaus Mochalski

I think that's a great closing remark, because we always strive to reduce risk levels by doing this and at the same time saving money or reducing cost. It's always a great way to approach it.

Stefan Sebastian

Absolutely.

Klaus Mochalski

Thank you very much, Stefan, for this interesting discussion. It was a pleasure having you on this podcast and maybe see you next time.

Stefan Sebastian

Yeah, looking forward to it. Thanks.