
How to deal with the complexity of the OT security challenge?

This episode welcomes Alexander Harsch, Head of the Cybersecurity Consulting Practice of E.ON's grid companies, who runs the Cyberrange-e training center. We reflect on how OT security awareness has changed over the last four years, why AI is not the general key to success, how to reduce the complexity of the OT security challenge, and what electric utilities perceive as the biggest cyber threats today.

 

 

 


Transcript

Klaus Mochalski

Hello and welcome to a new episode of the OT Security Made Simple podcast. I'm sitting here today with Alexander Harsch of Cyberrange-e.

Alexander Harsch

Hi, Klaus.

Klaus Mochalski

We are sitting at E-World in Essen, which is one of the main trade shows here in Germany for the utility and energy community. Before we start and discuss the topic of OT Security Made Simple, a few words of introduction from you, Alex.

Alexander Harsch

Thank you very much, Klaus. I'm Alexander Harsch. I work for E.ON [a multinational energy utility company headquartered in Essen, Germany]. I am the Head of the Cybersecurity Consulting Practice of E.ON's grid companies. Cybersecurity in the grid is my top topic. I'm also accountable for the Cyberrange-e, which is a training center for cybersecurity at E.ON. 

Klaus Mochalski

This is also how we got to know each other. Rhebo is one of the vendors providing its solution [for training] at the Cyberrange-e. Can you tell our listeners a little bit more about what Cyberrange-e is and what you're doing there?

Alexander Harsch

With the German IT Security Act, all grid operators had to become more secure, had to reach a higher level of maturity in cybersecurity. They put a lot of effort into both preventive and reactive security. But we observed that they were more successful with preventive security: risk management, patch management, change management. All of these things went really well.

Yet the reactive part – analyzing log files, noticing things that are not good, that are not the normal state – was not at a mature level, to put it that way. People were browsing the event logs without actually looking, so they never detected anything, and they wouldn't know how to react. That was the point where we said we want to improve how they operate. We thought long and hard about how to actually accomplish that.

We decided that doing it in practice is the right approach. The Cyberrange-e is an incident response simulation platform where people can come, work with the systems, work within day-to-day operational life, get attacked, and have to react as they would in the real world.

Klaus Mochalski

Okay. It's a cyber training ground where you have the typical red team / blue team setup, and where you exercise real attacks and how to counter them.

Alexander Harsch

That's exactly it. Red team, blue team activity.

Klaus Mochalski

You described some of the challenges that you saw initially. How would you describe the awareness of the groups of people coming in to take the training? Do you think that they understand the problem? Is the awareness high? Has it been growing over the past years?

Alexander Harsch

There's definitely a lot of movement there. I remember back in 2019, basically four years ago, people didn't understand why we were making all that effort. They considered the squirrel or the bulldozer the biggest threat to the grid. Today we do not see that at all. Everybody is talking about cyber threats.

Awareness is still a very important part of it, but now it is more about how an attack could actually happen. Everybody knows an engineering workstation could be a potential attack vector, but they do not know how. Awareness today is roughly at the same level as in the IT world: don't click the link. That is about what people can do at the engineering workstation today.

Klaus Mochalski

That's certainly a good development that we're seeing here. In preparation for this podcast, we spoke a little bit about the complexity of the task. The topic of this podcast is OT security made simple, which sounds simple, but we both know it's not.

Can you, with your Cyberrange-e operations, contribute to making OT security easier? Because we all know that once you start to understand a complex problem, it tends to become more complex, not easier. What is your observation here?

Alexander Harsch

Well, you're right. Most people who come and face the situation for the first time say there is far more complexity in it than they expected. But of course, once you have done it, you understand exactly what happened, what worked well and what did not. You can prepare your reporting tools. You can maybe choose a response tool or a documentation tool that better suits your needs. At that moment, it becomes simpler, easier to handle, more familiar. I would say complexity goes up, but you can bring it back down to a level you can handle.

Klaus Mochalski

What's the secret to reducing complexity? Is it routine? Is it proper workflows? Is it the right tool? Is it a mix of all of this? If you had to sort this by priority, what would be your key advice to make this whole problem seem less complex to the responsible people?

Alexander Harsch

Well, surprisingly, today I would say it is getting people together and letting them build their network. That's what facilitates things. Who will I be talking to [in case of an incident]? Who will be approaching me? What type of questions will they ask? What do they want to know? What are the tools, what are the interfaces they will use? Really bringing people together, making them work together, solving a task together – that really helps, and that's what really works.

Klaus Mochalski

Sometimes the people's network is more important than the communications network.

Alexander Harsch

Right. People are a big part. But then, of course, you have to repeat it. You have to do it again and again and again, learn the tools and work with them. That's the second part.

Klaus Mochalski

That's something we certainly observe in our service engagements with customers: establishing routine really helps. If you treat a cyber incident like a one-off event, it is of course a big challenge – it's always a firefight. But if you practice it, it doesn't catch you by surprise, and you know exactly what to do, whom to call, which buttons to push. That makes it much less exciting, but also much more efficient. That's certainly something we should strive for.

If you look at some recent training sessions, what were some typical exercises that you ran your teams through? What's a typical, let's say, cyber attack exercise that you usually run?

Alexander Harsch

Well, I wouldn't say typical, because everybody has a different idea of what it would look like.

Klaus Mochalski

Does it mean that the teams bring in their own ideas? There's not a set curriculum?

Alexander Harsch

We typically do not start with "what's your idea?". We would rather ask them what scares them most. A couple of years ago, people said the things that happened in Ukraine – somebody breaking into your network with spear phishing and then moving laterally through your organization. But that has changed over the last few years.

Today – we already spoke about the engineering workstation – people are going into the field with the engineering workstation. They have it in the car, they go to a restaurant, they have it with them. It's directly attached to the OT, so people are worried about that. What if somebody breaks into my substation [via the remote workstation]? That's a threat vector. What if my service operators or my third parties are compromised in some way? It changes. Everybody has a particular threat in the back of their head, which they tell us about. And we typically have a scenario that exploits exactly that type of risk, and then that's what we do.

Klaus Mochalski

Probably also ransomware, because that's something we hear about a lot, even though we didn't use to see it as a typical OT attack. Ransomware attacks are certainly not targeted at OT, but from a quantitative perspective they seem to be the most relevant attacks right now. How do you handle them? Do you also train for ransomware attacks?

Alexander Harsch

Yes, we can practice ransomware attacks. I fully agree with what you say. I never considered ransomware a threat to OT because, well, it's a bit of a risk. If you only want money, targeting a critical infrastructure operator might not be the best idea for an attacker. On the other hand, I talked to someone from the authorities in the US, and they said: we expect this to happen, because if you encrypt something in the OT, you will definitely get a lot of money. So there are different views on that. We did practice it once.

But I would rather look at the worst-case scenario: somebody turning off the lights. Of course, you could encrypt things, and that could lead to a situation where you later do not have certain files that you need, and maybe that causes a service interruption. But there is a far bigger impact if an attacker sends commands into the field, opens circuit breakers, and so on. That is an impact. That is what could actually happen. Many communication lines – the landlines – are not encrypted, and the protocols themselves are not robust in terms of cybersecurity.
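To make this concrete: field protocols such as IEC 60870-5-104 carry switching commands without authentication or encryption, so anyone who can reach the network segment can issue them – and a passive monitor can read them off the wire just as easily. The sketch below is purely illustrative (not a full parser, and not any specific product's detection logic): it flags control-direction commands arriving from hosts that are not authorized control stations, using the standard's type IDs 45 (single command) and 46 (double command).

```python
# Illustrative sketch: flag IEC 60870-5-104 control commands from unexpected hosts.
# Assumes the raw APDU bytes have already been extracted from the TCP stream.
CONTROL_TYPE_IDS = {
    45: "C_SC_NA_1 (single command)",
    46: "C_DC_NA_1 (double command)",
}

def inspect_apdu(apdu: bytes, src_ip: str, allowed_sources: set) -> str | None:
    """Return a warning if a host outside the allow-list sends a control command."""
    if len(apdu) < 7 or apdu[0] != 0x68:    # every IEC 104 APDU starts with 0x68
        return None
    if apdu[2] & 0x01:                      # S- and U-format frames carry no ASDU
        return None
    type_id = apdu[6]                       # first ASDU octet: type identification
    if type_id in CONTROL_TYPE_IDS and src_ip not in allowed_sources:
        return (f"{src_ip} sent {CONTROL_TYPE_IDS[type_id]} "
                f"but is not an authorized control station")
    return None

# Example: a single-command ASDU arriving from a host that is not the control center
sample = bytes([0x68, 0x0E, 0x00, 0x00, 0x00, 0x00,      # start, length, I-format control field
                45, 0x01, 0x06, 0x00, 0x01, 0x00,        # type 45, VSQ, cause=activation, CA
                0x00, 0x00, 0x00, 0x01])                  # information object address, command state
print(inspect_apdu(sample, "192.0.2.50", allowed_sources={"10.0.0.10"}))
```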

Klaus Mochalski

You're really training for, let's call it, a real OT incident, where a human attacker uses knowledge about the devices that are typically used in substations, for instance, pokes around, issues commands, and tries to bring down the infrastructure.

Alexander Harsch

Absolutely. I think grid operators today are so aware and so good at security that a script kiddie probably could not compromise or breach the OT in any way. But a person who has knowledge about the protocols and the devices in the field – who can maybe buy them and play around with them in a lab – certainly could.

Klaus Mochalski

One of the challenges we hear about from many companies that use different types of tools is the expertise needed to operate these tools and the time spent operating them. This applies especially to tools like intrusion detection systems, which are mandated by the German IT Security Act. They usually need significant human interaction. You can't do everything with artificial intelligence. There's always a human decision involved.

Many customers are worried about [too many] incident notifications or anomaly notifications. Is this something you practice during your training sessions? A situation where a seemingly simple attack floods your monitoring infrastructure, and buried in this huge amount of low-level indications and notifications is the one event you described before, which may trigger a problem in the substation. How do you solve this problem, and what do your teams take away? What's the effort? How much do they have to invest?

Alexander Harsch

Good question. What I know from many CIRTs [Computer Incident Response Teams] – even highly mature CIRTs with many analysts – is that they come to the SIEM [Security Information & Event Management] or monitoring solution and have dozens, maybe hundreds, of alarms. That is not uncommon. I would rather say it's very common.

What happens to the people who come into the office every morning and see 100 open alarms? There has to be some mechanism where the analyst sees the alarm, learns whether it is a genuine alarm or a false positive, and then feeds that back into the system. In a conventional SIEM, bringing intelligence back into the system means modifying the rules and adjusting them to your environment.

That requires a good understanding of the type of alarm and of the rule language used to describe it. You will definitely need to do that. The very nice part about self-learning or intelligent systems, of course, is that you just tell them: this is good and this is not. You do not have to provide feedback in a very complex way, but in a very natural, human way. Feedback is necessary no matter what type of system you're using. But of course, with AI [artificial intelligence] it's much easier.
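As an illustration of the feedback loop described here – with a deliberately simplified, hypothetical data model rather than any specific SIEM or monitoring product – the sketch below shows how analyst verdicts ("this behaviour is normal in our environment") can be remembered, so that matching alarms no longer clutter the next morning's queue.

```python
# Minimal sketch of an analyst feedback loop for alarm triage (hypothetical model).
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    source: str        # e.g. engineering workstation IP
    destination: str   # e.g. RTU / substation gateway IP
    event_type: str    # e.g. "new_connection", "firmware_update"

class TriageQueue:
    def __init__(self) -> None:
        self._known_good: set = set()

    def _key(self, alert: Alert):
        return (alert.source, alert.destination, alert.event_type)

    def mark_benign(self, alert: Alert) -> None:
        """Analyst verdict: this behaviour is expected in our environment."""
        self._known_good.add(self._key(alert))

    def triage(self, alerts: list) -> list:
        """Return only the alerts that still need a human look."""
        return [a for a in alerts if self._key(a) not in self._known_good]

if __name__ == "__main__":
    q = TriageQueue()
    backup = Alert("10.0.1.5", "10.0.2.9", "new_connection")
    q.mark_benign(backup)  # nightly backup job, confirmed once by an analyst
    morning = [backup, Alert("10.0.1.77", "10.0.2.9", "new_connection")]
    print(q.triage(morning))  # only the unknown source remains in the queue
```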

Klaus Mochalski

AI is a good point here. How much do you personally trust AI algorithms to differentiate important from unimportant information in the events reported by a system? We know that in IT there is a lot of data, and you need to rely on some AI mechanism to prioritize that huge amount of data. In OT, we observe that the volume is not nearly as large.

Alexander Harsch

Well, personally I am a very conservative person in that sense. What I really like is to understand the type of alarm and then be able to explain why it did or did not trigger for a specific attack. But it's the same question as: what is better, a screwdriver or a hammer? If you have a nail, the hammer is most definitely the better tool. Having artificial intelligence analyze the same problem in a complementary way, I think, is just perfect. You have your alarms, you detect the things you want to detect, and then you have a second system that will identify things you maybe haven't even thought about.

Klaus Mochalski

Basically, you get the AI verdict as an additional piece of information that helps you justify or qualify what you're observing, while still having the chance to look at the original data.

Alexander Harsch

Right. I would want to do that. I would love to have both opportunities.

Klaus Mochalski

It's probably a similar approach to AI in medicine, where everybody wants to make sure that no decisions about a person's health are ever made by an algorithm alone, but that the algorithm provides additional data to help a doctor – a real doctor – make a better analysis. It's probably the same here.

We already spoke about the alerts these monitoring systems generate, and about CIRTs, SIEM systems, and security operations centers (SOCs) being used. There has been a debate for quite some time about the proper setup for, let's call it, a security dashboard – a SOC or a SIEM, if we look at the tactical systems. Many people say that the ideal setup would be a fully integrated OT and IT SOC with a central dashboard correlating all the security-relevant data coming from both the IT side and the OT side.

From a theoretical perspective, this makes a lot of sense, because many OT incidents start in the IT or public network and quite often spill over. In practice, we've seen very few attacks targeted at OT. Usually, they are really IT attacks that spill over through poor network segmentation or via the maintenance laptop you described before.

Do you see this as an ideal scenario for today, and what is necessary to get there? I know that many of these projects are currently underway, and many companies and organizations are struggling with the complexity of this endeavor.

Alexander Harsch

I mean, those big incidents have happened – BlackEnergy, CrashOverride, and also Ukraine last year – and it was always IT compromise with lateral movement into OT. Having an integrated CIRT gives you exactly this visibility: there's an incoming connection from IT to OT. If you have an integrated CIRT, you can put it into context. Who is it that's coming in? Has anything suspicious jumped from IT to OT before?

I fully agree that having an integrated CIRT is something you want. Yet if you look at OT itself, there are many buckets, different departments, and putting an umbrella over them is already complicated enough. When we develop software, new solutions, new tools, we always talk about being agile and having a minimum viable product. I would say: why not go the same way with the CIRT? Build your CIRT for the OT, maybe build it only for the process data network first. Then make it bigger, make it better, align your processes, align the data you collect. Maybe bring it together and integrate it in the future. That might be a good path to follow.

Klaus Mochalski

It's an interesting approach. Maybe this is an important takeaway from today's discussion: you shouldn't try to solve all the problems at once, because then you are usually up against huge complexity. Instead, take a step-by-step approach, if I understand you correctly.

Look at the OT first and build your OT security dashboard, whatever that may look like – whether it incorporates a SIEM or not, whether you set up a separate SOC as an operational team or not. Only when this is working properly in the OT domain should you think about interfacing it with IT data and aligning the workflows between the IT and OT portions, the IT and OT SOCs. And only then, maybe as a further follow-up step, should you really start integrating these operations technically. Is this correct?

Alexander Harsch

That's exactly what I said. To be honest, it's also the way we follow at E.ON. People may have a vision of what it's going to look like in the future, but today they are working to get monitoring and logging in the OT up and running. That's what they have to focus on, and that's what they are really being successful with. Once they have it running, they can tackle the next issues.

Klaus Mochalski

Okay, that's great. Let's take this as the final key takeaway for this episode. I really like it: take it step by step, make it as simple as possible but as complex as necessary – and not more – and then take the next step once you have a mature operation in your OT environment.

Thank you, Alex, for the great discussion. I really enjoyed talking with you. See you next time.

Alexander Harsch

My pleasure, Klaus. Thanks.