Dealing with industrial field devices in IOT


Why is this important? In contrast to IOT scenarios in the consumer space, the industrial space comes with a large number of requirements and challenges that may not be obvious to people who have not worked in industry before. Connectivity, security, safety and lifecycle management requirements create high entry barriers for start-ups trying to establish themselves in the industrial IOT market. At the same time, these areas create opportunities for specialization and custom solutions. Let’s look at some of the main issues.

The following assumes that you have some devices in the field that emit, collect, analyze or transport data, and that you need to manage some piece of software on those devices to implement your IOT use case. This can be client software that talks to a cloud service, a container that runs your local analytics, protocol adapters that acquire data from other devices or software systems, or similar.

Internet Connection

It may sound strange to people outside the industry, but internet connectivity should never be assumed for devices in an industrial environment. This is a major difference from the consumer space, where you can assume that a household has an internet router and a DHCP server that hands out a valid IP address and a route to a public DNS server. In industrial applications this assumption will most probably be wrong, for several reasons. Some devices do not have an Ethernet port at all but communicate over field bus protocols (e.g. AS-i, CAN bus, PROFIBUS) and therefore cannot talk TCP/IP. In this case you need a device that acts as an intermediary, such as a PLC with Ethernet connectivity that obtains data points from the devices behind the field bus. Industrial equipment that does have Ethernet and IP connectivity is typically connected to a production network that is well isolated from the rest of the world. Even office networks of the same company may only have tightly secured access to production data, and in most facilities internet access of any kind is completely blocked for production networks. This protects production from attacks, data theft and other harm.

Security policies, especially in larger companies, are quite strict, and it is not unusual to find no DHCP or DNS server in these networks at all. Everything is hard-wired and statically configured to avoid accidental or deliberate impacts on production. Unfortunately, some of the most valuable data sits within these highly secured networks, so how do you get it out of there? This is where gateway devices come into play. Whether they run on dedicated hardware or on existing hardware (e.g. station control machines, network routers, engineering stations, …), they connect to the production network on one side and to an office network on the other. This allows network administrators to have only one device that routes traffic to the internet and the cloud, which is easier to govern and control than many devices in a production network that all need to talk to the internet.

There might also be a need to route traffic through proxies or to use other customer-specific security features for outbound connections. So, when you design software and devices that operate at shop-floor level, include functions to configure IP addresses, DNS, DHCP, proxies and other security measures to support your customers’ security policies. This means you have to bring configuration to the devices before they are able to reach any cloud server outside the local network. Also consider that not all environments allow the use of WLAN or 3G/4G wireless, whether due to their remoteness or to harsh electromagnetic conditions.
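To make this requirement a little more concrete, here is a minimal sketch of how a device might validate a static base configuration before applying it, assuming there is no DHCP. The `NetworkConfig` fields and the `validate` function are hypothetical illustrations, not part of any specific product; only Python's standard `ipaddress` and `urllib.parse` modules are used.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Interface
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class NetworkConfig:
    """Hypothetical base network configuration for a device without DHCP."""
    interface: str                      # address with prefix, e.g. "10.20.30.40/24"
    default_gateway: str
    dns_servers: List[str] = field(default_factory=list)
    proxy_url: Optional[str] = None     # e.g. "http://proxy.plant.local:3128"


def validate(cfg: NetworkConfig) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    try:
        iface = IPv4Interface(cfg.interface)
    except ValueError:
        return [f"invalid interface address: {cfg.interface}"]
    problems = []
    try:
        # The gateway must be reachable on the local subnet.
        if IPv4Address(cfg.default_gateway) not in iface.network:
            problems.append("default gateway is outside the device subnet")
    except ValueError:
        problems.append(f"invalid gateway: {cfg.default_gateway}")
    for dns in cfg.dns_servers:
        try:
            IPv4Address(dns)
        except ValueError:
            problems.append(f"invalid DNS server: {dns}")
    if cfg.proxy_url:
        parsed = urlparse(cfg.proxy_url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"unsupported proxy URL: {cfg.proxy_url}")
    return problems
```

Rejecting a gateway outside the device subnet early is cheap insurance: a typo entered during commissioning otherwise only shows up later as a device that never reaches the cloud.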

The use case that comes to mind first is often “data out”, so consider the upload bandwidth of the internet route you are using. But also be aware that data will flow back from the cloud system into your device, configuration data and software updates being the most obvious examples. Restrictive industrial enterprises may impose extensive security requirements on data flow control, i.e. on what leaves and enters the corporate network. They cannot afford to let your edge device undermine the network security of their production facilities and might ask for proof of, and control over, all data that is transmitted in either direction.
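One way to give security officers the “proof” mentioned above is an append-only audit trail of everything that crosses the device boundary. The sketch below is an assumption of how this could look, not a standard: `DataFlowAudit` and `FlowRecord` are hypothetical names, and a real device would persist and protect the log rather than keep it in memory.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FlowRecord:
    timestamp: float
    direction: str      # "outbound" (data out) or "inbound" (configuration, updates)
    destination: str    # remote endpoint the data went to or came from
    size_bytes: int
    sha256: str         # digest lets auditors match a record to the actual payload


class DataFlowAudit:
    """Hypothetical in-memory audit of every payload crossing the device boundary."""

    def __init__(self) -> None:
        self._records: List[FlowRecord] = []

    def record(self, direction: str, destination: str, payload: bytes) -> FlowRecord:
        if direction not in ("outbound", "inbound"):
            raise ValueError(f"unknown direction: {direction}")
        rec = FlowRecord(time.time(), direction, destination, len(payload),
                         hashlib.sha256(payload).hexdigest())
        self._records.append(rec)
        return rec

    def totals(self) -> Dict[str, int]:
        """Bytes transferred per direction, e.g. for a customer report."""
        out = {"outbound": 0, "inbound": 0}
        for rec in self._records:
            out[rec.direction] += rec.size_bytes
        return out

    def export_jsonl(self) -> str:
        """One JSON object per line, suitable for handing to a security officer."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self._records)
```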

Protocols, Data Models and Device Topology

Protocols matter in two directions. The first is the protocol used to offload production data into the cloud and to retrieve, for example, updates or new configurations from the cloud. There is a notion of being “firewall friendly”, which usually refers to communication via outbound HTTP or HTTPS (ports 80 or 443). Other protocols can be used as well, but expect discussions with your customer’s security officers about their firewall configuration when you do. There is no technical issue with this, but the governance of company security policies comes into play. We have seen customers with very specific demands on the protocols used to transport data from their on-premises production network into the cloud. Encryption is the most obvious ask, but some customers, for example, do not allow protocol switching (e.g. upgrading from HTTP to WebSocket on port 80) or want to restrict outbound traffic to specific destination IPs (which complicates DNS-based approaches, e.g. for load balancing). So, be clear about the impact of the protocols you use on your customers’ security controls. AWS IOT, for instance, uses MQTT as its main protocol to transfer data from an IOT device to the cloud-side message broker, but also supports HTTPS (and MQTT over WebSocket) to be more firewall friendly.
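If a customer restricts destinations and forbids protocol switching, the device software can enforce the same rules before it opens any connection, so that a misconfiguration fails loudly on the device instead of being silently dropped by the firewall. A minimal sketch, assuming the policy is delivered to the device as a set of allowed host/port pairs; `EgressPolicy` and `check_egress` are hypothetical, and `iot.example.com` is a placeholder endpoint.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple
from urllib.parse import urlparse

DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


@dataclass(frozen=True)
class EgressPolicy:
    """Hypothetical customer firewall rules mirrored on the device."""
    allowed_endpoints: FrozenSet[Tuple[str, int]]  # (host name or IP, port)
    require_tls: bool = True                       # reject plain http:// and ws://
    allow_protocol_upgrade: bool = False           # e.g. HTTP -> WebSocket


def check_egress(policy: EgressPolicy, url: str) -> None:
    """Raise PermissionError if connecting to `url` would violate the policy."""
    parsed = urlparse(url)
    if parsed.scheme not in DEFAULT_PORTS or not parsed.hostname:
        raise PermissionError(f"unsupported URL: {url}")
    if policy.require_tls and parsed.scheme in ("http", "ws"):
        raise PermissionError("unencrypted transport is not permitted")
    if parsed.scheme in ("ws", "wss") and not policy.allow_protocol_upgrade:
        raise PermissionError("protocol switching (WebSocket upgrade) is not permitted")
    port = parsed.port or DEFAULT_PORTS[parsed.scheme]
    if (parsed.hostname, port) not in policy.allowed_endpoints:
        raise PermissionError(f"{parsed.hostname}:{port} is not an allowed destination")
```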

The second direction concerns the protocols that are specific to the industry and the devices from which you want to obtain data. Examples are OPC-UA for the automation industry, BACnet for building automation or IEC 61850 for electrical substation control. There are hundreds of protocols, and you will need a mechanism to add new protocols and update existing ones over time. Whatever protocol you use to transmit data into the cloud, the data source may provide its data in other forms, specific to its own protocol. You therefore need to either transform the data to match the transport protocol or make sure that you can carry the source protocol inside the cloud transport protocol (transparent transport). Example: you can envelope OPC-UA messages inside an MQTT message that is transported via HTTPS. This is possible because HTTP and MQTT can carry an arbitrary payload in their message body, which is not true for all protocols. The key is to have a clear understanding of which part of your system terminates which protocol. In AWS IOT, for instance, HTTPS and MQTT are terminated by the AWS IOT Core message broker that is part of the AWS IOT service; from there you can forward the actual payload into downstream systems. From the viewpoint of AWS IOT, this payload is application specific. It is also important to understand that many industrial protocols define a data model along with a transport model. Using such a protocol to acquire data from a device or system in the field therefore requires a semantic understanding of the protocol to make use of the data, e.g. to transform it into another protocol.
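Transparent transport can be as simple as wrapping the source bytes in an envelope that records where they came from and how to decode them. The sketch below assumes a JSON envelope with a base64-encoded payload; the field names (`v`, `protocol`, `source`, `ts`, `payload`) and the `wrap`/`unwrap` helpers are hypothetical, and the cloud side still needs an OPC-UA-aware decoder to make semantic sense of the restored bytes.

```python
import base64
import json
import time
from typing import Any, Dict


def wrap(source_protocol: str, source_id: str, raw: bytes) -> bytes:
    """Envelope an opaque source-protocol payload (e.g. an encoded OPC-UA message)
    into a JSON document that can travel unchanged as an MQTT or HTTPS body."""
    envelope = {
        "v": 1,                                            # hypothetical envelope schema version
        "protocol": source_protocol,                       # tells the cloud side how to decode
        "source": source_id,                               # original data source, not the gateway
        "ts": time.time(),
        "payload": base64.b64encode(raw).decode("ascii"),  # binary-safe inside JSON
    }
    return json.dumps(envelope).encode("utf-8")


def unwrap(body: bytes) -> Dict[str, Any]:
    """Cloud-side counterpart: restore the original bytes for a protocol-aware decoder."""
    envelope = json.loads(body)
    envelope["payload"] = base64.b64decode(envelope["payload"])
    return envelope
```

Base64 inflates the payload by about a third, which matters on constrained uplinks; MQTT itself can carry raw binary, so the JSON wrapper is a convenience trade-off rather than a necessity.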

At this point it is also important to consider device topologies, specifically if you make use of gateway devices that collect data from many devices within a production network. As you can imagine, there might be multiple layers of gateways that collect data from devices in their reach and may aggregate some of the data (or send it straight through). This might include some protocol transformations on the way.

In the simplest setup, a device has its own sensors and talks directly to the cloud. In this case you have a 1:1 mapping between the data source and the device that transfers the data: the cloud receives messages directly from the device that is the source of the data, so your downstream processing knows how to interpret them. When you add a gateway, things change: you may need to know where the original data came from, so you need to understand the topology behind the gateway and find a concept to identify each data set and map it to the devices behind the gateway. The cloud service only sees the transport device (the gateway) as a connected entity. This gets more complex when you have gateways behind gateways. A clear logical separation of data-source devices and gateway devices is therefore advisable, even if a single device may have both roles at the same time. In the most advanced cases, device management features should be able to deal with hierarchies of gateways, intermediates and actual data-source devices, along with potential protocol and data model transformations on the way.
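The separation of data-source and gateway roles can be modeled as a simple tree in which every device points to its next hop towards the cloud. This is a minimal sketch under the assumption that each message names its original data source; `Topology`, `Node` and the role names are hypothetical and not taken from any device management product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    node_id: str
    roles: Set[str]                  # {"source"}, {"gateway"} or both
    parent: Optional[str] = None     # next hop towards the cloud; None = connected directly


@dataclass
class Topology:
    nodes: Dict[str, Node] = field(default_factory=dict)

    def add(self, node_id: str, roles: Set[str], parent: Optional[str] = None) -> None:
        if parent is not None and "gateway" not in self.nodes[parent].roles:
            raise ValueError(f"{parent} is not a gateway")
        self.nodes[node_id] = Node(node_id, set(roles), parent)

    def path_to_cloud(self, node_id: str) -> List[str]:
        """Chain of devices from the data source up to the cloud-connected gateway."""
        path: List[str] = []
        current: Optional[str] = node_id
        while current is not None:
            if current in path:
                raise ValueError("cycle in gateway hierarchy")
            path.append(current)
            current = self.nodes[current].parent
        return path

    def resolve(self, connected_id: str, source_id: str) -> Node:
        """Map a message received from `connected_id` to its original data source."""
        source = self.nodes[source_id]
        if "source" not in source.roles:
            raise ValueError(f"{source_id} is not a data source")
        if self.path_to_cloud(source_id)[-1] != connected_id:
            raise PermissionError(f"{source_id} is not behind {connected_id}")
        return source
```

`resolve` rejects a message whose claimed source does not sit behind the gateway that delivered it, which also limits what a compromised gateway can impersonate.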

Secure yet Cost-Efficient Device Onboarding

Device onboarding refers to the process of connecting a device (data source, intermediate or gateway) to your cloud backend for the first time. This process needs careful consideration, especially when your business model includes a ready-to-use device that you provide to your customers. But even if you only provide SW that is installed on existing HW, you need to take care of this process. Why? Because it can be tricky to get it right in terms of security and operational effectiveness. The latter essentially means keeping the cost of onboarding low by minimizing manual work in the field: field service engineers are expensive, and manual work does not scale well to a large number of devices.

As an example, say you want to equip 2000 motors in a facility with vibration analytics boxes that report their findings to a cloud-based service via HTTP/MQTT (e.g. via AWS IOT Core). Field service engineer hours are expensive, so how do you minimize the time these people need to install and configure the boxes, and keep the solution secure at the same time?

There are certainly options to discuss on the physical side (in the end, the device needs physical placement and wiring or batteries), but let’s focus on the SW side in two dimensions: 1) some base requirements and technical measures, and 2) the resulting onboarding process and options for it.

1) Some base requirements and technical measures

  • You need to be able to uniquely identify each and every device that connects to your backend. Connected devices need some management, and incoming data needs to be mapped to the right digital twin, tenant or shadow (whatever you call the virtual data container for this device). So, how do you create unique IDs in a secure way that avoids abuse? MAC addresses do not work well because they are easy to fake. Generated unique IDs may be an option; better still are hardware IDs that are baked into the device firmware. Consider how easy it is to fake or reproduce device IDs: one of the attack scenarios in IOT is to build an army of fake devices that your backend accepts, so that you either receive bad data or become the target of a DDoS attack. Good unique IDs are therefore hard to reproduce, duplicate or guess. All this is part of the device, so if you make your own devices, baking in HW IDs may already happen during manufacturing. There is good literature available for a deeper dive into this problem space.
  • Let’s assume you build IOT devices that act as IOT gateways and are sold with pre-configured SW. What if you manufacture a device with your current firmware version, but before the device is commissioned at the customer site you fix a severe bug in the firmware? How do you update the device without putting your backend at risk? You need a strategy for getting new firmware versions into the device before onboarding occurs. For example, you could run a separate web service for this purpose with lower restrictions on connecting devices (e.g. allowing older firmware versions to connect) and explain to the customer why they need to whitelist this web service as a destination in the corporate firewall. This would enable automatic updates via the internet as soon as the device has internet connectivity. Alternatively, you provide a mechanism to update the firmware manually (consider the cost of the field service engineer), e.g. via a USB port or a dedicated LAN-based commissioning tool.
  • In addition to this, you need to get a base configuration into the device to adjust basic IP settings like IP address, DNS server, default gateway and proxies if there is no DHCP present – assuming you use wired LAN or WLAN (if you equip all devices with 3G/4G and a SIM, you can skip this). You can do this with an integrated web server (like on home devices) which a field service engineer uses via LAN to type in this data. You could also think of providing a configuration file via FTP or USB-Stick, both also manual options requiring field engineer time. Or you have a commissioning tool that injects this data via LAN and a proprietary protocol (e.g. the device exposes a REST API via a local web server). Consider all open ports on a device as a security risk that people might want to use to break into the device. So, a local web server on the device is a valid option, but you need to secure it.
  • Most IOT cloud backend systems also require the device (or the device’s software) to present a valid client certificate that was provided by the backend beforehand. So, during onboarding, at least such a certificate may need to be generated on the cloud side and loaded into the device. Even more secure systems demand that an onboarding token is generated for a given device ID that allows onboarding only during a specific time frame, which makes malicious device onboarding harder. Here, a field service engineer is needed who generates the required data on the cloud side (or locally, registering it with the cloud at a later stage), brings it into the device and thereby links the device with the cloud backend in a meaningful and secure way.
  • One security measure is certificate expiry: IOT devices have to renew their certificates regularly (e.g. every 12 months) to limit the risk posed by old and stolen certificates. So, a concept needs to be in place to update the certificates. This goes hand in hand with the need to update configuration and firmware at later stages in the life of the device, not only for security reasons.
  • If you equip all devices with 3G/4G and a SIM, other things may need to be configured, e.g. activating the SIM card with the carrier or SIM provider. Note that telco companies provide several options to handle, for example, worldwide connectivity and mass activation of SIM cards. Since the process may differ from provider to provider, you need to check what the service engineers are required to do.
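The time-limited onboarding token described above can be sketched with an HMAC over the device ID and an expiry timestamp. This assumes the backend holds a secret key and that device IDs are 128-bit random values; `new_device_id`, `issue_token` and `verify_token` are hypothetical helpers, not the API of AWS IOT or any other service, which provide their own provisioning mechanisms.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional


def new_device_id() -> str:
    """128 random bits: hard to guess, unlike a MAC address or a serial counter."""
    return secrets.token_hex(16)


def _tag(key: bytes, device_id: str, expires: int) -> bytes:
    return hmac.new(key, f"{device_id}|{expires}".encode(), hashlib.sha256).digest()


def issue_token(key: bytes, device_id: str, valid_seconds: int,
                now: Optional[float] = None) -> str:
    """Cloud side: a token that allows onboarding `device_id` only until it expires."""
    expires = int((time.time() if now is None else now) + valid_seconds)
    tag = base64.urlsafe_b64encode(_tag(key, device_id, expires)).decode()
    return f"{expires}.{tag}"


def verify_token(key: bytes, device_id: str, token: str,
                 now: Optional[float] = None) -> bool:
    """Cloud side, at first connection: check the device binding and the time window."""
    try:
        expires_str, tag_b64 = token.split(".", 1)
        expires = int(expires_str)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:          # malformed token (binascii.Error is a ValueError too)
        return False
    if (time.time() if now is None else now) > expires:
        return False
    return hmac.compare_digest(tag, _tag(key, device_id, expires))
```

Because the token is bound to one device ID and expires, a leaked token is only useful for one device during a short window. The client certificate would still be issued in a separate step once the token has been accepted.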

So looking into this, you may now wonder how to design an onboarding process that provides a decent level of security and minimizes the effort of field service engineers to deal with certificates, onboarding tokens and firmware updates. Please note that not all enterprises and companies have field service engineers that are trained and experienced in SW management, SW engineering or cloud computing concepts. Still a large portion of these organizations are focused on mechanical and electrical tasks. So making it easy for them is a key quality in many industries. Reducing service engineer time may also play a key role in turning a business model from bad into good.

Here are some traditional approaches:

  • Provide an internal web server and pre-configure the device with a default IP address. When no DHCP is present, an engineer can use a local laptop to connect to the device and enter the base configuration. Pro: This opens room for more features around device onboarding, base configuration, error diagnosis and more, including REST interfaces that provide automation APIs. Con: You need to open a port on the device and secure it. Also, the field engineers need to learn yet another UI, which becomes cumbersome when every device vendor does it this way, given the variety of UIs that unfolds for them. They also need basic knowledge of setting up local LANs and changing IP addresses on their laptop. In addition, you would need to provide a web-based UI where these engineers can enter the device ID and retrieve certificates and onboarding tokens, maybe also firmware updates. This data must be downloaded to their laptop and uploaded from there to the device. More IT know-how is required, and the engineer needs to be physically present to attach the device to a local LAN with the laptop.
  • Use the USB port. You could provide the same web-based UI and download the data to a local laptop, but this time the service engineer writes the data to a USB stick and inserts it into the device. The device automatically pulls the data and configures itself. Pro: This is much faster and less error-prone for the service engineer, and you do not need to expose open ports on the device. Con: This still requires physical presence in front of the device.
  • You can provide a locally installed tool (on the service engineer’s laptop) that interacts with both the devices and the cloud backend to reduce manual steps for the service engineer. Here you can use other means for the base configuration of the devices, e.g. UDP multicast or other network protocols such as IGMP. With decent investment, you can make life quite easy for service engineers this way.
  • You implement a device landing zone concept, where you ask the service engineer to attach the device to a DHCP-enabled lab environment only for the sake of getting it connected to the internet (unless you have 3G/4G and an active SIM on board). This assumes that all customers can provide such an environment. In this environment, the device has free connectivity to the internet but no connection to the production network. The device contacts the cloud service to retrieve the latest firmware and then reports, with its unique ID, that it is ready for onboarding. The cloud system needs to be implemented so that it allows only limited functions and data sending for such devices until they are fully onboarded (e.g. by discarding all offloaded data). The service engineer can now find the device by its ID (which needs to be available somewhere) and configure the device online. All cloud-side generated information is downloaded into the device automatically. Once it is applied on the device, the device disconnects from the landing zone and is ready for placement in the production plant (including static IP address configuration if required). This can reduce the effort for service engineers to a simple “yes, I know this device, and I confirm it belongs to customer XYZ and is called <theDevice> in the cloud” process. As you can imagine, this is the hardest approach to implement in a truly secure way. The landing zone needs special treatment in terms of attack surface reduction, since it is exposed to DoS, cross-tenant access and other attacks while device-side security is less enforceable.
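The backend side of the landing zone concept boils down to a small state machine: a device announces itself, is told to update its firmware if it is outdated, and has its data discarded until an engineer claims it for a tenant. A minimal sketch, assuming firmware versions are comparable tuples; `Registry`, its states and the returned step names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]


class State(Enum):
    LANDING = "landing"        # device reached the landing zone, not yet claimed
    ONBOARDED = "onboarded"    # claimed by a service engineer for a tenant


class Registry:
    """Hypothetical backend registry implementing the landing-zone flow."""

    def __init__(self, min_firmware: Version) -> None:
        self.min_firmware = min_firmware
        self.state: Dict[str, State] = {}
        self.firmware: Dict[str, Version] = {}
        self.tenant: Dict[str, str] = {}
        self.name: Dict[str, str] = {}

    def announce(self, device_id: str, firmware: Version) -> str:
        """Called by the device from the landing zone; returns its next step."""
        self.state.setdefault(device_id, State.LANDING)
        self.firmware[device_id] = firmware
        if firmware < self.min_firmware:
            return "update_firmware"
        if self.state[device_id] is State.LANDING:
            return "wait_for_claim"
        return "ok"

    def claim(self, device_id: str, tenant: str, name: str) -> None:
        """Engineer confirms: this device belongs to `tenant` and is called `name`."""
        if self.state.get(device_id) is not State.LANDING:
            raise ValueError(f"{device_id} is not waiting in the landing zone")
        if self.firmware[device_id] < self.min_firmware:
            raise ValueError(f"{device_id} still needs a firmware update")
        self.state[device_id] = State.ONBOARDED
        self.tenant[device_id] = tenant
        self.name[device_id] = name

    def route(self, device_id: str) -> Optional[str]:
        """Tenant whose data store receives this device's data; None means discard."""
        if self.state.get(device_id) is not State.ONBOARDED:
            return None
        return self.tenant[device_id]
```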

Around these basic options, you will find corner cases that add complexity but are beyond the scope of this article, for example device replacement (device broken), device decommissioning, device reset for new commissioning, and so on.

Once the device has been onboarded successfully, the cloud backend may allow data inflow for it while making sure that the data is routed into the right data store or tenant (depending on whether you design a multi-tenant or single-tenant solution). From now on, you can also use over-the-air update mechanisms to deploy new configurations and firmware to the devices. But this also comes with some things to consider, as discussed in the next section.

Updates to Device Software, Configuration and Certificates

AWS IOT provides over-the-air updates for devices that are managed by AWS IOT Device Management. In addition, a concept called the device shadow allows you to synchronize configuration state between the cloud side and the actual devices. All of this plays into the operational aspects of dealing with industrial devices.
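The core idea of the device shadow is a comparison between the desired state set from the cloud and the state the device reports. AWS IOT computes this difference (the “delta”) on the service side; the sketch below is only an illustrative local reimplementation under the assumption of plain JSON-like dictionaries, and may not match every edge case of the real service (for example, how deleted or `null` attributes are treated). `shadow_delta` is a hypothetical helper.

```python
from typing import Any, Dict


def shadow_delta(desired: Dict[str, Any], reported: Dict[str, Any]) -> Dict[str, Any]:
    """Return the desired settings that the device has not (yet) reported back."""
    delta: Dict[str, Any] = {}
    for key, want in desired.items():
        have = reported.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            nested = shadow_delta(want, have)   # compare nested objects key by key
            if nested:
                delta[key] = nested
        elif want != have:
            delta[key] = want
    return delta


# Example: the cloud asks for encrypted OPC-UA sessions; the device has not applied it yet.
shadow = {"state": {
    "desired": {"poll_interval_s": 60,
                "opcua": {"endpoint": "opc.tcp://10.0.0.5:4840", "security": "SignAndEncrypt"}},
    "reported": {"poll_interval_s": 60,
                 "opcua": {"endpoint": "opc.tcp://10.0.0.5:4840", "security": "None"}},
}}
print(shadow_delta(shadow["state"]["desired"], shadow["state"]["reported"]))
# {'opcua': {'security': 'SignAndEncrypt'}}
```

Only the changed setting is sent to the device, which keeps traffic small and makes it easy to show the customer exactly what a configuration change touches.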

In the consumer space we are used to regular updates to our devices. Sometimes this can be frustrating, because an update download eats our bandwidth while we are streaming our favorite TV show. Or the update is applied without notification and we have to wait for the device to reboot, and go for a coffee in the meantime. Such things are annoying in private life; in the industrial area they can kill businesses. Industrial customers are very sensitive to changes in their production environment. They need to understand what is changed and why, when it is going to be applied and what the rollback strategy is. Zero-downtime updates and automated rollback in case of errors are typical requirements. Planning updates into service time windows (e.g. when production is stopped anyway) is also required. So there are hard operational requirements that need specific software designs and processes. You cannot assume that enterprises, especially larger ones, will accept SW updates and configuration changes that are uncontrolled from their point of view.
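Respecting service windows and rolling back automatically can be expressed as a small guard around the actual installation step. This sketch assumes weekly, same-day windows and injects `install`, `healthy` and `rollback` as callables; all names are hypothetical, and a real implementation would also need to survive a reboot halfway through an update.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, List


@dataclass(frozen=True)
class Window:
    weekday: int    # 0 = Monday ... 6 = Sunday, as returned by datetime.weekday()
    start: time
    end: time       # same-day window; an overnight window needs two entries


def in_window(now: datetime, windows: List[Window]) -> bool:
    return any(w.weekday == now.weekday() and w.start <= now.time() < w.end
               for w in windows)


def apply_update(now: datetime, windows: List[Window],
                 install: Callable[[], None], healthy: Callable[[], bool],
                 rollback: Callable[[], None]) -> str:
    """Install only inside an agreed window; roll back automatically if health checks fail."""
    if not in_window(now, windows):
        return "deferred"
    install()
    if healthy():
        return "applied"
    rollback()
    return "rolled_back"
```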

Who owns the devices you want to update? Where are they located? From a legal standpoint, if the devices are owned by the customer and installed at their premises, we are loading SW and configuration into their equipment. So you had better not break it or any system it talks to. For example, some customers may ask for a guarantee that new software will not negatively impact other production systems. Let’s assume your software pulls data from an OPC-UA server, and a new configuration now polls every second instead of every minute, overloading the OPC-UA server so that a SCADA system becomes noticeably slower for its end users. What can you do to make sure, with every update, that your SW does not negatively impact other systems? And what is your rollback strategy?
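One preventive measure for the OPC-UA example is to estimate the load a new configuration will put on the customer’s server and block the rollout if it exceeds an agreed budget. The sketch assumes one read request per tag and poll cycle (batched reads would be cheaper) and a budget agreed with the customer; `read_rate` and `check_rollout` are hypothetical.

```python
from typing import Dict, List


def read_rate(config: Dict[str, float]) -> float:
    """Read requests per second implied by a mapping of tag -> poll interval in seconds."""
    return sum(1.0 / interval for interval in config.values())


def check_rollout(old: Dict[str, float], new: Dict[str, float],
                  max_reads_per_s: float, max_increase: float = 2.0) -> List[str]:
    """Block a configuration change that could overload the customer's OPC-UA server."""
    if any(interval <= 0 for interval in new.values()):
        return ["poll intervals must be positive"]
    problems = []
    new_rate, old_rate = read_rate(new), read_rate(old)
    if new_rate > max_reads_per_s:
        problems.append(f"{new_rate:.2f} reads/s exceeds agreed budget of {max_reads_per_s}")
    if old_rate > 0 and new_rate > old_rate * max_increase:
        problems.append(f"load would grow {new_rate / old_rate:.0f}x in one step")
    return problems
```

With 120 tags, moving from a one-minute to a one-second poll interval raises the load from about 2 to 120 reads per second; the check catches this before the configuration ever reaches the device.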
Another aspect is export control. The country from which you distribute your software may differ from the country you deliver it to. Legally, this is an export with the respective consequences, so make sure that your software distribution solution takes care of the legal requirements.

So, besides the obviously required features, such as device certificate rotation (file download), device notifications for new firmware versions, software version and inventory management (which SW is active on which device) and a good process for large-scale SW rollouts into a worldwide distributed device landscape, you need management functions that deal with legal demands, maintenance scheduling, zero-downtime deployments and rollback strategies when serving industrial enterprises.
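Inventory management and large-scale rollouts can be combined: the inventory tells you which devices still run an old version, and the rollout proceeds in growing waves. A minimal sketch, assuming an inventory that maps device IDs to version strings and cumulative wave fractions; `plan_waves` and `version_report` are hypothetical. Choosing canary devices by sorted ID is a simplification; in practice you would pick representative sites and stop between waves if health checks fail.

```python
from collections import Counter
from typing import Dict, List


def plan_waves(inventory: Dict[str, str], target: str,
               fractions: List[float]) -> List[List[str]]:
    """Split devices not yet on `target` into waves; `fractions` are cumulative, canary first."""
    pending = sorted(d for d, version in inventory.items() if version != target)
    waves: List[List[str]] = []
    done = 0
    for fraction in fractions:
        upto = round(len(pending) * fraction)
        waves.append(pending[done:upto])
        done = max(done, upto)
    if done < len(pending):
        waves.append(pending[done:])     # anything the fractions did not cover
    return [w for w in waves if w]


def version_report(inventory: Dict[str, str]) -> Counter:
    """Which software version is active on how many devices."""
    return Counter(inventory.values())
```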

Protection Against Bad Devices

IOT comes with an entirely new set of attack scenarios and security challenges. We discussed some of those in the context of device onboarding already, but the topic of securing industrial IOT solutions is much broader. Here, we will only scratch the surface to raise some awareness. Hosting web applications is a security challenge of its own, and AWS provides several security whitepapers and best practices on how to secure your cloud-based workloads. With IOT, the devices become part of the security concept, which opens new attack surfaces that need to be considered in addition to the cloud-based security measures. Here is a list of headlines to get started:

  • How to ensure that device IDs and other identifiers that are used to securely connect devices and accept data are hard to clone, duplicate, guess and steal.
  • How to minimize the impact of compromised device software on cloud side and on field network side
  • How to ensure that software and configuration placed into the device (via the cloud) are not maliciously modified
  • How to store required secrets for cloud communication securely on the device
  • How to handle expiring certificates and other security tokens without causing devices to disconnect (and then require a costly manual reset)
  • How to identify anomalies in device behavior on the cloud side, so that devices behaving outside the expected range can be disconnected (e.g. devices that send too much or malformed data due to compromised device software)
  • How to protect the cloud against distributed denial of service attacks (DDoS) based on mass-cloned devices and e.g. generated device IDs
  • How to protect the customer data from the moment it touches the field device (e.g. gateway) until it comes to rest in the right place in your cloud system (at rest in the device, during transport, at rest at the cloud)
  • How to reset devices and wipe all data when a device is decommissioned
  • How to protect the customer’s production facility against malicious software on the device
  • How to enable secure bi-directional communication for device management across network and firewall boundaries
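As a starting point for the anomaly-detection and auto-blocking items above, a cloud-side guard can track message rates per device and count malformed payloads. The thresholds, the assumption that devices send JSON, and the `DeviceGuard` class itself are hypothetical; a production system would learn per-device baselines rather than use one fixed limit.

```python
import json
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class DeviceGuard:
    """Hypothetical cloud-side guard that blocks devices sending too much or malformed data."""

    def __init__(self, max_msgs: int, window_s: float, max_bad: int) -> None:
        self.max_msgs, self.window_s, self.max_bad = max_msgs, window_s, max_bad
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)  # message timestamps
        self.bad: Dict[str, int] = defaultdict(int)                 # malformed payload count
        self.blocked: Set[str] = set()

    def accept(self, device_id: str, payload: bytes, now: float) -> bool:
        if device_id in self.blocked:
            return False
        times = self.recent[device_id]
        times.append(now)
        while times and times[0] <= now - self.window_s:   # sliding time window
            times.popleft()
        if len(times) > self.max_msgs:
            self.blocked.add(device_id)                    # far above the expected rate
            return False
        try:
            json.loads(payload)                            # assumption: devices send JSON
        except ValueError:
            self.bad[device_id] += 1
            if self.bad[device_id] >= self.max_bad:
                self.blocked.add(device_id)
            return False
        return True

    def unblock(self, device_id: str) -> None:
        """Recovery path after the device has been inspected or re-flashed."""
        self.blocked.discard(device_id)
        self.recent.pop(device_id, None)
        self.bad.pop(device_id, None)
```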

Consider, for example, measures that protect the SW you distribute to thousands of devices from a single central repository. If that system or the SW is compromised, you distribute malicious code to a large number of customers, with massive business impact. So, thinking in terms of bad devices is a good guide here. Measures need to be in place that a) prevent devices from becoming bad, b) detect bad devices quickly, and c) disconnect and block bad devices automatically (plus some features to recover them).
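For measure a), a device should refuse any image that does not match an authenticated manifest or that would downgrade it to an older version. The sketch assumes the manifest (`version`, `size`, `sha256`) has already passed a signature check, e.g. with an asymmetric code-signing key whose public part is stored on the device; that step is not shown, and a digest alone does not help if an attacker can rewrite the manifest as well. `verify_image` and the manifest layout are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))


def verify_image(image: bytes, manifest: Dict[str, str], installed_version: str) -> None:
    """Raise ValueError unless `image` matches the (already authenticated) manifest."""
    # Anti-rollback: an attacker must not be able to push an older, vulnerable version.
    if parse_version(manifest["version"]) < parse_version(installed_version):
        raise ValueError("refusing downgrade to an older firmware version")
    if len(image) != int(manifest["size"]):
        raise ValueError("image size does not match manifest")
    digest = hashlib.sha256(image).hexdigest()
    if not hmac.compare_digest(digest, manifest["sha256"]):   # constant-time comparison
        raise ValueError("image digest does not match manifest")
```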


Categories: Internet of Things
