Generalizing Platform Security Hardening

One of my roles within Intel’s Internet of Things Group (IOTG)1 is developing products that demonstrate Intel’s hardware and software to get strategic partners excited about what they can do with them. Sometimes we sell these products outright; other times we offer them simply as a base platform, intended for integrators or partners to build upon.

As a software engineer, I experience this as a set of competing forces. Our system should work with a wide range of customer setups, but the more options we support, the more likely the system becomes a configuration nightmare, or simply too big to reason about.

This is in fact the core of software engineering: managing complexity.

This is especially true when balancing security and ease of use. We want a low-friction deployment that gets users to something useful right away. Meanwhile, we want to ensure our systems are secure by default, and that we minimize the consequences of someone running our code while ignoring our warnings and security recommendations. Furthermore, we need to ensure any security features we build can integrate with, or be replaced by, a partner’s existing authentication and authorization systems.

A Specific Example

I worked for some time on a “Smart City” demo platform. The system took in video streams from RTSP sources, analyzed them using OpenVINO and Intel’s AI accelerators, stored the annotated video and inference results, and presented them in a web UI with search and playback features. Our users were technical types, running on a Linux platform and familiar with docker, compose, and make, but not necessarily experts in any of the above. The goal was for the “getting started” guide to be as easy as “run a make command, add a camera URI, and view results”.

This system potentially dealt with sensitive information. There could be usernames and passwords in RTSP URIs; it made timestamped video recordings, potentially of individuals; and of course, it collected inference results with metadata that might be correlated to those individuals, including members of potentially vulnerable or protected classes. Considering the potential for harm, I wanted to ensure that access to this data was restricted in scope, duration, and availability, and that it was possible to audit access requests, including failures.

Using Vault

The setup described below was designed and implemented in 2021, so it is possible there have been material changes to HashiCorp and/or Vault since then that make some of the details obsolete. Nevertheless, the overall gist remains the same.

To accomplish the above, I started by integrating HashiCorp Vault into the project. It provided a convenient abstraction for secrets and authorization, independent of our services, so it could be replaced or augmented as desired.

While Vault is a great abstraction, it comes with some assembly required. I started by writing a Python script to handle initialization and startup. It united several best practices from Vault’s documentation to configure public key infrastructure, rotate root database credentials, and manage AppRole policies for the application’s services. Each time the application starts, the script runs: it initializes Vault if necessary, unseals it, configures rate limiting, and provisions the backends. Finally, it generates and saves a special service_watcher_token.
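A minimal sketch of that init-and-unseal flow, using the hvac client, might look like the following. The key-file path, share counts, and policy name are illustrative assumptions rather than the project’s actual values, and the PKI, database, and rate-limit provisioning is elided.

```python
# Hypothetical sketch of the startup script's init/unseal flow (hvac client).
import json
import pathlib

import hvac

KEY_FILE = pathlib.Path("/vault/keys/init.json")  # illustrative Docker volume path

client = hvac.Client(url="https://vault:8200")

if not client.sys.is_initialized():
    result = client.sys.initialize(secret_shares=5, secret_threshold=3)
    # The usability-over-security trade-off discussed below: persist the
    # unseal keys and root token so repeated restarts stay low-friction.
    KEY_FILE.write_text(json.dumps(result))
else:
    result = json.loads(KEY_FILE.read_text())

if client.sys.is_sealed():
    client.sys.submit_unseal_keys(result["keys"])

client.token = result["root_token"]

# ... provision PKI, the database backend, AppRole mounts, and rate
# limiting here ...

# Finally, mint the short-lived orchestration token.
watcher = client.auth.token.create(policies=["service-watcher"], ttl="96h")
print(watcher["auth"]["client_token"])
```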

I added a series of Make targets to start services. These served as a stand-in for a more typical Trusted Orchestrator. The targets used the service_watcher_token to create and inject secret-ids via a tmpfs for each service they started. Each secret-id was one-time-use and valid for only a handful of seconds, which meant a service could raise an audit alarm if it tried to use its credentials and found them already consumed or expired.
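Sketched in Python, one of those orchestration steps might look roughly like this. The mount, role, and tmpfs paths are illustrative, and the AppRole is assumed to have been provisioned at init time with secret_id_num_uses=1 and a secret_id_ttl of a few seconds.

```python
# Hypothetical sketch of the "trusted orchestrator" step behind a Make target.
import os
import pathlib

import hvac

client = hvac.Client(
    url="https://vault:8200",
    token=os.environ["SERVICE_WATCHER_TOKEN"],
)

# Because the role caps each secret-id at one use and a few seconds of
# validity, whatever we write here is worthless to a later reader.
resp = client.auth.approle.generate_secret_id(
    role_name="web-ui",
    mount_point="services",
)

# /run is a tmpfs on typical distros; compose mounts it into the service.
dest = pathlib.Path("/run/vault-ids/web-ui.secret-id")
dest.parent.mkdir(parents=True, exist_ok=True)
dest.write_text(resp["data"]["secret_id"])
```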

Vault’s internal data is encrypted at rest, and when Vault starts, it must be unsealed to decrypt the root key. A production-worthy deployment must take great care to secure the unseal keys, as anyone with access to them can gain full access to anything Vault protects.

Here is where we chose usability over security.

As the project’s focus was to demonstrate capabilities and provide a springboard for more sophisticated deployments, I expected users would stop and restart the application often. Since each restart had to unseal the Vault and generate initial service credentials, I chose to write the unseal keys to a Docker volume during initialization. My reasoning was that I expected the system to be used in a single-user environment, primarily for short-term evaluation, hopefully with low-stakes data.

Even though anyone with access to the keys or the service_watcher_token effectively held the keys to the kingdom, I took several steps to mitigate the potential impact. First, the service_watcher_token expired after 96 hours, after which users were essentially forced to read the security documentation, if they hadn’t already, in order to make any further use of the application or saved data. Next, I logged prominent warnings at every startup, using bright bold colors, and ensured the security hardening guide contained several clear statements of the danger and its consequences, along with information on how to improve the setup. Additionally, using the PKI setup, my colleague ensured data transfer outside the local network was encrypted in transit2.

Limited Data Access

The first time the database service starts, it creates a superuser, gives it a password, creates a database for that user, then runs my initialization scripts. These scripts create an unprivileged user to own the application database, initialize that database with the appropriate schema, then revoke the user’s LOGIN privilege and remove its password. Next, they configure a series of roles, each with a narrow scope of privileges based on the service that should use it.
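Compressed into a single sketch with psycopg2, and with illustrative table and role names, the initialization might look like this:

```python
# Hypothetical sketch of the database init scripts, expressed with psycopg2.
import psycopg2

# Bootstrap as the superuser created on the database's first start.
boot = psycopg2.connect("dbname=postgres user=postgres")
boot.autocommit = True  # CREATE DATABASE cannot run inside a transaction
with boot.cursor() as cur:
    cur.execute("CREATE ROLE app_owner LOGIN PASSWORD 'bootstrap-only'")
    cur.execute("CREATE DATABASE app OWNER app_owner")
boot.close()

# Lay out the schema, then strip the owner's ability to log in at all.
app = psycopg2.connect("dbname=app user=postgres")
with app, app.cursor() as cur:
    cur.execute(
        "CREATE TABLE inferences ("
        " id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,"
        " camera_id text, result jsonb)"
    )
    cur.execute("ALTER TABLE inferences OWNER TO app_owner")
    cur.execute("ALTER ROLE app_owner NOLOGIN PASSWORD NULL")
    # Narrow NOLOGIN roles; Vault later mints short-lived users that join them.
    cur.execute("CREATE ROLE inference_writer NOLOGIN")
    cur.execute("GRANT INSERT ON inferences TO inference_writer")
    cur.execute("CREATE ROLE ui_reader NOLOGIN")
    cur.execute("GRANT SELECT ON inferences TO ui_reader")
app.close()
```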

Intentionally, none of these roles are granted LOGIN rights. Instead, database credential management is delegated to Vault. The first time the setup script runs, it rotates the database’s root credentials, after which they are stored only in Vault; it rotates them again on every subsequent restart. I configured the database roles and service policies so that when a service requested a relevant database role, Vault would generate a username and password specific to it. These credentials expired after just 10 minutes, though the service could extend the lease for as long as it was running.
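With hvac, the shape of that configuration might look like the sketch below; the connection URL, role names, and creation statements are illustrative assumptions.

```python
# Hypothetical sketch of wiring Vault's database secrets engine to Postgres.
import os

import hvac

client = hvac.Client(url="https://vault:8200", token=os.environ["VAULT_TOKEN"])

# Point Vault at the database using the bootstrap superuser...
client.secrets.database.configure(
    name="app-db",
    plugin_name="postgresql-database-plugin",
    connection_url="postgresql://{{username}}:{{password}}@db:5432/app",
    allowed_roles=["ui-reader", "inference-writer"],
    username="postgres",
    password="bootstrap-password",
)
# ...then rotate, so the root credentials exist only inside Vault.
client.secrets.database.rotate_root_credentials(name="app-db")

# Each request against this role mints a fresh user that joins the NOLOGIN
# database role and evaporates after 10 minutes unless its lease is renewed.
client.secrets.database.create_role(
    name="ui-reader",
    db_name="app-db",
    creation_statements=[
        "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' "
        "VALID UNTIL '{{expiration}}' IN ROLE ui_reader",
    ],
    default_ttl="10m",
)

creds = client.secrets.database.generate_credentials(name="ui-reader")
username = creds["data"]["username"]
password = creds["data"]["password"]
lease_id = creds["lease_id"]  # a running service renews this lease periodically
```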

Splitting Roles

During setup, I generated two distinct AppRole backends. The first was named services, and as the name suggests, it managed one role per service. Each role’s policy permitted it to request only the database roles relevant to its service, limiting reads and writes to the narrow slice of data that service actually needed.
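A sketch of that split, again with hvac and with illustrative mount, policy, and role names:

```python
# Hypothetical sketch: two AppRole mounts, and one narrowly scoped service role.
import os

import hvac

client = hvac.Client(url="https://vault:8200", token=os.environ["VAULT_TOKEN"])

# One AppRole mount per trust domain.
client.sys.enable_auth_method(method_type="approle", path="services")
client.sys.enable_auth_method(method_type="approle", path="cameras")

# The web UI may only read credentials for its own database role.
client.sys.create_or_update_policy(
    name="web-ui",
    policy='path "database/creds/ui-reader" { capabilities = ["read"] }',
)
client.auth.approle.create_or_update_approle(
    role_name="web-ui",
    token_policies=["web-ui"],
    secret_id_num_uses=1,   # single-use secret-ids
    secret_id_ttl="15s",    # valid for a handful of seconds
    mount_point="services",
)
```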

The second AppRole backend was named cameras and was used like this: when the user configured a new camera, I had the camera-config-service generate a new AppRole for it with a policy that scoped its access to that camera’s ID. Each camera the user configured was associated with a “media-pipeline” service, so when the camera’s service started, its injected secret-id allowed access only to secrets associated with that camera_id. Importantly, the policy did not permit reading video or inference data from any camera, including the one associated with the service. Of course, anyone holding the RTSP configuration could view the incoming stream, but all historical data required separate credentials.
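What the camera-config-service did might be sketched like this; the secret path layout and naming scheme are illustrative assumptions:

```python
# Hypothetical sketch of per-camera AppRole provisioning.
import os

import hvac

client = hvac.Client(url="https://vault:8200", token=os.environ["VAULT_TOKEN"])

def provision_camera(camera_id: str) -> None:
    """Create a policy and AppRole scoped to a single camera's secrets."""
    name = f"camera-{camera_id}"
    # Read-only access to this camera's own secrets (e.g. its RTSP URI),
    # and pointedly no path granting historical video or inference data.
    client.sys.create_or_update_policy(
        name=name,
        policy=f'path "secret/data/cameras/{camera_id}" {{ capabilities = ["read"] }}',
    )
    client.auth.approle.create_or_update_approle(
        role_name=name,
        token_policies=[name],
        secret_id_num_uses=1,
        secret_id_ttl="15s",
        mount_point="cameras",
    )
```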

I wanted this granularity to stand out, since in a production deployment, these services would likely run closer to the hardware, and consequently in environments more difficult to physically secure. For related reasons, I made sure the service itself cached its buffered data to a tmpfs, not to disk3.
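For what it’s worth, keeping scratch data off disk can be as simple as pointing buffers at a tmpfs mount; /dev/shm below is just one common choice:

```python
# Minimal illustration: buffer scratch data on tmpfs so it never hits disk.
import tempfile

# /dev/shm is a tmpfs mount on typical Linux systems.
with tempfile.NamedTemporaryFile(dir="/dev/shm", prefix="pipeline-") as buf:
    buf.write(b"\x00" * 4096)  # stand-in for buffered frame data
    buf.flush()
```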

Security Is Everyone’s Responsibility

In every system like this, there are trade-offs. We had to strike a balance between our platform’s security and its usability. Part of that balance, for us, was providing a roadmap others could use as a catalyst for improvement.

While I enjoyed the technical challenge involved in developing this security solution, I later came to realize that the Security Hardening Guide itself may have been the more important thing I produced. The technical contributions laid the groundwork for a robust security framework, but security isn’t just ticking boxes on a checklist. It’s a mindset, and fostering that culture requires reflection and collaboration so we can continuously adapt our practices according to the nuances of each deployment scenario.


Footnotes

  1. IOTG was reorganized and renamed several times over the years, and as of 2024, the relevant organization is named Network and Edge (NEX).

  2. Ideally, we would have enforced that all data was encrypted at rest, but in this instance, we felt the balance favored recommendation over implementation. The ideal solution would have been full-drive encryption and, barring that, volume-level encryption, but neither was realistically within our application’s domain. We considered integrating database-level encryption, but unlike data in transit, the database’s data would only be as safe as its encryption keys, which faced the same risks as the data itself. Moreover, it would have made it more difficult to replace our security implementation, and that tight coupling is exactly what we wanted to avoid.

  3. It was still up to those handling the production deployment to ensure the processes ran in a way that would not page this data to disk. I included these details in the security hardening guide.