Confidential Computing in Public Cloud

Confidential computing is the notion of carving out and strictly isolating certain compute capacity for running trusted workloads within untrusted computing environments. The idea is to create a special execution context (a.k.a. enclave) where code runs protected even from the OS kernel, with the guarantee that not even the root user can extract the secrets within the context or compromise its integrity.

A key use case for confidential computing is data and code security in public cloud environments. Confidential computing renders sensitive data and code opaque to public cloud providers, which otherwise have root access to the host OS running customers' virtual machines or containers.

Where does Confidential Computing fit into public cloud data security?

The shared responsibility model of the public cloud establishes that the cloud provider is responsible for the security of the cloud (meaning cloud infrastructure) and that the customer is responsible for security in the cloud (meaning application and data security). When it comes to securing data in untrusted environments, three phases of the data life cycle are considered – data at-rest, data in-transit, and data in-use.

Securing data at-rest and in-transit in public clouds is a well-established problem with standardized solutions. During such phases, once the data is protected (e.g. encrypted), it is treated opaquely as a binary blob since it is either passively stored or transmitted but not actually used in computations. The life cycle of protecting such data is usually well-defined and the flow of information tends to be only point-to-point or restricted to a silo. For example, protecting data in-transit through encryption embedded in TLS, IPsec or SSH communications lives only for the life of the transmission and is point-to-point. Likewise, data at-rest protection lives only while the data lives in the data store silo since the data is written as encrypted and read as decrypted.
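The write-encrypted/read-decrypted lifecycle described above can be sketched in a few lines of Python using only the standard library. The one-time-pad cipher and the in-memory key bundle are illustrative stand-ins of my own: a production system would use an authenticated cipher such as AES-GCM and hold the keys in a KMS, never beside the data.

```python
import hashlib
import hmac
import secrets

def protect_at_rest(plaintext: bytes) -> tuple[bytes, bytes, bytes]:
    """Write path: data is stored only as an opaque, authenticated blob."""
    enc_key = secrets.token_bytes(len(plaintext))   # one-time pad, never reused
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, enc_key))
    mac_key = secrets.token_bytes(32)
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext, tag, enc_key + mac_key       # keys belong in a KMS, not beside the data

def read_from_rest(ciphertext: bytes, tag: bytes, keys: bytes) -> bytes:
    """Read path: verify integrity, then decrypt back to plaintext."""
    enc_key, mac_key = keys[:-32], keys[-32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("stored blob failed integrity check")
    return bytes(c ^ k for c, k in zip(ciphertext, enc_key))
```

Note how the protection has a well-defined, silo-bound lifecycle: between the write and the read, the data store only ever sees the opaque ciphertext and tag.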

Data in-use protection, however, is typically a much harder problem. It is usually a multi-point problem where there are multiple producers and consumers and there is no fixed life cycle of such data protection. Further, data in-use can’t be converted into and processed as encrypted binary blobs, since the data needs to be actively computed over and must maintain its schema. There currently exist multiple ways to secure data in-use, some more mature than others. These include:

  • Data de-identification where select sensitive data is pseudonymized using fine-grained data protection techniques such as format-preserving tokenization or format-preserving encryption (FPE). Since the data maintains its format after protection, it does not break the application code and storage schema and can, therefore, be processed opaquely within untrusted environments.
  • Homomorphic encryption is a form of encryption that allows computation on ciphertexts, generating an encrypted result which, when decrypted, matches the result of the operations as if they had been performed on the plaintext. This has been a topic of research for the last several years and while there are some signs that the math is being optimized for a select few practical applications, it is far from being used as a general-purpose tool and the research continues.
  • Confidential computing introduces trusted execution environments (TEEs) to protect data while in-use. TEEs, also known as enclaves, are hardware or software implementations designed to execute applications in a protected environment that separates applications inside the TEE from the regular operating system and from other applications on the device.
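To make the first approach concrete, here is a minimal vault-based tokenizer sketch. The `Tokenizer` class and the 16-digit card-number example are assumptions of my own for illustration; real fine-grained data protection products use cryptographic FPE (e.g. the modes standardized in NIST SP 800-38G) rather than a random lookup table, but the key property is the same: the token has the same length and character class as the original, so schemas and code downstream are undisturbed.

```python
import secrets
import string

class Tokenizer:
    """Toy format-preserving tokenization: a 16-digit card number maps to
    another unique 16-digit token, so storage schemas and application code
    that expect that format keep working on the protected value."""

    def __init__(self):
        self._vault = {}    # token -> original (lives only in the trusted zone)
        self._issued = {}   # original -> token (makes tokenization idempotent)

    def tokenize(self, pan: str) -> str:
        if pan in self._issued:
            return self._issued[pan]
        while True:
            token = "".join(secrets.choice(string.digits) for _ in pan)
            if token not in self._vault:        # avoid token collisions
                break
        self._vault[token] = pan
        self._issued[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]
```

Untrusted systems can sort, join, and store the token freely; only the vault holder can reverse it.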
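The defining property of homomorphic encryption, computing on ciphertexts, can be demonstrated with a toy Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The tiny primes below are purely for illustration; real deployments use moduli of 2048 bits or more, and Paillier supports only addition, not the arbitrary computation that fully homomorphic schemes aim for.

```python
import math
import secrets

# Toy Paillier keypair; tiny primes for illustration (real keys use 2048-bit+ moduli)
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)        # private key
g = n + 1                           # standard choice of generator
mu = pow(lam, -1, n)                # valid precisely because g = n + 1

def encrypt(m: int) -> int:
    """E(m) = g^m * r^n mod n^2, for random r coprime to n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """D(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# The homomorphic property: multiplying ciphertexts adds the plaintexts
total = decrypt((encrypt(20) * encrypt(22)) % n2)   # 20 + 22 = 42
```

An untrusted server could compute the product of ciphertexts (and hence the encrypted sum) without ever seeing 20, 22, or 42 in the clear.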

Let’s look at confidential computing a bit closer and review how it is emerging in the public cloud computing world.

Trusted Execution Environment (TEE):

The TEE (a.k.a. enclave) technologies are being designed to enable new forms of isolation beyond the usual kernel/user-space separation, to reduce the possibility of a successful attack on application components and the data contained inside the TEE. Typically, application components are chosen to execute inside a TEE because they perform security-sensitive operations or operate on sensitive data. Only authorized code is permitted to run and to access data inside the TEE, so code and data are protected against viewing and modification from outside of the TEE.

As per terminology proposed by an IETF draft that aims to standardize the TEE architecture, an application component running inside a TEE is referred to as a Trusted Application (TA), while a normal application running in the regular operating system is referred to as an Untrusted Application (UA).

When applied to the public cloud, TEEs can offer several security assurances such as:

  • Safeguarding of customer sensitive data from malicious and insider threats while it’s in-use.
  • Protection and validation of the integrity of code running in the cloud.
  • Assurance that the customer’s data and code are opaque to the public cloud provider.
  • Cryptographic attestation that provides an unforgeable proof that enables a remote party to verify what has run inside the enclave even if they don’t have physical access to the machine.
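The attestation flow in the last bullet can be sketched as follows. In a real TEE, the hardware signs the enclave's measurement with an asymmetric key fused into the chip, and the remote party verifies the signature via the vendor's certificate chain; the shared HMAC key below is only a stand-in for that hardware root of trust, and all names are illustrative.

```python
import hashlib
import hmac

HW_ROOT_KEY = b"simulated-hardware-root-of-trust"   # in reality fused into the CPU

def measure(enclave_code: bytes) -> bytes:
    """The TEE hashes the code/data pages loaded into the enclave."""
    return hashlib.sha256(enclave_code).digest()

def quote(enclave_code: bytes, nonce: bytes) -> bytes:
    """Hardware signs (measurement || nonce); HMAC stands in for a real signature.
    The caller-supplied nonce guarantees freshness (no replayed quotes)."""
    return hmac.new(HW_ROOT_KEY, measure(enclave_code) + nonce, hashlib.sha256).digest()

def verify(expected_measurement: bytes, nonce: bytes, q: bytes) -> bool:
    """Remote party checks the quote against the measurement it expects to see."""
    expected = hmac.new(HW_ROOT_KEY, expected_measurement + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, q)
```

If even one byte of the enclave's code differs from what the remote party expects, the measurement, and hence the quote, no longer verifies.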

Hardware and Software TEEs:

Two types of TEE architectures have emerged recently, each with its own strengths and weaknesses:

  1. Hardware TEEs provide isolation in the microprocessor itself. Examples include Intel Software Guard Extensions (SGX), AMD Secure Encrypted Virtualization (SEV), and ARM TrustZone.
  2. Software TEEs provide isolation through additional software layers such as a hypervisor.

Confidential Computing announcements by Public Cloud providers

As microprocessor vendors have started offering TEEs and hypervisors have evolved, public cloud providers are starting to announce previews of their confidential computing services. For instance:

  • AWS announced a preview of Nitro Enclaves in December 2019 at AWS re:Invent. It allows customers to build TEEs within their Amazon EC2 instances. Nitro Enclaves uses the same Nitro Hypervisor technology that provides CPU and memory isolation for EC2 instances, and the service is integrated with AWS KMS. Being software (hypervisor) backed, Nitro Enclaves lets customers create enclaves with varying combinations of CPU cores and memory, so a trusted application can be given enough resources for its workload; such resource constraints are a serious limitation of CPU hardware-backed TEE technologies.
  • Microsoft Azure announced the public preview of Azure Confidential Computing and open-sourced the Open Enclave SDK in October 2018. It announced its DC-series VMs that run on Intel Xeon processors with SGX technology. Combined with the Open Enclave SDK, customers can build secure enclave-based applications on the underlying Intel SGX technology. Because hardware-backed implementations such as Intel SGX are constrained in memory and CPU, application architectures must be carefully split so that only the most sensitive code and data run inside the enclave.
  • Google announced and open-sourced the Asylo framework in May 2018. It is an enclave backend abstraction layer designed to run with any available backend, such as Intel SGX.
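As a concrete illustration of the Nitro Enclaves workflow described above, the commands below sketch how a trusted application might be packaged and launched with the `nitro-cli` tool on a parent EC2 instance; the image name and resource sizes are placeholders, and exact flags should be checked against the current AWS documentation.

```shell
# Package the trusted application (built as a container image) into an
# Enclave Image File (EIF)
nitro-cli build-enclave --docker-uri my-trusted-app:latest --output-file app.eif

# Launch it with a chosen slice of the parent instance's vCPUs and memory (MiB);
# this flexibility is what the hypervisor-backed approach buys over hardware TEEs
nitro-cli run-enclave --eif-path app.eif --cpu-count 2 --memory 4096

# Inspect the running enclave
nitro-cli describe-enclaves
```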

Dev Tooling

As the major public cloud providers announce TEE services in preview, various open-source TEE SDK projects are emerging, including Microsoft's Open Enclave SDK and Google's Asylo mentioned above.

We are yet to see which of these open-source SDK/API initiatives will see greater adoption. Ultimately, the hope is that the industry standardizes on one API so that application developers are not locked into a cloud provider's proprietary TEE API.

Conclusion

The availability (albeit in preview rather than GA) of confidential computing services in public cloud environments gives customers a valuable new tool to consider for securing sensitive data in-use. The technology is still in its early stages, and hardware- and software-backed TEEs each have their own advantages and disadvantages. It remains to be seen how the technology will become more practical and cloud-native, supporting elastic scaling and serverless workloads, and how its APIs and tooling will be standardized.

