Attestation remains a cornerstone of modern computing because our machines now run models and data too valuable to trust on faith alone.
As Artificial Intelligence (AI) systems grow increasingly complex, distributed, and attractive to threat actors, the ability to prove hardware authenticity, validate the state of software, and surface trustworthy signals to customers is crucial. As a result, attestation has become the foundation for transparency, vulnerability recovery and secure operations at the scale today’s AI infrastructure demands.
What is attestation?
Imagine a device with a unique key pair built into it, accompanied by a vendor-issued certificate that vouches for the key’s authenticity and so establishes trust in the device. The device keeps the key pair secure and uses it to sign claims: statements about the things the device believes to be true. The device then sends both the certificate and the signed claims to a verifier, which checks the attested claims against a policy to decide whether the attestation passes or fails.
The verifier also needs to have privileges that it grants or withholds based on the attestation result. This part is essential: if the verifier simply consumes attestations and logs them in a place nobody checks, it will have no effect on system security. Yet if real enforcement is achieved, attestation proves hardware authenticity, enables large-scale vulnerability recovery, and supports greater transparency – pivotal capabilities as AI systems handle increasingly sensitive and valuable data.
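The flow described above (sign claims, verify them against a policy, then grant or withhold a privilege) can be sketched in a few lines of Python. This is a minimal illustration rather than a real attestation stack: an HMAC over a shared secret stands in for the device's asymmetric signature and vendor certificate, and the claim names, policy values, and `release_workload_key` function are all hypothetical.

```python
import hashlib
import hmac
import json

# ASSUMPTION: a shared secret plus HMAC stands in for the device's private key
# and the vendor-certified public key. Real attestation uses asymmetric signatures.
DEVICE_KEY = b"hypothetical-device-key"


def sign_claims(claims: dict, key: bytes) -> dict:
    """Device side: serialize the claims canonically and sign them."""
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}


def verify_attestation(attestation: dict, key: bytes, policy: dict) -> bool:
    """Verifier side: check the signature, then every claim the policy names."""
    payload = json.dumps(attestation["claims"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, attestation["signature"]):
        return False  # claims were altered or signed with the wrong key
    claims = attestation["claims"]
    return all(claims.get(name) in allowed for name, allowed in policy.items())


def release_workload_key(attestation: dict, key: bytes, policy: dict):
    """Enforcement: the privilege (a hypothetical workload key) is withheld on failure."""
    if verify_attestation(attestation, key, policy):
        return b"hypothetical-workload-key"
    return None


# Hypothetical policy: each claim must take one of the allowed values.
POLICY = {"firmware_version": {"1.4.2"}, "secure_boot": {True}}

good = sign_claims({"firmware_version": "1.4.2", "secure_boot": True}, DEVICE_KEY)
stale = sign_claims({"firmware_version": "1.3.0", "secure_boot": True}, DEVICE_KEY)
# An attacker edits the claims after signing; the signature no longer matches.
tampered = {
    "claims": {**stale["claims"], "firmware_version": "1.4.2"},
    "signature": stale["signature"],
}

for name, att in [("good", good), ("stale", stale), ("tampered", tampered)]:
    granted = release_workload_key(att, DEVICE_KEY, POLICY) is not None
    print(name, "granted" if granted else "withheld")
```

The point of the last function is the one made above: the verification result only matters because something of value is gated on it.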
At the same time, a complex supply chain is creating opportunities for counterfeit components to slip in. Attestation can help raise the bar for device security by making it harder for attackers to introduce fake hardware that might otherwise be accepted as legitimate.
How does attestation aid vulnerability recovery?
Software will always contain bugs. As systems grow more complex, the number of bugs inevitably grows with them, meaning the only effective strategy is a continuous cycle of shipping the best code available, finding bugs that were missed, fixing them, and then verifying that the fixes have worked.
In data center environments – and especially when the vulnerability itself could cause the device to lie about its current state – attestation can help carry out the appropriate checks and mitigate issues.
How does attestation enable greater transparency?
In the age of AI, customers no longer accept that a cloud operator alone should be responsible for securing hardware. Instead, they now expect continuous, remote proof that the machines running their sensitive workloads are genuine and behaving correctly. This underscores customers’ desire for ongoing transparency, shared assurance and verifiable evidence that the underlying hardware meets the appropriate security policy at all times.
Achieving this through attestation depends on three pillars: the vendor endorsing each device’s identity before shipping, the device being able to produce signed claims in a standardized format, and the verifier checking those claims against policy to determine what the device is allowed to do. This creates a chain of trust from manufacturing to runtime.
Attestation within the cloud
Each stage of this process requires significant work, but the verifier is where operators and customers meet. A cloud verifier’s policy effectively has two halves – one focused on hardware, the other on software.
Hardware trust depends on questions like whether the device was manufactured by an approved vendor, observed correctly in the supply chain, or installed in the expected data center. Software trust depends on whether code came from an approved repository, and is the exact version intended for the machine it’s found in. These signals make sense for cloud operators because they control vendors, supply chains, and deployment pipelines.
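As a rough sketch of how the two halves of such a policy might be evaluated, the Python below checks a device record against a hardware policy and a software policy and returns the reasons for any failure. The field names, vendor and repository identifiers, and the `evaluate` function are assumptions made for illustration, not a description of any operator's actual verifier.

```python
from dataclasses import dataclass


@dataclass
class HardwarePolicy:
    approved_vendors: set
    expected_datacenter: str


@dataclass
class SoftwarePolicy:
    approved_repositories: set
    intended_versions: dict  # machine_id -> software version intended for it


def evaluate(device: dict, hw: HardwarePolicy, sw: SoftwarePolicy) -> list:
    """Return a list of failure reasons; an empty list means the device passes."""
    failures = []
    # Hardware half: vendor, supply chain, and location.
    if device["vendor"] not in hw.approved_vendors:
        failures.append("vendor not approved")
    if not device["supply_chain_observed"]:
        failures.append("not observed correctly in the supply chain")
    if device["datacenter"] != hw.expected_datacenter:
        failures.append("unexpected data center")
    # Software half: provenance and exact intended version.
    if device["repository"] not in sw.approved_repositories:
        failures.append("code not from an approved repository")
    if sw.intended_versions.get(device["machine_id"]) != device["software_version"]:
        failures.append("software version not intended for this machine")
    return failures


# Hypothetical policy values and device record.
hw_policy = HardwarePolicy({"vendor-a"}, "dc-east-1")
sw_policy = SoftwarePolicy({"repo.example/firmware"}, {"m-001": "2.7.1"})
device = {
    "machine_id": "m-001",
    "vendor": "vendor-a",
    "supply_chain_observed": True,
    "datacenter": "dc-east-1",
    "repository": "repo.example/firmware",
    "software_version": "2.7.1",
}

print(evaluate(device, hw_policy, sw_policy))
```

Returning reasons rather than a bare pass or fail is a design choice for the sketch: it keeps the operator-side signals available for later decisions about what, if anything, to expose to customers.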
Yet customers and end users don’t have much visibility into these processes, so the challenge is turning these internal trust signals into something demonstrable, meaningful and consumable.
How the attestation process provides assurance
This assurance can be provided by exposing a small subset of signals: confirmation that the device comes from a trusted vendor, verification of the hardware serial number, and proof that the machine is running the software version intended for it. Customers can also independently verify the vendor’s certificate authority. Together, these fields form a customer-facing attestation policy that provides real value: customers can trust that the device was built by a trusted entity, and in some cases that the vendor and cloud operator are separate, which reduces the likelihood of collusion during an attack.
However, the process has become more nuanced as cloud providers increasingly manufacture their own silicon. Today, verifying the serial number reassures the customer that the specific physical machine they were promised is the one actually running their workload, and this approach turns internal cloud trust signals into externally meaningful transparency for customers desiring continuous verification.
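One way to picture a customer-facing policy is as a projection of the operator's internal record onto the few fields a customer can meaningfully check. The sketch below assumes three such fields (vendor, serial number, software version); the field names, record contents, and both functions are hypothetical.

```python
# ASSUMPTION: these are the only fields exposed to customers in this sketch.
CUSTOMER_FIELDS = ("vendor", "serial_number", "software_version")


def customer_view(internal_record: dict) -> dict:
    """Project the operator's internal record onto the customer-facing fields."""
    return {name: internal_record[name] for name in CUSTOMER_FIELDS}


def customer_accepts(view: dict, trusted_vendors: set,
                     promised_serial: str, intended_version: str) -> bool:
    """Customer-side check: right vendor, the promised machine, the intended software."""
    return (
        view["vendor"] in trusted_vendors
        and view["serial_number"] == promised_serial
        and view["software_version"] == intended_version
    )


# Hypothetical internal record; the last two fields stay internal to the operator.
internal = {
    "vendor": "vendor-a",
    "serial_number": "SN-0042",
    "software_version": "2.7.1",
    "datacenter": "dc-east-1",
    "supply_chain_observed": True,
}

view = customer_view(internal)
print(view)
print(customer_accepts(view, {"vendor-a"}, "SN-0042", "2.7.1"))
```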
This process shows whether the cloud operator has done its due diligence. Customers can trust the hardware if the cloud operator has accurately reported which device should be running their workloads. However, in stronger threat models, such as a nation state compelling a cloud provider to deploy compromised builds, an opaque software hash provides no meaningful assurance. At the same time, a single bad software push can instantly make an entire fleet unsafe, regardless of the hardware’s trustworthiness. That’s another reason customers desire transparency: they need a way to audit and validate the software image itself.
Scalable audits
Auditing every piece of software is impossible, so the only workable approach is to reduce the number of audits required and to increase scrutiny of the software itself. On the first path, vendors can undergo a single vetted audit that all cloud customers can rely on, amortizing the cost across the entire ecosystem.
The other path is broader scrutiny. Publishing deployed binaries in an open repository means anyone can inspect them, backed up by a transparency log to prevent retroactive tampering. Together, these approaches make reviewing software a more scalable endeavor, giving customers meaningful visibility into what’s running on their machines.
Why is this important?
Well, a software hash only becomes meaningful once tied to something real like an audit, so customers can see whether the code was reviewed or publicly available. While a transparency log doesn’t guarantee anyone actually inspected the software, it does prevent an operator from quietly deploying a malicious build and hiding it.
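To see why a transparency log makes quiet substitution hard, consider the simplified sketch below. It uses a plain hash chain as a stand-in for the Merkle-tree structures that production transparency logs typically use, and the class, function, and build contents are all illustrative assumptions. An auditor who remembers an earlier log head can detect any later rewrite of history.

```python
import hashlib

EMPTY_HEAD = hashlib.sha256(b"").hexdigest()


def chain(head: str, digest: str) -> str:
    """Fold one entry digest into the running log head."""
    return hashlib.sha256((head + digest).encode()).hexdigest()


class TransparencyLog:
    """Append-only log of binary digests (hash chain; a simplification)."""

    def __init__(self):
        self.entries = []        # digests of published binaries, in order
        self.head = EMPTY_HEAD   # commitment to the whole history so far

    def append(self, binary: bytes) -> str:
        digest = hashlib.sha256(binary).hexdigest()
        self.entries.append(digest)
        self.head = chain(self.head, digest)
        return self.head

    def contains(self, binary: bytes) -> bool:
        return hashlib.sha256(binary).hexdigest() in self.entries


def recompute_head(entries: list) -> str:
    """Auditor side: rebuild the head from the entries the log now presents."""
    head = EMPTY_HEAD
    for digest in entries:
        head = chain(head, digest)
    return head


log = TransparencyLog()
log.append(b"build-1 (hypothetical binary contents)")
saved_head = log.append(b"build-2 (hypothetical binary contents)")

# An honest log reproduces the head the auditor saved earlier.
print(recompute_head(log.entries) == saved_head)

# Retroactive tampering: swap the first entry for a different build.
rewritten = list(log.entries)
rewritten[0] = hashlib.sha256(b"malicious build").hexdigest()
print(recompute_head(rewritten) == saved_head)
```

As the text notes, this does not prove anyone reviewed a build; it only ensures that whatever was deployed stays visible and cannot be swapped out afterwards without detection.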
Confidential compute
Modern machines have become so complex that this process has become harder, which is why the concept of confidential compute is growing in popularity. By encrypting data in use and authenticating the environment through attestation, confidential Virtual Machines (VMs) shrink the trusted computing base, meaning sensitive workloads can run in an enclave where the host Operating System (OS) and Basic Input/Output System (BIOS) can’t read or tamper with them. But when considering AI, Machine Learning (ML) workloads don’t stay on the Central Processing Unit (CPU) – they hand data to accelerators – so protection must extend across that path.
Thankfully, new accelerators are supporting key compute features such as the Security Protocol and Data Model (SPDM), allowing encrypted transport that terminates inside the confidential VM rather than in the host OS. This finally removes the host OS from the trust boundary for both CPU and accelerator interactions.
Unfortunately, side-channel attacks remain a real risk, as research shows a malicious host can still infer or influence enclave behavior through subtle leakage. Depending on the customer’s threat model, the mitigations for these attacks may need to be part of the trusted computing base again. Some customers are comfortable relying on the cloud provider’s internal, non-transparent hardening, while others require full transparency, including the measurement, attestation, and auditing of the software that implements these protections.
The economics of attestation
These processes are crucial because AI supercomputers are incredibly expensive and time-consuming to build. A verifier’s decision will effectively determine whether a machine gets scrapped, a financially unacceptable step for modern ML hardware. Scale raises the stakes further: when racks of machines are deployed, some will inevitably fail, and one bad attestation signal can wrongly condemn hardware worth millions. It’s fair to say that getting attestation right is now a top-tier operational priority.
ML supercomputers which contain hundreds of racks and thousands of TPUs have almost no slack, and need almost every component operational for the system to function at all. If a cluster of such size sees widespread failures, the entire computer will sit idle, burning money and producing no useful work. Attestation reliability is the difference maker here.
The goal is a world where nearly all attestation failures reflect genuine problems worth investigating, not just bugs in the attestation machinery itself. If only a small fraction of failures are spurious, then it becomes defensible to scrap machines that fail attestation, because the signal they produce can be trusted.
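A little arithmetic makes the reliability argument concrete. The sketch below assumes that each healthy device independently has some small probability of a spurious attestation failure and that genuine problems are always caught; the device count and all rates are illustrative assumptions, not measurements.

```python
def all_pass_probability(devices: int, spurious_rate: float) -> float:
    """Chance that every healthy device in the cluster passes attestation,
    assuming independent spurious failures at the given per-device rate."""
    return (1.0 - spurious_rate) ** devices


def genuine_fraction(genuine_rate: float, spurious_rate: float) -> float:
    """Of all attestation failures, the fraction that reflect a genuine problem.
    ASSUMPTION: genuine problems always fail attestation."""
    genuine = genuine_rate
    spurious = (1.0 - genuine_rate) * spurious_rate
    return genuine / (genuine + spurious)


DEVICES = 4096          # hypothetical cluster size
GENUINE_RATE = 0.001    # hypothetical share of devices with a real problem

for spurious_rate in (1e-2, 1e-3, 1e-5):
    print(
        f"spurious rate {spurious_rate:g}: "
        f"all healthy devices pass with p={all_pass_probability(DEVICES, spurious_rate):.4f}, "
        f"genuine share of failures={genuine_fraction(GENUINE_RATE, spurious_rate):.4f}"
    )
```

Under these assumptions, a system with almost no slack is rarely fully available unless the spurious failure rate is very low, and only at that point do most failures point at real problems, which is what makes acting on a failed attestation defensible.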
Make no mistake, this will be a challenge: every device, bus, transport and interconnect must behave perfectly for attestation to work effectively. Yet this reliability is the price of operating modern AI infrastructure, with its sensitive data, valuable models and need for trusted hardware. Attestation is the best way to provide strong authenticity, scalable vulnerability recovery, and deep transparency.