Launching the Blyss Confidential AI service

January 12, 2024

-https://api.openai.com/v1
+https://enclave.blyss.dev/v1

We’re excited to launch the Blyss Confidential AI service today at enclave.blyss.dev. Using secure hardware in the latest NVIDIA GPUs and AMD CPUs, all API calls to our LLM service are completely confidential. It is impossible for anyone, including us, to observe any user interaction with models, and that claim is publicly verifiable.

The OpenAI-compatible API is live at https://enclave.blyss.dev/v1. You can plug this endpoint into any LLM app and immediately get private chat completions. You can use the service through a chat UI at enclave.blyss.dev.

We believe that most LLM service providers will eventually offer confidential AI, and our service is only the first of many coming offerings from model owners. We are striving to build standards for public verifiability that can earn user trust and deliver real-world security and privacy to the millions of users of LLMs today.

As the capabilities of models scale up, so do the risks of sending every person and company’s most valuable data to a select few model owners. The only sustainable bargain between data owners and model owners is giving both parties privacy and autonomy over their most valuable asset.

How Confidential AI works

Special hardware in new AMD CPUs (‘SEV-SNP’) lets us run secure virtual machines. These are VM’s with memory that cannot be tampered with or observed by the host machine.

The secure VM requests a fresh TLS certificate from Let’s Encrypt each time it launches, and uses a private key that is generated from within the secure VM. This means that every TLS connection to a secure VM is an end-to-end secure tunnel: nobody, not even Blyss, can observe traffic going over this tunnel.

At launch, a secure VM also hashes all code it was started with (kernel + OS + application), places this code hash in an attestation report, and signs the report using a special secret key that only the secure hardware on the CPU knows. The signed report helps prove that the VM is running trustworthy code on genuine AMD hardware.

We also use new technology in the NVIDIA H100 GPU to enable fast, confidential execution of LLMs. In addition, the NVIDIA H100 GPU is placed in “Confidential Compute” (CC) mode, which encrypts all PCIe traffic between the secure VM and secure hardware on the GPU, ensuring that user interaction with models is confidential.

Finally, we leverage Certificate Transparency logs to allow anyone to verify our claims. These are immutable records of TLS certificate issuance that a consortium of issuers and large companies already manage and audit.

A more detailed explanation is available in our technology deep-dive.

Verifying our claims

We provide a verification package that lets you check the confidentiality claims of the live service at https://enclave.blyss.dev in a single command:

pip install --upgrade blyss-verifier
python -m blyss_verifier.verify https://enclave.blyss.dev

Thanks to Certificate Transparency, as long as some users run this verification, other users that just connect normally via TLS can be sure that model interactions are confidential.

Open source

All code for our VMs and tooling is open source:

blyss-verifier: Python package that lets you check our claims
sev-attest-tool: Rust and Python packages to request and verify AMD SEV-SNP attestations
shim: service that verifies GPU attestation, requests certificates from Let’s Encrypt, and proxies requests to the LLM
gpu-enclave-img: base Ubuntu disk image with the Docker and NVDIA Container Tookit, and the firmware and kernel for secure VMs
blyss-chat-ui: code for the web chat UI
vllm/TensorRT-LLM: code to execute LLMs on the GPU

Because the entire trusted computing base is open source, we believe this is the first enclave-based confidential LLM inference service whose confidentiality claims are publicly verifiable. We welcome contributions to all of our open source projects.