Technical architecture

High level architecture and protocol

Our identity proofs are made possible by the fact that whenever a user retrieves data from a website, the response is encrypted using TLS. As was demonstrated by various researchers, it is possible to leverage this encryption as an attestation to the underlying data (see for a literature review this thesis).

setup layout

As illustrated in the image above, our backend infrastructure is involved in creating the proofs. When you make an identity proof on zkPortal using this trusted backend infrastructure, two properties should hold:

  • (1) Integrity: is the proof of your personal details correct?
  • (2) Privacy: are you only sharing data that you want to share?

Today, the backend consists of a popular type of Trusted Execution Environment called SGX. This gives strong guarantees about which code is exactly running in the background. In the future, we will strengthen the security model by leveraging secure Multi-Party Computation (MPC), so even multiple different Trusted Execution Environments can be used to guarantee integrity and privacy. Importantly, our backend infrastructure never stores Personal Identifiable Information (PII)! The data lives with the user on their mobile phone and the user is always in charge with whom, what data is shared. Our infrastructure provides a unique identifier - which we call UID below - and allows users to prove statements about themselves such as: am I a US citizen?

setup architecture

The image above provides a high level overview of the different types of services involved in making your private identity proof possible. The image excludes many details (for which you can check out our open source code soon™ ) but it should give you a clear intuition. It is important to note again in advance that at no point personal information can be leaked outside of the servers - more details will follow below. Let's see what happens at each step:

  • (1) First, the user logs into their data provider of choice in order to get an authentication cookie. Next, the zkPortal main service performs a TLS handshake with the data provider, after which the user receives a stream of AES masking material. With this masking material, the user is able to send an encrypted request to get their personal data from the provider. The data provider's response is forwarded to the zkPortal main service which verifies and decrypts the response.
  • (2) The final response data received from the data provider will essentially be notarized. At this point, the zkPortal main service has access to the user's full name, date of birth and Nationality. This Personal Identifiable Information (PII) is returned to the user. The problem we now face is, that we do not want to save the PII directly in the backend but rather a pseudonymous identifier. We will achieve this in the next step.
  • (3) The backend hashes the PII and sends it to the zkPortal UID service, which uses a keyed Pseudo-Random Function to transform the PII_HASH to a deterministic yet indistinguishable UID. The UID is returned and stored by the zkPortal main service, after which the PII and PII_HASH are thrown away.
  • (4) From this point onwards, our zkPortal main service knows a private and unique representation of the user's identity (namely, the UID) which can be linked to other identifiers (such as the users cryptocurrency public keys) and which can be used to prove whether or not the unique user participated in voting or airdrops already.

Architecture deep dive

We leverage a Trusted Execution environment called Intel SGX so you can cryptographically verify that our backend infrastructure is running a particular piece of software. SGX is used all around the world, including by Signal for contact discovery. Despite the periodic discovery of security vulnerabilities, the organisation (Intel) behind SGX handles them professionally and rapidly shares updates as soon as either theoretical or practical flaws are found. We will always make it possible for you to check that we are running the latest patched version to make sure we are not susceptible to previously found issues. Nonetheless, we want to make it even more robust in the future by switching to an MPC model, more on that later. Our backend code is written in Golang where we use the wonderful project EGo that transforms our Go code into SGX compatible binaries which we then can run.

In order to trust that the zkPortal mobile app is not leaking your personal data, there are three pieces to the puzzle you need to verify: A) what software are the SGX backends running? B) which services do the SGX backends communicate with? C) does the zkPortal mobile app only communicate with SGX?

We will answer all questions in the sections below.

A) what software are the SGX backends running?

To save you some work, we'd like to share that we will work with independent auditors to attest to which source code is running exactly in our backend. However, we believe it is essential to keep available the opportunity to let anyone verify whether they can trust our infrastructure!

SGX gives us the guarantee (based on the assumptions mentioned in the previous paragraph) that only a specific piece of code is running. You do not need an SGX-enabled machine to verify what software we are running, verification of what our backend is doing is available to everyone! If you really want to dig deep into the attestation process of SGX you can read more about it here.

You can verify the code running on our backend as follows. Soon ™ you can download our source code from Github: https://github.com/zkportal. You can familiarise yourself already by following the instructions here. We will host this howto in the future ourselves with proper links to the source code. When you run the docker build command it will output a unique ID for the source code you just compiled. If you look at the Dockerfile you will see that the build script checks out our backend code, compiles it and it spits out our unique ID. For the next step you have to make a connection to our SGX servers, you can find them at the IPs: https://158.175.165.168:2379 or https://51.89.97.117:2379. To make it even easier easy for you we will provide a tool in the near future. But you can already attest today to the fact that you are talking to an SGX instance and it provides the unique ID which links it back to the source code you can see on Github.

Now you have a full audit trail from open source code to compiled code which is running on SGX and SGX attesting to the fact which code is running. You can verify that PII is only ever stored in memory, as it is quickly transformed to an unrecognisable UID by the zkPortal UID service. No private data is ever stored in our database.

Code-bound secrets

Note that it should not be possible for anyone to access the zkPortal UID service's secret key, which it uses to derive UIDs from a user's PII_HASH. For this purpose, we use SGX's “Unique Seal Key”, which is bound to both CPU and code. That means if you either change the code or the CPU you will not be able to recover the UID service's key, all data is irrevocably lost. The Unique Seal Key allows us not to break our promise in the future but also prohibits any changes to the code.

Loading certificates during boot

If you have checked our code and we really hope you did, you will see sth peculiar during the boot process into SGX. We are loading certificates into SGX. Those certificates are for access rights to our database, if we would put them into the source code we would have opened up a huge security issue as everybody would be able to access the database. Therefore we need to keep those secret, but if you look further in the code you will see that information that is loaded via a local unix socket is merely used as reading some bytes to authenticate with the database. This setup allows us to have a proper certificate based authentication with our database while still having reproducible builds to be able to verify what data is running on SGX.

B) which services do the SGX backends communicate with?

The zkPortal UID service is stateless, and is therefore in principle open to communicate with any service from the outside world. That leaves us to only explain why the zkPortal stateful backend only shares PII with the zkPortal UID service, and not elsewhere. Well, the zkPortal UID service can leverage reproducible builds and SGX to prove to other parties which connect to it that it is running a particular piece of code.

A problem with forcing our backend infrastructure to only communicate with a specific version of the zkPortal UID service, is that it makes updates hard. This might be familiar to people in the cryptocurrency space, where updating "immutable" smart contracts in a secure fashion often turns out to be a very hard problem. Fortunately, any security updates to SGX itself do not reflect the reproducible build mentioned above. Moreover, our zkPortal UID service is so simple, and leverages such stable algorithms, that it is unlikely to frequently require updates.

C) does the zkPortal mobile app only communicate with SGX?

Now that you have verified the hash of the code that is running on the SGX you still need to verify that the client (the zkPortal mobile application) is actually communicating only with the SGX instances defined above. This can easily be done with sniffing the traffic between the mobile phone and the internet using a tool like Wireshark. We will provide more detail and what to do in the near future.

Even more security: Multi-Party Computation

In the future, we will leverage Multi-Party Computation, in order to ensure no single TEE can cause the user’s data to leak. Each TEE holds only a portion of a TLS session’s private key, requiring all TEE nodes to collude in order to break integrity and privacy. This addresses two threat models: the unbribable hardware security of TEEs is therefore enhanced by the human security of MPC. If new serious flaws in TEEs are discovered, the human involvement in the MPC makes it possible for us to apply patches without significantly reducing security. This is the most secure approach that is possible given currently available technologies.