System Design

This section is being frequently updated. Last update 26.05.2020, 01.00 CEST

The spreadit system architecture has been designed with user data security and privacy as its primary requirements. Furthermore, the current architecture allows for a modular implementation, in which different parts of the system can be adopted by different entities, be they organizations, companies, governments or actual software implementations.

Below you will find the architectural overview of spreadit, together with detailed descriptions of each subsystem and how they function to ensure your privacy. As spreadit is still a work in progress, please note that any implementation differences between the system described here and the system currently used for the pilot are outlined in bold at the end of each section.

Data Privacy & User Anonymity

Maintaining data privacy and user anonymity in the deployment of electronic contact-tracing implementations has been a subject of heated debate since the appearance of COVID-19. Currently there appears to be a divide between centralized and decentralized Bluetooth Low Energy (BLE) implementations [link].

On the one hand, central data processing and retention is highly problematic. While it is better suited to provide accurate and reliable contact-tracing results, such a centralization inherently allows the possibility of using the same contact-tracing data for identification and user tracking purposes [link].

On the other hand, a completely decentralized system such as the joint Google/Apple implementation [link], while preserving user anonymity and data privacy to a large extent, may not make a positive or measurable contribution to contact-tracing itself [link].

In addition, for any implementation chosen, centralized or otherwise, there is a variety of inherent issues that exist within the core BLE system, ranging from security exploits [link], to user reach [link1],[link2],[link3], even to privacy preservation itself [link1],[link2].

We believe a middle ground exists, with a system that is centralized enough to provide accurate and meaningful contact-tracing results, while at the same time decentralized to the extent required for user anonymity and data privacy to be inherently protected.

N-Way Data Decomposition

For contact-tracing to yield accurate results, knowledge of (1) users, (2) locations, (3) dates & times of user presence at these locations, and (4) COVID-19 test results for these users is necessary.

Having the above information centralized allows meaningful contact-tracing results to be obtained, but at the same time permits user privacy to be compromised: (1) the locations users visited, (2) the times of those visits, (3) the users they were in contact with at those times, and (4) their test results are all known to the central entity that has access to the complete contact-tracing data.

If we however (a) decompose this information to the primitives it consists of, and (b) distribute the centralization of each primitive to separate entities, user privacy can be preserved, while still maintaining a multi-centralized system.

User anonymity and data privacy are both preserved, as knowledge of a single information primitive cannot be used to (1) identify a particular user, (2) associate them with their test results or (3) relate a particular behaviour to them. Enter the spreadit system.

Architectural Overview

The architectural overview of the spreadit system

Backend Architecture

Each of the four Manager and two Provider implementations of the spreadit backend system is operated by a separate physical entity, and each entity only has access to the data relevant to its particular implementation.

Example Implementation

An example fictional implementation could be as follows

The LocationProvider is operated by Google and their Location Services API, maintaining a database of real-world locations together with the corresponding Location-ID
The TestProvider is operated by the Federal Office of Public Health, maintaining a database of anonymised Result-IDs, corresponding personal information, and COVID-19 test results
The ResultManager is operated by eggze Technik GmbH, maintaining a database of anonymised User-IDs and anonymised Result-IDs
The UserManager is operated by Dynamic Devices AG, maintaining the main database of anonymised User-IDs against which the rest of the managers validate their operations
The ScanManager is operated by esriSuisse, maintaining a database of Scan-IDs, User-IDs, Location-IDs
The AuditManager is operated by the Federal Data Protection and Information Commissioner and actively monitors each entity for data protection breaches

In the above fictional scenario, each physical entity operating a Provider or Manager can only have partial knowledge of the complete contact-tracing information. As an example, esriSuisse can never know which actual location a particular user visited, who that user actually is, or what their test results are. In a similar fashion, each of the other entities lacks crucial information needed to reconstruct a complete user profile. In this sense, both user anonymity and data privacy are preserved.

Entity description

LocationProvider

The LocationProvider entity maintains a description of physical (unsafe) locations in the system, as described in the Why QR? Section. As such, each location managed by the LocationProvider is described by

LocationID - UUID Version 4, generated at the backend
Address - The street address of that location
Postcode - The postcode of that location
Latitude - The location latitude, if relevant
Longitude - The location longitude, if relevant
Type - The location type (e.g. shop, public transport)
Size - The relative size (open space) of the location
Touch rate - The relative frequency objects are handled
Infection rate - The estimate of how infectious the location is
Description - The location description, optional

and is a candidate entity to be managed by businesses with visitable venues, public transport companies, or a larger location provider such as Google.

Pilot - The LocationProvider is not implemented/active.

TestProvider

The TestProvider maintains the COVID-19 test results, together with the user’s contact information. As such, each test managed by the TestProvider is described by

TestID - UUID Version 4, generated at the backend
Test status - The test result once available, pending otherwise
Test date - The date the test was performed
Personal details - Personal details of the person tested

and is a candidate entity to be managed by medical centers, be part of larger healthcare data systems or the Federal Office of Public Health (FOPH) itself.

The test date and status information are entered by the medical centers performing the COVID-19 tests, when they become available.

The personal details being managed are implementation dependent and can range from none (completely anonymous), to basic contact information, to Health Insurance Number, to SuisseID.

Pilot - The TestProvider is not implemented/active.

UserManager

The UserManager maintains a list of spreadit users. As such, each user managed by the UserManager is described by

UserID - UUID Version 4, generated at the backend
Creation date - The date the user was first created
Contacts - The number of contacts this user has

The contacts field indicates the number of contacts a user had with verified COVID-19 positive users in the past 15 days (period subject to change).

ScanManager

The ScanManager maintains each of the location scans. As such, each scan managed by the ScanManager is described by

ScanID - UUID Version 4, generated at the app
LocationID - UUID Version 4, generated at the backend
UserID - UUID Version 4, generated at the backend
Scan time - The time this Location-QR was scanned
Contacts - The number of contacts the scan has

The contacts field indicates the number of verified (indirectly, via the TestProvider) COVID-19 positive users that were at the same location as the user that provided the scan, within a restricted time window (-30 minutes, +1 hour).

ResultManager

The ResultManager maintains the COVID-19 test results of each user. As such, each result managed by the ResultManager is described by

ResultID - UUID Version 4, generated at the backend
UserID - UUID Version 4, generated at the backend
Result status - The test result once available, pending otherwise
Result date - The date the test was performed

The result status and date are automatically updated from the TestProvider once they are available.

Pilot - The automatic fetch of results currently assumes (TestID == ResultID). If necessary, this could be further abstracted by incorporating a ResultID field in the TestProvider records.

AuditManager

The AuditManager oversees the operation of the spreadit system. It is designed to periodically assess the validity of each of the Manager and Provider entities. Furthermore, each entity uses the AuditManager to report invalid system functionality that will in turn be logged and acted upon. As such, it acts on

Invalid packet header - Incoming packets with an invalid packet header
Unknown UserIDs - Incoming packets with UserIDs not in the system
Invalid Test-QR scans - Reception of a Test-QR scan that is (1) already registered in the system or (2) does not exist in the TestProvider database
Invalid Location-QR scans - Reception of a Location-QR scan that is (1) already registered with the ScanManager or (2) its LocationID is not in the LocationProvider database
Invalid result requests - Requesting results when a user has not scanned a Test-QR
Invalid contact requests - Requesting contact scan dates where a user does not have any contacts for any of their Location-QR scans registered

Pilot - The AuditManager is currently only displaying the cases described above; nothing is logged to disk during the pilot.
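As an illustration only, the report categories listed above could be modelled as an enumeration, with reports kept in memory and displayed, in the spirit of the pilot behaviour. The class, enum and method names below are hypothetical and are not taken from the spreadit source code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of audit reporting; not the spreadit API. */
final class AuditSketch {
    /** One constant per report category in the list above. */
    enum AuditEvent {
        INVALID_PACKET_HEADER,
        UNKNOWN_USER_ID,
        INVALID_TEST_QR_SCAN,
        INVALID_LOCATION_QR_SCAN,
        INVALID_RESULT_REQUEST,
        INVALID_CONTACT_REQUEST
    }

    // Kept in memory only, mirroring the pilot where nothing is logged to disk.
    private final List<String> displayed = new ArrayList<>();

    /** Called by a Manager or Provider entity to report invalid functionality. */
    void report(String reportingEntity, AuditEvent event) {
        String line = reportingEntity + ": " + event;
        displayed.add(line);
        System.out.println(line);
    }

    /** Read-only view of everything reported so far. */
    List<String> displayed() {
        return Collections.unmodifiableList(displayed);
    }

    public static void main(String[] args) {
        AuditSketch audit = new AuditSketch();
        audit.report("ScanManager", AuditEvent.INVALID_LOCATION_QR_SCAN);
    }
}
```

A production AuditManager would presumably persist these reports and act on them; that part is not described in enough detail here to sketch.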

StatsManager

The StatsManager maintains transient data statistics of the spreadit system in memory. Each of these statistics is obtained from its corresponding entity and is currently not stored. Specifically, it maintains the following

Users - The total number of users, requested from the UserManager
Scans - The total number of scans, requested from the ScanManager
Contacts - The total number of contacts across all users, requested from either the ScanManager or the UserManager
Tested users - The total number of users that have registered at least one Test-QR, requested from the ResultManager
Results - The total number of COVID-19 test results, requested from the ResultManager
Positive results - The total number of positive COVID-19 test results, requested from the ResultManager
Update timestamp - The timestamp of when the last update took place

These statistics are periodically requested by the app implementation, serving as user feedback. They are updated during each data propagation cycle of the spreadit system.

Contact-tracing & Result Propagation

Contact tracing and result propagation happen periodically and are split across all entities, each with its corresponding update process.

Update Period

The periodic update happens daily (subject to change). The choice of period is based on the slowest process of the system, which is the update of the test results by the TestProvider. The assumption made is that once a user is tested, their results are available approximately three days later. As such, a daily update ensures that propagation happens "as soon as" results are available.

Data Purging

Purging of user data outside the specified period is the first step of the propagation process and is individually called on both the ScanManager and the ResultManager. Any user scans or results older than 15 days (period subject to change) are purged from the system before further steps take place.
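A minimal sketch of such a purge step is shown below, assuming records are reduced to their timestamps and that a record exactly 15 days old is still retained. The names PurgeSketch, RETENTION and purge are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the purge step; names are illustrative only. */
final class PurgeSketch {
    // Retention period taken from the text above (subject to change).
    static final Duration RETENTION = Duration.ofDays(15);

    /** Removes every timestamp strictly older than now - RETENTION; returns how many were removed. */
    static int purge(List<Instant> recordTimes, Instant now) {
        Instant cutoff = now.minus(RETENTION);
        int sizeBefore = recordTimes.size();
        recordTimes.removeIf(time -> time.isBefore(cutoff));
        return sizeBefore - recordTimes.size();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-05-26T00:00:00Z");
        List<Instant> times = new ArrayList<>();
        times.add(now.minus(Duration.ofDays(20))); // outside the period, purged
        times.add(now.minus(Duration.ofDays(3)));  // inside the period, kept
        System.out.println("purged " + purge(times, now) + ", kept " + times.size());
    }
}
```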

Positive Users

This is the second step of the propagation process. The ResultManager checks if there exist UserIDs with a positive COVID-19 test result. If so, a list of these UserIDs is communicated to the ScanManager.
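The lookup of positive UserIDs could be sketched as follows. The Result class, the Status values (only "pending" is named in this document) and the method name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch of the positive-user lookup; not the spreadit API. */
final class PositiveUsersSketch {
    /** Assumed status values; the text only names "pending" explicitly. */
    enum Status { PENDING, NEGATIVE, POSITIVE }

    /** A ResultManager record reduced to the fields this step needs. */
    static final class Result {
        final UUID resultId;
        final UUID userId;
        final Status status;

        Result(UUID resultId, UUID userId, Status status) {
            this.resultId = resultId;
            this.userId = userId;
            this.status = status;
        }
    }

    /** Returns each UserID with at least one positive result, without duplicates. */
    static Set<UUID> positiveUserIds(List<Result> results) {
        Set<UUID> positives = new LinkedHashSet<>();
        for (Result result : results) {
            if (result.status == Status.POSITIVE) {
                positives.add(result.userId);
            }
        }
        return positives;
    }

    public static void main(String[] args) {
        UUID user = UUID.randomUUID();
        List<Result> results = new ArrayList<>();
        results.add(new Result(UUID.randomUUID(), user, Status.POSITIVE));
        results.add(new Result(UUID.randomUUID(), UUID.randomUUID(), Status.PENDING));
        System.out.println(positiveUserIds(results).size() + " positive user(s)");
    }
}
```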

Positive Scans

Upon reception of the positive UserID list, the ScanManager uses it to enumerate a list of positive scans in memory, each of which contains

LocationID - The LocationID of each scan of a positive UserID
Scan time - The scan time of each scan of a positive UserID

Upon creation of the positive scan list, the positive UserID list is immediately discarded - all UserID data is first zero-filled and then discarded.
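The sketch below illustrates one way this step might look. It assumes the positive UserIDs arrive as 16-byte arrays, since a java.util.UUID is immutable and cannot be zero-filled, and that the list holds its own copies of those bytes. All class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

/** Hypothetical sketch of building the positive scan list; not the spreadit API. */
final class PositiveScansSketch {
    /** A ScanManager record reduced to the fields this step needs. */
    static final class Scan {
        final byte[] userId;
        final UUID locationId;
        final Instant scanTime;

        Scan(byte[] userId, UUID locationId, Instant scanTime) {
            this.userId = userId;
            this.locationId = locationId;
            this.scanTime = scanTime;
        }
    }

    /** A positive scan carries no UserID, only location and time. */
    static final class PositiveScan {
        final UUID locationId;
        final Instant scanTime;

        PositiveScan(UUID locationId, Instant scanTime) {
            this.locationId = locationId;
            this.scanTime = scanTime;
        }
    }

    /** Builds the positive scan list, then zero-fills and discards the UserID list. */
    static List<PositiveScan> collect(List<Scan> allScans, List<byte[]> positiveUserIds) {
        List<PositiveScan> positives = new ArrayList<>();
        for (Scan scan : allScans) {
            for (byte[] id : positiveUserIds) {
                if (Arrays.equals(scan.userId, id)) {
                    positives.add(new PositiveScan(scan.locationId, scan.scanTime));
                    break;
                }
            }
        }
        // The list must own its arrays: zero-filling a shared array would corrupt the scan record.
        for (byte[] id : positiveUserIds) {
            Arrays.fill(id, (byte) 0);
        }
        positiveUserIds.clear();
        return positives;
    }

    public static void main(String[] args) {
        byte[] user = new byte[16];
        user[0] = 7;
        List<Scan> scans = new ArrayList<>();
        scans.add(new Scan(user, UUID.randomUUID(), Instant.parse("2020-05-26T12:00:00Z")));
        List<byte[]> positiveIds = new ArrayList<>();
        positiveIds.add(user.clone());
        System.out.println(collect(scans, positiveIds).size() + " positive scan(s)");
    }
}
```

Zero-filling in a managed runtime is best effort: the JVM may still hold copies elsewhere, so this only shows the intent described above.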

Scan Contacts

The complete scan dataset of the ScanManager is then traversed once for each scan in the positive scans list. Each scan that has the same LocationID as a positive scan, and a scan time falling within a window of -30 to +60 minutes (subject to change) around that positive scan, is added to a contact list implemented as a MultiSet.

Once the contact list has been fully populated, the positive scan list is immediately discarded - all LocationID and scan time data is first zero-filled and then discarded.

An update cycle follows for each unique scan in the contact list. Each such scan has its contacts field updated, where the number of contacts equals the multiplicity of that scan. The list is then discarded, and a contact update request to the UserManager is made.
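The traversal and multiplicity count described in this section can be sketched as follows. To stay within the standard library, a HashMap of counts stands in for a MultiSet; the window is assumed to be measured relative to the positive scan with inclusive bounds, and all names are hypothetical. Since the positive scan list carries no UserID, a positive user's own scan also matches in this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of the scan contact update; not the spreadit API. */
final class ScanContactsSketch {
    static final Duration BEFORE = Duration.ofMinutes(30); // subject to change
    static final Duration AFTER = Duration.ofMinutes(60);  // subject to change

    static final class Scan {
        final UUID scanId;
        final UUID locationId;
        final Instant scanTime;
        int contacts;

        Scan(UUID scanId, UUID locationId, Instant scanTime) {
            this.scanId = scanId;
            this.locationId = locationId;
            this.scanTime = scanTime;
        }
    }

    static final class PositiveScan {
        final UUID locationId;
        final Instant scanTime;

        PositiveScan(UUID locationId, Instant scanTime) {
            this.locationId = locationId;
            this.scanTime = scanTime;
        }
    }

    /** Same location, and scan time within [positive - 30 min, positive + 60 min]. */
    static boolean inWindow(PositiveScan positive, Scan scan) {
        return positive.locationId.equals(scan.locationId)
                && !scan.scanTime.isBefore(positive.scanTime.minus(BEFORE))
                && !scan.scanTime.isAfter(positive.scanTime.plus(AFTER));
    }

    /** Counts matches per scan, discards the positive list, then writes the multiplicities. */
    static void updateContacts(List<Scan> allScans, List<PositiveScan> positives) {
        Map<UUID, Integer> multiplicity = new HashMap<>(); // stand-in for a MultiSet
        for (PositiveScan positive : positives) {
            for (Scan scan : allScans) {
                if (inWindow(positive, scan)) {
                    multiplicity.merge(scan.scanId, 1, Integer::sum);
                }
            }
        }
        positives.clear();
        for (Scan scan : allScans) {
            Integer count = multiplicity.get(scan.scanId);
            if (count != null) {
                scan.contacts = count;
            }
        }
        multiplicity.clear();
    }
}
```

With two positive scans at the same location five minutes apart, a third scan taken ten minutes after the first would end up with a contacts value of 2, while a scan taken two hours later would be left untouched.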

User Contacts

Upon reception of a contact update request, the UserManager, for each user, updates the total number of contacts across all scans of that user. The total number of contacts for each user is provided from the ScanManager via a contacts request.

After the update is complete, a stats update request to the StatsManager is made.
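The per-user total can be sketched as a simple aggregation over scans. The class below is hypothetical and reduces each scan to its UserID and contacts field; how the contacts request is actually transported between the two managers is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of the per-user contact total; not the spreadit API. */
final class UserContactsSketch {
    /** A scan reduced to the two fields needed for the total. */
    static final class Scan {
        final UUID userId;
        final int contacts;

        Scan(UUID userId, int contacts) {
            this.userId = userId;
            this.contacts = contacts;
        }
    }

    /** Sums the contacts field over all scans of each user. */
    static Map<UUID, Integer> totalContactsPerUser(List<Scan> scans) {
        Map<UUID, Integer> totals = new HashMap<>();
        for (Scan scan : scans) {
            totals.merge(scan.userId, scan.contacts, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        UUID user = UUID.randomUUID();
        List<Scan> scans = new ArrayList<>();
        scans.add(new Scan(user, 2));
        scans.add(new Scan(user, 1));
        System.out.println("total contacts: " + totalContactsPerUser(scans).get(user));
    }
}
```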

Stats Update

Upon reception of a stats update request, the StatsManager performs an update of all relevant statistics from each of the now updated Manager entities. This step also concludes the data propagation cycle of the spreadit system.

Data Retention

Backend

The ScanManager and ResultManager purge any data older than 15 days (period subject to change) via an automatic rolling-purge system. UserManager data is currently kept until pilot suspension, whereupon all data will be cleared (incl. those kept by the ScanManager and ResultManager entities). The decision of data retention timelines for LocationProvider and TestProvider data is left for the respective entities that manage them.

App

Data is stored in the app indefinitely (subject to change). This allows users both to use the app offline for their own records, if they so wish, and to have access to all their test results and scans. Data stored in the app that is older than 15 days (period subject to change) is not communicated to the backend.

Data Encryption

Data kept is encrypted at multiple levels throughout the spreadit system

Database storage
Data transmission
App storage

Database storage

Coming soon.

Data transmission

Every data transmission over TCP/IP is encrypted using TLS 1.2 via the SecureNIO library. A TLS 1.3 implementation will be available soon, while TLS 1.2 will remain available so that the majority of older devices can still access the spreadit system. Please note that known TLS 1.2 vulnerabilities (e.g. POODLE, GOLDENDOODLE [link]) will be accounted for and mitigated.
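For illustration, restricting a connection to TLS 1.2 with the standard Java JSSE classes might look as follows. This sketch does not use the SecureNIO API; the class and method names and the host and port in main are placeholders.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

/** Hypothetical sketch using plain JSSE; SecureNIO's own API is not shown here. */
final class TlsEngineSketch {
    /** Creates a client-side engine that only offers TLS 1.2. */
    static SSLEngine newClientEngine(String host, int port) {
        try {
            SSLContext context = SSLContext.getInstance("TLSv1.2");
            // Passing nulls selects the default key managers, trust managers and random source.
            context.init(null, null, null);
            SSLEngine engine = context.createSSLEngine(host, port);
            engine.setUseClientMode(true);
            engine.setEnabledProtocols(new String[] {"TLSv1.2"});
            return engine;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS 1.2 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder host and port; no connection is opened by creating the engine.
        SSLEngine engine = newClientEngine("backend.example.org", 443);
        System.out.println(String.join(",", engine.getEnabledProtocols()));
    }
}
```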

Backend to app

Data transmission between the backend and the app implements one-way TLS authentication, where the identities of the UserManager, ScanManager and ResultManager need to be verified separately by the app.

Pilot - As the backend entities are not yet decoupled via TCP/IP, only the identity of a single TCP/IP server is being verified by the app. Communications between the app and the three Manager entities are currently decoupled and abstracted via Interface and Proxy implementations.

Backend entities

Data transmission between all backend entities implements mutual TLS authentication, where the identities of all entities need to be separately verified against each other.

Pilot - As the backend entities are not yet decoupled via TCP/IP, no encryption is active. Communications are currently decoupled and abstracted via Interface and Proxy implementations.

Database to entity

Data transmission between the database each entity has access to, and the entity itself, implements mutual TLS authentication, whereupon the identities of both the database and the entity itself need to be verified against each other.
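In standard Java JSSE terms, the difference between the one-way and mutual authentication modes described in this section comes down to whether the accepting side demands a client certificate. The sketch below shows that switch only; the names are hypothetical, and the default SSLContext stands in for a context that a real deployment would initialise with each entity's own key material and trust store.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

/** Hypothetical sketch of one-way versus mutual TLS on the accepting side. */
final class MutualTlsSketch {
    /**
     * requireClientCertificate = false: one-way TLS, as between backend and app.
     * requireClientCertificate = true: mutual TLS, as between backend entities.
     */
    static SSLEngine newServerEngine(SSLContext context, boolean requireClientCertificate) {
        SSLEngine engine = context.createSSLEngine();
        engine.setUseClientMode(false);
        // When true, the handshake fails unless the peer presents a trusted certificate.
        engine.setNeedClientAuth(requireClientCertificate);
        return engine;
    }

    /** Placeholder context; real key and trust material is not modelled here. */
    static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SSLEngine forApps = newServerEngine(defaultContext(), false);
        SSLEngine forEntities = newServerEngine(defaultContext(), true);
        System.out.println(forApps.getNeedClientAuth() + " " + forEntities.getNeedClientAuth());
    }
}
```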

App storage

Coming soon.

Data Communication

NewUserPacket

A client packet where a new user requests a new userUUID from the UserManager.

Properties

Source - spreadit app
Destination - UserManager
Transmission frequency - Once after accepting the privacy policy.

Attributes

Protocol Version - The communication protocol version (1 byte)
Packet Index - The packet index (2 bytes)
UserID - UUID Version 4, empty (16 bytes)

Replies

UserCreatedPacket
ServerErrorPacket
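A possible byte layout for the three attributes of the NewUserPacket is sketched below. It assumes the fields are serialised in the order listed, in big-endian byte order, and that an "empty" UserID means sixteen zero bytes. None of these points is stated in this document, any additional framing or header is not modelled, and the names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical sketch of the attribute layout; not the spreadit wire format. */
final class PacketEncoderSketch {
    // 1 byte protocol version + 2 bytes packet index + 16 bytes UserID.
    static final int ATTRIBUTES_LENGTH = 1 + 2 + 16;

    // Assumption: an "empty" UserID is encoded as all zero bytes.
    static final UUID EMPTY_USER_ID = new UUID(0L, 0L);

    /** Serialises the three attributes in the order listed above. */
    static byte[] encode(int protocolVersion, int packetIndex, UUID userId) {
        ByteBuffer buffer = ByteBuffer.allocate(ATTRIBUTES_LENGTH); // big-endian by default
        buffer.put((byte) protocolVersion);
        buffer.putShort((short) packetIndex);
        buffer.putLong(userId.getMostSignificantBits());
        buffer.putLong(userId.getLeastSignificantBits());
        return buffer.array();
    }

    public static void main(String[] args) {
        byte[] newUserPacket = encode(1, 0, EMPTY_USER_ID);
        System.out.println(newUserPacket.length + " bytes");
    }
}
```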

UserHelloPacket

A client packet where a client periodically communicates with the UserManager. At each such communication, if a user has contacts in any of the scans they have sent, this will be indicated in the ServerHelloPacket sent as a response.

Properties

Source - spreadit app
Destination - UserManager
Transmission frequency - Periodically, every few hours

Attributes

Protocol Version - The communication protocol version (1 byte)
Packet Index - The packet index (2 bytes)
UserID - UUID Version 4 (16 bytes)

Replies

ServerHelloPacket
ServerErrorPacket

UserCreatedPacket

A sub-class of the ServerHelloPacket. It contains a fresh userUUID for new users.

Properties

Source - UserManager
Destination - spreadit app
Transmission frequency - Once, after reception of a NewUserPacket

Attributes

Protocol Version - The communication protocol version (1 byte)
Packet Index - The packet index (2 bytes)
UserID - UUID Version 4 (16 bytes)

Replies - None
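On the receiving side, the app could read the fresh UserID back out of these attributes along the following lines. As before, the field order, big-endian byte order and absence of extra framing are assumptions, and the class and field names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical sketch of reading the attributes; not the spreadit wire format. */
final class UserCreatedDecoderSketch {
    static final int ATTRIBUTES_LENGTH = 1 + 2 + 16;

    /** The three attributes of a decoded packet. */
    static final class Decoded {
        final int protocolVersion;
        final int packetIndex;
        final UUID userId;

        Decoded(int protocolVersion, int packetIndex, UUID userId) {
            this.protocolVersion = protocolVersion;
            this.packetIndex = packetIndex;
            this.userId = userId;
        }
    }

    /** Parses version (1 byte), index (2 bytes) and UserID (16 bytes), in that order. */
    static Decoded decode(byte[] bytes) {
        if (bytes.length != ATTRIBUTES_LENGTH) {
            throw new IllegalArgumentException("expected " + ATTRIBUTES_LENGTH + " bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes); // big-endian by default
        int version = buffer.get() & 0xFF;          // read as unsigned
        int index = buffer.getShort() & 0xFFFF;     // read as unsigned
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new Decoded(version, index, new UUID(mostSignificant, leastSignificant));
    }

    public static void main(String[] args) {
        UUID fresh = UUID.randomUUID();
        byte[] raw = ByteBuffer.allocate(ATTRIBUTES_LENGTH)
                .put((byte) 1)
                .putShort((short) 0)
                .putLong(fresh.getMostSignificantBits())
                .putLong(fresh.getLeastSignificantBits())
                .array();
        System.out.println(decode(raw).userId.equals(fresh));
    }
}
```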

Source code

The spreadit system is open source, licensed under the AGPL3. The source code for the Android app implementation is available at https://github.com/eggze/spreadit-android/

Open source technologies used

The core open source technologies used for the spreadit system are

SecureNIO, a minimal, non-blocking Java NIO TCP framework supporting SSL/TLS [link] - We use this framework to support fast, scalable, two-way secure communications between the spreadit app, the backend and all manager entities.
zxing ("zebra crossing"), an open-source image processing library implemented in Java [link] - We use zxing to both generate and scan QR codes.
Dagger, a fast dependency injector for Java and Android [link] - We use Dagger to manage the asynchronous secure data communication between the spreadit app and the backend.
Guava, a set of core Java libraries from Google [link] - We use Guava for some of its fast alternative or supplementary implementations of the standard Java libraries.
MariaDB Server, an open source relational database [link] - We use MariaDB as our backend database implementation.
diagrams.net, an open source diagram creation system [link] - We use diagrams.net to create the diagrams that describe our system.