Smart data applications must abide by data protection regulations

Siemens is developing tools that ensure smart data applications abide by data protection regulations. The reliable protection of data privacy is very important because it is a precondition for people or institutions to provide applications with personal data.

In cooperation with the Fraunhofer Institute for Intelligent Analysis and Information Systems (Fraunhofer IAIS), the researchers at Siemens Corporate Technology (CT) are therefore creating a toolbox that helps users of smart data to adhere to the data protection regulations that apply to their applications. 

Although a great variety of algorithms exist for making data anonymous, many of them are not suited to the software environments that are typical of smart data. The new toolbox will contain a selection of algorithms for such environments.

Smart data poses new data protection challenges because it involves the intelligent analysis of huge sets of data. In smart data applications, personal data typically comes from more than one source, which is why data protection also has to prevent people from becoming identifiable when a variety of data sets are combined.
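To illustrate why combining sources is risky, here is a minimal sketch of a linkage between two data sets that share a couple of attributes. The field names (`birth_year`, `postcode`), the sample records and the `link` helper are hypothetical and only serve as an example; they are not part of the toolbox described here.

```python
def link(records_a, records_b, keys):
    """Join two data sets on shared attributes and keep only unambiguous matches."""
    index = {}
    for record in records_b:
        index.setdefault(tuple(record[k] for k in keys), []).append(record)

    linked = []
    for record in records_a:
        matches = index.get(tuple(record[k] for k in keys), [])
        # Exactly one match means the person behind the record is singled out.
        if len(matches) == 1:
            linked.append({**matches[0], **record})
    return linked


# Hypothetical data: neither set alone ties a name to a diagnosis.
medical = [{"birth_year": 1980, "postcode": "80331", "diagnosis": "asthma"}]
register = [
    {"birth_year": 1980, "postcode": "80331", "name": "Alice Example"},
    {"birth_year": 1975, "postcode": "80331", "name": "Bob Example"},
]

reidentified = link(medical, register, ["birth_year", "postcode"])
```

In this example the medical record contains no name, yet the join attaches one to the diagnosis because the combination of birth year and postcode is unique in the second data set.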

The method used to ensure the user’s privacy depends very much on the application in question. In the simplest case, all you need to do is delete some of the individual properties of the data set.
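As a minimal sketch of this simplest case, the following function drops selected properties from a record. The record layout and the `suppress_fields` helper are assumptions made for illustration only.

```python
def suppress_fields(record, fields_to_drop):
    """Return a copy of the record without the listed properties."""
    return {key: value for key, value in record.items() if key not in fields_to_drop}


# Hypothetical record; the field names are illustrative.
record = {
    "name": "Alice Example",
    "age": 34,
    "postcode": "80331",
    "diagnosis": "asthma",
}

cleaned = suppress_fields(record, {"name", "postcode"})
```

The original record is left untouched, so the caller decides whether the unmodified data is kept or discarded.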

In other cases, specific information is generalised, e.g. the person’s age is given as a range. You can also encrypt the user’s name in such a way that it is no longer readable as clear text but still uniquely distinguishes that user’s records.
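Both ideas can be sketched in a few lines. The range width, the function names and the key are assumptions for illustration. Note also that the sketch replaces the name with a keyed hash (HMAC-SHA256) rather than encrypting it, which is one common way to obtain a value that is unreadable yet consistent for the same person.

```python
import hashlib
import hmac


def generalise_age(age, width=10):
    """Replace an exact age with the range it falls into, e.g. 34 becomes '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def pseudonymise(name, secret_key):
    """Map a name to a keyed hash: unreadable, but identical for identical names."""
    return hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would have to be stored securely.
key = b"example-secret-key"

age_range = generalise_age(34)
token_a = pseudonymise("Alice Example", key)
token_b = pseudonymise("Alice Example", key)
token_c = pseudonymise("Bob Example", key)
```

Because the same name always yields the same token, records belonging to one person can still be grouped without revealing who that person is.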

There are also algorithms that guarantee that a search in large data sets always returns at least a certain minimum number of hits. This prevents the analysis of medical data sets from identifying specific people and their illnesses.
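The article does not name a specific algorithm. One well-known formulation of this guarantee is k-anonymity, in which every combination of identifying attributes must be shared by at least k records. The sketch below only checks that property; the attribute names, the sample data and the helper functions are assumptions for illustration.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every attribute combination occurs at least k times."""
    return all(size >= k for size in group_sizes(records, quasi_identifiers).values())


# Hypothetical, already generalised medical records.
records = [
    {"age": "30-39", "postcode": "803**", "diagnosis": "asthma"},
    {"age": "30-39", "postcode": "803**", "diagnosis": "diabetes"},
    {"age": "40-49", "postcode": "803**", "diagnosis": "influenza"},
]

ok_for_pairs = is_k_anonymous(records, ["age", "postcode"], k=2)
ok_without_age = is_k_anonymous(records, ["postcode"], k=2)
```

Here the single record in the 40-49 group violates the requirement for k = 2, so the data would need further generalisation or suppression before release.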

Called the Privacy Preserving Big Data Analytics Toolbox, the system will contain algorithms for a wide variety of anonymisation processes. One important requirement is that the system can quickly process huge amounts of data.

To make this possible, the algorithms have to use the database architectures that are typical of smart data and also be able to process large sets of data in parallel. To ensure this is the case, the researchers are adapting the toolbox to conventional systems such as Hadoop and massively parallel databases.
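Record-level anonymisation lends itself to parallel processing because each partition of the data can be handled independently. The following sketch shows the idea with Python's standard thread pool; it is not Hadoop code, and the partitioning, the function names and the choice to drop a `name` field are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def anonymise_record(record):
    """Illustrative per-record step: drop the directly identifying field."""
    return {key: value for key, value in record.items() if key != "name"}


def anonymise_partition(partition):
    """Process one partition without needing any data from other partitions."""
    return [anonymise_record(record) for record in partition]


def anonymise_in_parallel(partitions, max_workers=4):
    """Apply the same anonymisation step to all partitions concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input partitions.
        return list(pool.map(anonymise_partition, partitions))


# Hypothetical data split into two partitions.
partitions = [
    [{"name": "Alice Example", "age": 34}],
    [{"name": "Bob Example", "age": 51}],
]

result = anonymise_in_parallel(partitions)
```

Methods that depend on group sizes across the whole data set need an additional aggregation step and do not parallelise as trivially as this per-record example.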

The toolbox will either be used to read the data so that the information is directly stored in an anonymous manner, or it will be employed to process data that has already been saved. Data cannot be reconstructed in its original form after it has been made anonymous.