With May 25th and the General Data Protection Regulation (GDPR) approaching fast, we are hearing more and more about it. Much has been said about its extraterritorial applicability, the wide range of rights it gives data subjects, the transfer of personal data outside the EU, and more.

One key change the GDPR will bring is the obligation to integrate privacy into systems and operations when processing personal data. Meeting it requires a thorough understanding of privacy and its practices, as well as serious work to put them in place.

The concept of ‘Privacy by Design’ was developed by Ann Cavoukian, the former Information and Privacy Commissioner of Ontario, to describe the philosophy and approach of embedding privacy into the design of information technology, networked infrastructure, and business practices. It comes with a set of foundational principles for achieving privacy in practice.

The purpose of this article is to introduce the concept and illustrate it with examples of its implications for the artificial intelligence sector. Above all, it aims to help organizations stop leaving privacy to chance and have it by design instead.

Integrated privacy in the full lifecycle of systems, operations, and products

As electronic data about individuals is becoming more and more detailed and as technology allows ever more powerful collection and processing of these data, consumers are getting more cautious about the information they share and want to have more control over it (Public Opinion on Privacy by epic.org).

Some of the most significant findings of a 2016 European Commission study are that the majority of EU citizens find it unacceptable to have their online activities monitored or to have companies share information about them, and that almost all participants think websites should ask permission before accessing their information.

It is important to remember that the fundamental reason behind the GDPR is the demand of people in the EU for their privacy rights.

For any business, it is critical to integrate privacy into its systems and operations as well as the end products and services it delivers. This is a direct answer to the consumers’ demand for the protection of their privacy rights and can be a key selling point for innovative technologies.

As stated before, integrating privacy into the full lifecycle of systems, operations and products will be a requirement under the GDPR. The GDPR text calls this ‘Data Protection by Design and by Default’ and sets it out in Article 25(1):

…the controller shall, both at the time of the determination of the means for processing and at the time of the processing itself, implement appropriate technical and organisational measures…

The concept of Privacy by Design

To make sense of the GDPR text, let’s first dig deeper into ‘Privacy by Design’, the idea at the core of embedding privacy into design. The concept has 7 foundational principles:

  1. Proactive not reactive; preventative not remedial:
    Privacy exists before any privacy-invasive event happens, not after the fact.
  2. Privacy as default setting:
    Privacy is there irrespective of any action; if an individual remains inactive, their privacy still remains intact. Privacy by default is also becoming a requirement under the GDPR. This principle is especially crucial when designing wearable devices, where individuals may have no way to opt out of a collection practice.
  3. Embedded privacy in design:
    Privacy is an essential component of the product or service which is being delivered. Privacy is embedded into the system without diminishing any functionality.
  4. Full functionality – positive-sum instead of zero-sum:
    Drop the false belief that business interest and privacy are a zero-sum game. Instead, all legitimate interests and objectives are in a positive-sum relationship: privacy and business interests can both increase at the same time.
  5. Transparency and visibility – make it open:
    The system operations and components are visible and transparent to the stakeholders and to the users.
  6. End-to-end security – full lifespan protection:
    Security measures are essential for privacy. Ensure the security of data across its whole lifespan, from collection to erasure.
  7. Respect for the privacy of the user – make it user-centric:
    Keep the developed products and services user-centric by providing information to the user about their data, offering privacy by default.

A detailed description of these principles can be found in Ann Cavoukian’s ‘Privacy by Design: The 7 Foundational Principles’.
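To make principle 2 (privacy as the default setting) more concrete, here is a minimal Python sketch for the wearable-device example. The `PrivacySettings` class, its field names, and the `allowed_to_collect` helper are all hypothetical and chosen only for illustration; the point is simply that every optional data flow starts switched off, so a user who takes no action shares nothing extra.

```python
from dataclasses import dataclass


@dataclass
class PrivacySettings:
    """Hypothetical per-user settings for a wearable device.

    Every optional data flow defaults to off, so an inactive user
    keeps their privacy intact without doing anything.
    """
    share_location: bool = False
    share_heart_rate_with_partners: bool = False
    personalised_ads: bool = False
    retention_days: int = 30  # assumed short default retention period


def allowed_to_collect(settings: PrivacySettings, signal: str) -> bool:
    """Return True only if the user has explicitly opted in to `signal`."""
    # Unknown signals and non-boolean fields are denied, never allowed.
    return getattr(settings, signal, False) is True


inactive_user = PrivacySettings()                      # user took no action
opted_in_user = PrivacySettings(share_location=True)   # explicit opt-in

print(allowed_to_collect(inactive_user, "share_location"))  # False
print(allowed_to_collect(opted_in_user, "share_location"))  # True
print(allowed_to_collect(opted_in_user, "unknown_signal"))  # False
```

The design choice worth noting is the deny-by-default check: anything that is not an explicit opt-in is treated as a refusal, which mirrors the idea that privacy should not depend on the individual acting.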

Let’s take a look at what the Privacy by Design concept implies for the artificial intelligence sector. Any artificial intelligence company worth its salt should put great effort into obtaining a positive-sum result in its AI development. Some of the key lessons Silo.AI has learned about embedding privacy into AI systems are:

  • Putting privacy on your engineers’ radar is a very good starting point. Talk to them: you’ll see how much they can add to that.
  • Controlling who accesses the data and how it is accessed is very important, as weak access control has been behind some past data breaches.
  • Data minimization is critical to protecting personally identifiable information: collect and process only the minimum amount of data, and the fewest personal identifiers linked to it, that the task requires.
  • It is vital to provide a simple way of removing or rectifying a user’s data at their request. Afterwards, refit models on the updated data so they do not accidentally memorize the removed parts.
  • The use of strong de-identification techniques for personal identifiers (e.g. pseudonymization), together with data aggregation and encryption, is crucial.
  • Beware of quasi-identifiers: attributes that are not unique on their own (e.g. gender, postcode, profession, languages spoken) but can re-identify people when combined. Latanya Sweeney showed that just three quasi-identifiers (ZIP code, gender, and date of birth) were enough to uniquely identify 87% of the U.S. population. Quasi-identifiers have been the basis of several attacks on released data; in the Netflix and AOL log cases, the de-identified data could be re-identified by merging the quasi-identifiers with information available from other sources.
  • When AI is involved in making decisions about people, it is essential, and soon required under the GDPR, to give information on the logic involved. In addition, tools like LIME can help explain the reasons behind individual decisions and check whether predictions are meaningful.
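The points above on data minimization, pseudonymization, and quasi-identifiers can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch under stated assumptions, not a description of Silo.AI's actual pipeline: the field names, the sample records, the `SECRET_KEY` placeholder, and the helper functions are all hypothetical. It drops fields the model does not need, replaces the direct identifier with a keyed hash (one common form of pseudonymization), and then reports the size of the smallest group of records sharing the same quasi-identifier values.

```python
import hashlib
import hmac
from collections import Counter

# Assumption: a real key would come from a secrets manager and be stored
# separately from the data, so the dataset alone cannot be linked back.
SECRET_KEY = b"replace-with-a-key-from-your-secrets-manager"

# Hypothetical schema: the only fields the model is assumed to need.
NEEDED_FIELDS = ("user_id", "gender", "postcode", "profession", "label")
QUASI_IDENTIFIERS = ("gender", "postcode", "profession")


def minimise(record: dict) -> dict:
    """Keep only the fields the model needs (data minimization)."""
    return {k: record[k] for k in NEEDED_FIELDS if k in record}


def pseudonymise(user_id: str) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 pseudonym."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def smallest_group(records: list) -> int:
    """Size of the smallest group sharing the same quasi-identifier values.

    A result of 1 means at least one person is unique on these columns
    and could be re-identified by joining with another data source.
    """
    groups = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)
    return min(groups.values(), default=0)


raw = [
    {"user_id": "alice@example.com", "name": "Alice", "gender": "F",
     "postcode": "00100", "profession": "nurse", "label": 1},
    {"user_id": "bob@example.com", "name": "Bob", "gender": "M",
     "postcode": "00100", "profession": "nurse", "label": 0},
    {"user_id": "carol@example.com", "name": "Carol", "gender": "F",
     "postcode": "00100", "profession": "nurse", "label": 0},
]

prepared = []
for record in raw:
    slim = minimise(record)                          # "name" is dropped here
    slim["user_id"] = pseudonymise(slim["user_id"])  # no raw e-mail is kept
    prepared.append(slim)

# Bob is the only record with ("M", "00100", "nurse"), so this prints 1.
print(smallest_group(prepared))
```

A smallest group of 1 is a warning sign even though names and e-mail addresses are gone: the remaining columns would need to be generalized (for example, a coarser postcode) or aggregated before release.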

AI is growing rapidly, so privacy and the appropriate technical and organizational measures need to be embedded into the development process, so that privacy issues do not hold back its growth but lead to positive outcomes instead.

Privacy assurance – the default mode to operate

All in all, privacy cannot be assured just by compliance with the GDPR or with other laws and regulations. Regulations can address only the tip of the iceberg. To have full privacy, privacy assurance should be the business’s default mode of operation. By embedding privacy into systems and operations, we can best prevent harm from arising and avoid data breaches.

Privacy by Design is attainable by all and will be the cornerstone on which organizations build their GDPR-compliant business. Be smart and lead your business with Privacy by Design, not Privacy by Chance!

This article was written by Erlin Gulbenkoglu from Silo.AI and originally appeared on the Silo.ai blog. Silo is a private AI laboratory that empowers companies with AI solutions. Find more information about them on silo.ai.
